importExcelModel

PURPOSE

importExcelModel

SYNOPSIS

function model=importExcelModel(fileName,removeExcMets,printWarnings,ignoreErrors)

DESCRIPTION

 importExcelModel
   Imports a constraint-based model from an Excel file

   fileName      a Microsoft Excel file to import
   removeExcMets true if exchange metabolites should be removed. This is
                 needed to be able to run simulations, but it could also
                 be done using simplifyModel at a later stage (optional,
                 default true)
   printWarnings true if warnings should be printed (optional, default true)
   ignoreErrors  true if errors should be ignored. See below for details
                 (optional, default false)

   model
       annotation
           taxonomy     String with the NCBI Taxonomy ID, as a valid
                        identifiers.org annotation
           defaultLB    Double with the default lower bound value for reactions
           defaultUB    Double with the default upper bound value for reactions
           givenName    String with the given name of the main model author
           familyName   String with the surname of the main model author
           email        String with the e-mail address of the main model author
           organization String with the organization of the main model author
           note         String with additional comments about the model
       name             name of model
       id               model ID
       rxns             reaction ids
       mets             metabolite ids
       S                stoichiometric matrix
       lb               lower bounds
       ub               upper bounds
       rev              reversibility vector
       c                objective coefficients
       b                equality constraints for the metabolite equations
       comps            compartment ids
       compNames        compartment names
       compOutside      the id (as in comps) for the compartment
                        surrounding each of the compartments
       compMiriams      structure with MIRIAM information about the
                        compartments
       rxnNames         reaction name
       rxnComps         compartments for reactions
       grRules          reaction to gene rules in text form
       rxnGeneMat       reaction-to-gene mapping in sparse matrix form
       subSystems       subsystem name for each reaction
       eccodes          EC-codes for the reactions
       rxnMiriams       structure with MIRIAM information about the reactions
       rxnNotes         reaction notes
       rxnReferences    reaction references
       rxnConfidenceScores reaction confidence scores
       genes            list of all genes
       geneComps        compartments for genes
       geneMiriams      structure with MIRIAM information about the genes
       geneShortNames   gene alternative names (e.g. ERG10)
       metNames         metabolite name
       metComps         compartments for metabolites
       inchis           InChI-codes for metabolites
       metFormulas      metabolite chemical formula
       metMiriams       structure with MIRIAM information about the metabolites
       metCharges       metabolite charge
       unconstrained    true if the metabolite is an exchange metabolite
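
A note on metFormulas: in the source listing further down, when an InChI string is available for a metabolite, the formula is taken from the second '/'-separated layer of that string. A minimal sketch of that step, using the standard InChI for ethanol as an illustrative input (the variable names here are not from the toolbox):

```matlab
% Illustrative input: the standard InChI for ethanol.
inchi = 'InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3';

% Split the string into layers; the formula is the second layer when present.
layers = strsplit(inchi, '/');
formula = '';
if numel(layers) >= 2
    formula = layers{2};
end
disp(formula)
```

If the string has fewer than two layers, the sketch leaves the formula empty, in the same spirit as the source's "don't copy if it doesn't look good" guard.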

   Loads models in the RAVEN Toolbox Excel format. A number of consistency
   checks are performed to ensure that the model is valid. These checks can
   be bypassed by setting ignoreErrors to true, but this is strongly
   discouraged, as it can lead to errors in simulations or other
   functionality. The RAVEN Toolbox is designed to work only on consistent
   models, and consistency is checked only when the model is imported.
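
One detail of the Excel format that the source listing handles is the legacy gene-association syntax, where ':' means AND and ';' means OR. The sketch below mirrors how such rules are rewritten into the parenthesized 'and'/'or' text stored in grRules. The gene names are made up, and the loop is a standalone illustration rather than a toolbox function.

```matlab
% Made-up legacy rules: single gene, OR only, AND only, mixed.
legacyRules = {'G1'; 'G1;G2'; 'G1:G2'; 'G1:G2;G3'};
converted = legacyRules;

for k = 1:numel(legacyRules)
    r = legacyRules{k};
    hasAnd = any(r == ':');
    hasOr  = any(r == ';');
    if hasAnd && hasOr
        % Mixed rule: split on OR first, then expand AND inside each group.
        r = ['((' strrep(r, ';', ') or (') '))'];
        r = strrep(r, ':', ' and ');
    elseif hasAnd
        r = ['(' strrep(r, ':', ' and ') ')'];
    elseif hasOr
        r = ['(' strrep(r, ';', ' or ') ')'];
    end
    % A single gene is left without parentheses.
    converted{k} = r;
end

disp(converted)
```

As the source comments point out, this covers only the relationships the Excel formulation supports (AND groups joined by OR), not arbitrary nesting.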

   NOTE: Most errors are checked for by checkModelStruct, but some are
   checked for in this function as well. These relate to problems, such as
   missing model elements, that would make it impossible to construct the
   model structure. Such errors cannot be ignored by setting ignoreErrors
   to true.
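
Two of the checks that the source listing performs during import are flagging duplicate metabolite IDs and flagging IDs that cannot be told apart from numbers. The sketch below shows one way to express both on a made-up ID list. It is an illustration of the idea, not the toolbox code, and it makes no claim about which checks ignoreErrors can bypass.

```matlab
mets = {'glc'; 'atp'; 'glc'; '42'; 'atp'};   % made-up IDs

% Every occurrence after the first one counts as a duplicate.
[~, firstIdx] = unique(mets, 'stable');
isDuplicate = true(numel(mets), 1);
isDuplicate(firstIdx) = false;
duplicates = mets(isDuplicate);

% IDs that parse as numbers would be ambiguous inside reaction equations.
numericIDs = mets(~isnan(str2double(mets)));

disp(duplicates)
disp(numericIDs)
```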

 Usage: model=importExcelModel(fileName,removeExcMets,printWarnings,ignoreErrors)
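
A hedged usage sketch: the file name below is hypothetical, and the three optional arguments are set to their documented defaults so that the positional order is visible. The guard only makes the snippet safe to run when RAVEN or the workbook is not available.

```matlab
% Hypothetical workbook name; replace with the path to your own model file.
fileName = 'myModel.xlsx';

% Optional arguments in positional order, set to the documented defaults.
removeExcMets = true;   % remove exchange metabolites so simulations can run
printWarnings = true;   % print warnings during import
ignoreErrors  = false;  % leave the consistency checks active

if exist('importExcelModel', 'file') == 2 && isfile(fileName)
    model = importExcelModel(fileName, removeExcMets, printWarnings, ignoreErrors);
    fprintf('Imported %d reactions and %d metabolites\n', ...
        numel(model.rxns), numel(model.mets));
else
    fprintf('RAVEN or %s not found; skipping import\n', fileName);
end
```

Calling importExcelModel(fileName) alone should behave the same way, since every argument after fileName is optional.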

SOURCE CODE

0001 function model=importExcelModel(fileName,removeExcMets,printWarnings,ignoreErrors)
0002 % importExcelModel
0003 %   Imports a constraint-based model from an Excel file
0004 %
0005 %   fileName      a Microsoft Excel file to import
0006 %   removeExcMets true if exchange metabolites should be removed. This is
0007 %                 needed to be able to run simulations, but it could also
0008 %                 be done using simplifyModel at a later stage (optional,
0009 %                 default true)
0010 %   printWarnings true if warnings should be printed (optional, default true)
0011 %   ignoreErrors  true if errors should be ignored. See below for details
0012 %                 (optional, default false)
0013 %
0014 %   model
0015 %       annotation
0016 %           taxonomy     String with the NCBI Taxonomy ID, as a valid
0017 %                        identifiers.org annotation
0018 %           defaultLB    Double with the default lower bound value for reactions
0019 %           defaultUB    Double with the default upper bound value for reactions
0020 %           givenName    String with the given name of the main model author
0021 %           familyName   String with the surname of the main model author
0022 %           email        String with the e-mail address of the main model author
0023 %           organization String with the organization of the main model author
0024 %           note         String with additional comments about the model
0025 %       name             name of model
0026 %       id               model ID
0027 %       rxns             reaction ids
0028 %       mets             metabolite ids
0029 %       S                stoichiometric matrix
0030 %       lb               lower bounds
0031 %       ub               upper bounds
0032 %       rev              reversibility vector
0033 %       c                objective coefficients
0034 %       b                equality constraints for the metabolite equations
0035 %       comps            compartment ids
0036 %       compNames        compartment names
0037 %       compOutside      the id (as in comps) for the compartment
0038 %                        surrounding each of the compartments
0039 %       compMiriams      structure with MIRIAM information about the
0040 %                        compartments
0041 %       rxnNames         reaction name
0042 %       rxnComps         compartments for reactions
0043 %       grRules          reaction to gene rules in text form
0044 %       rxnGeneMat       reaction-to-gene mapping in sparse matrix form
0045 %       subSystems       subsystem name for each reaction
0046 %       eccodes          EC-codes for the reactions
0047 %       rxnMiriams       structure with MIRIAM information about the reactions
0048 %       rxnNotes         reaction notes
0049 %       rxnReferences    reaction references
0050 %       rxnConfidenceScores reaction confidence scores
0051 %       genes            list of all genes
0052 %       geneComps        compartments for genes
0053 %       geneMiriams      structure with MIRIAM information about the genes
0054 %       geneShortNames   gene alternative names (e.g. ERG10)
0055 %       metNames         metabolite name
0056 %       metComps         compartments for metabolites
0057 %       inchis           InChI-codes for metabolites
0058 %       metFormulas      metabolite chemical formula
0059 %       metMiriams       structure with MIRIAM information about the metabolites
0060 %       metCharges       metabolite charge
0061 %       unconstrained    true if the metabolite is an exchange metabolite
0062 %
0063 %   Loads models in the RAVEN Toolbox Excel format. A number of consistency
0064 %   checks are performed to ensure that the model is valid. These checks can
0065 %   be bypassed by setting ignoreErrors to true, but this is strongly
0066 %   discouraged, as it can lead to errors in simulations or other
0067 %   functionality. The RAVEN Toolbox is designed to work only on consistent
0068 %   models, and consistency is checked only when the model is imported.
0069 %
0070 %   NOTE: Most errors are checked for by checkModelStruct, but some are
0071 %   checked for in this function as well. These relate to problems, such as
0072 %   missing model elements, that would make it impossible to construct the
0073 %   model structure. Such errors cannot be ignored by setting ignoreErrors
0074 %   to true.
0075 %
0076 % Usage: model=importExcelModel(fileName,removeExcMets,printWarnings,ignoreErrors)
0077 fileName=char(fileName);
0078 
0079 if nargin<2
0080     removeExcMets=true;
0081 end
0082 if nargin<3
0083     printWarnings=true;
0084 end
0085 if nargin<4
0086     ignoreErrors=false;
0087 end
0088 
0089 if ~isfile(fileName)
0090     error('Excel file %s cannot be found',string(fileName));
0091 end
0092 
0093 %This is to match the order of the fields to those you get from importing
0094 %from SBML
0095 model=[];
0096 model.id=[];
0097 model.name=[];
0098 model.annotation=[];
0099 %Default bounds if not defined
0100 model.annotation.defaultLB=-1000;
0101 model.annotation.defaultUB=1000;
0102 model.rxns={};
0103 model.mets={};
0104 model.S=[];
0105 model.lb=[];
0106 model.ub=[];
0107 model.rev=[];
0108 model.c=[];
0109 model.b=[];
0110 model.comps={};
0111 model.compNames={};
0112 model.compOutside={};
0113 model.compMiriams={};
0114 model.rxnNames={};
0115 model.rxnComps={}; %Will be double later
0116 model.grRules={};
0117 model.rxnGeneMat=[];
0118 model.subSystems={};
0119 model.eccodes={};
0120 model.rxnMiriams={};
0121 model.rxnNotes={};
0122 model.rxnReferences={};
0123 model.rxnConfidenceScores={}; %Will be double later
0124 model.genes={};
0125 model.geneComps={}; %Will be double later
0126 model.geneMiriams={};
0127 model.geneShortNames={};
0128 model.metNames={};
0129 model.metComps=[];
0130 model.inchis={};
0131 model.metFormulas={};
0132 model.metMiriams={};
0133 model.metCharges={}; %Will be double later
0134 model.unconstrained=[];
0135 
0136 workbook=loadWorkbook(fileName);
0137 
0138 [raw, flag]=loadSheet(workbook,'MODEL');
0139 
0140 if flag<0
0141     if printWarnings==true
0142         EM='Could not load the MODEL sheet';
0143         dispEM(EM,false);
0144     end
0145     model.id='UNKNOWN';
0146     model.name='No model details available';
0147 else
0148     raw=cleanSheet(raw);
0149     
0150     %It is assumed that the first line is labels and that the second one is
0151     %info
0152     raw(1,:)=upper(raw(1,:));
0153     raw(1,:)=strrep(raw(1,:),'MODELID','ID');
0154     raw(1,:)=strrep(raw(1,:),'MODELNAME','NAME');
0155     raw(1,:)=strrep(raw(1,:),'DESCRIPTION','NAME');
0156     
0157     %Loop through the labels
0158     for i=1:numel(raw(1,:))
0159         switch raw{1,i}
0160             case 'ID'
0161                 if any(raw{2,i})
0162                     model.id=toStr(raw{2,i}); %Should be string already
0163                 else
0164                     EM='No model ID supplied';
0165                     dispEM(EM);
0166                 end
0167             case 'NAME'
0168                 if any(raw{2,i})
0169                     model.name=toStr(raw{2,i}); %Should be string already
0170                 else
0171                     EM='No model name supplied';
0172                     dispEM(EM);
0173                 end
0174             case 'DEFAULT LOWER'
0175                 if ~isempty(raw{2,i})
0176                     try
0177                         model.annotation.defaultLB=toDouble(raw{2,i},NaN);
0178                     catch
0179                         EM='DEFAULT LOWER must be numeric';
0180                         dispEM(EM);
0181                     end
0182                 else
0183                     if printWarnings==true
0184                         fprintf('NOTE: DEFAULT LOWER not supplied. Uses -1000\n');
0185                     end
0186                     model.annotation.defaultLB=-1000;
0187                 end
0188             case 'DEFAULT UPPER'
0189                 if ~isempty(raw{2,i})
0190                     try
0191                         model.annotation.defaultUB=toDouble(raw{2,i},NaN);
0192                     catch
0193                         EM='DEFAULT UPPER must be numeric';
0194                         dispEM(EM);
0195                     end
0196                 else
0197                     if printWarnings==true
0198                         fprintf('NOTE: DEFAULT UPPER not supplied. Uses 1000\n');
0199                     end
0200                     model.annotation.defaultUB=1000;
0201                 end
0202             case 'TAXONOMY'
0203                 if any(raw{2,i})
0204                     model.annotation.taxonomy=toStr(raw{2,i}); %Should be string already
0205                 end
0206             case 'CONTACT GIVEN NAME'
0207                 if any(raw{2,i})
0208                     model.annotation.givenName=toStr(raw{2,i}); %Should be string already
0209                 end
0210             case 'CONTACT FAMILY NAME'
0211                 if any(raw{2,i})
0212                     model.annotation.familyName=toStr(raw{2,i}); %Should be string already
0213                 end
0214             case 'CONTACT EMAIL'
0215                 if any(raw{2,i})
0216                     model.annotation.email=toStr(raw{2,i}); %Should be string already
0217                 end
0218             case 'ORGANIZATION'
0219                 if any(raw{2,i})
0220                     model.annotation.organization=toStr(raw{2,i}); %Should be string already
0221                 end
0222             case 'NOTES'
0223                 if any(raw{2,i})
0224                     model.annotation.note=toStr(raw{2,i}); %Should be string already
0225                 end
0226         end
0227     end
0228 end
0229 
0230 %Get compartment information
0231 [raw, flag]=loadSheet(workbook,'COMPS');
0232 
0233 if flag<0
0234     if printWarnings==true
0235         EM='Could not load the COMPS sheet. All elements will be assigned to a compartment "s" for "System"';
0236         dispEM(EM,false);
0237     end
0238     model.comps={'s'};
0239     model.compNames={'System'};
0240 else
0241     raw=cleanSheet(raw);
0242     
0243     %Map to new captions
0244     raw(1,:)=upper(raw(1,:));
0245     raw(1,:)=strrep(raw(1,:),'COMPABBREV','ABBREVIATION');
0246     raw(1,:)=strrep(raw(1,:),'COMPNAME','NAME');
0247         
0248     %Loop through the labels
0249     for i=1:numel(raw(1,:))
0250         switch raw{1,i}
0251             case 'ABBREVIATION'
0252                 model.comps=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0253             case 'NAME'
0254                 model.compNames=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0255             case 'INSIDE'
0256                 model.compOutside=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0257             case 'MIRIAM'
0258                 model.compMiriams=raw(2:end,i);
0259         end
0260     end
0261     
0262     %Check that necessary fields are loaded
0263     if isempty(model.comps)
0264         EM='There must be a column named ABBREVIATION in the COMPS sheet';
0265         dispEM(EM);
0266     end
0267     if isempty(model.compNames)
0268         model.compNames=model.comps;
0269         if printWarnings==true
0270             EM='There is no column named NAME in the COMPS sheet. ABBREVIATION will be used as name';
0271             dispEM(EM,false);
0272         end
0273     end
0274     model.compMiriams=parseMiriam(model.compMiriams);
0275 end
0276 
0277 %Get all the genes and info about them
0278 [raw, flag]=loadSheet(workbook,'GENES');
0279 
0280 if flag<0
0281     if printWarnings==true
0282         EM='There is no spreadsheet named GENES';
0283         dispEM(EM,false);
0284     end
0285 else
0286     raw=cleanSheet(raw);
0287     
0288     %Map to new captions
0289     raw(1,:)=upper(raw(1,:));
0290     raw(1,:)=strrep(raw(1,:),'GENE NAME','NAME');
0291         
0292     %Loop through the labels
0293     foundGenes=false;
0294     for i=1:numel(raw(1,:))
0295         switch raw{1,i}
0296             case 'NAME'
0297                 model.genes=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0298                 foundGenes=true;
0299             case 'MIRIAM'
0300                 model.geneMiriams=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0301             case 'SHORT NAME'
0302                 model.geneShortNames=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0303             case 'COMPARTMENT'
0304                 model.geneComps=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0305         end
0306     end
0307     
0308     if foundGenes==false
0309         EM='There must be a column named NAME in the GENES sheet';
0310         dispEM(EM);
0311     end
0312     
0313     %It's OK if all of them are empty
0314     if all(cellfun(@isempty,model.geneComps))
0315         model.geneComps=[];
0316     end
0317     
0318     %Check that model.genes contains only strings and no empty strings
0319     if ~iscellstr(model.genes)
0320         EM='All gene names have to be strings';
0321         dispEM(EM);
0322     else
0323         if any(strcmp('',model.genes))
0324             EM='There can be no empty strings in gene names';
0325             dispEM(EM);
0326         end
0327     end
0328     
0329     %Check that model.geneComps contains only strings and no empty strings
0330     if ~isempty(model.geneComps)
0331         if ~iscellstr(model.geneComps)
0332             EM='All gene compartments have to be strings';
0333             dispEM(EM);
0334         else
0335             if any(strcmp('',model.geneComps))
0336                 EM='There can be no empty strings in gene compartments';
0337                 dispEM(EM);
0338             end
0339         end
0340         [I, model.geneComps]=ismember(model.geneComps,model.comps);
0341         EM='The following genes have compartment abbreviations which could not be found:';
0342         dispEM(EM,true,model.genes(~I));
0343     end
0344 end
0345 
0346 model.geneMiriams=parseMiriam(model.geneMiriams);
0347 
0348 %Loads the reaction data
0349 [raw, flag]=loadSheet(workbook,'RXNS');
0350 
0351 if flag<0
0352     EM='Could not load the RXNS sheet';
0353     dispEM(EM);
0354 end
0355 
0356 raw=cleanSheet(raw);
0357 
0358 %Map to new captions
0359 raw(1,:)=upper(raw(1,:));
0360 raw(1,:)=strrep(raw(1,:),'RXNID','ID');
0361 
0362 %Loop through the labels
0363 equations={};
0364 reactionReplacement={};
0365 for i=1:numel(raw(1,:))
0366     switch raw{1,i}
0367         case 'ID'
0368             model.rxns=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0369         case 'NAME'
0370             model.rxnNames=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0371         case 'EQUATION'
0372             equations=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0373         case 'EC-NUMBER'
0374             model.eccodes=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0375         case 'GENE ASSOCIATION'
0376             model.grRules=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0377         case 'LOWER BOUND'
0378             try
0379                 model.lb=cellfun(@(x) toDouble(x,NaN),raw(2:end,i));
0380             catch
0381                 EM='The lower bounds must be numerical values';
0382                 dispEM(EM);
0383             end
0384         case 'UPPER BOUND'
0385             try
0386                 model.ub=cellfun(@(x) toDouble(x,NaN),raw(2:end,i));
0387             catch
0388                 EM='The upper bounds must be numerical values';
0389                 dispEM(EM);
0390             end
0391         case 'OBJECTIVE'
0392             try
0393                 model.c=cellfun(@(x) toDouble(x,0),raw(2:end,i));
0394             catch
0395                 EM='The objective coefficients must be numerical values';
0396                 dispEM(EM);
0397             end
0398         case 'COMPARTMENT'
0399             model.rxnComps=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0400         case 'SUBSYSTEM'
0401             subsystems=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0402         case 'REPLACEMENT ID'
0403             reactionReplacement=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0404         case 'MIRIAM'
0405             model.rxnMiriams=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0406         case 'NOTE'
0407             model.rxnNotes=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0408         case 'REFERENCE'
0409             model.rxnReferences=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0410         case 'CONFIDENCE SCORE'
0411             model.rxnConfidenceScores=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0412     end
0413 end
0414 
0415 if ~isempty(model.rxnConfidenceScores)
0416     model.rxnConfidenceScores=str2double(model.rxnConfidenceScores);
0417 end
0418 if exist('subsystems','var') %The SUBSYSTEM column is optional, so the variable may be undefined
0419     model.subSystems=cellfun(@(x) cellstr(strsplit(x,';')),subsystems,'UniformOutput',false);
0420 end
0421 
0422 %Check that all necessary reaction info has been loaded
0423 if isempty(equations)
0424     EM='There must be a column named EQUATION in the RXNS sheet';
0425     dispEM(EM);
0426 end
0427 if isempty(model.rxns)
0428     if printWarnings==true
0429         EM='There is no column named ID in the RXNS sheet. The reactions will be named as "r1", "r2"...';
0430         dispEM(EM,false);
0431     end
0432     I=num2cell((1:numel(equations))');
0433     model.rxns=strcat('r',cellfun(@num2str,I,'UniformOutput',false));
0434 end
0435 
0436 %Check if some other stuff is loaded and populate with default values
0437 %otherwise
0438 if isempty(model.rxnNames)
0439     model.rxnNames=cell(numel(model.rxns),1);
0440     model.rxnNames(:)={''};
0441     if printWarnings==true
0442         EM='There is no column named NAME in the RXNS sheet. Empty strings will be used as reaction names';
0443         dispEM(EM,false);
0444     end
0445 end
0446 if isempty(model.lb)
0447     %This is not set here since the reversibility isn't known yet
0448     model.lb=nan(numel(model.rxns),1);
0449     if printWarnings==true
0450         EM='There is no column named LOWER BOUND in the RXNS sheet. Default bounds will be used';
0451         dispEM(EM,false);
0452     end
0453 end
0454 if isempty(model.ub)
0455     model.ub=nan(numel(model.rxns),1);
0456     if printWarnings==true
0457         EM='There is no column named UPPER BOUND in the RXNS sheet. Default bounds will be used';
0458         dispEM(EM,false);
0459     end
0460 end
0461 if isempty(model.c)
0462     model.c=zeros(numel(model.rxns),1);
0463     if printWarnings==true
0464         EM='There is no column named OBJECTIVE in the RXNS sheet';
0465         dispEM(EM,false);
0466     end
0467 end
0468 
0469 %Either all reactions must have a compartment string or none of them. Check
0470 %if it's only empty and if so return it to []
0471 if ~isempty(model.rxnComps)
0472     if all(cellfun(@isempty,model.rxnComps))
0473         model.rxnComps=[];
0474     end
0475 end
0476 
0477 %Construct the rxnMiriams structure
0478 model.rxnMiriams=parseMiriam(model.rxnMiriams);
0479 
0480 %Replace the reaction IDs for those IDs that have a corresponding
0481 %replacement name.
0482 I=cellfun(@any,reactionReplacement);
0483 model.rxns(I)=reactionReplacement(I);
0484 
0485 %Check that there are no empty strings in reactionIDs or equations
0486 if any(strcmp('',model.rxns))
0487     EM='There are empty reaction IDs';
0488     dispEM(EM);
0489 end
0490 
0491 if any(strcmp('',equations))
0492     EM='There are empty equations';
0493     dispEM(EM);
0494 end
0495 
0496 if ~isempty(model.rxnComps)
0497     if any(strcmp('',model.rxnComps))
0498         EM='Either all reactions must have an associated compartment string or none of them';
0499         dispEM(EM);
0500     end
0501 end
0502 
0503 if ~isempty(model.grRules)
0504     tempRules=model.grRules;
0505     for i=1:length(model.rxns)
0506         %Check that all gene associations have a match in the gene list
0507         if ~isempty(model.grRules{i})
0508             tempRules{i}=regexprep(tempRules{i},' and | or ','>'); %New format: genes are separated by "and"/"or"; mark each separator with ">"
0509             tempRules{i}=strrep(tempRules{i},'(',''); %New format: strip literal opening parentheses
0510             tempRules{i}=strrep(tempRules{i},')',''); %New format: strip literal closing parentheses
0511             indexesNew=strfind(tempRules{i},'>'); %Positions of the new-format separators marked above
0512             indexes=strfind(tempRules{i},':'); %Old format: Genes are separated by ":" for AND and ";" for OR
0513             indexes=unique([indexesNew indexes strfind(tempRules{i},';')]);
0514             if isempty(indexes)
0515                 %See if you have a match
0516                 I=find(strcmp(tempRules{i},model.genes));
0517                 if isempty(I)
0518                     EM=['The gene association in reaction ' model.rxns{i} ' (' tempRules{i} ') is not present in the gene list'];
0519                     dispEM(EM);
0520                 end
0521             else
0522                 temp=[0 indexes numel(tempRules{i})+1];
0523                 for j=1:numel(indexes)+1
0524                     %The reaction has several associated genes
0525                     geneName=tempRules{i}(temp(j)+1:temp(j+1)-1);
0526                     I=find(strcmp(geneName,model.genes));
0527                     if isempty(I)
0528                         EM=['The gene association in reaction ' model.rxns{i} ' (' geneName ') is not present in the gene list'];
0529                         dispEM(EM);
0530                     end
0531                 end
0532             end
0533             %In order to adhere to the COBRA standards it should be like
0534             %this: -If only one gene then no parentheses -If only "and" or
0535             %only "or" there should only be one set of parentheses -If both
0536             %"and" and "or", then split on "or". This is not complete, but
0537             %it's the type of relationship supported by the Excel
0538             %formulation
0539             aSign=strfind(model.grRules{i},':');
0540             oSign=strfind(model.grRules{i},';');
0541             if isempty(aSign) && isempty(oSign)
0542                 model.grRules{i}=model.grRules{i};
0543             else
0544                 if isempty(aSign)
0545                     model.grRules{i}=['(' strrep(model.grRules{i},';',' or ') ')'];
0546                 else
0547                     if isempty(oSign)
0548                         model.grRules{i}=['(' strrep(model.grRules{i},':',' and ') ')'];
0549                     else
0550                         model.grRules{i}=['((' strrep(model.grRules{i},';',') or (') '))'];
0551                         model.grRules{i}=strrep(model.grRules{i},':',' and ');
0552                     end
0553                 end
0554             end
0555         end
0556     end
0557 end
0558 
0559 %Check that the compartment for each reaction can be found
0560 if ~isempty(model.rxnComps)
0561     [I, model.rxnComps]=ismember(model.rxnComps,model.comps);
0562     EM='The following reactions have compartment abbreviations which could not be found:';
0563     dispEM(EM,true,model.rxns(~I));
0564 end
0565 
0566 %Get all the metabolites and info about them
0567 [raw, flag]=loadSheet(workbook,'METS');
0568 
0569 if flag<0
0570     if printWarnings==true
0571         EM='There is no spreadsheet named METS. The metabolites will be named "m1", "m2"... and assigned to the first compartment';
0572         dispEM(EM,false);
0573     end
0574     %Parse the equations to find out how many metabolites there are
0575     metsForParsing=parseRxnEqu(equations);
0576     I=num2cell((1:numel(metsForParsing))');
0577     model.mets=strcat('m',cellfun(@num2str,I,'UniformOutput',false));
0578     model.metComps=ones(numel(model.mets),1);
0579     model.unconstrained=zeros(numel(model.mets),1);
0580     model.metNames=metsForParsing;
0581 else
0582     raw=cleanSheet(raw);
0583     
0584     %Map to new captions
0585     raw(1,:)=upper(raw(1,:));
0586     raw(1,:)=strrep(raw(1,:),'METID','ID');
0587     raw(1,:)=strrep(raw(1,:),'METNAME','NAME');
0588     
0589     %Loop through the labels
0590     metReplacement={};
0591     for i=1:numel(raw(1,:))
0592         switch raw{1,i}
0593             case 'ID'
0594                 model.mets=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0595             case 'NAME'
0596                 model.metNames=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0597             case 'UNCONSTRAINED'
0598                 model.unconstrained=cellfun(@boolToDouble,raw(2:end,i));
0599                 %NaN is returned if the values couldn't be parsed
0600                 EM='The UNCONSTRAINED property for the following metabolites must be "true"/"false", 1/0, TRUE/FALSE or not set:';
0601                 dispEM(EM,true,model.mets(isnan(model.unconstrained)));
0602             case 'MIRIAM'
0603                 model.metMiriams=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0604             case 'COMPOSITION'
0605                 model.metFormulas=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0606             case 'INCHI'
0607                 model.inchis=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0608             case 'COMPARTMENT'
0609                 model.metComps=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0610                 
0611                 %Check that all metabolites have compartments defined
0612                 if any(strcmp('',model.metComps))
0613                     EM='All metabolites must have an associated compartment string';
0614                     dispEM(EM);
0615                 end
0616             case 'REPLACEMENT ID'
0617                 metReplacement=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0618             case 'CHARGE'
0619                 model.metCharges=cellfun(@toStr,raw(2:end,i),'UniformOutput',false);
0620         end
0621     end
0622     
0623     %Check that necessary fields are loaded (METID)
0624     if isempty(model.mets)
0625         EM='There must be a column named ID in the METS sheet';
0626         dispEM(EM);
0627     end
0628     
0629     %Check that some other stuff is loaded and use default values otherwise
0630     if isempty(model.metNames)
0631         model.metNames=model.mets; %Use the IDs as names, as stated in the warning below
0632         if printWarnings==true
0633             EM='There is no column named NAME in the METS sheet. ID will be used as name';
0634             dispEM(EM,false);
0635         end
0636     end
0637     if isempty(model.unconstrained)
0638         model.unconstrained=zeros(numel(model.mets),1);
0639         if printWarnings==true
0640             EM='There is no column named UNCONSTRAINED in the METS sheet. All metabolites will be constrained';
0641             dispEM(EM,false);
0642         end
0643     end
0644     
0645     if isempty(model.metComps)
0646         model.metComps=cell(numel(model.mets),1);
0647         model.metComps(:)=model.comps(1);
0648         if printWarnings==true
0649             EM='There is no column named COMPARTMENT in the METS sheet. All metabolites will be assigned to the first compartment in COMPS. Note that RAVEN makes extensive use of metabolite names and compartments. Some features will therefore not function correctly if metabolite compartments are not correctly assigned';
0650             dispEM(EM,false);
0651         end
0652     end
0653     
0654     %The composition should be loaded from InChIs when available
0655     I=find(~cellfun(@isempty,model.inchis));
0656     for i=1:numel(I)
0657         S=regexp(model.inchis(I(i)),'/','split');
0658         S=S{1};
0659         if numel(S)>=2
0660             %Only copy the formula layer if the InChI splits into at least two parts
0661             model.metFormulas(I(i))=S(2);
0662         end
0663     end
0664     
0665     %Check that the compartment for each metabolite can be found. Also
0666     %convert from id to index
0667     [I, model.metComps]=ismember(model.metComps,model.comps);
0668     EM='The following metabolites have compartment abbreviations which could not be found:';
0669     dispEM(EM,true,model.mets(~I));
0670     
0671     %Check that the model.mets vector is unique. checkModelStruct cannot
0672     %check for that, since only the metReplacements (if used) end up in the
0673     %model structure
0674     I=false(numel(model.mets),1);
0675     [J, K]=unique(model.mets);
0676     if numel(J)~=numel(model.mets)
0677         L=1:numel(model.mets);
0678         L(K)=[];
0679         I(L)=true;
0680     end
0681     EM='The following metabolites are duplicates:';
0682     dispEM(EM,~ignoreErrors,model.mets(I));
0683     
0684     %Check that there are no metabolite IDs which are numbers. This would
0685     %give errors when parsing the equations
0686     I=cellfun(@str2double,model.mets);
0687     EM='The following metabolites have IDs which cannot be distinguished from numbers:';
0688     dispEM(EM,~ignoreErrors,model.mets(~isnan(I)));
0689     I=cellfun(@str2double,metReplacement);
0690     EM='The following metabolites have replacement IDs which cannot be distinguished from numbers:';
0691     dispEM(EM,~ignoreErrors,metReplacement(~isnan(I)));
0692     
0693     %Replace the metabolite IDs for those metabolites that have a
0694     %replacement ID. The replacement is not used for matching, but will be
0695     %checked for consistency with SBML naming conventions
0696     metsForParsing=model.mets; %Keep the original IDs, since the equations are written with them
0697     I=cellfun(@any,metReplacement);
0698     model.mets(I)=metReplacement(I);
0699     
0700     %If the metabolite name isn't set, replace it with the metabolite id
0701     I=~cellfun(@any,model.metNames);
0702     model.metNames(I)=model.mets(I);
0703     
0704     %Construct the metMiriams structure
0705     model.metMiriams=parseMiriam(model.metMiriams);
0706     
0707     %Either all metabolites have a charge or none of them do. If every
0708     %charge is empty, reset the field to []
0709     if ~isempty(model.metCharges)
0710         if all(cellfun(@isempty,model.metCharges))
0711             model.metCharges=[];
0712         end
0713     end
0714     if ~isempty(model.metCharges)
0715         model.metCharges=str2double(model.metCharges);
0716     end
0717 end
0718 
0719 %Everything seems fine with the metabolite IDs, compartments, genes, and
0720 %reactions
0721 
0722 %Parse the equations
0723 [model.S, mets, badRxns, model.rev]=constructS(equations,metsForParsing,model.rxns);
0724 model.rev=model.rev*1; %Typecast to double
0725 
0726 %Add default constraints
0727 model.lb(isnan(model.lb))=model.annotation.defaultLB.*model.rev(isnan(model.lb));
0728 model.ub(isnan(model.ub))=model.annotation.defaultUB;
0729 
0730 %Reorder the S matrix so that it fits with the metabolite list in the
0731 %structure
0732 [~, I]=ismember(mets,metsForParsing);
0733 model.S=model.S(I,:);
0734 
0735 %Print warnings about the reactions which contain the same metabolite as
0736 %both reactants and products
0737 EM='The following reactions have metabolites which are present more than once. Only the net reactions will be exported:';
0738 dispEM(EM,false,model.rxns(badRxns));
0739 
0740 model.b=zeros(numel(model.mets),1);
0741 
0742 %Fix grRules and reconstruct rxnGeneMat
0743 [grRules,rxnGeneMat] = standardizeGrRules(model,true);
0744 model.grRules = grRules;
0745 model.rxnGeneMat = rxnGeneMat;
0746 
0747 %Remove unused fields
0748 if all(cellfun(@isempty,model.compOutside))
0749     model=rmfield(model,'compOutside');
0750 end
0751 if all(cellfun(@isempty,model.compMiriams))
0752     model=rmfield(model,'compMiriams');
0753 end
0754 if all(cellfun(@isempty,model.rxnNames))
0755     model=rmfield(model,'rxnNames');
0756 end
0757 if isempty(model.rxnComps)
0758     model=rmfield(model,'rxnComps');
0759 end
0760 if all(cellfun(@isempty,model.grRules))
0761     model=rmfield(model,'grRules');
0762 end
0763 if isfield(model,'rxnGeneMat') && isempty(model.rxnGeneMat)
0764     model=rmfield(model,'rxnGeneMat');
0765 end
0766 if all(cellfun(@isempty,model.subSystems))
0767     model=rmfield(model,'subSystems');
0768 end
0769 if all(cellfun(@isempty,model.eccodes))
0770     model=rmfield(model,'eccodes');
0771 end
0772 if all(cellfun(@isempty,model.rxnMiriams))
0773     model=rmfield(model,'rxnMiriams');
0774 end
0775 if all(cellfun(@isempty,model.rxnNotes))
0776     model=rmfield(model,'rxnNotes');
0777 end
0778 if all(cellfun(@isempty,model.rxnReferences))
0779     model=rmfield(model,'rxnReferences');
0780 end
0781 if isempty(model.rxnConfidenceScores)
0782     model=rmfield(model,'rxnConfidenceScores');
0783 end
0784 if isempty(model.genes)
0785     model=rmfield(model,'genes');
0786 end
0787 if isempty(model.geneComps)
0788     model=rmfield(model,'geneComps');
0789 end
0790 if isempty(model.geneMiriams)
0791     model=rmfield(model,'geneMiriams');
0792 end
0793 if all(cellfun(@isempty,model.geneShortNames))
0794     model=rmfield(model,'geneShortNames');
0795 end
0796 if all(cellfun(@isempty,model.inchis))
0797     model=rmfield(model,'inchis');
0798 end
0799 if all(cellfun(@isempty,model.metFormulas))
0800     model=rmfield(model,'metFormulas');
0801 end
0802 if all(cellfun(@isempty,model.metMiriams))
0803     model=rmfield(model,'metMiriams');
0804 end
0805 if isempty(model.metCharges)
0806     model=rmfield(model,'metCharges');
0807 end
0808 
0809 %The model structure has now been reconstructed but it can still contain
0810 %many types of errors. The checkModelStruct function is used to make
0811 %sure that the naming and mapping of model elements look good
0812 checkModelStruct(model,~ignoreErrors);
0813 
0814 if removeExcMets==true
0815     model=simplifyModel(model);
0816 end
0817 end
0818 
0819 function miriamStruct=parseMiriam(strings,miriamStruct)
0820 %Gets the names and values of Miriam strings. Nothing fancy at all, just to
0821 %avoid duplicating the same code for metabolites, genes, and reactions. The
0822 %function also allows for supplying a miriamStruct, in which case the info
0823 %is appended to it
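     %
     %Illustrative example (a sketch added for clarity, not taken from the
     %original file; the identifiers are only sample values). A string such as
     %   'kegg.compound/C00031;chebi/CHEBI:17234'
     %would be split at ";" and each id at its last "/", giving
     %   name  = {'kegg.compound';'chebi'}
     %   value = {'C00031';'CHEBI:17234'}
     %The old format 'kegg.compound:C00031' would instead be split at its
     %last ":"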
0824 
0825 if nargin<2
0826     miriamStruct=cell(numel(strings),1);
0827 end
0828 for i=1:numel(strings)
0829     if any(strings{i})
0830         %A Miriam string can be several ids separated by ";". Each id is
0831         %"name(..:..)/value"; an old format where the value is separated
0832         %by a colon is also supported
0833         I=regexp(strings{i},';','split');
0834         if isfield(miriamStruct{i},'name')
0835             startIndex=numel(miriamStruct{i}.name);
0836             miriamStruct{i}.name=[miriamStruct{i}.name;cell(numel(I),1)];
0837             miriamStruct{i}.value=[miriamStruct{i}.value;cell(numel(I),1)];
0838         else
0839             startIndex=0;
0840             miriamStruct{i}.name=cell(numel(I),1);
0841             miriamStruct{i}.value=cell(numel(I),1);
0842         end
0843         
0844         for j=1:numel(I)
0845             %Recompute index for each id so that a match from a previous id is never reused
0846             index=max(strfind(I{j},'/'));
0847             if isempty(index)
0848                 index=max(strfind(I{j},':'));
0849             end
0850             if ~isempty(index)
0851                 miriamStruct{i}.name{startIndex+j}=I{j}(1:index-1);
0852                 miriamStruct{i}.value{startIndex+j}=I{j}(index+1:end);
0853             else
0854                 EM=['"' I{j} '" is not a valid MIRIAM string. The format must be "identifier/value" or "identifier:value"'];
0855                 dispEM(EM);
0856             end
0857         end
0858     end
0859 end
0860 end
0861 
0862 %For converting a value to string. This is used instead of num2str because
0863 %I want to convert empty cells to {''}.
0864 function y=toStr(x)
0865 %x can be empty, numerical, string or boolean. It cannot be NaN. Boolean
0866 %values will be converted to '1'/'0'
0867 if isempty(x)
0868     y='';
0869 else
0870     y=num2str(x);
0871 end
0872 end
0873 
0874 %For converting to numeric. This is used instead of str2num because I want
0875 %to be able to choose what empty values should be mapped to.
0876 %
0877 %   default   the value to use for empty input
0878 function y=toDouble(x,default)
0879 if isempty(x) %Note that this catches '' as well
0880     y=default;
0881 else
0882     if isnumeric(x)
0883         y=x;
0884     else
0885         y=str2double(x);
0886         
0887         %This happens if the input couldn't be converted. The input itself
0888         %cannot be NaN, since that was handled when the imported data was cleaned
0889         if isnan(y)
0890             EM=['Cannot convert the string "' x '" to double'];
0891             dispEM(EM);
0892         end
0893     end
0894 end
0895 end
0896 
0897 %For converting boolean (the UNCONSTRAINED field) to double (the
0898 %model.unconstrained field)
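     %
     %Illustrative mapping (a sketch added for clarity, derived from the code
     %below rather than from separate documentation):
     %   boolToDouble([])      gives 0
     %   boolToDouble(true)    gives 1
     %   boolToDouble(2)       gives 1 (any non-zero number)
     %   boolToDouble('false') gives 0 (the comparison is case-insensitive)
     %   boolToDouble('yes')   gives NaN (the input could not be parsed)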
0899 function y=boolToDouble(x)
0900 if isempty(x)
0901     y=0;
0902     return;
0903 end
0904 if islogical(x)
0905     y=x*1; %Typecast to double
0906     return;
0907 end
0908 if isnumeric(x)
0909     if x~=0
0910         y=1;
0911         return;
0912     else
0913         y=0;
0914         return;
0915     end
0916 end
0917 if ischar(x)
0918     if strcmpi(x,'TRUE')
0919         y=1;
0920         return;
0921     end
0922     if strcmpi(x,'FALSE')
0923         y=0;
0924         return;
0925     end
0926 end
0927 y=NaN; %This means that the input couldn't be parsed
0928 end
