Home > core > getModelFromHomology.m

getModelFromHomology

PURPOSE ^

getModelFromHomology

SYNOPSIS ^

function [draftModel, hitGenes]=getModelFromHomology(models,blastStructure,getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,minLen,minIde,mapNewGenesToOld)

DESCRIPTION ^

 getModelFromHomology
   Constructs a new model from a set of existing models and gene homology
   information.

   models            a cell array of model structures to build the model
                     from. These models must be sorted by importance in
                     decreasing order
   blastStructure    a blastStructure as produced by getBlast or
                     getBlastFromExcel
   getModelFor       a three-four letter abbreviation of the organism to
                     build a model for. Must have BLASTP hits in both
                     directions to the organisms in 'models'
   preferredOrder    the order in which reactions should be added from the
                     models. If not supplied, reactions will be included
                     from all models, otherwise one gene will only result
                     in reactions from one model (optional, default {})
   strictness        integer that specifies which reactions should be
                     included:
                     1: Map new genes to old for all pairs, which have
                     acceptable BLASTP results in both directions
                     2: Map new genes to old for all pairs, which have
                     acceptable BLASTP results in correspondent direction
                     (mapping can be done in the opposite direction, see
                     mapNewGenesToOld below)
                     3: Check all BLASTP results and retain only the best
                     results by E-value for all gene pairs in each
                     direction separately. Then map new genes to old for
                     all pairs, which have acceptable BLASTP results in
                     both directions (optional, default 1).
   onlyGenesInModels consider BLASTP results only for genes that exist in
                     the models. This tends to import a larger fraction
                     from the existing models but may give less reliable
                     results. Has effect only if strictness=3 (optional,
                     default false)
   maxE              only look at genes with E-values <= this value (optional,
                     default 10^-30)
   minLen            only look at genes with alignment length >= this
                     value (optional, default 200)
   minIde            only look at genes with identity >= this value
                     (optional, default 40 (%))
   mapNewGenesToOld  determines how to match genes if not looking at only
                     1-1 orthologs. Either map the new genes to the old or
                     old genes to new. The default is to map the new genes
                     (optional, default true)

   draftModel        a model structure for the new organism
   hitGenes          collect the old and new genes

   The models in the 'models' structure should have named the metabolites
   in the same manner, have their reversible reactions in the same
   direction (run sortModel), and use the same compartment names. To avoid
   keeping unneccesary old genes, the models should not have
   'or'-relations in their grRules (use expandModel).

   The resulting draft model contains only reactions associated with
   orthologous genes. The old (original) genes involved in 'and'
   relations in grRules without any orthologs are still included in
   the draft model as OLD_MODELID_geneName.

   NOTE: "to" and "from" means relative to the new organism

 Usage: draftModel=getModelFromHomology(models,blastStructure,...
    getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,...
    minLen,minIde,mapNewGenesToOld)

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [draftModel, hitGenes]=getModelFromHomology(models,blastStructure,...
0002     getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,...
0003     minLen,minIde,mapNewGenesToOld)
0004 % getModelFromHomology
0005 %   Constructs a new model from a set of existing models and gene homology
0006 %   information.
0007 %
0008 %   models            a cell array of model structures to build the model
0009 %                     from. These models must be sorted by importance in
0010 %                     decreasing order
0011 %   blastStructure    a blastStructure as produced by getBlast or
0012 %                     getBlastFromExcel
0013 %   getModelFor       a three-four letter abbreviation of the organism to
0014 %                     build a model for. Must have BLASTP hits in both
0015 %                     directions to the organisms in 'models'
0016 %   preferredOrder    the order in which reactions should be added from the
0017 %                     models. If not supplied, reactions will be included
0018 %                     from all models, otherwise one gene will only result
0019 %                     in reactions from one model (optional, default {})
0020 %   strictness        integer that specifies which reactions should be
0021 %                     included:
0022 %                     1: Map new genes to old for all pairs, which have
0023 %                     acceptable BLASTP results in both directions
0024 %                     2: Map new genes to old for all pairs, which have
0025 %                     acceptable BLASTP results in correspondent direction
0026 %                     (mapping can be done in the opposite direction, see
0027 %                     mapNewGenesToOld below)
0028 %                     3: Check all BLASTP results and retain only the best
0029 %                     results by E-value for all gene pairs in each
0030 %                     direction separately. Then map new genes to old for
0031 %                     all pairs, which have acceptable BLASTP results in
0032 %                     both directions (optional, default 1).
0033 %   onlyGenesInModels consider BLASTP results only for genes that exist in
0034 %                     the models. This tends to import a larger fraction
0035 %                     from the existing models but may give less reliable
0036 %                     results. Has effect only if strictness=3 (optional,
0037 %                     default false)
0038 %   maxE              only look at genes with E-values <= this value (optional,
0039 %                     default 10^-30)
0040 %   minLen            only look at genes with alignment length >= this
0041 %                     value (optional, default 200)
0042 %   minIde            only look at genes with identity >= this value
0043 %                     (optional, default 40 (%))
0044 %   mapNewGenesToOld  determines how to match genes if not looking at only
0045 %                     1-1 orthologs. Either map the new genes to the old or
0046 %                     old genes to new. The default is to map the new genes
0047 %                     (optional, default true)
0048 %
0049 %   draftModel        a model structure for the new organism
0050 %   hitGenes          collect the old and new genes
0051 %
0052 %   The models in the 'models' structure should have named the metabolites
0053 %   in the same manner, have their reversible reactions in the same
0054 %   direction (run sortModel), and use the same compartment names. To avoid
0055 %   keeping unneccesary old genes, the models should not have
0056 %   'or'-relations in their grRules (use expandModel).
0057 %
0058 %   The resulting draft model contains only reactions associated with
0059 %   orthologous genes. The old (original) genes involved in 'and'
0060 %   relations in grRules without any orthologs are still included in
0061 %   the draft model as OLD_MODELID_geneName.
0062 %
0063 %   NOTE: "to" and "from" means relative to the new organism
0064 %
0065 % Usage: draftModel=getModelFromHomology(models,blastStructure,...
0066 %    getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,...
0067 %    minLen,minIde,mapNewGenesToOld)
0068 
0069 hitGenes.oldGenes = [];  % collect the old genes from the template model (organism)
0070 hitGenes.newGenes = [];  % collect the new genes of the draft model (target organism)
0071 
0072 getModelFor=char(getModelFor);
0073 
0074 if nargin<4
0075     preferredOrder=[];
0076 else
0077     preferredOrder=convertCharArray(preferredOrder);
0078     [row,col]=size(preferredOrder);
0079     if col>row
0080         preferredOrder=transpose(preferredOrder);
0081     end
0082 end
0083 if nargin<5
0084     strictness=1;
0085 end
0086 if nargin<6
0087     onlyGenesInModels=false;
0088 end
0089 if nargin<7
0090     maxE=10^-30;
0091 end
0092 if nargin<8
0093     minLen=200;
0094 end
0095 if nargin<9
0096     minIde=40;
0097 end
0098 if nargin<10
0099     mapNewGenesToOld=true;
0100 end
0101 
0102 if isfield(models,'S')
0103     models={models};
0104 end
0105     
0106 %Check that all the information is in the blast structure
0107 modelNames=cell(numel(models),1);
0108 for i=1:numel(models)
0109     modelNames{i}=models{i}.id;
0110     %Gene short names, geneMiriams and proteins are often different
0111     %between species, safer not to include them
0112     if isfield(models{i},'geneShortNames')
0113         models{i}=rmfield(models{i},'geneShortNames');
0114     end
0115     if isfield(models{i},'geneMiriams')
0116         models{i}=rmfield(models{i},'geneMiriams');
0117     end
0118     if isfield(models{i},'proteins')
0119         models{i}=rmfield(models{i},'proteins');
0120     end
0121     %The geneFrom field also loses meaning if the genes are replaced by
0122     %orthologs
0123     if isfield(models{i},'geneFrom')
0124         models{i}=rmfield(models{i},'geneFrom');
0125     end
0126 end
0127 
0128 %Check that genes do not begin with ( or end with ), as this makes problematic grRules
0129 for i=1:numel(blastStructure)
0130     problemGenes = startsWith(blastStructure(i).fromGenes,'(') | endsWith(blastStructure(i).fromGenes,')');
0131     if any(problemGenes)
0132         error(['One or multiple gene identifiers from ' blastStructure(i).fromId ...
0133                ' starts with ''('' and/or ends with '')'', which is not allowed'])
0134     end
0135 end
0136 
0137 %Assume for now that all information is there and that it's correct. This
0138 %is important to fix since no further checks are being made!
0139 
0140 %Check whether provided fasta files use the same gene identifiers as
0141 %provided template models
0142 for i=1:numel(blastStructure)
0143     if ~strcmp(blastStructure(i).fromId,getModelFor)
0144         j=strcmpi(blastStructure(i).fromId,modelNames);
0145         if j==0
0146             error(['While the blastStructure contains sequences from '...
0147                 'organismID "%s" (as\nprovided in getBlast), none of '...
0148                 'template models have this id (as model.id)'],...
0149                 string(blastStructure(i).fromId));
0150         end
0151         k=sum(ismember(blastStructure(i).fromGenes,models{j}.genes));
0152         if k<(numel(models{j}.genes)*0.05)
0153             error(['Less than 5%% of the genes in the template model '...
0154                 'with model.id "%s"\ncan be found in the blastStructure. '...
0155                 'Ensure that the protein FASTA\nused in getBlast and '...
0156                 'the template model used in getModelFromHomology\nuse '...
0157                 'the same style of gene identifiers'],models{j}.id)
0158         end
0159     end
0160 end
0161 
0162 %Standardize grRules of template models
0163 for i=1:length(models)
0164     fprintf('\nStandardizing grRules of template model with ID "%s" ...',models{i}.id);
0165     [models{i}.grRules,models{i}.rxnGeneMat]=standardizeGrRules(models{i},false);
0166 end
0167 fprintf(' done\n');
0168 
0169 %Remove all gene matches that are below the cutoffs
0170 for i=1:numel(blastStructure)
0171     indexes=blastStructure(i).evalue<maxE & blastStructure(i).aligLen>=minLen & blastStructure(i).identity>=minIde; %Do it in this direction to lose NaNs
0172     blastStructure(i).fromGenes(~indexes)=[];
0173     blastStructure(i).toGenes(~indexes)=[];
0174     blastStructure(i).evalue(~indexes)=[];
0175     blastStructure(i).identity(~indexes)=[];
0176     blastStructure(i).aligLen(~indexes)=[];
0177     blastStructure(i).bitscore(~indexes)=[];
0178     blastStructure(i).ppos(~indexes)=[];
0179 end
0180 
0181 %Remove all reactions from the models that have no genes encoding for them.
0182 %Also remove all genes that encode for no reactions. There shouldn't be any
0183 %but there might be mistakes
0184 for i=1:numel(models)
0185     [hasGenes, ~]=find(models{i}.rxnGeneMat);
0186     hasNoGenes=1:numel(models{i}.rxns);
0187     hasNoGenes(hasGenes)=[];
0188     models{i}=removeReactions(models{i},hasNoGenes,true,true);
0189 end
0190 
0191 %Create a structure that contains all genes used in the blasts in any
0192 %direction for each of the models in 'models' and for the new organism. The
0193 %first cell is for the new organism and then according to the preferred
0194 %order. If no such order is supplied, then according to the order in
0195 %'models'
0196 allGenes=cell(numel(models)+1,1);
0197 if isempty(preferredOrder)
0198     useOrder=modelNames;
0199 else
0200     useOrder=preferredOrder;
0201 end
0202 
0203 %Get the corresponding indexes for those models in the 'models' structure
0204 useOrderIndexes=zeros(numel(models),1);
0205 for i=1:numel(models)
0206     [~, index]=ismember(models{i}.id,useOrder);
0207     useOrderIndexes(index)=i;
0208 end
0209 
0210 %Remove all genes from the blast structure that have no genes in the models
0211 if onlyGenesInModels==true
0212     modelGenes={};
0213     for i=1:numel(models)
0214         modelGenes=[modelGenes;models{i}.genes];
0215     end
0216     for i=1:numel(blastStructure)
0217         %Check to see if it should match the toId or fromId
0218         if strcmpi(blastStructure(i).fromId,getModelFor)
0219             I=ismember(blastStructure(i).toGenes,modelGenes);
0220         else
0221             I=ismember(blastStructure(i).fromGenes,modelGenes);
0222         end
0223         blastStructure(i).fromGenes(~I)=[];
0224         blastStructure(i).toGenes(~I)=[];
0225         blastStructure(i).evalue(~I)=[];
0226         blastStructure(i).identity(~I)=[];
0227         blastStructure(i).aligLen(~I)=[];
0228         blastStructure(i).bitscore(~I)=[];
0229         blastStructure(i).ppos(~I)=[];
0230         
0231         %Check that no matching in blastStructure is empty. This happens if
0232         %no genes in the models are present in the corresponding sheet
0233         if isempty(blastStructure(i).fromGenes)
0234             EM=['No genes in matching from ' blastStructure(i).fromId ' to ' blastStructure(i).toId ' are present in the corresponding model'];
0235             dispEM(EM);
0236         end
0237     end
0238 end
0239 
0240 %If only best orthologs are to be used then all other measurements are
0241 %deleted from the blastStructure. All code after this stays the same. This
0242 %means that preferred order can still matter. The best ortholog scoring is
0243 %based only on the E-value
0244 if strictness==3
0245     for i=1:numel(blastStructure)
0246         keep=false(numel(blastStructure(i).toGenes),1);
0247         [allFromGenes, ~, I]=unique(blastStructure(i).fromGenes);
0248         
0249         %It would be nice to get rid of this loop
0250         for j=1:numel(allFromGenes)
0251             allMatches=find(I==j);
0252             bestMatches=allMatches(blastStructure(i).evalue(allMatches)==min(blastStructure(i).evalue(allMatches)));
0253             
0254             %Keep the best matches
0255             keep(bestMatches)=true;
0256         end
0257         
0258         %Delete all matches that were not best matches
0259         blastStructure(i).fromGenes(~keep)=[];
0260         blastStructure(i).toGenes(~keep)=[];
0261         blastStructure(i).evalue(~keep)=[];
0262         blastStructure(i).identity(~keep)=[];
0263         blastStructure(i).aligLen(~keep)=[];
0264         blastStructure(i).bitscore(~keep)=[];
0265         blastStructure(i).ppos(~keep)=[];
0266     end
0267 end
0268 
0269 useOrder=[{getModelFor};useOrder];
0270 
0271 for i=1:numel(blastStructure)
0272     [~, toIndex]=ismember(blastStructure(i).toId,useOrder);
0273     [~, fromIndex]=ismember(blastStructure(i).fromId,useOrder);
0274     
0275     %Add all genes to the corresponding list in allGenes
0276     allGenes{toIndex}=[allGenes{toIndex};blastStructure(i).toGenes];
0277     allGenes{fromIndex}=[allGenes{fromIndex};blastStructure(i).fromGenes];
0278 end
0279 
0280 %Keep only the unique gene names
0281 maxOtherGeneNr=0; %Determines the dimension of the connectivity matrixes
0282 for i=1:numel(allGenes)
0283     allGenes{i}=unique(allGenes{i});
0284     if i>1
0285         if numel(allGenes{i})>maxOtherGeneNr
0286             maxOtherGeneNr=numel(allGenes{i});
0287         end
0288     end
0289 end
0290 
0291 %Generate a cell array of matrixes that describes how the genes in the new
0292 %organism map to the models. Each cell matches to the corresponding model
0293 %in useOrder (starting at 2 of course). First dimension is gene in new
0294 %organism, second which gene it is in the other organism. The second matrix
0295 %describes how they map back.
0296 
0297 %As it is now, a significant match is indicated by a 1. This could be
0298 %expanded to contain some kind of significance level. The first dimension
0299 %is still the genes in the new model.
0300 
0301 allTo=cell(numel(useOrder)-1,1);
0302 allFrom=cell(numel(useOrder)-1,1);
0303 
0304 for i=1:numel(useOrder)-1
0305     allTo{i}=sparse(numel(allGenes{1}),numel(allGenes{i+1}));
0306     allFrom{i}=sparse(numel(allGenes{1}),numel(allGenes{i+1}));
0307 end
0308 
0309 %Fill the matches to other species
0310 for i=1:numel(blastStructure)
0311     if strcmp(blastStructure(i).toId,getModelFor)
0312         %This was 'to' the new organism. They should all match so no checks
0313         %are being made
0314         [~, a]=ismember(blastStructure(i).toGenes,allGenes{1});
0315         [~, fromModel]=ismember(blastStructure(i).fromId,useOrder);
0316         [~, b]=ismember(blastStructure(i).fromGenes,allGenes{fromModel});
0317         idx = sub2ind(size(allTo{fromModel-1}), a, b);
0318         allTo{fromModel-1}(idx)=1;
0319     else
0320         %This was 'from' the new organism
0321         [~, a]=ismember(blastStructure(i).fromGenes,allGenes{1});
0322         [~, toModel]=ismember(blastStructure(i).toId,useOrder);
0323         [~, b]=ismember(blastStructure(i).toGenes,allGenes{toModel});
0324         idx = sub2ind(size(allFrom{toModel-1}), a, b);
0325         allFrom{toModel-1}(idx)=1;
0326     end
0327 end
0328 
0329 %Now we have all the gene matches in a convenient way. For all the genes in
0330 %the new organism get the genes that should be included from other
0331 %organisms. If all genes should be included this simply means keep the
0332 %allFrom matrix as it is. If only orthologs which could be mapped in both
0333 %BLAST directions are to be included then only those elements are kept.
0334 
0335 finalMappings=cell(numel(useOrder)-1,1);
0336 if strictness==1 || strictness==3
0337     for j=1:numel(allFrom)
0338         finalMappings{j}=allTo{j}~=0 & allFrom{j}~=0;
0339     end
0340 else
0341     if mapNewGenesToOld==true
0342         finalMappings=allFrom;
0343     else
0344         finalMappings=allTo;
0345     end
0346 end
0347 
0348 %Remove all genes from the mapping that are not in the models. This doesn't
0349 %do much if only genes in the models were used for the original mapping.
0350 %Also simplify the finalMapping and allGenes structures so that they only
0351 %contain mappings that exist
0352 usedNewGenes=false(numel(allGenes{1}),1);
0353 
0354 for i=1:numel(allGenes)-1
0355     %First remove mappings for those genes that are not in the model
0356     if onlyGenesInModels==false
0357         a=ismember(allGenes{i+1},models{useOrderIndexes(i)}.genes);
0358         finalMappings{i}(:,~a)=false;
0359     end
0360     
0361     %Then remove unused ones and simplify
0362     [a, b]=find(finalMappings{i});
0363     usedGenes=false(numel(allGenes{i+1}),1);
0364     usedNewGenes(a)=true;
0365     usedGenes(b)=true;
0366     finalMappings{i}=finalMappings{i}(:,usedGenes);
0367     allGenes{i+1}=allGenes{i+1}(usedGenes);
0368 end
0369 
0370 %Remove all new genes that have not been mapped to anything
0371 allGenes{1}=allGenes{1}(usedNewGenes);
0372 for i=1:numel(finalMappings)
0373     finalMappings{i}=finalMappings{i}(usedNewGenes,:);
0374 end
0375 
0376 %Now is it time to choose which reactions should be included from which
0377 %models. If there is a preferred order specified then each gene can only
0378 %result in reactions from one model, otherwise they should all be included
0379 
0380 %Start by simplifying the models by removing genes/reactions that are not
0381 %used. This is where it gets weird with complexes, especially "or"
0382 %complexes. In this step only reactions which are encoded by one single
0383 %gene, or where all genes should be deleted, are deleted. The info on the
0384 %full complex is still present in the grRules
0385 
0386 for i=1:numel(models)
0387     a=ismember(models{useOrderIndexes(i)}.genes,allGenes{i+1});
0388     
0389     %Remove reactions that are not associated to any of the genes in
0390     %allGenes, thereby also keeping complexes where only for one of the
0391     %genes was matched
0392     [rxnsToKeep,~] = find(models{useOrderIndexes(i)}.rxnGeneMat(:,a));
0393     rxnsToRemove = repmat(1,numel(models{useOrderIndexes(i)}.rxns),1);
0394     rxnsToRemove(rxnsToKeep) = 0;
0395     rxnsToRemove = find(rxnsToRemove);
0396     models{useOrderIndexes(i)}=removeReactions(models{useOrderIndexes(i)},rxnsToRemove,true,true,true);
0397 end
0398 
0399 %Since mergeModels function will be used in the end, the models are
0400 %simplified further by deleting genes/reactions in the order specified by
0401 %preferredOrder. This means that the last model will only contain reactions
0402 %for genes that mapped only to that model
0403 
0404 allUsedGenes=false(numel(allGenes{1}),1);
0405 
0406 if ~isempty(preferredOrder) && numel(models)>1
0407     [usedGenes, ~]=find(finalMappings{1}); %All that are used in the first model in preferredOrder
0408     allUsedGenes(usedGenes)=true;
0409     for i=2:numel(finalMappings)
0410         [usedGenes, ~]=find(finalMappings{i});
0411         usedGenes=unique(usedGenes);
0412         a=ismember(usedGenes,find(allUsedGenes));
0413         
0414         [~, genesToDelete]=find(finalMappings{i}(usedGenes(a),:)); %IMPORTANT! IS it really correct to remove all genes that map?
0415         genesToDelete=unique(genesToDelete); %Maybe not needed, but for clarity if nothing else
0416         
0417         %Remove all the genes that were already found and add the other
0418         %ones to allUsedGenes
0419         models{useOrderIndexes(i)}=removeGenes(models{useOrderIndexes(i)},allGenes{i+1}(genesToDelete),true,true,false);
0420         allUsedGenes(usedGenes)=true;
0421         
0422         %Remove the deleted genes from finalMappings and allGenes.
0423         finalMappings{i}(:,genesToDelete)=[];
0424         allGenes{i+1}(genesToDelete)=[];
0425     end
0426 end
0427 
0428 %Now loop through the models and update the gene associations. Genes not
0429 %belonging to the new organism will be renamed as 'OLD_MODELID_gene'
0430 for i=1:numel(models)
0431     %Find all the new genes that should be used for this model
0432     [newGenes, oldGenes]=find(finalMappings{i});
0433     
0434     %Create a new gene list with the genes from the new organism and those
0435     %genes that could not be removed
0436     replaceableGenes=allGenes{i+1}(unique(oldGenes));
0437     nonReplaceableGenes=setdiff(models{useOrderIndexes(i)}.genes,replaceableGenes);
0438     fullGeneList=[allGenes{1}(unique(newGenes));nonReplaceableGenes];
0439     
0440     %Just to save some indexing later. This is the LAST index of
0441     %replaceable ones
0442     nonRepStartIndex=numel(unique(newGenes));
0443     
0444     %Construct a new rxnGeneMat
0445     newRxnGeneMat=sparse(numel(models{useOrderIndexes(i)}.rxns),numel(fullGeneList));
0446     
0447     %Now update the rxnGeneMat. This is a little tricky and could
0448     %probably be done in a more efficient way, but I just loop through the
0449     %reactions and add them one by one
0450     for j=1:numel(models{useOrderIndexes(i)}.rxns)
0451         %Get the old genes encoding for this reaction
0452         [~, oldGeneIds]=find(models{useOrderIndexes(i)}.rxnGeneMat(j,:));
0453         
0454         %Update the matrix for each gene. This includes replacing one gene
0455         %with several new ones if there were several matches
0456         for k=1:numel(oldGeneIds)
0457             %Match the gene to one in the gene list. This is done as a text
0458             %match. Could probably be done better, but I'm a little lost in
0459             %the indexing
0460             
0461             geneName=models{useOrderIndexes(i)}.genes(oldGeneIds(k));
0462             
0463             %First search in the mappable genes
0464             mapIndex=find(ismember(allGenes{i+1},geneName));
0465             
0466             if ~isempty(mapIndex)
0467                 % add the old genes
0468                 hitGenes.oldGenes = [hitGenes.oldGenes, {geneName}];
0469                 
0470                 %Get the new genes for that gene
0471                 a=find(finalMappings{i}(:,mapIndex));
0472                 
0473                 %Find the positions of these genes in the final gene list
0474                 [~, b]=ismember(allGenes{1}(a),fullGeneList);
0475                 
0476                 %Update the matrix
0477                 newRxnGeneMat(j,b)=1;
0478                 
0479                 %Update the grRules string. This is tricky, but I hope that
0480                 %it's ok to replace the old gene name with the new one and
0481                 %add ') or (' if there were several matches. Be sure of
0482                 %this!
0483                 repString=fullGeneList{b(1)};
0484                 if numel(b)>1
0485                     for l=2:numel(b)
0486                         repString=[repString ' or ' fullGeneList{b(l)}];
0487                     end
0488                     repString=['(' repString ')'];
0489                 end
0490                 
0491                 % add the new matched genes
0492                 hitGenes.newGenes = [hitGenes.newGenes, {repString}];
0493                 
0494                 %Use regexprep instead of strrep to prevent partial matches
0495                 models{useOrderIndexes(i)}.grRules{j}=regexprep(models{useOrderIndexes(i)}.grRules{j},['(^|\s|\()' geneName{1} '($|\s|\))'],['$1' repString '$2']);
0496             else
0497                 %Then search in the non-replaceable genes. There could only
0498                 %be one match here
0499                 index=find(ismember(nonReplaceableGenes,geneName));
0500                 
0501                 %Update the matrix
0502                 newRxnGeneMat(j,nonRepStartIndex+index)=1;
0503                 
0504                 models{useOrderIndexes(i)}.grRules{j}=strrep(models{useOrderIndexes(i)}.grRules{j},geneName{1},strcat('OLD_',models{useOrderIndexes(i)}.id,'_',geneName{1}));
0505             end
0506         end
0507     end
0508     
0509     %Add the new list of genes
0510     models{useOrderIndexes(i)}.rxnGeneMat=newRxnGeneMat;
0511     if ~isempty(nonReplaceableGenes)
0512         models{useOrderIndexes(i)}.genes=[allGenes{1}(unique(newGenes));strcat('OLD_',models{useOrderIndexes(i)}.id,'_',nonReplaceableGenes)];
0513     else
0514         models{useOrderIndexes(i)}.genes=allGenes{1}(unique(newGenes));
0515     end
0516     if isfield(models{useOrderIndexes(i)},'geneComps')
0517         geneComps=models{useOrderIndexes(i)}.geneComps(1);
0518         models{useOrderIndexes(i)}.geneComps=zeros(numel(models{useOrderIndexes(i)}.genes),1);
0519         %Assume that all genes are in the same compartment, and this
0520         %compartment is specified for the first gene
0521         models{useOrderIndexes(i)}.geneComps(:)=geneComps;
0522     end
0523 end
0524 
0525 %Now merge the models. All information should be correct except for 'or'
0526 %complexes
0527 draftModel=mergeModels(models,'metNames');
0528 
0529 %Remove unnecessary OLD_ genes, that were added with OR relationships
0530 regexStr=['OLD_(', strjoin(modelNames(:),'|'),')_(\S^\))+'];
0531 draftModel.grRules=regexprep(draftModel.grRules,[' or ' regexStr],'');
0532 draftModel.grRules=regexprep(draftModel.grRules,[regexStr ' or '],'');
0533 
0534 %Change name of the resulting model
0535 draftModel.id=getModelFor;
0536 name='Generated by getModelFromHomology using ';
0537 for i=1:numel(models)
0538     if i<numel(models)
0539         name=[name models{i}.id ', '];
0540     else
0541         name=[name models{i}.id];
0542     end
0543 end
0544 draftModel.name=name;
0545 draftModel.rxnNotes=cell(length(draftModel.rxns),1);
0546 draftModel.rxnNotes(:)={'Included by getModelFromHomology'};
0547 draftModel.rxnConfidenceScores=NaN(length(draftModel.rxns),1);
0548 draftModel.rxnConfidenceScores(:)=2;
0549 draftModel=deleteUnusedGenes(draftModel,0);
0550 %Standardize grRules and notify if problematic grRules are found
0551 [draftModel.grRules,draftModel.rxnGeneMat]=standardizeGrRules(draftModel,false);
0552 draftModel=deleteUnusedGenes(draftModel,false);
0553 end

Generated by m2html © 2005