Home > core > getModelFromHomology.m

getModelFromHomology

PURPOSE ^

getModelFromHomology

SYNOPSIS ^

function [draftModel, hitGenes]=getModelFromHomology(models,blastStructure,getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,minLen,minIde,mapNewGenesToOld)

DESCRIPTION ^

 getModelFromHomology
   Constructs a new model from a set of existing models and gene homology
   information.

   models            a cell array of model structures to build the model
                     from. These models must be sorted by importance in
                     decreasing order
   blastStructure    a blastStructure as produced by getBlast or
                     getBlastFromExcel
   getModelFor       a three-four letter abbreviation of the organism to
                     build a model for. Must have BLASTP hits in both
                     directions to the organisms in 'models'
   preferredOrder    the order in which reactions should be added from the
                     models. If not supplied, reactions will be included
                     from all models, otherwise one gene will only result
                     in reactions from one model (opt, default {})
   strictness        integer that specifies which reactions should be
                     included:
                     1: Map new genes to old for all pairs, which have
                     acceptable BLASTP results in both directions
                     2: Map new genes to old for all pairs, which have
                     acceptable BLASTP results in correspondent direction
                     (mapping can be done in the opposite direction, see
                     mapNewGenesToOld below)
                     3: Check all BLASTP results and retain only the best
                     results by E-value for all gene pairs in each
                     direction separately. Then map new genes to old for
                     all pairs, which have acceptable BLASTP results in
                     both directions (opt, default 1).
   onlyGenesInModels consider BLASTP results only for genes that exist in
                     the models. This tends to import a larger fraction
                     from the existing models but may give less reliable
                     results. Has effect only if strictness=3 (opt,
                     default false)
   maxE              only look at genes with E-values <= this value (opt,
                     default 10^-30)
   minLen            only look at genes with alignment length >= this
                     value (opt, default 200)
   minIde            only look at genes with identity >= this value
                     (opt, default 40 (%))
   mapNewGenesToOld  determines how to match genes if not looking at only
                     1-1 orthologs. Either map the new genes to the old or
                     old genes to new. The default is to map the new genes
                     (opt, default true)

   draftModel        a model structure for the new organism
   hitGenes          collect the old and new genes

   The models in the 'models' structure should have named the metabolites
   in the same manner, have their reversible reactions in the same
   direction (run sortModel), and use the same compartment names. To avoid
   keeping unneccesary old genes, the models should not have
   'or'-relations in their grRules (use expandModel).

   The resulting draft model contains only reactions associated with
   orthologous genes. The old (original) genes involved in 'and'
   relations in grRules without any orthologs are still included in
   the draft model as OLD_MODELID_geneName.

   NOTE: "to" and "from" means relative to the new organism

   Usage: draftModel=getModelFromHomology(models,blastStructure,...
    getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,...
    minLen,minIde,mapNewGenesToOld)

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [draftModel, hitGenes]=getModelFromHomology(models,blastStructure,...
0002     getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,...
0003     minLen,minIde,mapNewGenesToOld)
0004 % getModelFromHomology
0005 %   Constructs a new model from a set of existing models and gene homology
0006 %   information.
0007 %
0008 %   models            a cell array of model structures to build the model
0009 %                     from. These models must be sorted by importance in
0010 %                     decreasing order
0011 %   blastStructure    a blastStructure as produced by getBlast or
0012 %                     getBlastFromExcel
0013 %   getModelFor       a three-four letter abbreviation of the organism to
0014 %                     build a model for. Must have BLASTP hits in both
0015 %                     directions to the organisms in 'models'
0016 %   preferredOrder    the order in which reactions should be added from the
0017 %                     models. If not supplied, reactions will be included
0018 %                     from all models, otherwise one gene will only result
0019 %                     in reactions from one model (opt, default {})
0020 %   strictness        integer that specifies which reactions should be
0021 %                     included:
0022 %                     1: Map new genes to old for all pairs, which have
0023 %                     acceptable BLASTP results in both directions
0024 %                     2: Map new genes to old for all pairs, which have
0025 %                     acceptable BLASTP results in correspondent direction
0026 %                     (mapping can be done in the opposite direction, see
0027 %                     mapNewGenesToOld below)
0028 %                     3: Check all BLASTP results and retain only the best
0029 %                     results by E-value for all gene pairs in each
0030 %                     direction separately. Then map new genes to old for
0031 %                     all pairs, which have acceptable BLASTP results in
0032 %                     both directions (opt, default 1).
0033 %   onlyGenesInModels consider BLASTP results only for genes that exist in
0034 %                     the models. This tends to import a larger fraction
0035 %                     from the existing models but may give less reliable
0036 %                     results. Has effect only if strictness=3 (opt,
0037 %                     default false)
0038 %   maxE              only look at genes with E-values <= this value (opt,
0039 %                     default 10^-30)
0040 %   minLen            only look at genes with alignment length >= this
0041 %                     value (opt, default 200)
0042 %   minIde            only look at genes with identity >= this value
0043 %                     (opt, default 40 (%))
0044 %   mapNewGenesToOld  determines how to match genes if not looking at only
0045 %                     1-1 orthologs. Either map the new genes to the old or
0046 %                     old genes to new. The default is to map the new genes
0047 %                     (opt, default true)
0048 %
0049 %   draftModel        a model structure for the new organism
0050 %   hitGenes          collect the old and new genes
0051 %
0052 %   The models in the 'models' structure should have named the metabolites
0053 %   in the same manner, have their reversible reactions in the same
0054 %   direction (run sortModel), and use the same compartment names. To avoid
0055 %   keeping unneccesary old genes, the models should not have
0056 %   'or'-relations in their grRules (use expandModel).
0057 %
0058 %   The resulting draft model contains only reactions associated with
0059 %   orthologous genes. The old (original) genes involved in 'and'
0060 %   relations in grRules without any orthologs are still included in
0061 %   the draft model as OLD_MODELID_geneName.
0062 %
0063 %   NOTE: "to" and "from" means relative to the new organism
0064 %
0065 %   Usage: draftModel=getModelFromHomology(models,blastStructure,...
0066 %    getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,...
0067 %    minLen,minIde,mapNewGenesToOld)
0068 
0069 hitGenes.oldGenes = [];  % collect the old genes from the template model (organism)
0070 hitGenes.newGenes = [];  % collect the new genes of the draft model (target organism)
0071 
0072 getModelFor=char(getModelFor);
0073 
0074 if nargin<4
0075     preferredOrder=[];
0076 else
0077     preferredOrder=convertCharArray(preferredOrder);
0078     [row,col]=size(preferredOrder);
0079     if col>row
0080         preferredOrder=transpose(preferredOrder);
0081     end
0082 end
0083 if nargin<5
0084     strictness=1;
0085 end
0086 if nargin<6
0087     onlyGenesInModels=false;
0088 end
0089 if nargin<7
0090     maxE=10^-30;
0091 end
0092 if nargin<8
0093     minLen=200;
0094 end
0095 if nargin<9
0096     minIde=40;
0097 end
0098 if nargin<10
0099     mapNewGenesToOld=true;
0100 end
0101 
0102 if isfield(models,'S')
0103     models={models};
0104 end
0105     
0106 %Check that all the information is in the blast structure
0107 modelNames=cell(numel(models),1);
0108 for i=1:numel(models)
0109     modelNames{i}=models{i}.id;
0110     %Gene short names and geneMiriams are often different between species,
0111     %safer not to include them
0112     if isfield(models{i},'geneShortNames')
0113         models{i}=rmfield(models{i},'geneShortNames');
0114     end
0115     if isfield(models{i},'geneMiriams')
0116         models{i}=rmfield(models{i},'geneMiriams');
0117     end
0118     %The geneFrom field also loses meaning if the genes are replaced by
0119     %orthologs
0120     if isfield(models{i},'geneFrom')
0121         models{i}=rmfield(models{i},'geneFrom');
0122     end
0123 end
0124 
0125 %Check that genes do not begin with ( or end with ), as this makes problematic grRules
0126 for i=1:numel(blastStructure)
0127     problemGenes = startsWith(blastStructure(i).fromGenes,'(') | endsWith(blastStructure(i).fromGenes,')');
0128     if any(problemGenes)
0129         error(['One or multiple gene identifiers from ' blastStructure(i).fromId ...
0130                ' starts with ''('' and/or ends with '')'', which is not allowed'])
0131     end
0132 end
0133 
0134 %Assume for now that all information is there and that it's correct. This
0135 %is important to fix since no further checks are being made!
0136 
0137 %Check whether provided fasta files use the same gene identifiers as
0138 %provided template models
0139 for i=1:numel(blastStructure)
0140     if ~strcmp(blastStructure(i).fromId,getModelFor)
0141         j=strcmpi(blastStructure(i).fromId,modelNames);
0142         if j==0
0143             error(['While the blastStructure contains sequences from '...
0144                 'organismID "%s" (as\nprovided in getBlast), none of '...
0145                 'template models have this id (as model.id)'],...
0146                 string(blastStructure(i).fromId));
0147         end
0148         k=sum(ismember(blastStructure(i).fromGenes,models{j}.genes));
0149         if k<(numel(models{j}.genes)*0.05)
0150             error(['Less than 5%% of the genes in the template model '...
0151                 'with model.id "%s"\ncan be found in the blastStructure. '...
0152                 'Ensure that the protein FASTA\nused in getBlast and '...
0153                 'the template model used in getModelFromHomology\nuse '...
0154                 'the same style of gene identifiers'],models{j}.id)
0155         end
0156     end
0157 end
0158 
0159 %Standardize grRules of template models
0160 for i=1:length(models)
0161     fprintf('\nStandardizing grRules of template model with ID "%s" ...',models{i}.id);
0162     [models{i}.grRules,models{i}.rxnGeneMat]=standardizeGrRules(models{i},false);
0163 end
0164 fprintf(' done\n');
0165 
0166 %Remove all gene matches that are below the cutoffs
0167 for i=1:numel(blastStructure)
0168     indexes=blastStructure(i).evalue<maxE & blastStructure(i).aligLen>=minLen & blastStructure(i).identity>=minIde; %Do it in this direction to lose NaNs
0169     blastStructure(i).fromGenes(~indexes)=[];
0170     blastStructure(i).toGenes(~indexes)=[];
0171     blastStructure(i).evalue(~indexes)=[];
0172     blastStructure(i).identity(~indexes)=[];
0173     blastStructure(i).aligLen(~indexes)=[];
0174     blastStructure(i).bitscore(~indexes)=[];
0175     blastStructure(i).ppos(~indexes)=[];
0176 end
0177 
0178 %Remove all reactions from the models that have no genes encoding for them.
0179 %Also remove all genes that encode for no reactions. There shouldn't be any
0180 %but there might be mistakes
0181 for i=1:numel(models)
0182     [hasGenes, ~]=find(models{i}.rxnGeneMat);
0183     hasNoGenes=1:numel(models{i}.rxns);
0184     hasNoGenes(hasGenes)=[];
0185     models{i}=removeReactions(models{i},hasNoGenes,true,true);
0186 end
0187 
0188 %Create a structure that contains all genes used in the blasts in any
0189 %direction for each of the models in 'models' and for the new organism. The
0190 %first cell is for the new organism and then according to the preferred
0191 %order. If no such order is supplied, then according to the order in
0192 %'models'
0193 allGenes=cell(numel(models)+1,1);
0194 if isempty(preferredOrder)
0195     useOrder=modelNames;
0196 else
0197     useOrder=preferredOrder;
0198 end
0199 
0200 %Get the corresponding indexes for those models in the 'models' structure
0201 useOrderIndexes=zeros(numel(models),1);
0202 for i=1:numel(models)
0203     [~, index]=ismember(models{i}.id,useOrder);
0204     useOrderIndexes(index)=i;
0205 end
0206 
0207 %Remove all genes from the blast structure that have no genes in the models
0208 if onlyGenesInModels==true
0209     modelGenes={};
0210     for i=1:numel(models)
0211         modelGenes=[modelGenes;models{i}.genes];
0212     end
0213     for i=1:numel(blastStructure)
0214         %Check to see if it should match the toId or fromId
0215         if strcmpi(blastStructure(i).fromId,getModelFor)
0216             I=ismember(blastStructure(i).toGenes,modelGenes);
0217         else
0218             I=ismember(blastStructure(i).fromGenes,modelGenes);
0219         end
0220         blastStructure(i).fromGenes(~I)=[];
0221         blastStructure(i).toGenes(~I)=[];
0222         blastStructure(i).evalue(~I)=[];
0223         blastStructure(i).identity(~I)=[];
0224         blastStructure(i).aligLen(~I)=[];
0225         blastStructure(i).bitscore(~I)=[];
0226         blastStructure(i).ppos(~I)=[];
0227         
0228         %Check that no matching in blastStructure is empty. This happens if
0229         %no genes in the models are present in the corresponding sheet
0230         if isempty(blastStructure(i).fromGenes)
0231             EM=['No genes in matching from ' blastStructure(i).fromId ' to ' blastStructure(i).toId ' are present in the corresponding model'];
0232             dispEM(EM);
0233         end
0234     end
0235 end
0236 
0237 %If only best orthologs are to be used then all other measurements are
0238 %deleted from the blastStructure. All code after this stays the same. This
0239 %means that preferred order can still matter. The best ortholog scoring is
0240 %based only on the E-value
0241 if strictness==3
0242     for i=1:numel(blastStructure)
0243         keep=false(numel(blastStructure(i).toGenes),1);
0244         [allFromGenes, ~, I]=unique(blastStructure(i).fromGenes);
0245         
0246         %It would be nice to get rid of this loop
0247         for j=1:numel(allFromGenes)
0248             allMatches=find(I==j);
0249             bestMatches=allMatches(blastStructure(i).evalue(allMatches)==min(blastStructure(i).evalue(allMatches)));
0250             
0251             %Keep the best matches
0252             keep(bestMatches)=true;
0253         end
0254         
0255         %Delete all matches that were not best matches
0256         blastStructure(i).fromGenes(~keep)=[];
0257         blastStructure(i).toGenes(~keep)=[];
0258         blastStructure(i).evalue(~keep)=[];
0259         blastStructure(i).identity(~keep)=[];
0260         blastStructure(i).aligLen(~keep)=[];
0261         blastStructure(i).bitscore(~keep)=[];
0262         blastStructure(i).ppos(~keep)=[];
0263     end
0264 end
0265 
0266 useOrder=[{getModelFor};useOrder];
0267 
0268 for i=1:numel(blastStructure)
0269     [~, toIndex]=ismember(blastStructure(i).toId,useOrder);
0270     [~, fromIndex]=ismember(blastStructure(i).fromId,useOrder);
0271     
0272     %Add all genes to the corresponding list in allGenes
0273     allGenes{toIndex}=[allGenes{toIndex};blastStructure(i).toGenes];
0274     allGenes{fromIndex}=[allGenes{fromIndex};blastStructure(i).fromGenes];
0275 end
0276 
0277 %Keep only the unique gene names
0278 maxOtherGeneNr=0; %Determines the dimension of the connectivity matrixes
0279 for i=1:numel(allGenes)
0280     allGenes{i}=unique(allGenes{i});
0281     if i>1
0282         if numel(allGenes{i})>maxOtherGeneNr
0283             maxOtherGeneNr=numel(allGenes{i});
0284         end
0285     end
0286 end
0287 
0288 %Generate a cell array of matrixes that describes how the genes in the new
0289 %organism map to the models. Each cell matches to the corresponding model
0290 %in useOrder (starting at 2 of course). First dimension is gene in new
0291 %organism, second which gene it is in the other organism. The second matrix
0292 %describes how they map back.
0293 
0294 %As it is now, a significant match is indicated by a 1. This could be
0295 %expanded to contain some kind of significance level. The first dimension
0296 %is still the genes in the new model.
0297 
0298 allTo=cell(numel(useOrder)-1,1);
0299 allFrom=cell(numel(useOrder)-1,1);
0300 
0301 for i=1:numel(useOrder)-1
0302     allTo{i}=sparse(numel(allGenes{1}),numel(allGenes{i+1}));
0303     allFrom{i}=sparse(numel(allGenes{1}),numel(allGenes{i+1}));
0304 end
0305 
0306 %Fill the matches to other species
0307 for i=1:numel(blastStructure)
0308     if strcmp(blastStructure(i).toId,getModelFor)
0309         %This was 'to' the new organism. They should all match so no checks
0310         %are being made
0311         [~, a]=ismember(blastStructure(i).toGenes,allGenes{1});
0312         [~, fromModel]=ismember(blastStructure(i).fromId,useOrder);
0313         [~, b]=ismember(blastStructure(i).fromGenes,allGenes{fromModel});
0314         idx = sub2ind(size(allTo{fromModel-1}), a, b);
0315         allTo{fromModel-1}(idx)=1;
0316     else
0317         %This was 'from' the new organism
0318         [~, a]=ismember(blastStructure(i).fromGenes,allGenes{1});
0319         [~, toModel]=ismember(blastStructure(i).toId,useOrder);
0320         [~, b]=ismember(blastStructure(i).toGenes,allGenes{toModel});
0321         idx = sub2ind(size(allFrom{toModel-1}), a, b);
0322         allFrom{toModel-1}(idx)=1;
0323     end
0324 end
0325 
0326 %Now we have all the gene matches in a convenient way. For all the genes in
0327 %the new organism get the genes that should be included from other
0328 %organisms. If all genes should be included this simply means keep the
0329 %allFrom matrix as it is. If only orthologs which could be mapped in both
0330 %BLAST directions are to be included then only those elements are kept.
0331 
0332 finalMappings=cell(numel(useOrder)-1,1);
0333 if strictness==1 || strictness==3
0334     for j=1:numel(allFrom)
0335         finalMappings{j}=allTo{j}~=0 & allFrom{j}~=0;
0336     end
0337 else
0338     if mapNewGenesToOld==true
0339         finalMappings=allFrom;
0340     else
0341         finalMappings=allTo;
0342     end
0343 end
0344 
0345 %Remove all genes from the mapping that are not in the models. This doesn't
0346 %do much if only genes in the models were used for the original mapping.
0347 %Also simplify the finalMapping and allGenes structures so that they only
0348 %contain mappings that exist
0349 usedNewGenes=false(numel(allGenes{1}),1);
0350 
0351 for i=1:numel(allGenes)-1
0352     %First remove mappings for those genes that are not in the model
0353     if onlyGenesInModels==false
0354         a=ismember(allGenes{i+1},models{useOrderIndexes(i)}.genes);
0355         finalMappings{i}(:,~a)=false;
0356     end
0357     
0358     %Then remove unused ones and simplify
0359     [a, b]=find(finalMappings{i});
0360     usedGenes=false(numel(allGenes{i+1}),1);
0361     usedNewGenes(a)=true;
0362     usedGenes(b)=true;
0363     finalMappings{i}=finalMappings{i}(:,usedGenes);
0364     allGenes{i+1}=allGenes{i+1}(usedGenes);
0365 end
0366 
0367 %Remove all new genes that have not been mapped to anything
0368 allGenes{1}=allGenes{1}(usedNewGenes);
0369 for i=1:numel(finalMappings)
0370     finalMappings{i}=finalMappings{i}(usedNewGenes,:);
0371 end
0372 
0373 %Now is it time to choose which reactions should be included from which
0374 %models. If there is a preferred order specified then each gene can only
0375 %result in reactions from one model, otherwise they should all be included
0376 
0377 %Start by simplifying the models by removing genes/reactions that are not
0378 %used. This is where it gets weird with complexes, especially "or"
0379 %complexes. In this step only reactions which are encoded by one single
0380 %gene, or where all genes should be deleted, are deleted. The info on the
0381 %full complex is still present in the grRules
0382 
0383 for i=1:numel(models)
0384     a=ismember(models{useOrderIndexes(i)}.genes,allGenes{i+1});
0385     
0386     %Remove reactions that are not associated to any of the genes in
0387     %allGenes, thereby also keeping complexes where only for one of the
0388     %genes was matched
0389     [rxnsToKeep,~] = find(models{useOrderIndexes(i)}.rxnGeneMat(:,a));
0390     rxnsToRemove = repmat(1,numel(models{useOrderIndexes(i)}.rxns),1);
0391     rxnsToRemove(rxnsToKeep) = 0;
0392     rxnsToRemove = find(rxnsToRemove);
0393     models{useOrderIndexes(i)}=removeReactions(models{useOrderIndexes(i)},rxnsToRemove,true,true,true);
0394 end
0395 
0396 %Since mergeModels function will be used in the end, the models are
0397 %simplified further by deleting genes/reactions in the order specified by
0398 %preferredOrder. This means that the last model will only contain reactions
0399 %for genes that mapped only to that model
0400 
0401 allUsedGenes=false(numel(allGenes{1}),1);
0402 
0403 if ~isempty(preferredOrder) && numel(models)>1
0404     [usedGenes, ~]=find(finalMappings{1}); %All that are used in the first model in preferredOrder
0405     allUsedGenes(usedGenes)=true;
0406     for i=2:numel(finalMappings)
0407         [usedGenes, ~]=find(finalMappings{i});
0408         usedGenes=unique(usedGenes);
0409         a=ismember(usedGenes,find(allUsedGenes));
0410         
0411         [~, genesToDelete]=find(finalMappings{i}(usedGenes(a),:)); %IMPORTANT! IS it really correct to remove all genes that map?
0412         genesToDelete=unique(genesToDelete); %Maybe not needed, but for clarity if nothing else
0413         
0414         %Remove all the genes that were already found and add the other
0415         %ones to allUsedGenes
0416         models{useOrderIndexes(i)}=removeGenes(models{useOrderIndexes(i)},allGenes{i+1}(genesToDelete),true,true,false);
0417         allUsedGenes(usedGenes)=true;
0418         
0419         %Remove the deleted genes from finalMappings and allGenes.
0420         finalMappings{i}(:,genesToDelete)=[];
0421         allGenes{i+1}(genesToDelete)=[];
0422     end
0423 end
0424 
0425 %Now loop through the models and update the gene associations. Genes not
0426 %belonging to the new organism will be renamed as 'OLD_MODELID_gene'
0427 for i=1:numel(models)
0428     %Find all the new genes that should be used for this model
0429     [newGenes, oldGenes]=find(finalMappings{i});
0430     
0431     %Create a new gene list with the genes from the new organism and those
0432     %genes that could not be removed
0433     replaceableGenes=allGenes{i+1}(unique(oldGenes));
0434     nonReplaceableGenes=setdiff(models{useOrderIndexes(i)}.genes,replaceableGenes);
0435     fullGeneList=[allGenes{1}(unique(newGenes));nonReplaceableGenes];
0436     
0437     %Just to save some indexing later. This is the LAST index of
0438     %replaceable ones
0439     nonRepStartIndex=numel(unique(newGenes));
0440     
0441     %Construct a new rxnGeneMat
0442     newRxnGeneMat=sparse(numel(models{useOrderIndexes(i)}.rxns),numel(fullGeneList));
0443     
0444     %Now update the rxnGeneMat. This is a little tricky and could
0445     %probably be done in a more efficient way, but I just loop through the
0446     %reactions and add them one by one
0447     for j=1:numel(models{useOrderIndexes(i)}.rxns)
0448         %Get the old genes encoding for this reaction
0449         [~, oldGeneIds]=find(models{useOrderIndexes(i)}.rxnGeneMat(j,:));
0450         
0451         %Update the matrix for each gene. This includes replacing one gene
0452         %with several new ones if there were several matches
0453         for k=1:numel(oldGeneIds)
0454             %Match the gene to one in the gene list. This is done as a text
0455             %match. Could probably be done better, but I'm a little lost in
0456             %the indexing
0457             
0458             geneName=models{useOrderIndexes(i)}.genes(oldGeneIds(k));
0459             
0460             %First search in the mappable genes
0461             mapIndex=find(ismember(allGenes{i+1},geneName));
0462             
0463             if ~isempty(mapIndex)
0464                 % add the old genes
0465                 hitGenes.oldGenes = [hitGenes.oldGenes, {geneName}];
0466                 
0467                 %Get the new genes for that gene
0468                 a=find(finalMappings{i}(:,mapIndex));
0469                 
0470                 %Find the positions of these genes in the final gene list
0471                 [~, b]=ismember(allGenes{1}(a),fullGeneList);
0472                 
0473                 %Update the matrix
0474                 newRxnGeneMat(j,b)=1;
0475                 
0476                 %Update the grRules string. This is tricky, but I hope that
0477                 %it's ok to replace the old gene name with the new one and
0478                 %add ') or (' if there were several matches. Be sure of
0479                 %this!
0480                 repString=fullGeneList{b(1)};
0481                 if numel(b)>1
0482                     for l=2:numel(b)
0483                         repString=[repString ' or ' fullGeneList{b(l)}];
0484                     end
0485                     repString=['(' repString ')'];
0486                 end
0487                 
0488                 % add the new matched genes
0489                 hitGenes.newGenes = [hitGenes.newGenes, {repString}];
0490                 
0491                 %Use regexprep instead of strrep to prevent partial matches
0492                 models{useOrderIndexes(i)}.grRules{j}=regexprep(models{useOrderIndexes(i)}.grRules{j},['(^|\s|\()' geneName{1} '($|\s|\))'],['$1' repString '$2']);
0493             else
0494                 %Then search in the non-replaceable genes. There could only
0495                 %be one match here
0496                 index=find(ismember(nonReplaceableGenes,geneName));
0497                 
0498                 %Update the matrix
0499                 newRxnGeneMat(j,nonRepStartIndex+index)=1;
0500                 
0501                 models{useOrderIndexes(i)}.grRules{j}=strrep(models{useOrderIndexes(i)}.grRules{j},geneName{1},strcat('OLD_',models{useOrderIndexes(i)}.id,'_',geneName{1}));
0502             end
0503         end
0504     end
0505     
0506     %Add the new list of genes
0507     models{useOrderIndexes(i)}.rxnGeneMat=newRxnGeneMat;
0508     if ~isempty(nonReplaceableGenes)
0509         models{useOrderIndexes(i)}.genes=[allGenes{1}(unique(newGenes));strcat('OLD_',models{useOrderIndexes(i)}.id,'_',nonReplaceableGenes)];
0510     else
0511         models{useOrderIndexes(i)}.genes=allGenes{1}(unique(newGenes));
0512     end
0513     if isfield(models{useOrderIndexes(i)},'geneComps')
0514         geneComps=models{useOrderIndexes(i)}.geneComps(1);
0515         models{useOrderIndexes(i)}.geneComps=zeros(numel(models{useOrderIndexes(i)}.genes),1);
0516         %Assume that all genes are in the same compartment, and this
0517         %compartment is specified for the first gene
0518         models{useOrderIndexes(i)}.geneComps(:)=geneComps;
0519     end
0520 end
0521 
0522 %Now merge the models. All information should be correct except for 'or'
0523 %complexes
0524 draftModel=mergeModels(models,'metNames');
0525 
0526 %Remove unnecessary OLD_ genes, that were added with OR relationships
0527 regexStr=['OLD_(', strjoin(modelNames(:),'|'),')_(\S^\))+'];
0528 draftModel.grRules=regexprep(draftModel.grRules,[' or ' regexStr],'');
0529 draftModel.grRules=regexprep(draftModel.grRules,[regexStr ' or '],'');
0530 
0531 %Change name of the resulting model
0532 draftModel.id=getModelFor;
0533 name='Generated by getModelFromHomology using ';
0534 for i=1:numel(models)
0535     if i<numel(models)
0536         name=[name models{i}.id ', '];
0537     else
0538         name=[name models{i}.id];
0539     end
0540 end
0541 draftModel.name=name;
0542 draftModel.rxnNotes=cell(length(draftModel.rxns),1);
0543 draftModel.rxnNotes(:)={'Included by getModelFromHomology'};
0544 draftModel.rxnConfidenceScores=NaN(length(draftModel.rxns),1);
0545 draftModel.rxnConfidenceScores(:)=2;
0546 draftModel=deleteUnusedGenes(draftModel,0);
0547 %Standardize grRules and notify if problematic grRules are found
0548 [draftModel.grRules,draftModel.rxnGeneMat]=standardizeGrRules(draftModel,false);
0549 draftModel=deleteUnusedGenes(draftModel,false);
0550 end

Generated by m2html © 2005