


getModelFromHomology
Constructs a new model from a set of existing models and gene homology
information.
models a cell array of model structures to build the model
from. These models must be sorted by importance in
decreasing order
blastStructure a blastStructure as produced by getBlast or
getBlastFromExcel
getModelFor a three-four letter abbreviation of the organism to
build a model for. Must have BLASTP hits in both
directions to the organisms in 'models'
preferredOrder the order in which reactions should be added from the
models. If not supplied, reactions will be included
from all models, otherwise one gene will only result
in reactions from one model (optional, default {})
strictness integer that specifies which reactions should be
included:
1: Map new genes to old for all pairs, which have
acceptable BLASTP results in both directions
2: Map new genes to old for all pairs, which have
acceptable BLASTP results in correspondent direction
(mapping can be done in the opposite direction, see
mapNewGenesToOld below)
3: Check all BLASTP results and retain only the best
results by E-value for all gene pairs in each
direction separately. Then map new genes to old for
all pairs, which have acceptable BLASTP results in
both directions (optional, default 1).
onlyGenesInModels consider BLASTP results only for genes that exist in
the models. This tends to import a larger fraction
from the existing models but may give less reliable
results. Has effect only if strictness=3 (optional,
default false)
maxE only look at genes with E-values <= this value (optional,
default 10^-30)
minLen only look at genes with alignment length >= this
value (optional, default 200)
minIde only look at genes with identity >= this value
(optional, default 40 (%))
mapNewGenesToOld determines how to match genes if not looking at only
1-1 orthologs. Either map the new genes to the old or
old genes to new. The default is to map the new genes
(optional, default true)
draftModel a model structure for the new organism
hitGenes collect the old and new genes
The models in the 'models' structure should have named the metabolites
in the same manner, have their reversible reactions in the same
direction (run sortModel), and use the same compartment names. To avoid
keeping unneccesary old genes, the models should not have
'or'-relations in their grRules (use expandModel).
The resulting draft model contains only reactions associated with
orthologous genes. The old (original) genes involved in 'and'
relations in grRules without any orthologs are still included in
the draft model as OLD_MODELID_geneName.
NOTE: "to" and "from" means relative to the new organism
Usage: draftModel=getModelFromHomology(models,blastStructure,...
getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,...
minLen,minIde,mapNewGenesToOld)

0001 function [draftModel, hitGenes]=getModelFromHomology(models,blastStructure,... 0002 getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,... 0003 minLen,minIde,mapNewGenesToOld) 0004 % getModelFromHomology 0005 % Constructs a new model from a set of existing models and gene homology 0006 % information. 0007 % 0008 % models a cell array of model structures to build the model 0009 % from. These models must be sorted by importance in 0010 % decreasing order 0011 % blastStructure a blastStructure as produced by getBlast or 0012 % getBlastFromExcel 0013 % getModelFor a three-four letter abbreviation of the organism to 0014 % build a model for. Must have BLASTP hits in both 0015 % directions to the organisms in 'models' 0016 % preferredOrder the order in which reactions should be added from the 0017 % models. If not supplied, reactions will be included 0018 % from all models, otherwise one gene will only result 0019 % in reactions from one model (optional, default {}) 0020 % strictness integer that specifies which reactions should be 0021 % included: 0022 % 1: Map new genes to old for all pairs, which have 0023 % acceptable BLASTP results in both directions 0024 % 2: Map new genes to old for all pairs, which have 0025 % acceptable BLASTP results in correspondent direction 0026 % (mapping can be done in the opposite direction, see 0027 % mapNewGenesToOld below) 0028 % 3: Check all BLASTP results and retain only the best 0029 % results by E-value for all gene pairs in each 0030 % direction separately. Then map new genes to old for 0031 % all pairs, which have acceptable BLASTP results in 0032 % both directions (optional, default 1). 0033 % onlyGenesInModels consider BLASTP results only for genes that exist in 0034 % the models. This tends to import a larger fraction 0035 % from the existing models but may give less reliable 0036 % results. Has effect only if strictness=3 (optional, 0037 % default false) 0038 % maxE only look at genes with E-values <= this value (optional, 0039 % default 10^-30) 0040 % minLen only look at genes with alignment length >= this 0041 % value (optional, default 200) 0042 % minIde only look at genes with identity >= this value 0043 % (optional, default 40 (%)) 0044 % mapNewGenesToOld determines how to match genes if not looking at only 0045 % 1-1 orthologs. Either map the new genes to the old or 0046 % old genes to new. The default is to map the new genes 0047 % (optional, default true) 0048 % 0049 % draftModel a model structure for the new organism 0050 % hitGenes collect the old and new genes 0051 % 0052 % The models in the 'models' structure should have named the metabolites 0053 % in the same manner, have their reversible reactions in the same 0054 % direction (run sortModel), and use the same compartment names. To avoid 0055 % keeping unneccesary old genes, the models should not have 0056 % 'or'-relations in their grRules (use expandModel). 0057 % 0058 % The resulting draft model contains only reactions associated with 0059 % orthologous genes. The old (original) genes involved in 'and' 0060 % relations in grRules without any orthologs are still included in 0061 % the draft model as OLD_MODELID_geneName. 0062 % 0063 % NOTE: "to" and "from" means relative to the new organism 0064 % 0065 % Usage: draftModel=getModelFromHomology(models,blastStructure,... 0066 % getModelFor,preferredOrder,strictness,onlyGenesInModels,maxE,... 0067 % minLen,minIde,mapNewGenesToOld) 0068 0069 hitGenes.oldGenes = []; % collect the old genes from the template model (organism) 0070 hitGenes.newGenes = []; % collect the new genes of the draft model (target organism) 0071 0072 getModelFor=char(getModelFor); 0073 0074 if nargin<4 0075 preferredOrder=[]; 0076 else 0077 preferredOrder=convertCharArray(preferredOrder); 0078 [row,col]=size(preferredOrder); 0079 if col>row 0080 preferredOrder=transpose(preferredOrder); 0081 end 0082 end 0083 if nargin<5 0084 strictness=1; 0085 end 0086 if nargin<6 0087 onlyGenesInModels=false; 0088 end 0089 if nargin<7 0090 maxE=10^-30; 0091 end 0092 if nargin<8 0093 minLen=200; 0094 end 0095 if nargin<9 0096 minIde=40; 0097 end 0098 if nargin<10 0099 mapNewGenesToOld=true; 0100 end 0101 0102 if isfield(models,'S') 0103 models={models}; 0104 end 0105 0106 %Check that all the information is in the blast structure 0107 modelNames=cell(numel(models),1); 0108 for i=1:numel(models) 0109 modelNames{i}=models{i}.id; 0110 %Gene short names, geneMiriams and proteins are often different 0111 %between species, safer not to include them 0112 if isfield(models{i},'geneShortNames') 0113 models{i}=rmfield(models{i},'geneShortNames'); 0114 end 0115 if isfield(models{i},'geneMiriams') 0116 models{i}=rmfield(models{i},'geneMiriams'); 0117 end 0118 if isfield(models{i},'proteins') 0119 models{i}=rmfield(models{i},'proteins'); 0120 end 0121 %The geneFrom field also loses meaning if the genes are replaced by 0122 %orthologs 0123 if isfield(models{i},'geneFrom') 0124 models{i}=rmfield(models{i},'geneFrom'); 0125 end 0126 end 0127 0128 %Check that genes do not begin with ( or end with ), as this makes problematic grRules 0129 for i=1:numel(blastStructure) 0130 problemGenes = startsWith(blastStructure(i).fromGenes,'(') | endsWith(blastStructure(i).fromGenes,')'); 0131 if any(problemGenes) 0132 error(['One or multiple gene identifiers from ' blastStructure(i).fromId ... 0133 ' starts with ''('' and/or ends with '')'', which is not allowed']) 0134 end 0135 end 0136 0137 %Assume for now that all information is there and that it's correct. This 0138 %is important to fix since no further checks are being made! 0139 0140 %Check whether provided fasta files use the same gene identifiers as 0141 %provided template models 0142 for i=1:numel(blastStructure) 0143 if ~strcmp(blastStructure(i).fromId,getModelFor) 0144 j=strcmpi(blastStructure(i).fromId,modelNames); 0145 if j==0 0146 error(['While the blastStructure contains sequences from '... 0147 'organismID "%s" (as\nprovided in getBlast), none of '... 0148 'template models have this id (as model.id)'],... 0149 string(blastStructure(i).fromId)); 0150 end 0151 k=sum(ismember(blastStructure(i).fromGenes,models{j}.genes)); 0152 if k<(numel(models{j}.genes)*0.05) 0153 error(['Less than 5%% of the genes in the template model '... 0154 'with model.id "%s"\ncan be found in the blastStructure. '... 0155 'Ensure that the protein FASTA\nused in getBlast and '... 0156 'the template model used in getModelFromHomology\nuse '... 0157 'the same style of gene identifiers'],models{j}.id) 0158 end 0159 end 0160 end 0161 0162 %Standardize grRules of template models 0163 for i=1:length(models) 0164 fprintf('\nStandardizing grRules of template model with ID "%s" ...',models{i}.id); 0165 [models{i}.grRules,models{i}.rxnGeneMat]=standardizeGrRules(models{i},false); 0166 end 0167 fprintf(' done\n'); 0168 0169 %Remove all gene matches that are below the cutoffs 0170 for i=1:numel(blastStructure) 0171 indexes=blastStructure(i).evalue<maxE & blastStructure(i).aligLen>=minLen & blastStructure(i).identity>=minIde; %Do it in this direction to lose NaNs 0172 blastStructure(i).fromGenes(~indexes)=[]; 0173 blastStructure(i).toGenes(~indexes)=[]; 0174 blastStructure(i).evalue(~indexes)=[]; 0175 blastStructure(i).identity(~indexes)=[]; 0176 blastStructure(i).aligLen(~indexes)=[]; 0177 blastStructure(i).bitscore(~indexes)=[]; 0178 blastStructure(i).ppos(~indexes)=[]; 0179 end 0180 0181 %Remove all reactions from the models that have no genes encoding for them. 0182 %Also remove all genes that encode for no reactions. There shouldn't be any 0183 %but there might be mistakes 0184 for i=1:numel(models) 0185 [hasGenes, ~]=find(models{i}.rxnGeneMat); 0186 hasNoGenes=1:numel(models{i}.rxns); 0187 hasNoGenes(hasGenes)=[]; 0188 models{i}=removeReactions(models{i},hasNoGenes,true,true); 0189 end 0190 0191 %Create a structure that contains all genes used in the blasts in any 0192 %direction for each of the models in 'models' and for the new organism. The 0193 %first cell is for the new organism and then according to the preferred 0194 %order. If no such order is supplied, then according to the order in 0195 %'models' 0196 allGenes=cell(numel(models)+1,1); 0197 if isempty(preferredOrder) 0198 useOrder=modelNames; 0199 else 0200 useOrder=preferredOrder; 0201 end 0202 0203 %Get the corresponding indexes for those models in the 'models' structure 0204 useOrderIndexes=zeros(numel(models),1); 0205 for i=1:numel(models) 0206 [~, index]=ismember(models{i}.id,useOrder); 0207 useOrderIndexes(index)=i; 0208 end 0209 0210 %Remove all genes from the blast structure that have no genes in the models 0211 if onlyGenesInModels==true 0212 modelGenes={}; 0213 for i=1:numel(models) 0214 modelGenes=[modelGenes;models{i}.genes]; 0215 end 0216 for i=1:numel(blastStructure) 0217 %Check to see if it should match the toId or fromId 0218 if strcmpi(blastStructure(i).fromId,getModelFor) 0219 I=ismember(blastStructure(i).toGenes,modelGenes); 0220 else 0221 I=ismember(blastStructure(i).fromGenes,modelGenes); 0222 end 0223 blastStructure(i).fromGenes(~I)=[]; 0224 blastStructure(i).toGenes(~I)=[]; 0225 blastStructure(i).evalue(~I)=[]; 0226 blastStructure(i).identity(~I)=[]; 0227 blastStructure(i).aligLen(~I)=[]; 0228 blastStructure(i).bitscore(~I)=[]; 0229 blastStructure(i).ppos(~I)=[]; 0230 0231 %Check that no matching in blastStructure is empty. This happens if 0232 %no genes in the models are present in the corresponding sheet 0233 if isempty(blastStructure(i).fromGenes) 0234 EM=['No genes in matching from ' blastStructure(i).fromId ' to ' blastStructure(i).toId ' are present in the corresponding model']; 0235 dispEM(EM); 0236 end 0237 end 0238 end 0239 0240 %If only best orthologs are to be used then all other measurements are 0241 %deleted from the blastStructure. All code after this stays the same. This 0242 %means that preferred order can still matter. The best ortholog scoring is 0243 %based only on the E-value 0244 if strictness==3 0245 for i=1:numel(blastStructure) 0246 keep=false(numel(blastStructure(i).toGenes),1); 0247 [allFromGenes, ~, I]=unique(blastStructure(i).fromGenes); 0248 0249 %It would be nice to get rid of this loop 0250 for j=1:numel(allFromGenes) 0251 allMatches=find(I==j); 0252 bestMatches=allMatches(blastStructure(i).evalue(allMatches)==min(blastStructure(i).evalue(allMatches))); 0253 0254 %Keep the best matches 0255 keep(bestMatches)=true; 0256 end 0257 0258 %Delete all matches that were not best matches 0259 blastStructure(i).fromGenes(~keep)=[]; 0260 blastStructure(i).toGenes(~keep)=[]; 0261 blastStructure(i).evalue(~keep)=[]; 0262 blastStructure(i).identity(~keep)=[]; 0263 blastStructure(i).aligLen(~keep)=[]; 0264 blastStructure(i).bitscore(~keep)=[]; 0265 blastStructure(i).ppos(~keep)=[]; 0266 end 0267 end 0268 0269 useOrder=[{getModelFor};useOrder]; 0270 0271 for i=1:numel(blastStructure) 0272 [~, toIndex]=ismember(blastStructure(i).toId,useOrder); 0273 [~, fromIndex]=ismember(blastStructure(i).fromId,useOrder); 0274 0275 %Add all genes to the corresponding list in allGenes 0276 allGenes{toIndex}=[allGenes{toIndex};blastStructure(i).toGenes]; 0277 allGenes{fromIndex}=[allGenes{fromIndex};blastStructure(i).fromGenes]; 0278 end 0279 0280 %Keep only the unique gene names 0281 maxOtherGeneNr=0; %Determines the dimension of the connectivity matrixes 0282 for i=1:numel(allGenes) 0283 allGenes{i}=unique(allGenes{i}); 0284 if i>1 0285 if numel(allGenes{i})>maxOtherGeneNr 0286 maxOtherGeneNr=numel(allGenes{i}); 0287 end 0288 end 0289 end 0290 0291 %Generate a cell array of matrixes that describes how the genes in the new 0292 %organism map to the models. Each cell matches to the corresponding model 0293 %in useOrder (starting at 2 of course). First dimension is gene in new 0294 %organism, second which gene it is in the other organism. The second matrix 0295 %describes how they map back. 0296 0297 %As it is now, a significant match is indicated by a 1. This could be 0298 %expanded to contain some kind of significance level. The first dimension 0299 %is still the genes in the new model. 0300 0301 allTo=cell(numel(useOrder)-1,1); 0302 allFrom=cell(numel(useOrder)-1,1); 0303 0304 for i=1:numel(useOrder)-1 0305 allTo{i}=sparse(numel(allGenes{1}),numel(allGenes{i+1})); 0306 allFrom{i}=sparse(numel(allGenes{1}),numel(allGenes{i+1})); 0307 end 0308 0309 %Fill the matches to other species 0310 for i=1:numel(blastStructure) 0311 if strcmp(blastStructure(i).toId,getModelFor) 0312 %This was 'to' the new organism. They should all match so no checks 0313 %are being made 0314 [~, a]=ismember(blastStructure(i).toGenes,allGenes{1}); 0315 [~, fromModel]=ismember(blastStructure(i).fromId,useOrder); 0316 [~, b]=ismember(blastStructure(i).fromGenes,allGenes{fromModel}); 0317 idx = sub2ind(size(allTo{fromModel-1}), a, b); 0318 allTo{fromModel-1}(idx)=1; 0319 else 0320 %This was 'from' the new organism 0321 [~, a]=ismember(blastStructure(i).fromGenes,allGenes{1}); 0322 [~, toModel]=ismember(blastStructure(i).toId,useOrder); 0323 [~, b]=ismember(blastStructure(i).toGenes,allGenes{toModel}); 0324 idx = sub2ind(size(allFrom{toModel-1}), a, b); 0325 allFrom{toModel-1}(idx)=1; 0326 end 0327 end 0328 0329 %Now we have all the gene matches in a convenient way. For all the genes in 0330 %the new organism get the genes that should be included from other 0331 %organisms. If all genes should be included this simply means keep the 0332 %allFrom matrix as it is. If only orthologs which could be mapped in both 0333 %BLAST directions are to be included then only those elements are kept. 0334 0335 finalMappings=cell(numel(useOrder)-1,1); 0336 if strictness==1 || strictness==3 0337 for j=1:numel(allFrom) 0338 finalMappings{j}=allTo{j}~=0 & allFrom{j}~=0; 0339 end 0340 else 0341 if mapNewGenesToOld==true 0342 finalMappings=allFrom; 0343 else 0344 finalMappings=allTo; 0345 end 0346 end 0347 0348 %Remove all genes from the mapping that are not in the models. This doesn't 0349 %do much if only genes in the models were used for the original mapping. 0350 %Also simplify the finalMapping and allGenes structures so that they only 0351 %contain mappings that exist 0352 usedNewGenes=false(numel(allGenes{1}),1); 0353 0354 for i=1:numel(allGenes)-1 0355 %First remove mappings for those genes that are not in the model 0356 if onlyGenesInModels==false 0357 a=ismember(allGenes{i+1},models{useOrderIndexes(i)}.genes); 0358 finalMappings{i}(:,~a)=false; 0359 end 0360 0361 %Then remove unused ones and simplify 0362 [a, b]=find(finalMappings{i}); 0363 usedGenes=false(numel(allGenes{i+1}),1); 0364 usedNewGenes(a)=true; 0365 usedGenes(b)=true; 0366 finalMappings{i}=finalMappings{i}(:,usedGenes); 0367 allGenes{i+1}=allGenes{i+1}(usedGenes); 0368 end 0369 0370 %Remove all new genes that have not been mapped to anything 0371 allGenes{1}=allGenes{1}(usedNewGenes); 0372 for i=1:numel(finalMappings) 0373 finalMappings{i}=finalMappings{i}(usedNewGenes,:); 0374 end 0375 0376 %Now is it time to choose which reactions should be included from which 0377 %models. If there is a preferred order specified then each gene can only 0378 %result in reactions from one model, otherwise they should all be included 0379 0380 %Start by simplifying the models by removing genes/reactions that are not 0381 %used. This is where it gets weird with complexes, especially "or" 0382 %complexes. In this step only reactions which are encoded by one single 0383 %gene, or where all genes should be deleted, are deleted. The info on the 0384 %full complex is still present in the grRules 0385 0386 for i=1:numel(models) 0387 a=ismember(models{useOrderIndexes(i)}.genes,allGenes{i+1}); 0388 0389 %Remove reactions that are not associated to any of the genes in 0390 %allGenes, thereby also keeping complexes where only for one of the 0391 %genes was matched 0392 [rxnsToKeep,~] = find(models{useOrderIndexes(i)}.rxnGeneMat(:,a)); 0393 rxnsToRemove = repmat(1,numel(models{useOrderIndexes(i)}.rxns),1); 0394 rxnsToRemove(rxnsToKeep) = 0; 0395 rxnsToRemove = find(rxnsToRemove); 0396 models{useOrderIndexes(i)}=removeReactions(models{useOrderIndexes(i)},rxnsToRemove,true,true,true); 0397 end 0398 0399 %Since mergeModels function will be used in the end, the models are 0400 %simplified further by deleting genes/reactions in the order specified by 0401 %preferredOrder. This means that the last model will only contain reactions 0402 %for genes that mapped only to that model 0403 0404 allUsedGenes=false(numel(allGenes{1}),1); 0405 0406 if ~isempty(preferredOrder) && numel(models)>1 0407 [usedGenes, ~]=find(finalMappings{1}); %All that are used in the first model in preferredOrder 0408 allUsedGenes(usedGenes)=true; 0409 for i=2:numel(finalMappings) 0410 [usedGenes, ~]=find(finalMappings{i}); 0411 usedGenes=unique(usedGenes); 0412 a=ismember(usedGenes,find(allUsedGenes)); 0413 0414 [~, genesToDelete]=find(finalMappings{i}(usedGenes(a),:)); %IMPORTANT! IS it really correct to remove all genes that map? 0415 genesToDelete=unique(genesToDelete); %Maybe not needed, but for clarity if nothing else 0416 0417 %Remove all the genes that were already found and add the other 0418 %ones to allUsedGenes 0419 models{useOrderIndexes(i)}=removeGenes(models{useOrderIndexes(i)},allGenes{i+1}(genesToDelete),true,true,false); 0420 allUsedGenes(usedGenes)=true; 0421 0422 %Remove the deleted genes from finalMappings and allGenes. 0423 finalMappings{i}(:,genesToDelete)=[]; 0424 allGenes{i+1}(genesToDelete)=[]; 0425 end 0426 end 0427 0428 %Now loop through the models and update the gene associations. Genes not 0429 %belonging to the new organism will be renamed as 'OLD_MODELID_gene' 0430 for i=1:numel(models) 0431 %Find all the new genes that should be used for this model 0432 [newGenes, oldGenes]=find(finalMappings{i}); 0433 0434 %Create a new gene list with the genes from the new organism and those 0435 %genes that could not be removed 0436 replaceableGenes=allGenes{i+1}(unique(oldGenes)); 0437 nonReplaceableGenes=setdiff(models{useOrderIndexes(i)}.genes,replaceableGenes); 0438 fullGeneList=[allGenes{1}(unique(newGenes));nonReplaceableGenes]; 0439 0440 %Just to save some indexing later. This is the LAST index of 0441 %replaceable ones 0442 nonRepStartIndex=numel(unique(newGenes)); 0443 0444 %Construct a new rxnGeneMat 0445 newRxnGeneMat=sparse(numel(models{useOrderIndexes(i)}.rxns),numel(fullGeneList)); 0446 0447 %Now update the rxnGeneMat. This is a little tricky and could 0448 %probably be done in a more efficient way, but I just loop through the 0449 %reactions and add them one by one 0450 for j=1:numel(models{useOrderIndexes(i)}.rxns) 0451 %Get the old genes encoding for this reaction 0452 [~, oldGeneIds]=find(models{useOrderIndexes(i)}.rxnGeneMat(j,:)); 0453 0454 %Update the matrix for each gene. This includes replacing one gene 0455 %with several new ones if there were several matches 0456 for k=1:numel(oldGeneIds) 0457 %Match the gene to one in the gene list. This is done as a text 0458 %match. Could probably be done better, but I'm a little lost in 0459 %the indexing 0460 0461 geneName=models{useOrderIndexes(i)}.genes(oldGeneIds(k)); 0462 0463 %First search in the mappable genes 0464 mapIndex=find(ismember(allGenes{i+1},geneName)); 0465 0466 if ~isempty(mapIndex) 0467 % add the old genes 0468 hitGenes.oldGenes = [hitGenes.oldGenes, {geneName}]; 0469 0470 %Get the new genes for that gene 0471 a=find(finalMappings{i}(:,mapIndex)); 0472 0473 %Find the positions of these genes in the final gene list 0474 [~, b]=ismember(allGenes{1}(a),fullGeneList); 0475 0476 %Update the matrix 0477 newRxnGeneMat(j,b)=1; 0478 0479 %Update the grRules string. This is tricky, but I hope that 0480 %it's ok to replace the old gene name with the new one and 0481 %add ') or (' if there were several matches. Be sure of 0482 %this! 0483 repString=fullGeneList{b(1)}; 0484 if numel(b)>1 0485 for l=2:numel(b) 0486 repString=[repString ' or ' fullGeneList{b(l)}]; 0487 end 0488 repString=['(' repString ')']; 0489 end 0490 0491 % add the new matched genes 0492 hitGenes.newGenes = [hitGenes.newGenes, {repString}]; 0493 0494 %Use regexprep instead of strrep to prevent partial matches 0495 models{useOrderIndexes(i)}.grRules{j}=regexprep(models{useOrderIndexes(i)}.grRules{j},['(^|\s|\()' geneName{1} '($|\s|\))'],['$1' repString '$2']); 0496 else 0497 %Then search in the non-replaceable genes. There could only 0498 %be one match here 0499 index=find(ismember(nonReplaceableGenes,geneName)); 0500 0501 %Update the matrix 0502 newRxnGeneMat(j,nonRepStartIndex+index)=1; 0503 0504 models{useOrderIndexes(i)}.grRules{j}=strrep(models{useOrderIndexes(i)}.grRules{j},geneName{1},strcat('OLD_',models{useOrderIndexes(i)}.id,'_',geneName{1})); 0505 end 0506 end 0507 end 0508 0509 %Add the new list of genes 0510 models{useOrderIndexes(i)}.rxnGeneMat=newRxnGeneMat; 0511 if ~isempty(nonReplaceableGenes) 0512 models{useOrderIndexes(i)}.genes=[allGenes{1}(unique(newGenes));strcat('OLD_',models{useOrderIndexes(i)}.id,'_',nonReplaceableGenes)]; 0513 else 0514 models{useOrderIndexes(i)}.genes=allGenes{1}(unique(newGenes)); 0515 end 0516 if isfield(models{useOrderIndexes(i)},'geneComps') 0517 geneComps=models{useOrderIndexes(i)}.geneComps(1); 0518 models{useOrderIndexes(i)}.geneComps=zeros(numel(models{useOrderIndexes(i)}.genes),1); 0519 %Assume that all genes are in the same compartment, and this 0520 %compartment is specified for the first gene 0521 models{useOrderIndexes(i)}.geneComps(:)=geneComps; 0522 end 0523 end 0524 0525 %Now merge the models. All information should be correct except for 'or' 0526 %complexes 0527 draftModel=mergeModels(models,'metNames'); 0528 0529 %Remove unnecessary OLD_ genes, that were added with OR relationships 0530 regexStr=['OLD_(', strjoin(modelNames(:),'|'),')_(\S^\))+']; 0531 draftModel.grRules=regexprep(draftModel.grRules,[' or ' regexStr],''); 0532 draftModel.grRules=regexprep(draftModel.grRules,[regexStr ' or '],''); 0533 0534 %Change name of the resulting model 0535 draftModel.id=getModelFor; 0536 name='Generated by getModelFromHomology using '; 0537 for i=1:numel(models) 0538 if i<numel(models) 0539 name=[name models{i}.id ', ']; 0540 else 0541 name=[name models{i}.id]; 0542 end 0543 end 0544 draftModel.name=name; 0545 draftModel.rxnNotes=cell(length(draftModel.rxns),1); 0546 draftModel.rxnNotes(:)={'Included by getModelFromHomology'}; 0547 draftModel.rxnConfidenceScores=NaN(length(draftModel.rxns),1); 0548 draftModel.rxnConfidenceScores(:)=2; 0549 draftModel=deleteUnusedGenes(draftModel,0); 0550 %Standardize grRules and notify if problematic grRules are found 0551 [draftModel.grRules,draftModel.rxnGeneMat]=standardizeGrRules(draftModel,false); 0552 draftModel=deleteUnusedGenes(draftModel,false); 0553 end