Home > external > kegg > getModelFromKEGG.m

getModelFromKEGG

PURPOSE ^

getModelFromKEGG

SYNOPSIS ^

function [model,KOModel]=getModelFromKEGG(keggPath,keepSpontaneous,keepUndefinedStoich,keepIncomplete,keepGeneral)

DESCRIPTION ^

 getModelFromKEGG
   Retrieves information stored in KEGG database and generates a model

   Input:
   keggPath            if keggGenes.mat, keggMets.mat, keggPhylDist.mat or
                       keggRxns.mat are not in the RAVEN/external/kegg
                       directory, this function will attempt to read data
                       from a local FTP dump of the KEGG database.
                       keggPath is the path to the root of this database
                       (optional, default 'RAVEN/external/kegg'). If
                       keggModel.mat is present in the same directory, the
                       function reads the data from this file and ignores
                       keggGenes.mat, keggMets.mat and keggRxns.mat
   keepSpontaneous     include reactions labeled as "spontaneous" (optional,
                       default true)
   keepUndefinedStoich include reactions in the form n A <=> n+1 A. These
                       will be dealt with as two separate metabolites
                       (optional, default true)
   keepIncomplete      include reactions which have been labelled as
                       "incomplete", "erroneous" or "unclear" (optional,
                       default true)
   keepGeneral         include reactions which have been labelled as
                       "general reaction". These are reactions on the form
                       "an aldehyde <=> an alcohol", and are therefore
                       unsuited for modelling purposes. Note that not all
                       reactions have this type of annotation, and the
                       script will therefore not be able to remove all
                       such reactions (optional, default false)

   Output:
   model               a model structure generated from the database. All
                       reactions and the metabolites used in them will be
                       added
   KOModel             a model structure representing the KEGG Orthology
                       ids and their associated genes. The KO ids are
                       saved as reactions

   NOTE: The model output from getModelFromKEGG can be used as template
   for fillGaps. In that case, ensure that the genes and rxnGeneMat fields
   are removed before parsing: model=rmfield(model,'genes'), etc.

 Usage: [model,KOModel]=getModelFromKEGG(keggPath,keepSpontaneous,...
    keepUndefinedStoich,keepIncomplete,keepGeneral)

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [model,KOModel]=getModelFromKEGG(keggPath,keepSpontaneous,...
0002     keepUndefinedStoich,keepIncomplete,keepGeneral)
0003 % getModelFromKEGG
0004 %   Retrieves information stored in KEGG database and generates a model
0005 %
0006 %   Input:
0007 %   keggPath            if keggGenes.mat, keggMets.mat, keggPhylDist.mat or
0008 %                       keggRxns.mat are not in the RAVEN/external/kegg
0009 %                       directory, this function will attempt to read data
0010 %                       from a local FTP dump of the KEGG database.
0011 %                       keggPath is the path to the root of this database
0012 %                       (optional, default 'RAVEN/external/kegg'). If
0013 %                       keggModel.mat is present in the same directory, the
0014 %                       function reads the data from this file and ignores
0015 %                       keggGenes.mat, keggMets.mat and keggRxns.mat
0016 %   keepSpontaneous     include reactions labeled as "spontaneous" (optional,
0017 %                       default true)
0018 %   keepUndefinedStoich include reactions in the form n A <=> n+1 A. These
0019 %                       will be dealt with as two separate metabolites
0020 %                       (optional, default true)
0021 %   keepIncomplete      include reactions which have been labelled as
0022 %                       "incomplete", "erroneous" or "unclear" (optional,
0023 %                       default true)
0024 %   keepGeneral         include reactions which have been labelled as
0025 %                       "general reaction". These are reactions on the form
0026 %                       "an aldehyde <=> an alcohol", and are therefore
0027 %                       unsuited for modelling purposes. Note that not all
0028 %                       reactions have this type of annotation, and the
0029 %                       script will therefore not be able to remove all
0030 %                       such reactions (optional, default false)
0031 %
0032 %   Output:
0033 %   model               a model structure generated from the database. All
0034 %                       reactions and the metabolites used in them will be
0035 %                       added
0036 %   KOModel             a model structure representing the KEGG Orthology
0037 %                       ids and their associated genes. The KO ids are
0038 %                       saved as reactions
0039 %
0040 %   NOTE: The model output from getModelFromKEGG can be used as template
0041 %   for fillGaps. In that case, ensure that the genes and rxnGeneMat fields
0042 %   are removed before parsing: model=rmfield(model,'genes'), etc.
0043 %
0044 % Usage: [model,KOModel]=getModelFromKEGG(keggPath,keepSpontaneous,...
0045 %    keepUndefinedStoich,keepIncomplete,keepGeneral)
0046 
0047 if nargin<1
0048     keggPath='RAVEN/external/kegg';
0049 else
0050     keggPath=char(keggPath);
0051 end
0052 if nargin<2
0053     keepSpontaneous=true;
0054 end
0055 if nargin<3
0056     keepUndefinedStoich=true;
0057 end
0058 if nargin<4
0059     keepIncomplete=true;
0060 end
0061 if nargin<5
0062     keepGeneral=false;
0063 end
0064 
0065 ravenPath=findRAVENroot();
0066 modelFile=fullfile(ravenPath,'external','kegg','keggModel.mat');
0067 if exist(modelFile, 'file') && isNewestFile(ravenPath)
0068     fprintf(['Importing the global KEGG model from ' strrep(modelFile,'\','/') '... ']);
0069     load(modelFile);
0070     fprintf('COMPLETE\n');
0071 else
0072     fprintf(['NOTE: The file ' strrep(modelFile,'\','/') ' does not exist or is out-of-date and therefore will be (re)generated\n']);
0073     %First get all reactions
0074     [model,isSpontaneous,isUndefinedStoich,isIncomplete,isGeneral]=getRxnsFromKEGG(keggPath);
0075     
0076     %Get the KO ids that are associated with any of the reactions. They
0077     %will be used later on to create a rxn-gene matrix
0078     KOs=cell(numel(model.rxns)*2,1);
0079     %Make room for two KO ids per reaction
0080     
0081     addToIndex=1;
0082     %Add all KO ids that are used
0083     for i=1:numel(model.rxns)
0084         if isstruct(model.rxnMiriams{i})
0085             for j=1:numel(model.rxnMiriams{i}.name)
0086                 if strcmpi('kegg.orthology',model.rxnMiriams{i}.name{j})
0087                     %Add the KO id
0088                     KOs(addToIndex)=model.rxnMiriams{i}.value(j);
0089                     addToIndex=addToIndex+1;
0090                 end
0091             end
0092         end
0093     end
0094     
0095     KOs=KOs(1:addToIndex-1);
0096     KOs=unique(KOs);
0097     
0098     %Get all genes from any organism in KEGG that is associated with any of
0099     %the KOs
0100     KOModel=getGenesFromKEGG(keggPath,KOs);
0101     
0102     fprintf('Pruning the global KEGG model from the partially annotated, lumped KEGG Orthology entries... ')
0103     model.genes=KOModel.genes;
0104     
0105     %It can be that there are KOs from the reactions that have no database
0106     %entry. These are (as far as I have seen) lumped versions of other KOs
0107     %and should be removed
0108     KOsToRemove=setdiff(KOs, KOModel.rxns);
0109     
0110     %Loop through all reactions and delete the KOs that were not found
0111     for i=1:numel(model.rxns)
0112         if isstruct(model.rxnMiriams{i})
0113             for j=1:numel(model.rxnMiriams{i}.name)
0114                 toDel=[];
0115                 if strcmp(model.rxnMiriams{i}.name{j},'kegg.orthology')
0116                     if ismember(model.rxnMiriams{i}.value{j},KOsToRemove)
0117                         toDel=[toDel;j];
0118                     end
0119                 end
0120             end
0121             %Delete the KOs
0122             if any(toDel)
0123                 %If all posts are deleted, then delete the whole structure
0124                 if numel(toDel)==j
0125                     model.rxnMiriams{i}=[];
0126                 else
0127                     model.rxnMiriams{i}.name(toDel)=[];
0128                     model.rxnMiriams{i}.value(toDel)=[];
0129                 end
0130             end
0131         end
0132     end
0133     fprintf('COMPLETE\n');
0134     
0135     fprintf('Constructing the rxnGeneMat for the global KEGG model...   0%% complete');
0136     %Create the rxnGeneMat for the reactions. This is simply done by
0137     %merging the gene associations for all the involved KOs
0138     r=zeros(10000000,1);
0139     %Store the positions since it's slow to write to a sparse array in a
0140     %loop
0141     c=zeros(10000000,1);
0142     counter=1;
0143     for i=1:numel(model.rxns)
0144         if isstruct(model.rxnMiriams{i})
0145             I=strncmp('kegg.orthology',model.rxnMiriams{i}.name,18);
0146             if any(I)
0147                 [J, K]=ismember(model.rxnMiriams{i}.value(I),KOModel.rxns);
0148                 %Find all gene indexes that correspond to any of these KOs
0149                 [~, L]=find(KOModel.rxnGeneMat(K(J),:));
0150                 if any(L)
0151                     %Allocate room for more elements if needed
0152                     if counter+numel(L)-1>=numel(r)
0153                         r=[r;zeros(numel(r),1)];
0154                         c=[c;zeros(numel(c),1)];
0155                     end
0156                     r(counter:counter+numel(L)-1)=ones(numel(L),1)*i;
0157                     c(counter:counter+numel(L)-1)=L(:);
0158                     counter=counter+numel(L);
0159                 end
0160             end
0161         end
0162         if rem(i-1,100) == 0
0163             progress=pad(num2str(floor(i/numel(model.rxns)*100)),3,'left');
0164             fprintf('\b\b\b\b\b\b\b\b\b\b\b\b\b%s%% complete',progress);
0165         end
0166     end
0167     
0168     model.rxnGeneMat=sparse(r(1:counter-1),c(1:counter-1),ones(counter-1,1));
0169     if size(model.rxnGeneMat,1)~=numel(model.rxns) || size(model.rxnGeneMat,2)~=numel(KOModel.genes)
0170         model.rxnGeneMat(numel(model.rxns),numel(KOModel.genes))=0;
0171     end
0172     fprintf('\b\b\b\b\b\b\b\b\b\b\b\b\bCOMPLETE\n');
0173     
0174     %Then get all metabolites
0175     metModel=getMetsFromKEGG(keggPath);
0176     
0177     fprintf('Finalizing the global KEGG model... ');
0178     %Add information about all metabolites to the model
0179     [a, b]=ismember(model.mets,metModel.mets);
0180     a=find(a);
0181     b=b(a);
0182     
0183     if ~isfield(model,'metNames')
0184         model.metNames=cell(numel(model.mets),1);
0185         model.metNames(:)={''};
0186     end
0187     model.metNames(a)=metModel.metNames(b);
0188     
0189     if ~isfield(model,'metFormulas')
0190         model.metFormulas=cell(numel(model.mets),1);
0191         model.metFormulas(:)={''};
0192     end
0193     model.metFormulas(a)=metModel.metFormulas(b);
0194     
0195     if ~isfield(model,'inchis')
0196         model.inchis=cell(numel(model.mets),1);
0197         model.inchis(:)={''};
0198     end
0199     model.inchis(a)=metModel.inchis(b);
0200     
0201     if ~isfield(model,'metMiriams')
0202         model.metMiriams=cell(numel(model.mets),1);
0203     end
0204     model.metMiriams(a)=metModel.metMiriams(b);
0205     
0206     %The composition should be loaded from InChIs when available
0207     I=find(~cellfun(@isempty,model.inchis));
0208     for i=1:numel(I)
0209         S=regexp(model.inchis(I(i)),'/','split');
0210         S=S{1};
0211         if numel(S)>=2
0212             %Don't copy if it doesn't look good
0213             model.metFormulas(I(i))=S(2);
0214         end
0215     end
0216     
0217     %Put all metabolites in one compartment called 's' (for system). This
0218     %is done just to be more compatible with the rest of the code
0219     model.comps={'s'};
0220     model.compNames={'System'};
0221     model.metComps=ones(numel(model.mets),1);
0222     
0223     %If reactions with undefined stoichiometry are kept, then the
0224     %corresponding metabolites will have ids such as "(n+1) C00001" and
0225     %their names will be empty. These ids are not valid SBML identifiers
0226     %and are therefore replaced with "undefined1, undefined2...". The
0227     %former ids are stored as the new names
0228     I=find(cellfun(@any,strfind(model.mets,'n')) | cellfun(@any,strfind(model.mets,'m')));
0229     model.metNames(I)=model.mets(I);
0230     repNums=1:numel(I);
0231     repIDs=strcat('undefined_',cellfun(@num2str,num2cell(repNums(:)),'UniformOutput',false));
0232     model.mets(I)=repIDs;
0233     
0234     %It could also be that the metabolite names are empty for some other
0235     %reason. In that case, use the ID instead
0236     I=cellfun(@isempty,model.metNames);
0237     model.metNames(I)=model.mets(I);
0238 
0239     %Deafult LB and UB
0240     model.annotation.defaultLB = -1000;
0241     model.annotation.defaultUB = 1000;
0242     
0243     %Save the model structure
0244     save(modelFile,'model','KOModel','isGeneral','isIncomplete','isUndefinedStoich','isSpontaneous');
0245     fprintf('COMPLETE\n');
0246 end
0247 
0248 %Delete reactions which are labeled as "incomplete", "erroneous",
0249 %"unclear", "general reaction" or having undefined stoichiometry (depending
0250 %on settings)
0251 if keepSpontaneous==false
0252     model=removeReactions(model,intersect(isSpontaneous,model.rxns),true,true);
0253 end
0254 if keepUndefinedStoich==false
0255     model=removeReactions(model,intersect(isUndefinedStoich,model.rxns),true,true);
0256 end
0257 if keepIncomplete==false
0258     model=removeReactions(model,intersect(isIncomplete,model.rxns),true,true);
0259 end
0260 if keepGeneral==false
0261     model=removeReactions(model,intersect(isGeneral,model.rxns),true,true);
0262 end
0263 
0264 end
0265 
0266 function output = isNewestFile(ravenPath)
0267 %The ad hoc function, which checks whether keggModel.mat is the more
0268 %recently modified than keggRxns.mat, keggGenes.mat and keggRxns.mat
0269 modelFile=fullfile(ravenPath,'external','kegg','keggModel.mat');
0270 rxnsFile=fullfile(ravenPath,'external','kegg','keggRxns.mat');
0271 genesFile=fullfile(ravenPath,'external','kegg','keggGenes.mat');
0272 metsFile=fullfile(ravenPath,'external','kegg','keggMets.mat');
0273 if (getFileTime(modelFile)>getFileTime(rxnsFile))&&...
0274         (getFileTime(modelFile)>getFileTime(genesFile))&&...
0275         (getFileTime(modelFile)>getFileTime(metsFile))
0276     output=1;
0277 else
0278     output=0;
0279 end
0280 end
0281 
0282 function modTime = getFileTime(fileName)
0283 %Gets a last modification time for a particular file in datenum format that
0284 %the numbers could be easily compared for different files
0285 listing = dir(fileName);
0286 assert(numel(listing) == 1, 'No such file: %s', fileName);
0287 modTime = listing.datenum;
0288 format long;
0289 end

Generated by m2html © 2005