getMetsFromMetaCyc

PURPOSE

getMetsFromMetaCyc

SYNOPSIS

function metaCycMets=getMetsFromMetaCyc(metacycPath)

DESCRIPTION

 getMetsFromMetaCyc
   Retrieves information on all metabolites in the MetaCyc database

   Input:
   metacycPath  path to the directory that holds metaCycMets.mat or, if
                that file is missing, the MetaCyc data files (a local dump
                of the MetaCyc database that includes compounds.dat) from
                which it will be built (optional, default is the
                RAVEN\external\metacyc directory)

   Output:
   metaCycMets  a model structure generated from the database. The following
                fields are filled
                id:             'MetaCyc'
                name:           'Automatically generated from MetaCyc database'
                mets:           MetaCyc compound ids
                metNames:       Compound name, with HTML tags removed. Left
                                empty if there is no name provided
                metFormulas:    The chemical composition of the metabolite.
                                Taken from the InChI when only a
                                non-standard InChI is available
                inchis:         InChI string for the metabolite, without
                                the 'InChI=' prefix. The non-standard InChI
                                is used if there is no standard one
                metCharges:     Compound charge state, summed over the
                                atom charges
                metMiriams:     SMILES, CHEBI and PubChem ids if available.
                                Otherwise the MetaCyc compound id will be
                                saved here
                keggid:         The corresponding KEGG compound id if available
                version:        MetaCyc database version

   If the file metaCycMets.mat is in the metacycPath directory it will be
   directly loaded. Otherwise, it will be generated by parsing the MetaCyc
   database file compounds.dat in the same directory and then saved there.
   In general, this metaCycMets.mat file should be removed and rebuilt when
   a newer version of MetaCyc is released.

 Usage: metaCycMets=getMetsFromMetaCyc(metacycPath)
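
The field layout described above can be tried out without the database. The sketch below builds a one-compound stand-in by hand, using values from the sample entry quoted in the source comments. The variable name demo is made up for illustration, and the structure is not actual output of getMetsFromMetaCyc. It assumes that metMiriams holds one struct per compound with parallel name and value cell arrays, as the source code suggests.

```matlab
% Hand-built stand-in for the returned structure (one compound only).
% Values come from the sample entry in the source comments.
demo.id = 'MetaCyc';
demo.mets = {'CPD-18846'};
demo.metNames = {'12-ethyl-8-propyl-bacteriochlorophyllide d'};
demo.metCharges = -2;
demo.keggid = {''};

% Assumed layout: one struct per compound with parallel name/value cells.
miriam.name = {'SMILES'};
miriam.value = {'CCCC5(=C(C)C9(N6([Mg]27(N1(C(C(C)C(CCC(=O)[O-])C=1C4([C-]C(=O)C3(=C(CC)C(N2C3=4)=CC5=6)))=CC8(=C(C)C(C(O)C)=C(N78)C=9))))))'};
demo.metMiriams = {miriam};

% Look the compound up by id, then pick one annotation by its name.
idx = find(strcmp(demo.mets, 'CPD-18846'));
m = demo.metMiriams{idx};
hit = strcmp(m.name, 'SMILES');
fprintf('%s, charge %d, SMILES has %d characters\n', ...
    demo.metNames{idx}, demo.metCharges(idx), numel(m.value{hit}));
```

The same lookup pattern should carry over to the real structure, where a compound may hold several name/value pairs (for example 'chebi' and 'pubchem.compound').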

SOURCE CODE

function metaCycMets=getMetsFromMetaCyc(metacycPath)
% getMetsFromMetaCyc
%   Retrieves information on all metabolites in the MetaCyc database
%
%   Input:
%   metacycPath  path to the directory that holds metaCycMets.mat or, if
%                that file is missing, the MetaCyc data files (a local dump
%                of the MetaCyc database that includes compounds.dat) from
%                which it will be built (optional, default is the
%                RAVEN\external\metacyc directory)
%
%   Output:
%   metaCycMets  a model structure generated from the database. The following
%                fields are filled
%                id:             'MetaCyc'
%                name:           'Automatically generated from MetaCyc database'
%                mets:           MetaCyc compound ids
%                metNames:       Compound name, with HTML tags removed. Left
%                                empty if there is no name provided
%                metFormulas:    The chemical composition of the metabolite.
%                                Taken from the InChI when only a
%                                non-standard InChI is available
%                inchis:         InChI string for the metabolite, without
%                                the 'InChI=' prefix. The non-standard InChI
%                                is used if there is no standard one
%                metCharges:     Compound charge state, summed over the
%                                atom charges
%                metMiriams:     SMILES, CHEBI and PubChem ids if available.
%                                Otherwise the MetaCyc compound id will be
%                                saved here
%                keggid:         The corresponding KEGG compound id if available
%                version:        MetaCyc database version
%
%   If the file metaCycMets.mat is in the metacycPath directory it will be
%   directly loaded. Otherwise, it will be generated by parsing the MetaCyc
%   database file compounds.dat in the same directory and then saved there.
%   In general, this metaCycMets.mat file should be removed and rebuilt when
%   a newer version of MetaCyc is released.
%
% Usage: metaCycMets=getMetsFromMetaCyc(metacycPath)

% NOTE: This is how one entry looks in the file

% //
% UNIQUE-ID - CPD-18846
% TYPES - CPD-18866
% COMMON-NAME - 12-ethyl-8-propyl-bacteriochlorophyllide <i>d</i>
% ATOM-CHARGES - (43 -1)
% ATOM-CHARGES - (12 -1)
% CHEMICAL-FORMULA - (C 35)
% CHEMICAL-FORMULA - (H 36)
% CHEMICAL-FORMULA - (N 4)
% CHEMICAL-FORMULA - (O 4)
% CHEMICAL-FORMULA - (MG 1)
% CITATIONS - 26331578
% CITATIONS - 2350541
% CREDITS - SRI
% CREDITS - caspi
% MOLECULAR-WEIGHT - 600.999
% MONOISOTOPIC-MW - 602.274347629
% NON-STANDARD-INCHI - InChI=1S/C35H39N4O4.Mg/c1-7-9-21-16(3)24-14-29-32(19(6)40)18(5)26(37-29)13-25-17(4)22(10-11-31(42)43)34(38-25)23-12-30(41)33-20(8-2)27(39-35(23)33)15-28(21)36-24;/h12-15,17,19,22,40H,7-11H2,1-6H3,(H3,36,37,38,39,41,42,43);/q-1;+2/p-3/t17-,19?,22-;/m0./s1
% SMILES - CCCC5(=C(C)C9(N6([Mg]27(N1(C(C(C)C(CCC(=O)[O-])C=1C4([C-]C(=O)C3(=C(CC)C(N2C3=4)=CC5=6)))=CC8(=C(C)C(C(O)C)=C(N78)C=9))))))

% A line that contains only '//' separates each object.

% Check if the metabolites have been parsed before and saved. If so, load
% the model.
if nargin<1
    ravenPath=findRAVENroot();
    metacycPath=fullfile(ravenPath,'external','metacyc');
else
    metacycPath=char(metacycPath);
end

metsFile=fullfile(metacycPath,'metaCycMets.mat');
metaCycMetFile='compounds.dat';

if exist(metsFile, 'file')
    %Pass the path as an argument so that it is never read as a format string
    fprintf('Importing MetaCyc metabolites from %s... ',strrep(metsFile,'\','/'));
    load(metsFile,'metaCycMets');
    fprintf('done\n');
else
    fprintf('Cannot locate %s\nNow try to generate it from local MetaCyc data files...\n',strrep(metsFile,'\','/'));
    if ~isfile(fullfile(metacycPath,metaCycMetFile))
        %dispEM expects the message text, not the byte count returned by fprintf
        EM='The file of metabolites cannot be located, and should be downloaded from MetaCyc.';
        dispEM(EM);
    else
        %Add new functionality in the order specified in models
        metaCycMets.id='MetaCyc';
        metaCycMets.name='Automatically generated from MetaCyc database';

        %Preallocate memory for 50000 metabolites
        metaCycMets.mets=cell(50000,1);
        metaCycMets.metNames=cell(50000,1);
        metaCycMets.metFormulas=cell(50000,1);
        metaCycMets.inchis=cell(50000,1);
        metaCycMets.metCharges=zeros(50000,1);
        metaCycMets.metMiriams=cell(50000,1);
        metaCycMets.keggid=cell(50000,1);

        %First load information on metabolite ID, name, formula, and others
        fid = fopen(fullfile(metacycPath,metaCycMetFile), 'r');

        %Keeps track of how many metabolites have been added
        metCounter=0;

        %Loop through the file
        while 1
            %Get the next line
            tline = fgetl(fid);
            %disp(tline);

            % Abort at end of file
            if ~ischar(tline)
                break;
            end

            % Get the version of MetaCyc database
            if numel(tline)>11 && strcmp(tline(1:11),'# Version: ')
                metaCycMets.version=tline(12:end);
            end

            %Check if it is a new entry
            if numel(tline)>12 && strcmp(tline(1:12),'UNIQUE-ID - ')
                metCounter=metCounter+1;

                %Add empty strings as initial values
                metaCycMets.metNames{metCounter}='';
                metaCycMets.metFormulas{metCounter}='';
                metaCycMets.inchis{metCounter}='';
                %metaCycMets.smiles{metCounter}='';
                %metaCycMets.pubchem{metCounter}='';
                metaCycMets.keggid{metCounter}='';
                nonStandardInchis = '';

                %Add compound ID
                metaCycMets.mets{metCounter}=tline(13:end);
            end


            %Add name
            if numel(tline)>14 && strcmp(tline(1:14),'COMMON-NAME - ')
                metaCycMets.metNames{metCounter}=tline(15:end);

                %Remove HTML symbols
                metaCycMets.metNames{metCounter}=regexprep(metaCycMets.metNames{metCounter},'<(\w+)>','');
                metaCycMets.metNames{metCounter}=regexprep(metaCycMets.metNames{metCounter},'</(\w+)>','');
                metaCycMets.metNames{metCounter}=regexprep(metaCycMets.metNames{metCounter},'[&;]','');
            end

            %Add charge. Each line looks like '(43 -1)', i.e. atom index
            %followed by the charge of that atom, and the charges are summed
            if numel(tline)>16 && strcmp(tline(1:16),'ATOM-CHARGES - (')
                atomCharge=tline(17:end-1);

                s=strfind(atomCharge,' ');
                if any(s)
                    atomCharge=atomCharge(s(1)+1:end);
                    metaCycMets.metCharges(metCounter,1)=metaCycMets.metCharges(metCounter,1)+str2double(atomCharge);
                end
            end

            %Add inchis
            if numel(tline)>14 && strcmp(tline(1:14),'INCHI - InChI=')
                metaCycMets.inchis{metCounter}=tline(15:end);
            end

            %Add non-standard inchis
            if numel(tline)>27 && strcmp(tline(1:27),'NON-STANDARD-INCHI - InChI=')
                nonStandardInchis=tline(28:end);
            end

            %Add SMILES
            if numel(tline)>9 && strcmp(tline(1:9),'SMILES - ')

                if isstruct(metaCycMets.metMiriams{metCounter})
                    addToIndex=numel(metaCycMets.metMiriams{metCounter}.name)+1;
                else
                    addToIndex=1;
                end
                tempStruct=metaCycMets.metMiriams{metCounter};
                tempStruct.name{addToIndex,1}='SMILES';
                tempStruct.value{addToIndex,1}=tline(10:end);
                metaCycMets.metMiriams{metCounter}=tempStruct;
            end

            %Add formula
            if numel(tline)>20 && strcmp(tline(1:20),'CHEMICAL-FORMULA - (')
                metaCycMets.metFormulas{metCounter}=strcat(metaCycMets.metFormulas{metCounter},tline(21:end-1));
                metaCycMets.metFormulas{metCounter}(isspace(metaCycMets.metFormulas{metCounter})) = [];
            end

            %Add KEGG id
            if numel(tline)>23 && strcmp(tline(1:23),'DBLINKS - (LIGAND-CPD "')
                keggid=tline(24:end);

                %Keep only the text before the closing quote
                s=strfind(keggid,'"');
                if any(s)
                    keggid=keggid(1:s(1)-1);
                end

                metaCycMets.keggid{metCounter}=keggid;
            end

            %Add CHEBI id
            if numel(tline)>18 && strcmp(tline(1:18),'DBLINKS - (CHEBI "')
                %The prefix is 18 characters long, so the id starts at 19
                chebiID=tline(19:end);

                %Keep only the text before the closing quote
                s=strfind(chebiID,'"');
                if any(s)
                    chebiID=chebiID(1:s(1)-1);
                end

                if isstruct(metaCycMets.metMiriams{metCounter})
                    addToIndex=numel(metaCycMets.metMiriams{metCounter}.name)+1;
                else
                    addToIndex=1;
                end
                tempStruct=metaCycMets.metMiriams{metCounter};
                tempStruct.name{addToIndex,1}='chebi';
                tempStruct.value{addToIndex,1}=strcat('CHEBI:',chebiID);
                metaCycMets.metMiriams{metCounter}=tempStruct;
            end

            %Add PubChem
            if numel(tline)>20 && strcmp(tline(1:20),'DBLINKS - (PUBCHEM "')
                pubchemID=tline(21:end);

                %Keep only the text before the closing quote
                s=strfind(pubchemID,'"');
                if any(s)
                    pubchemID=pubchemID(1:s(1)-1);
                end

                if isstruct(metaCycMets.metMiriams{metCounter})
                    addToIndex=numel(metaCycMets.metMiriams{metCounter}.name)+1;
                else
                    addToIndex=1;
                end
                tempStruct=metaCycMets.metMiriams{metCounter};
                tempStruct.name{addToIndex,1}='pubchem.compound';
                tempStruct.value{addToIndex,1}=pubchemID;
                metaCycMets.metMiriams{metCounter}=tempStruct;
            end

            %Add non-standard inchis when standard one is unavailable
            if strcmp(tline,'//') && strcmp(metaCycMets.inchis{metCounter},'')
                metaCycMets.inchis{metCounter}=nonStandardInchis;
                nonStandardInchis = '';

                %Refine formula from inchis. The formula is the layer
                %between the first and the second slash
                s=strfind(metaCycMets.inchis{metCounter},'/');
                if any(s)
                    inchiFormula=metaCycMets.inchis{metCounter}(s(1)+1:s(2)-1);

                    %And remove dot characters
                    inchiFormula(regexp(inchiFormula,'[.]'))=[];
                    if ~strcmp(metaCycMets.metFormulas{metCounter},inchiFormula)
                        metaCycMets.metFormulas{metCounter}=inchiFormula;
                    end
                end

            end

        end

        %Close the file
        fclose(fid);

        %If too much space was allocated, shrink the model
        metaCycMets.mets=metaCycMets.mets(1:metCounter);
        metaCycMets.metNames=metaCycMets.metNames(1:metCounter);
        metaCycMets.metFormulas=metaCycMets.metFormulas(1:metCounter);
        metaCycMets.metMiriams=metaCycMets.metMiriams(1:metCounter);
        metaCycMets.inchis=metaCycMets.inchis(1:metCounter);
        metaCycMets.metCharges=metaCycMets.metCharges(1:metCounter,:);
        %metaCycMets.smiles=metaCycMets.smiles(1:metCounter);
        %metaCycMets.pubchem=metaCycMets.pubchem(1:metCounter);
        metaCycMets.keggid=metaCycMets.keggid(1:metCounter);

        %If the metMiriams structure is empty, use MetaCyc id as metMiriams
        for i=1:numel(metaCycMets.mets)
            if ~isstruct(metaCycMets.metMiriams{i})
                miriamStruct.name{1}='metacyc.compound';
                miriamStruct.value{1}=metaCycMets.mets{i};
                metaCycMets.metMiriams{i}=miriamStruct;
            end
        end

        %Saves the model
        save(metsFile,'metaCycMets');
        fprintf('A new metaCycMets.mat has been successfully generated!\n\n');
    end
end
end
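
The parsing steps in the listing above can be followed on the sample entry from its comment block. What follows is a minimal sketch, not the function itself: it keeps the record in a cell array instead of reading compounds.dat, uses strncmp in place of the length check plus strcmp, strips only HTML tags from the name, and skips the SMILES and DBLINKS handling. The variable names are chosen for illustration.

```matlab
% Lines copied from the sample entry in the source comments.
entry = { ...
    'UNIQUE-ID - CPD-18846'; ...
    'COMMON-NAME - 12-ethyl-8-propyl-bacteriochlorophyllide <i>d</i>'; ...
    'ATOM-CHARGES - (43 -1)'; ...
    'ATOM-CHARGES - (12 -1)'; ...
    'CHEMICAL-FORMULA - (C 35)'; ...
    'CHEMICAL-FORMULA - (H 36)'; ...
    'CHEMICAL-FORMULA - (N 4)'; ...
    'CHEMICAL-FORMULA - (O 4)'; ...
    'CHEMICAL-FORMULA - (MG 1)'; ...
    'NON-STANDARD-INCHI - InChI=1S/C35H39N4O4.Mg/c1-7-9-21-16(3)24-14-29-32(19(6)40)18(5)26(37-29)13-25-17(4)22(10-11-31(42)43)34(38-25)23-12-30(41)33-20(8-2)27(39-35(23)33)15-28(21)36-24;/h12-15,17,19,22,40H,7-11H2,1-6H3,(H3,36,37,38,39,41,42,43);/q-1;+2/p-3/t17-,19?,22-;/m0./s1'; ...
    '//'};

id = ''; name = ''; lineFormula = ''; inchi = ''; inchiFormula = ''; charge = 0;
for k = 1:numel(entry)
    tline = entry{k};
    if strncmp(tline, 'UNIQUE-ID - ', 12)
        id = tline(13:end);
    elseif strncmp(tline, 'COMMON-NAME - ', 14)
        % Drop opening and closing HTML tags such as <i> and </i>.
        name = regexprep(tline(15:end), '</?\w+>', '');
    elseif strncmp(tline, 'ATOM-CHARGES - (', 16)
        % '43 -1' is atom index then charge; sum the second number.
        parts = strsplit(tline(17:end-1), ' ');
        charge = charge + str2double(parts{2});
    elseif strncmp(tline, 'CHEMICAL-FORMULA - (', 20)
        % '(C 35)' contributes 'C35' to the formula.
        lineFormula = [lineFormula strrep(tline(21:end-1), ' ', '')];
    elseif strncmp(tline, 'NON-STANDARD-INCHI - InChI=', 27)
        inchi = tline(28:end);
    elseif strcmp(tline, '//')
        % End of record: the formula layer sits between the first two slashes.
        s = strfind(inchi, '/');
        if numel(s) >= 2
            inchiFormula = strrep(inchi(s(1)+1:s(2)-1), '.', '');
        end
    end
end
fprintf('%s | %s | charge %d\n', id, name, charge);
fprintf('formula from lines: %s, from InChI: %s\n', lineFormula, inchiFormula);
```

For this record the two formulas differ, which is the case where the listing overwrites metFormulas with the InChI-derived one because no standard InChI is present.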
