
getEnzymesFromMetaCyc

PURPOSE

getEnzymesFromMetaCyc

SYNOPSIS

function metaCycEnzymes=getEnzymesFromMetaCyc(metacycPath)

DESCRIPTION

 getEnzymesFromMetaCyc
    Retrieves all enzymes and reaction-enzyme association information in MetaCyc

   Input:
    metacycPath  path to the directory that holds metaCycEnzymes.mat or, if
                that file is missing, the MetaCyc data files proteins.dat
                and enzrxns.dat from a local dump of the MetaCyc database,
                from which the file is rebuilt (optional, default
                RAVEN\external\metacyc)

   Output:
    metaCycEnzymes  a model structure generated from the database. The
                following fields are filled
                id:             'MetaCyc'
                name:    'Automatically generated from MetaCyc database'
                rxns:           Reaction id
                rxnNames:       Reaction name
                enzymes:        Enzyme id
                enzrxns:        Enzymatic-reaction id
                cplxs:          Enzyme complexes
                cplxComp:       Subunit components of enzyme complexes
                rxnEnzymeMat:   A binary matrix (reactions as rows, enzymes as
                                columns) that indicates the association of
                                each enzyme with the reactions it catalyzes
                version:        MetaCyc database version

    If the file metaCycEnzymes.mat is found in metacycPath (by default the
    RAVEN\external\metacyc directory), it will be loaded. Otherwise, it will
    be generated by parsing the MetaCyc database files and saved there. The
    metaCycEnzymes.mat file should be removed and rebuilt when a newer
    version of MetaCyc is released.

    Usage: model=getEnzymesFromMetaCyc(metacycPath)
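
The returned structure links reactions to enzymes through rxnEnzymeMat. As a minimal sketch of how that matrix can be queried, the snippet below builds a toy stand-in with the same field names and orientation (reactions as rows, enzymes as columns). The identifiers RXN-A, RXN-B, MONOMER-1, MONOMER-2 and CPLX-1 are invented for illustration and are not taken from MetaCyc.

```matlab
% Toy stand-in for the structure returned by getEnzymesFromMetaCyc.
% All identifiers are hypothetical and only illustrate the layout.
toy.rxns    = {'RXN-A'; 'RXN-B'};
toy.enzymes = {'MONOMER-1'; 'MONOMER-2'; 'CPLX-1'};
% Rows are reactions, columns are enzymes (same orientation as rxnEnzymeMat)
toy.rxnEnzymeMat = sparse([1 1 2], [1 3 2], 1, 2, 3);

% Enzymes associated with reaction RXN-A
enzymesForRxn = {};
[found, rxnIdx] = ismember('RXN-A', toy.rxns);
if found
    enzymesForRxn = toy.enzymes(find(toy.rxnEnzymeMat(rxnIdx, :)));
end

% Reactions associated with enzyme MONOMER-2
rxnsForEnzyme = {};
[found, enzIdx] = ismember('MONOMER-2', toy.enzymes);
if found
    rxnsForEnzyme = toy.rxns(find(toy.rxnEnzymeMat(:, enzIdx)));
end
```

With the real output, the same indexing should apply after replacing toy with the structure returned by the function.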

NOTE: Example entries from proteins.dat and enzrxns.dat are shown in the
comments at the top of the source code below.
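
The MetaCyc flat files read by this function store one attribute per line in the form "ATTRIBUTE - value", with a line containing only '//' closing each record. The function itself matches fixed prefixes such as 'UNIQUE-ID - '. The sketch below is a different, more generic way of reading such records into structs and is shown only to illustrate the file layout. The variable names (lines, records, current) are assumptions, and the field-name conversion only handles the '-' character.

```matlab
% Lines as they might appear in proteins.dat (abridged from the example
% entry quoted in the source comments)
lines = {'UNIQUE-ID - MONOMER-8570'; ...
         'TYPES - Polypeptides'; ...
         'COMPONENT-OF - CPLX-5143'; ...
         '//'};

records = {};        % one struct per record
current = struct();  % attributes of the record currently being read
for i = 1:numel(lines)
    tline = lines{i};
    if strcmp(tline, '//')
        % End of record: store it and start a new one
        records{end+1} = current; %#ok<SAGROW>
        current = struct();
        continue;
    end
    % Split "ATTRIBUTE - value" at the first occurrence of ' - '
    sep = strfind(tline, ' - ');
    if isempty(sep)
        continue; % not an attribute line (e.g. a comment)
    end
    key   = tline(1:sep(1)-1);
    value = tline(sep(1)+3:end);
    % '-' is not allowed in field names, so replace it with '_'
    field = strrep(key, '-', '_');
    % Attributes may repeat (e.g. REGULATED-BY), so values are kept in a cell
    if isfield(current, field)
        current.(field){end+1} = value;
    else
        current.(field) = {value};
    end
end
```

Keeping every value in a cell array means repeated attributes are preserved rather than overwritten.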

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SOURCE CODE

function metaCycEnzymes=getEnzymesFromMetaCyc(metacycPath)
% getEnzymesFromMetaCyc
%    Retrieves all enzymes and reaction-enzyme association information in MetaCyc
%
%   Input:
%    metacycPath  path to the directory that holds metaCycEnzymes.mat or, if
%                that file is missing, the MetaCyc data files proteins.dat
%                and enzrxns.dat from a local dump of the MetaCyc database,
%                from which the file is rebuilt (optional, default
%                RAVEN\external\metacyc)
%
%   Output:
%    metaCycEnzymes  a model structure generated from the database. The
%                following fields are filled
%                id:             'MetaCyc'
%                name:    'Automatically generated from MetaCyc database'
%                rxns:           Reaction id
%                rxnNames:       Reaction name
%                enzymes:        Enzyme id
%                enzrxns:        Enzymatic-reaction id
%                cplxs:          Enzyme complexes
%                cplxComp:       Subunit components of enzyme complexes
%                rxnEnzymeMat:   A binary matrix (reactions as rows, enzymes as
%                                columns) that indicates the association of
%                                each enzyme with the reactions it catalyzes
%                version:        MetaCyc database version
%
%    If the file metaCycEnzymes.mat is found in metacycPath (by default the
%    RAVEN\external\metacyc directory), it will be loaded. Otherwise, it will
%    be generated by parsing the MetaCyc database files and saved there. The
%    metaCycEnzymes.mat file should be removed and rebuilt when a newer
%    version of MetaCyc is released.
%
%    Usage: model=getEnzymesFromMetaCyc(metacycPath)
%
%NOTE: This is how one entry looks in the files

%proteins.dat
% UNIQUE-ID - MONOMER-8570
% TYPES - Polypeptides
% COMMON-NAME - D-carnitine dehydrogenase subunit
% COMPONENT-OF - CPLX-5143
% MOLECULAR-WEIGHT-EXP - 23
% MOLECULAR-WEIGHT-KD - 23
% SPECIES - ORG-6256
% //

%enzrxns.dat
% UNIQUE-ID - ENZRXN-14974
% TYPES - Enzymatic-Reactions
% COMMON-NAME - (1S,4R)-iso-dihydrocarvone monooxygenase
% BASIS-FOR-ASSIGNMENT - :MANUAL
% CITATIONS - 10769172:EV-EXP-IDA-PURIFIED-PROTEIN:3419689584:caspi
% COFACTORS - FAD
% ENZYME - MONOMER-13964
% PHYSIOLOGICALLY-RELEVANT? - T
% REACTION - RXN-9435
% REGULATED-BY - REG-12241
% REGULATED-BY - REG-12240
% TEMPERATURE-OPT - 35

% A line that contains only '//' separates each object.

if nargin<1
    ravenPath=findRAVENroot();
    metacycPath=fullfile(ravenPath,'external','metacyc');
else
    metacycPath=char(metacycPath);
end

%Check if the enzymatic proteins have been parsed before and saved. If so,
%load the model.
enzymesFile=fullfile(metacycPath,'metaCycEnzymes.mat');
metaCycProteinFile='proteins.dat';
metaCycEnzrxnsFile='enzrxns.dat';

try
    fprintf(['Importing MetaCyc enzymes and reaction-enzyme association from ' strrep(enzymesFile,'\','/') '... ']);
    load(enzymesFile);
    fprintf('done\n');
catch
    fprintf(['Cannot locate ' strrep(enzymesFile,'\','/') '\nNow try to generate it from local MetaCyc data files...\n']);
    if ~isfile(fullfile(metacycPath,metaCycProteinFile)) || ~isfile(fullfile(metacycPath,metaCycEnzrxnsFile))
        %Pass the message text (not the return value of fprintf) to dispEM
        EM='The files of enzymes or proteins cannot be located, and should be downloaded from MetaCyc.';
        dispEM(EM);
    else
        metaCycEnzymes.id='MetaCyc';
        metaCycEnzymes.name='Automatically generated from MetaCyc database';

        %Reserve space for 10000 enzyme complexes and 50000 enzymes
        metaCycEnzymes.cplxs=cell(10000,1);
        metaCycEnzymes.cplxComp=cell(10000,1);
        metaCycEnzymes.enzymes=cell(50000,1);

        %Get the information of protein complexes and their components
        nCplx=0;
        enzymeCounter=0;

        fid = fopen(fullfile(metacycPath,metaCycProteinFile), 'r');
        %Loop through the file
        while 1
            tline = fgetl(fid);

            %Abort at end of file
            if ~ischar(tline)
                break;
            end

            %Add enzyme id
            if numel(tline)>12 && strcmp(tline(1:12),'UNIQUE-ID - ')
                enzymeCounter=enzymeCounter+1;
                addMe=false;
                enzymeID=tline(13:end);
                metaCycEnzymes.enzymes{enzymeCounter}=enzymeID;
            end

            %Check if the entry is an enzyme complex
            if strcmp(tline,'TYPES - Protein-Complexes')
                nCplx=nCplx+1;
                nComp=0;

                %Reserve 100 subunits for each enzyme complex
                Comp.subunit=cell(100,1);
                metaCycEnzymes.cplxs{nCplx}=enzymeID;

                addMe=true;
            end

            %Add the components of the current complex
            if numel(tline)>13 && strcmp(tline(1:13),'COMPONENTS - ')
                if addMe
                    nComp=nComp+1;
                    Comp.subunit{nComp}=tline(14:end);
                end
            end

            %End of entry: store the collected components
            if strcmp(tline,'//')
                if addMe
                    Comp.subunit=Comp.subunit(1:nComp);
                    metaCycEnzymes.cplxComp{nCplx}=Comp;
                    addMe=false;
                end
            end
        end
        %Close the file
        fclose(fid);

        %If too much space was allocated, shrink the model
        metaCycEnzymes.cplxs=metaCycEnzymes.cplxs(1:nCplx);
        metaCycEnzymes.cplxComp=metaCycEnzymes.cplxComp(1:nCplx);
        metaCycEnzymes.enzymes=metaCycEnzymes.enzymes(1:enzymeCounter);

        %Iteratively go through the components of each complex
        for i=1:numel(metaCycEnzymes.cplxComp)

            %Replace all complex-type components with their subunits
            checkCplx=true;
            while checkCplx
                x=0;
                mat=[]; %Rows of [component index j, complex index b]
                for j=1:numel(metaCycEnzymes.cplxComp{i}.subunit)
                    [a, b]=ismember(metaCycEnzymes.cplxComp{i}.subunit{j},metaCycEnzymes.cplxs);
                    if a
                        x=x+1;
                        mat(x,:)=[j b];
                    end
                end

                if isempty(mat)
                    checkCplx=false; %No complexes found among components
                else
                    %Collect the subunits of all nested complexes first and
                    %then remove the nested complexes in one step, so that
                    %the recorded component indices remain valid
                    nested=cell(0,1);
                    for k=1:x
                        nested=[nested; metaCycEnzymes.cplxComp{mat(k,2)}.subunit];
                    end
                    metaCycEnzymes.cplxComp{i}.subunit(mat(:,1))=[];
                    metaCycEnzymes.cplxComp{i}.subunit=[metaCycEnzymes.cplxComp{i}.subunit; nested];
                end

            end

            %Make sure the subunits are all included in the enzyme list,
            %since in one case a subunit was not found in the enzyme dump
            %file
            [a, b] = ismember(metaCycEnzymes.cplxComp{i}.subunit,metaCycEnzymes.enzymes);
            if ~all(a)
                metaCycEnzymes.cplxComp{i}.subunit = metaCycEnzymes.enzymes(b(a));
            end
        end

        %Preallocate space for 50000 enzymatic reactions
        metaCycEnzymes.enzrxns=cell(50000,1);
        metaCycEnzymes.rxns=cell(50000,1);
        metaCycEnzymes.rxnNames=cell(50000,1);
        metaCycEnzymes.commoname=cell(50000,1);
        metaCycEnzymes.rxnEnzymeMat=sparse(50000,enzymeCounter); %Row: reaction, column: enzyme

        %Load enzyme and reaction association information
        fid = fopen(fullfile(metacycPath,metaCycEnzrxnsFile), 'r');

        %Keep track of how many enzymatic reactions and reactions have
        %been added
        enzrxnCounter=0;
        nRxn=0;

        %Loop through the file, which contains the mapping between
        %reactions and enzymes
        while 1
            tline = fgetl(fid);

            %Abort at end of file
            if ~ischar(tline)
                break;
            end

            %Get the version of MetaCyc database
            if numel(tline)>11 && strcmp(tline(1:11),'# Version: ')
                metaCycEnzymes.version=tline(12:end);
            end

            %Check if it is a new enzymatic reaction
            if numel(tline)>12 && strcmp(tline(1:12),'UNIQUE-ID - ')
                enzrxnCounter=enzrxnCounter+1;
                metaCycEnzymes.enzrxns{enzrxnCounter}=tline(13:end);
                metaCycEnzymes.commoname{enzrxnCounter}='';
            end

            %Add common name of enzymatic reactions
            if numel(tline)>14 && strcmp(tline(1:14),'COMMON-NAME - ')
                metaCycEnzymes.commoname{enzrxnCounter}=tline(15:end);

                %Remove HTML symbols
                metaCycEnzymes.commoname{enzrxnCounter}=regexprep(metaCycEnzymes.commoname{enzrxnCounter},'<(\w+)>','');
                metaCycEnzymes.commoname{enzrxnCounter}=regexprep(metaCycEnzymes.commoname{enzrxnCounter},'</(\w+)>','');
                metaCycEnzymes.commoname{enzrxnCounter}=regexprep(metaCycEnzymes.commoname{enzrxnCounter},'[&;]','');
            end

            %A separate local cell array rxns is used for the ismember
            %lookup of already added reactions, instead of the
            %preallocated field metaCycEnzymes.rxns

            %Look up the enzyme
            if numel(tline)>9 && strcmp(tline(1:9),'ENZYME - ')

                %Save the index in enzymes field to nEnzyme
                [x, nEnzyme]=ismember(tline(10:end),metaCycEnzymes.enzymes);
                if ~x
                    %disp(tline(10:end));
                end
            end

            %Add reaction id, and rxnNames by concatenating unique common
            %names
            if numel(tline)>11 && strcmp(tline(1:11),'REACTION - ')
                nRxn=nRxn+1;
                rxns{nRxn}='';
                rxnID=tline(12:end);
                [c, d]=ismember(rxnID,rxns);
                if c
                    nRxn=nRxn-1;

                    %Check if this common name has been included by
                    %rxnNames
                    k=strfind(metaCycEnzymes.rxnNames{d},metaCycEnzymes.commoname{enzrxnCounter});
                    if isempty(k)
                        metaCycEnzymes.rxnNames{d}=strcat(metaCycEnzymes.rxnNames{d},';',metaCycEnzymes.commoname{enzrxnCounter});
                    end
                    metaCycEnzymes.rxnEnzymeMat(d,nEnzyme)=1;
                else
                    metaCycEnzymes.rxns{nRxn}=rxnID;
                    metaCycEnzymes.rxnNames{nRxn}=metaCycEnzymes.commoname{enzrxnCounter};
                    rxns{nRxn}=rxnID;
                    metaCycEnzymes.rxnEnzymeMat(nRxn,nEnzyme)=1;
                end
            end

        end

        %Close the file
        fclose(fid);

        %Shrink the sizes
        metaCycEnzymes.enzrxns=metaCycEnzymes.enzrxns(1:enzrxnCounter);
        metaCycEnzymes.commoname=metaCycEnzymes.commoname(1:enzrxnCounter);
        metaCycEnzymes.rxns=metaCycEnzymes.rxns(1:nRxn);
        metaCycEnzymes.rxnNames=metaCycEnzymes.rxnNames(1:nRxn);
        metaCycEnzymes.rxnEnzymeMat=metaCycEnzymes.rxnEnzymeMat(1:nRxn,:);

        %Save the model structure
        save(enzymesFile,'metaCycEnzymes');
        fprintf('New metaCycEnzymes.mat has been successfully updated!\n\n');
    end

end
end
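
The listing replaces complex-type components with their subunits until only non-complex components remain. The following is a minimal standalone sketch of that flattening step on toy data, not the function's own code. The identifiers (CPLX-1, CPLX-2, MONOMER-A, MONOMER-B, MONOMER-C) are hypothetical, and cplxComp is simplified here to hold plain cell arrays rather than structs with a subunit field. Like the listing, it assumes that complexes do not contain each other cyclically.

```matlab
% Hypothetical complexes: CPLX-2 contains CPLX-1 as one of its components
cplxs    = {'CPLX-1'; 'CPLX-2'};
cplxComp = {{'MONOMER-A'; 'MONOMER-B'}; {'CPLX-1'; 'MONOMER-C'}};

for i = 1:numel(cplxComp)
    subunits = cplxComp{i};
    [isCplx, cplxIdx] = ismember(subunits, cplxs);
    while any(isCplx)
        % Keep the plain subunits, then append the components of each
        % nested complex; building a new list avoids shifting indices
        expanded = subunits(~isCplx);
        for k = find(isCplx)'
            expanded = [expanded; cplxComp{cplxIdx(k)}]; %#ok<AGROW>
        end
        subunits = expanded;
        % Repeat in case the appended components are complexes themselves
        [isCplx, cplxIdx] = ismember(subunits, cplxs);
    end
    cplxComp{i} = subunits;
end
```

After the loop, the second complex lists MONOMER-C followed by the two subunits of CPLX-1, while the first complex is unchanged.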
