Home > tutorial > tutorial6.m

tutorial6

PURPOSE ^

tutorial6

SYNOPSIS ^

This is a script file.

DESCRIPTION ^

 tutorial6
   This exercise demonstrates how to reconstruct a combined draft GEM by
   from KEGG and MetaCyc pathway databases. A combined model with the
   comprehensive coverage of metabolic pathways is generated from
   different de novo reconstruction approaches. The input is a FASTA
   format file with whole-proteome sequences. The combined model is
   subsequently used for refinement of existing high-quality model and
   generation of a new version of GEM, by utilizing the manual curation
   results. This tutorial is a showcase of the new features released in
   RAVEN 2.0 through demonstrating the utilization of the newly developed
   functions on GEM reconstruction and curation for Streptomyces
   coelicolor strain A3(2). Users may apply this script as the template in
   their own work for other organisms.
   This refers to Tutorial 6 from "RAVEN tutorials.docx"

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 % tutorial6
0002 %   This exercise demonstrates how to reconstruct a combined draft GEM by
0003 %   from KEGG and MetaCyc pathway databases. A combined model with the
0004 %   comprehensive coverage of metabolic pathways is generated from
0005 %   different de novo reconstruction approaches. The input is a FASTA
0006 %   format file with whole-proteome sequences. The combined model is
0007 %   subsequently used for refinement of existing high-quality model and
0008 %   generation of a new version of GEM, by utilizing the manual curation
0009 %   results. This tutorial is a showcase of the new features released in
0010 %   RAVEN 2.0 through demonstrating the utilization of the newly developed
0011 %   functions on GEM reconstruction and curation for Streptomyces
0012 %   coelicolor strain A3(2). Users may apply this script as the template in
0013 %   their own work for other organisms.
0014 %   This refers to Tutorial 6 from "RAVEN tutorials.docx"
0015 
0016 %Before reconstruction, a FASTA file with protein sequences of the target
0017 %organism needs to be prepared. In this tutorial, all protein sequences of
0018 %S. coelicolor A3(2) were downloaded from the NCBI genome database and
0019 %provided in current folder (Sco_all_protein.faa). The description lines
0020 %(starting with ">" character) of this FASTA file were modified by keeping
0021 %only the locus tag (e.g. SCO6005), which is commonly used as the gene
0022 %names in GEMs.
0023 
0024 %Generate a draft GEM from MetaCyc database. The first parameter is
0025 %organismID and needs to be specified by user. The other parameters are set
0026 %to exclude unbalanced and undetermined reactions, but keep transport
0027 %reactions. The two parameters for homology search are set to the default
0028 %values that have been optimized to capture the protein hits with both
0029 %comprehensive coverage and the least false positives.
0030 ScoMetaCycDraftModel=getMetaCycModelForOrganism('ScoMetaCyc','Sco_all_protein.faa',1);
0031 
0032 %Generate the two draft models from KEGG database. The first one is
0033 %reconstructed from the genome information annotated by KEGG. Since KEGG
0034 %provides genome annotations for over 5000 organisms, their GEMs can thus
0035 %be generated without providing protein sequences while only using the
0036 %organismID. The organisms with KEGG annotation and their associated ids
0037 %(e.g. 'sco' for Streptomyces coelicolor) can be found from here:
0038 %https://www.kegg.jp/kegg/catalog/org_list.html.
0039 %Generate a draft model using KEGG annotation and exclude incomplete
0040 %reactions and reactions with undefined stoichiometry
0041 ScoKEGGAnnotation=getKEGGModelForOrganism('sco','','','',1,0,0);
0042 
0043 %Generate the second KEGG-based draft model based on sequence homology to
0044 %KEGG Ortholog sequence clusters while excluding incomplete reactions and
0045 %the ones with undefined stoichiometry. Type "help getKEGGModelForOrganism"
0046 %to see the detailed instructions for the choice of different parameters.
0047 %The default values for homology search are used because they have been
0048 %optimized for the best performance.
0049 ScoKEGGHomology=getKEGGModelForOrganism('ScoKEGGHMMs','Sco_all_protein.faa','prok90_kegg105','',1,0,0);
0050 
0051 %De novo reconstruction from MetaCyc should take about 10 minutes, while
0052 %both reconstructions from KEGG may take up to 50-60 minutes in overall
0053 %depending on the computer hardware and the system used. Given that KEGG
0054 %and MetaCyc databases are formulated in different ways; KEGG relies on
0055 %sequence-based annotation, while MetaCyc collects only experimentally
0056 %verified pathways. Therefore, integration of MetaCyc- and KEGG-based draft
0057 %models could have a better coverage of the metbolism for the target
0058 %organism.
0059 
0060 %At first, the two KEGG-based models can be directly merged
0061 ScoKEGGDraftModel=mergeModels({ScoKEGGAnnotation ScoKEGGHomology});
0062 
0063 disp(numel(ScoKEGGHomology.rxns)+numel(ScoKEGGAnnotation.rxns));
0064 disp(numel(ScoKEGGDraftModel.rxns));
0065 %By checking the reaction number, it can be seen that reaction number in
0066 %the merged model equals adding up the reaction numbers in homology and
0067 %annotation KEGG draft models. And there are duplicated reactions in this
0068 %merged model.
0069 
0070 %Merge the duplicated reactions into one and combine multiple iso-enzymes
0071 %into a single grRules. The expandModel and contractModel functions are
0072 %utilised, see their instructions for details.
0073 ScoKEGGDraftModel=expandModel(ScoKEGGDraftModel);
0074 ScoKEGGDraftModel=contractModel(ScoKEGGDraftModel);
0075 
0076 %In the end, KEGG- and MetaCyc-based draft models can be further combined
0077 %into an integrated GEM. This step is achieved by the function
0078 %combineMetaCycKEGGModels, which firstly converts metabolite and reaction
0079 %identifiers in the KEGG model into corresponding MetaCyc ids, and then
0080 %detects duplications and keeps only unique reactions and metabolites that
0081 %are mostly in MetaCyc identifiers.
0082 ScoCombinedDraftModel=combineMetaCycKEGGModels(ScoMetaCycDraftModel, ScoKEGGDraftModel);
0083 
0084 %With this combined model, the existing iMK1208 model is refined by
0085 %incorporating the new pathways/reactions that are found in the combined
0086 %model but absent from the previous iMK1208 model. At the time of
0087 %publication, a total of 398 reactions in the combined draft model were
0088 %determined as new pathways based on manual curation results, which have
0089 %been organized into TableS3. Load these manually selected reactions and
0090 %their subSystems as the pre-processed array structure selectedNewRxns.
0091 load('iMK1208+suppInfo.mat','selectedNewRxns');
0092 disp(selectedNewRxns);
0093 
0094 %Generate a sub-model that includes only these new reactions. This is
0095 %implemented by subtracting the other reactions from the combined model
0096 %using function removeReactions
0097 rxnsToRemove=setdiff(ScoCombinedDraftModel.rxns,selectedNewRxns.rxns);
0098 newRxnSubModel=removeReactions(ScoCombinedDraftModel,rxnsToRemove,1,1);
0099 
0100 %Now this newRxnSubModel contains only these new reactions. It needs to be
0101 %modified before merging with the iMK1208. Since these reactions are
0102 %metabolic ones and can all be assigned to the cytoplasm compartment
0103 newRxnSubModel.comps{1}='c';
0104 newRxnSubModel.compNames{1}='Cytoplasm';
0105 
0106 %Amend to the genes and rxnGeneMat fields
0107 newRxnSubModel.genes={};
0108 for k=1:numel(newRxnSubModel.rxns)
0109    newRxnSubModel.genes=[newRxnSubModel.genes;transpose(strsplit(newRxnSubModel.grRules{k},' or '))];
0110 end
0111 newRxnSubModel.genes=unique(newRxnSubModel.genes);
0112 [~, newRxnSubModel.rxnGeneMat, ~]=standardizeGrRules(newRxnSubModel);
0113 
0114 %Since the newRxnSubModel is ready for incorportation, the iMK1208 model
0115 %can be loaded for integration
0116 load('iMK1208+suppInfo.mat','iMK1208');
0117 
0118 %It should be noted that the incompatible nomenclatures (especially the
0119 %metabolite identifiers) used in different GEMs and databases led to a
0120 %serious problem in model comparison, curation and integration. In order to
0121 %properly integrate the iMK1208 with this sub-model, both models should be
0122 %unified into the same name space, since the metabolites in the sub-model
0123 %use MetaCyc identifiers that are different from the ones used in iMK1208
0124 %model. TableS2 contains the results for database mining and intensive
0125 %manual curation while associating these metabolite identifiers. Load
0126 %these manually selected reactions and their subSystems as the
0127 %pre-processed array structure metaCycMetsIniMK.
0128 load('iMK1208+suppInfo.mat','metaCycMetsIniMK');
0129 
0130 %Replace the metabolites in the sub-model with the identifiers and names
0131 %used in iMK1208 according to the mapping information in TableS2
0132 %(metaCycMetsIniMK).
0133 [a, b]=ismember(newRxnSubModel.mets,metaCycMetsIniMK);
0134 I=find(a);
0135 newRxnSubModel.mets=strcat(newRxnSubModel.mets,'_c');
0136 newRxnSubModel.metNames(I)=iMK1208.metNames(b(I));
0137 newRxnSubModel.mets(I)=iMK1208.mets(b(I));
0138 
0139 %The necessary issues for merging iMK1208 with the new reactions determined
0140 %from de novo generated draft GEMs have been resolved. Those can be
0141 %directly merged into the Sco4 model, which refers to the 4th major update
0142 %of the GEM for Streptomyces coelicolor.
0143 Sco4=mergeModels({iMK1208 newRxnSubModel});
0144 
0145 %Check out if the newly generated Sco4 could grow
0146 sol=solveLP(Sco4);
0147 disp(sol.f);
0148 %The results indicate that this new model is functional

Generated by m2html © 2005