Description of tutorial6

0001 % tutorial6
0002 %   This exercise demonstrates how to reconstruct a combined draft GEM by
0003 %   from KEGG and MetaCyc pathway databases. A combined model with the
0004 %   comprehensive coverage of metabolic pathways is generated from
0005 %   different de novo reconstruction approaches. The input is a FASTA
0006 %   format file with whole-proteome sequences. The combined model is
0007 %   subsequently used for refinement of existing high-quality model and
0008 %   generation of a new version of GEM, by utilizing the manual curation
0009 %   results. This tutorial is a showcase of the new features released in
0010 %   RAVEN 2.0 through demonstrating the utilization of the newly developed
0011 %   functions on GEM reconstruction and curation for Streptomyces
0012 %   coelicolor strain A3(2). Users may apply this script as the template in
0013 %   their own work for other organisms.
0014 %   This refers to Tutorial 6 from "RAVEN tutorials.docx"
0015 
0016 %Before reconstruction, a FASTA file with protein sequences of the target
0017 %organism needs to be prepared. In this tutorial, all protein sequences of
0018 %S. coelicolor A3(2) were downloaded from the NCBI genome database and
0019 %provided in current folder (Sco_all_protein.faa). The description lines
0020 %(starting with ">" character) of this FASTA file were modified by keeping
0021 %only the locus tag (e.g. SCO6005), which is commonly used as the gene
0022 %names in GEMs.
0023 
0024 %Generate a draft GEM from MetaCyc database. The first parameter is
0025 %organismID and needs to be specified by user. The other parameters are set
0026 %to exclude unbalanced and undetermined reactions, but keep transport
0027 %reactions. The two parameters for homology search are set to the default
0028 %values that have been optimized to capture the protein hits with both
0029 %comprehensive coverage and the least false positives.
0030 ScoMetaCycDraftModel=getMetaCycModelForOrganism('ScoMetaCyc','Sco_all_protein.faa',1);
0031 
0032 %Generate the two draft models from KEGG database. The first one is
0033 %reconstructed from the genome information annotated by KEGG. Since KEGG
0034 %provides genome annotations for over 5000 organisms, their GEMs can thus
0035 %be generated without providing protein sequences while only using the
0036 %organismID. The organisms with KEGG annotation and their associated ids
0037 %(e.g. 'sco' for Streptomyces coelicolor) can be found from here:
0038 %https://www.kegg.jp/kegg/catalog/org_list.html.
0039 %Generate a draft model using KEGG annotation and exclude incomplete
0040 %reactions and reactions with undefined stoichiometry
0041 ScoKEGGAnnotation=getKEGGModelForOrganism('sco','','','',1,0,0);
0042 
0043 %Generate the second KEGG-based draft model based on sequence homology to
0044 %KEGG Ortholog sequence clusters while excluding incomplete reactions and
0045 %the ones with undefined stoichiometry. Type "help getKEGGModelForOrganism"
0046 %to see the detailed instructions for the choice of different parameters.
0047 %The default values for homology search are used because they have been
0048 %optimized for the best performance.
0049 ScoKEGGHomology=getKEGGModelForOrganism('ScoKEGGHMMs','Sco_all_protein.faa','prok90_kegg105','',1,0,0);
0050 
0051 %De novo reconstruction from MetaCyc should take about 10 minutes, while
0052 %both reconstructions from KEGG may take up to 50-60 minutes in overall
0053 %depending on the computer hardware and the system used. Given that KEGG
0054 %and MetaCyc databases are formulated in different ways; KEGG relies on
0055 %sequence-based annotation, while MetaCyc collects only experimentally
0056 %verified pathways. Therefore, integration of MetaCyc- and KEGG-based draft
0057 %models could have a better coverage of the metbolism for the target
0058 %organism.
0059 
0060 %At first, the two KEGG-based models can be directly merged
0061 ScoKEGGDraftModel=mergeModels({ScoKEGGAnnotation ScoKEGGHomology});
0062 
0063 disp(numel(ScoKEGGHomology.rxns)+numel(ScoKEGGAnnotation.rxns));
0064 disp(numel(ScoKEGGDraftModel.rxns));
0065 %By checking the reaction number, it can be seen that reaction number in
0066 %the merged model equals adding up the reaction numbers in homology and
0067 %annotation KEGG draft models. And there are duplicated reactions in this
0068 %merged model.
0069 
0070 %Merge the duplicated reactions into one and combine multiple iso-enzymes
0071 %into a single grRules. The expandModel and contractModel functions are
0072 %utilised, see their instructions for details.
0073 ScoKEGGDraftModel=expandModel(ScoKEGGDraftModel);
0074 ScoKEGGDraftModel=contractModel(ScoKEGGDraftModel);
0075 
0076 %In the end, KEGG- and MetaCyc-based draft models can be further combined
0077 %into an integrated GEM. This step is achieved by the function
0078 %combineMetaCycKEGGModels, which firstly converts metabolite and reaction
0079 %identifiers in the KEGG model into corresponding MetaCyc ids, and then
0080 %detects duplications and keeps only unique reactions and metabolites that
0081 %are mostly in MetaCyc identifiers.
0082 ScoCombinedDraftModel=combineMetaCycKEGGModels(ScoMetaCycDraftModel, ScoKEGGDraftModel);
0083 
0084 %With this combined model, the existing iMK1208 model is refined by
0085 %incorporating the new pathways/reactions that are found in the combined
0086 %model but absent from the previous iMK1208 model. At the time of
0087 %publication, a total of 398 reactions in the combined draft model were
0088 %determined as new pathways based on manual curation results, which have
0089 %been organized into TableS3. Load these manually selected reactions and
0090 %their subSystems as the pre-processed array structure selectedNewRxns.
0091 load('iMK1208+suppInfo.mat','selectedNewRxns');
0092 disp(selectedNewRxns);
0093 
0094 %Generate a sub-model that includes only these new reactions. This is
0095 %implemented by subtracting the other reactions from the combined model
0096 %using function removeReactions
0097 rxnsToRemove=setdiff(ScoCombinedDraftModel.rxns,selectedNewRxns.rxns);
0098 newRxnSubModel=removeReactions(ScoCombinedDraftModel,rxnsToRemove,1,1);
0099 
0100 %Now this newRxnSubModel contains only these new reactions. It needs to be
0101 %modified before merging with the iMK1208. Since these reactions are
0102 %metabolic ones and can all be assigned to the cytoplasm compartment
0103 newRxnSubModel.comps{1}='c';
0104 newRxnSubModel.compNames{1}='Cytoplasm';
0105 
0106 %Amend to the genes and rxnGeneMat fields
0107 newRxnSubModel.genes={};
0108 for k=1:numel(newRxnSubModel.rxns)
0109    newRxnSubModel.genes=[newRxnSubModel.genes;transpose(strsplit(newRxnSubModel.grRules{k},' or '))];
0110 end
0111 newRxnSubModel.genes=unique(newRxnSubModel.genes);
0112 [~, newRxnSubModel.rxnGeneMat, ~]=standardizeGrRules(newRxnSubModel);
0113 
0114 %Since the newRxnSubModel is ready for incorportation, the iMK1208 model
0115 %can be loaded for integration
0116 load('iMK1208+suppInfo.mat','iMK1208');
0117 
0118 %It should be noted that the incompatible nomenclatures (especially the
0119 %metabolite identifiers) used in different GEMs and databases led to a
0120 %serious problem in model comparison, curation and integration. In order to
0121 %properly integrate the iMK1208 with this sub-model, both models should be
0122 %unified into the same name space, since the metabolites in the sub-model
0123 %use MetaCyc identifiers that are different from the ones used in iMK1208
0124 %model. TableS2 contains the results for database mining and intensive
0125 %manual curation while associating these metabolite identifiers. Load
0126 %these manually selected reactions and their subSystems as the
0127 %pre-processed array structure metaCycMetsIniMK.
0128 load('iMK1208+suppInfo.mat','metaCycMetsIniMK');
0129 
0130 %Replace the metabolites in the sub-model with the identifiers and names
0131 %used in iMK1208 according to the mapping information in TableS2
0132 %(metaCycMetsIniMK).
0133 [a, b]=ismember(newRxnSubModel.mets,metaCycMetsIniMK);
0134 I=find(a);
0135 newRxnSubModel.mets=strcat(newRxnSubModel.mets,'_c');
0136 newRxnSubModel.metNames(I)=iMK1208.metNames(b(I));
0137 newRxnSubModel.mets(I)=iMK1208.mets(b(I));
0138 
0139 %The necessary issues for merging iMK1208 with the new reactions determined
0140 %from de novo generated draft GEMs have been resolved. Those can be
0141 %directly merged into the Sco4 model, which refers to the 4th major update
0142 %of the GEM for Streptomyces coelicolor.
0143 Sco4=mergeModels({iMK1208 newRxnSubModel});
0144 
0145 %Check out if the newly generated Sco4 could grow
0146 sol=solveLP(Sco4);
0147 disp(sol.f);
0148 %The results indicate that this new model is functional
tutorial6

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SOURCE CODE