Home > tutorial > tutorial5.m

tutorial5

PURPOSE ^

tutorial5

SYNOPSIS ^

This is a script file.

DESCRIPTION ^

 tutorial5
   This exercise is about creating a model from KEGG, based on protein
   sequences in a FASTA file, and doing some functionality checks on the
   model. The example case is for the yeast Saccharomyces cerevisiae. This
   tutorial is more of a showcase than the previous four, and its main
   purpose is to serve as a scaffold to reconstruct a GEM for any
   organism.
   This refers to Tutorial 5 from "RAVEN tutorials.docx"
 
 Start by downloading trained Hidden Markov Models for eukaryotes. This can
 be done automatically or manually from the RAVEN Wiki in its GitHub
 repository. In this tutorial, the archive "euk90_kegg105" is picked for
 the automatic download. See the documentation in the RAVEN Wiki for more
 information regarding preparation of such archive.
 
 This creates a model for S. cerevisiae. The parameters are set to exclude
 general or unclear reactions and reactions with undefined stoichiometry.
 Type "help getKEGGModelForOrganism" to see what the different parameters
 are for. This process takes up to 20-35 minutes in macOS, Unix systems and
 40-55 minutes in Windows, depending on your hardware and the size of
 target organism proteome

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 % tutorial5
0002 %   This exercise is about creating a model from KEGG, based on protein
0003 %   sequences in a FASTA file, and doing some functionality checks on the
0004 %   model. The example case is for the yeast Saccharomyces cerevisiae. This
0005 %   tutorial is more of a showcase than the previous four, and its main
0006 %   purpose is to serve as a scaffold to reconstruct a GEM for any
0007 %   organism.
0008 %   This refers to Tutorial 5 from "RAVEN tutorials.docx"
0009 %
0010 % Start by downloading trained Hidden Markov Models for eukaryotes. This can
0011 % be done automatically or manually from the RAVEN Wiki in its GitHub
0012 % repository. In this tutorial, the archive "euk90_kegg105" is picked for
0013 % the automatic download. See the documentation in the RAVEN Wiki for more
0014 % information regarding preparation of such archive.
0015 %
0016 % This creates a model for S. cerevisiae. The parameters are set to exclude
0017 % general or unclear reactions and reactions with undefined stoichiometry.
0018 % Type "help getKEGGModelForOrganism" to see what the different parameters
0019 % are for. This process takes up to 20-35 minutes in macOS, Unix systems and
0020 % 40-55 minutes in Windows, depending on your hardware and the size of
0021 % target organism proteome
0022 model=getKEGGModelForOrganism('sce','sce.fa','euk90_kegg105','output',false,false,false,false,10^-30,0.8,0.3,-1);
0023 
0024 % The resulting model should contain around 1589 reactions, 1600
0025 % metabolites and 836 genes. Small variations are possible since it is an
0026 % heuristic algorithm and different KEGG versions will give slightly
0027 % different results.
0028 disp(model);
0029 
0030 % A first control is that the model should not be able to produce any
0031 % metabolites without uptake of some metabolites. This commonly happens when
0032 % metabolites have a different meaning in different reactions. The best way
0033 % to find such reactions is to run makeSomething and analyze the resulting
0034 % solution for bad reactions. An automated approach is to use removeBadRxns,
0035 % which tries to do the same thing in an automated manner. Type
0036 % "help removeBadRxns" for details.
0037 [newModel, removedRxns]=removeBadRxns(model);
0038 
0039 % One can see an error about that H+ can be made even if no reactions were
0040 % unbalanced. Protons are particularly problematic since it is rather
0041 % arbitary at which pH the formulas are written for. For the purpose of this
0042 % analysis, the protons can be ignored and fixed later.
0043 [newModel, removedRxns]=removeBadRxns(model,1,{'H+'},true);
0044 
0045 % Only one reaction was removed because it enabled the model to produce
0046 % something from nothing. Since it is only one reaction, it might be
0047 % worthwhile to look into this in more detail.
0048 disp(removedRxns);
0049 
0050 % According to the information in KEGG about this reaction it is a general
0051 % polymer reaction. One might want to look at the flux distributions
0052 % in more detail to try to find out if there is any better alternative to
0053 % delete. Use makeSomething to do this.
0054 [fluxes, metabolite]=makeSomething(model,{'H+'},true);
0055 model.metNames(metabolite)
0056 
0057 % The model could produce H2O using the following reactions
0058 printFluxes(model, fluxes, false, [], [],'%rxnID (%rxnName):\n\t%eqn: %flux\n')
0059 
0060 % That resulted in quite a lot of fluxes to look through. Maybe it is easier
0061 % if the elementally balanced reactions are excluded.
0062 balanceStructure=getElementalBalance(model);
0063 
0064 % The hydrogen balancing is a bit tricky, so this time it seems more
0065 % relevant to only look at the ones unbalanced for oxygen (since water was
0066 % produced)
0067 goodOnes=balanceStructure.leftComp(:,6)==balanceStructure.rightComp(:,6);
0068 printFluxes(removeReactions(model,goodOnes), fluxes(~goodOnes), false, [], [],'%rxnID (%rxnName):\n\t%eqn: %flux\n');
0069 
0070 % There is still a good number of reactions. Leave only the reactions which
0071 % involve amylose or starch, from one of the problematic reactions that was
0072 % identified earlier.
0073 printFluxes(model, fluxes, false, [], [],'%rxnID (%rxnName):\n\t%eqn: %flux\n',{'Amylose';'Starch'});
0074 
0075 % There are two elementally unbalanced reactions, including, the reaction
0076 % which was identified by removeBadRxns. When looking to these reactions
0077 % closer, one can notice the contradiction between the reactions. The first
0078 % one shows that amylose and starch are interconvertible, while the second
0079 % reaction shows that amylose contains one less glucose unit than starch.
0080 % This type of general reactions are problematic and should be fixed
0081 % manually. It is hereby chosen to trust removeBadRxns and delete R02110.
0082 model=removeReactions(model,'R02110');
0083 
0084 % The model can no longer make something from nothing. Can it consume
0085 % something without any output?
0086 [solution, metabolite]=consumeSomething(model,{'H+'},true);
0087 model.metNames(metabolite)
0088 
0089 % Nope, so that was good. Add some uptakes and see what it can produce.
0090 [~, J]=ismember({'D-Glucose';'H2O';'Orthophosphate';'Oxygen';'NH3';'Sulfate'},model.metNames);
0091 [model, addedRxns]=addExchangeRxns(model,'in',J);
0092 
0093 % Check which metabolites can be produced given these uptakes. The
0094 % canProduce function allows for output of all metabolites. This will not
0095 % happen in the real cell, but it is very useful for functionality testing
0096 % of the model. Once it is functional, the excretion reactions based on
0097 % evidence can be added as well.
0098 I=canProduce(model);
0099 
0100 fprintf('%d%%\n', round(sum(I)/numel(model.mets)*100));
0101 % It seems that around 31% of the metabolites could be synthesized. It is
0102 % not directly clear whether this is a high or low number, many metabolites
0103 % should not be possible to synthesize from those simple precursors.
0104 
0105 % Try to fill gaps using the full KEGG model to see if that gives a
0106 % significantly higher number
0107 keggModel=getModelFromKEGG([],false,false,false,false);
0108 
0109 % The KEGG model is associated to more than 6,400,000 genes. They will not
0110 % be used for the gapfilling, so they are removed to make this a little
0111 % faster
0112 keggModel=rmfield(keggModel,'genes');
0113 keggModel=rmfield(keggModel,'rxnGeneMat');
0114 
0115 % It is already known from the previous commands that there are some
0116 % unbalanced reactions in KEGG. Only use the balanced ones for the gap
0117 % filling
0118 balanceStructure=getElementalBalance(keggModel);
0119 keggModel=removeReactions(keggModel,balanceStructure.balanceStatus~=1,true,true);
0120 
0121 % The function fillGaps with these settings will try to include reactions in
0122 % order to have flux through all reactions in the model. There are other
0123 % settings as well. The first flag says that production of all metabolites
0124 % should be allowed.
0125 params.relGap=0.6; %Lower number for a more exhaustive search
0126 params.printReport=true;
0127 [newConnected, cannotConnect, addedRxns, newModel, exitFlag]=fillGaps(model,keggModel,true,false,false,[],params);
0128 
0129 % The results show that fillGaps could connect around 29 reactions
0130 % (newConnected) by including around 41 reactions from the KEGG model
0131 % (addedRxns). Those should of course be checked manually to confirm that
0132 % they exist in yeast, but in this tutorial it is assumed that all these
0133 % reactions indeed occur in yeast.
0134 
0135 % Continue to improve the connectivity of the model by identifying
0136 % metabolites that should be connected. A convenient way to get an overview
0137 % of how connected the model is, and at the same time getting a lot of
0138 % useful data, is to use gapReport. Note that it can take several to many
0139 % hours to run, depending on the number of gaps in the model.
0140 [noFluxRxns, noFluxRxnsRelaxed, subGraphs, notProducedMets, minToConnect,...
0141     neededForProductionMat]=gapReport(newModel);
0142 
0143 % The results show that 532 from 1634 reactions cannot carry flux. Remember
0144 % that an output is allowed for all metabolites, that is why it calculates
0145 % 532 in both cases. 543 from 1617 metabolites cannot be synthesized from
0146 % the precursors which are supplied. There are 6 subnetworks in the model,
0147 % of which 1605 from 1617 metabolites belong to the first one.
0148 
0149 % It will also print something similar to:
0150 
0151 % To enable net production of all metabolites, a total of 322 metabolites
0152 % must be connected
0153 % Top 10 metabolites to connect:
0154 %     1. Acyl-CoA[s] (connects 162 metabolites)
0155 %     2. Thiamin diphosphate[s] (connects 17 metabolites)
0156 %     3. G00003[s] (connects 14 metabolites)
0157 %     4. Androstenediol[s] (connects 13 metabolites)
0158 %     5. [Dihydrolipoyllysine-residue succinyltransferase] S-glutaryldihydrolipoyllysine[s] (connects 13 metabolites)
0159 %     6. Selenomethionyl-tRNA(Met)[s] (connects 12 metabolites)
0160 %     7. Retinyl ester[s] (connects 11 metabolites)
0161 %     8. 1-Alkyl-2-acylglycerol[s] (connects 10 metabolites)
0162 %     9. G00146[s] (connects 10 metabolites)
0163 %     10. Progesterone[s] (connects 9 metabolites)
0164 
0165 % This is a very useful way of directing the gap-filling tasks to where they
0166 % are of the greatest use. In this case it says that in order to have net
0167 % production of all metabolites in the model from the simple inputs which
0168 % have been defined (543 metabolites can currently not have net production)
0169 % one needs to connect 322 unconnected metabolites in. However, by
0170 % connecting only the top 10 in the list 271/543 (50%) of them will now be
0171 % connected. When looking at the list one can see that the top ranking one
0172 % is Acyl-CoA, which represents many different fatty acid chains. The second
0173 % metabolite, thiamine disphosphate, is an enzyme co-factor. It turns out
0174 % that yeast cannot grow on only the substrates that had been defined, but
0175 % that it requires some other precursors for co-factor synthesis as well.
0176 
0177 % Add uptake reactions for the minimal media constituents needed for yeast
0178 % to grow.
0179 [~, J]=ismember({'4-Aminobenzoate';'Riboflavin';'Thiamine';'Biotin';'Folate';'Nicotinate';'Zymosterol';'Choline'},newModel.metNames);
0180 [newModel, addedRxns]=addExchangeRxns(newModel,'in',J);
0181 
0182 % Rerun gapReport and use the output for targeting the gap-filling efforts.
0183 % Note that only some info is printed; most of it is available in the output
0184 % structures. Work like this in an iterative manner until the model is of
0185 % sufficient quality.

Generated by m2html © 2005