Description of tutorial5

0001 % tutorial5
0002 %   This exercise is about creating a model from KEGG, based on protein
0003 %   sequences in a FASTA file, and doing some functionality checks on the
0004 %   model. The example case is for the yeast Saccharomyces cerevisiae. This
0005 %   tutorial is more of a showcase than the previous four, and its main
0006 %   purpose is to serve as a scaffold to reconstruct a GEM for any
0007 %   organism.
0008 %   This refers to Tutorial 5 from "RAVEN tutorials.docx"
0009 %
0010 % Start by downloading trained Hidden Markov Models for eukaryotes. This can
0011 % be done automatically or manually from the RAVEN Wiki in its GitHub
0012 % repository. In this tutorial, the archive "euk90_kegg105" is picked for
0013 % the automatic download. See the documentation in the RAVEN Wiki for more
0014 % information regarding preparation of such archive.
0015 %
0016 % This creates a model for S. cerevisiae. The parameters are set to exclude
0017 % general or unclear reactions and reactions with undefined stoichiometry.
0018 % Type "help getKEGGModelForOrganism" to see what the different parameters
0019 % are for. This process takes up to 20-35 minutes in macOS, Unix systems and
0020 % 40-55 minutes in Windows, depending on your hardware and the size of
0021 % target organism proteome
0022 model=getKEGGModelForOrganism('sce','sce.fa','euk90_kegg105','output',false,false,false,false,10^-30,0.8,0.3,-1);
0023 
0024 % The resulting model should contain around 1589 reactions, 1600
0025 % metabolites and 836 genes. Small variations are possible since it is an
0026 % heuristic algorithm and different KEGG versions will give slightly
0027 % different results.
0028 disp(model);
0029 
0030 % A first control is that the model should not be able to produce any
0031 % metabolites without uptake of some metabolites. This commonly happens when
0032 % metabolites have a different meaning in different reactions. The best way
0033 % to find such reactions is to run makeSomething and analyze the resulting
0034 % solution for bad reactions. An automated approach is to use removeBadRxns,
0035 % which tries to do the same thing in an automated manner. Type
0036 % "help removeBadRxns" for details.
0037 [newModel, removedRxns]=removeBadRxns(model);
0038 
0039 % One can see an error about that H+ can be made even if no reactions were
0040 % unbalanced. Protons are particularly problematic since it is rather
0041 % arbitary at which pH the formulas are written for. For the purpose of this
0042 % analysis, the protons can be ignored and fixed later.
0043 [newModel, removedRxns]=removeBadRxns(model,1,{'H+'},true);
0044 
0045 % Only one reaction was removed because it enabled the model to produce
0046 % something from nothing. Since it is only one reaction, it might be
0047 % worthwhile to look into this in more detail.
0048 disp(removedRxns);
0049 
0050 % According to the information in KEGG about this reaction it is a general
0051 % polymer reaction. One might want to look at the flux distributions
0052 % in more detail to try to find out if there is any better alternative to
0053 % delete. Use makeSomething to do this.
0054 [fluxes, metabolite]=makeSomething(model,{'H+'},true);
0055 model.metNames(metabolite)
0056 
0057 % The model could produce H2O using the following reactions
0058 printFluxes(model, fluxes, false, [], [],'%rxnID (%rxnName):\n\t%eqn: %flux\n')
0059 
0060 % That resulted in quite a lot of fluxes to look through. Maybe it is easier
0061 % if the elementally balanced reactions are excluded.
0062 balanceStructure=getElementalBalance(model);
0063 
0064 % The hydrogen balancing is a bit tricky, so this time it seems more
0065 % relevant to only look at the ones unbalanced for oxygen (since water was
0066 % produced)
0067 goodOnes=balanceStructure.leftComp(:,6)==balanceStructure.rightComp(:,6);
0068 printFluxes(removeReactions(model,goodOnes), fluxes(~goodOnes), false, [], [],'%rxnID (%rxnName):\n\t%eqn: %flux\n');
0069 
0070 % There is still a good number of reactions. Leave only the reactions which
0071 % involve amylose or starch, from one of the problematic reactions that was
0072 % identified earlier.
0073 printFluxes(model, fluxes, false, [], [],'%rxnID (%rxnName):\n\t%eqn: %flux\n',{'Amylose';'Starch'});
0074 
0075 % There are two elementally unbalanced reactions, including, the reaction
0076 % which was identified by removeBadRxns. When looking to these reactions
0077 % closer, one can notice the contradiction between the reactions. The first
0078 % one shows that amylose and starch are interconvertible, while the second
0079 % reaction shows that amylose contains one less glucose unit than starch.
0080 % This type of general reactions are problematic and should be fixed
0081 % manually. It is hereby chosen to trust removeBadRxns and delete R02110.
0082 model=removeReactions(model,'R02110');
0083 
0084 % The model can no longer make something from nothing. Can it consume
0085 % something without any output?
0086 [solution, metabolite]=consumeSomething(model,{'H+'},true);
0087 model.metNames(metabolite)
0088 
0089 % Nope, so that was good. Add some uptakes and see what it can produce.
0090 [~, J]=ismember({'D-Glucose';'H2O';'Orthophosphate';'Oxygen';'NH3';'Sulfate'},model.metNames);
0091 [model, addedRxns]=addExchangeRxns(model,'in',J);
0092 
0093 % Check which metabolites can be produced given these uptakes. The
0094 % canProduce function allows for output of all metabolites. This will not
0095 % happen in the real cell, but it is very useful for functionality testing
0096 % of the model. Once it is functional, the excretion reactions based on
0097 % evidence can be added as well.
0098 I=canProduce(model);
0099 
0100 fprintf('%d%%\n', round(sum(I)/numel(model.mets)*100));
0101 % It seems that around 31% of the metabolites could be synthesized. It is
0102 % not directly clear whether this is a high or low number, many metabolites
0103 % should not be possible to synthesize from those simple precursors.
0104 
0105 % Try to fill gaps using the full KEGG model to see if that gives a
0106 % significantly higher number
0107 keggModel=getModelFromKEGG([],false,false,false,false);
0108 
0109 % The KEGG model is associated to more than 6,400,000 genes. They will not
0110 % be used for the gapfilling, so they are removed to make this a little
0111 % faster
0112 keggModel=rmfield(keggModel,'genes');
0113 keggModel=rmfield(keggModel,'rxnGeneMat');
0114 
0115 % It is already known from the previous commands that there are some
0116 % unbalanced reactions in KEGG. Only use the balanced ones for the gap
0117 % filling
0118 balanceStructure=getElementalBalance(keggModel);
0119 keggModel=removeReactions(keggModel,balanceStructure.balanceStatus~=1,true,true);
0120 
0121 % The function fillGaps with these settings will try to include reactions in
0122 % order to have flux through all reactions in the model. There are other
0123 % settings as well. The first flag says that production of all metabolites
0124 % should be allowed.
0125 params.relGap=0.6; %Lower number for a more exhaustive search
0126 params.printReport=true;
0127 [newConnected, cannotConnect, addedRxns, newModel, exitFlag]=fillGaps(model,keggModel,true,false,false,[],params);
0128 
0129 % The results show that fillGaps could connect around 29 reactions
0130 % (newConnected) by including around 41 reactions from the KEGG model
0131 % (addedRxns). Those should of course be checked manually to confirm that
0132 % they exist in yeast, but in this tutorial it is assumed that all these
0133 % reactions indeed occur in yeast.
0134 
0135 % Continue to improve the connectivity of the model by identifying
0136 % metabolites that should be connected. A convenient way to get an overview
0137 % of how connected the model is, and at the same time getting a lot of
0138 % useful data, is to use gapReport. Note that it can take several to many
0139 % hours to run, depending on the number of gaps in the model.
0140 [noFluxRxns, noFluxRxnsRelaxed, subGraphs, notProducedMets, minToConnect,...
0141     neededForProductionMat]=gapReport(newModel);
0142 
0143 % The results show that 532 from 1634 reactions cannot carry flux. Remember
0144 % that an output is allowed for all metabolites, that is why it calculates
0145 % 532 in both cases. 543 from 1617 metabolites cannot be synthesized from
0146 % the precursors which are supplied. There are 6 subnetworks in the model,
0147 % of which 1605 from 1617 metabolites belong to the first one.
0148 
0149 % It will also print something similar to:
0150 
0151 % To enable net production of all metabolites, a total of 322 metabolites
0152 % must be connected
0153 % Top 10 metabolites to connect:
0154 %     1. Acyl-CoA[s] (connects 162 metabolites)
0155 %     2. Thiamin diphosphate[s] (connects 17 metabolites)
0156 %     3. G00003[s] (connects 14 metabolites)
0157 %     4. Androstenediol[s] (connects 13 metabolites)
0158 %     5. [Dihydrolipoyllysine-residue succinyltransferase] S-glutaryldihydrolipoyllysine[s] (connects 13 metabolites)
0159 %     6. Selenomethionyl-tRNA(Met)[s] (connects 12 metabolites)
0160 %     7. Retinyl ester[s] (connects 11 metabolites)
0161 %     8. 1-Alkyl-2-acylglycerol[s] (connects 10 metabolites)
0162 %     9. G00146[s] (connects 10 metabolites)
0163 %     10. Progesterone[s] (connects 9 metabolites)
0164 
0165 % This is a very useful way of directing the gap-filling tasks to where they
0166 % are of the greatest use. In this case it says that in order to have net
0167 % production of all metabolites in the model from the simple inputs which
0168 % have been defined (543 metabolites can currently not have net production)
0169 % one needs to connect 322 unconnected metabolites in. However, by
0170 % connecting only the top 10 in the list 271/543 (50%) of them will now be
0171 % connected. When looking at the list one can see that the top ranking one
0172 % is Acyl-CoA, which represents many different fatty acid chains. The second
0173 % metabolite, thiamine disphosphate, is an enzyme co-factor. It turns out
0174 % that yeast cannot grow on only the substrates that had been defined, but
0175 % that it requires some other precursors for co-factor synthesis as well.
0176 
0177 % Add uptake reactions for the minimal media constituents needed for yeast
0178 % to grow.
0179 [~, J]=ismember({'4-Aminobenzoate';'Riboflavin';'Thiamine';'Biotin';'Folate';'Nicotinate';'Zymosterol';'Choline'},newModel.metNames);
0180 [newModel, addedRxns]=addExchangeRxns(newModel,'in',J);
0181 
0182 % Rerun gapReport and use the output for targeting the gap-filling efforts.
0183 % Note that only some info is printed; most of it is available in the output
0184 % structures. Work like this in an iterative manner until the model is of
0185 % sufficient quality.
tutorial5

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SOURCE CODE