tutorial5 This exercise is about creating a model from KEGG, based on protein sequences in a FASTA file, and doing some functionality checks on the model. The example case is for the yeast Saccharomyces cerevisiae. This tutorial is more of a showcase than the previous four, and its main purpose is to serve as a scaffold to reconstruct a GEM for any organism. This refers to Tutorial 5 from "RAVEN tutorials.docx" Start by downloading trained Hidden Markov Models for eukaryotes. This can be done automatically or manually from the RAVEN Wiki in its GitHub repository. In this tutorial, the archive "euk90_kegg105" is picked for the automatic download. See the documentation in the RAVEN Wiki for more information regarding preparation of such archive. This creates a model for S. cerevisiae. The parameters are set to exclude general or unclear reactions and reactions with undefined stoichiometry. Type "help getKEGGModelForOrganism" to see what the different parameters are for. This process takes up to 20-35 minutes in macOS, Unix systems and 40-55 minutes in Windows, depending on your hardware and the size of target organism proteome
0001 % tutorial5 0002 % This exercise is about creating a model from KEGG, based on protein 0003 % sequences in a FASTA file, and doing some functionality checks on the 0004 % model. The example case is for the yeast Saccharomyces cerevisiae. This 0005 % tutorial is more of a showcase than the previous four, and its main 0006 % purpose is to serve as a scaffold to reconstruct a GEM for any 0007 % organism. 0008 % This refers to Tutorial 5 from "RAVEN tutorials.docx" 0009 % 0010 % Start by downloading trained Hidden Markov Models for eukaryotes. This can 0011 % be done automatically or manually from the RAVEN Wiki in its GitHub 0012 % repository. In this tutorial, the archive "euk90_kegg105" is picked for 0013 % the automatic download. See the documentation in the RAVEN Wiki for more 0014 % information regarding preparation of such archive. 0015 % 0016 % This creates a model for S. cerevisiae. The parameters are set to exclude 0017 % general or unclear reactions and reactions with undefined stoichiometry. 0018 % Type "help getKEGGModelForOrganism" to see what the different parameters 0019 % are for. This process takes up to 20-35 minutes in macOS, Unix systems and 0020 % 40-55 minutes in Windows, depending on your hardware and the size of 0021 % target organism proteome 0022 model=getKEGGModelForOrganism('sce','sce.fa','euk90_kegg105','output',false,false,false,false,10^-30,0.8,0.3,-1); 0023 0024 % The resulting model should contain around 1589 reactions, 1600 0025 % metabolites and 836 genes. Small variations are possible since it is an 0026 % heuristic algorithm and different KEGG versions will give slightly 0027 % different results. 0028 disp(model); 0029 0030 % A first control is that the model should not be able to produce any 0031 % metabolites without uptake of some metabolites. This commonly happens when 0032 % metabolites have a different meaning in different reactions. The best way 0033 % to find such reactions is to run makeSomething and analyze the resulting 0034 % solution for bad reactions. An automated approach is to use removeBadRxns, 0035 % which tries to do the same thing in an automated manner. Type 0036 % "help removeBadRxns" for details. 0037 [newModel, removedRxns]=removeBadRxns(model); 0038 0039 % One can see an error about that H+ can be made even if no reactions were 0040 % unbalanced. Protons are particularly problematic since it is rather 0041 % arbitary at which pH the formulas are written for. For the purpose of this 0042 % analysis, the protons can be ignored and fixed later. 0043 [newModel, removedRxns]=removeBadRxns(model,1,{'H+'},true); 0044 0045 % Only one reaction was removed because it enabled the model to produce 0046 % something from nothing. Since it is only one reaction, it might be 0047 % worthwhile to look into this in more detail. 0048 disp(removedRxns); 0049 0050 % According to the information in KEGG about this reaction it is a general 0051 % polymer reaction. One might want to look at the flux distributions 0052 % in more detail to try to find out if there is any better alternative to 0053 % delete. Use makeSomething to do this. 0054 [fluxes, metabolite]=makeSomething(model,{'H+'},true); 0055 model.metNames(metabolite) 0056 0057 % The model could produce H2O using the following reactions 0058 printFluxes(model, fluxes, false, [], [],'%rxnID (%rxnName):\n\t%eqn: %flux\n') 0059 0060 % That resulted in quite a lot of fluxes to look through. Maybe it is easier 0061 % if the elementally balanced reactions are excluded. 0062 balanceStructure=getElementalBalance(model); 0063 0064 % The hydrogen balancing is a bit tricky, so this time it seems more 0065 % relevant to only look at the ones unbalanced for oxygen (since water was 0066 % produced) 0067 goodOnes=balanceStructure.leftComp(:,6)==balanceStructure.rightComp(:,6); 0068 printFluxes(removeReactions(model,goodOnes), fluxes(~goodOnes), false, [], [],'%rxnID (%rxnName):\n\t%eqn: %flux\n'); 0069 0070 % There is still a good number of reactions. Leave only the reactions which 0071 % involve amylose or starch, from one of the problematic reactions that was 0072 % identified earlier. 0073 printFluxes(model, fluxes, false, [], [],'%rxnID (%rxnName):\n\t%eqn: %flux\n',{'Amylose';'Starch'}); 0074 0075 % There are two elementally unbalanced reactions, including, the reaction 0076 % which was identified by removeBadRxns. When looking to these reactions 0077 % closer, one can notice the contradiction between the reactions. The first 0078 % one shows that amylose and starch are interconvertible, while the second 0079 % reaction shows that amylose contains one less glucose unit than starch. 0080 % This type of general reactions are problematic and should be fixed 0081 % manually. It is hereby chosen to trust removeBadRxns and delete R02110. 0082 model=removeReactions(model,'R02110'); 0083 0084 % The model can no longer make something from nothing. Can it consume 0085 % something without any output? 0086 [solution, metabolite]=consumeSomething(model,{'H+'},true); 0087 model.metNames(metabolite) 0088 0089 % Nope, so that was good. Add some uptakes and see what it can produce. 0090 [~, J]=ismember({'D-Glucose';'H2O';'Orthophosphate';'Oxygen';'NH3';'Sulfate'},model.metNames); 0091 [model, addedRxns]=addExchangeRxns(model,'in',J); 0092 0093 % Check which metabolites can be produced given these uptakes. The 0094 % canProduce function allows for output of all metabolites. This will not 0095 % happen in the real cell, but it is very useful for functionality testing 0096 % of the model. Once it is functional, the excretion reactions based on 0097 % evidence can be added as well. 0098 I=canProduce(model); 0099 0100 fprintf('%d%%\n', round(sum(I)/numel(model.mets)*100)); 0101 % It seems that around 31% of the metabolites could be synthesized. It is 0102 % not directly clear whether this is a high or low number, many metabolites 0103 % should not be possible to synthesize from those simple precursors. 0104 0105 % Try to fill gaps using the full KEGG model to see if that gives a 0106 % significantly higher number 0107 keggModel=getModelFromKEGG([],false,false,false,false); 0108 0109 % The KEGG model is associated to more than 6,400,000 genes. They will not 0110 % be used for the gapfilling, so they are removed to make this a little 0111 % faster 0112 keggModel=rmfield(keggModel,'genes'); 0113 keggModel=rmfield(keggModel,'rxnGeneMat'); 0114 0115 % It is already known from the previous commands that there are some 0116 % unbalanced reactions in KEGG. Only use the balanced ones for the gap 0117 % filling 0118 balanceStructure=getElementalBalance(keggModel); 0119 keggModel=removeReactions(keggModel,balanceStructure.balanceStatus~=1,true,true); 0120 0121 % The function fillGaps with these settings will try to include reactions in 0122 % order to have flux through all reactions in the model. There are other 0123 % settings as well. The first flag says that production of all metabolites 0124 % should be allowed. 0125 params.relGap=0.6; %Lower number for a more exhaustive search 0126 params.printReport=true; 0127 [newConnected, cannotConnect, addedRxns, newModel, exitFlag]=fillGaps(model,keggModel,true,false,false,[],params); 0128 0129 % The results show that fillGaps could connect around 29 reactions 0130 % (newConnected) by including around 41 reactions from the KEGG model 0131 % (addedRxns). Those should of course be checked manually to confirm that 0132 % they exist in yeast, but in this tutorial it is assumed that all these 0133 % reactions indeed occur in yeast. 0134 0135 % Continue to improve the connectivity of the model by identifying 0136 % metabolites that should be connected. A convenient way to get an overview 0137 % of how connected the model is, and at the same time getting a lot of 0138 % useful data, is to use gapReport. Note that it can take several to many 0139 % hours to run, depending on the number of gaps in the model. 0140 [noFluxRxns, noFluxRxnsRelaxed, subGraphs, notProducedMets, minToConnect,... 0141 neededForProductionMat]=gapReport(newModel); 0142 0143 % The results show that 532 from 1634 reactions cannot carry flux. Remember 0144 % that an output is allowed for all metabolites, that is why it calculates 0145 % 532 in both cases. 543 from 1617 metabolites cannot be synthesized from 0146 % the precursors which are supplied. There are 6 subnetworks in the model, 0147 % of which 1605 from 1617 metabolites belong to the first one. 0148 0149 % It will also print something similar to: 0150 0151 % To enable net production of all metabolites, a total of 322 metabolites 0152 % must be connected 0153 % Top 10 metabolites to connect: 0154 % 1. Acyl-CoA[s] (connects 162 metabolites) 0155 % 2. Thiamin diphosphate[s] (connects 17 metabolites) 0156 % 3. G00003[s] (connects 14 metabolites) 0157 % 4. Androstenediol[s] (connects 13 metabolites) 0158 % 5. [Dihydrolipoyllysine-residue succinyltransferase] S-glutaryldihydrolipoyllysine[s] (connects 13 metabolites) 0159 % 6. Selenomethionyl-tRNA(Met)[s] (connects 12 metabolites) 0160 % 7. Retinyl ester[s] (connects 11 metabolites) 0161 % 8. 1-Alkyl-2-acylglycerol[s] (connects 10 metabolites) 0162 % 9. G00146[s] (connects 10 metabolites) 0163 % 10. Progesterone[s] (connects 9 metabolites) 0164 0165 % This is a very useful way of directing the gap-filling tasks to where they 0166 % are of the greatest use. In this case it says that in order to have net 0167 % production of all metabolites in the model from the simple inputs which 0168 % have been defined (543 metabolites can currently not have net production) 0169 % one needs to connect 322 unconnected metabolites in. However, by 0170 % connecting only the top 10 in the list 271/543 (50%) of them will now be 0171 % connected. When looking at the list one can see that the top ranking one 0172 % is Acyl-CoA, which represents many different fatty acid chains. The second 0173 % metabolite, thiamine disphosphate, is an enzyme co-factor. It turns out 0174 % that yeast cannot grow on only the substrates that had been defined, but 0175 % that it requires some other precursors for co-factor synthesis as well. 0176 0177 % Add uptake reactions for the minimal media constituents needed for yeast 0178 % to grow. 0179 [~, J]=ismember({'4-Aminobenzoate';'Riboflavin';'Thiamine';'Biotin';'Folate';'Nicotinate';'Zymosterol';'Choline'},newModel.metNames); 0180 [newModel, addedRxns]=addExchangeRxns(newModel,'in',J); 0181 0182 % Rerun gapReport and use the output for targeting the gap-filling efforts. 0183 % Note that only some info is printed; most of it is available in the output 0184 % structures. Work like this in an iterative manner until the model is of 0185 % sufficient quality.