Home > src > geckomat > gather_kcats > getStandardKcat.m

getStandardKcat

PURPOSE ^

getStandardKcat

SYNOPSIS ^

function [model, rxnsMissingGPR, standardMW, standardKcat, rxnsNoKcat] = getStandardKcat(model, modelAdapter, threshold, fillZeroKcat)

DESCRIPTION ^

 getStandardKcat
   Calculate an standard kcat and standard molecular weight (MW) that can
   be used to apply enzyme constraints to reactions without any associated
   enzymes. Such reactions have either an empty model.grRules field, or
   they have no match in model.ec.rxns (which can be the case if the genes
   in the model.grRules field could not be mapped to enzymes). This is
   done by adding those reactions to model.ec, assign a "standard"
   pseudoenzyme with the standard MW (median of all proteins in the
   organism) and standard kcat (median from all kcat, or subsystem
   specific kcat).

   A reaction is assigned a subSystem specific kcat values if the model
   has a subSystems field and the reaction is annotated with a subSystem.
   Only the first subSystem will be considered if multiple are annotated
   to the same reaction.

   Exchange, transport and pseudoreactions are filtered out, plus any
   reaction identifiers specified in /data/pseudoRxns.tsv in the model
   adapter folder.

   In addition, reactions that are annotated with an enzyme (and therefore
   already in model.ec), but not assigned any reaction-specific kcat value
   (their model.ec.kcat entry is either 0 or NaN), can be assigned
   standard kcat values by a similar approach. However, those reactions
   will not be linked to the "standard" pseudoenzyme, but will use the
   enzyme that they had already been associated with.

   Any pre-existing standard kcat assignments (identified by 'standard'
   entires in model.ec.source) are removed when applying this function.

 Input:
   model           an ecModel in GECKO 3 format (with ecModel.ec structure)
   modelAdapter    a loaded model adapter (Optional, will otherwise use
                   the default model adapter).
   threshold       a threshold to determine when use a kcat value based on
                   the mean kcat of the reactions in the same subSystem or
                   based on the median value of all the kcat in the model.
                   Second option is used when the number of reactions in a
                   determined subSystem is < threshold. (Optional, default
                   = 10)
   fillZeroKcat    logical whether zero kcat values should be replaced
                   with standard kcat values. (Optional, default = true).

 Output:
   model           ecModel where model.ec is expanded with a standard
                   protein with standard kcat and standard MW, assigned to
                   reactions without gene associations.
   rxnsMissingGPR  a list of updated rxns identifiers with a standard value
   standardMW      the standard MW value calculated
   standardKcat    the standard Kcat value calculated 
   rxnsNoKcat      a list of rxns identifiers whose zero kcat has been replaced

   While model.ec.kcat is populated, applyKcatConstraints would still need
   to be run to apply the new constraints to the S-matrix.

 Usage:
    [model, rxnsMissingGPR, standardMW, standardKcat, rxnsNoKcat] = getStandardKcat(model, modelAdapter, threshold, fillZeroKcat);

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [model, rxnsMissingGPR, standardMW, standardKcat, rxnsNoKcat] = getStandardKcat(model, modelAdapter, threshold, fillZeroKcat)
0002 % getStandardKcat
0003 %   Calculate an standard kcat and standard molecular weight (MW) that can
0004 %   be used to apply enzyme constraints to reactions without any associated
0005 %   enzymes. Such reactions have either an empty model.grRules field, or
0006 %   they have no match in model.ec.rxns (which can be the case if the genes
0007 %   in the model.grRules field could not be mapped to enzymes). This is
0008 %   done by adding those reactions to model.ec, assign a "standard"
0009 %   pseudoenzyme with the standard MW (median of all proteins in the
0010 %   organism) and standard kcat (median from all kcat, or subsystem
0011 %   specific kcat).
0012 %
0013 %   A reaction is assigned a subSystem specific kcat values if the model
0014 %   has a subSystems field and the reaction is annotated with a subSystem.
0015 %   Only the first subSystem will be considered if multiple are annotated
0016 %   to the same reaction.
0017 %
0018 %   Exchange, transport and pseudoreactions are filtered out, plus any
0019 %   reaction identifiers specified in /data/pseudoRxns.tsv in the model
0020 %   adapter folder.
0021 %
0022 %   In addition, reactions that are annotated with an enzyme (and therefore
0023 %   already in model.ec), but not assigned any reaction-specific kcat value
0024 %   (their model.ec.kcat entry is either 0 or NaN), can be assigned
0025 %   standard kcat values by a similar approach. However, those reactions
0026 %   will not be linked to the "standard" pseudoenzyme, but will use the
0027 %   enzyme that they had already been associated with.
0028 %
0029 %   Any pre-existing standard kcat assignments (identified by 'standard'
0030 %   entires in model.ec.source) are removed when applying this function.
0031 %
0032 % Input:
0033 %   model           an ecModel in GECKO 3 format (with ecModel.ec structure)
0034 %   modelAdapter    a loaded model adapter (Optional, will otherwise use
0035 %                   the default model adapter).
0036 %   threshold       a threshold to determine when use a kcat value based on
0037 %                   the mean kcat of the reactions in the same subSystem or
0038 %                   based on the median value of all the kcat in the model.
0039 %                   Second option is used when the number of reactions in a
0040 %                   determined subSystem is < threshold. (Optional, default
0041 %                   = 10)
0042 %   fillZeroKcat    logical whether zero kcat values should be replaced
0043 %                   with standard kcat values. (Optional, default = true).
0044 %
0045 % Output:
0046 %   model           ecModel where model.ec is expanded with a standard
0047 %                   protein with standard kcat and standard MW, assigned to
0048 %                   reactions without gene associations.
0049 %   rxnsMissingGPR  a list of updated rxns identifiers with a standard value
0050 %   standardMW      the standard MW value calculated
0051 %   standardKcat    the standard Kcat value calculated
0052 %   rxnsNoKcat      a list of rxns identifiers whose zero kcat has been replaced
0053 %
0054 %   While model.ec.kcat is populated, applyKcatConstraints would still need
0055 %   to be run to apply the new constraints to the S-matrix.
0056 %
0057 % Usage:
0058 %    [model, rxnsMissingGPR, standardMW, standardKcat, rxnsNoKcat] = getStandardKcat(model, modelAdapter, threshold, fillZeroKcat);
0059 
0060 if nargin < 2 || isempty(modelAdapter)
0061     modelAdapter = ModelAdapterManager.getDefault();
0062     if isempty(modelAdapter)
0063         error('Either send in a modelAdapter or set the default model adapter in the ModelAdapterManager.')
0064     end
0065 end
0066 params = modelAdapter.getParameters();
0067 
0068 if nargin < 3 || isempty(threshold)
0069     threshold = 10;
0070 end
0071 
0072 if nargin < 4 || isempty(fillZeroKcat)
0073     fillZeroKcat = true;
0074 end
0075 
0076 databases = loadDatabases('uniprot', modelAdapter);
0077 
0078 % An standard MW is defined for all the rxns which does not have a GPR
0079 % rule defined. This is based in all the proteins reported for the specific
0080 % organism in uniprot
0081 standardMW = median(databases.uniprot.MW, 'omitnan');
0082 
0083 % An standard Kcat is defined for all the rxns which does not have a GPR
0084 % rule defined. In this case, the kcat value for a particular reaction is
0085 % defined as the mean of the kcat values of the reactions involved in the
0086 % same subsystem in which the given reaction is involved. Nevertheless, if
0087 % a subSystem have a number of reactions lower than a treshold, the kcat
0088 % value will be the median of the kcat in all the reactions of the model.
0089 
0090 % Remove from the list those with kcat zero
0091 rxnsKcatZero = model.ec.kcat > 0;
0092 
0093 % Get the kcat value based on all the kcats in the model
0094 standardKcat = median(model.ec.kcat(rxnsKcatZero), 'omitnan');
0095 
0096 % If the model have subSystems assigned calculate kcat based on subSystem
0097 if isfield(model,'subSystems') && ~all(cellfun(@isempty, model.subSystems))
0098     standard = false;
0099     if model.ec.geckoLight
0100         modRxns = extractAfter(model.ec.rxns,4);
0101     else
0102         modRxns = model.ec.rxns;
0103     end
0104     % Map ec-rxns to model.rxns
0105     [~,rxnIdx]  = ismember(modRxns,model.rxns);
0106     % Choose first subSystem
0107     enzSubSystems = flattenCell(model.subSystems(rxnIdx));
0108     enzSubSystems = enzSubSystems(:,1);
0109     if ~all(cellfun(@isempty, enzSubSystems))
0110 
0111     % Make list of unique subsystems, and which rxns are linked to them
0112     [enzSubSystem_names, ~, rxnToSub] = unique(enzSubSystems);
0113     % Make matrix of ec-rxns vs. unique subsystem index
0114     ind = sub2ind([numel(enzSubSystem_names) numel(enzSubSystems)],rxnToSub',1:numel(rxnToSub));
0115     kcatSubSystem = false([numel(enzSubSystem_names) numel(enzSubSystems)]);
0116     kcatSubSystem(ind) = true;
0117     % Number of kcats per subSystem
0118     kcatsPerSubSystem = sum(kcatSubSystem,2);
0119     % Calculate average kcat values per subSystem
0120     kcatSubSystem = (kcatSubSystem*model.ec.kcat)./kcatsPerSubSystem;
0121     kcatSubSystem(kcatsPerSubSystem < threshold) = standardKcat;
0122     else
0123         standard = true;
0124         printOrange('WARNING: No subSystem-specific kcat values can be calculated');
0125     end
0126 else
0127     standard = true;
0128     printOrange('WARNING: No subSystem-specific kcat values can be calculated');
0129 end
0130 
0131 % Find reactions without GPR
0132 rxnsMissingGPR = find(cellfun(@isempty, model.grRules));
0133 
0134 % Find reactions with GPR but without model.ec entry (for instance due to
0135 % no protein matching)
0136 rxnsMissingEnzyme = find(~cellfun(@isempty, model.grRules));
0137 if model.ec.geckoLight
0138     ecRxnsList = unique(extractAfter(model.ec.rxns,4));
0139 else
0140     ecRxnsList = model.ec.rxns;
0141 end
0142 rxnsMissingEnzyme = find(and(~ismember(model.rxns(rxnsMissingEnzyme),ecRxnsList), ~contains(model.rxns(rxnsMissingEnzyme),'usage_prot_')));
0143 rxnsMissingGPR = [rxnsMissingGPR;rxnsMissingEnzyme];
0144 
0145 % Get custom list of reaction IDs to ignore, if existing. First column
0146 % contains reaction IDs, second column contains reaction names for
0147 % reference only.
0148 if exist(fullfile(params.path,'data','pseudoRxns.tsv'),'file')
0149     fID        = fopen(fullfile(params.path,'data','pseudoRxns.tsv'));
0150     fileData   = textscan(fID,'%s %s','delimiter','\t');
0151     fclose(fID);
0152     customRxns = fileData{1};
0153     customRxns = find(ismember(model.rxns,customRxns));
0154 else
0155     customRxns = [];
0156 end
0157 % Get and remove exchange, transport, spontaneous and pseudo reactions
0158 [~, exchangeRxns]  = getExchangeRxns(model);
0159 transportRxns = getTransportRxns(model);
0160 [spontaneousRxns, ~] = modelAdapter.getSpontaneousReactions(model);
0161 pseudoRxns = contains(model.rxnNames,'pseudoreaction');
0162 slimeRxns = contains(model.rxnNames,'SLIME rxn');
0163 rxnsToIgnore = [customRxns; exchangeRxns; find(transportRxns); ...
0164                 find(spontaneousRxns); find(pseudoRxns); find(slimeRxns)];
0165 rxnsMissingGPR(ismember(rxnsMissingGPR, rxnsToIgnore)) = [];
0166 
0167 % Only add if not geckoLight & getStandardKcat was not run earlier
0168 if ~any(strcmp(model.mets,'prot_standard'))
0169     % Add a new gene to be consistent with ec field named standard
0170     proteinStdGenes.genes = 'standard';
0171     if isfield(model,'geneShortNames')
0172         proteinStdGenes.geneShortNames = 'std';
0173     end
0174     model = addGenesRaven(model, proteinStdGenes);
0175 
0176     if ~model.ec.geckoLight
0177         % Add a new metabolite named prot_standard
0178         proteinStdMets.mets         = 'prot_standard';
0179         proteinStdMets.metNames     = proteinStdMets.mets;
0180         proteinStdMets.compartments = 'c';
0181         if isfield(model,'metNotes')
0182             proteinStdMets.metNotes = 'Standard enzyme-usage pseudometabolite';
0183         end
0184         model = addMets(model, proteinStdMets);
0185 
0186         % Add a protein usage reaction if not a light version
0187         proteinStdUsageRxn.rxns         = {'usage_prot_standard'};
0188         proteinStdUsageRxn.rxnNames     = proteinStdUsageRxn.rxns;
0189         proteinStdUsageRxn.mets         = {proteinStdMets.mets, 'prot_pool'};
0190         proteinStdUsageRxn.stoichCoeffs = [-1, 1];
0191         proteinStdUsageRxn.lb           = -1000;
0192         proteinStdUsageRxn.ub           = 0;
0193         proteinStdUsageRxn.grRules      = proteinStdGenes.genes;
0194 
0195         model = addRxns(model, proteinStdUsageRxn);
0196     end
0197     % Update .ec structure in model
0198     model.ec.genes(end+1)      = {'standard'};
0199     model.ec.enzymes(end+1)    = {'standard'};
0200     model.ec.mw(end+1)         = standardMW;
0201     model.ec.sequence(end+1)   = {''};
0202     % Additional info
0203     if isfield(model.ec,'concs')
0204         model.ec.concs(end+1)  = nan();
0205     end
0206 
0207     % Expand the enzyme rxns matrix
0208     model.ec.rxnEnzMat =  [model.ec.rxnEnzMat, zeros(length(model.ec.rxns), 1)]; % 1 new enzyme
0209     model.ec.rxnEnzMat =  [model.ec.rxnEnzMat; zeros(length(rxnsMissingGPR), length(model.ec.enzymes))]; % new rxns
0210 end
0211 stdMetIdx = find(strcmpi(model.ec.enzymes, 'standard'));
0212 
0213 % Remove previous standard kcat assignment
0214 oldStandardEnz = find(strcmp(model.ec.source,'standard'));
0215 if ~isempty(oldStandardEnz)
0216     oldStandardProt = logical(model.ec.rxnEnzMat(oldStandardEnz,stdMetIdx));
0217     % If annotated with real enzyme => set kcat to zero
0218     model.ec.kcat(oldStandardEnz(~oldStandardProt))        = 0;
0219     model.ec.source(oldStandardEnz(~oldStandardProt))      = {''};
0220     % If annotated with standard protein => remove entry
0221     model.ec.rxns(oldStandardEnz(oldStandardProt))        = [];
0222     model.ec.kcat(oldStandardEnz(oldStandardProt))        = [];
0223     model.ec.source(oldStandardEnz(oldStandardProt))      = [];
0224     model.ec.notes(oldStandardEnz(oldStandardProt))       = [];
0225     model.ec.eccodes(oldStandardEnz(oldStandardProt))     = [];
0226     model.ec.rxnEnzMat(oldStandardEnz(oldStandardProt),:) = [];
0227 end
0228 
0229 numRxns = length(model.ec.rxns);
0230 for i = 1:numel(rxnsMissingGPR)
0231     rxnIdx = rxnsMissingGPR(i);
0232 
0233     % Update .ec structure in model
0234     if ~model.ec.geckoLight
0235         model.ec.rxns(end+1)     = model.rxns(rxnIdx);
0236         % Add prefix in case is light version
0237     else
0238         model.ec.rxns{end+1}     = ['001_' model.rxns{rxnIdx}];
0239     end
0240 
0241     if ~standard
0242         kcatSubSystemIdx = strcmpi(enzSubSystem_names, model.subSystems{rxnIdx}(1));
0243         if all(kcatSubSystemIdx)
0244             model.ec.kcat(end+1) = kcatSubSystem(kcatSubSystemIdx);
0245         else
0246             model.ec.kcat(end+1) = standardKcat;
0247         end
0248     else
0249         model.ec.kcat(end+1) = standardKcat;
0250     end
0251 
0252     model.ec.source(end+1)   = {'standard'};
0253     model.ec.notes(end+1)    = {''};
0254     model.ec.eccodes(end+1)  = {''};
0255 
0256     % Update the enzyme rxns matrix
0257     model.ec.rxnEnzMat(numRxns+i, stdMetIdx) = 1;
0258 end
0259 % Get the rxns identifiers of the updated rxns
0260 rxnsMissingGPR = model.rxns(rxnsMissingGPR);
0261 
0262 if fillZeroKcat
0263     zeroKcat = model.ec.kcat == 0 | isnan(model.ec.kcat);
0264     model.ec.kcat(zeroKcat)     = standardKcat;
0265     model.ec.source(zeroKcat)   = {'standard'};
0266     rxnsNoKcat = model.ec.rxns(zeroKcat);
0267 else
0268     rxnsNoKcat = [];
0269 end
0270 end
0271 
0272 function Cflat = flattenCell(C,strFlag)
0273 %FLATTENCELL  Flatten a nested column cell array into a matrix cell array.
0274 %
0275 % CFLAT = FLATTENCELL(C) takes a column cell array in which one or more
0276 % entries is a nested cell array, and flattens it into a 2D matrix cell
0277 % array, where the nested entries are spread across new columns.
0278 %
0279 % CFLAT = FLATTENCELL(C,STRFLAG) if STRFLAG is TRUE, empty entries in the
0280 % resulting CFLAT will be replaced with empty strings {''}. Default = FALSE
0281 if nargin < 2
0282     strFlag = false;
0283 end
0284 
0285 % determine which entries are cells
0286 cells = cellfun(@iscell,C);
0287 
0288 % determine number of elements in each nested cell
0289 cellsizes = cellfun(@numel,C);
0290 cellsizes(~cells) = 1;  % ignore non-cell entries
0291 
0292 % flatten single-entry cells
0293 Cflat = C;
0294 Cflat(cells & (cellsizes == 1)) = cellfun(@(x) x{1},Cflat(cells & (cellsizes == 1)),'UniformOutput',false);
0295 
0296 % iterate through multi-entry cells
0297 multiCells = find(cellsizes > 1);
0298 for i = 1:length(multiCells)
0299     cellContents = Cflat{multiCells(i)};
0300     Cflat(multiCells(i),1:length(cellContents)) = cellContents;
0301 end
0302 
0303 % change empty elements to strings, if specified
0304 if ( strFlag )
0305     Cflat(cellfun(@isempty,Cflat)) = {''};
0306 end
0307 end

Generated by m2html © 2005