Description of getStandardKcat

getStandardKcat
Calculate an standard kcat and standard molecular weight (MW) that can
be used to apply enzyme constraints to reactions without any associated
enzymes. Such reactions have either an empty model.grRules field, or
they have no match in model.ec.rxns (which can be the case if the genes
in the model.grRules field could not be mapped to enzymes). This is
done by adding those reactions to model.ec, assign a "standard"
pseudoenzyme with the standard MW (median of all proteins in the
organism) and standard kcat (median from all kcat, or subsystem
specific kcat).

A reaction is assigned a subSystem specific kcat values if the model
has a subSystems field and the reaction is annotated with a subSystem.
Only the first subSystem will be considered if multiple are annotated
to the same reaction.

Exchange, transport and pseudoreactions are filtered out, plus any
reaction identifiers specified in /data/pseudoRxns.tsv in the model
adapter folder.

In addition, reactions that are annotated with an enzyme (and therefore
already in model.ec), but not assigned any reaction-specific kcat value
(their model.ec.kcat entry is either 0 or NaN), can be assigned
standard kcat values by a similar approach. However, those reactions
will not be linked to the "standard" pseudoenzyme, but will use the
enzyme that they had already been associated with.

Any pre-existing standard kcat assignments (identified by 'standard'
entires in model.ec.source) are removed when applying this function.

Input:
model an ecModel in GECKO 3 format (with ecModel.ec structure)
modelAdapter a loaded model adapter (Optional, will otherwise use
the default model adapter).
threshold a threshold to determine when use a kcat value based on
the mean kcat of the reactions in the same subSystem or
based on the median value of all the kcat in the model.
Second option is used when the number of reactions in a
determined subSystem is < threshold. (Optional, default
= 10)
fillZeroKcat logical whether zero kcat values should be replaced
with standard kcat values. (Optional, default = true).

Output:
model ecModel where model.ec is expanded with a standard
protein with standard kcat and standard MW, assigned to
reactions without gene associations.
rxnsMissingGPR a list of updated rxns identifiers with a standard value
standardMW the standard MW value calculated
standardKcat the standard Kcat value calculated
rxnsNoKcat a list of rxns identifiers whose zero kcat has been replaced

While model.ec.kcat is populated, applyKcatConstraints would still need
to be run to apply the new constraints to the S-matrix.

Usage:
[model, rxnsMissingGPR, standardMW, standardKcat, rxnsNoKcat] = getStandardKcat(model, modelAdapter, threshold, fillZeroKcat);

0001 function [model, rxnsMissingGPR, standardMW, standardKcat, rxnsNoKcat] = getStandardKcat(model, modelAdapter, threshold, fillZeroKcat)
0002 % getStandardKcat
0003 %   Calculate an standard kcat and standard molecular weight (MW) that can
0004 %   be used to apply enzyme constraints to reactions without any associated
0005 %   enzymes. Such reactions have either an empty model.grRules field, or
0006 %   they have no match in model.ec.rxns (which can be the case if the genes
0007 %   in the model.grRules field could not be mapped to enzymes). This is
0008 %   done by adding those reactions to model.ec, assign a "standard"
0009 %   pseudoenzyme with the standard MW (median of all proteins in the
0010 %   organism) and standard kcat (median from all kcat, or subsystem
0011 %   specific kcat).
0012 %
0013 %   A reaction is assigned a subSystem specific kcat values if the model
0014 %   has a subSystems field and the reaction is annotated with a subSystem.
0015 %   Only the first subSystem will be considered if multiple are annotated
0016 %   to the same reaction.
0017 %
0018 %   Exchange, transport and pseudoreactions are filtered out, plus any
0019 %   reaction identifiers specified in /data/pseudoRxns.tsv in the model
0020 %   adapter folder.
0021 %
0022 %   In addition, reactions that are annotated with an enzyme (and therefore
0023 %   already in model.ec), but not assigned any reaction-specific kcat value
0024 %   (their model.ec.kcat entry is either 0 or NaN), can be assigned
0025 %   standard kcat values by a similar approach. However, those reactions
0026 %   will not be linked to the "standard" pseudoenzyme, but will use the
0027 %   enzyme that they had already been associated with.
0028 %
0029 %   Any pre-existing standard kcat assignments (identified by 'standard'
0030 %   entires in model.ec.source) are removed when applying this function.
0031 %
0032 % Input:
0033 %   model           an ecModel in GECKO 3 format (with ecModel.ec structure)
0034 %   modelAdapter    a loaded model adapter (Optional, will otherwise use
0035 %                   the default model adapter).
0036 %   threshold       a threshold to determine when use a kcat value based on
0037 %                   the mean kcat of the reactions in the same subSystem or
0038 %                   based on the median value of all the kcat in the model.
0039 %                   Second option is used when the number of reactions in a
0040 %                   determined subSystem is < threshold. (Optional, default
0041 %                   = 10)
0042 %   fillZeroKcat    logical whether zero kcat values should be replaced
0043 %                   with standard kcat values. (Optional, default = true).
0044 %
0045 % Output:
0046 %   model           ecModel where model.ec is expanded with a standard
0047 %                   protein with standard kcat and standard MW, assigned to
0048 %                   reactions without gene associations.
0049 %   rxnsMissingGPR  a list of updated rxns identifiers with a standard value
0050 %   standardMW      the standard MW value calculated
0051 %   standardKcat    the standard Kcat value calculated
0052 %   rxnsNoKcat      a list of rxns identifiers whose zero kcat has been replaced
0053 %
0054 %   While model.ec.kcat is populated, applyKcatConstraints would still need
0055 %   to be run to apply the new constraints to the S-matrix.
0056 %
0057 % Usage:
0058 %    [model, rxnsMissingGPR, standardMW, standardKcat, rxnsNoKcat] = getStandardKcat(model, modelAdapter, threshold, fillZeroKcat);
0059 
0060 if nargin < 2 || isempty(modelAdapter)
0061     modelAdapter = ModelAdapterManager.getDefault();
0062     if isempty(modelAdapter)
0063         error('Either send in a modelAdapter or set the default model adapter in the ModelAdapterManager.')
0064     end
0065 end
0066 params = modelAdapter.getParameters();
0067 
0068 if nargin < 3 || isempty(threshold)
0069     threshold = 10;
0070 end
0071 
0072 if nargin < 4 || isempty(fillZeroKcat)
0073     fillZeroKcat = true;
0074 end
0075 
0076 databases = loadDatabases('uniprot', modelAdapter);
0077 
0078 % An standard MW is defined for all the rxns which does not have a GPR
0079 % rule defined. This is based in all the proteins reported for the specific
0080 % organism in uniprot
0081 standardMW = median(databases.uniprot.MW, 'omitnan');
0082 
0083 % An standard Kcat is defined for all the rxns which does not have a GPR
0084 % rule defined. In this case, the kcat value for a particular reaction is
0085 % defined as the mean of the kcat values of the reactions involved in the
0086 % same subsystem in which the given reaction is involved. Nevertheless, if
0087 % a subSystem have a number of reactions lower than a treshold, the kcat
0088 % value will be the median of the kcat in all the reactions of the model.
0089 
0090 % Remove from the list those with kcat zero
0091 rxnsKcatZero = model.ec.kcat > 0;
0092 
0093 % Get the kcat value based on all the kcats in the model
0094 standardKcat = median(model.ec.kcat(rxnsKcatZero), 'omitnan');
0095 
0096 % If the model have subSystems assigned calculate kcat based on subSystem
0097 if isfield(model,'subSystems') && ~all(cellfun(@isempty, model.subSystems))
0098     standard = false;
0099     if model.ec.geckoLight
0100         modRxns = extractAfter(model.ec.rxns,4);
0101     else
0102         modRxns = model.ec.rxns;
0103     end
0104     % Map ec-rxns to model.rxns
0105     [~,rxnIdx]  = ismember(modRxns,model.rxns);
0106     % Choose first subSystem
0107     enzSubSystems = flattenCell(model.subSystems(rxnIdx));
0108     enzSubSystems = enzSubSystems(:,1);
0109     if ~all(cellfun(@isempty, enzSubSystems))
0110 
0111     % Make list of unique subsystems, and which rxns are linked to them
0112     [enzSubSystem_names, ~, rxnToSub] = unique(enzSubSystems);
0113     % Make matrix of ec-rxns vs. unique subsystem index
0114     ind = sub2ind([numel(enzSubSystem_names) numel(enzSubSystems)],rxnToSub',1:numel(rxnToSub));
0115     kcatSubSystem = false([numel(enzSubSystem_names) numel(enzSubSystems)]);
0116     kcatSubSystem(ind) = true;
0117     % Number of kcats per subSystem
0118     kcatsPerSubSystem = sum(kcatSubSystem,2);
0119     % Calculate average kcat values per subSystem
0120     kcatSubSystem = (kcatSubSystem*model.ec.kcat)./kcatsPerSubSystem;
0121     kcatSubSystem(kcatsPerSubSystem < threshold) = standardKcat;
0122     else
0123         standard = true;
0124         printOrange('WARNING: No subSystem-specific kcat values can be calculated');
0125     end
0126 else
0127     standard = true;
0128     printOrange('WARNING: No subSystem-specific kcat values can be calculated');
0129 end
0130 
0131 % Find reactions without GPR
0132 rxnsMissingGPR = find(cellfun(@isempty, model.grRules));
0133 
0134 % Find reactions with GPR but without model.ec entry (for instance due to
0135 % no protein matching)
0136 rxnsMissingEnzyme = find(~cellfun(@isempty, model.grRules));
0137 if model.ec.geckoLight
0138     ecRxnsList = unique(extractAfter(model.ec.rxns,4));
0139 else
0140     ecRxnsList = model.ec.rxns;
0141 end
0142 rxnsMissingEnzyme = find(and(~ismember(model.rxns(rxnsMissingEnzyme),ecRxnsList), ~contains(model.rxns(rxnsMissingEnzyme),'usage_prot_')));
0143 rxnsMissingGPR = [rxnsMissingGPR;rxnsMissingEnzyme];
0144 
0145 % Get custom list of reaction IDs to ignore, if existing. First column
0146 % contains reaction IDs, second column contains reaction names for
0147 % reference only.
0148 if exist(fullfile(params.path,'data','pseudoRxns.tsv'),'file')
0149     fID        = fopen(fullfile(params.path,'data','pseudoRxns.tsv'));
0150     fileData   = textscan(fID,'%s %s','delimiter','\t');
0151     fclose(fID);
0152     customRxns = fileData{1};
0153     customRxns = find(ismember(model.rxns,customRxns));
0154 else
0155     customRxns = [];
0156 end
0157 % Get and remove exchange, transport, spontaneous and pseudo reactions
0158 [~, exchangeRxns]  = getExchangeRxns(model);
0159 transportRxns = getTransportRxns(model);
0160 [spontaneousRxns, ~] = modelAdapter.getSpontaneousReactions(model);
0161 pseudoRxns = contains(model.rxnNames,'pseudoreaction');
0162 slimeRxns = contains(model.rxnNames,'SLIME rxn');
0163 rxnsToIgnore = [customRxns; exchangeRxns; find(transportRxns); ...
0164                 find(spontaneousRxns); find(pseudoRxns); find(slimeRxns)];
0165 rxnsMissingGPR(ismember(rxnsMissingGPR, rxnsToIgnore)) = [];
0166 
0167 % Only add if not geckoLight & getStandardKcat was not run earlier
0168 if ~any(strcmp(model.mets,'prot_standard'))
0169     % Add a new gene to be consistent with ec field named standard
0170     proteinStdGenes.genes = 'standard';
0171     if isfield(model,'geneShortNames')
0172         proteinStdGenes.geneShortNames = 'std';
0173     end
0174     model = addGenesRaven(model, proteinStdGenes);
0175 
0176     if ~model.ec.geckoLight
0177         % Add a new metabolite named prot_standard
0178         proteinStdMets.mets         = 'prot_standard';
0179         proteinStdMets.metNames     = proteinStdMets.mets;
0180         % Validate compartment
0181         proteinStdMets.compartments = strcmp(model.compNames,params.enzyme_comp);
0182         if ~any(proteinStdMets.compartments)
0183             error(['Compartment ' params.enzyme_comp ' (specified in params.enzyme_comp) '...
0184                 'cannot be found in model.compNames'])
0185         end
0186         proteinStdMets.compartments = model.comps(compartmentID);
0187 
0188         if isfield(model,'metNotes')
0189             proteinStdMets.metNotes = 'Standard enzyme-usage pseudometabolite';
0190         end
0191         model = addMets(model, proteinStdMets);
0192 
0193         % Add a protein usage reaction if not a light version
0194         proteinStdUsageRxn.rxns         = {'usage_prot_standard'};
0195         proteinStdUsageRxn.rxnNames     = proteinStdUsageRxn.rxns;
0196         proteinStdUsageRxn.mets         = {proteinStdMets.mets, 'prot_pool'};
0197         proteinStdUsageRxn.stoichCoeffs = [-1, 1];
0198         proteinStdUsageRxn.lb           = -1000;
0199         proteinStdUsageRxn.ub           = 0;
0200         proteinStdUsageRxn.grRules      = proteinStdGenes.genes;
0201 
0202         model = addRxns(model, proteinStdUsageRxn);
0203     end
0204     % Update .ec structure in model
0205     model.ec.genes(end+1)      = {'standard'};
0206     model.ec.enzymes(end+1)    = {'standard'};
0207     model.ec.mw(end+1)         = standardMW;
0208     model.ec.sequence(end+1)   = {''};
0209     % Additional info
0210     if isfield(model.ec,'concs')
0211         model.ec.concs(end+1)  = nan();
0212     end
0213 
0214     % Expand the enzyme rxns matrix
0215     model.ec.rxnEnzMat =  [model.ec.rxnEnzMat, zeros(length(model.ec.rxns), 1)]; % 1 new enzyme
0216     model.ec.rxnEnzMat =  [model.ec.rxnEnzMat; zeros(length(rxnsMissingGPR), length(model.ec.enzymes))]; % new rxns
0217 end
0218 stdMetIdx = find(strcmpi(model.ec.enzymes, 'standard'));
0219 
0220 % Remove previous standard kcat assignment
0221 oldStandardEnz = find(strcmp(model.ec.source,'standard'));
0222 if ~isempty(oldStandardEnz)
0223     oldStandardProt = logical(model.ec.rxnEnzMat(oldStandardEnz,stdMetIdx));
0224     % If annotated with real enzyme => set kcat to zero
0225     model.ec.kcat(oldStandardEnz(~oldStandardProt))        = 0;
0226     model.ec.source(oldStandardEnz(~oldStandardProt))      = {''};
0227     % If annotated with standard protein => remove entry
0228     model.ec.rxns(oldStandardEnz(oldStandardProt))        = [];
0229     model.ec.kcat(oldStandardEnz(oldStandardProt))        = [];
0230     model.ec.source(oldStandardEnz(oldStandardProt))      = [];
0231     model.ec.notes(oldStandardEnz(oldStandardProt))       = [];
0232     model.ec.eccodes(oldStandardEnz(oldStandardProt))     = [];
0233     model.ec.rxnEnzMat(oldStandardEnz(oldStandardProt),:) = [];
0234 end
0235 
0236 numRxns = length(model.ec.rxns);
0237 for i = 1:numel(rxnsMissingGPR)
0238     rxnIdx = rxnsMissingGPR(i);
0239 
0240     % Update .ec structure in model
0241     if ~model.ec.geckoLight
0242         model.ec.rxns(end+1)     = model.rxns(rxnIdx);
0243         % Add prefix in case is light version
0244     else
0245         model.ec.rxns{end+1}     = ['001_' model.rxns{rxnIdx}];
0246     end
0247 
0248     if ~standard
0249         kcatSubSystemIdx = strcmpi(enzSubSystem_names, model.subSystems{rxnIdx}(1));
0250         if all(kcatSubSystemIdx)
0251             model.ec.kcat(end+1) = kcatSubSystem(kcatSubSystemIdx);
0252         else
0253             model.ec.kcat(end+1) = standardKcat;
0254         end
0255     else
0256         model.ec.kcat(end+1) = standardKcat;
0257     end
0258 
0259     model.ec.source(end+1)   = {'standard'};
0260     model.ec.notes(end+1)    = {''};
0261     model.ec.eccodes(end+1)  = {''};
0262 
0263     % Update the enzyme rxns matrix
0264     model.ec.rxnEnzMat(numRxns+i, stdMetIdx) = 1;
0265 end
0266 % Get the rxns identifiers of the updated rxns
0267 rxnsMissingGPR = model.rxns(rxnsMissingGPR);
0268 
0269 if fillZeroKcat
0270     zeroKcat = model.ec.kcat == 0 | isnan(model.ec.kcat);
0271     model.ec.kcat(zeroKcat)     = standardKcat;
0272     model.ec.source(zeroKcat)   = {'standard'};
0273     rxnsNoKcat = model.ec.rxns(zeroKcat);
0274 else
0275     rxnsNoKcat = [];
0276 end
0277 end
0278 
0279 function Cflat = flattenCell(C,strFlag)
0280 %FLATTENCELL  Flatten a nested column cell array into a matrix cell array.
0281 %
0282 % CFLAT = FLATTENCELL(C) takes a column cell array in which one or more
0283 % entries is a nested cell array, and flattens it into a 2D matrix cell
0284 % array, where the nested entries are spread across new columns.
0285 %
0286 % CFLAT = FLATTENCELL(C,STRFLAG) if STRFLAG is TRUE, empty entries in the
0287 % resulting CFLAT will be replaced with empty strings {''}. Default = FALSE
0288 if nargin < 2
0289     strFlag = false;
0290 end
0291 
0292 % determine which entries are cells
0293 cells = cellfun(@iscell,C);
0294 
0295 % determine number of elements in each nested cell
0296 cellsizes = cellfun(@numel,C);
0297 cellsizes(~cells) = 1;  % ignore non-cell entries
0298 
0299 % flatten single-entry cells
0300 Cflat = C;
0301 Cflat(cells & (cellsizes == 1)) = cellfun(@(x) x{1},Cflat(cells & (cellsizes == 1)),'UniformOutput',false);
0302 
0303 % iterate through multi-entry cells
0304 multiCells = find(cellsizes > 1);
0305 for i = 1:length(multiCells)
0306     cellContents = Cflat{multiCells(i)};
0307     Cflat(multiCells(i),1:length(cellContents)) = cellContents;
0308 end
0309 
0310 % change empty elements to strings, if specified
0311 if ( strFlag )
0312     Cflat(cellfun(@isempty,Cflat)) = {''};
0313 end
0314 end

getStandardKcat

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SUBFUNCTIONS

SOURCE CODE