Description of applyCustomKcats

 applyCustomKcats
   Apply user defined kcats.  Reads data/customKcats.tsv in the obj.params.path
   specified in the model adapter. Alternatively, a customKcats structure can
   provided, as specified below.

 Input:
   model           an ecModel in GECKO 3 format (with ecModel.ec structure)
   customKcats     structure with custom kcat information. If nothing
                   is provided, an attempt will be made to read
                   data/customKcats.tsv from the obj.params.path folder
                   specified in the modelAdapter.
   modelAdapter    a loaded model adapter (Optional, will otherwise use the
                   default model adapter).

 Output:
   model           ecModel where kcats for defined proteins have been
                   changed
   rxnUpdated      ids list of updated reactions, new kcats were applied
   notMatch        table with the list of reactions which the custom
                   information provided does not have full match (> 50%)
                   based on GPR rules. Then, they are suggested to be
                   curated by the user

   customKcats structure:
   - proteins    protein identifiers, multiple for the same kcat (in case
                 of a protein complex) are separated by ' + '
   - genes       gene identifiers (optional, not used in matching)
   - gene_name   short gene name (optional, not used in matching)
   - kcat        new kcat value (one per entry)
   - rxns        reaction identifiers, multiple for the same kcat are
                 separated by ',' (see further explanation below)
   - notes       will be appended to model.ec.notes (optional)
   - stoicho     complex stoichiometry, separated by ' + ' (examples: '1'
                 or '3 + 1'), matching the order in proteins field
   
   Matching order:
   (1) reactions are identified by .proteins and .rxns
   (2) reactions are identified by .proteins only (empty .rxns entry)
       no additional checks are made: is a reaction annotated with these
       proteins? => its kcat will be updated, irrespective of the exact
       reaction, direction, substrate, etc.
   (3) reactions are identified by .rxns only (empty .proteins entry)
       no additional checks are made: is a reaction derived from the
       original reaction identifier => its kcat will be updated,
       irrespective of the annotated protein

   customKcats.rxns field:
   The reaction identifiers are from the ORIGINAL model, before _EXP_
   suffixes were added by makeEcModel. Reaction directionality IS however
   specified, with a _REV suffix.
   Example entries:
   'r_0001'     will match r_0001, r_0001_EXP_1, r_0001_EXP_2 etc., but
                not r_0001_REV, r_0001_EXP_1_REV etc.
   'r_0001_REV' will match r_0001_REV, r_0001_EXP_1_REV, r_0001_EXP_2_REV,
                etc., but not r_0001, r_0001_EXP_1 etc.
   Multiple identifiers should be comma separated (e.g. r_0001, r_0002)

 Usage:
   [model, rxnUpdated, notMatch] = applyCustomKcats(model, customKcats, modelAdapter);

0001 function [model, rxnUpdated, notMatch] = applyCustomKcats(model, customKcats, modelAdapter)
0002 % applyCustomKcats
0003 %   Apply user defined kcats.  Reads data/customKcats.tsv in the obj.params.path
0004 %   specified in the model adapter. Alternatively, a customKcats structure can
0005 %   provided, as specified below.
0006 %
0007 % Input:
0008 %   model           an ecModel in GECKO 3 format (with ecModel.ec structure)
0009 %   customKcats     structure with custom kcat information. If nothing
0010 %                   is provided, an attempt will be made to read
0011 %                   data/customKcats.tsv from the obj.params.path folder
0012 %                   specified in the modelAdapter.
0013 %   modelAdapter    a loaded model adapter (Optional, will otherwise use the
0014 %                   default model adapter).
0015 %
0016 % Output:
0017 %   model           ecModel where kcats for defined proteins have been
0018 %                   changed
0019 %   rxnUpdated      ids list of updated reactions, new kcats were applied
0020 %   notMatch        table with the list of reactions which the custom
0021 %                   information provided does not have full match (> 50%)
0022 %                   based on GPR rules. Then, they are suggested to be
0023 %                   curated by the user
0024 %
0025 %   customKcats structure:
0026 %   - proteins    protein identifiers, multiple for the same kcat (in case
0027 %                 of a protein complex) are separated by ' + '
0028 %   - genes       gene identifiers (optional, not used in matching)
0029 %   - gene_name   short gene name (optional, not used in matching)
0030 %   - kcat        new kcat value (one per entry)
0031 %   - rxns        reaction identifiers, multiple for the same kcat are
0032 %                 separated by ',' (see further explanation below)
0033 %   - notes       will be appended to model.ec.notes (optional)
0034 %   - stoicho     complex stoichiometry, separated by ' + ' (examples: '1'
0035 %                 or '3 + 1'), matching the order in proteins field
0036 %
0037 %   Matching order:
0038 %   (1) reactions are identified by .proteins and .rxns
0039 %   (2) reactions are identified by .proteins only (empty .rxns entry)
0040 %       no additional checks are made: is a reaction annotated with these
0041 %       proteins? => its kcat will be updated, irrespective of the exact
0042 %       reaction, direction, substrate, etc.
0043 %   (3) reactions are identified by .rxns only (empty .proteins entry)
0044 %       no additional checks are made: is a reaction derived from the
0045 %       original reaction identifier => its kcat will be updated,
0046 %       irrespective of the annotated protein
0047 %
0048 %   customKcats.rxns field:
0049 %   The reaction identifiers are from the ORIGINAL model, before _EXP_
0050 %   suffixes were added by makeEcModel. Reaction directionality IS however
0051 %   specified, with a _REV suffix.
0052 %   Example entries:
0053 %   'r_0001'     will match r_0001, r_0001_EXP_1, r_0001_EXP_2 etc., but
0054 %                not r_0001_REV, r_0001_EXP_1_REV etc.
0055 %   'r_0001_REV' will match r_0001_REV, r_0001_EXP_1_REV, r_0001_EXP_2_REV,
0056 %                etc., but not r_0001, r_0001_EXP_1 etc.
0057 %   Multiple identifiers should be comma separated (e.g. r_0001, r_0002)
0058 %
0059 % Usage:
0060 %   [model, rxnUpdated, notMatch] = applyCustomKcats(model, customKcats, modelAdapter);
0061 
0062 if nargin < 3 || isempty(modelAdapter)
0063     modelAdapter = ModelAdapterManager.getDefault();
0064     if isempty(modelAdapter)
0065         error('Either send in a modelAdapter or set the default model adapter in the ModelAdapterManager.')
0066     end
0067 end
0068 
0069 params = modelAdapter.params;
0070 
0071 if nargin<2 || isempty(customKcats)
0072     customKcats = fullfile(params.path,'data','customKcats.tsv');
0073 end
0074 if isstruct(customKcats)
0075     if ~all(isfield(customKcats,{'proteins','kcat','rxns'}))
0076         error('The customKcats structure does not have all essential fields.');
0077     end
0078 elseif isfile(customKcats)
0079     fID = fopen(customKcats, 'r');
0080     fileContent = textscan(fID, '%s %s %s %f %q %s %s', 'Delimiter', '\t', 'HeaderLines', 1);
0081     fclose(fID);
0082     clear customKcats
0083     customKcats.proteins    = fileContent{1};
0084     customKcats.genes       = fileContent{2};
0085     customKcats.gene_name   = fileContent{3};
0086     customKcats.kcat        = fileContent{4};
0087     customKcats.rxns        = fileContent{5};
0088     customKcats.notes       = fileContent{6};
0089     customKcats.stoicho     = fileContent{7};
0090 else
0091     error(['Cannot find file: ' customKcats]);
0092 end
0093 
0094 rxnToUpdate = false(length(model.ec.rxns),1);
0095 rxnNotMatch = false(length(model.ec.rxns),1);
0096 evaluatedRule = cell(length(model.ec.rxns),1);
0097 enzInModel = cell(length(model.ec.rxns),1);
0098 ecRxnNoSuffix = regexprep(model.ec.rxns,'_EXP_\d+$','');
0099 
0100 % Implementation for full GECKO formulation
0101 if ~model.ec.geckoLight
0102     for i = 1:numel(customKcats.proteins)
0103         if isempty(customKcats.proteins{i})
0104             %If only reaction ID(s) is/are specified (and no proteins),
0105             %then apply the kcat to all isozymic reactions
0106             rxns    = strtrim(strsplit(customKcats.rxns{i}, ','));
0107             rxnIdxs = ismember(ecRxnNoSuffix,rxns);
0108             rxnToUpdate(rxnIdxs) = 1;
0109             model.ec.kcat(rxnIdxs) = customKcats.kcat(i);
0110         else
0111             prots = strtrim(strsplit(customKcats.proteins{i}, '+'));
0112 
0113             % Find the index for the enzymes which kcat will be changed. In
0114             % case the protein is not in the model generate a warning.
0115             try
0116                 enzIdx = cellfun(@(x) find(strcmpi(model.ec.enzymes, x)), prots);
0117             catch
0118                 enzIdx = [];
0119                 printOrange(['WARNING: Protein(s) ' customKcats.proteins{i} ' were not found in the model.']);
0120             end
0121 
0122             % if not specific reactions are defined, find all the reaction
0123             % index where the enzyme is used
0124             if isempty(customKcats.rxns{i})
0125                 temp_rxnIdxs = arrayfun(@(x) find(model.ec.rxnEnzMat(:, x)), enzIdx, 'UniformOutput', false);
0126                 % otherwhise, If a set of reactions if defined, only get the index for those
0127                 % but ignore any _EXP_ suffixes
0128             else
0129                 rxns = strtrim(strsplit(customKcats.rxns{i}, ','));
0130                 temp_rxnIdxs = arrayfun(@(x) find(strcmpi(ecRxnNoSuffix, x)), rxns, 'UniformOutput', false);
0131             end
0132 
0133             if ~isempty(temp_rxnIdxs)
0134                 rxnIdxs = [];
0135                 for j = 1:numel(temp_rxnIdxs)
0136                     rxnIdxs = [rxnIdxs; temp_rxnIdxs{j}];
0137                 end
0138 
0139                 % Check when multiple proteins are involved, since it can return same rxn n times
0140                 rxnIdxs = unique(rxnIdxs); %unique(rxnIdxs{1, :});
0141 
0142                 for j = 1:numel(rxnIdxs)
0143                     % Get all the enzymes involved in the reaction
0144                     allEnzInRxn = find(model.ec.rxnEnzMat(rxnIdxs(j),:));
0145 
0146                     C = intersect(enzIdx, allEnzInRxn);
0147 
0148                     % Determine the match percentage bewteen the rules
0149                     if numel(C) == numel(enzIdx) && numel(C) == numel(allEnzInRxn)
0150                         match = 1;
0151                     else
0152                         if numel(enzIdx) < numel(allEnzInRxn)
0153                             match = numel(C) / numel(allEnzInRxn);
0154                         else
0155                             match = numel(C) / numel(enzIdx);
0156                         end
0157                     end
0158 
0159                     % Update the kcat value stored in the model, if match 100%,
0160                     % otherwhise if >= 0.5 inform to the user to be validated
0161                     if match == 1
0162                         rxnToUpdate(rxnIdxs(j)) = 1;
0163                         model.ec.kcat(rxnIdxs(j)) = customKcats.kcat(i);
0164 
0165                         % Add note mentioning manual kcat change
0166                         model.ec.source{rxnIdxs(j),1} = 'custom';
0167                         if isfield(customKcats,'notes')
0168                             if isempty(model.ec.notes{rxnIdxs(j), 1}) && ~isempty(customKcats.notes{i})
0169                                 model.ec.notes{rxnIdxs(j), 1} = customKcats.notes{i};
0170                             else
0171                                 model.ec.notes{rxnIdxs(j), 1} = [model.ec.notes{rxnIdxs(j), 1} ', ' customKcats.notes{i}];
0172                             end
0173                         end
0174                     elseif match >= 0.5 && match < 1
0175                         rxnNotMatch(rxnIdxs(j)) = 1;
0176                         evaluatedRule{rxnIdxs(j), 1} = customKcats.proteins{i};
0177                         enzInModel{rxnIdxs(j), 1} = strjoin(model.ec.enzymes(allEnzInRxn), ' + ');
0178                     end
0179                 end
0180             end
0181         end
0182     end
0183 
0184     % Apply the new kcat values to the model
0185     if ~isempty(find(rxnToUpdate, 1))
0186         model = applyKcatConstraints(model, rxnToUpdate);
0187     else
0188         printOrange('WARNING: No matches found. Consider checking the IDs or proteins in customKcats.');
0189     end
0190 
0191     rxnUpdated = model.ec.rxns(find(rxnToUpdate));
0192 
0193     % Remove from notMatch those reactions already updated
0194     remove = and(rxnToUpdate, rxnNotMatch);
0195     rxnNotMatch(remove) = 0;
0196     evaluatedRule(remove) = '';
0197     enzInModel(remove) = '';
0198 
0199     idRxns = model.ec.rxns(find(rxnNotMatch));
0200     fullIdx = cellfun(@(x) find(strcmpi(model.rxns, x)), idRxns);
0201     rxnsNames = model.rxnNames(fullIdx);
0202     evaluatedRule = evaluatedRule(~cellfun('isempty', evaluatedRule));
0203     enzInModel = enzInModel(~cellfun('isempty', enzInModel));
0204     rules = model.grRules(fullIdx);
0205     notMatch = table(idRxns, rxnsNames, evaluatedRule, enzInModel, rules, ...
0206         'VariableNames',{'rxns', 'name', 'custom enzymes', 'enzymes in model', 'rules'});
0207 end
0208 end

applyCustomKcats

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SOURCE CODE