applyCustomKcats Apply user defined kcats. Reads data/customKcats.tsv in the obj.params.path specified in the model adapter. Alternatively, a customKcats structure can provided, as specified below. Input: model an ecModel in GECKO 3 format (with ecModel.ec structure) customKcats structure with custom kcat information. If nothing is provided, an attempt will be made to read data/customKcats.tsv from the obj.params.path folder specified in the modelAdapter. modelAdapter a loaded model adapter (Optional, will otherwise use the default model adapter). Output: model ecModel where kcats for defined proteins have been changed rxnUpdated ids list of updated reactions, new kcats were applied notMatch table with the list of reactions which the custom information provided does not have full match (> 50%) based on GPR rules. Then, they are suggested to be curated by the user customKcats structure: - proteins protein identifiers, multiple for the same kcat (in case of a protein complex) are separated by ' + ' - genes gene identifiers (optional, not used in matching) - gene_name short gene name (optional, not used in matching) - kcat new kcat value (one per entry) - rxns reaction identifiers, multiple for the same kcat are separated by ',' (see further explanation below) - notes will be appended to model.ec.notes (optional) - stoicho complex stoichiometry, separated by ' + ' (examples: '1' or '3 + 1'), matching the order in proteins field Matching order: (1) reactions are identified by .proteins and .rxns (2) reactions are identified by .proteins only (empty .rxns entry) no additional checks are made: is a reaction annotated with these proteins? => its kcat will be updated, irrespective of the exact reaction, direction, substrate, etc. (3) reactions are identified by .rxns only (empty .proteins entry) no additional checks are made: is a reaction derived from the original reaction identifier => its kcat will be updated, irrespective of the annotated protein customKcats.rxns field: The reaction identifiers are from the ORIGINAL model, before _EXP_ suffixes were added by makeEcModel. Reaction directionality IS however specified, with a _REV suffix. Example entries: 'r_0001' will match r_0001, r_0001_EXP_1, r_0001_EXP_2 etc., but not r_0001_REV, r_0001_EXP_1_REV etc. 'r_0001_REV' will match r_0001_REV, r_0001_EXP_1_REV, r_0001_EXP_2_REV, etc., but not r_0001, r_0001_EXP_1 etc. Multiple identifiers should be comma separated (e.g. r_0001, r_0002) Usage: [model, rxnUpdated, notMatch] = applyCustomKcats(model, customKcats, modelAdapter);
0001 function [model, rxnUpdated, notMatch] = applyCustomKcats(model, customKcats, modelAdapter) 0002 % applyCustomKcats 0003 % Apply user defined kcats. Reads data/customKcats.tsv in the obj.params.path 0004 % specified in the model adapter. Alternatively, a customKcats structure can 0005 % provided, as specified below. 0006 % 0007 % Input: 0008 % model an ecModel in GECKO 3 format (with ecModel.ec structure) 0009 % customKcats structure with custom kcat information. If nothing 0010 % is provided, an attempt will be made to read 0011 % data/customKcats.tsv from the obj.params.path folder 0012 % specified in the modelAdapter. 0013 % modelAdapter a loaded model adapter (Optional, will otherwise use the 0014 % default model adapter). 0015 % 0016 % Output: 0017 % model ecModel where kcats for defined proteins have been 0018 % changed 0019 % rxnUpdated ids list of updated reactions, new kcats were applied 0020 % notMatch table with the list of reactions which the custom 0021 % information provided does not have full match (> 50%) 0022 % based on GPR rules. Then, they are suggested to be 0023 % curated by the user 0024 % 0025 % customKcats structure: 0026 % - proteins protein identifiers, multiple for the same kcat (in case 0027 % of a protein complex) are separated by ' + ' 0028 % - genes gene identifiers (optional, not used in matching) 0029 % - gene_name short gene name (optional, not used in matching) 0030 % - kcat new kcat value (one per entry) 0031 % - rxns reaction identifiers, multiple for the same kcat are 0032 % separated by ',' (see further explanation below) 0033 % - notes will be appended to model.ec.notes (optional) 0034 % - stoicho complex stoichiometry, separated by ' + ' (examples: '1' 0035 % or '3 + 1'), matching the order in proteins field 0036 % 0037 % Matching order: 0038 % (1) reactions are identified by .proteins and .rxns 0039 % (2) reactions are identified by .proteins only (empty .rxns entry) 0040 % no additional checks are made: is a reaction annotated with these 0041 % proteins? => its kcat will be updated, irrespective of the exact 0042 % reaction, direction, substrate, etc. 0043 % (3) reactions are identified by .rxns only (empty .proteins entry) 0044 % no additional checks are made: is a reaction derived from the 0045 % original reaction identifier => its kcat will be updated, 0046 % irrespective of the annotated protein 0047 % 0048 % customKcats.rxns field: 0049 % The reaction identifiers are from the ORIGINAL model, before _EXP_ 0050 % suffixes were added by makeEcModel. Reaction directionality IS however 0051 % specified, with a _REV suffix. 0052 % Example entries: 0053 % 'r_0001' will match r_0001, r_0001_EXP_1, r_0001_EXP_2 etc., but 0054 % not r_0001_REV, r_0001_EXP_1_REV etc. 0055 % 'r_0001_REV' will match r_0001_REV, r_0001_EXP_1_REV, r_0001_EXP_2_REV, 0056 % etc., but not r_0001, r_0001_EXP_1 etc. 0057 % Multiple identifiers should be comma separated (e.g. r_0001, r_0002) 0058 % 0059 % Usage: 0060 % [model, rxnUpdated, notMatch] = applyCustomKcats(model, customKcats, modelAdapter); 0061 0062 if nargin < 3 || isempty(modelAdapter) 0063 modelAdapter = ModelAdapterManager.getDefault(); 0064 if isempty(modelAdapter) 0065 error('Either send in a modelAdapter or set the default model adapter in the ModelAdapterManager.') 0066 end 0067 end 0068 0069 params = modelAdapter.params; 0070 0071 if nargin<2 || isempty(customKcats) 0072 customKcats = fullfile(params.path,'data','customKcats.tsv'); 0073 end 0074 if isstruct(customKcats) 0075 if ~all(isfield(customKcats,{'proteins','kcat','rxns'})) 0076 error('The customKcats structure does not have all essential fields.'); 0077 end 0078 elseif isfile(customKcats) 0079 fID = fopen(customKcats, 'r'); 0080 fileContent = textscan(fID, '%s %s %s %f %q %s %s', 'Delimiter', '\t', 'HeaderLines', 1); 0081 fclose(fID); 0082 clear customKcats 0083 customKcats.proteins = fileContent{1}; 0084 customKcats.genes = fileContent{2}; 0085 customKcats.gene_name = fileContent{3}; 0086 customKcats.kcat = fileContent{4}; 0087 customKcats.rxns = fileContent{5}; 0088 customKcats.notes = fileContent{6}; 0089 customKcats.stoicho = fileContent{7}; 0090 else 0091 error(['Cannot find file: ' customKcats]); 0092 end 0093 0094 rxnToUpdate = false(length(model.ec.rxns),1); 0095 rxnNotMatch = false(length(model.ec.rxns),1); 0096 evaluatedRule = cell(length(model.ec.rxns),1); 0097 enzInModel = cell(length(model.ec.rxns),1); 0098 ecRxnNoSuffix = regexprep(model.ec.rxns,'_EXP_\d+$',''); 0099 0100 % Implementation for full GECKO formulation 0101 if ~model.ec.geckoLight 0102 for i = 1:numel(customKcats.proteins) 0103 if isempty(customKcats.proteins{i}) 0104 %If only reaction ID(s) is/are specified (and no proteins), 0105 %then apply the kcat to all isozymic reactions 0106 rxns = strtrim(strsplit(customKcats.rxns{i}, ',')); 0107 rxnIdxs = ismember(ecRxnNoSuffix,rxns); 0108 rxnToUpdate(rxnIdxs) = 1; 0109 model.ec.kcat(rxnIdxs) = customKcats.kcat(i); 0110 else 0111 prots = strtrim(strsplit(customKcats.proteins{i}, '+')); 0112 0113 % Find the index for the enzymes which kcat will be changed. In 0114 % case the protein is not in the model generate a warning. 0115 try 0116 enzIdx = cellfun(@(x) find(strcmpi(model.ec.enzymes, x)), prots); 0117 catch 0118 enzIdx = []; 0119 printOrange(['WARNING: Protein(s) ' customKcats.proteins{i} ' were not found in the model.']); 0120 end 0121 0122 % if not specific reactions are defined, find all the reaction 0123 % index where the enzyme is used 0124 if isempty(customKcats.rxns{i}) 0125 temp_rxnIdxs = arrayfun(@(x) find(model.ec.rxnEnzMat(:, x)), enzIdx, 'UniformOutput', false); 0126 % otherwhise, If a set of reactions if defined, only get the index for those 0127 % but ignore any _EXP_ suffixes 0128 else 0129 rxns = strtrim(strsplit(customKcats.rxns{i}, ',')); 0130 temp_rxnIdxs = arrayfun(@(x) find(strcmpi(ecRxnNoSuffix, x)), rxns, 'UniformOutput', false); 0131 end 0132 0133 if ~isempty(temp_rxnIdxs) 0134 rxnIdxs = []; 0135 for j = 1:numel(temp_rxnIdxs) 0136 rxnIdxs = [rxnIdxs; temp_rxnIdxs{j}]; 0137 end 0138 0139 % Check when multiple proteins are involved, since it can return same rxn n times 0140 rxnIdxs = unique(rxnIdxs); %unique(rxnIdxs{1, :}); 0141 0142 for j = 1:numel(rxnIdxs) 0143 % Get all the enzymes involved in the reaction 0144 allEnzInRxn = find(model.ec.rxnEnzMat(rxnIdxs(j),:)); 0145 0146 C = intersect(enzIdx, allEnzInRxn); 0147 0148 % Determine the match percentage bewteen the rules 0149 if numel(C) == numel(enzIdx) && numel(C) == numel(allEnzInRxn) 0150 match = 1; 0151 else 0152 if numel(enzIdx) < numel(allEnzInRxn) 0153 match = numel(C) / numel(allEnzInRxn); 0154 else 0155 match = numel(C) / numel(enzIdx); 0156 end 0157 end 0158 0159 % Update the kcat value stored in the model, if match 100%, 0160 % otherwhise if >= 0.5 inform to the user to be validated 0161 if match == 1 0162 rxnToUpdate(rxnIdxs(j)) = 1; 0163 model.ec.kcat(rxnIdxs(j)) = customKcats.kcat(i); 0164 0165 % Add note mentioning manual kcat change 0166 model.ec.source{rxnIdxs(j),1} = 'custom'; 0167 if isfield(customKcats,'notes') 0168 if isempty(model.ec.notes{rxnIdxs(j), 1}) && ~isempty(customKcats.notes{i}) 0169 model.ec.notes{rxnIdxs(j), 1} = customKcats.notes{i}; 0170 else 0171 model.ec.notes{rxnIdxs(j), 1} = [model.ec.notes{rxnIdxs(j), 1} ', ' customKcats.notes{i}]; 0172 end 0173 end 0174 elseif match >= 0.5 && match < 1 0175 rxnNotMatch(rxnIdxs(j)) = 1; 0176 evaluatedRule{rxnIdxs(j), 1} = customKcats.proteins{i}; 0177 enzInModel{rxnIdxs(j), 1} = strjoin(model.ec.enzymes(allEnzInRxn), ' + '); 0178 end 0179 end 0180 end 0181 end 0182 end 0183 0184 % Apply the new kcat values to the model 0185 if ~isempty(find(rxnToUpdate, 1)) 0186 model = applyKcatConstraints(model, rxnToUpdate); 0187 else 0188 printOrange('WARNING: No matches found. Consider checking the IDs or proteins in customKcats.'); 0189 end 0190 0191 rxnUpdated = model.ec.rxns(find(rxnToUpdate)); 0192 0193 % Remove from notMatch those reactions already updated 0194 remove = and(rxnToUpdate, rxnNotMatch); 0195 rxnNotMatch(remove) = 0; 0196 evaluatedRule(remove) = ''; 0197 enzInModel(remove) = ''; 0198 0199 idRxns = model.ec.rxns(find(rxnNotMatch)); 0200 fullIdx = cellfun(@(x) find(strcmpi(model.rxns, x)), idRxns); 0201 rxnsNames = model.rxnNames(fullIdx); 0202 evaluatedRule = evaluatedRule(~cellfun('isempty', evaluatedRule)); 0203 enzInModel = enzInModel(~cellfun('isempty', enzInModel)); 0204 rules = model.grRules(fullIdx); 0205 notMatch = table(idRxns, rxnsNames, evaluatedRule, enzInModel, rules, ... 0206 'VariableNames',{'rxns', 'name', 'custom enzymes', 'enzymes in model', 'rules'}); 0207 end 0208 end