removeLowScoreGenes Remove low-scoring genes from model. This function removes genes from a model based on their scores, a step used by the tINIT package. The function recognizes and differentiates between isozymes and subunits of an enzyme complex. Genes are removed from each grRule, subject to the following conditions: 1) At least one gene must remain associated with the reaction 2) Genes involved in a complex (joined by ANDs) are not removed Usage: [newModel,remGenes] = removeLowScoreGenes(model,geneScores,complexScoring); Inputs: model Model structure from which genes are to be removed. geneScores A vector of scores associated with the model genes. Genes with a positive score will remain in the model, whereas genes with a negative score will try to be removed. If all genes associated with a reaction have a negative score, then the least-negative gene will remain; if there is a tie, one will be selected at random. If a negative-scoring gene is a subunit in a complex, it will not be removed; however, the entire complex may be removed. See the following example cases: Original: G1 or (G2 and G3 and G4) Negative: G1, G2 New: G2 and G3 and G4 Original: G1 or (G2 and G3) or (G4 and G5) Negative: G1, G2 New: G4 and G5 [using default complexScoring] Original: (G1 and (G2 or G3) and G4) Negative: G2 New: (G1 and G3 and G4) isozymeScoring Method used to combine the scores of multiple isozymes; 'min', 'max', 'median', or 'average'. (Opt, default 'max') complexScoring Method used to combine the scores of enzyme complex subunits: 'min', 'max', 'median', or 'average'. (Opt, default 'min') Outputs: newModel Model with updated gene, grRules, and rxnGeneMat fields after attemping to remove negative-score genes. remGenes A list of negative-score genes that were fully removed from the model. Negative-score genes that were removed from some grRules but not all will not be included in this list.
0001 function [newModel,remGenes] = removeLowScoreGenes(model,geneScores,isozymeScoring,complexScoring) 0002 %removeLowScoreGenes Remove low-scoring genes from model. 0003 % 0004 % This function removes genes from a model based on their scores, a step 0005 % used by the tINIT package. The function recognizes and differentiates 0006 % between isozymes and subunits of an enzyme complex. Genes are removed 0007 % from each grRule, subject to the following conditions: 0008 % 1) At least one gene must remain associated with the reaction 0009 % 2) Genes involved in a complex (joined by ANDs) are not removed 0010 % 0011 % Usage: 0012 % 0013 % [newModel,remGenes] = removeLowScoreGenes(model,geneScores,complexScoring); 0014 % 0015 % Inputs: 0016 % 0017 % model Model structure from which genes are to be removed. 0018 % 0019 % geneScores A vector of scores associated with the model genes. 0020 % Genes with a positive score will remain in the model, 0021 % whereas genes with a negative score will try to be 0022 % removed. 0023 % 0024 % If all genes associated with a reaction have a negative 0025 % score, then the least-negative gene will remain; if 0026 % there is a tie, one will be selected at random. 0027 % 0028 % If a negative-scoring gene is a subunit in a complex, 0029 % it will not be removed; however, the entire complex may 0030 % be removed. See the following example cases: 0031 % 0032 % Original: G1 or (G2 and G3 and G4) 0033 % Negative: G1, G2 0034 % New: G2 and G3 and G4 0035 % 0036 % Original: G1 or (G2 and G3) or (G4 and G5) 0037 % Negative: G1, G2 0038 % New: G4 and G5 [using default complexScoring] 0039 % 0040 % Original: (G1 and (G2 or G3) and G4) 0041 % Negative: G2 0042 % New: (G1 and G3 and G4) 0043 % 0044 % isozymeScoring Method used to combine the scores of multiple isozymes; 0045 % 'min', 'max', 'median', or 'average'. 0046 % (Opt, default 'max') 0047 % 0048 % complexScoring Method used to combine the scores of enzyme complex 0049 % subunits: 'min', 'max', 'median', or 'average'. 0050 % (Opt, default 'min') 0051 % 0052 % Outputs: 0053 % 0054 % newModel Model with updated gene, grRules, and rxnGeneMat fields 0055 % after attemping to remove negative-score genes. 0056 % 0057 % remGenes A list of negative-score genes that were fully removed 0058 % from the model. Negative-score genes that were removed 0059 % from some grRules but not all will not be included in 0060 % this list. 0061 % 0062 0063 0064 if nargin < 3 || isempty(isozymeScoring) 0065 isozymeScoring = 'max'; 0066 end 0067 if nargin < 4 0068 complexScoring = 'min'; 0069 end 0070 0071 if ~isequal(size(model.genes),size(geneScores)) 0072 error('The dimensions of model genes and geneScores do not match.'); 0073 end 0074 0075 % convert logical operators to symbols 0076 grRules = model.grRules; 0077 if any(contains(grRules,{'&','|'})) 0078 symbolicRules = true; 0079 else 0080 symbolicRules = false; 0081 end 0082 grRules = regexprep(grRules,' and ',' & '); 0083 grRules = regexprep(grRules,' or ',' | '); 0084 0085 % get unique list of grRules 0086 [uRules,~,rule_ind] = unique(grRules); 0087 0088 % iterate through each rule 0089 newRules = uRules; %initialize newRules 0090 for i = 1:numel(uRules) 0091 if isempty(uRules{i}) || ~contains(uRules{i},'|') 0092 continue 0093 elseif contains(uRules{i},'&') 0094 newRules{i} = processComplexRule(uRules{i},model.genes,geneScores,isozymeScoring,complexScoring); 0095 else 0096 newRules{i} = processSimpleRule(uRules{i},model.genes,geneScores,isozymeScoring,complexScoring); 0097 end 0098 end 0099 0100 % re-map unique rules to model 0101 newModel = model; 0102 newModel.grRules = newRules(rule_ind); 0103 0104 % restore original logical operator formatting if it was changed 0105 if ~symbolicRules 0106 newModel.grRules = regexprep(newModel.grRules, ' \| ', ' or '); 0107 newModel.grRules = regexprep(newModel.grRules, ' & ', ' and '); 0108 end 0109 0110 % regenerate "genes" and "rxnGeneMat" model fields 0111 [genes,rxnGeneMat] = getGenesFromGrRules(newModel.grRules); 0112 newModel.genes = genes; 0113 newModel.rxnGeneMat = rxnGeneMat; 0114 0115 % update other gene-related fields 0116 remInd = ~ismember(model.genes,newModel.genes); 0117 remGenes = model.genes(remInd); 0118 0119 if isfield(newModel,'geneShortNames') 0120 newModel.geneShortNames(remInd) = []; 0121 end 0122 if isfield(newModel,'geneMiriams') 0123 newModel.geneMiriams(remInd) = []; 0124 end 0125 if isfield(newModel,'geneFrom') 0126 newModel.geneFrom(remInd) = []; 0127 end 0128 if isfield(newModel,'geneComps') 0129 newModel.geneComps(remInd) = []; 0130 end 0131 0132 0133 end 0134 0135 0136 0137 function [updatedRule,rScore] = processSimpleRule(rule,genes,gScores,isozymeScoring,complexScoring) 0138 % Either score or modify a reaction gene rule containig only ANDs or ORs. 0139 % 0140 % If the rule contains an enzyme complex (all ANDs), the complex will be 0141 % scored based on the score of its subunits. Subunits without a score (NaN) 0142 % will be excluded from the score calculation. 0143 % 0144 % If the rule contains only isozymes (all ORs), the negative-score genes 0145 % will be removed from the rule. Isozymes without a score (NaN) will not be 0146 % removed from the rule. The resuling rule will then be scored. 0147 0148 0149 % get IDs and indices of genes involved in rule 0150 ruleGenes = unique(regexp(rule,'[^&|\(\) ]+','match')); 0151 [~,geneInd] = ismember(ruleGenes,genes); 0152 0153 % rules with one or no genes remain unchanged 0154 if numel(ruleGenes) < 2 0155 rScore = gScores(geneInd); 0156 updatedRule = rule; 0157 return 0158 end 0159 0160 if ~contains(rule,'&') % rule contains isozymes 0161 0162 scoreMethod = isozymeScoring; 0163 negInd = gScores(geneInd) < 0; % NaNs will return false here 0164 if all(negInd) 0165 % get the least negative gene, adding a small random value to avoid a tie 0166 [~,maxInd] = max(gScores(geneInd) + rand(size(geneInd))*(1e-8)); 0167 updatedRule = ruleGenes{maxInd}; 0168 elseif sum(~negInd) == 1 0169 updatedRule = ruleGenes{~negInd}; 0170 else 0171 updatedRule = strjoin(ruleGenes(~negInd),' | '); 0172 if startsWith(rule,'(') 0173 updatedRule = ['(',updatedRule,')']; 0174 end 0175 end 0176 0177 % update ruleGenes and their indices 0178 ruleGenes = unique(regexp(updatedRule,'[^&|\(\) ]+','match')); 0179 [~,geneInd] = ismember(ruleGenes,genes); 0180 0181 elseif ~contains(rule,'|') % rule contains enzyme complex 0182 scoreMethod = complexScoring; 0183 updatedRule = rule; 0184 else 0185 error('This function cannot handle rules with both "OR" and "AND" expressions.'); 0186 end 0187 0188 % score rule 0189 switch lower(scoreMethod) 0190 case 'min' 0191 rScore = min(gScores(geneInd),[],'omitnan'); 0192 case 'max' 0193 rScore = max(gScores(geneInd),[],'omitnan'); 0194 case 'median' 0195 rScore = median(gScores(geneInd),'omitnan'); 0196 case 'average' 0197 rScore = mean(gScores(geneInd),'omitnan'); 0198 end 0199 0200 end 0201 0202 0203 0204 function updatedRule = processComplexRule(rule,genes,gScores,isozymeScoring,complexScoring) 0205 % Update reactions containing both AND and OR expressions. 0206 % 0207 % Negative-score genes will be removed if they are isozymic, whereas they 0208 % will not be removed if they are part of an enzyme complex. However, if 0209 % the enzyme complex has a negative score, the entire complex will be 0210 % removed, as long as it is not the only remaining element in the rule. 0211 0212 0213 % Specify phrases to search for in the grRule. These phrases will find 0214 % genes grouped by all ANDs (first phrase) or all ORs (second phrase). 0215 search_phrases = {'\([^&|\(\) ]+( & [^&|\(\) ]+)+\)', '\([^&|\(\) ]+( \| [^&|\(\) ]+)+\)'}; 0216 0217 % initialize some variables 0218 subsets = {}; % subsets are groups of genes grouped by all ANDs or all ORs 0219 c = 1; % counter to keep track of the group (subset) number 0220 r_orig = rule; % record original rule to determine when it stops changing 0221 for k = 1:100 % iterate some arbitrarily high number of times 0222 for j = 1:length(search_phrases) 0223 new_subset = regexp(rule,search_phrases{j},'match')'; % extract subsets 0224 if ~isempty(new_subset) 0225 subsets = [subsets; new_subset]; % append to list of subsets 0226 subset_nums = arrayfun(@num2str,(c:length(subsets))','UniformOutput',false); % get group numbers to be assigned to the new subsets, and convert to strings 0227 rule = regexprep(rule,search_phrases{j},strcat('#',subset_nums,'#'),'once'); % replace the subsets in the expression with their group numbers (enclosed by "#"s) 0228 c = c + length(new_subset); 0229 end 0230 end 0231 if isequal(rule,r_orig) 0232 break; % stop iterating when rule stops changing 0233 else 0234 r_orig = rule; 0235 end 0236 end 0237 subsets{end+1} = rule; % add final state of rule as the last subset 0238 0239 % score and update each subset, and append to gene list and gene scores 0240 for i = 1:numel(subsets) 0241 [subsets{i},subset_score] = processSimpleRule(subsets{i},genes,gScores,isozymeScoring,complexScoring); 0242 gScores = [gScores; subset_score]; 0243 genes = [genes; {strcat('#',num2str(i),'#')}]; 0244 end 0245 0246 % reconstruct the rule from its updated subsets 0247 updatedRule = subsets{end}; 0248 for i = c-1:-1:1 0249 updatedRule = regexprep(updatedRule,strcat('#',num2str(i),'#'),subsets{i}); 0250 end 0251 0252 end 0253 0254 0255