Description of ftINIT

0001 function [model, metProduction, addedRxnsForTasks, deletedRxnsInINIT, fullMipRes] = ftINIT(prepData, tissue, celltype, hpaData, transcrData, metabolomicsData, INITSteps, removeGenes, useScoresForTasks, paramsFT, verbose)
0002 % ftINIT
0003 %   Main function for generates a model using the ftINIT algorithm, based
0004 %   on proteomics and/or transcriptomics and/or metabolomics and/or metabolic
0005 %   tasks. The algorithm is not designed for running with metabolomics only.
0006 %   The function prepINITModel needs to be run first for the template
0007 %   model (such as the generic Human-GEM), but only need to be run once. This
0008 %   function precalculates things independent of the omics to speed up the model
0009 %   generation process, and outputs the prepData, which is input to this function.
0010 %
0011 %   prepData            The prepdata for the model.
0012 %   tissue              tissue to score for. Should exist in either
0013 %                       hpaData.tissues or transcrData.tissues
0014 %   celltype            cell type to score for. Should exist in either
0015 %                       hpaData.celltypes or transcrData.celltypes for this
0016 %                       tissue (optional, default is to use the max values
0017 %                       among all the cell types for the tissue.
0018 %   hpaData             HPA data structure from parseHPA (optional if transcrData
0019 %                       is supplied, default [])
0020 %   transcrData         gene expression data structure (optional if hpaData is
0021 %                       supplied, default []). Used to be called arrayData.
0022 %       genes           cell array with the unique gene names
0023 %       tissues         cell array with the tissue names. The list may not
0024 %                       be unique, as there can be multiple cell types per
0025 %                       tissue.
0026 %       celltypes       cell array with the cell type names for each tissue
0027 %       levels          GENESxTISSUES array with the expression level for
0028 %                       each gene in each tissue/celltype. NaN should be
0029 %                       used when no measurement was performed
0030 %       threshold       a single value or a vector of gene expression
0031 %                       thresholds, above which genes are considered to be
0032 %                       "expressed". default = 1(optional, by default, the mean expression
0033 %                       levels of each gene across all tissues in transcrData
0034 %                       will be used as the threshold values)
0035 %       singleCells     binary value selecting whether to use the
0036 %                       single-cell algorithm to identify expressed genes.
0037 %                       If used, specify cell subpopulations in CELLTYPES
0038 %                       (optional, default [])
0039 %       plotResults     true if single cell probability distributions
0040 %                       should be plotted (optional, default = false)
0041 %   metabolomicsData    cell array with metabolite names that the model
0042 %                       should produce (optional, default [])
0043 %   INITSteps           Specifies the steps in the algorithm. For more info,
0044 %                       see INITStepDesc and getINITSteps.
0045 %                       (optional, default getINITSteps(), which is the standard ftINIT).
0046 %   removeGenes         if true, low-abundance genes will be removed from
0047 %                       grRules, unless they are the only gene associated
0048 %                       with a reaction, or a subunit of an enzyme complex
0049 %                       (see "removeLowScoreGenes" function for details).
0050 %                       If false, grRules will not be modified; however,
0051 %                       genes that were associated only with removed
0052 %                       reactions will not be present in the final model.
0053 %                       (optional, default true).
0054 %   useScoresForTasks   true if the calculated reaction scored should be
0055 %                       used as weights when fitting to tasks (optional, default
0056 %                       true)
0057 %   paramsFT            parameter structure as used by getMILPParams. This
0058 %                       is for the fitTasks step. For the INIT algorithm,
0059 %                       see params (optional, default [])
0060 %   verbose             if true, the MILP progression will be shown.
0061 %                       (optional, default false)
0062 %
0063 %   model                   the resulting model structure
0064 %   metProduction           array that indicates which of the
0065 %                           metabolites in metabolomicsData that could be
0066 %                           produced. Note that this is before the
0067 %                           gap-filling process to enable defined tasks. To
0068 %                           see which metabolites that can be produced in
0069 %                           the final model, use canProduce.
0070 %                           -2: metabolite name not found in model
0071 %                           -1: metabolite found, but it could not be produced
0072 %                           1: metabolite could be produced
0073 %   addedRxnsForTasks       cell array of the reactions which were added in
0074 %                           order to perform the tasks
0075 %   deletedRxnsInINIT       cell array of reactions deleted because they
0076 %                           could not carry flux (INIT requires a
0077 %                           functional input model)
0078 %   fullMipRes              The solver results from the last MILP step run
0079 %
0080 %   This is the main function for automatic reconstruction of models based
0081 %   on the ftINIT algorithm ().
0082 %
0083 %   NOTE: Exchange metabolites should normally not be removed from the model
0084 %   when using this approach, since checkTasks/fitTasks rely on putting specific
0085 %   constraints for each task. The INIT algorithm will remove exchange metabolites
0086 %   if any are present. Use importModel(file,false) to import a model with
0087 %   exchange metabolites remaining.
0088 %
0089 % Usage: [model, metProduction, addedRxnsForTasks, deletedRxnsInINIT, ...
0090 %               fullMipRes] = ...
0091 %               ftINIT(prepData, tissue, celltype, hpaData, transcrData, ...
0092 %               metabolomicsData, INITSteps, removeGenes, useScoresForTasks, ...
0093 %               paramsFT);
0094 %
0095 
0096 
0097 if nargin < 5
0098     transcrData = [];
0099 end
0100 if nargin < 6
0101     metabolomicsData = [];
0102 end
0103 if nargin < 7 || isempty(INITSteps)
0104     INITSteps = getINITSteps([],'1+1');
0105 end
0106 if nargin < 8 || isempty(removeGenes)
0107     removeGenes = true;
0108 end
0109 if nargin < 9 || isempty(useScoresForTasks)
0110     useScoresForTasks = true;
0111 end
0112 if nargin < 10
0113     paramsFT = [];
0114 end
0115 
0116 if nargin < 11
0117     verbose = false;
0118 end
0119 %Handle detected mets:
0120 %Previously, this was handled by giving a bonus for secreting those metabolites,
0121 %but that doesn't work since the metabolite secretion and uptake can be lost when
0122 %we merge linearly dependent reactions.
0123 %Instead, we need to figure out which reactions either produce or take up the mets.
0124 %We then give a bonus if any of them carry flux.
0125 %To simplify things, we focus on reactions that produce the metabolite (since there must be one such reaction).
0126 %It is still a bit complicated though. In this step, we focus on identifying
0127 %producer reactions. We further reason that the direction doesn't matter -
0128 %we can force one of these reactions in any direction - if it becomes a consumer, it will
0129 %automatically force another producer on as well (otherwise we'll have a net consumption).
0130 
0131 if (~isempty(metabolomicsData))
0132     if length(unique(upper(metabolomicsData))) ~= length(metabolomicsData)
0133         dispEM('Metabolomics contains the same metabolite multiple times');
0134     end
0135     metData = false(numel(metabolomicsData), length(prepData.minModel.rxns)); %one row per metabolite that is a boolean vector
0136     for i=1:numel(metabolomicsData)
0137         %Get the matching mets
0138         metSel = ismember(upper(prepData.refModel.metNames),upper(metabolomicsData{i}));
0139         prodRxnsSel = any(prepData.refModel.S(metSel,:) > 0,1) | ... %direct producers
0140                      (any(prepData.refModel.S(metSel,:) < 0,1) & prepData.refModel.rev.'); %reversible reactions that are consumers
0141         %convert the production rxns from refModel to minModel
0142         prepData.groupIds;
0143         [~,ia,ib] = intersect(prepData.minModel.rxns,prepData.refModel.rxns);
0144         grpIdsMerged = nan(length(prepData.minModel.rxns),1);
0145         grpIdsMerged(ia) = prepData.groupIds(ib);
0146         
0147         groupIdsPos = unique(prepData.groupIds(prodRxnsSel));%gets all group ids which includes a production rxn
0148         groupIdsPos = groupIdsPos(groupIdsPos ~= 0);%remove the 0 id, it means there is no group
0149         %the other option is that there is a direct match between the rxn id in minModel and refModel:
0150         posRxns = prepData.refModel.rxns(prodRxnsSel);
0151         directMatch = ismember(prepData.minModel.rxns, posRxns).';
0152         
0153         metData(i,:) = ismember(grpIdsMerged, groupIdsPos).' | directMatch;
0154     end
0155     metData = sparse(metData);
0156 else
0157     metData = [];
0158 end
0159 
0160 % Get rxn scores and adapt them to the minimized model
0161 origRxnScores = scoreComplexModel(prepData.refModel,hpaData,transcrData,tissue,celltype);
0162 origRxnScores(origRxnScores > -0.1 & origRxnScores <= 0) = -0.1;%we don't want reaction scores that are exactly 0 (or close), this causes problems in the milp
0163 origRxnScores(origRxnScores < 0.1 & origRxnScores > 0) = 0.1;
0164 
0165 rxnsTurnedOn = false(length(prepData.minModel.rxns),1);
0166 fluxes = zeros(length(prepData.minModel.rxns),1);
0167 
0168 rxnsToIgnoreLastStep = [1;1;1;1;1;1;1;1];
0169 
0170 %We assume that all essential rxns are irrev - this is taken care of in
0171 %prepINITModel. We then use an initial flux "from last run" of 0.1 for all
0172 %reactions. This is used for knowing what flux should be forced through an
0173 %essential rxn.
0174 fluxes = ones(length(prepData.minModel.rxns), 1).*0.1;
0175 
0176 for initStep = 1:length(INITSteps)
0177     disp(['ftINIT: Running step ' num2str(initStep)])
0178     stp = INITSteps{initStep};
0179     
0180     if any ((rxnsToIgnoreLastStep - stp.RxnsToIgnoreMask) < 0)
0181         dispEM('RxnsToIgnoreMask may not cover rxns not covered in previous steps, but the other way around is fine.');
0182     end
0183     rxnsToIgnoreLastStep = stp.RxnsToIgnoreMask;
0184     
0185     mm = prepData.minModel;
0186     
0187     if (~isempty(stp.MetsToIgnore))
0188         if (~isempty(stp.MetsToIgnore.simpleMets))
0189             %Here, we remove simple metabolites that will not really affect the milp but
0190             %are very common in the S matrix. For example H2O, H+, etc.
0191             %It is also possible to leave compartments untouched, for example the i compartment in the mitochondria (for H+).
0192             metsToRem = ismember(mm.metNames,stp.MetsToIgnore.simpleMets.mets);
0193             compsToKeep = find(ismember(mm.comps, stp.MetsToIgnore.simpleMets.compsToKeep));
0194             metsToRem = metsToRem & ~ismember(mm.metComps, compsToKeep);
0195             mm.S(metsToRem,:) = 0;
0196         end    
0197     end
0198 
0199     %Set up the reaction scores and essential rxns
0200     rxnsToIgnore = getRxnsFromPattern(stp.RxnsToIgnoreMask, prepData);
0201     rxnScores = groupRxnScores(prepData.minModel, origRxnScores, prepData.refModel.rxns, prepData.groupIds, rxnsToIgnore);
0202 
0203     essentialRxns = prepData.essentialRxns;
0204     toRev = false(numel(mm.rxns),1);
0205     %Handle the results from previous steps ('ignore', 'exclude', 'essential')
0206     if strcmp(stp.HowToUsePrevResults, 'exclude')
0207         rxnScores(rxnsTurnedOn) = 0; %This is not used anymore in any step setup.
0208     elseif strcmp(stp.HowToUsePrevResults, 'essential')
0209         %Make all reversible reactions turned on in previous steps reversible
0210         %in the direction that they were previously carrying flux
0211         
0212         %first reverse the reactions that need to be reversed
0213         rev = mm.rev == 1;
0214         toRev = rxnsTurnedOn & rev & fluxes < 0;
0215         mm = reverseRxns(mm, mm.rxns(toRev));
0216         
0217         %Then make them irreversible
0218         mm.rev(rxnsTurnedOn) = 0;
0219         mm.lb(rxnsTurnedOn) = 0;
0220 
0221         essentialRxns = unique([prepData.essentialRxns;mm.rxns(rxnsTurnedOn)]);
0222     end
0223 
0224     
0225     mipGap = 1;
0226     first = true;
0227     success = false;
0228     fullMipRes = [];
0229     for rn = 1:length(stp.MILPParams)
0230         params = stp.MILPParams{rn};
0231         if ~isfield(params, 'MIPGap')
0232             params.MIPGap = 0.0004;
0233         end
0234         
0235         if ~isfield(params, 'TimeLimit')
0236             params.TimeLimit = 5000;
0237         end
0238         
0239         if ~first 
0240             %There is sometimes a problem with that the objective function becomes close to zero,
0241             %which leads to that a small percentage of that (which is the MIPGap sent in) is very small
0242             %and the MILP hence takes a lot of time to finish. We also therefore use an absolute MIP gap,
0243             %converted to a percentage using the last value of the objective function.
0244             params.MIPGap = min(max(params.MIPGap, stp.AbsMIPGaps{rn}/abs(lastObjVal)),1);
0245             params.seed = 1234;%use another seed, may work better
0246 
0247             if mipGap <= params.MIPGap
0248                 success = true;
0249                 break; %we're done - this will not happen the first time
0250             else
0251                 disp(['MipGap too high, trying with a different run. MipGap = ' num2str(mipGap) ' New MipGap Limit = ' num2str(params.MIPGap)])
0252             end
0253         end
0254         
0255         first = false;
0256         
0257         %now run the MILP
0258         try
0259             %The prodweight for metabolomics is currently set to 5 - 0.5 was default in the old version, which I deemed very small?
0260             %There could be a need to specify this somewhere in the call at some point.
0261             %This value has not been evaluated, but is assumed in the test cases - if changed, update the test case
0262             startVals = [];
0263             if ~isempty(fullMipRes)
0264                 startVals = fullMipRes.full;
0265             end
0266             [deletedRxnsInINIT1, metProduction,fullMipRes,rxnsTurnedOn1,fluxes1] = ftINITInternalAlg(mm,rxnScores,metData,essentialRxns,5,stp.AllowMetSecr,stp.PosRevOff,params, startVals, fluxes, verbose);
0267             %This is a bit tricky - since we reversed some reactions, those fluxes also need to be reversed
0268             fluxes1(toRev) = -fluxes1(toRev);
0269             
0270             mipGap = fullMipRes.mipgap;
0271             lastObjVal = fullMipRes.obj;
0272         catch e
0273             mipGap = Inf;
0274             lastObjVal = Inf; %we need to set something here, Inf leads to that this doesn't come into play
0275         end
0276         
0277         success = mipGap <= params.MIPGap;
0278     end
0279     
0280     if ~success
0281         dispEM(['Failed to find good enough solution within the time frame. MIPGap: ' num2str(mipGap)]);
0282     end
0283     
0284     %save the reactions turned on and their fluxes for the next step
0285     rxnsTurnedOn = rxnsTurnedOn | rxnsTurnedOn1.';
0286     %The fluxes are a bit tricky - what if they change direction between the steps?
0287     %The fluxes are used to determine the direction in which reactions are forced on
0288     %(to simplify the problem it is good if they are unidirectional).
0289     %We use the following strategy:
0290     %1. Use the fluxes from the most recent step.
0291     %2. If any flux is very low there (i.e. basically zero), use the flux from the previous steps
0292     %This could in theory cause problems, but seems to work well practically
0293     fluxesOld = fluxes;
0294     fluxes = fluxes1;
0295     %make sure that all reactions that are on actually has a flux - otherwise
0296     %things could go bad, since the flux will be set to essential in a random direction
0297     %This sometimes happens for rxns with negative score - let's just accept that.
0298     %if (sum(abs(fluxes1) < 10^-7 & rxnsTurnedOn))
0299     %    dispEM('There are rxns turned on without flux - this might cause problems');
0300     %end
0301     %fluxes(abs(fluxes1) < 10^-7) = fluxesOld(abs(fluxes1) < 10^-9);
0302 end
0303 
0304 
0305 %get the essential rxns
0306 essential = ismember(prepData.minModel.rxns,prepData.essentialRxns);
0307 %So, we only add reactions where the linearly merged scores are zero for all linearly dependent reactions
0308 % (this cannot happen by chance, taken care of in the function groupRxnScores)
0309 rxnsToIgn = rxnScores == 0; 
0310 deletedRxnsInINITSel = ~(rxnsTurnedOn | rxnsToIgn | essential);
0311 deletedRxnsInINIT = prepData.minModel.rxns(deletedRxnsInINITSel);
0312 
0313 %Here we need to figure out which original reactions (before the linear merge)
0314 %that were removed. These are all reactions with the same group ids as the removed reactions
0315 groupIdsRemoved = prepData.groupIds(ismember(prepData.refModel.rxns, deletedRxnsInINIT)); %can improve this slightly, use sel above
0316 groupIdsRemoved = groupIdsRemoved(groupIdsRemoved ~= 0);%zero means that the reaction was not grouped, all with zeros are not a group!
0317 rxnsToRem = union(prepData.refModel.rxns(ismember(prepData.groupIds,groupIdsRemoved)), deletedRxnsInINIT);%make a union here to include the ungrouped (unmerged) as well
0318 
0319 initModel = removeReactions(prepData.refModel,rxnsToRem,false,true);
0320 
0321 % remove metabolites separately to avoid removing those needed for tasks
0322 unusedMets = initModel.mets(all(initModel.S == 0,2));
0323 initModel = removeMets(initModel, setdiff(unusedMets, prepData.essentialMetsForTasks));
0324 
0325 %if printReport == true
0326 %    printScores(initModel,'INIT model statistics',hpaData,transcrData,tissue,celltype);
0327 %    printScores(removeReactions(cModel,setdiff(cModel.rxns,rxnsToRem),true,true),'Reactions deleted by INIT',hpaData,transcrData,tissue,celltype);
0328 %end
0329 
0330 %The full model has exchange reactions in it. ftINITFillGapsForAllTasks calls
0331 %ftINITFillGaps, which automatically removes exchange metabolites (because it
0332 %assumes that the reactions are constrained when appropriate). In this case the
0333 %uptakes/outputs are retrieved from the task sheet instead. To prevent
0334 %exchange reactions being used to fill gaps, they are deleted from the
0335 %reference model here.
0336 initModel.id = 'INITModel';
0337 
0338 %If gaps in the model should be filled using a task list
0339 if ~isempty(prepData.taskStruct)
0340     %Remove exchange reactions and reactions already included in the INIT
0341     %model
0342     %We changed strategy and instead include all rxns except the exchange rxns in the ref model
0343     %But we do keep the exchange rxns that are essential.
0344     %Let's test to remove all, that should work
0345     
0346     %At this stage the model is fully connected and most of the genes with
0347     %good scores should have been included. The final gap-filling should
0348     %take the scores of the genes into account, so that "rather bad"
0349     %reactions are preferred to "very bad" reactions. However, reactions
0350     %with positive scores will be included even if they are not connected
0351     %in the current formulation. Therefore, such reactions will have to be
0352     %assigned a small negative score instead.
0353     exchRxns = getExchangeRxns(prepData.refModel);
0354     refModelNoExc = removeReactions(prepData.refModelWithBM,exchRxns,false,true);
0355     exchRxns = getExchangeRxns(initModel);
0356     initModelNoExc = removeReactions(closeModel(initModel),exchRxns,false,true);
0357     
0358     if useScoresForTasks == true
0359         %map the rxn scores to the model without exchange rxns
0360         [~,ia,ib] = intersect(refModelNoExc.rxns,prepData.refModel.rxns);
0361         rxnScores2nd = NaN(length(refModelNoExc.rxns),1);
0362         rxnScores2nd(ia) = origRxnScores(ib);
0363         %all(rxnScores2nd == refRxnScores);%should be the same, ok!
0364         [outModel,addedRxnMat] = ftINITFillGapsForAllTasks(initModelNoExc,refModelNoExc,[],true,min(rxnScores2nd,-0.1),prepData.taskStruct,paramsFT,verbose);
0365     else
0366         [outModel,addedRxnMat] = ftINITFillGapsForAllTasks(initModelNoExc,refModelNoExc,[],true,[],prepData.taskStruct,paramsFT,verbose);
0367     end
0368     %if printReport == true
0369     %    printScores(outModel,'Functional model statistics',hpaData,transcrData,tissue,celltype);
0370     %    printScores(removeReactions(outModel,intersect(outModel.rxns,initModel.rxns),true,true),'Reactions added to perform the tasks',hpaData,transcrData,tissue,celltype);
0371     %end
0372     
0373     addedRxnsForTasks = refModelNoExc.rxns(any(addedRxnMat,2));
0374 else
0375     outModel = initModel;
0376     addedRxnMat = [];
0377     addedRxnsForTasks = {};
0378 end
0379 
0380 % The model can now perform all the tasks defined in the task list.
0381 model = outModel;
0382 
0383 
0384 % At this stage the model will contain some exchange reactions but probably
0385 % not all (and maybe zero). This can be inconvenient, so all exchange
0386 % reactions from the reference model are added, except for those which
0387 % involve metabolites that are not in the model.
0388 
0389 %Start from the original model, and just remove the reactions that are no longer there (and keep exchange rxns). The model we got out
0390 %from the problem is not complete, it doesn't have GRPs etc.
0391 %The logic below is a bit complicated. We identify the reactions that should be removed from the full model as
0392 %reactions that have been removed in the init model except the ones that were added back. In addition, we make
0393 %sure that no exchange rxns are removed - they can be removed in the init model if they were linearly merged with other
0394 %reactions that were decided to be removed from the model. We want to keep all exchange rxns to make sure the tasks can
0395 %be performed also without manipulating the b vector in the model (which is what is done in the gap-filling).
0396 exchRxns = getExchangeRxns(prepData.refModel);
0397 deletedRxnsInINIT = setdiff(prepData.refModel.rxns,union(union(initModel.rxns, addedRxnsForTasks), exchRxns));
0398 outModel = removeReactions(prepData.refModel, deletedRxnsInINIT, true); %we skip removing the genes for now, I'm not sure it is desirable
0399 
0400 % If requested, attempt to remove negative-score genes from the model,
0401 % depending on their role (isozyme or complex subunit) in each grRule.
0402 % See the "removeLowScoreGenes" function more more details, and to adjust
0403 % any default parameters therein.
0404 if ( removeGenes )
0405     [~, geneScores] = scoreComplexModel(outModel,hpaData,transcrData,tissue,celltype);
0406     outModel = removeLowScoreGenes(outModel,geneScores);
0407 end
0408 
0409 
0410 model = outModel;
0411 
0412 end
0413 
0414 %This is for printing a summary of a model
0415 function [rxnS, geneS] = printScores(model,name,hpaData,transcrData,tissue,celltype)
0416     [a, b] = scoreComplexModel(model,hpaData,transcrData,tissue,celltype);
0417     rxnS = mean(a);
0418     geneS = mean(b,'omitnan');
0419     fprintf([name ':\n']);
0420     fprintf(['\t' num2str(numel(model.rxns)) ' reactions, ' num2str(numel(model.genes)) ' genes\n']);
0421     fprintf(['\tMean reaction score: ' num2str(rxnS) '\n']);
0422     fprintf(['\tMean gene score: ' num2str(geneS) '\n']);
0423     fprintf(['\tReactions with positive scores: ' num2str(100*sum(a>0)/numel(a)) '%%\n\n']);
0424 end
0425 
0426 function rxnsToIgnore = getRxnsFromPattern(rxnsToIgnorePattern, prepData)
0427     rxnsToIgnore = false(length(prepData.toIgnoreExch),1);
0428     if rxnsToIgnorePattern(1) rxnsToIgnore = rxnsToIgnore | prepData.toIgnoreExch; end;
0429     if rxnsToIgnorePattern(2) rxnsToIgnore = rxnsToIgnore | prepData.toIgnoreImportRxns; end;
0430     if rxnsToIgnorePattern(3) rxnsToIgnore = rxnsToIgnore | prepData.toIgnoreSimpleTransp; end;
0431     if rxnsToIgnorePattern(4) rxnsToIgnore = rxnsToIgnore | prepData.toIgnoreAdvTransp; end;
0432     if rxnsToIgnorePattern(5) rxnsToIgnore = rxnsToIgnore | prepData.toIgnoreSpont; end;
0433     if rxnsToIgnorePattern(6) rxnsToIgnore = rxnsToIgnore | prepData.toIgnoreS; end;
0434     if rxnsToIgnorePattern(7) rxnsToIgnore = rxnsToIgnore | prepData.toIgnoreCustomRxns; end;
0435     if rxnsToIgnorePattern(8) rxnsToIgnore = rxnsToIgnore | prepData.toIgnoreAllWithoutGPRs; end;
0436 end
0437
ftINIT

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SUBFUNCTIONS

SOURCE CODE