Home > core > randomSampling.m

randomSampling

PURPOSE ^

randomSampling

SYNOPSIS ^

function [solutions, goodRxns]=randomSampling(model,nSamples,replaceBoundsWithInf,supressErrors,runParallel,goodRxns,minFlux)

DESCRIPTION ^

 randomSampling
   Performs random sampling of the solution space, as described in Bordel
   et al. (2010) PLOS Compt Biol (doi:10.1371/journal.pcbi.1000859).

 Input:
   model                   a model structure
   nSamples                the number of solutions to return
                           (optional, default 1000)
   replaceBoundsWithInf    replace the largest upper bounds with Inf and
                           the smallest lower bounds with -Inf. This is
                           needed in order to get solutions without loops
                           if your model has for example 1000/-1000 as
                           arbitary large bounds. If your model only has
                           "biologically relevant" bounds, then set this
                           to false (optional, default true)
   supressErrors           the program will halt if it has problems
                           finding non-zero solutions which are not
                           involved in loops. This could be because the
                           constraints on the model are too relaxed (such
                           as unlimited glucose uptake) or too strict
                           (such as too many and too narrow constraints)
                           (optional, default false)
   runParallel             speed up calculations by parallel processing.
                           Requires MATLAB Parallel Computing Toolbox. If
                           this is not installed, the calculations will
                           not be parallelized, regardless what is
                           indicated as runParallel. (optional, default
                           true)
   goodRxns                double vector of indexes of those reactions
                           that are not involved in loops and can be used
                           as random objective functions, as generated by
                           a previous run of randomSampling on the same
                           model (optional, default empty)
   minFlux                 determines if a second optimization should be
                           performed for each random sample, to minimize
                           the number of fluxes and thereby preventing
                           loops. Typically, loops are averaged out when a
                           large number of samples are taken, but this is
                           not always the case (optional, default false)

 Output:
   solutions               matrix with the solutions
   goodRxns                double vector of indexes of those reactions
                           that are not involved in loops or always carry
                           zero flux and can be used as random objective
                           functions

 Note: The solutions are generated by maximizing (with random weights) for
 a random set of three reactions. For reversible reactions it randomly
 chooses between maximizing and minimizing.

 If the model is a GECKO v3+ ecModel, then usage_prot reactions are not
 selected for sampling, instead focusing on sampling from the metabolic
 aspects that form the solution space.

 Usage: solutions = randomSampling(model, nSamples, replaceBoundsWithInf,...
                       supressErrors, runParallel, goodRxns, minFlux)

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [solutions, goodRxns]=randomSampling(model,nSamples,replaceBoundsWithInf,supressErrors,runParallel,goodRxns,minFlux)
0002 % randomSampling
0003 %   Performs random sampling of the solution space, as described in Bordel
0004 %   et al. (2010) PLOS Compt Biol (doi:10.1371/journal.pcbi.1000859).
0005 %
0006 % Input:
0007 %   model                   a model structure
0008 %   nSamples                the number of solutions to return
0009 %                           (optional, default 1000)
0010 %   replaceBoundsWithInf    replace the largest upper bounds with Inf and
0011 %                           the smallest lower bounds with -Inf. This is
0012 %                           needed in order to get solutions without loops
0013 %                           if your model has for example 1000/-1000 as
0014 %                           arbitary large bounds. If your model only has
0015 %                           "biologically relevant" bounds, then set this
0016 %                           to false (optional, default true)
0017 %   supressErrors           the program will halt if it has problems
0018 %                           finding non-zero solutions which are not
0019 %                           involved in loops. This could be because the
0020 %                           constraints on the model are too relaxed (such
0021 %                           as unlimited glucose uptake) or too strict
0022 %                           (such as too many and too narrow constraints)
0023 %                           (optional, default false)
0024 %   runParallel             speed up calculations by parallel processing.
0025 %                           Requires MATLAB Parallel Computing Toolbox. If
0026 %                           this is not installed, the calculations will
0027 %                           not be parallelized, regardless what is
0028 %                           indicated as runParallel. (optional, default
0029 %                           true)
0030 %   goodRxns                double vector of indexes of those reactions
0031 %                           that are not involved in loops and can be used
0032 %                           as random objective functions, as generated by
0033 %                           a previous run of randomSampling on the same
0034 %                           model (optional, default empty)
0035 %   minFlux                 determines if a second optimization should be
0036 %                           performed for each random sample, to minimize
0037 %                           the number of fluxes and thereby preventing
0038 %                           loops. Typically, loops are averaged out when a
0039 %                           large number of samples are taken, but this is
0040 %                           not always the case (optional, default false)
0041 %
0042 % Output:
0043 %   solutions               matrix with the solutions
0044 %   goodRxns                double vector of indexes of those reactions
0045 %                           that are not involved in loops or always carry
0046 %                           zero flux and can be used as random objective
0047 %                           functions
0048 %
0049 % Note: The solutions are generated by maximizing (with random weights) for
0050 % a random set of three reactions. For reversible reactions it randomly
0051 % chooses between maximizing and minimizing.
0052 %
0053 % If the model is a GECKO v3+ ecModel, then usage_prot reactions are not
0054 % selected for sampling, instead focusing on sampling from the metabolic
0055 % aspects that form the solution space.
0056 %
0057 % Usage: solutions = randomSampling(model, nSamples, replaceBoundsWithInf,...
0058 %                       supressErrors, runParallel, goodRxns, minFlux)
0059 
0060 if nargin<2 | isempty(nSamples)
0061     nSamples=1000;
0062 end
0063 if nargin<3 | isempty(replaceBoundsWithInf)
0064     replaceBoundsWithInf=true;
0065 end
0066 if nargin<4 | isempty(supressErrors)
0067     supressErrors=false;
0068 end
0069 if nargin<5 | isempty(runParallel)
0070     runParallel=true;
0071 end
0072 if nargin<7 | isempty(minFlux)
0073     minFlux=false;
0074 end
0075 
0076 [ps, oldPoolAutoCreateSetting] = parallelPoolRAVEN(runParallel);
0077 
0078 nObjRxns=2; %Number of reactions in the objective function in each iteration
0079 
0080 %Simplify the model to speed stuff up a little. Keep original mapping
0081 originalRxns=model.rxns;
0082 model=simplifyModel(model,false,false,true,true);
0083 
0084 %Then change the bounds to +/- Inf. This is needed in order to not have
0085 %loops in the solutions
0086 if replaceBoundsWithInf==true
0087     model.ub(model.ub==max(model.ub))=Inf;
0088     if min(model.lb)<0 % Only negative lower bounds should be set to -Inf
0089         model.lb(model.lb==min(model.lb))=-Inf;
0090     end
0091 end
0092 
0093 %Check that the model is feasible given the constraints
0094 [sol,~]=solveLP(model);
0095 if isempty(sol.x)
0096     EM='The model has no feasible solution, likely due to incompatible constraints';
0097     dispEM(EM);
0098 elseif sol.f==0 && showProgress
0099     warning('The model objective function cannot reach a non-zero value. This might be intended, so randomSampling will continue, but this could indicate problems with your model')
0100 end
0101 
0102 nRxns = numel(model.rxns);
0103 %Reactions which can be involved in loops should not be optimized for.
0104 %Check which reactions reach an arbitary high upper bound
0105 if nargin<6 || isempty(goodRxns)
0106     revRxns = transpose(find(model.lb < 0));
0107     fwdRxns = transpose(find(model.ub > 0));
0108     rxnList = [fwdRxns,revRxns];
0109     objList = [ones(numel(fwdRxns),1);-ones(numel(revRxns),1)];
0110     testSol = zeros(numel(objList),1);
0111     PB = ProgressBar2(nRxns,'Prepare goodRxns not involved in loops','cli');
0112 
0113     parfor i = 1:numel(objList)
0114         testModel=setParam(model,'obj',rxnList(i),objList(i));
0115         sol=solveLP(testModel,0);
0116         if ~isempty(sol.f)
0117             testSol(i) = sol.x(rxnList(i));
0118         end
0119         count(PB);
0120     end
0121     testSol  = testSol.*objList;
0122     goodRxns = true(nRxns,1);
0123     % Filter out reactions that can reach (-)1000 (= involved in loop)
0124     goodRxns(fwdRxns(testSol(1:numel(fwdRxns)) > 999)) = false;
0125     goodRxns(revRxns(testSol(numel(fwdRxns)+1:end) > 999)) = false;
0126     testSol(revRxns) = testSol(revRxns) + testSol(numel(fwdRxns)+1:end);
0127     % Filter out reactions that cannot carry flux
0128     goodRxns(testSol(1:nRxns) == 0) = false;
0129     % In ecModels do not sample from usage_prot reactions
0130     if isfield(model,'ec')
0131         usageRxns = startsWith(model.rxns,'usage_prot_');
0132         goodRxns(usageRxns) = false;
0133     end
0134     goodRxns = find(goodRxns);
0135 end
0136 
0137 %Reserve space for a solution matrix
0138 sols = zeros(numel(model.rxns),nSamples);
0139 sols = num2cell(sols,1);
0140 
0141 %Main loop
0142 if nSamples > 0
0143 
0144     PB = ProgressBar2(nSamples,'Performing random sampling','cli');
0145     parfor i=1:nSamples
0146         badSolutions = 1;
0147         tmpModel = model;
0148         while lt(0,badSolutions)
0149             rxns = randsample(numel(goodRxns),nObjRxns);
0150             tmpModel.c = zeros(numel(tmpModel.rxns),1);
0151             
0152             multipliers = randsample([-1 1],nObjRxns,true);
0153             multipliers(tmpModel.ub(goodRxns(rxns))<0) = 1;
0154             
0155             tmpModel.c(goodRxns(rxns)) = rand(nObjRxns,1).*multipliers;
0156 
0157             if true(minFlux)
0158                 sol=solveLP(tmpModel,1);
0159             else
0160                 sol=solveLP(tmpModel,0);
0161             end
0162             if any(sol.x) && abs(sol.f)>10^-8
0163                 sols{i} = sol.x;
0164                 badSolutions = 0;
0165             else
0166                 badSolutions = badSolutions+1;
0167                 %If it only finds bad solutions then throw an error.
0168                 if badSolutions == 100 && supressErrors==false
0169                     error('The program is having problems finding non-zero solutions (ignoring reactions that might be involved in loops). Review the constraints on your model. Set supressErrors to true to ignore this error');
0170                 end
0171             end
0172         end
0173         count(PB);
0174     end
0175 
0176     %Map to original model
0177     sols = cell2mat(sols);
0178     [~, I]=ismember(model.rxns,originalRxns);
0179     solutions=zeros(numel(originalRxns),nSamples);
0180     solutions(I,:)=sols;
0181 else
0182     solutions=[];
0183 end
0184 % Reset original Parallel setting
0185 ps.Pool.AutoCreate = oldPoolAutoCreateSetting;
0186 end
0187 
0188 %To use instead of the normal Matlab randsample function. This is in order
0189 %to not depend on the Matlab statistical toolbox.
0190 function I=randsample(n,k,replacement)
0191 if nargin<3
0192     replacement=false;
0193 end
0194 %n can be a integer, which leads to I being sampled from 1:n, or it can be
0195 %a population to sample from.
0196 if numel(n)==1 && isnumeric(n)
0197     n=1:n;
0198 end
0199 %Loop and get random numbers until the list is unique. This is only a good
0200 %option is the number of samples is small compared to the population. There
0201 %are several checks that should be made here, for example regarding size
0202 %and that the number of samples is <=population size if replacement==false.
0203 %This is not the case in randomSampling, so such checks are ignored
0204 while true
0205     J=randi(numel(n),[k,1]);
0206     if replacement==true || numel(J)==numel(unique(J))
0207         I=n(J);
0208         break;
0209     end
0210 end
0211 I=I(:);
0212 end

Generated by m2html © 2005