randomSampling Performs random sampling of the solution space, as described in Bordel et al. (2010) PLOS Compt Biol (doi:10.1371/journal.pcbi.1000859). Input: model a model structure nSamples the number of solutions to return (optional, default 1000) replaceBoundsWithInf replace the largest upper bounds with Inf and the smallest lower bounds with -Inf. This is needed in order to get solutions without loops if your model has for example 1000/-1000 as arbitary large bounds. If your model only has "biologically relevant" bounds, then set this to false (optional, default true) supressErrors the program will halt if it has problems finding non-zero solutions which are not involved in loops. This could be because the constraints on the model are too relaxed (such as unlimited glucose uptake) or too strict (such as too many and too narrow constraints) (optional, default false) runParallel speed up calculations by parallel processing. Requires MATLAB Parallel Computing Toolbox. If this is not installed, the calculations will not be parallelized, regardless what is indicated as runParallel. (optional, default true) goodRxns double vector of indexes of those reactions that are not involved in loops and can be used as random objective functions, as generated by a previous run of randomSampling on the same model (optional, default empty) minFlux determines if a second optimization should be performed for each random sample, to minimize the number of fluxes and thereby preventing loops. Typically, loops are averaged out when a large number of samples are taken, but this is not always the case (optional, default false) Output: solutions matrix with the solutions goodRxns double vector of indexes of those reactions that are not involved in loops or always carry zero flux and can be used as random objective functions Note: The solutions are generated by maximizing (with random weights) for a random set of three reactions. For reversible reactions it randomly chooses between maximizing and minimizing. If the model is a GECKO v3+ ecModel, then usage_prot reactions are not selected for sampling, instead focusing on sampling from the metabolic aspects that form the solution space. Usage: solutions = randomSampling(model, nSamples, replaceBoundsWithInf,... supressErrors, runParallel, goodRxns, minFlux)
0001 function [solutions, goodRxns]=randomSampling(model,nSamples,replaceBoundsWithInf,supressErrors,runParallel,goodRxns,minFlux) 0002 % randomSampling 0003 % Performs random sampling of the solution space, as described in Bordel 0004 % et al. (2010) PLOS Compt Biol (doi:10.1371/journal.pcbi.1000859). 0005 % 0006 % Input: 0007 % model a model structure 0008 % nSamples the number of solutions to return 0009 % (optional, default 1000) 0010 % replaceBoundsWithInf replace the largest upper bounds with Inf and 0011 % the smallest lower bounds with -Inf. This is 0012 % needed in order to get solutions without loops 0013 % if your model has for example 1000/-1000 as 0014 % arbitary large bounds. If your model only has 0015 % "biologically relevant" bounds, then set this 0016 % to false (optional, default true) 0017 % supressErrors the program will halt if it has problems 0018 % finding non-zero solutions which are not 0019 % involved in loops. This could be because the 0020 % constraints on the model are too relaxed (such 0021 % as unlimited glucose uptake) or too strict 0022 % (such as too many and too narrow constraints) 0023 % (optional, default false) 0024 % runParallel speed up calculations by parallel processing. 0025 % Requires MATLAB Parallel Computing Toolbox. If 0026 % this is not installed, the calculations will 0027 % not be parallelized, regardless what is 0028 % indicated as runParallel. (optional, default 0029 % true) 0030 % goodRxns double vector of indexes of those reactions 0031 % that are not involved in loops and can be used 0032 % as random objective functions, as generated by 0033 % a previous run of randomSampling on the same 0034 % model (optional, default empty) 0035 % minFlux determines if a second optimization should be 0036 % performed for each random sample, to minimize 0037 % the number of fluxes and thereby preventing 0038 % loops. Typically, loops are averaged out when a 0039 % large number of samples are taken, but this is 0040 % not always the case (optional, default false) 0041 % 0042 % Output: 0043 % solutions matrix with the solutions 0044 % goodRxns double vector of indexes of those reactions 0045 % that are not involved in loops or always carry 0046 % zero flux and can be used as random objective 0047 % functions 0048 % 0049 % Note: The solutions are generated by maximizing (with random weights) for 0050 % a random set of three reactions. For reversible reactions it randomly 0051 % chooses between maximizing and minimizing. 0052 % 0053 % If the model is a GECKO v3+ ecModel, then usage_prot reactions are not 0054 % selected for sampling, instead focusing on sampling from the metabolic 0055 % aspects that form the solution space. 0056 % 0057 % Usage: solutions = randomSampling(model, nSamples, replaceBoundsWithInf,... 0058 % supressErrors, runParallel, goodRxns, minFlux) 0059 0060 if nargin<2 | isempty(nSamples) 0061 nSamples=1000; 0062 end 0063 if nargin<3 | isempty(replaceBoundsWithInf) 0064 replaceBoundsWithInf=true; 0065 end 0066 if nargin<4 | isempty(supressErrors) 0067 supressErrors=false; 0068 end 0069 if nargin<5 | isempty(runParallel) 0070 runParallel=true; 0071 end 0072 if nargin<7 | isempty(minFlux) 0073 minFlux=false; 0074 end 0075 0076 [ps, oldPoolAutoCreateSetting] = parallelPoolRAVEN(runParallel); 0077 0078 nObjRxns=2; %Number of reactions in the objective function in each iteration 0079 0080 %Simplify the model to speed stuff up a little. Keep original mapping 0081 originalRxns=model.rxns; 0082 model=simplifyModel(model,false,false,true,true); 0083 0084 %Then change the bounds to +/- Inf. This is needed in order to not have 0085 %loops in the solutions 0086 if replaceBoundsWithInf==true 0087 model.ub(model.ub==max(model.ub))=Inf; 0088 if min(model.lb)<0 % Only negative lower bounds should be set to -Inf 0089 model.lb(model.lb==min(model.lb))=-Inf; 0090 end 0091 end 0092 0093 %Check that the model is feasible given the constraints 0094 [sol,~]=solveLP(model); 0095 if isempty(sol.x) 0096 EM='The model has no feasible solution, likely due to incompatible constraints'; 0097 dispEM(EM); 0098 elseif sol.f==0 && showProgress 0099 warning('The model objective function cannot reach a non-zero value. This might be intended, so randomSampling will continue, but this could indicate problems with your model') 0100 end 0101 0102 nRxns = numel(model.rxns); 0103 %Reactions which can be involved in loops should not be optimized for. 0104 %Check which reactions reach an arbitary high upper bound 0105 if nargin<6 || isempty(goodRxns) 0106 revRxns = transpose(find(model.lb < 0)); 0107 fwdRxns = transpose(find(model.ub > 0)); 0108 rxnList = [fwdRxns,revRxns]; 0109 objList = [ones(numel(fwdRxns),1);-ones(numel(revRxns),1)]; 0110 testSol = zeros(numel(objList),1); 0111 PB = ProgressBar2(nRxns,'Prepare goodRxns not involved in loops','cli'); 0112 0113 parfor i = 1:numel(objList) 0114 testModel=setParam(model,'obj',rxnList(i),objList(i)); 0115 sol=solveLP(testModel,0); 0116 if ~isempty(sol.f) 0117 testSol(i) = sol.x(rxnList(i)); 0118 end 0119 count(PB); 0120 end 0121 testSol = testSol.*objList; 0122 goodRxns = true(nRxns,1); 0123 % Filter out reactions that can reach (-)1000 (= involved in loop) 0124 goodRxns(fwdRxns(testSol(1:numel(fwdRxns)) > 999)) = false; 0125 goodRxns(revRxns(testSol(numel(fwdRxns)+1:end) > 999)) = false; 0126 testSol(revRxns) = testSol(revRxns) + testSol(numel(fwdRxns)+1:end); 0127 % Filter out reactions that cannot carry flux 0128 goodRxns(testSol(1:nRxns) == 0) = false; 0129 % In ecModels do not sample from usage_prot reactions 0130 if isfield(model,'ec') 0131 usageRxns = startsWith(model.rxns,'usage_prot_'); 0132 goodRxns(usageRxns) = false; 0133 end 0134 goodRxns = find(goodRxns); 0135 end 0136 0137 %Reserve space for a solution matrix 0138 sols = zeros(numel(model.rxns),nSamples); 0139 sols = num2cell(sols,1); 0140 0141 %Main loop 0142 if nSamples > 0 0143 0144 PB = ProgressBar2(nSamples,'Performing random sampling','cli'); 0145 parfor i=1:nSamples 0146 badSolutions = 1; 0147 tmpModel = model; 0148 while lt(0,badSolutions) 0149 rxns = randsample(numel(goodRxns),nObjRxns); 0150 tmpModel.c = zeros(numel(tmpModel.rxns),1); 0151 0152 multipliers = randsample([-1 1],nObjRxns,true); 0153 multipliers(tmpModel.ub(goodRxns(rxns))<0) = 1; 0154 0155 tmpModel.c(goodRxns(rxns)) = rand(nObjRxns,1).*multipliers; 0156 0157 if true(minFlux) 0158 sol=solveLP(tmpModel,1); 0159 else 0160 sol=solveLP(tmpModel,0); 0161 end 0162 if any(sol.x) && abs(sol.f)>10^-8 0163 sols{i} = sol.x; 0164 badSolutions = 0; 0165 else 0166 badSolutions = badSolutions+1; 0167 %If it only finds bad solutions then throw an error. 0168 if badSolutions == 100 && supressErrors==false 0169 error('The program is having problems finding non-zero solutions (ignoring reactions that might be involved in loops). Review the constraints on your model. Set supressErrors to true to ignore this error'); 0170 end 0171 end 0172 end 0173 count(PB); 0174 end 0175 0176 %Map to original model 0177 sols = cell2mat(sols); 0178 [~, I]=ismember(model.rxns,originalRxns); 0179 solutions=zeros(numel(originalRxns),nSamples); 0180 solutions(I,:)=sols; 0181 else 0182 solutions=[]; 0183 end 0184 % Reset original Parallel setting 0185 ps.Pool.AutoCreate = oldPoolAutoCreateSetting; 0186 end 0187 0188 %To use instead of the normal Matlab randsample function. This is in order 0189 %to not depend on the Matlab statistical toolbox. 0190 function I=randsample(n,k,replacement) 0191 if nargin<3 0192 replacement=false; 0193 end 0194 %n can be a integer, which leads to I being sampled from 1:n, or it can be 0195 %a population to sample from. 0196 if numel(n)==1 && isnumeric(n) 0197 n=1:n; 0198 end 0199 %Loop and get random numbers until the list is unique. This is only a good 0200 %option is the number of samples is small compared to the population. There 0201 %are several checks that should be made here, for example regarding size 0202 %and that the number of samples is <=population size if replacement==false. 0203 %This is not the case in randomSampling, so such checks are ignored 0204 while true 0205 J=randi(numel(n),[k,1]); 0206 if replacement==true || numel(J)==numel(unique(J)) 0207 I=n(J); 0208 break; 0209 end 0210 end 0211 I=I(:); 0212 end