


parseTaskList
Parses a task list file.
inputFile   a task list in either Excel (*.xlsx, with a sheet named
            TASKS holding all relevant content) or tab-delimited
            (*.txt) format. All rows that start with a non-empty cell
            are removed as comments, and the first remaining row is
            taken as the header row. The file may contain the following
            column headers:
            ID
                the only required header. Each task must have a
                unique id (string or numeric). A task can span several
                rows, but only its first row should have an id
            DESCRIPTION
                description of the task
            IN
                allowed input(s) for the task. Metabolite names
                should be in the form
                "model.metName[model.comps]". Several inputs
                can be delimited by ";", in which case the same
                bounds are used for all of them. To give inputs
                different bounds, use several rows for the task
            IN LB
                lower bound for the uptake of the metabolites in
                the row (optional, default 0, which corresponds to a
                minimal uptake of 0 units)
            IN UB
                upper bound for the uptake of the metabolites in
                the row (optional, default 1000, which corresponds to a
                maximal uptake of 1000 units)
            OUT
                allowed output(s) for the task (see IN)
            OUT LB
                lower bound for the production of the metabolites in
                the row (optional, default 0, which corresponds to a
                minimal production of 0 units)
            OUT UB
                upper bound for the production of the metabolites in
                the row (optional, default 1000, which corresponds to a
                maximal production of 1000 units)
            EQU
                equation to add (optional). The equation should be in
                the form "0.4 A + 2 B <=> (or =>) C" and the
                metabolites should be in the form
                "model.metName[model.comps]"
            EQU LB
                lower bound for the equation (optional, default -1000
                for reversible and 0 for irreversible)
            EQU UB
                upper bound for the equation (optional, default 1000)
            CHANGED RXN
                ID of a reaction whose bounds should be changed.
                Several IDs can be delimited by ";", in which case
                the same bounds are used for all of them. To give
                reactions different bounds, use several rows for the
                task
            CHANGED LB
                lower bound for the reaction
            CHANGED UB
                upper bound for the reaction
            SHOULD FAIL
                true if the correct behavior of the model is to
                not have a feasible solution given the constraints
                (optional, default false). The parser sets this flag
                for any non-empty cell, so leave the cell empty for
                false
            PRINT FLUX
                true if the function should print the corresponding
                flux distribution for a task. Can be useful for
                testing (optional, default false). As with SHOULD
                FAIL, any non-empty cell counts as true
            COMMENTS
                free-text comments on the task, stored in the
                comments field of taskStruct
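As an illustration of the layout described above, the following sketch writes a small tab-delimited task list with one task spanning two rows. The task, the metabolite names, the bounds and the temporary file location are all made up for the example; only the column headers are taken from the function. The first cell of every row is left empty because rows starting with a non-empty cell are removed.

```matlab
% Column headers, preceded by an empty first cell (rows whose first cell is
% non-empty are removed by the parser).
headers = {'', 'ID', 'DESCRIPTION', 'IN', 'IN LB', 'IN UB', 'OUT', ...
    'OUT LB', 'OUT UB', 'EQU', 'EQU LB', 'EQU UB', 'CHANGED RXN', ...
    'CHANGED LB', 'CHANGED UB', 'SHOULD FAIL', 'PRINT FLUX', 'COMMENTS'};
col = @(name) find(strcmp(headers, name));

% Hypothetical task "T1" spanning two rows; all cells start out empty.
rows = repmat({''}, 2, numel(headers));
rows{1, col('ID')}          = 'T1';
rows{1, col('DESCRIPTION')} = 'ATP production from glucose (made-up example)';
rows{1, col('IN')}          = 'glucose[s];O2[s]';  % same bounds for both
rows{1, col('OUT')}         = 'CO2[s];H2O[s]';
% Second row: no ID, so it adds one more constraint to T1.
rows{2, col('EQU')}         = 'ATP[c] + H2O[c] => ADP[c] + Pi[c] + H+[c]';
rows{2, col('EQU LB')}      = '1';

% Write header and task rows as tab-delimited text.
taskFile = [tempname() '.txt'];
allRows = [headers; rows];
fid = fopen(taskFile, 'w');
for r = 1:size(allRows, 1)
    fprintf(fid, '%s\n', strjoin(allRows(r, :), sprintf('\t')));
end
fclose(fid);

% Show the file that was written.
disp(fileread(taskFile));
```

The resulting file name can then be passed as inputFile.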
taskStruct  array of structures with the following fields
    id           the id of the task
    description  the description of the task
    shouldFail   true if the task should fail
    printFluxes  true if the fluxes should be printed
    comments     string with comments
    inputs       cell array with input metabolites (in the form
                 metName[comps])
    LBin         array with lower bounds on inputs (default 0)
    UBin         array with upper bounds on inputs (default 1000)
    outputs      cell array with output metabolites (in the form
                 metName[comps])
    LBout        array with lower bounds on outputs (default 0)
    UBout        array with upper bounds on outputs (default 1000)
    equations    cell array with equations (with mets in the form
                 metName[comps])
    LBequ        array with lower bounds on equations (default -1000
                 for reversible and 0 for irreversible)
    UBequ        array with upper bounds on equations (default 1000)
    changed      cell array with reactions to change bounds for
    LBrxn        array with lower bounds on changed reactions
    UBrxn        array with upper bounds on changed reactions
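To make the field layout concrete, here is a minimal sketch of how a single row could map onto some of these fields. It mirrors the splitting on ";" described for the IN column, but the row contents (task ID, metabolite names and bounds) are invented for illustration.

```matlab
% Hypothetical cell contents of one row of the task list.
inCell = 'glucose[s];O2[s]';
inLB   = 0;      % default lower bound on uptake
inUB   = 1000;   % default upper bound on uptake

% Split the ";"-delimited list; every metabolite gets the row's bounds.
inputs = regexp(inCell, ';', 'split');

task = struct();
task.id     = 'T1';
task.inputs = inputs(:);                          % column cell array
task.LBin   = ones(numel(inputs), 1) * inLB;
task.UBin   = ones(numel(inputs), 1) * inUB;

disp(task);
```

Each entry of inputs lines up with the entry at the same position in LBin and UBin, which is why a row with several metabolites yields several identical bounds.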
This function defines a set of tasks for a model to perform. Each task
is specified as a set of constraints on the model; if the resulting
problem is feasible, the task is considered successful.
In general, each row can contain one constraint on uptakes, one
constraint on outputs, one new equation, and one change of reaction
bounds. If more constraints are needed to define a task, several rows
can be used for it. To perform the tasks, use checkTasks or
fitTasks.
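The rule that only the first row of a task carries an ID can be sketched as follows. This is a vectorised illustration with made-up IDs, not the loop used in the function itself: a cumulative sum over the non-empty ID cells assigns every row to its task.

```matlab
% Hypothetical ID column after the header row has been removed.
% Empty cells mean "this row continues the task above".
ids = {'T1'; ''; ''; 'T2'; 'T3'; ''};

% A row starts a new task when its ID cell is non-empty.
startsTask = ~cellfun(@isempty, ids);

% The running count of task starts gives the task index of each row.
taskIndex = cumsum(startsTask);

% Number of rows (that is, sets of constraints) contributed to each task.
rowsPerTask = accumarray(taskIndex, 1);

disp([taskIndex'; 1:numel(ids)]);   % first line: task index, second: row
disp(rowsPerTask');
```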
NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]"
can be used as inputs or outputs in the same manner as normal
metabolites. This is a convenient way to, for example, allow excretion of
all metabolites in order to check whether it is the synthesis of some
metabolite that is limiting or the degradation of some byproduct. One
important difference is that only the upper bounds are used for these
general metabolites. That is, you can only state that uptake or excretion
is allowed, not that it is required. This avoids conflicts in which the
constraints for the general metabolites overwrite those of the real
ones.
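The sketch below only illustrates the rule that lower bounds are disregarded for the general metabolites. It is not taken from checkTasks or fitTasks; the exact-name matching, the example metabolites and the bounds are assumptions made for the example.

```matlab
% Hypothetical outputs of one task, with their bounds from the task list.
outputs = {'CO2[s]'; 'ALLMETS'; 'ALLMETSIN[c]'};
LBout   = [1; 5; 5];
UBout   = [1000; 1000; 10];

% Assumed matching rule: the literal name "ALLMETS", or a name starting
% with "ALLMETSIN[" (10 characters).
isGeneral = strcmp(outputs, 'ALLMETS') | strncmp(outputs, 'ALLMETSIN[', 10);

% For general metabolites only the upper bound matters, so excretion is
% allowed (up to UBout) but never required.
effectiveLB = LBout;
effectiveLB(isGeneral) = 0;

for k = 1:numel(outputs)
    fprintf('%-14s LB=%g UB=%g\n', outputs{k}, effectiveLB(k), UBout(k));
end
```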
Usage: taskStruct=parseTaskList(inputFile)

function taskStruct=parseTaskList(inputFile)
% parseTaskList
%   Parses a task list file.
%
%   inputFile       a task list in either Excel (*.xlsx, with a sheet named
%                   TASKS with all relevant content) or tab-delimited
%                   (*.txt) format. The file may contain the following
%                   column headers (note, all rows starting with a
%                   non-empty cell are removed. The first row after that
%                   is considered the headers):
%                   ID
%                       the only required header. Each task must have a
%                       unique id (string or numeric). Tasks can span multiple
%                       rows, only the first row in each task should have
%                       an id
%                   DESCRIPTION
%                       description of the task
%                   IN
%                       allowed input(s) for the task. Metabolite names
%                       should be on the form
%                       "model.metName[model.comps]". Several inputs
%                       can be delimited by ";". If so, then the same
%                       bounds are used for all inputs. If that is not
%                       wanted, then use several rows for the task
%                   IN LB
%                       lower bound for the uptake of the metabolites in
%                       the row (optional, default 0 which corresponds to a
%                       minimal uptake of 0 units)
%                   IN UB
%                       upper bound for the uptake of the metabolites in
%                       the row (optional, default 1000 which corresponds to a
%                       maximal uptake of 1000 units)
%                   OUT
%                       allowed output(s) for the task (see IN)
%                   OUT LB
%                       lower bound for the production of the metabolites in
%                       the row (optional, default 0 which corresponds to a
%                       minimal production of 0 units)
%                   OUT UB
%                       upper bound for the production of the metabolites in
%                       the row (optional, default 1000 which corresponds to a
%                       maximal production of 1000 units)
%                   EQU
%                       equation to add. The equation should be on the form
%                       "0.4 A + 2 B <=> (or =>) C" and the metabolites
%                       should be on the form
%                       "model.metName[model.comps]" (optional)
%                   EQU LB
%                       lower bound for the equation (optional, default -1000
%                       for reversible and 0 for irreversible)
%                   EQU UB
%                       upper bound for the equation (optional, default 1000)
%                   CHANGED RXN
%                       reaction ID for which to change the bounds for.
%                       Several IDs can be delimited by ";". If so,
%                       then the same bounds are used for all reactions. If
%                       that is not wanted, then use several rows for the task
%                   CHANGED LB
%                       lower bound for the reaction
%                   CHANGED UB
%                       upper bound for the reaction
%                   SHOULD FAIL
%                       true if the correct behavior of the model is to
%                       not have a feasible solution given the constraints
%                       (optional, default false)
%                   PRINT FLUX
%                       true if the function should print the corresponding
%                       flux distribution for a task. Can be useful for
%                       testing (optional, default false)
%
%   taskStruct      array of structures with the following fields
%       id              the id of the task
%       description     the description of the task
%       shouldFail      true if the task should fail
%       printFluxes     true if the fluxes should be printed
%       comments        string with comments
%       inputs          cell array with input metabolites (in the form metName[comps])
%       LBin            array with lower bounds on inputs (default, 0)
%       UBin            array with upper bounds on inputs (default, 1000)
%       outputs         cell array with output metabolites (in the form metName[comps])
%       LBout           array with lower bounds on outputs (default, 0)
%       UBout           array with upper bounds on outputs (default, 1000)
%       equations       cell array with equations (with mets in the form metName[comps])
%       LBequ           array with lower bounds on equations (default, -1000 for
%                       reversible and 0 for irreversible)
%       UBequ           array with upper bounds on equations (default, 1000)
%       changed         cell array with reactions to change bounds for
%       LBrxn           array with lower bounds on changed reactions
%       UBrxn           array with upper bounds on changed reactions
%
%   This function is used for defining a set of tasks for a model to
%   perform. The tasks are defined by defining constraints on the model,
%   and if the problem is feasible, then the task is considered successful.
%   In general, each row can contain one constraint on uptakes, one
%   constraint on outputs, one new equation, and one change of reaction
%   bounds. If more bounds are needed to define the task, then several rows
%   can be used for each task. To perform the task use checkTasks or
%   fitTasks.
%
%   NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]"
%   can be used as inputs or outputs in the similar manner to normal
%   metabolites. This is a convenient way to, for example, allow excretion of
%   all metabolites to check whether it's the synthesis of some metabolite
%   that is limiting or whether it's the degradation of some byproduct. One
%   important difference is that only the upper bounds are used for these general
%   metabolites. That is, you can only say that uptake or excretion is
%   allowed, not that it is required. This is to avoid conflicts where the
%   constraints for the general metabolites overwrite those of the real
%   ones.
%
% Usage: taskStruct=parseTaskList(inputFile)

if ~isfile(inputFile)
    error('Task list %s cannot be found',string(inputFile));
end

%Load the tasks file
convNumeric = false;
if strcmp(extractAfter(inputFile,strlength(inputFile) - 4), '.txt')
    %load from tab delimited text file
    fid = fopen(inputFile);
    %Need to read numeric columns as strings, this is converted further
    %down. If not, the titles would be lost.
    convNumeric = true;
    C = textscan(fid,'%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%*[^\n]', 'Delimiter', '\t');
    fclose(fid);
    raw = [C{:}];%unnest the cell array of cell arrays into a 2-dim cell array
else
    [raw,flag]=loadSheet(loadWorkbook(inputFile), 'TASKS');
    if flag~=0
        EM=['Could not load sheet "TASKS" from ' inputFile];
        dispEM(EM);
    end
end

%Remove all lines starting with "#" (or actually any character) and all
%empty columns
raw=cleanSheet(raw);

%Captions
columns={'ID';'DESCRIPTION';'IN';'IN LB';'IN UB';'OUT';'OUT LB';'OUT UB';'EQU';'EQU LB';'EQU UB';'CHANGED RXN';'CHANGED LB';'CHANGED UB';'SHOULD FAIL';'PRINT FLUX';'COMMENTS'};
%Match the columns
[I, colI]=ismember(columns,raw(1,:));

%If read from a text file, the numbers will be strings - fix that
if convNumeric % in theory, this if should not be needed, the code should do nothing if all are already numeric. But it is kept as a safeguard.
    numericColumns = [0 0 0 1 1 0 1 1 0 1 1 0 1 1 0 0 0] == 1;
    cols = colI(numericColumns);
    numeric = cellfun(@isnumeric,raw(:,cols));
    %trick to avoid messing up the title row:
    numeric(1,:) = 1;
    for colind = 1:numel(cols)
        col = cols(colind);
        raw(~numeric(:,colind),col) = cellfun(@str2num, raw(~numeric(:,colind),col), 'UniformOutput', false);
    end
end

%Check that the ID field is present
if I(1)==0
    EM='The TASKS sheet must have a column named ID';
    dispEM(EM);
end

%make sure numerical fields are converted from strings

%Add default bounds where needed
for i=[4 5 7 8]
    I=cellfun(@isempty,raw(:,colI(i)));
    if i==5 || i==8
        raw(I,colI(i))={1000};
    else
        raw(I,colI(i))={0};
    end
end

%Create an empty task structure
eTask.id='';
eTask.description='';
eTask.shouldFail=false;
eTask.printFluxes=false;
eTask.comments='';
eTask.inputs={};
eTask.LBin=[];
eTask.UBin=[];
eTask.outputs={};
eTask.LBout=[];
eTask.UBout=[];
eTask.equations={};
eTask.LBequ=[];
eTask.UBequ=[];
eTask.changed={};
eTask.LBrxn=[];
eTask.UBrxn=[];

%Main loop
taskStruct=[];
task=eTask;
if isnumeric(raw{2,colI(1)})
    task.id=num2str(raw{2,colI(1)});
else
    task.id=raw{2,colI(1)};
end
task.description=raw{2,colI(2)};
if ~isempty(raw{2,colI(15)})
    task.shouldFail=true;
end
if ~isempty(raw{2,colI(16)})
    task.printFluxes=true;
end
if ~isempty(raw{2,colI(17)})
    task.comments=raw{2,colI(17)};
end

for i=2:size(raw,1)
    %Set the inputs
    if ischar(raw{i,colI(3)})
        inputs=regexp(raw{i,colI(3)},';','split');
        task.inputs=[task.inputs;inputs(:)];
        task.LBin=[task.LBin;ones(numel(inputs),1)*raw{i,colI(4)}];
        task.UBin=[task.UBin;ones(numel(inputs),1)*raw{i,colI(5)}];
    end
    %Set the outputs
    if ischar(raw{i,colI(6)})
        outputs=regexp(raw{i,colI(6)},';','split');
        task.outputs=[task.outputs;outputs(:)];
        task.LBout=[task.LBout;ones(numel(outputs),1)*raw{i,colI(7)}];
        task.UBout=[task.UBout;ones(numel(outputs),1)*raw{i,colI(8)}];
    end
    %Add new rxns
    if ischar(raw{i,colI(9)})
        task.equations=[task.equations;raw{i,colI(9)}];
        if ~isempty(raw{i,colI(10)})
            task.LBequ=[task.LBequ;raw{i,colI(10)}];
        else
            if any(strfind(raw{i,colI(9)},'<=>'))
                task.LBequ=[task.LBequ;-1000];
            else
                task.LBequ=[task.LBequ;0];
            end
        end
        if ~isempty(raw{i,colI(11)})
            task.UBequ=[task.UBequ;raw{i,colI(11)}];
        else
            task.UBequ=[task.UBequ;1000];
        end
    end
    %Add changed bounds
    if ischar(raw{i,colI(12)})
        changed=regexp(raw{i,colI(12)},';','split');
        task.changed=[task.changed;changed(:)];
        task.LBrxn=[task.LBrxn;ones(numel(changed),1)*raw{i,colI(13)}];
        task.UBrxn=[task.UBrxn;ones(numel(changed),1)*raw{i,colI(14)}];
    end

    %Check if it should add more constraints
    if i<size(raw,1)
        if isempty(raw{i+1,colI(1)})
            continue;
        end
    end

    taskStruct=[taskStruct;task];
    task=eTask;
    if i<size(raw,1)
        if isnumeric(raw{i+1,colI(1)})
            task.id=num2str(raw{i+1,colI(1)});
        else
            task.id=raw{i+1,colI(1)};
        end
        task.description=raw{i+1,colI(2)};
        if ~isempty(raw{i+1,colI(15)})
            task.shouldFail=true;
        end
        if ~isempty(raw{i+1,colI(16)})
            task.printFluxes=true;
        end
        if ~isempty(raw{i+1,colI(17)})
            task.comments=raw{i+1,colI(17)};
        end
    end
end

%Should add more checks, such as unique IDs and missing headers

end
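
The closing comment in the listing mentions checks for unique IDs and missing headers that are not implemented. One possible shape for such checks is sketched below. It is an assumption about what they could look like, not part of the function: headerRow and taskIds are hypothetical stand-ins for the header row and the non-empty cells of the ID column, and the messages are invented.

```matlab
% Captions expected by the parser (copied from the listing above).
columns = {'ID';'DESCRIPTION';'IN';'IN LB';'IN UB';'OUT';'OUT LB'; ...
    'OUT UB';'EQU';'EQU LB';'EQU UB';'CHANGED RXN';'CHANGED LB'; ...
    'CHANGED UB';'SHOULD FAIL';'PRINT FLUX';'COMMENTS'};

% Hypothetical header row and task IDs, standing in for parsed file content.
headerRow = {'ID','DESCRIPTION','IN','IN LB','IN UB','OUT','OUT LB', ...
    'OUT UB','EQU','EQU UB'};
taskIds = {'T1'; 'T2'; 'T1'; 'T3'};

% Headers that the file does not provide, in the order of "columns".
missing = columns(~ismember(columns, headerRow));

% IDs that occur more than once.
[uniqueIds, ~, idx] = unique(taskIds);
counts = accumarray(idx(:), 1);
duplicated = uniqueIds(counts > 1);

if ~isempty(missing)
    fprintf('Missing headers: %s\n', strjoin(missing', ', '));
end
if ~isempty(duplicated)
    fprintf('Duplicated task IDs: %s\n', strjoin(duplicated', ', '));
end
```

Reporting all missing headers at once, rather than stopping at the first, would let a user fix the file in a single pass; whether a missing header should be an error or a warning is left open here.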