parseTaskList
  Parses a task list file.

  inputFile       a task list in Excel format (the source listing also
                  accepts a tab-delimited .txt file). The file must
                  contain a sheet named TASKS, which in turn may contain
                  the following column headers (note: all rows starting
                  with a non-empty cell are removed; the first remaining
                  row is taken as the headers):
                  ID
                      the only required header. Each task must have a
                      unique id (string or numeric). Tasks can span
                      multiple rows; only the first row of each task
                      should have an id
                  DESCRIPTION
                      description of the task
                  IN
                      allowed input(s) for the task. Metabolite names
                      should be in the form
                      "model.metName[model.comps]". Several inputs can be
                      delimited by ";", in which case the same bounds are
                      used for all of them. If that is not wanted, use
                      several rows for the task
                  IN LB
                      lower bound for the uptake of the metabolites in
                      the row (optional, default 0, which corresponds to
                      a minimal uptake of 0 units)
                  IN UB
                      upper bound for the uptake of the metabolites in
                      the row (optional, default 1000, which corresponds
                      to a maximal uptake of 1000 units)
                  OUT
                      allowed output(s) for the task (see IN)
                  OUT LB
                      lower bound for the production of the metabolites
                      in the row (optional, default 0, which corresponds
                      to a minimal production of 0 units)
                  OUT UB
                      upper bound for the production of the metabolites
                      in the row (optional, default 1000, which
                      corresponds to a maximal production of 1000 units)
                  EQU
                      equation to add (optional). The equation should be
                      in the form "0.4 A + 2 B <=> (or =>) C" and the
                      metabolites should be in the form
                      "model.metName[model.comps]"
                  EQU LB
                      lower bound for the equation (optional, default
                      -1000 for reversible and 0 for irreversible)
                  EQU UB
                      upper bound for the equation (optional, default
                      1000)
                  CHANGED RXN
                      reaction ID for which to change the bounds. Several
                      IDs can be delimited by ";", in which case the same
                      bounds are used for all reactions. If that is not
                      wanted, use several rows for the task
                  CHANGED LB
                      lower bound for the reaction
                  CHANGED UB
                      upper bound for the reaction
                  SHOULD FAIL
                      true if the correct behavior of the model is to
                      not have a feasible solution given the constraints
                      (optional, default false)
                  PRINT FLUX
                      true if the function should print the corresponding
                      flux distribution for a task. Can be useful for
                      testing (optional, default false)
                  COMMENTS
                      comments on the task (optional)

  taskStruct      array of structures with the following fields
      id          the id of the task
      description the description of the task
      shouldFail  true if the task should fail
      printFluxes true if the fluxes should be printed
      comments    string with comments
      inputs      cell array with input metabolites (in the form
                  metName[comps])
      LBin        array with lower bounds on inputs (default, 0)
      UBin        array with upper bounds on inputs (default, 1000)
      outputs     cell array with output metabolites (in the form
                  metName[comps])
      LBout       array with lower bounds on outputs (default, 0)
      UBout       array with upper bounds on outputs (default, 1000)
      equations   cell array with equations (with mets in the form
                  metName[comps])
      LBequ       array with lower bounds on equations (default, -1000
                  for reversible and 0 for irreversible)
      UBequ       array with upper bounds on equations (default, 1000)
      changed     cell array with reactions to change bounds for
      LBrxn       array with lower bounds on changed reactions
      UBrxn       array with upper bounds on changed reactions

  This function defines a set of tasks for a model to perform. Each task
  is expressed as constraints on the model; if the resulting problem is
  feasible, the task is considered successful. In general, each row can
  contain one constraint on uptakes, one constraint on outputs, one new
  equation, and one change of reaction bounds. If more bounds are needed
  to define the task, several rows can be used for each task. To perform
  the tasks, use checkTasks or fitTasks.
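
The row conventions described above (an empty ID continues the previous task, and inputs delimited by ";" share the bounds of their row) can be illustrated with a small standalone sketch. It does not call parseTaskList. The rows variable, the metabolite names, and the reduced four-column layout are assumptions made only for illustration, and the sketch assumes that the first row carries an ID.

```matlab
% Hypothetical rows of a cleaned TASKS sheet: {ID, IN, IN LB, IN UB}.
% An empty ID means that the row continues the previous task.
rows = {
    'T1', 'glucose[c];O2[c]', 0, 1000
    '',   'NH3[c]',           0, 10
    'T2', 'glucose[c]',       0, 1
    };

tasks = struct('id', {}, 'inputs', {}, 'LBin', {}, 'UBin', {});
for r = 1:size(rows, 1)
    if ~isempty(rows{r, 1})
        % A non-empty ID starts a new task.
        tasks(end + 1) = struct('id', rows{r, 1}, 'inputs', {{}}, ...
            'LBin', [], 'UBin', []);
    end
    % Inputs separated by ";" share the bounds given on their row.
    mets = regexp(rows{r, 2}, ';', 'split');
    tasks(end).inputs = [tasks(end).inputs; mets(:)];
    tasks(end).LBin = [tasks(end).LBin; ones(numel(mets), 1) * rows{r, 3}];
    tasks(end).UBin = [tasks(end).UBin; ones(numel(mets), 1) * rows{r, 4}];
end

% Task T1 ends up with three inputs: two bounded at 1000, one at 10.
disp(tasks(1).inputs);
disp(tasks(1).UBin);
```

Giving NH3[c] its own row is what lets it have a tighter upper bound than the two inputs that share the first row.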
  NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]" can be
  used as inputs or outputs in a similar manner to normal metabolites.
  This is a convenient way to, for example, allow excretion of all
  metabolites to check whether it is the synthesis of some metabolite
  that is limiting or the degradation of some byproduct. One important
  difference is that only the upper bounds are used for these general
  metabolites. That is, you can only say that uptake or excretion is
  allowed, not that it is required. This avoids conflicts where the
  constraints for the general metabolites overwrite those of the real
  ones.

  Usage: taskStruct=parseTaskList(inputFile)
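
The default equation bounds described under EQU LB and EQU UB can be sketched as follows. This is a standalone illustration of the rule as documented (reversible equations default to a lower bound of -1000, irreversible ones to 0, and both to an upper bound of 1000); the example equations are hypothetical.

```matlab
% Hypothetical EQU cells for which no EQU LB / EQU UB values were given.
equations = {'0.4 A[c] + 2 B[c] <=> C[c]'; 'A[c] => B[c]'};

LBequ = zeros(numel(equations), 1);        % irreversible default: 0
UBequ = 1000 * ones(numel(equations), 1);  % default upper bound: 1000
for k = 1:numel(equations)
    % "<=>" marks a reversible equation, so the lower bound becomes -1000.
    if ~isempty(strfind(equations{k}, '<=>'))
        LBequ(k) = -1000;
    end
end

% One row per equation: [lower bound, upper bound].
disp([LBequ UBequ]);
```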
function taskStruct=parseTaskList(inputFile)
% parseTaskList
%   Parses a task list file.
%
%   inputFile       a task list in Excel format. The file must contain a
%                   sheet named TASKS, which in turn may contain the
%                   following column headers (note, all rows starting with
%                   a non-empty cell are removed. The first row after that
%                   is considered the headers):
%                   ID
%                       the only required header. Each task must have a
%                       unique id (string or numeric). Tasks can span multiple
%                       rows, only the first row in each task should have
%                       an id
%                   DESCRIPTION
%                       description of the task
%                   IN
%                       allowed input(s) for the task. Metabolite names
%                       should be on the form
%                       "model.metName[model.comps]". Several inputs
%                       can be delimited by ";". If so, then the same
%                       bounds are used for all inputs. If that is not
%                       wanted, then use several rows for the task
%                   IN LB
%                       lower bound for the uptake of the metabolites in
%                       the row (optional, default 0 which corresponds to a
%                       minimal uptake of 0 units)
%                   IN UB
%                       upper bound for the uptake of the metabolites in
%                       the row (optional, default 1000 which corresponds to a
%                       maximal uptake of 1000 units)
%                   OUT
%                       allowed output(s) for the task (see IN)
%                   OUT LB
%                       lower bound for the production of the metabolites in
%                       the row (optional, default 0 which corresponds to a
%                       minimal production of 0 units)
%                   OUT UB
%                       upper bound for the production of the metabolites in
%                       the row (optional, default 1000 which corresponds to a
%                       maximal production of 1000 units)
%                   EQU
%                       equation to add. The equation should be on the form
%                       "0.4 A + 2 B <=> (or =>) C" and the metabolites
%                       should be on the form
%                       "model.metName[model.comps]" (optional)
%                   EQU LB
%                       lower bound for the equation (optional, default -1000
%                       for reversible and 0 for irreversible)
%                   EQU UB
%                       upper bound for the equation (optional, default 1000)
%                   CHANGED RXN
%                       reaction ID for which to change the bounds for.
%                       Several IDs can be delimited by ";". If so,
%                       then the same bounds are used for all reactions. If
%                       that is not wanted, then use several rows for the task
%                   CHANGED LB
%                       lower bound for the reaction
%                   CHANGED UB
%                       upper bound for the reaction
%                   SHOULD FAIL
%                       true if the correct behavior of the model is to
%                       not have a feasible solution given the constraints
%                       (optional, default false)
%                   PRINT FLUX
%                       true if the function should print the corresponding
%                       flux distribution for a task. Can be useful for
%                       testing (optional, default false)
%
%   taskStruct      array of structures with the following fields
%       id          the id of the task
%       description the description of the task
%       shouldFail  true if the task should fail
%       printFluxes true if the fluxes should be printed
%       comments    string with comments
%       inputs      cell array with input metabolites (in the form metName[comps])
%       LBin        array with lower bounds on inputs (default, 0)
%       UBin        array with upper bounds on inputs (default, 1000)
%       outputs     cell array with output metabolites (in the form metName[comps])
%       LBout       array with lower bounds on outputs (default, 0)
%       UBout       array with upper bounds on outputs (default, 1000)
%       equations   cell array with equations (with mets in the form metName[comps])
%       LBequ       array with lower bounds on equations (default, -1000 for
%                   reversible and 0 for irreversible)
%       UBequ       array with upper bounds on equations (default, 1000)
%       changed     cell array with reactions to change bounds for
%       LBrxn       array with lower bounds on changed reactions
%       UBrxn       array with upper bounds on changed reactions
%
%   This function is used for defining a set of tasks for a model to
%   perform. The tasks are defined by defining constraints on the model,
%   and if the problem is feasible, then the task is considered successful.
%   In general, each row can contain one constraint on uptakes, one
%   constraint on outputs, one new equation, and one change of reaction
%   bounds. If more bounds are needed to define the task, then several rows
%   can be used for each task. To perform the task use checkTasks or
%   fitTasks.
%
%   NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]"
%   can be used as inputs or outputs in the similar manner to normal
%   metabolites. This is a convenient way to, for example, allow excretion of
%   all metabolites to check whether it's the synthesis of some metabolite
%   that is limiting or whether it's the degradation of some byproduct. One
%   important difference is that only the upper bounds are used for these general
%   metabolites. That is, you can only say that uptake or excretion is
%   allowed, not that it is required. This is to avoid conflicts where the
%   constraints for the general metabolites overwrite those of the real
%   ones.
%
% Usage: taskStruct=parseTaskList(inputFile)

if ~isfile(inputFile)
    error('Task list %s cannot be found',string(inputFile));
end

%Load the tasks file
convNumeric = false;
if strcmp(extractAfter(inputFile,strlength(inputFile) - 4), '.txt')
    %load from tab delimited text file
    fid = fopen(inputFile);
    %Need to read numeric columns as strings, this is converted further
    %down. If not, the titles would be lost.
    convNumeric = true;
    C = textscan(fid,'%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%*[^\n]', 'Delimiter', '\t');
    fclose(fid);
    raw = [C{:}];%unnest the cell array of cell arrays into a 2-dim cell array
else
    [raw,flag]=loadSheet(loadWorkbook(inputFile), 'TASKS');
    if flag~=0
        EM=['Could not load sheet "TASKS" from ' inputFile];
        dispEM(EM);
    end
end

%Remove all lines starting with "#" (or actually any character) and all
%empty columns
raw=cleanSheet(raw);

%Captions
columns={'ID';'DESCRIPTION';'IN';'IN LB';'IN UB';'OUT';'OUT LB';'OUT UB';'EQU';'EQU LB';'EQU UB';'CHANGED RXN';'CHANGED LB';'CHANGED UB';'SHOULD FAIL';'PRINT FLUX';'COMMENTS'};
%Match the columns
[I, colI]=ismember(columns,raw(1,:));

%If read from a text file, the numbers will be strings - fix that
if convNumeric % in theory, this if should not be needed, the code should do nothing if all are already numeric. But it is kept as a safeguard.
    numericColumns = [0 0 0 1 1 0 1 1 0 1 1 0 1 1 0 0 0] == 1;
    cols = colI(numericColumns);
    numeric = cellfun(@isnumeric,raw(:,cols));
    %trick to avoid messing up the title row:
    numeric(1,:) = 1;
    for colind = 1:numel(cols)
        col = cols(colind);
        raw(~numeric(:,colind),col) = cellfun(@str2num, raw(~numeric(:,colind),col), 'UniformOutput', false);
    end
end

%Check that the ID field is present
if I(1)==0
    EM='The TASKS sheet must have a column named ID';
    dispEM(EM);
end

%make sure numerical fields are converted from strings

%Add default bounds where needed
for i=[4 5 7 8]
    I=cellfun(@isempty,raw(:,colI(i)));
    if i==5 || i==8
        raw(I,colI(i))={1000};
    else
        raw(I,colI(i))={0};
    end
end

%Create an empty task structure
eTask.id='';
eTask.description='';
eTask.shouldFail=false;
eTask.printFluxes=false;
eTask.comments='';
eTask.inputs={};
eTask.LBin=[];
eTask.UBin=[];
eTask.outputs={};
eTask.LBout=[];
eTask.UBout=[];
eTask.equations={};
eTask.LBequ=[];
eTask.UBequ=[];
eTask.changed={};
eTask.LBrxn=[];
eTask.UBrxn=[];

%Main loop
taskStruct=[];
task=eTask;
if isnumeric(raw{2,colI(1)})
    task.id=num2str(raw{2,colI(1)});
else
    task.id=raw{2,colI(1)};
end
task.description=raw{2,colI(2)};
if ~isempty(raw{2,colI(15)})
    task.shouldFail=true;
end
if ~isempty(raw{2,colI(16)})
    task.printFluxes=true;
end
if ~isempty(raw{2,colI(17)})
    task.comments=raw{2,colI(17)};
end

for i=2:size(raw,1)
    %Set the inputs
    if ischar(raw{i,colI(3)})
        inputs=regexp(raw{i,colI(3)},';','split');
        task.inputs=[task.inputs;inputs(:)];
        task.LBin=[task.LBin;ones(numel(inputs),1)*raw{i,colI(4)}];
        task.UBin=[task.UBin;ones(numel(inputs),1)*raw{i,colI(5)}];
    end
    %Set the outputs
    if ischar(raw{i,colI(6)})
        outputs=regexp(raw{i,colI(6)},';','split');
        task.outputs=[task.outputs;outputs(:)];
        task.LBout=[task.LBout;ones(numel(outputs),1)*raw{i,colI(7)}];
        task.UBout=[task.UBout;ones(numel(outputs),1)*raw{i,colI(8)}];
    end
    %Add new rxns
    if ischar(raw{i,colI(9)})
        task.equations=[task.equations;raw{i,colI(9)}];
        if ~isempty(raw{i,colI(10)})
            task.LBequ=[task.LBequ;raw{i,colI(10)}];
        else
            if any(strfind(raw{i,colI(9)},'<=>'))
                task.LBequ=[task.LBequ;-1000];
            else
                task.LBequ=[task.LBequ;0];
            end
        end
        if ~isempty(raw{i,colI(11)})
            task.UBequ=[task.UBequ;raw{i,colI(11)}];
        else
            task.UBequ=[task.UBequ;1000];
        end
    end
    %Add changed bounds
    if ischar(raw{i,colI(12)})
        changed=regexp(raw{i,colI(12)},';','split');
        task.changed=[task.changed;changed(:)];
        task.LBrxn=[task.LBrxn;ones(numel(changed),1)*raw{i,colI(13)}];
        task.UBrxn=[task.UBrxn;ones(numel(changed),1)*raw{i,colI(14)}];
    end

    %Check if it should add more constraints
    if i<size(raw,1)
        if isempty(raw{i+1,colI(1)})
            continue;
        end
    end

    taskStruct=[taskStruct;task];
    task=eTask;
    if i<size(raw,1)
        if isnumeric(raw{i+1,colI(1)})
            task.id=num2str(raw{i+1,colI(1)});
        else
            task.id=raw{i+1,colI(1)};
        end
        task.description=raw{i+1,colI(2)};
        if ~isempty(raw{i+1,colI(15)})
            task.shouldFail=true;
        end
        if ~isempty(raw{i+1,colI(16)})
            task.printFluxes=true;
        end
        if ~isempty(raw{i+1,colI(17)})
            task.comments=raw{i+1,colI(17)};
        end
    end
end

%Should add more checks, such as unique IDs and missing headers

end
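
The closing comment in the listing mentions that a check for unique IDs is still missing. One possible form of such a check is sketched below. It is not part of the original function: the taskStruct used here is a hypothetical stand-in with only the id field, and the variable names are chosen for illustration.

```matlab
% Hypothetical parsed tasks; only the id field matters for this check.
taskStruct = struct('id', {'1', '2', '1'});

% Collect the ids and count how often each distinct id occurs.
ids = {taskStruct.id};
[uniqueIds, ~, idx] = unique(ids);
counts = accumarray(idx(:), 1);

% Any id that occurs more than once is reported.
duplicates = uniqueIds(counts > 1);
if ~isempty(duplicates)
    fprintf('Duplicate task IDs: %s\n', strjoin(duplicates, ', '));
end
```

Because the parser converts numeric IDs to strings with num2str, comparing the ids as strings in this way would treat a numeric 1 and the string '1' as the same ID.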