
parseTaskList

PURPOSE

Parses a task list file.

SYNOPSIS

function taskStruct=parseTaskList(inputFile)

DESCRIPTION

 parseTaskList
   Parses a task list file.

   inputFile       a task list in either Excel (*.xlsx, with a sheet named
                   TASKS with all relevant content) or tab-delimited
                   (*.txt) format. The file may contain the following
                   column headers (note: all rows whose first cell is
                   non-empty are removed as comments, and the first
                   remaining row is taken as the header row):
                   ID
                       the only required header. Each task must have a
                       unique id (string or numeric). Tasks can span multiple
                       rows; only the first row in each task should have
                       an id
                   DESCRIPTION
                       description of the task
                   IN
                       allowed input(s) for the task. Metabolite names
                       should be in the form
                       "model.metName[model.comps]". Several inputs
                       can be delimited by ";", in which case the same
                       bounds are used for all of them. To give inputs
                       different bounds, use several rows for the task
                   IN LB
                       lower bound for the uptake of the metabolites in
                       the row (optional, default 0, which corresponds to a
                       minimal uptake of 0 units)
                   IN UB
                       upper bound for the uptake of the metabolites in
                       the row (optional, default 1000, which corresponds to
                       a maximal uptake of 1000 units)
                   OUT
                       allowed output(s) for the task (see IN)
                   OUT LB
                       lower bound for the production of the metabolites in
                       the row (optional, default 0, which corresponds to a
                       minimal production of 0 units)
                   OUT UB
                       upper bound for the production of the metabolites in
                       the row (optional, default 1000, which corresponds to
                       a maximal production of 1000 units)
                   EQU
                       equation to add (optional). The equation should be in
                       the form "0.4 A + 2 B <=> (or =>) C" and the
                       metabolites should be in the form
                       "model.metName[model.comps]"
                   EQU LB
                       lower bound for the equation (optional, default -1000
                       for reversible and 0 for irreversible)
                   EQU UB
                       upper bound for the equation (optional, default 1000)
                   CHANGED RXN
                       ID of the reaction whose bounds should be changed.
                       Several IDs can be delimited by ";", in which case
                       the same bounds are used for all reactions. To give
                       reactions different bounds, use several rows for the
                       task
                   CHANGED LB
                       lower bound for the reaction
                   CHANGED UB
                       upper bound for the reaction
                   SHOULD FAIL
                       marks that the correct behavior of the model is to
                       not have a feasible solution given the constraints.
                       Any non-empty value in the first row of the task
                       sets this flag (optional, default false)
                   PRINT FLUX
                       marks that the function should print the
                       corresponding flux distribution for a task, which
                       can be useful for testing. Any non-empty value in
                       the first row of the task sets this flag (optional,
                       default false)
                   COMMENTS
                       free-text comments on the task, read from the first
                       row of the task (optional)
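
As a rough illustration of the layout described above, the following sketch writes a small tab-delimited task list. The file name, task ID, and metabolite names are made up for the example, and leaving the first cell of each row empty is an assumption based on the note that rows starting with a non-empty cell are removed.

```matlab
% Sketch: write a minimal tab-delimited task list (all names are hypothetical)
taskFile = fullfile(tempdir, 'exampleTasks.txt');

% The first cell of the header row and of every task row is left empty,
% since rows starting with a non-empty cell are removed
rows = {
    '#A comment row', '', '', '', '', '', '', '', ''
    '', 'ID', 'DESCRIPTION', 'IN', 'IN LB', 'IN UB', 'OUT', 'OUT LB', 'OUT UB'
    '', 'T1', 'Example task', 'O2[s];glucose[s]', '0', '1000', 'CO2[s]', '1', '1000'
    '', '', '', 'NH3[s]', '0', '10', '', '', ''
    };

% Write one tab-delimited line per row
fid = fopen(taskFile, 'w');
for r = 1:size(rows, 1)
    fprintf(fid, '%s\n', strjoin(rows(r, :), sprintf('\t')));
end
fclose(fid);
```

The second task row has an empty ID, so it adds a further input with its own bounds to task T1. A file like this could then be passed on as parseTaskList(taskFile).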

   taskStruct      array of structures with the following fields
       id          the id of the task
       description the description of the task
       shouldFail  true if the task should fail
       printFluxes true if the fluxes should be printed
       comments    string with comments
       inputs      cell array with input metabolites (in the form metName[comps])
       LBin        array with lower bounds on inputs (default, 0)
       UBin        array with upper bounds on inputs (default, 1000)
       outputs     cell array with output metabolites (in the form metName[comps])
       LBout       array with lower bounds on outputs (default, 0)
       UBout       array with upper bounds on outputs (default, 1000)
       equations   cell array with equations (with mets in the form metName[comps])
       LBequ       array with lower bounds on equations (default, -1000 for
                   reversible and 0 for irreversible)
       UBequ       array with upper bounds on equations (default, 1000)
       changed     cell array with reactions to change bounds for
       LBrxn       array with lower bounds on changed reactions
       UBrxn       array with upper bounds on changed reactions
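
The relationship between a ";"-delimited IN cell and the inputs, LBin and UBin fields can be sketched as follows. The cell content and bounds are hypothetical; the splitting follows what the source code does for each row.

```matlab
% Hypothetical content of one row: an IN cell with its IN LB and IN UB
inCell = 'O2[s];glucose[s];H2O[s]';
inLB = 0;
inUB = 10;

% Start from an empty task, as for the first row of a new task
task.inputs = {};
task.LBin = [];
task.UBin = [];

% Split on ";" and repeat the bounds of the row once per metabolite
inputs = regexp(inCell, ';', 'split');
task.inputs = [task.inputs; inputs(:)];
task.LBin = [task.LBin; ones(numel(inputs), 1) * inLB];
task.UBin = [task.UBin; ones(numel(inputs), 1) * inUB];

% Print one line per input with its bounds
for k = 1:numel(task.inputs)
    fprintf('%s\t[%g, %g]\n', task.inputs{k}, task.LBin(k), task.UBin(k));
end
```

The three fields stay aligned: element k of LBin and UBin always belongs to element k of inputs. The same pattern applies to outputs and changed reactions.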

   This function is used for defining a set of tasks for a model to
   perform. Each task is defined by a set of constraints on the model,
   and the task is considered successful if the resulting problem is
   feasible. In general, each row can contain one constraint on uptakes,
   one constraint on outputs, one new equation, and one change of reaction
   bounds. If more bounds are needed to define the task, then several rows
   can be used for each task. To perform the tasks, use checkTasks or
   fitTasks.
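
To make the grouping of rows into tasks concrete, the sketch below assigns each row of a hypothetical ID column to a task, starting a new task at every non-empty ID. It uses a cumulative sum rather than the loop in the source code, so it illustrates the rule and is not the function's actual implementation.

```matlab
% Hypothetical ID column below the header row; empty cells continue the
% task started by the closest non-empty ID above
ids = {'T1'; ''; ''; 'T2'; 'T3'; ''};

% A new task starts on each row with a non-empty ID
startsTask = ~cellfun(@isempty, ids);
taskIndex = cumsum(startsTask);

% Number of rows that belong to each task
rowsPerTask = accumarray(taskIndex, 1);

taskIds = ids(startsTask);
for t = 1:numel(taskIds)
    fprintf('%s: %d row(s)\n', taskIds{t}, rowsPerTask(t));
end
```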

   NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]"
   can be used as inputs or outputs in a similar manner to normal
   metabolites. This is a convenient way to, for example, allow excretion of
   all metabolites to check whether it's the synthesis of some metabolite
   that is limiting or whether it's the degradation of some byproduct. One
   important difference is that only the upper bounds are used for these general
   metabolites. That is, you can only say that uptake or excretion is
   allowed, not that it is required. This is to avoid conflicts where the
   constraints for the general metabolites overwrite those of the real
   ones.
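
As a small illustration, a task that requires one product while allowing excretion of every metabolite might look like this once parsed. The struct is built by hand with hypothetical names; how the bounds are applied is up to the functions that perform the task.

```matlab
% Hand-built sketch of the output-related fields of a parsed task
task.id = 'T1';
task.outputs = {'product[c]'; 'ALLMETS'};  % 'product[c]' is hypothetical
task.LBout = [1; 0];        % production of product[c] is required
task.UBout = [1000; 1000];  % excretion of all metabolites is allowed

% Identify the general metabolites, for which only the upper bound is used
isGeneral = strcmp(task.outputs, 'ALLMETS') | ...
    strncmp(task.outputs, 'ALLMETSIN[', 10);
fprintf('%d of %d outputs are general metabolites\n', ...
    nnz(isGeneral), numel(isGeneral));
```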

 Usage: taskStruct=parseTaskList(inputFile)

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SOURCE CODE

function taskStruct=parseTaskList(inputFile)
% parseTaskList
%   Parses a task list file.
%
%   inputFile       a task list in either Excel (*.xlsx, with a sheet named
%                   TASKS with all relevant content) or tab-delimited
%                   (*.txt) format. The file may contain the following
%                   column headers (note: all rows whose first cell is
%                   non-empty are removed as comments, and the first
%                   remaining row is taken as the header row):
%                   ID
%                       the only required header. Each task must have a
%                       unique id (string or numeric). Tasks can span multiple
%                       rows; only the first row in each task should have
%                       an id
%                   DESCRIPTION
%                       description of the task
%                   IN
%                       allowed input(s) for the task. Metabolite names
%                       should be in the form
%                       "model.metName[model.comps]". Several inputs
%                       can be delimited by ";", in which case the same
%                       bounds are used for all of them. To give inputs
%                       different bounds, use several rows for the task
%                   IN LB
%                       lower bound for the uptake of the metabolites in
%                       the row (optional, default 0, which corresponds to a
%                       minimal uptake of 0 units)
%                   IN UB
%                       upper bound for the uptake of the metabolites in
%                       the row (optional, default 1000, which corresponds to
%                       a maximal uptake of 1000 units)
%                   OUT
%                       allowed output(s) for the task (see IN)
%                   OUT LB
%                       lower bound for the production of the metabolites in
%                       the row (optional, default 0, which corresponds to a
%                       minimal production of 0 units)
%                   OUT UB
%                       upper bound for the production of the metabolites in
%                       the row (optional, default 1000, which corresponds to
%                       a maximal production of 1000 units)
%                   EQU
%                       equation to add (optional). The equation should be in
%                       the form "0.4 A + 2 B <=> (or =>) C" and the
%                       metabolites should be in the form
%                       "model.metName[model.comps]"
%                   EQU LB
%                       lower bound for the equation (optional, default -1000
%                       for reversible and 0 for irreversible)
%                   EQU UB
%                       upper bound for the equation (optional, default 1000)
%                   CHANGED RXN
%                       ID of the reaction whose bounds should be changed.
%                       Several IDs can be delimited by ";", in which case
%                       the same bounds are used for all reactions. To give
%                       reactions different bounds, use several rows for the
%                       task
%                   CHANGED LB
%                       lower bound for the reaction
%                   CHANGED UB
%                       upper bound for the reaction
%                   SHOULD FAIL
%                       marks that the correct behavior of the model is to
%                       not have a feasible solution given the constraints.
%                       Any non-empty value in the first row of the task
%                       sets this flag (optional, default false)
%                   PRINT FLUX
%                       marks that the function should print the
%                       corresponding flux distribution for a task, which
%                       can be useful for testing. Any non-empty value in
%                       the first row of the task sets this flag (optional,
%                       default false)
%                   COMMENTS
%                       free-text comments on the task, read from the first
%                       row of the task (optional)
%
%   taskStruct      array of structures with the following fields
%       id          the id of the task
%       description the description of the task
%       shouldFail  true if the task should fail
%       printFluxes true if the fluxes should be printed
%       comments    string with comments
%       inputs      cell array with input metabolites (in the form metName[comps])
%       LBin        array with lower bounds on inputs (default, 0)
%       UBin        array with upper bounds on inputs (default, 1000)
%       outputs     cell array with output metabolites (in the form metName[comps])
%       LBout       array with lower bounds on outputs (default, 0)
%       UBout       array with upper bounds on outputs (default, 1000)
%       equations   cell array with equations (with mets in the form metName[comps])
%       LBequ       array with lower bounds on equations (default, -1000 for
%                   reversible and 0 for irreversible)
%       UBequ       array with upper bounds on equations (default, 1000)
%       changed     cell array with reactions to change bounds for
%       LBrxn       array with lower bounds on changed reactions
%       UBrxn       array with upper bounds on changed reactions
%
%   This function is used for defining a set of tasks for a model to
%   perform. Each task is defined by a set of constraints on the model,
%   and the task is considered successful if the resulting problem is
%   feasible. In general, each row can contain one constraint on uptakes,
%   one constraint on outputs, one new equation, and one change of reaction
%   bounds. If more bounds are needed to define the task, then several rows
%   can be used for each task. To perform the tasks, use checkTasks or
%   fitTasks.
%
%   NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]"
%   can be used as inputs or outputs in a similar manner to normal
%   metabolites. This is a convenient way to, for example, allow excretion of
%   all metabolites to check whether it's the synthesis of some metabolite
%   that is limiting or whether it's the degradation of some byproduct. One
%   important difference is that only the upper bounds are used for these general
%   metabolites. That is, you can only say that uptake or excretion is
%   allowed, not that it is required. This is to avoid conflicts where the
%   constraints for the general metabolites overwrite those of the real
%   ones.
%
% Usage: taskStruct=parseTaskList(inputFile)

if ~isfile(inputFile)
    error('Task list %s cannot be found',string(inputFile));
end

%Load the tasks file
convNumeric = false;
if endsWith(inputFile,'.txt')
    %Load from tab-delimited text file
    fid = fopen(inputFile);
    %Numeric columns must be read as strings, otherwise the titles would
    %be lost. They are converted to numbers further down.
    convNumeric = true;
    C = textscan(fid,'%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%*[^\n]', 'Delimiter', '\t');
    fclose(fid);
    raw = [C{:}]; %Unnest the cell array of cell arrays into a 2-dim cell array
else
    [raw,flag]=loadSheet(loadWorkbook(inputFile), 'TASKS');
    if flag~=0
        EM=['Could not load sheet "TASKS" from ' inputFile];
        dispEM(EM);
    end
end

%Remove all comment rows (rows whose first cell is non-empty, for example
%rows starting with "#") and all empty columns
raw=cleanSheet(raw);

%Captions
columns={'ID';'DESCRIPTION';'IN';'IN LB';'IN UB';'OUT';'OUT LB';'OUT UB';'EQU';'EQU LB';'EQU UB';'CHANGED RXN';'CHANGED LB';'CHANGED UB';'SHOULD FAIL';'PRINT FLUX';'COMMENTS'};
%Match the columns
[I, colI]=ismember(columns,raw(1,:));

%Check that the ID field is present
if I(1)==0
    EM='The TASKS sheet must have a column named ID';
    dispEM(EM);
end

%Check that there is at least one row below the headers
if size(raw,1)<2
    EM='The task list does not contain any tasks';
    dispEM(EM);
end

%Add an empty column for each optional header that is missing. Without
%this, colI would be 0 for those headers and the indexing below would fail
for i=find(~I)'
    raw(1,end+1)=columns(i);
    colI(i)=size(raw,2);
end

%If read from a text file, the numbers will be strings - fix that
if convNumeric %In theory this check is not needed, since the code does nothing if all cells are already numeric. It is kept as a safeguard.
    numericColumns = [0 0 0 1 1 0 1 1 0 1 1 0 1 1 0 0 0] == 1;
    cols = colI(numericColumns);
    numeric = cellfun(@isnumeric,raw(:,cols));
    %Trick to avoid messing up the title row
    numeric(1,:) = true;
    for colind = 1:numel(cols)
        col = cols(colind);
        raw(~numeric(:,colind),col) = cellfun(@str2num, raw(~numeric(:,colind),col), 'UniformOutput', false);
    end
end

%Add default bounds where needed
for i=[4 5 7 8]
    emptyCells=cellfun(@isempty,raw(:,colI(i)));
    if i==5 || i==8
        raw(emptyCells,colI(i))={1000};
    else
        raw(emptyCells,colI(i))={0};
    end
end

%Create an empty task structure
eTask.id='';
eTask.description='';
eTask.shouldFail=false;
eTask.printFluxes=false;
eTask.comments='';
eTask.inputs={};
eTask.LBin=[];
eTask.UBin=[];
eTask.outputs={};
eTask.LBout=[];
eTask.UBout=[];
eTask.equations={};
eTask.LBequ=[];
eTask.UBequ=[];
eTask.changed={};
eTask.LBrxn=[];
eTask.UBrxn=[];

%Main loop. Start the first task from the first row below the headers
taskStruct=[];
task=eTask;
if isnumeric(raw{2,colI(1)})
    task.id=num2str(raw{2,colI(1)});
else
    task.id=raw{2,colI(1)};
end
task.description=raw{2,colI(2)};
if ~isempty(raw{2,colI(15)})
    task.shouldFail=true;
end
if ~isempty(raw{2,colI(16)})
    task.printFluxes=true;
end
if ~isempty(raw{2,colI(17)})
    task.comments=raw{2,colI(17)};
end

for i=2:size(raw,1)
    %Set the inputs
    if ischar(raw{i,colI(3)})
        inputs=regexp(raw{i,colI(3)},';','split');
        task.inputs=[task.inputs;inputs(:)];
        task.LBin=[task.LBin;ones(numel(inputs),1)*raw{i,colI(4)}];
        task.UBin=[task.UBin;ones(numel(inputs),1)*raw{i,colI(5)}];
    end
    %Set the outputs
    if ischar(raw{i,colI(6)})
        outputs=regexp(raw{i,colI(6)},';','split');
        task.outputs=[task.outputs;outputs(:)];
        task.LBout=[task.LBout;ones(numel(outputs),1)*raw{i,colI(7)}];
        task.UBout=[task.UBout;ones(numel(outputs),1)*raw{i,colI(8)}];
    end
    %Add new rxns
    if ischar(raw{i,colI(9)})
        task.equations=[task.equations;raw{i,colI(9)}];
        if ~isempty(raw{i,colI(10)})
            task.LBequ=[task.LBequ;raw{i,colI(10)}];
        else
            %Default lower bound depends on the reversibility
            if any(strfind(raw{i,colI(9)},'<=>'))
                task.LBequ=[task.LBequ;-1000];
            else
                task.LBequ=[task.LBequ;0];
            end
        end
        if ~isempty(raw{i,colI(11)})
            task.UBequ=[task.UBequ;raw{i,colI(11)}];
        else
            task.UBequ=[task.UBequ;1000];
        end
    end
    %Add changed bounds
    if ischar(raw{i,colI(12)})
        changed=regexp(raw{i,colI(12)},';','split');
        task.changed=[task.changed;changed(:)];
        task.LBrxn=[task.LBrxn;ones(numel(changed),1)*raw{i,colI(13)}];
        task.UBrxn=[task.UBrxn;ones(numel(changed),1)*raw{i,colI(14)}];
    end

    %If the next row has no ID, it adds more constraints to this task
    if i<size(raw,1)
        if isempty(raw{i+1,colI(1)})
            continue;
        end
    end

    %The task is complete. Store it and start the next one, if any
    taskStruct=[taskStruct;task];
    task=eTask;
    if i<size(raw,1)
        if isnumeric(raw{i+1,colI(1)})
            task.id=num2str(raw{i+1,colI(1)});
        else
            task.id=raw{i+1,colI(1)};
        end
        task.description=raw{i+1,colI(2)};
        if ~isempty(raw{i+1,colI(15)})
            task.shouldFail=true;
        end
        if ~isempty(raw{i+1,colI(16)})
            task.printFluxes=true;
        end
        if ~isempty(raw{i+1,colI(17)})
            task.comments=raw{i+1,colI(17)};
        end
    end
end

%Should add more checks, such as unique IDs

end
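
The default bounds for equations can be illustrated in isolation. The sketch below applies the same rule as the source (a lower bound of -1000 when the equation contains "<=>", otherwise 0, and an upper bound of 1000) to two hypothetical equations, using a vectorized form instead of the loop.

```matlab
% Hypothetical equations, one reversible and one irreversible
equations = {'0.4 A[c] + 2 B[c] <=> C[c]'; 'A[c] => D[c]'};

% An equation is treated as reversible if it contains "<=>"
isReversible = ~cellfun(@isempty, strfind(equations, '<=>'));

% Default bounds used when EQU LB and EQU UB are left empty
LBequ = zeros(numel(equations), 1);
LBequ(isReversible) = -1000;
UBequ = 1000 * ones(numel(equations), 1);

for k = 1:numel(equations)
    fprintf('%-28s [%g, %g]\n', equations{k}, LBequ(k), UBequ(k));
end
```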
