parseTaskList

PURPOSE

Parses a task list file.

SYNOPSIS

function taskStruct=parseTaskList(inputFile)

DESCRIPTION

 parseTaskList
   Parses a task list file.

   inputFile       a task list in Excel format or as a tab-delimited .txt
                   file. An Excel file must contain a sheet named TASKS.
                   The task list may contain the following column headers
                   (note: all rows whose first cell is non-empty are
                   treated as comments and removed; the first remaining
                   row is used as the header row):
                   ID
                       the only required header. Each task must have a
                       unique ID (string or numeric). A task can span
                       multiple rows; only the first row of each task
                       should have an ID
                   DESCRIPTION
                       description of the task
                   IN
                       allowed input(s) for the task. Metabolite names
                       should be in the form
                       "model.metName[model.comps]". Several inputs can
                       be separated by ";", in which case the same bounds
                       are used for all of them. If that is not wanted,
                       use several rows for the task
                   IN LB
                       lower bound for the uptake of the metabolites in
                       the row (optional, default 0, i.e. no minimal
                       uptake)
                   IN UB
                       upper bound for the uptake of the metabolites in
                       the row (optional, default 1000)
                   OUT
                       allowed output(s) for the task (see IN)
                   OUT LB
                       lower bound for the production of the metabolites
                       in the row (optional, default 0, i.e. no minimal
                       production)
                   OUT UB
                       upper bound for the production of the metabolites
                       in the row (optional, default 1000)
                   EQU
                       equation to add (optional). The equation should be
                       in the form "0.4 A + 2 B <=> (or =>) C", and the
                       metabolites should be in the form
                       "model.metName[model.comps]"
                   EQU LB
                       lower bound for the equation (optional, default -1000
                       for reversible and 0 for irreversible)
                   EQU UB
                       upper bound for the equation (optional, default 1000)
                   CHANGED RXN
                       ID(s) of the reaction(s) whose bounds should be
                       changed. Several IDs can be separated by ";", in
                       which case the same bounds are used for all of
                       them. If that is not wanted, use several rows for
                       the task
                   CHANGED LB
                       lower bound for the reaction(s) (no default; must
                       be given when CHANGED RXN is used)
                   CHANGED UB
                       upper bound for the reaction(s) (no default; must
                       be given when CHANGED RXN is used)
                   SHOULD FAIL
                       true if the correct behavior of the model is to
                       have no feasible solution given the constraints
                       (optional, default false). Any non-empty cell,
                       including 0 or FALSE, is read as true
                   PRINT FLUX
                       true if the flux distribution for the task should
                       be printed, which can be useful for testing
                       (optional, default false). Any non-empty cell is
                       read as true
                   COMMENTS
                       free-text comments for the task (optional)
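The IN columns combine two rules: an empty bound cell falls back to the default (0 for IN LB, 1000 for IN UB), and a ";"-separated IN cell expands into one entry per metabolite, all sharing the bounds of that row. A minimal MATLAB sketch of how one row could be expanded; the cell values and metabolite names are hypothetical:

```matlab
% Hypothetical values of the IN, IN LB and IN UB cells of one row
inCell = 'glucose[s];oxygen[s]';
inLB = [];    % empty cell: the default lower bound applies
inUB = 10;

% Fill in the documented defaults for empty bound cells
if isempty(inLB), inLB = 0; end
if isempty(inUB), inUB = 1000; end

% Split on ";" and give every metabolite the same bounds
inputs = regexp(inCell, ';', 'split');
inputs = inputs(:);                       % column cell array
LBin = ones(numel(inputs), 1) * inLB;
UBin = ones(numel(inputs), 1) * inUB;
```

The same splitting applies to OUT and CHANGED RXN, although CHANGED LB and CHANGED UB have no defaults. To give each metabolite its own bounds, put the metabolites on separate rows instead.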

   taskStruct      array of structures with the following fields
       id          the id of the task
       description the description of the task
       shouldFail  true if the task should fail
       printFluxes true if the fluxes should be printed
       comments    string with comments
       inputs      cell array with input metabolites (in the form metName[comps])
       LBin        array with lower bounds on inputs (default, 0)
       UBin        array with upper bounds on inputs (default, 1000)
       outputs     cell array with output metabolites (in the form metName[comps])
       LBout       array with lower bounds on outputs (default, 0)
       UBout       array with upper bounds on outputs (default, 1000)
       equations   cell array with equations (with mets in the form metName[comps])
       LBequ       array with lower bounds on equations (default, -1000 for
                   reversible and 0 for irreversible)
       UBequ       array with upper bounds on equations (default, 1000)
       changed     cell array with IDs of reactions whose bounds are changed
       LBrxn       array with lower bounds on changed reactions
       UBrxn       array with upper bounds on changed reactions
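The default bounds for an added equation depend only on its arrow. A minimal MATLAB sketch of that rule for two hypothetical equations whose EQU LB and EQU UB cells are empty, treating an equation as reversible when it contains "<=>":

```matlab
% Hypothetical EQU cells; the EQU LB and EQU UB cells are assumed empty
equations = {'0.4 A[c] + 2 B[c] <=> C[c]'; 'A[c] => C[c]'};
LBequ = zeros(numel(equations), 1);
UBequ = zeros(numel(equations), 1);
for k = 1:numel(equations)
    % Reversible equations (with "<=>") may run backwards, down to -1000
    if ~isempty(strfind(equations{k}, '<=>'))
        LBequ(k) = -1000;
    else
        LBequ(k) = 0;     % irreversible ("=>") equations cannot run backwards
    end
    UBequ(k) = 1000;      % the default upper bound is the same for both
end
```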

   This function is used to define a set of tasks for a model to perform.
   Each task is defined by a set of constraints on the model, and the task
   is considered successful if the resulting problem is feasible (or, for
   a task marked SHOULD FAIL, infeasible). In general, each row can
   contain one constraint on uptakes, one constraint on outputs, one new
   equation, and one change of reaction bounds. If more constraints are
   needed to define a task, several rows can be used for it. To perform
   the tasks, use checkTasks or fitTasks.
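Because only the first row of a task carries an ID, a row with an empty ID cell adds its constraints to the task above it. A minimal MATLAB sketch of that grouping on a hypothetical ID column, with the header row already removed and assuming the first row has an ID:

```matlab
% Hypothetical ID column; '' marks a row that continues the previous task
ids = {'1'; ''; ''; '2'; '3'; ''};
taskOfRow = zeros(numel(ids), 1);
nTasks = 0;
for r = 1:numel(ids)
    % A non-empty ID starts a new task; an empty one extends the current task
    if ~isempty(ids{r})
        nTasks = nTasks + 1;
    end
    taskOfRow(r) = nTasks;
end
```

Here the first three rows together define task 1, so that task can hold up to three separate input constraints, each with its own bounds.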

   NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]" can be
   used as inputs or outputs in the same way as normal metabolites. This
   is a convenient way to, for example, allow excretion of all metabolites
   in order to check whether a task is limited by the synthesis of some
   metabolite or by the degradation of some byproduct. One important
   difference is that only the upper bounds are used for these general
   metabolites: you can only state that uptake or excretion is allowed,
   not that it is required. This avoids conflicts where the constraints
   on the general metabolites would overwrite those on the real ones.
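The NOTE implies that a lower bound given for a general metabolite has no effect. The MATLAB sketch below shows one way to express that, by treating lower bounds on "ALLMETS" or "ALLMETSIN[...]" entries as 0 while keeping their upper bounds. This is an assumption about how the constraints could be applied (parseTaskList itself stores the bounds as given; the handling presumably happens in checkTasks or fitTasks), and the metabolite names are hypothetical:

```matlab
% Hypothetical inputs of one task, including a general metabolite
inputs = {'glucose[s]'; 'ALLMETSIN[s]'};
LBin = [1; 5];
UBin = [10; 1000];

% Flag "ALLMETS" and "ALLMETSIN[<comp>]" entries
isGeneral = strcmp(inputs, 'ALLMETS') | strncmp(inputs, 'ALLMETSIN[', 10);

% Assumed handling: ignore the lower bound of general metabolites, so their
% uptake is allowed (up to UBin) but never required
effectiveLB = LBin;
effectiveLB(isGeneral) = 0;
```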

 Usage: taskStruct=parseTaskList(inputFile)
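The columns are located by matching the header row against the list of known column names with ismember, so their order in the file does not matter. A minimal MATLAB sketch with a hypothetical header row, using only a subset of the known names; a zero index marks a column that is absent:

```matlab
% Known column names (a subset of the headers listed above)
columns = {'ID'; 'DESCRIPTION'; 'IN'; 'IN LB'; 'IN UB'};
% Hypothetical header row, in a different order and without 'IN LB'
header = {'DESCRIPTION', 'ID', 'IN UB', 'IN'};

% found(k) is true if columns{k} is present; colIdx(k) is its position
[found, colIdx] = ismember(columns, header);
```

A zero in colIdx cannot be used as an index, so a missing optional column has to be handled before any row is read.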

SOURCE CODE

function taskStruct=parseTaskList(inputFile)
% parseTaskList
%   Parses a task list file.
%
%   inputFile       a task list in Excel format or as a tab-delimited .txt
%                   file. An Excel file must contain a sheet named TASKS.
%                   The task list may contain the following column headers
%                   (note: all rows whose first cell is non-empty are
%                   treated as comments and removed; the first remaining
%                   row is used as the header row):
%                   ID
%                       the only required header. Each task must have a
%                       unique ID (string or numeric). A task can span
%                       multiple rows; only the first row of each task
%                       should have an ID
%                   DESCRIPTION
%                       description of the task
%                   IN
%                       allowed input(s) for the task. Metabolite names
%                       should be in the form
%                       "model.metName[model.comps]". Several inputs can
%                       be separated by ";", in which case the same bounds
%                       are used for all of them. If that is not wanted,
%                       use several rows for the task
%                   IN LB
%                       lower bound for the uptake of the metabolites in
%                       the row (optional, default 0, i.e. no minimal
%                       uptake)
%                   IN UB
%                       upper bound for the uptake of the metabolites in
%                       the row (optional, default 1000)
%                   OUT
%                       allowed output(s) for the task (see IN)
%                   OUT LB
%                       lower bound for the production of the metabolites
%                       in the row (optional, default 0, i.e. no minimal
%                       production)
%                   OUT UB
%                       upper bound for the production of the metabolites
%                       in the row (optional, default 1000)
%                   EQU
%                       equation to add (optional). The equation should be
%                       in the form "0.4 A + 2 B <=> (or =>) C", and the
%                       metabolites should be in the form
%                       "model.metName[model.comps]"
%                   EQU LB
%                       lower bound for the equation (optional, default -1000
%                       for reversible and 0 for irreversible)
%                   EQU UB
%                       upper bound for the equation (optional, default 1000)
%                   CHANGED RXN
%                       ID(s) of the reaction(s) whose bounds should be
%                       changed. Several IDs can be separated by ";", in
%                       which case the same bounds are used for all of
%                       them. If that is not wanted, use several rows for
%                       the task
%                   CHANGED LB
%                       lower bound for the reaction(s) (no default; must
%                       be given when CHANGED RXN is used)
%                   CHANGED UB
%                       upper bound for the reaction(s) (no default; must
%                       be given when CHANGED RXN is used)
%                   SHOULD FAIL
%                       true if the correct behavior of the model is to
%                       have no feasible solution given the constraints
%                       (optional, default false). Any non-empty cell,
%                       including 0 or FALSE, is read as true
%                   PRINT FLUX
%                       true if the flux distribution for the task should
%                       be printed, which can be useful for testing
%                       (optional, default false). Any non-empty cell is
%                       read as true
%                   COMMENTS
%                       free-text comments for the task (optional)
%
%   taskStruct      array of structures with the following fields
%       id          the id of the task
%       description the description of the task
%       shouldFail  true if the task should fail
%       printFluxes true if the fluxes should be printed
%       comments    string with comments
%       inputs      cell array with input metabolites (in the form metName[comps])
%       LBin        array with lower bounds on inputs (default, 0)
%       UBin        array with upper bounds on inputs (default, 1000)
%       outputs     cell array with output metabolites (in the form metName[comps])
%       LBout       array with lower bounds on outputs (default, 0)
%       UBout       array with upper bounds on outputs (default, 1000)
%       equations   cell array with equations (with mets in the form metName[comps])
%       LBequ       array with lower bounds on equations (default, -1000 for
%                   reversible and 0 for irreversible)
%       UBequ       array with upper bounds on equations (default, 1000)
%       changed     cell array with IDs of reactions whose bounds are changed
%       LBrxn       array with lower bounds on changed reactions
%       UBrxn       array with upper bounds on changed reactions
%
%   This function is used to define a set of tasks for a model to perform.
%   Each task is defined by a set of constraints on the model, and the task
%   is considered successful if the resulting problem is feasible (or, for
%   a task marked SHOULD FAIL, infeasible). In general, each row can
%   contain one constraint on uptakes, one constraint on outputs, one new
%   equation, and one change of reaction bounds. If more constraints are
%   needed to define a task, several rows can be used for it. To perform
%   the tasks, use checkTasks or fitTasks.
%
%   NOTE: The general metabolites "ALLMETS" and "ALLMETSIN[comps]" can be
%   used as inputs or outputs in the same way as normal metabolites. This
%   is a convenient way to, for example, allow excretion of all metabolites
%   in order to check whether a task is limited by the synthesis of some
%   metabolite or by the degradation of some byproduct. One important
%   difference is that only the upper bounds are used for these general
%   metabolites: you can only state that uptake or excretion is allowed,
%   not that it is required. This avoids conflicts where the constraints
%   on the general metabolites would overwrite those on the real ones.
%
% Usage: taskStruct=parseTaskList(inputFile)

if ~isfile(inputFile)
    error('Task list %s cannot be found',string(inputFile));
end

%Load the tasks file
convNumeric = false;
[~,~,ext] = fileparts(inputFile);
if strcmpi(ext,'.txt')
    %Load from a tab-delimited text file
    fid = fopen(inputFile);
    if fid == -1
        error('Task list %s cannot be opened',string(inputFile));
    end
    %Read all columns (up to 20) as strings, otherwise the header row would
    %be lost. Numeric columns are converted further down.
    convNumeric = true;
    C = textscan(fid,'%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%q%*[^\n]', 'Delimiter', '\t');
    fclose(fid);
    raw = [C{:}]; %Unnest the cell array of cell arrays into a 2-dim cell array
else
    [raw,flag]=loadSheet(loadWorkbook(inputFile), 'TASKS');
    if flag~=0
        EM=['Could not load sheet "TASKS" from ' char(inputFile)];
        dispEM(EM);
    end
end

%Remove all rows starting with a non-empty cell (such as "#" comment rows)
%and all empty columns
raw=cleanSheet(raw);

%Column headers
columns={'ID';'DESCRIPTION';'IN';'IN LB';'IN UB';'OUT';'OUT LB';'OUT UB';'EQU';'EQU LB';'EQU UB';'CHANGED RXN';'CHANGED LB';'CHANGED UB';'SHOULD FAIL';'PRINT FLUX';'COMMENTS'};
%Match the columns
[I, colI]=ismember(columns,raw(1,:));

%Check that the ID column is present
if I(1)==0
    EM='The TASKS sheet must have a column named ID';
    dispEM(EM);
end

%Add an empty column for each missing optional header, so that every
%column in "columns" can be indexed below
for j=find(~I)'
    raw(:,end+1)={[]};
    raw{1,end}=columns{j};
    colI(j)=size(raw,2);
end

%If read from a text file, the numbers are strings - convert them
if convNumeric %Kept as a safeguard: nothing changes if all values are already numeric
    numericColumns = [0 0 0 1 1 0 1 1 0 1 1 0 1 1 0 0 0] == 1;
    cols = colI(numericColumns);
    numeric = cellfun(@isnumeric,raw(:,cols));
    %Mark the header row as numeric so that it is not converted
    numeric(1,:) = true;
    for colind = 1:numel(cols)
        col = cols(colind);
        raw(~numeric(:,colind),col) = cellfun(@str2num, raw(~numeric(:,colind),col), 'UniformOutput', false);
    end
end

%Add default bounds for IN LB, IN UB, OUT LB and OUT UB where needed
for i=[4 5 7 8]
    emptyCells=cellfun(@isempty,raw(:,colI(i)));
    if i==5 || i==8
        raw(emptyCells,colI(i))={1000};
    else
        raw(emptyCells,colI(i))={0};
    end
end

%Create an empty task structure
eTask.id='';
eTask.description='';
eTask.shouldFail=false;
eTask.printFluxes=false;
eTask.comments='';
eTask.inputs={};
eTask.LBin=[];
eTask.UBin=[];
eTask.outputs={};
eTask.LBout=[];
eTask.UBout=[];
eTask.equations={};
eTask.LBequ=[];
eTask.UBequ=[];
eTask.changed={};
eTask.LBrxn=[];
eTask.UBrxn=[];

%Check that there is at least one task and that the first task has an ID
if size(raw,1)<2
    EM='The task list does not contain any tasks';
    dispEM(EM);
end
if isempty(raw{2,colI(1)})
    EM='The first task in the task list must have an ID';
    dispEM(EM);
end

%Main loop
taskStruct=[];
task=eTask;
if isnumeric(raw{2,colI(1)})
    task.id=num2str(raw{2,colI(1)});
else
    task.id=raw{2,colI(1)};
end
task.description=raw{2,colI(2)};
if ~isempty(raw{2,colI(15)})
    task.shouldFail=true;
end
if ~isempty(raw{2,colI(16)})
    task.printFluxes=true;
end
if ~isempty(raw{2,colI(17)})
    task.comments=raw{2,colI(17)};
end

for i=2:size(raw,1)
    %Set the inputs
    if ischar(raw{i,colI(3)})
        inputs=regexp(raw{i,colI(3)},';','split');
        task.inputs=[task.inputs;inputs(:)];
        task.LBin=[task.LBin;ones(numel(inputs),1)*raw{i,colI(4)}];
        task.UBin=[task.UBin;ones(numel(inputs),1)*raw{i,colI(5)}];
    end
    %Set the outputs
    if ischar(raw{i,colI(6)})
        outputs=regexp(raw{i,colI(6)},';','split');
        task.outputs=[task.outputs;outputs(:)];
        task.LBout=[task.LBout;ones(numel(outputs),1)*raw{i,colI(7)}];
        task.UBout=[task.UBout;ones(numel(outputs),1)*raw{i,colI(8)}];
    end
    %Add new equations
    if ischar(raw{i,colI(9)})
        task.equations=[task.equations;raw{i,colI(9)}];
        if ~isempty(raw{i,colI(10)})
            task.LBequ=[task.LBequ;raw{i,colI(10)}];
        elseif contains(raw{i,colI(9)},'<=>')
            %Reversible equation
            task.LBequ=[task.LBequ;-1000];
        else
            task.LBequ=[task.LBequ;0];
        end
        if ~isempty(raw{i,colI(11)})
            task.UBequ=[task.UBequ;raw{i,colI(11)}];
        else
            task.UBequ=[task.UBequ;1000];
        end
    end
    %Add changed bounds (CHANGED LB and CHANGED UB have no defaults)
    if ischar(raw{i,colI(12)})
        if isempty(raw{i,colI(13)}) || isempty(raw{i,colI(14)})
            EM=['CHANGED LB and CHANGED UB must be given when CHANGED RXN is used (task ' task.id ')'];
            dispEM(EM);
        end
        changed=regexp(raw{i,colI(12)},';','split');
        task.changed=[task.changed;changed(:)];
        task.LBrxn=[task.LBrxn;ones(numel(changed),1)*raw{i,colI(13)}];
        task.UBrxn=[task.UBrxn;ones(numel(changed),1)*raw{i,colI(14)}];
    end

    %Keep adding constraints to the current task if the next row has no ID
    if i<size(raw,1)
        if isempty(raw{i+1,colI(1)})
            continue;
        end
    end

    taskStruct=[taskStruct;task];
    task=eTask;
    if i<size(raw,1)
        if isnumeric(raw{i+1,colI(1)})
            task.id=num2str(raw{i+1,colI(1)});
        else
            task.id=raw{i+1,colI(1)};
        end
        task.description=raw{i+1,colI(2)};
        if ~isempty(raw{i+1,colI(15)})
            task.shouldFail=true;
        end
        if ~isempty(raw{i+1,colI(16)})
            task.printFluxes=true;
        end
        if ~isempty(raw{i+1,colI(17)})
            task.comments=raw{i+1,colI(17)};
        end
    end
end

%Should add more checks, such as unique IDs

end
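When the task list is a .txt file, every column is read as text, so the bound columns must be converted to numbers before the defaults are filled in. The MATLAB sketch below mirrors those two steps on a hypothetical two-column file held in a char vector; it splits lines with regexp instead of textscan to stay self-contained. It uses str2num as the listing does, because str2num returns [] for an empty cell (str2double would return NaN), which lets the empty-cell check apply the default afterwards.

```matlab
% Hypothetical tab-delimited task list with an ID and an IN UB column
txt = sprintf('ID\tIN UB\nt1\t\nt2\t10');

% Split into lines, then into tab-separated fields (all kept as text)
textRows = regexp(txt, '\n', 'split');
raw = cell(numel(textRows), 2);
for r = 1:numel(textRows)
    fields = regexp(textRows{r}, '\t', 'split');
    raw(r, 1:numel(fields)) = fields;
end

% Convert the IN UB column below the header row; str2num('') returns []
for r = 2:size(raw, 1)
    raw{r, 2} = str2num(raw{r, 2}); %#ok<ST2NM>
end

% Fill empty IN UB cells (header row excluded) with the default of 1000
emptyCells = cellfun(@isempty, raw(:, 2));
emptyCells(1) = false;
raw(emptyCells, 2) = {1000};
```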

Generated by m2html © 2005