Home > core > predictLocalization.m

predictLocalization

PURPOSE ^

predictLocalization

SYNOPSIS ^

function [outModel, geneLocalization, transportStruct, scores,removedRxns] = predictLocalization(model, GSS,defaultCompartment, transportCost, maxTime, plotResults)

DESCRIPTION ^

 predictLocalization
    Tries to assign reactions to compartments in a manner that is in
    agreement with localization predictors while at the same time
    maintaining connectivity.

   Input:
    model                   a model structure. If the model contains
                           several compartments they will be merged
    GSS                     a gene scoring structure as from parseScores
    defaultCompartment      transport reactions are expressed as diffusion
                           between the defaultCompartment and the others.
                           This is usually the cytosol. The default
                           compartment must have a match in GSS
    transportCost           the cost for including a transport reaction. If
                           this a scalar then the same cost is used for
                           all metabolites. It can also be a vector of
                           costs with the same dimension as model.mets.
                           Note that negative costs will result in that
                           transport of the metabolite is encouraged (opt,
                           default 0.5)
    maxTime                 maximum optimization time in minutes (opt,
                           default 15)
    plotResults             true if the results should be plotted during the
                           optimization (opt, default false)

   Output:
    outModel                the resulting model structure
    geneLocalization        structure with the genes and their resulting
                           localization
    transportStruct         structure with the transport reactions that had
                           to be inferred and between which compartments
    scores                  structure that contains the total score history
                           together with the score based on gene
                           localization and the score based on included
                           transport reactions
    removedRxns             cell array with the reaction ids that had to be
                           removed in order to have a connected input
                           model

    This function requires that the starting network is connected when it
    is in one compartment. Reactions that are unconnected are removed and
    saved in removedRxns. Try running fillGaps to have a more connected
    input model if there are many such reactions. The input model should
    also not include any exchange, demand or sink reactions, otherwise this
    function would not provide any results.

    In the final model all metabolites are produced in at least one
    reaction. This does not guarantee a fully functional model since there
    can be internal loops. Transport reactions are only included as passive
    diffusion (A <=> B).

    The score of a model is the sum of scores for all genes in their
    assigned compartment minus the cost of all transport reactions that had
    to be included. A gene can only be assigned to one compartment. This is
    a simplification to keep the problem size down. The problem is solved
    using simulated annealing.

   Usage: [outModel, geneLocalization, transportStruct, scores,...
       removedRxns] = predictLocalization(model, GSS,...
       defaultCompartment, transportCost, maxTime, plotResults)

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [outModel, geneLocalization, transportStruct, scores,...
0002     removedRxns] = predictLocalization(model, GSS,...
0003     defaultCompartment, transportCost, maxTime, plotResults)
0004 % predictLocalization
0005 %    Tries to assign reactions to compartments in a manner that is in
0006 %    agreement with localization predictors while at the same time
0007 %    maintaining connectivity.
0008 %
0009 %   Input:
0010 %    model                   a model structure. If the model contains
0011 %                           several compartments they will be merged
0012 %    GSS                     a gene scoring structure as from parseScores
0013 %    defaultCompartment      transport reactions are expressed as diffusion
0014 %                           between the defaultCompartment and the others.
0015 %                           This is usually the cytosol. The default
0016 %                           compartment must have a match in GSS
0017 %    transportCost           the cost for including a transport reaction. If
0018 %                           this a scalar then the same cost is used for
0019 %                           all metabolites. It can also be a vector of
0020 %                           costs with the same dimension as model.mets.
0021 %                           Note that negative costs will result in that
0022 %                           transport of the metabolite is encouraged (opt,
0023 %                           default 0.5)
0024 %    maxTime                 maximum optimization time in minutes (opt,
0025 %                           default 15)
0026 %    plotResults             true if the results should be plotted during the
0027 %                           optimization (opt, default false)
0028 %
0029 %   Output:
0030 %    outModel                the resulting model structure
0031 %    geneLocalization        structure with the genes and their resulting
0032 %                           localization
0033 %    transportStruct         structure with the transport reactions that had
0034 %                           to be inferred and between which compartments
0035 %    scores                  structure that contains the total score history
0036 %                           together with the score based on gene
0037 %                           localization and the score based on included
0038 %                           transport reactions
0039 %    removedRxns             cell array with the reaction ids that had to be
0040 %                           removed in order to have a connected input
0041 %                           model
0042 %
0043 %    This function requires that the starting network is connected when it
0044 %    is in one compartment. Reactions that are unconnected are removed and
0045 %    saved in removedRxns. Try running fillGaps to have a more connected
0046 %    input model if there are many such reactions. The input model should
0047 %    also not include any exchange, demand or sink reactions, otherwise this
0048 %    function would not provide any results.
0049 %
0050 %    In the final model all metabolites are produced in at least one
0051 %    reaction. This does not guarantee a fully functional model since there
0052 %    can be internal loops. Transport reactions are only included as passive
0053 %    diffusion (A <=> B).
0054 %
0055 %    The score of a model is the sum of scores for all genes in their
0056 %    assigned compartment minus the cost of all transport reactions that had
0057 %    to be included. A gene can only be assigned to one compartment. This is
0058 %    a simplification to keep the problem size down. The problem is solved
0059 %    using simulated annealing.
0060 %
0061 %   Usage: [outModel, geneLocalization, transportStruct, scores,...
0062 %       removedRxns] = predictLocalization(model, GSS,...
0063 %       defaultCompartment, transportCost, maxTime, plotResults)
0064 
0065 if nargin<4
0066     transportCost=ones(numel(model.mets),1)*0.5;
0067 end
0068 if numel(transportCost)==1
0069     transportCost=ones(numel(model.mets),1)*transportCost;
0070 end
0071 transportCost=transportCost(:);
0072 
0073 if numel(transportCost)~=numel(model.mets)
0074     EM='The vector of transport costs must have the same dimension as model.mets';
0075     dispEM(EM,true);
0076 end
0077 if nargin<5
0078     maxTime=15;
0079 end
0080 if nargin<6
0081     plotResults=false;
0082 end
0083 
0084 if isfield(model,'rxnComps')
0085     model=rmfield(model,'rxnComps');
0086     EM='The model structure contains information about reaction compartmentalization. This is not supported by this function. The rxnComps field has been deleted';
0087     dispEM(EM,false);
0088 end
0089 if isfield(model,'geneComps')
0090     model=rmfield(model,'geneComps');
0091     EM='The model structure contains information about gene compartmentalization. This is not supported by this function. The geneComps field has been deleted';
0092     dispEM(EM,false);
0093 end
0094 
0095 defaultCompartment=char(defaultCompartment);
0096 I=ismember(defaultCompartment,GSS.compartments);
0097 if I==false
0098     EM='defaultCompartment not found in GSS';
0099     dispEM(EM);
0100 end
0101 
0102 if numel(model.comps)>1
0103     EM='The model has several compartments. All compartments will be merged';
0104     dispEM(EM,false);
0105     model=mergeCompartments(model,true,true);
0106 end
0107 
0108 noGenes = ismember(model.genes,GSS.genes);
0109 if ~all(noGenes)
0110     EM=['For ' num2str(numel(find(~noGenes))) ' of ' num2str(numel(model.genes)) ' model genes no data was found in GSS'];
0111     dispEM(EM,false);
0112 end
0113 
0114 %***Begin formating the data structures
0115 
0116 %Expand the model so that iso-enzymes have different reactions
0117 model=expandModel(model);
0118 
0119 %Identify reactions that have to be deleted because the involved mets are
0120 %never produced. This is done in an iterative manner
0121 removedRxns={};
0122 %This is to keep track of which metabolites are removed in this step. It is
0123 %needed to adjust the transport costs
0124 originalModelMets=model.mets;
0125 while 1
0126     irrevModel=convertToIrrev(model);
0127     
0128     I=sum(irrevModel.S>0,2);
0129     
0130     %Pretend that the unconstrained metabolites are made enough
0131     if isfield(irrevModel,'unconstrained')
0132         I(irrevModel.unconstrained~=0)=2;
0133     end
0134     metsToDelete=false(numel(model.mets),1);
0135     
0136     %This is not very neat but iterate through each metabolite and check
0137     %whether it can be produced (without using only one isolated reversible
0138     %reaction)
0139     for i=1:numel(irrevModel.mets)
0140         %If something can be made in two reactions then everything is fine.
0141         %If it can be made in one reaction it is fine unless it is through
0142         %an isolated reversible reaction (which can act as a mini loop)
0143         if I(i)<2
0144             if I(i)==1
0145                 %Find the reaction where this metabolite is produced
0146                 [~, J]=find(irrevModel.S(i,:)>0);
0147                 
0148                 %Check the metabolites that are consumed in this reaction.
0149                 %The problem is if any of them is only produced in the
0150                 %opposite reversible reaction
0151                 K=irrevModel.S(:,J)<0;
0152                 check=find(K & I<=1);
0153                 
0154                 for j=1:numel(check)
0155                     %Find the reactions where it participates
0156                     [~, L]=find(irrevModel.S(check(j),:)>0);
0157                     
0158                     if ~isempty(L)
0159                         rxn=irrevModel.rxns(J);
0160                         rxnRev=irrevModel.rxns(L);
0161                         if strcmp(strrep(rxn,'_REV',''),strrep(rxnRev,'_REV',''))
0162                             metsToDelete(i)=true;
0163                         end
0164                     else
0165                         %If the metabolite was never produced then do
0166                         %nothing and deal with it when the loop gets there
0167                         continue;
0168                     end
0169                 end
0170             else
0171                 %Not made anywhere
0172                 metsToDelete(i)=true;
0173             end
0174         end
0175     end
0176     
0177     if any(metsToDelete)
0178         %Delete any reactions involving any of the metsToDelete
0179         [~, I]=find(model.S(metsToDelete,:));
0180         removedRxns=[removedRxns;model.rxns(I)];
0181         model=removeReactions(model,I,true,true);
0182     else
0183         %All bad reactions deleted
0184         break;
0185     end
0186 end
0187 
0188 %Adjust the transport costs
0189 transportCost=transportCost(ismember(originalModelMets,model.mets));
0190 
0191 %Assign fake genes to reactions without genes. This is just to make things
0192 %easier later on
0193 I=find(sum(model.rxnGeneMat,2)==0);
0194 for i=1:numel(I)
0195     model.genes=[model.genes;['&&FAKE&&' num2str(i)]];
0196     if isfield(model,'geneShortNames')
0197         model.geneShortNames=[model.geneShortNames;{''}];
0198     end
0199     if isfield(model,'geneMiriams')
0200         model.geneMiriams=[model.geneMiriams;{[]}];
0201     end
0202     if isfield(model,'geneFrom')
0203         model.geneFrom=[model.geneFrom;{{'FAKE'}}];
0204     end
0205     model.rxnGeneMat(I(i),numel(model.genes))=1;
0206     model.grRules{I(i)}='';
0207 end
0208 
0209 %Update the GSS. All genes, fake or real, for which there is no evidence
0210 %gets a score 0.5 in all compartments. Also just to make it easier further
0211 %on
0212 I=setdiff(model.genes,GSS.genes);
0213 GSS.genes=[GSS.genes;I];
0214 GSS.scores=[GSS.scores;ones(numel(I),numel(GSS.compartments))*0.5];
0215 
0216 %Gene complexes should be moved together in order to be biologically
0217 %relevant. The average score for the genes is used for each compartment.
0218 %This is done by changing the model so that gene complexes are used as a
0219 %single gene name and then a score is calculated for that "gene".
0220 
0221 %Only "and"-relationships exist after expandModel
0222 genes=unique(model.grRules);
0223 nGenes=strrep(genes,'(','');
0224 nGenes=strrep(nGenes,')','');
0225 %nGenes=strrep(nGenes,' and ','_and_');
0226 complexes=setdiff(nGenes,model.genes);
0227 if ~isempty(complexes)
0228     if isempty(complexes{1}) %Empty grRules also come up here
0229         complexes(1)=[];
0230     end
0231 end
0232 cScores=zeros(numel(complexes),numel(GSS.compartments));
0233 for i=1:numel(complexes)
0234     genesInComplex=regexp(complexes{i},' and ','split');
0235     
0236     %Find these genes in GSS
0237     [I, J]=ismember(genesInComplex,GSS.genes);
0238     
0239     if any(I)
0240         %Get the average of the genes that were found
0241         mScores=mean(GSS.scores(J(I),:));
0242         
0243         %And add 0.5 for the genes that were not found in order to be
0244         %consistent with non-complexes
0245         mScores=(mScores.*sum(I)+(numel(genesInComplex)-sum(I))*0.5)/numel(genesInComplex);
0246     else
0247         EM=['Could not parse grRule "' complexes{i} '". Assigning score 0.0 in all compartments'];
0248         dispEM(EM,false);
0249         mScores=ones(1,numel(genesInComplex))*0.5;
0250     end
0251     cScores(i,:)=mScores;
0252     
0253     %Add this complex as a new gene
0254     model.genes=[model.genes;complexes{i}];
0255     if isfield(model,'geneMiriams')
0256         model.geneMiriams=[model.geneMiriams;{[]}];
0257     end
0258     if isfield(model,'geneShortNames')
0259         model.geneShortNames=[model.geneShortNames;{''}];
0260     end
0261     if isfield(model,'geneFrom')
0262         model.geneFrom=[model.geneFrom;{'COMPLEX'}];
0263     end
0264     %Find the reactions which had the original complex and change them to
0265     %use the new "gene"
0266     I=ismember(model.grRules,['(' complexes{i} ')']);
0267     
0268     %Should check more carefully if there can be an error here
0269     if ~isempty(I)
0270         model.rxnGeneMat(I,:)=0; %Ok since the split on "or" was applied
0271         model.rxnGeneMat(I,numel(model.genes))=1;
0272     end
0273 end
0274 
0275 %Add the new "genes"
0276 GSS.genes=[GSS.genes;complexes];
0277 GSS.scores=[GSS.scores;cScores];
0278 
0279 %After merging the complexes it could happen that there are genes that are
0280 %no longer in use. Delete such genes
0281 model=removeReactions(model,{},false,true);
0282 
0283 %Exchange reactions, defined as involving an unconstrained metabolite, are
0284 %special in that they have to stay in the defaultCompartment. This means
0285 %that uptake/excretion of metabolites is always via the default
0286 %compartment. This is a small simplification, but should be valid in most
0287 %cases
0288 [~, I]=getExchangeRxns(model);
0289 
0290 %It will be easier later on if the same place. Put them in the beginning
0291 J=1:numel(model.rxns);
0292 J(I)=[];
0293 model=permuteModel(model,[I;J'],'rxns');
0294 
0295 %Number of exchange reactions
0296 nER=numel(I);
0297 
0298 %Also put the exchange metabolites in the beginning
0299 if isfield(model,'unconstrained')
0300     I=find(model.unconstrained);
0301     J=1:numel(model.mets);
0302     J(I)=[];
0303     model=permuteModel(model,[I;J'],'mets');
0304     %Also reorder the transport costs
0305     transportCost=transportCost([I;J']);
0306     %Number of exchange metabolites
0307     nEM=numel(I);
0308 else
0309     nEM=0;
0310 end
0311 
0312 %There is no point of having genes for exchange reactions, so delete them.
0313 %Also to make computations easier
0314 model.rxnGeneMat(1:nER,:)=0;
0315 model.grRules(1:nER)={''};
0316 
0317 %Remove unused genes
0318 model=removeReactions(model,{},false,true);
0319 
0320 %Remove genes with no match to the model and reorder so that the genes are
0321 %in the same order as model.genes. Since the fake genes are already added
0322 %so that all genes in model exist in GSS it is fine to do like this
0323 [~, J]=ismember(model.genes,GSS.genes);
0324 GSS.genes=model.genes;
0325 GSS.scores=GSS.scores(J,:);
0326 
0327 %Reorder the GSS so that the first index corresponds to the default
0328 %compartment
0329 [~, J]=ismember(defaultCompartment,GSS.compartments);
0330 reorder=1:numel(GSS.compartments);
0331 reorder(J)=[];
0332 reorder=[J reorder];
0333 GSS.scores=GSS.scores(:,reorder);
0334 GSS.compartments=GSS.compartments(reorder);
0335 
0336 %Since it is only checked whether the metabolites can be synthesized, there
0337 %is no need to care about the stoichiometry. Change to -1/1 to simplify
0338 %later. Keep the S matrix for later though
0339 oldS=model.S;
0340 model.S(model.S>0)=1;
0341 model.S(model.S<0)=-1;
0342 
0343 %Here is a bit of a trick. To avoid the recurring calculation which
0344 %reactions are reversible, the reversible reactions have the coefficients
0345 %-10/10 instead of -1/1
0346 model.S(:,model.rev==1)=model.S(:,model.rev==1).*10;
0347 
0348 %***Begin problem formulation
0349 
0350 %Some numbers that are good to have
0351 nRxns=numel(model.rxns)-nER; %Excluding exchange rxns
0352 nMets=numel(model.mets)-nEM; %Excluding exchange mets
0353 nGenes=numel(model.genes);
0354 nComps=numel(GSS.compartments);
0355 
0356 %Create a big stoichiometric matrix that will be the current model. In
0357 %order to have faster simulations the maximal model size is declared and
0358 %reactions are then moved within it.
0359 
0360 %First the original model (with the first nE being exchange rxns), then
0361 %reserve space for number of rxns minus exchange rxns for each non-default
0362 %compartment, then transport reactions for all non-exchange mets between
0363 %the default compartment and all others.
0364 %NOTE: Kept eye()*0 since eye() can be used to include all transport from
0365 %the beginning
0366 s=repmat(eye(nMets)*0,1,nComps-1);
0367 s=[zeros(numel(model.mets)-nMets,size(s,2));s];
0368 S=[model.S sparse(numel(model.mets),nRxns*(nComps-1)) s];
0369 s=[sparse(nMets*(nComps-1),numel(model.rxns)+nRxns*(nComps-1)) eye(nMets*(nComps-1))*0];
0370 S=[S;s];
0371 
0372 %Also replicate the transport costs
0373 transportCost=[transportCost(1:nEM);repmat(transportCost(nEM+1:end),nComps,1)];
0374 
0375 %Create a binary matrix that says where the genes are in the current
0376 %solution
0377 g2c=false(nGenes,nComps);
0378 %All genes start in the default compartment
0379 g2c(:,1)=true;
0380 
0381 %Start of main optimization loop
0382 tic;
0383 bestScore=-inf;
0384 bestS=[];
0385 bestg2c=[];
0386 
0387 %Temp for testing
0388 plotScore=[];
0389 nTrans=[];
0390 totScore=[];
0391 minScore=sum(min(GSS.scores,[],2));
0392 maxScore=sum(max(GSS.scores,[],2));
0393 
0394 while toc<maxTime*60
0395     %Pick a random gene, weighted by it is current score minus the best
0396     %score for that gene (often 1.0, but can be 0.5 for no genes or average
0397     %for complexes). Genes with bad fits are more likely to be moved. This
0398     %formulation never moves a gene from its best compartment. Therefore a
0399     %small uniform weight is added
0400     [I, J]=find(g2c);
0401     geneToMove=randsample(nGenes,1,true,max(GSS.scores(I,:),[],2)-GSS.scores(sub2ind(size(g2c),I,J))+0.1);
0402     
0403     %Sample among possible compartments to move to. Add a larger weight to
0404     %even out the odds a little. Also a way of getting rid of loops where
0405     %the same set of genes are moved back and forth several times
0406     toComp=randsample(nComps,1,true,GSS.scores(geneToMove,:)+0.2);
0407     
0408     %Check that it moves to a new compartment
0409     if toComp==find(g2c(geneToMove,:))
0410         continue;
0411     end
0412     
0413     %Moves the gene
0414     [newS, newg2c]=moveGene(S,model,g2c,geneToMove,toComp,nRxns,nMets);
0415     
0416     %Tries to connect the network. If this was not possible in 10
0417     %iterations, then abort. If more than 20 modifications were needed then
0418     %it is unlikely that it will be a lower score
0419     wasConnected=false;
0420     for j=1:10
0421         %Find the metabolites that are now unconnected
0422         unconnected=findUnconnected(newS,nEM);
0423         
0424         %Continue if there are still unconnected
0425         if any(unconnected)
0426             %For each gene find out how many of these could be connected if
0427             %the gene was moved and how many would be disconnected by
0428             %moving that gene
0429             [geneIndex, moveTo, deltaConnected, deltaScore]=selectGenes(newS,nEM,nMets,nER,nRxns,model,unconnected,g2c,GSS);
0430             
0431             %Score which gene would be the best to move. The highest
0432             %deltaScore is 1.0. It should be possible to move a gene from
0433             %worst to best compartment even if it disconnects, say, 1.5
0434             %more metabolites
0435             [score, I]=max(1.5*deltaScore+deltaConnected);
0436             
0437             %Check if it has to add a transport or if there is a gene that
0438             %could be moved order to have a more connected network
0439             hasToAddTransport=true;
0440             if ~isempty(deltaConnected)
0441                 if score>0
0442                     hasToAddTransport=false;
0443                 end
0444             end
0445             
0446             %If it is possible to move any gene in order to have a more
0447             %connected network, then move the best one
0448             if hasToAddTransport==false
0449                 [newS, newg2c]=moveGene(newS,model,g2c,geneIndex(I),moveTo(I),nRxns,nMets);
0450             else
0451                 %Choose a random unconnected metabolite that should be
0452                 %connected
0453                 transMet=unconnected(randsample(numel(unconnected),1));
0454                 
0455                 %First get where the metabolite is now
0456                 comps=ceil((transMet-nEM)/((size(S,1)-nEM)/nComps));
0457                 
0458                 %Find the corresponding metabolite index if it were in the
0459                 %default compartment
0460                 dcIndex=transMet-(comps-1)*nMets;
0461                 
0462                 %Then get the indexes of that metabolite in all
0463                 %compartments
0464                 allIndexes=dcIndex;
0465                 for k=1:nComps-1
0466                     allIndexes=[allIndexes;dcIndex+nMets*k];
0467                 end
0468                 
0469                 %It could be that some of these are not used in any
0470                 %reaction. Get only the ones which are
0471                 I=sum(newS(allIndexes,:)~=0,2)>0;
0472                 
0473                 %Then get the ones that are used but not in unconnected.
0474                 %These are metabolites that could potentially be
0475                 %transported to connect transMet
0476                 connectedUsed=setdiff(allIndexes(I),unconnected);
0477                 
0478                 %This may be an error but leave it for now. It seems to
0479                 %happen if nothing can be connected in one step
0480                 if isempty(connectedUsed)
0481                     break;
0482                 end
0483                 
0484                 %If transMet is in the default compartment then everything
0485                 %is fine, just connect it to a random one
0486                 if transMet==dcIndex
0487                     newS=addTransport(newS,nRxns,nER,nMets,nEM,nComps,transMet,connectedUsed(randsample(numel(connectedUsed),1)));
0488                 else
0489                     %If one of the connectedUsed is in the default
0490                     %compartment then connect to that one
0491                     I=connectedUsed(connectedUsed<(nMets+nEM));
0492                     if any(I)
0493                         newS=addTransport(newS,nRxns,nER,nMets,nEM,nComps,transMet,I(randsample(numel(I),1)));
0494                     else
0495                         %This is if the only way to connect it is by adding
0496                         %two transport reactions, going via the default
0497                         %compartment
0498                         break;
0499                     end
0500                 end
0501             end
0502         else
0503             wasConnected=true;
0504             break;
0505         end
0506     end
0507     
0508     %If the network was connected in a new way, it is possible that some
0509     %transport reactions are no longer needed. They should be removed
0510     if wasConnected==true
0511         %These are the metabolites that are being transported
0512         activeTransport=find(sum(newS(:,nER+nRxns*nComps+1:end),2));
0513         
0514         %Get the metabolites that are unconnected if transport was not used
0515         unconnected=findUnconnected(newS(:,1:nER+nRxns*nComps),nEM);
0516         
0517         %Find the transport reactions that are not needed and delete them
0518         I=setdiff(activeTransport,unconnected);
0519         
0520         %Since both metabolites in a transport rxns must be connected for
0521         %the reaction to be deleted, the sum over the colums should be 4
0522         newS(:,find(sum(newS(I,nER+nRxns*nComps+1:end))==4)+nER+nRxns*nComps)=0;
0523         
0524         %Score the solution and determine whether to keep it as a new
0525         %solution
0526         [score, geneScore, trCost]=scoreModel(newS,newg2c,GSS,transportCost);
0527         
0528         %If it was the best solution so far, keep it
0529         if score>bestScore
0530             bestScore=score;
0531             bestS=newS;
0532             bestg2c=newg2c;
0533         end
0534         
0535         %This should not be steepest descent later
0536         if score>=bestScore% || exp((score-bestScore)*7)>rand()
0537             plotScore=[plotScore;geneScore];
0538             nTrans=[nTrans;trCost];
0539             totScore=[totScore;score];
0540             S=newS;
0541             g2c=newg2c;
0542             
0543             if plotResults==true
0544                 subplot(3,2,1);
0545                 spy(S);
0546                 subplot(3,2,2);
0547                 plot(plotScore,'r');
0548                 xlabel('Gene score');
0549                 subplot(3,2,3);
0550                 plot((plotScore-minScore)/(maxScore-minScore),'r');
0551                 xlabel('Gene score relative to predictions');
0552                 subplot(3,2,4);
0553                 plot(nTrans,'g');
0554                 xlabel('Transport cost');
0555                 subplot(3,2,5);
0556                 plot(totScore,'b');
0557                 xlabel('Total score');
0558                 subplot(3,2,6);
0559                 pause(0.2);
0560             end
0561         end
0562     end
0563 end
0564 scores.totScore=score;
0565 scores.geneScore=geneScore;
0566 scores.transCost=trCost;
0567 
0568 %Find which metabolites are transported and to where
0569 [I, J]=find(bestS(nEM+1:nEM+nMets,end-nMets*(nComps-1)+1:end));
0570 J=ceil(J/nMets+1);
0571 transportStruct.mets=model.metNames(I+nEM);
0572 transportStruct.toComp=GSS.compartments(J);
0573 
0574 [I, J]=find(bestg2c);
0575 geneLocalization.genes=GSS.genes(I);
0576 geneLocalization.comps=GSS.compartments(J);
0577 
0578 %Resort the gene names
0579 [~, I]=sort(geneLocalization.genes);
0580 geneLocalization.genes=geneLocalization.genes(I);
0581 geneLocalization.comps=geneLocalization.comps(I);
0582 
0583 %Remove the fake genes
0584 I=strncmp('&&FAKE&&',geneLocalization.genes,8);
0585 geneLocalization.genes(I)=[];
0586 geneLocalization.comps(I)=[];
0587 
0588 %Put together the model. This is done by first duplicating the S matrix
0589 %into the different compartments. Then the transport reactions are added
0590 %based on transportStruct. By now model.S should have the same size as the
0591 %S matrix used in the optimization, but with conserved stoichiometry. In
0592 %the final step all reactions and metabolites that are not used in the S
0593 %matrix from the optimization are deleted from the model
0594 outModel=model;
0595 outModel.S=oldS;
0596 
0597 %This is the S matrix without exchange rxns or metabolites
0598 copyPart=outModel.S(nEM+1:end,nER+1:end);
0599 
0600 %Replicate to give the rxnGeneMat for the full system
0601 copyRxnGeneMat=outModel.rxnGeneMat(nER+1:end,:);
0602 outModel.rxnGeneMat=[outModel.rxnGeneMat;repmat(copyRxnGeneMat,nComps-1,1)];
0603 
0604 %First fix the compartments. The model is already ordered with the exchange
0605 %metabolites first. The original model may contain one or two compartments,
0606 %depending on whether any exchange metabolites are defined
0607 nStartComps=numel(outModel.comps);
0608 if nStartComps==1
0609     outModel.comps={'1'};
0610     outModel.compNames=GSS.compartments(1);
0611 else
0612     if model.metComps(1)==1
0613         outModel.compNames(1)=GSS.compartments(1);
0614     else
0615         outModel.compNames(2)=GSS.compartments(1);
0616     end
0617 end
0618 outModel.compNames=[outModel.compNames;GSS.compartments(2:end)'];
0619 
0620 %Ugly little loop
0621 for i=1:numel(GSS.compartments)-1
0622     outModel.comps=[outModel.comps;num2str(numel(outModel.comps)+1)];
0623 end
0624 %This information is not known from the data, so empty fields are added
0625 outModel.compOutside=cell(numel(outModel.comps),1);
0626 outModel.compOutside(:)={''};
0627 
0628 for i=1:nComps-1
0629     outModel.S=[outModel.S sparse(size(outModel.S,1),nRxns)];
0630     outModel.S=[outModel.S;[sparse(nMets,nRxns*i+nER) copyPart]];
0631     outModel.rxns=[outModel.rxns;strcat(outModel.rxns(nER+1:nER+nRxns),'_',GSS.compartments{i+1})];
0632     outModel.rxnNames=[outModel.rxnNames;strcat(outModel.rxnNames(nER+1:nER+nRxns),' (',GSS.compartments{i+1},')')];
0633     outModel.lb=[outModel.lb;outModel.lb(nER+1:nER+nRxns)];
0634     outModel.ub=[outModel.ub;outModel.ub(nER+1:nER+nRxns)];
0635     outModel.rev=[outModel.rev;outModel.rev(nER+1:nER+nRxns)];
0636     outModel.c=[outModel.c;outModel.c(nER+1:nER+nRxns)];
0637     if isfield(outModel,'grRules')
0638         outModel.grRules=[outModel.grRules;outModel.grRules(nER+1:nER+nRxns)];
0639     end
0640     if isfield(outModel,'subSystems')
0641         outModel.subSystems=[outModel.subSystems;outModel.subSystems(nER+1:nER+nRxns)];
0642     end
0643     if isfield(outModel,'eccodes')
0644         outModel.eccodes=[outModel.eccodes;outModel.eccodes(nER+1:nER+nRxns)];
0645     end
0646     if isfield(outModel,'rxnFrom')
0647         outModel.rxnFrom=[outModel.rxnFrom;outModel.rxnFrom(nER+1:nER+nRxns)];
0648     end
0649     if isfield(outModel,'rxnMiriams')
0650         outModel.rxnMiriams=[outModel.rxnMiriams;outModel.rxnMiriams(nER+1:nER+nRxns)];
0651     end
0652     if isfield(outModel,'rxnNotes')
0653         outModel.rxnNotes=[outModel.rxnNotes;outModel.rxnNotes(nER+1:nER+nRxns)];
0654     end
0655     if isfield(outModel,'rxnReferences')
0656         outModel.rxnReferences=[outModel.rxnReferences;outModel.rxnReferences(nER+1:nER+nRxns)];
0657     end
0658     if isfield(outModel,'rxnConfidenceScores')
0659         outModel.rxnConfidenceScores=[outModel.rxnConfidenceScores;outModel.rxnConfidenceScores(nER+1:nER+nRxns)];
0660     end
0661     if isfield(outModel,'rxnDeltaG')
0662         outModel.rxnDeltaG=[outModel.rxnDeltaG;outModel.rxnDeltaG(nER+1:nER+nRxns)];
0663     end
0664     outModel.mets=[outModel.mets;strcat(outModel.mets(nEM+1:nEM+nMets),'_',GSS.compartments{i+1})];
0665     outModel.metNames=[outModel.metNames;outModel.metNames(nEM+1:nEM+nMets)];
0666     outModel.b=[outModel.b;outModel.b(nEM+1:nEM+nMets,:)];
0667     I=ones(nMets,1)*nStartComps+i;
0668     outModel.metComps=[outModel.metComps;I];
0669     if isfield(outModel,'inchis')
0670         outModel.inchis=[outModel.inchis;outModel.inchis(nEM+1:nEM+nMets)];
0671     end
0672     if isfield(outModel,'metSmiles')
0673         outModel.metSmiles=[outModel.metSmiles;outModel.metSmiles(nEM+1:nEM+nMets)];
0674     end
0675     if isfield(outModel,'unconstrained')
0676         outModel.unconstrained=[outModel.unconstrained;outModel.unconstrained(nEM+1:nEM+nMets)];
0677     end
0678     if isfield(outModel,'metMiriams')
0679         outModel.metMiriams=[outModel.metMiriams;outModel.metMiriams(nEM+1:nEM+nMets)];
0680     end
0681     if isfield(outModel,'metFormulas')
0682         outModel.metFormulas=[outModel.metFormulas;outModel.metFormulas(nEM+1:nEM+nMets)];
0683     end
0684     if isfield(outModel,'metFrom')
0685         outModel.metFrom=[outModel.metFrom;outModel.metFrom(nEM+1:nEM+nMets)];
0686     end
0687     if isfield(outModel,'metCharges')
0688         outModel.metCharges=[outModel.metCharges;outModel.metCharges(nEM+1:nEM+nMets)];
0689     end
0690     if isfield(outModel,'metDeltaG')
0691         outModel.metDeltaG=[outModel.metDeltaG;outModel.metDeltaG(nEM+1:nEM+nMets)];
0692     end
0693 end
0694 
0695 %Add the transport reactions
0696 transS=bestS(:,numel(outModel.rxns)+1:end);
0697 J=sum(transS)>0; %Active rxns
0698 
0699 %Transport reactions are written in a different way compared to a "real"
0700 %stoichimetric matrix. This is to fix that
0701 transS(transS~=0)=1;
0702 transS(1:nEM+nMets,:)=transS(1:nEM+nMets,:)*-1;
0703 I=find(sum(transS>0,2));
0704 nTransRxns=numel(I);
0705 outModel.S=[outModel.S transS(:,J)];
0706 filler=ones(nTransRxns,1);
0707 outModel.lb=[outModel.lb;filler*-1000];
0708 outModel.ub=[outModel.ub;filler*1000];
0709 outModel.rev=[outModel.rev;filler];
0710 outModel.c=[outModel.c;filler*0];
0711 outModel.rxnGeneMat=[outModel.rxnGeneMat;sparse(nTransRxns,numel(outModel.genes))];
0712 
0713 for i=1:numel(I)
0714     outModel.rxns=[outModel.rxns;strcat('transport',num2str(i))];
0715     outModel.rxnNames=[outModel.rxnNames;['Transport of ',outModel.metNames{I(i)}]];
0716     if isfield(outModel,'grRules')
0717         outModel.grRules=[outModel.grRules;{''}];
0718     end
0719     if isfield(outModel,'rxnMiriams')
0720         outModel.rxnMiriams=[outModel.rxnMiriams;{[]}];
0721     end
0722     if isfield(outModel,'subSystems')
0723         outModel.subSystems=[outModel.subSystems;{{'Inferred transport reactions'}}];
0724     end
0725     if isfield(outModel,'eccodes')
0726         outModel.eccodes=[outModel.eccodes;{''}];
0727     end
0728     if isfield(outModel,'rxnFrom')
0729         outModel.rxnFrom=[outModel.rxnFrom;{''}];
0730     end
0731     if isfield(outModel,'rxnNotes')
0732         outModel.rxnNotes=[outModel.rxnNotes;{''}];
0733     end
0734     if isfield(outModel,'rxnReferences')
0735         outModel.rxnReferences=[outModel.rxnReferences;{''}];
0736     end
0737     if isfield(outModel,'rxnConfidenceScores')
0738         outModel.rxnConfidenceScores=[outModel.rxnConfidenceScores;NaN];
0739     end
0740     if isfield(outModel,'rxnDeltaG')
0741         outModel.rxnDeltaG=[outModel.rxnDeltaG;NaN];
0742     end
0743 end
0744 
0745 %Then remove all reactions and metabolites that aren't used in the final
0746 %solution from the optimization
0747 [~, J]=find(bestS(:,1:nER+nComps*nRxns));
0748 K=true(numel(outModel.rxns),1);
0749 K(J)=false;
0750 K(end-nTransRxns+1:end)=false;
0751 outModel=removeReactions(outModel,K,true);
0752 
0753 %Remove all fake genes
0754 I=strncmp('&&FAKE&&',outModel.genes,8);
0755 outModel.genes(I)=[];
0756 if isfield(outModel,'geneMiriams')
0757     outModel.geneMiriams(I)=[];
0758 end
0759 if isfield(outModel,'geneShortNames')
0760     outModel.geneShortNames(I)=[];
0761 end
0762 outModel.rxnGeneMat(:,I)=[];
0763 
0764 %Fix grRules and reconstruct rxnGeneMat
0765 [grRules,rxnGeneMat] = standardizeGrRules(outModel,true);
0766 outModel.grRules = grRules;
0767 outModel.rxnGeneMat = rxnGeneMat;
0768 end
0769 
0770 %Moves a gene and all associated reactions from one compartment to another
0771 function [S, g2c]=moveGene(S,model,g2c,geneToMove,toComp,nRxns,nMets)
0772 %Find the current compartment and update to the new one
0773 currentComp=find(g2c(geneToMove,:));
0774 g2c(geneToMove,:)=false;
0775 g2c(geneToMove,toComp)=true;
0776 
0777 %Find the reactions in the original model that the gene controls
0778 [I, ~]=find(model.rxnGeneMat(:,geneToMove));
0779 
0780 %Calculate their current positions in the S matrix
0781 oldRxns=I+(currentComp-1)*nRxns;
0782 
0783 %And their new positions
0784 newRxns=I+(toComp-1)*nRxns;
0785 
0786 %The metabolite ids also have to be changed in order to match the new
0787 %compartment
0788 metChange=nMets*(toComp-currentComp);
0789 
0790 %Update the reactions
0791 [I, J, K]=find(S(:,oldRxns));
0792 I=I+metChange;
0793 
0794 %Move the reactions
0795 S(:,oldRxns)=0;
0796 S(sub2ind(size(S),I,newRxns(J)))=K;
0797 end
0798 
0799 %Finds which metabolites are unconnected, in the sense that they are never
0800 %a product or only a product in a reversible reaction where one reactant is
0801 %only a product in the opposite direction of that reaction. This function
0802 %ignores exchange metabolites. Returns a vector of metabolite indexes.
0803 %metsToCheck is an array of metabolite indexes to check for connectivity.
0804 %If not supplied then all metabolites are checked
0805 function unconnected=findUnconnected(S,nEM,metsToCheck)
0806 if nargin>2
0807     %Do this by deleting everything from the network that is not in
0808     %metsToCheck and that is not exchange metabolites
0809     I=false(size(S,1),1);
0810     I(1:nEM)=true;
0811     I(metsToCheck)=true;
0812     S=S(I,:);
0813 end
0814 
0815 em=false(size(S,1),1);
0816 em(1:nEM)=true;
0817 
0818 %Construct a matrix in which the reversible reactions are inverted
0819 I=sum(S>2,1) | sum(S>2,1);
0820 revS=S;
0821 revS(:,I)=revS(:,I)*-1;
0822 
0823 %First calculate the ones that are ok
0824 %Produced in 2 rxns, is exchange, is not used at all, is produced in
0825 %non-reversible, involved in more than 1 reversible reactions
0826 connected=sum(S>0,2)>1 | em | sum(S~=0,2)==0 | sum(S(:,~I)>0,2)>0 | sum(S(:,I)~=0,2)>1;
0827 
0828 %Then get the ones that are unconnected because they are never produced
0829 unconnected=sum(S>0 | revS>0,2)==0 & connected==false;
0830 
0831 %Then get the ones that are potentially unconnected
0832 maybeUnconnected=~connected & ~unconnected;
0833 %maybeUnconnected=find(maybeUnconnectedS);
0834 
0835 %The metabolites in maybeUnconnected are involved in one reversible
0836 %reaction and not produced in any other reaction. This means that the
0837 %reactions which have at least one met in maybeUnconnected as reactant and
0838 %one as product are unconnected. The metabolites in maybeUnconnected that
0839 %are present in those reactions are then dead ends
0840 deadRxns=any(S(maybeUnconnected,:)>0) & any(S(maybeUnconnected,:)<0);
0841 
0842 %Get the mets involved in any of those reactions
0843 problematic=any(S(:,deadRxns)~=0,2);
0844 
0845 %If any of these are in the maybeUnconnected list then the metabolite is
0846 %unconnected
0847 unconnected(problematic & maybeUnconnected)=true;
0848 
0849 %Map back to metsToCheck
0850 if nargin>2
0851     unconnected=metsToCheck(unconnected(nEM+1:end));
0852 else
0853     unconnected=find(unconnected);
0854 end
0855 end
0856 
0857 %Given a set of unconnected metabolites, this function tries to move each
0858 %gene that could connect any of them, calculates the number of newly
0859 %connected metabolites minus the number of newly disconnected metabolites.
0860 %As some metabolites are very connected, only 25 genes are checked. Genes
0861 %that have a low score in their current compartment are more likely to be
0862 %moved
0863 function [geneIndex, moveTo, deltaConnected, deltaScore]=selectGenes(S,nEM,nMets,nER,nRxns,model,unconnected,g2c,GSS)
0864 %If moveTo is 0 then the gene cannot connect any of the metabolites
0865 moveTo=zeros(numel(model.genes),1);
0866 deltaConnected=zeros(numel(model.genes),1);
0867 
0868 %First get where the metabolites are now
0869 nComps=size(g2c,2);
0870 comps=ceil((unconnected-nEM)/((size(S,1)-nEM)/nComps));
0871 
0872 %Find the corresponding metabolite indexes if they all were in the default
0873 %compartment
0874 dcIndexes=unique(unconnected-(comps-1)*nMets);
0875 
0876 %Then find them if they were in any other compartment
0877 allIndexes=dcIndexes;
0878 for i=1:nComps-1
0879     allIndexes=[allIndexes;dcIndexes+nMets*i];
0880 end
0881 
0882 %Also check which reversible reactions that could be used
0883 I=sum(S>2,1) | sum(S>2,1);
0884 revS=S;
0885 revS(:,I)=revS(:,I)*-1;
0886 
0887 %Find all reactions that could make any of the unconnected metabolites in
0888 %some other compartment
0889 newMets=setdiff(allIndexes,unconnected);
0890 [~, potential]=find(S(newMets,:)>0 | revS(newMets,:)>0);
0891 potential(potential<=nER | potential>nER+nRxns*nComps)=[]; %No exchange rxns or transport rxns
0892 
0893 %Map J to the real metabolic reactions in model
0894 rxnComps=ceil((potential-nER)/(nRxns));
0895 
0896 %Find the corresponding reaction indexes if they all were in the default
0897 %compartment
0898 dcRxnIndexes=potential-(rxnComps-1)*nRxns;
0899 
0900 %Get the genes for those reactions
0901 genes=find(sum(model.rxnGeneMat(dcRxnIndexes,:)>0,1));
0902 
0903 %For some cases there can be very many reactions to connect something. This
0904 %is in particular true in the beginning of the optimization if, say, ATP is
0905 %unconnected. Therefore limit the number of genes to be checked to 25.
0906 %Weigh so that genes with bad scores in their current compartment are more
0907 %likely to be moved.
0908 
0909 %Get scores for these genes
0910 [~, J]=find(g2c(genes,:));
0911 
0912 %Add a small weight so that genes in their best compartment could be moved
0913 %as well
0914 geneScores=GSS.scores(sub2ind(size(g2c),genes(:),J));
0915 modGeneScores=1.1-geneScores;
0916 if numel(genes)>25
0917     rGenes=genes(randsample(numel(genes),min(numel(genes),25),true,modGeneScores));
0918     
0919     %The sampling with weights could give duplicates
0920     rGenes=unique(rGenes);
0921     
0922     %Reorder the geneScores to match
0923     [~, I]=ismember(rGenes,genes);
0924     geneScores=geneScores(I);
0925     genes=rGenes;
0926 end
0927 for i=1:numel(genes)
0928     %Since one gene is moved at a time, only metabolites involved in any of
0929     %the reactions for that gene can become unconnected. This helps to
0930     %speed up the algorithm. First get all involved reactions in the
0931     %default compartment
0932     rxns=find(model.rxnGeneMat(:,genes(i)));
0933     
0934     %Then get their mets
0935     mets=find(sum(model.S(:,rxns)~=0,2)>0);
0936     
0937     %Then get their indexes in all compartments
0938     allIndexes=mets;
0939     for j=1:nComps-1
0940         allIndexes=[allIndexes;mets+nMets*j];
0941     end
0942     
0943     %Check which of the unconnected metabolites that these reactions
0944     %correspond to. This could have been done earlier, but it is fast. The
0945     %reversibility check is skipped because it is unlikely to be an issue
0946     %here. Worst case is that the gene is tested once to much
0947     [I, ~]=find(model.S(:,rxns));
0948     moveToComps=unique(comps(ismember(dcIndexes,I)));
0949     
0950     %Try to move the gene to each of the compartments
0951     bestMove=-inf;
0952     bestComp=[];
0953     for j=1:numel(moveToComps)
0954         newS=moveGene(S,model,g2c,genes(i),moveToComps(j),nRxns,nMets);
0955         
0956         %Check how many metabolites that are unconnected after moving the
0957         %gene
0958         dConnected=numel(unconnected)-numel(findUnconnected(newS,nEM,[allIndexes;unconnected]));
0959         if dConnected>bestMove
0960             bestMove=dConnected;
0961             bestComp=moveToComps(j);
0962         end
0963     end
0964     
0965     %Add the difference in connectivity and where the genes should be moved
0966     moveTo(genes(i))=bestComp;
0967     deltaConnected(genes(i))=bestMove;
0968 end
0969 
0970 %Finish up
0971 geneIndex=genes(:);
0972 moveTo=moveTo(geneIndex);
0973 deltaConnected=deltaConnected(geneIndex);
0974 deltaScore=GSS.scores(sub2ind(size(g2c),geneIndex(:),moveTo))-geneScores;
0975 end
0976 
0977 %Small function to add a transport reactions between two metabolites.
0978 %Transport reactions are written as having a coefficient 2.0 for both
0979 %reactant and product. This is not a "real" reaction, but since all normal
0980 %reactions have coefficient -1/1 or -10/10 it is a compact way of writing
0981 %it
0982 function S=addTransport(S,nRxns,nER,nMets,nEM,nComps,metA,metB)
0983 mets=[metA;metB];
0984 %Find the current compartments for the metabolites
0985 comps=ceil((mets-nEM)/((size(S,1)-nEM)/nComps));
0986 
0987 if sum(comps==1)~=1
0988     EM='Tried to create a transport reaction from a non-default compartment';
0989     dispEM(EM);
0990 end
0991 
0992 %Calculate the reaction index
0993 rIndex=(nER+nRxns*nComps)+mets(comps~=1)-nEM-nMets;
0994 
0995 S(mets,rIndex)=2;
0996 end
0997 
0998 %Scores a network based on the localization of the genes and the number of
0999 %transporter reactions used
1000 function [score, geneScore, transportCost]=scoreModel(S,g2c,GSS,transportCost)
1001 [I, J]=find(g2c);
1002 geneScore=sum(GSS.scores(sub2ind(size(g2c),I,J)));
1003 [I, ~]=find(S==2);
1004 I=unique(I);
1005 transportCost=sum(transportCost(I));
1006 score=geneScore-transportCost;
1007 end
1008 
1009 % To avoid dependency on stats toolbox, use this alternative implementation
1010 % of randsample, source:
1011 % https://github.com/gpeyre/numerical-tours/blob/dacee30081c04ef5f67b26b387ead85f2b193af9/matlab/toolbox_signal/randsample.m
1012 function y = randsample(n, k, replace, w)
1013 %RANDSAMPLE Random sample, with or without replacement.
1014 %   Y = RANDSAMPLE(N,K) returns Y as a vector of K values sampled uniformly
1015 %   at random, without replacement, from the integers 1:N.
1016 %
1017 %   Y = RANDSAMPLE(POPULATION,K) returns K values sampled uniformly at
1018 %   random, without replacement, from the values in the vector POPULATION.
1019 %
1020 %   Y = RANDSAMPLE(...,REPLACE) returns a sample taken with replacement if
1021 %   REPLACE is true, or without replacement if REPLACE is false (the default).
1022 %
1023 %   Y = RANDSAMPLE(...,true,W) returns a weighted sample, using positive
1024 %   weights W, taken with replacement.  W is often a vector of probabilities.
1025 %   This function does not support weighted sampling without replacement.
1026 %
1027 %   Example:  Generate a random sequence of the characters ACGT, with
1028 %   replacement, according to specified probabilities.
1029 %
1030 %      R = randsample('ACGT',48,true,[0.15 0.35 0.35 0.15])
1031 %
1032 %   See also RAND, RANDPERM.
1033 
1034 %   Copyright 1993-2008 The MathWorks, Inc.
1035 %   $Revision: 1.1.4.3 $  $Date: 2008/12/01 08:09:34 $
1036 
1037 if nargin < 2
1038     error('stats:randsample:TooFewInputs','Requires two input arguments.');
1039 elseif numel(n) == 1
1040     population = [];
1041 else
1042     population = n;
1043     n = numel(population);
1044     if length(population)~=n
1045        error('stats:randsample:BadPopulation','POPULATION must be a vector.');
1046     end
1047 end
1048 
1049 if nargin < 3
1050     replace = false;
1051 end
1052 
1053 if nargin < 4
1054     w = [];
1055 elseif ~isempty(w)
1056     if length(w) ~= n
1057         if isempty(population)
1058             error('stats:randsample:InputSizeMismatch',...
1059                   'W must have length equal to N.');
1060         else
1061             error('stats:randsample:InputSizeMismatch',...
1062                   'W must have the same length as the population.');
1063         end
1064     else
1065         p = w(:)' / sum(w);
1066     end
1067 end
1068 
1069 switch replace
1070 
1071 % Sample with replacement
1072 case {true, 'true', 1}
1073     if isempty(w)
1074         y = ceil(n .* rand(k,1));
1075     else
1076         [dum, y] = histc(rand(k,1),[0 cumsum(p)]);
1077     end
1078 
1079 % Sample without replacement
1080 case {false, 'false', 0}
1081     if k > n
1082         if isempty(population)
1083             error('stats:randsample:SampleTooLarge',...
1084         'K must be less than or equal to N for sampling without replacement.');
1085         else
1086             error('stats:randsample:SampleTooLarge',...
1087                   'K must be less than or equal to the population size.');
1088         end
1089     end
1090 
1091     if isempty(w)
1092         % If the sample is a sizeable fraction of the population,
1093         % just randomize the whole population (which involves a full
1094         % sort of n random values), and take the first k.
1095         if 4*k > n
1096             rp = randperm(n);
1097             y = rp(1:k);
1098 
1099         % If the sample is a small fraction of the population, a full sort
1100         % is wasteful.  Repeatedly sample with replacement until there are
1101         % k unique values.
1102         else
1103             x = zeros(1,n); % flags
1104             sumx = 0;
1105             while sumx < k
1106                 x(ceil(n * rand(1,k-sumx))) = 1; % sample w/replacement
1107                 sumx = sum(x); % count how many unique elements so far
1108             end
1109             y = find(x > 0);
1110             y = y(randperm(k));
1111         end
1112     else
1113         error('stats:randsample:NoWeighting',...
1114               'Weighted sampling without replacement is not supported.');
1115     end
1116 otherwise
1117     error('stats:randsample:BadReplaceValue',...
1118           'REPLACE must be either true or false.');
1119 end
1120 
1121 if ~isempty(population)
1122     y = population(y);
1123 else
1124     y = y(:);
1125 end
1126 end

Generated by m2html © 2005