Home > core > reporterMetabolites.m

reporterMetabolites

PURPOSE ^

reporterMetabolites

SYNOPSIS ^

function repMets=reporterMetabolites(model,genes,genePValues,printResults,outputFile,geneFoldChanges)

DESCRIPTION ^

 reporterMetabolites
   The Reporter Metabolites algorithm for identifying metabolites around
   which transcriptional changes occur

   model           a model structure
   genes           a cell array of gene names (should match with
                   model.genes)
   genePValues     P-values for differential expression of the genes
   printResults    true if the top 20 Reporter Metabolites should be
                   printed to the screen (optional, default false)
   outputFile      the results are printed to this file (optional)
   geneFoldChanges log-fold changes for the genes. If supplied, then
                   Reporter Metabolites are calculated for only up/down-
                   regulated genes in addition to the full test (optional)

   repMets         an array of structures with the following fields.
       test            a string the describes the genes that were used to
                       calculate the Reporter Metabolites ('all', 'only up',
                       or 'only down'). The two latter structures are
                       only calculated if geneFoldChanges are supplied.
       mets            a cell array of metabolite IDs for the metabolites for
                       which a score could be calculated
       metZScores      Z-scores for differential expression around each
                       metabolite in "mets"
       metPValues      P-values for differential expression around each
                       metabolite in "mets"
       metNGenes       number of neighbouring genes for each metabolite in
                       "mets"
       meanZ           average Z-scores for the genes around each metabolite
                       in "mets"
       stdZ            standard deviations of the Z-scores around each
                       metabolite in "mets"

   NOTE: For details about the algorithm, see Patil KR, Nielsen J,
   Uncovering transcriptional regulation of metabolism by using metabolic
   network topology. Proc. Natl Acad. Sci. USA 2005;102:2685-2689.

 Usage: repMets=reporterMetabolites(model,genes,genePValues,printResults,...
           outputFile,geneFoldChanges)

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function repMets=reporterMetabolites(model,genes,genePValues,printResults,outputFile,geneFoldChanges)
0002 % reporterMetabolites
0003 %   The Reporter Metabolites algorithm for identifying metabolites around
0004 %   which transcriptional changes occur
0005 %
0006 %   model           a model structure
0007 %   genes           a cell array of gene names (should match with
0008 %                   model.genes)
0009 %   genePValues     P-values for differential expression of the genes
0010 %   printResults    true if the top 20 Reporter Metabolites should be
0011 %                   printed to the screen (optional, default false)
0012 %   outputFile      the results are printed to this file (optional)
0013 %   geneFoldChanges log-fold changes for the genes. If supplied, then
0014 %                   Reporter Metabolites are calculated for only up/down-
0015 %                   regulated genes in addition to the full test (optional)
0016 %
0017 %   repMets         an array of structures with the following fields.
0018 %       test            a string the describes the genes that were used to
0019 %                       calculate the Reporter Metabolites ('all', 'only up',
0020 %                       or 'only down'). The two latter structures are
0021 %                       only calculated if geneFoldChanges are supplied.
0022 %       mets            a cell array of metabolite IDs for the metabolites for
0023 %                       which a score could be calculated
0024 %       metZScores      Z-scores for differential expression around each
0025 %                       metabolite in "mets"
0026 %       metPValues      P-values for differential expression around each
0027 %                       metabolite in "mets"
0028 %       metNGenes       number of neighbouring genes for each metabolite in
0029 %                       "mets"
0030 %       meanZ           average Z-scores for the genes around each metabolite
0031 %                       in "mets"
0032 %       stdZ            standard deviations of the Z-scores around each
0033 %                       metabolite in "mets"
0034 %
0035 %   NOTE: For details about the algorithm, see Patil KR, Nielsen J,
0036 %   Uncovering transcriptional regulation of metabolism by using metabolic
0037 %   network topology. Proc. Natl Acad. Sci. USA 2005;102:2685-2689.
0038 %
0039 % Usage: repMets=reporterMetabolites(model,genes,genePValues,printResults,...
0040 %           outputFile,geneFoldChanges)
0041 
0042 genes=convertCharArray(genes);
0043 if nargin<4
0044     printResults=false;
0045 end
0046 if nargin<5
0047     outputFile=[];
0048 end
0049 if nargin<6
0050     geneFoldChanges=[];
0051 end
0052 
0053 %Check some stuff
0054 if numel(genes)~=numel(genePValues)
0055     EM='The number of genes and the number of Z-scores must be the same';
0056     dispEM(EM);
0057 end
0058 if ~isfield(model,'rxnGeneMat')
0059     EM='The model structure must have a rxnGeneMat field';
0060     dispEM(EM);
0061 end
0062 
0063 %Remove the genes which are not in the model
0064 genePValues(~ismember(genes,model.genes))=[];
0065 if any(geneFoldChanges)
0066     geneFoldChanges(~ismember(genes,model.genes))=[];
0067 end
0068 genes(~ismember(genes,model.genes))=[];
0069 
0070 %Remove the genes that has NA P-value
0071 genes(isnan(genePValues))=[];
0072 if any(geneFoldChanges)
0073     geneFoldChanges(isnan(genePValues))=[];
0074 end
0075 genePValues(isnan(genePValues))=[];
0076 
0077 %Convert p-values to Z-scores
0078 geneZScores=norminv(genePValues)*-1;
0079 
0080 %This is to prevent errors if the p-values are exactly 0 or 1
0081 geneZScores(geneZScores==-inf)=-15;
0082 geneZScores(geneZScores==inf)=15;
0083 
0084 %For each metabolite, calculate the aggregate Z-score and keep track of the
0085 %number of neighbouring genes
0086 metZScores=nan(numel(model.mets),1);
0087 metNGenes=nan(numel(model.mets),1);
0088 meanZ=nan(numel(model.mets),1);
0089 stdZ=nan(numel(model.mets),1);
0090 for i=1:numel(model.mets)
0091     %Get the involved rxns
0092     I=model.S(i,:);
0093     
0094     %Get the involved genes
0095     [~, J]=find(model.rxnGeneMat(I~=0,:));
0096     
0097     %Find the genes in the gene list
0098     K=find(ismember(genes,model.genes(J)));
0099     
0100     %Calculate the aggregated Z-score for the metabolite
0101     if any(K)
0102         metZScores(i)=sum(geneZScores(K))/sqrt(numel(K));
0103         meanZ(i)=mean(geneZScores(K));
0104         stdZ(i)=std(geneZScores(K));
0105         metNGenes(i)=numel(K);
0106     end
0107 end
0108 
0109 %Remove the metabolites which have no Z-scores
0110 mets=model.mets(~isnan(metZScores));
0111 metNames=model.metNames(~isnan(metZScores));
0112 metZScores(isnan(metZScores))=[];
0113 metNGenes(isnan(metNGenes))=[];
0114 meanZ(isnan(meanZ))=[];
0115 stdZ(isnan(stdZ))=[];
0116 
0117 %Then correct for background by calculating the mean Z-score for random
0118 %sets of the same size as the ones that were found for the metabolites
0119 sizes=unique(metNGenes);
0120 nGenes=numel(genes);
0121 
0122 for i=1:numel(sizes)
0123     %Sample 100000 sets for each size. Sample with replacement, which may
0124     %or may not be the best choice.
0125     I=ceil(rand(100000,sizes(i))*nGenes);
0126     J=geneZScores(I);
0127     K=sum(J,2)/sqrt(sizes(i));
0128     
0129     %Correct all elements of this size
0130     mK=mean(K);
0131     stdK=std(K);
0132     metZScores(metNGenes==sizes(i))=(metZScores(metNGenes==sizes(i))-mK)/stdK;
0133 end
0134 
0135 %Calculate the P-values
0136 metPValues=1-normcdf(metZScores);
0137 
0138 %Sort the results
0139 [metZScores, I]=sort(metZScores,'descend');
0140 mets=mets(I);
0141 metNames=metNames(I);
0142 metPValues=metPValues(I);
0143 metNGenes=metNGenes(I);
0144 meanZ=meanZ(I);
0145 stdZ=stdZ(I);
0146 
0147 %Prepare the output
0148 repMets.mets=mets;
0149 repMets.metNames=metNames;
0150 repMets.metZScores=metZScores;
0151 repMets.metPValues=metPValues;
0152 repMets.metNGenes=metNGenes;
0153 repMets.meanZ=meanZ;
0154 repMets.stdZ=stdZ;
0155 
0156 %Call this function recursively if geneFoldChanges has been specified
0157 repMets(1).test='all';
0158 if any(geneFoldChanges)
0159     repMets=[repMets;reporterMetabolites(model,genes(geneFoldChanges>=0),genePValues(geneFoldChanges>=0))];
0160     repMets=[repMets;reporterMetabolites(model,genes(geneFoldChanges<0),genePValues(geneFoldChanges<0))];
0161     repMets(2).test='only up';
0162     repMets(3).test='only down';
0163 end
0164 
0165 %This is for printing results to the screen. For printing a full list of
0166 %all scores, specify a output file
0167 if printResults==true
0168     for i=1:numel(repMets)
0169         fprintf(['TOP 20 REPORTER METABOLITES\nTEST TYPE: ' repMets(i).test '\nID\tNAME\tP-VALUE\n']);
0170         for j=1:min(20,numel(repMets(i).mets))
0171             fprintf([repMets(i).mets{j} '\t' repMets(i).metNames{j} '\t' num2str(repMets(i).metPValues(j)) '\n']);
0172         end
0173         fprintf('\n');
0174     end
0175 end
0176 
0177 %Print results to file
0178 if any(outputFile)
0179     fid=fopen(outputFile,'w');
0180     for i=1:numel(repMets)
0181         fprintf(fid,['REPORTER METABOLITES USING TEST TYPE: ' repMets(i).test '\n']);
0182         fprintf(fid,'ID\tNAME\tZ-SCORE\tP-VALUE\tNUMBER OF NEIGHBOURS\tAVERAGE Z-SCORE\tSTD Z-SCORE\n');
0183         for j=1:numel(repMets(i).mets)
0184             fprintf(fid,[repMets(i).mets{j} '\t' repMets(i).metNames{j} '\t' num2str(repMets(i).metZScores(j)) '\t' num2str(repMets(i).metPValues(j)) '\t' num2str(repMets(i).metNGenes(j)) '\t' num2str(repMets(i).meanZ(j)) '\t' num2str(repMets(i).stdZ(j)) '\n']);
0185         end
0186     end
0187     fclose(fid);
0188 end
0189 end

Generated by m2html © 2005