reporterMetabolites The Reporter Metabolites algorithm for identifying metabolites around which transcriptional changes occur model a model structure genes a cell array of gene names (should match with model.genes) genePValues P-values for differential expression of the genes printResults true if the top 20 Reporter Metabolites should be printed to the screen (optional, default false) outputFile the results are printed to this file (optional) geneFoldChanges log-fold changes for the genes. If supplied, then Reporter Metabolites are calculated for only up/down- regulated genes in addition to the full test (optional) repMets an array of structures with the following fields. test a string the describes the genes that were used to calculate the Reporter Metabolites ('all', 'only up', or 'only down'). The two latter structures are only calculated if geneFoldChanges are supplied. mets a cell array of metabolite IDs for the metabolites for which a score could be calculated metZScores Z-scores for differential expression around each metabolite in "mets" metPValues P-values for differential expression around each metabolite in "mets" metNGenes number of neighbouring genes for each metabolite in "mets" meanZ average Z-scores for the genes around each metabolite in "mets" stdZ standard deviations of the Z-scores around each metabolite in "mets" NOTE: For details about the algorithm, see Patil KR, Nielsen J, Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proc. Natl Acad. Sci. USA 2005;102:2685-2689. Usage: repMets=reporterMetabolites(model,genes,genePValues,printResults,... outputFile,geneFoldChanges)
0001 function repMets=reporterMetabolites(model,genes,genePValues,printResults,outputFile,geneFoldChanges) 0002 % reporterMetabolites 0003 % The Reporter Metabolites algorithm for identifying metabolites around 0004 % which transcriptional changes occur 0005 % 0006 % model a model structure 0007 % genes a cell array of gene names (should match with 0008 % model.genes) 0009 % genePValues P-values for differential expression of the genes 0010 % printResults true if the top 20 Reporter Metabolites should be 0011 % printed to the screen (optional, default false) 0012 % outputFile the results are printed to this file (optional) 0013 % geneFoldChanges log-fold changes for the genes. If supplied, then 0014 % Reporter Metabolites are calculated for only up/down- 0015 % regulated genes in addition to the full test (optional) 0016 % 0017 % repMets an array of structures with the following fields. 0018 % test a string the describes the genes that were used to 0019 % calculate the Reporter Metabolites ('all', 'only up', 0020 % or 'only down'). The two latter structures are 0021 % only calculated if geneFoldChanges are supplied. 0022 % mets a cell array of metabolite IDs for the metabolites for 0023 % which a score could be calculated 0024 % metZScores Z-scores for differential expression around each 0025 % metabolite in "mets" 0026 % metPValues P-values for differential expression around each 0027 % metabolite in "mets" 0028 % metNGenes number of neighbouring genes for each metabolite in 0029 % "mets" 0030 % meanZ average Z-scores for the genes around each metabolite 0031 % in "mets" 0032 % stdZ standard deviations of the Z-scores around each 0033 % metabolite in "mets" 0034 % 0035 % NOTE: For details about the algorithm, see Patil KR, Nielsen J, 0036 % Uncovering transcriptional regulation of metabolism by using metabolic 0037 % network topology. Proc. Natl Acad. Sci. USA 2005;102:2685-2689. 0038 % 0039 % Usage: repMets=reporterMetabolites(model,genes,genePValues,printResults,... 0040 % outputFile,geneFoldChanges) 0041 0042 genes=convertCharArray(genes); 0043 if nargin<4 0044 printResults=false; 0045 end 0046 if nargin<5 0047 outputFile=[]; 0048 end 0049 if nargin<6 0050 geneFoldChanges=[]; 0051 end 0052 0053 %Check some stuff 0054 if numel(genes)~=numel(genePValues) 0055 EM='The number of genes and the number of Z-scores must be the same'; 0056 dispEM(EM); 0057 end 0058 if ~isfield(model,'rxnGeneMat') 0059 EM='The model structure must have a rxnGeneMat field'; 0060 dispEM(EM); 0061 end 0062 0063 %Remove the genes which are not in the model 0064 genePValues(~ismember(genes,model.genes))=[]; 0065 if any(geneFoldChanges) 0066 geneFoldChanges(~ismember(genes,model.genes))=[]; 0067 end 0068 genes(~ismember(genes,model.genes))=[]; 0069 0070 %Remove the genes that has NA P-value 0071 genes(isnan(genePValues))=[]; 0072 if any(geneFoldChanges) 0073 geneFoldChanges(isnan(genePValues))=[]; 0074 end 0075 genePValues(isnan(genePValues))=[]; 0076 0077 %Convert p-values to Z-scores 0078 geneZScores=norminv(genePValues)*-1; 0079 0080 %This is to prevent errors if the p-values are exactly 0 or 1 0081 geneZScores(geneZScores==-inf)=-15; 0082 geneZScores(geneZScores==inf)=15; 0083 0084 %For each metabolite, calculate the aggregate Z-score and keep track of the 0085 %number of neighbouring genes 0086 metZScores=nan(numel(model.mets),1); 0087 metNGenes=nan(numel(model.mets),1); 0088 meanZ=nan(numel(model.mets),1); 0089 stdZ=nan(numel(model.mets),1); 0090 for i=1:numel(model.mets) 0091 %Get the involved rxns 0092 I=model.S(i,:); 0093 0094 %Get the involved genes 0095 [~, J]=find(model.rxnGeneMat(I~=0,:)); 0096 0097 %Find the genes in the gene list 0098 K=find(ismember(genes,model.genes(J))); 0099 0100 %Calculate the aggregated Z-score for the metabolite 0101 if any(K) 0102 metZScores(i)=sum(geneZScores(K))/sqrt(numel(K)); 0103 meanZ(i)=mean(geneZScores(K)); 0104 stdZ(i)=std(geneZScores(K)); 0105 metNGenes(i)=numel(K); 0106 end 0107 end 0108 0109 %Remove the metabolites which have no Z-scores 0110 mets=model.mets(~isnan(metZScores)); 0111 metNames=model.metNames(~isnan(metZScores)); 0112 metZScores(isnan(metZScores))=[]; 0113 metNGenes(isnan(metNGenes))=[]; 0114 meanZ(isnan(meanZ))=[]; 0115 stdZ(isnan(stdZ))=[]; 0116 0117 %Then correct for background by calculating the mean Z-score for random 0118 %sets of the same size as the ones that were found for the metabolites 0119 sizes=unique(metNGenes); 0120 nGenes=numel(genes); 0121 0122 for i=1:numel(sizes) 0123 %Sample 100000 sets for each size. Sample with replacement, which may 0124 %or may not be the best choice. 0125 I=ceil(rand(100000,sizes(i))*nGenes); 0126 J=geneZScores(I); 0127 K=sum(J,2)/sqrt(sizes(i)); 0128 0129 %Correct all elements of this size 0130 mK=mean(K); 0131 stdK=std(K); 0132 metZScores(metNGenes==sizes(i))=(metZScores(metNGenes==sizes(i))-mK)/stdK; 0133 end 0134 0135 %Calculate the P-values 0136 metPValues=1-normcdf(metZScores); 0137 0138 %Sort the results 0139 [metZScores, I]=sort(metZScores,'descend'); 0140 mets=mets(I); 0141 metNames=metNames(I); 0142 metPValues=metPValues(I); 0143 metNGenes=metNGenes(I); 0144 meanZ=meanZ(I); 0145 stdZ=stdZ(I); 0146 0147 %Prepare the output 0148 repMets.mets=mets; 0149 repMets.metNames=metNames; 0150 repMets.metZScores=metZScores; 0151 repMets.metPValues=metPValues; 0152 repMets.metNGenes=metNGenes; 0153 repMets.meanZ=meanZ; 0154 repMets.stdZ=stdZ; 0155 0156 %Call this function recursively if geneFoldChanges has been specified 0157 repMets(1).test='all'; 0158 if any(geneFoldChanges) 0159 repMets=[repMets;reporterMetabolites(model,genes(geneFoldChanges>=0),genePValues(geneFoldChanges>=0))]; 0160 repMets=[repMets;reporterMetabolites(model,genes(geneFoldChanges<0),genePValues(geneFoldChanges<0))]; 0161 repMets(2).test='only up'; 0162 repMets(3).test='only down'; 0163 end 0164 0165 %This is for printing results to the screen. For printing a full list of 0166 %all scores, specify a output file 0167 if printResults==true 0168 for i=1:numel(repMets) 0169 fprintf(['TOP 20 REPORTER METABOLITES\nTEST TYPE: ' repMets(i).test '\nID\tNAME\tP-VALUE\n']); 0170 for j=1:min(20,numel(repMets(i).mets)) 0171 fprintf([repMets(i).mets{j} '\t' repMets(i).metNames{j} '\t' num2str(repMets(i).metPValues(j)) '\n']); 0172 end 0173 fprintf('\n'); 0174 end 0175 end 0176 0177 %Print results to file 0178 if any(outputFile) 0179 fid=fopen(outputFile,'w'); 0180 for i=1:numel(repMets) 0181 fprintf(fid,['REPORTER METABOLITES USING TEST TYPE: ' repMets(i).test '\n']); 0182 fprintf(fid,'ID\tNAME\tZ-SCORE\tP-VALUE\tNUMBER OF NEIGHBOURS\tAVERAGE Z-SCORE\tSTD Z-SCORE\n'); 0183 for j=1:numel(repMets(i).mets) 0184 fprintf(fid,[repMets(i).mets{j} '\t' repMets(i).metNames{j} '\t' num2str(repMets(i).metZScores(j)) '\t' num2str(repMets(i).metPValues(j)) '\t' num2str(repMets(i).metNGenes(j)) '\t' num2str(repMets(i).meanZ(j)) '\t' num2str(repMets(i).stdZ(j)) '\n']); 0185 end 0186 end 0187 fclose(fid); 0188 end 0189 end