Xin Zou, Elaine Holmes, Jeremy K Nicholson, Ruey Leng Loo.
Abstract
We propose a novel statistical approach to improve the reliability of ¹H NMR spectral analysis in complex metabolic studies. The Statistical HOmogeneous Cluster SpectroscopY (SHOCSY) algorithm aims to reduce the variation within biological classes by selecting subsets of homogeneous ¹H NMR spectra that contain specific spectroscopic metabolic signatures related to each biological class in a study. In SHOCSY, we used a clustering method to partition the whole data set into clusters of samples, with each cluster showing similar spectral features and hence similar biochemical composition, and we then used an enrichment test to identify associations between the clusters and the biological classes in the data set. We evaluated the performance of the SHOCSY algorithm using a simulated ¹H NMR data set emulating renal tubule toxicity and further exemplified the method with a ¹H NMR spectroscopic study of hydrazine-induced liver toxicity in rats. The SHOCSY algorithm improved the predictive ability of the orthogonal partial least-squares discriminant analysis (OPLS-DA) model through the use of "truly" representative samples in each biological class (i.e., homogeneous subsets). This method ensures that the analyses are no longer confounded by idiosyncratic responders and thus improves the reliability of biomarker extraction. SHOCSY is a useful tool for removing irrelevant variation that interferes with the interpretation and predictive ability of models, and it has widespread applicability to other spectroscopic data as well as to other "omics" types of data.
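The two steps described in the abstract (cluster the samples, then test each cluster for enrichment in a biological class) can be illustrated with a minimal sketch. The abstract does not say which clustering method or enrichment test SHOCSY uses, so everything here is an assumption made for illustration: a plain k-means, a one-sided hypergeometric test, the rule of keeping the single most enriched cluster per class, the 0.05 threshold, the function names, and the 2-D toy points standing in for spectra.

```python
from collections import Counter
from math import comb, dist


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def enrichment_p(n_total, n_class, n_cluster, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap) by chance."""
    upper = min(n_class, n_cluster)
    tail = sum(
        comb(n_class, x) * comb(n_total - n_class, n_cluster - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_cluster)


def homogeneous_subsets(points, classes, k, alpha=0.05):
    """For each class, keep its samples in the most enriched cluster (assumed rule)."""
    labels = kmeans(points, k)
    class_sizes, cluster_sizes = Counter(classes), Counter(labels)
    subsets = {}
    for cls in class_sizes:
        best = None  # (p-value, cluster id) of the most enriched cluster
        for cluster in cluster_sizes:
            overlap = sum(
                1 for lab, c in zip(labels, classes) if lab == cluster and c == cls
            )
            p = enrichment_p(len(points), class_sizes[cls], cluster_sizes[cluster], overlap)
            if p < alpha and (best is None or p < best[0]):
                best = (p, cluster)
        subsets[cls] = [
            i for i, (lab, c) in enumerate(zip(labels, classes))
            if best is not None and lab == best[1] and c == cls
        ]
    return subsets


# Toy data (hypothetical): 8 controls, 5 consistent responders, 3 atypical responders.
points = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2),
    (-0.3, -0.1), (0.3, 0.2), (0.0, 0.4), (-0.2, 0.0),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3), (4.9, 4.7),
    (0.1, 5.0), (-0.2, 5.2), (0.3, 4.8),
]
classes = ["control"] * 8 + ["treated"] * 8
subsets = homogeneous_subsets(points, classes, k=3)
print(subsets)
```

In this toy input, the three treated samples that sit apart from the other treated samples form a small cluster that is not significantly enriched, so they are left out of the treated subset. This mirrors the idea of excluding idiosyncratic responders before fitting a model such as OPLS-DA.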
Year: 2014 PMID: 24773160 PMCID: PMC4110102 DOI: 10.1021/ac500161k
Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986
Figure 1. Schematic diagram of the SHOCSY algorithm for a data set consisting of two biological classes. The closed circles and rectangles denote homogeneous samples, and the open circles and rectangles denote idiosyncratic samples.
Figure 2. PCA scores plots for principal components 1 and 2 using the simulated data set of 30 spectra in each biological class with 50% idiosyncratic responders for (a) the whole data set and (b) the homogeneous and idiosyncratic subsets identified by the SHOCSY algorithm. (c) The median ¹H NMR spectral profiles of homogeneous control (N = 30, red), idiosyncratic responders (N = 15, green), and homogeneous responders (N = 15, blue) for aliphatic regions of the spectrum. The OPLS-DA loading plots of (d) the whole data set and (e) the homogeneous data set. Metabolite signals pointing upward correspond to metabolites up-regulated in Paraquat toxicity, and metabolite signals pointing downward correspond to metabolites down-regulated in Paraquat toxicity. The color bar defines the weights of the corresponding discriminating biomarkers between the control and Paraquat toxicity, with “hotter” colors indicating a higher correlation with class discrimination. Key: 1, lactate; 2, L-alanine; 3, acetic acid; 4, phenylacetylglutamine; 5, p-cresol sulfate; 6, succinic acid; 7, citrate; 8, dimethylamine; 9, trimethylamine; 10, creatinine; 11, trimethylamine-N-oxide; 12, L-histidine; 13, hippurate; 14, taurine; 15, glycine; 16, creatine; 17, glycolic acid. * indicates the metabolic biomarker signals that correspond to Paraquat toxicity.
Figure 3. OPLS-DA loading coefficient plots comparing control and hydrazine 90 mg/kg at 120–144 h: (a) the aliphatic and (b) the aromatic spectral regions for the whole data set (N = 23 for control and N = 18 for hydrazine class); (c) the aliphatic and (d) the aromatic spectral regions for the homogeneous subsets (N = 23 for control and N = 9 for hydrazine class). Key: 1, N-α-acetyl-citrulline; 2, 2-aminoadipic acid; 3, citrulline; 4, diacetyl-hydrazine; 5, succinate; 6, 2-oxoglutarate; 7, citrate; 8, creatinine; 9, creatine; 10, beta-alanine; 11, taurine; 12, glycine; 13, hippurate. * indicates the metabolic signals that were identified as “biomarkers” of response to hydrazine based on the defined criteria.
Figure 4. PCA scores plots showing the metabolic trajectory in animals dosed with hydrazine at 90 mg/kg (a and b) and 30 mg/kg (c and d). The metabolic time trajectory was calculated by averaging the PC1 scores and the PC2 scores, respectively, across the samples at each time point. t1: −8–0 h; t2: 0–8 h; t3: 8–24 h; t4: 24–48 h; t5: 48–72 h; t6: 72–96 h; t7: 96–120 h; t8: 120–144 h; t9: 144–168 h.
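The trajectory calculation described in the Figure 4 caption (mean PC1 and mean PC2 score per time point) can be sketched as follows. The function name, the input layout of one `(time_point, pc1, pc2)` row per sample, and the numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import fmean


def mean_trajectory(scores):
    """Average PC1 and PC2 scores per time point.

    scores: iterable of (time_point, pc1, pc2) rows, one row per sample.
    Returns a dict mapping time_point -> (mean_pc1, mean_pc2), in order of
    first appearance of each time point.
    """
    grouped = defaultdict(list)
    for time_point, pc1, pc2 in scores:
        grouped[time_point].append((pc1, pc2))
    return {
        t: (fmean([r[0] for r in rows]), fmean([r[1] for r in rows]))
        for t, rows in grouped.items()
    }


# Hypothetical PCA scores for two samples at each of two time points.
scores = [
    ("t1", 0.0, 0.0),
    ("t1", 2.0, -2.0),
    ("t2", 4.0, 1.0),
    ("t2", 6.0, 3.0),
]
trajectory = mean_trajectory(scores)
print(trajectory)
```

Joining the per-time-point means in time order gives one trajectory line per dose group on the PC1 versus PC2 plane.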