Tao Wang, Kang Shao, Qinying Chu, Yanfei Ren, Yiming Mu, Lijia Qu, Jie He, Changwen Jin, Bin Xia.
Abstract
BACKGROUND: Spectral processing and post-experimental data analysis are the major tasks in NMR-based metabonomics studies. Although commercial and freely licensed software tools are available to assist with these tasks, researchers usually have to use multiple packages because each generally focuses on specific tasks. A highly integrated platform in which all of these tasks can be completed within one package would be beneficial. Moreover, with an open-source architecture, newly proposed algorithms or methods for spectral processing and data analysis can be implemented much more easily and accessed freely by the public.
Year: 2009 PMID: 19291281 PMCID: PMC2666662 DOI: 10.1186/1471-2105-10-83
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Functional modules and the workflow of Automics.
Figure 2. Screenshot of the main graphical interface. (A) Directory browser for spectral dataset; (B) List of a selected spectral dataset; (C) Spectral processing window: displaying, moving, zooming, labeling, etc. for spectral visualization in either single mode or multiple mode; (D) Column plot of a bucket/binning digitized spectrum; (E) Worksheet of the data organization module; (F) Column plot of the explained variance for R2 (cum) and Q2 (cum); (G) Scatter plot of PLS scores; (H) Column plot of PLS regression coefficients.
Figure 3. High-throughput automatic spectral processing modules. (A) Fast Fourier transform; (B) Automatic phase correction; (C) Peak alignment; (D) Bucket/Binning.
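The bucket/binning step listed in Figure 3 (D) can be illustrated with a minimal sketch. This is not the Automics implementation: the function name `bucket_spectrum`, the equal-width scheme and the default width of 0.04 ppm are assumptions chosen for illustration only.

```python
import math


def bucket_spectrum(ppm, intensity, width=0.04):
    """Sum intensities into equal-width buckets along the chemical-shift axis.

    Hypothetical helper, not taken from Automics. Returns (centers, sums),
    where centers[i] is the midpoint of bucket i in ppm.
    """
    if len(ppm) != len(intensity):
        raise ValueError("ppm and intensity must have the same length")
    if width <= 0:
        raise ValueError("width must be positive")
    if not ppm:
        return [], []

    lo, hi = min(ppm), max(ppm)
    n_buckets = max(1, math.ceil((hi - lo) / width))
    sums = [0.0] * n_buckets

    for x, y in zip(ppm, intensity):
        # The point at the upper edge is clamped into the last bucket.
        idx = min(int((x - lo) / width), n_buckets - 1)
        sums[idx] += y

    centers = [lo + (i + 0.5) * width for i in range(n_buckets)]
    return centers, sums


# Toy data, not taken from the paper.
centers, sums = bucket_spectrum([0.0, 0.25, 0.5, 0.75, 1.0], [1, 2, 3, 4, 5], width=0.5)
print(centers, sums)
```

Each bucket then becomes one variable (column) in the data matrix passed on to the multivariate analysis.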
Figure 4. Overlay display of 1D spectra. The spectra were processed by the automatic modules in the following steps: fast Fourier transform, phase correction (the newly introduced phase correction method), baseline correction (linear fitting) and peak alignment (global shift method).
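The peak alignment step named in the Figure 4 caption (global shift method) might be sketched as follows. The caption does not describe the algorithm, so this is one plausible reading only: each spectrum is moved as a whole by the integer number of points that maximizes its overlap with a reference spectrum. The names `best_global_shift` and `apply_shift` and the `max_shift` parameter are hypothetical.

```python
def best_global_shift(reference, spectrum, max_shift=50):
    """Return the integer shift (in points) that best aligns spectrum to reference.

    Assumed criterion: maximize the sum of products between the reference
    and the shifted spectrum (an unnormalized cross-correlation).
    """
    n = len(reference)
    if len(spectrum) != n:
        raise ValueError("reference and spectrum must have the same length")

    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for i in range(n):
            j = i - shift  # index in the unshifted spectrum
            if 0 <= j < n:
                score += reference[i] * spectrum[j]
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


def apply_shift(spectrum, shift, fill=0.0):
    """Shift the whole spectrum by `shift` points, padding the exposed edge with `fill`."""
    n = len(spectrum)
    out = [fill] * n
    for i in range(n):
        j = i - shift
        if 0 <= j < n:
            out[i] = spectrum[j]
    return out


# Toy data: the same peak, displaced by two points.
reference = [0, 0, 0, 1, 5, 1, 0, 0, 0, 0]
spectrum = [0, 1, 5, 1, 0, 0, 0, 0, 0, 0]
shift = best_global_shift(reference, spectrum, max_shift=3)
aligned = apply_shift(spectrum, shift)
print(shift, aligned)
```

A single shift per spectrum corrects a uniform referencing offset; it does not correct peak-specific displacements, which need a local alignment method.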
Figure 5. PLS analysis of type 2 diabetic samples (57) and healthy samples (41) using Automics. (A) PLS scores show evident clustering of diabetic (○) and healthy (△) samples. The optimal separation occurs in the second and third components (t2, t3). (B) Regression coefficients of the corresponding PLS model. (C) PLS scores after application of DOSC to remove one orthogonal component. (D) Regression coefficients of the PLS model after application of DOSC. (E) PLS scores after application of O-PLS. Note that both DOSC and O-PLS significantly improve the separation, and the optimal separation now occurs in the first principal component. (F) Regression coefficients of the PLS model after application of O-PLS.
Figure 6. Overlay display of regression coefficients and the corresponding spectrum in Automics. (A) Display of the whole spectral region; (B) Display of a zoomed part. Green curve, NMR spectrum; red curve, regression coefficients.
Comparison of different classification methods
| Recognition rate | Prediction rate | Sensitivity | Specificity | Overall |
| --- | --- | --- | --- | --- |
| 95.5% (63/66) | 75.0% (24/32) | 91.2% (52/57) | 85.4% (35/41) | 89.8% (88/98) |
| 98.5% (65/66) | 78.1% (25/32) | 91.2% (52/57) | 92.7% (38/41) | 91.8% (90/98) |
| 100% (66/66) | 84.4% (27/32) | 93.0% (53/57) | 95.1% (40/41) | 94.9% (93/98) |
| 100% (66/66) | 81.3% (26/32) | 91.2% (52/57) | 95.1% (40/41) | 93.9% (92/98) |
| 98.5% (65/66) | 90.6% (29/32) | 94.7% (54/57) | 97.6% (40/41) | 95.9% (94/98) |
| 95.5% (63/66) | 71.9% (23/32) | 84.2% (48/57) | 92.7% (38/41) | 87.8% (86/98) |
| 90.9% (60/66) | 75.0% (24/32) | 87.7% (50/57) | 82.9% (34/41) | 85.7% (84/98) |
| 95.5% (63/66) | 81.3% (26/32) | 93.0% (53/57) | 87.8% (36/41) | 90.8% (89/98) |
| 100% (66/66) | 81.3% (26/32) | 94.7% (54/57) | 92.7% (38/41) | 93.9% (92/98) |
| 100% (66/66) | 87.5% (28/32) | 94.7% (54/57) | 97.6% (40/41) | 95.9% (94/98) |
| 100% (66/66) | 90.6% (29/32) | 96.5% (55/57) | 97.6% (40/41) | 96.9% (95/98) |
| 100% (66/66) | 96.9% (31/32) | 100% (57/57) | 97.6% (40/41) | 99.0% (97/98) |
Prediction results obtained with different classification methods (25 healthy and 41 diabetic samples in the training set; 16 healthy and 16 diabetic samples in the testing set). Recognition rate is the fraction of correctly classified samples in the training set. Prediction rate is the fraction of correctly classified samples in the testing set. Sensitivity is the fraction of true positives classified as positive. Specificity is the fraction of true negatives classified as negative. Abbreviations: Ctrl, mean-centered scaling; UV, auto scaling; DOSC, direct orthogonal signal correction; O-PLS, orthogonal projections to latent structures; FC, Fisher's criterion for feature selection.
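The rates defined above can be computed from true and predicted class labels, as in the following minimal sketch. It assumes that "diabetic" is the positive class, which is consistent with the sensitivity denominators (57 diabetic samples) but is not stated explicitly. The function name `classification_rates` and the toy labels are hypothetical. Applied to the training set, the correct rate corresponds to the recognition rate; applied to the testing set, to the prediction rate.

```python
def classification_rates(y_true, y_pred, positive="diabetic"):
    """Return sensitivity, specificity and overall correct rate for two label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive sample classified as positive
            else:
                fn += 1  # positive sample missed
        else:
            if pred == positive:
                fp += 1  # negative sample classified as positive
            else:
                tn += 1  # negative sample classified as negative

    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "correct_rate": (tp + tn) / len(y_true) if y_true else nan,
    }


# Toy labels, not taken from the paper.
y_true = ["diabetic"] * 4 + ["healthy"] * 4
y_pred = ["diabetic", "diabetic", "diabetic", "healthy",
          "healthy", "healthy", "healthy", "diabetic"]
rates = classification_rates(y_true, y_pred)
print({name: format(value, ".1%") for name, value in rates.items()})
```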