Ling Hao1, Jingxin Wang2, David Page3, Sanjay Asthana4, Henrik Zetterberg5,6,7,8, Cynthia Carlsson4, Ozioma C Okonkwo4, Lingjun Li9,10.
Abstract
Mass spectrometry (MS)-based metabolomics has made significant progress in the past decade, and a variety of software packages have been developed for data analysis. However, systematic comparisons of different metabolomics software tools have rarely been conducted. In this study, several representative software packages were comparatively evaluated across the entire metabolomics data analysis pipeline, including data processing, statistical analysis, feature selection, metabolite identification, pathway analysis, and classification model construction. LC-MS-based metabolomics was applied to preclinical Alzheimer's disease (AD) using a small cohort of human cerebrospinal fluid (CSF) samples (N = 30). All three software packages, XCMS Online, SIEVE, and Compound Discoverer, provided consistent and reproducible data processing results. A hybrid method combining a statistical test with support vector machine (SVM) feature selection was employed to screen key metabolites, yielding complementary selections of candidate biomarkers across the three software packages. Machine learning classification using the candidate biomarkers generated highly accurate and predictive models for classifying patients as preclinical AD or control. Overall, our study presents a systematic evaluation of different MS-based metabolomics software packages across the entire data analysis pipeline, applied to candidate biomarker discovery for preclinical AD.
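The abstract does not spell out how the hybrid feature selection works. The sketch below assumes one plausible two-stage design: a univariate Welch's t-statistic filter (the cutoff |t| > 2 is chosen only for illustration), followed by ranking the surviving features by the absolute weight of a linear SVM trained with Pegasos-style subgradient descent. All names and parameters here (`welch_t`, `linear_svm_weights`, `hybrid_select`, `t_threshold`, `lam`, `epochs`) are hypothetical. A real analysis would more likely use a p-value cutoff and an established SVM library.

```python
import math


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    if se == 0:
        # Zero variance in both groups: t is 0 if means match, else infinite.
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se


def linear_svm_weights(X, y, lam=0.01, epochs=200):
    """Bias-free linear SVM via Pegasos-style subgradient descent.

    X: list of samples (lists of feature values); y: labels in {+1, -1}.
    Samples are visited in a fixed order, so the result is deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # L2 shrinkage: 1 - eta * lam, with step size eta = 1 / (lam * t).
            shrink = 1.0 - 1.0 / t
            w = [shrink * wj for wj in w]
            if margin < 1:
                # Hinge-loss subgradient step for a margin violation.
                eta = 1.0 / (lam * t)
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def hybrid_select(X, y, t_threshold=2.0, top_k=None):
    """Hypothetical hybrid: |Welch t| filter, then rank survivors by |SVM weight|."""
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == -1]
    # Stage 1: univariate statistical filter.
    kept = [j for j in range(len(X[0]))
            if abs(welch_t([x[j] for x in pos], [x[j] for x in neg])) > t_threshold]
    if not kept:
        return []
    # Stage 2: mean-center kept features (the SVM has no bias term), then rank.
    means = [sum(x[j] for x in X) / len(X) for j in kept]
    Xc = [[x[j] - m for j, m in zip(kept, means)] for x in X]
    w = linear_svm_weights(Xc, y)
    ranked = sorted(range(len(kept)), key=lambda i: abs(w[i]), reverse=True)
    selected = [kept[i] for i in ranked]
    return selected if top_k is None else selected[:top_k]
```

On a toy matrix where feature 0 separates the groups strongly, feature 2 weakly, and feature 1 not at all, `hybrid_select` drops feature 1 at the filter stage and returns `[0, 2]`. Feature scaling, which is often applied before fitting an SVM, is left out here to keep the weight comparison easy to follow.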
Year: 2018 PMID: 29915347 PMCID: PMC6006240 DOI: 10.1038/s41598-018-27031-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Data processing evaluation of three software packages: Compound Discoverer, SIEVE, and XCMS Online. (a) Histograms of peak area relative standard deviations (RSDs) of all detected features in the quality control (QC) data set. (b) Histograms of log2-transformed peak areas of all detected features in the QC data set. (c) Histograms of log2-transformed ratios of all detected features in the preclinical AD vs. control data set.
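The quantities plotted in Figure 1 can be sketched per feature as follows. This is a minimal illustration, assuming the RSD uses the sample standard deviation over repeated QC injections and the case/control ratio uses group means. The software packages may compute these differently (for example, with medians), and the helper names and fixed-width binning are hypothetical.

```python
import math
import statistics


def rsd_percent(peak_areas):
    """Relative standard deviation (%) of one feature's peak areas across QC injections."""
    return 100.0 * statistics.stdev(peak_areas) / statistics.mean(peak_areas)


def log2_ratio(case_areas, control_areas):
    """log2 of mean case peak area over mean control peak area (assumed: group means)."""
    return math.log2(statistics.mean(case_areas) / statistics.mean(control_areas))


def histogram(values, bin_width):
    """Count values into fixed-width bins keyed by each bin's lower edge."""
    counts = {}
    for v in values:
        edge = math.floor(v / bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))
```

For example, `rsd_percent([90, 100, 110])` gives 10.0, and `log2_ratio([4, 4], [1, 1])` gives 2.0, meaning a fourfold increase in the case group.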
Figure 2. Multivariate statistical analyses of human CSF samples in the preclinical AD, control, and QC groups. (a) Principal component analysis. (b) Partial least squares-discriminant analysis.
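As a rough illustration of panel (a), the first principal component can be found by power iteration on the sample covariance matrix. This is a teaching sketch, not how the cited software computes PCA, and PLS-DA (panel b) is not shown. It assumes a samples-by-features matrix of already-transformed intensities and a starting vector that is not orthogonal to PC1. The function names are hypothetical.

```python
import math


def _center(data):
    """Subtract each feature's (column's) mean from a samples x features matrix."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def first_principal_component(data, iters=200):
    """Return (PC1 loading vector, explained variance ratio) via power iteration."""
    centered = _center(data)
    n, p = len(centered), len(centered[0])
    # Sample covariance matrix (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    total_variance = sum(cov[j][j] for j in range(p))
    v = [1.0 / math.sqrt(p)] * p  # assumed start vector, not orthogonal to PC1
    if total_variance == 0:
        return v, 0.0
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the PC1 eigenvalue (variance along PC1).
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    return v, eigenvalue / total_variance


def pc1_scores(data, loading):
    """Project centered samples onto the PC1 loading (the x-axis of a score plot)."""
    return [sum(x * l for x, l in zip(row, loading)) for row in _center(data)]
```

The sign of a principal component is arbitrary, so score plots from different tools may appear mirrored.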
Figure 3. Data analysis flowchart (a) and overlapping candidate biomarkers of preclinical AD obtained from the three software packages (b).
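The overlap in panel (b) corresponds to the seven regions of a three-way Venn diagram. The sketch below assumes that each software package's candidate biomarkers have already been mapped to common identifiers (for example, annotated metabolite names). Matching raw features across tools by m/z and retention time is not shown. The function name and the metabolite IDs in the usage example are hypothetical.

```python
def venn3_regions(a, b, c):
    """Split three collections into the seven disjoint regions of a three-way Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "A only": a - b - c,
        "B only": b - a - c,
        "C only": c - a - b,
        "A&B only": (a & b) - c,
        "A&C only": (a & c) - b,
        "B&C only": (b & c) - a,
        "A&B&C": a & b & c,
    }


# Hypothetical usage: A = Compound Discoverer, B = SIEVE, C = XCMS Online.
regions = venn3_regions({"m1", "m2", "m3"}, {"m2", "m3", "m4"}, {"m3", "m5"})
shared_by_all = regions["A&B&C"]
```

Because the regions are disjoint, their sizes sum to the size of the union, which gives a quick sanity check on the reported counts.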
Figure 4. Heatmaps of representative candidate biomarkers among the metabolites shared by the three software packages. The data were log-transformed and auto-scaled.
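Log transformation followed by auto-scaling (mean-centering each metabolite and dividing by its standard deviation) can be sketched as below. The caption does not give the log base, so base 2 is assumed. For any base greater than 1 the auto-scaled values come out identical, because changing the base multiplies the mean and the standard deviation by the same constant. The sketch also assumes strictly positive intensities; zeros or missing values would need imputation first. The function name is hypothetical.

```python
import math
import statistics


def log_autoscale(matrix):
    """Log2-transform, then auto-scale each feature (column) of a samples x features matrix."""
    logged = [[math.log2(v) for v in row] for row in matrix]
    cols = list(zip(*logged))
    # Per-feature mean and sample standard deviation.
    stats = [(statistics.mean(col), statistics.stdev(col)) for col in cols]
    # Constant features (s == 0) are set to 0 rather than dividing by zero.
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in logged]
```

After scaling, every metabolite has mean 0 and unit variance, so heatmap colors reflect relative changes rather than absolute abundance.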
Binary classification performance of the candidate biomarkers from each of the three software packages in differentiating preclinical AD from control groups.
| Software | Number of features | Sensitivity | Specificity | Precision | ROC area |
|---|---|---|---|---|---|
| Compound Discoverer (CD) | 90 | 0.967 | 0.962 | 0.969 | 0.964 |
| SIEVE | 86 | 0.967 | 0.962 | 0.969 | 0.964 |
| XCMS Online | 81 | 0.933 | 0.933 | 0.933 | 0.933 |
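The table's metrics follow standard definitions. The sketch below computes them from confusion-matrix counts, with preclinical AD assumed to be the positive class. ROC area is computed from continuous classifier scores as the probability that a randomly chosen positive outscores a randomly chosen negative. The table does not state the validation scheme used to obtain these values, and the helper names and example numbers are hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and precision from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value
    }


def roc_auc(pos_scores, neg_scores):
    """ROC area via pairwise comparison (Mann-Whitney form); ties count as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, `binary_metrics(9, 1, 8, 2)` gives a sensitivity of 0.9, a specificity of 0.8, and a precision of 9/11. Unlike the other three metrics, ROC area does not depend on a single decision threshold.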
Figure 5. Dysregulated metabolic pathways in human CSF of preclinical AD vs. control patients. P-values and corrected p-values of metabolic pathways were calculated in the MBROLE software by weighing the number of compounds from each pathway in the input set against the number in the background set.
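A common way to weigh set membership against a background is a one-sided hypergeometric over-representation test, followed by Benjamini-Hochberg false discovery rate correction. The sketch below assumes that approach. Whether MBROLE uses exactly this test and correction is an assumption, and the function and parameter names are hypothetical: `pop_size` is the number of background compounds, `pathway_size` is the number of background compounds in the pathway, `set_size` is the number of input compounds, and `k` is the number of input compounds found in the pathway.

```python
import math


def hypergeom_upper_tail(k, pop_size, pathway_size, set_size):
    """P(X >= k) for X ~ Hypergeometric(pop_size, pathway_size, set_size)."""
    total = math.comb(pop_size, set_size)
    upper = min(pathway_size, set_size)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(math.comb(pathway_size, i) * math.comb(pop_size - pathway_size, set_size - i)
               for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, if 2 input compounds are drawn from a 10-compound background and both fall in a 3-compound pathway, the p-value is 3/45 = 1/15. Passing `[0.01, 0.04, 0.03]` to `benjamini_hochberg` returns approximately `[0.03, 0.04, 0.04]`.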