Bo Li, Jing Tang, Qingxia Yang, Xuejiao Cui, Shuang Li, Sijie Chen, Quanxing Cao, Weiwei Xue, Na Chen, Feng Zhu.
Abstract
In untargeted metabolomics analysis, several factors (e.g., unwanted experimental and biological variations and technical errors) may hamper the identification of differential metabolic features, so data-driven normalization is required before feature selection. So far, ≥16 normalization methods have been widely applied for processing LC/MS-based metabolomics data. However, the performance and sample-size dependence of those methods have not yet been exhaustively compared, and no online tool has been provided for comparatively and comprehensively evaluating the performance of all 16 normalization methods. In this study, a comprehensive comparison of these methods was conducted. As a result, the 16 methods were categorized into three groups based on their normalization performance across various sample sizes. VSN, Log Transformation and PQN were identified as the methods with the best normalization performance, while Contrast consistently underperformed across all sub-datasets of the different benchmark data. Moreover, an interactive web tool that comprehensively evaluates the performance of the 16 methods specifically for normalizing LC/MS-based metabolomics data was constructed and hosted at http://server.idrb.cqu.edu.cn/MetaPre/. In summary, this study could serve as useful guidance for selecting suitable normalization methods when analyzing LC/MS-based metabolomics data.
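To make two of the best-performing methods named in the abstract concrete, here is a minimal Python sketch of a log transformation and of probabilistic quotient normalization (PQN) as it is commonly described: divide each sample by the median of its feature-wise quotients against a reference spectrum. This is not the authors' code or the MetaPre implementation. The function names, the `offset` parameter, the choice of the feature-wise median as reference, and the toy intensities are all assumptions made for illustration, and the total-area pre-normalization that often precedes PQN is omitted.

```python
import math
from statistics import median


def log_transform(samples, offset=1.0):
    """Log2-transform every intensity.

    `offset` is an assumed pseudo-count that keeps log2 defined for zeros.
    """
    return [[math.log2(v + offset) for v in row] for row in samples]


def pqn(samples):
    """Probabilistic quotient normalization (illustrative sketch).

    `samples` is a list of rows: one row per sample, one column per feature.
    The reference spectrum is the median of each feature across all samples.
    Degenerate input (all-zero reference or a zero factor) is not handled.
    """
    n_features = len(samples[0])
    reference = [median(row[j] for row in samples) for j in range(n_features)]
    normalized = []
    for row in samples:
        # Quotient of each feature against the reference; zero references are skipped.
        quotients = [row[j] / reference[j]
                     for j in range(n_features) if reference[j] != 0]
        factor = median(quotients)  # estimated dilution factor of this sample
        normalized.append([v / factor for v in row])
    return normalized


# Toy data (hypothetical): one profile measured at three dilutions.
base = [10.0, 20.0, 30.0, 40.0]
diluted = [[v * f for v in base] for f in (1.0, 2.0, 0.5)]
restored = pqn(diluted)
logged = log_transform(restored)
```

On this toy input every row of `restored` collapses back to the undiluted profile, which is the behavior PQN is designed to give when samples differ only by a dilution factor.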
Year: 2016 PMID: 27958387 PMCID: PMC5153651 DOI: 10.1038/srep38881
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The overall research design and flowchart of this study.
Performance evaluation of 16 normalization methods across 10 sub-datasets based on the benchmark data MTBLS28 (ESI+ and ESI−).
Columns 90 to 900 give the sample size of each of the 10 sub-datasets used in the training set.
| Normalization method | MetaboLights ID (ionization mode) | 90 | 180 | 270 | 360 | 450 | 540 | 630 | 720 | 810 | 900 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Auto Scaling | MTBLS28 (ESI+) | 0.5905 | 0.6286 | 0.6381 | 0.6952 | 0.5810 | 0.6286 | 0.6381 | 0.6381 | 0.6476 | 0.6667 |
| Auto Scaling | MTBLS28 (ESI−) | 0.5524 | 0.5333 | 0.5810 | 0.6190 | 0.6000 | 0.6190 | 0.6571 | 0.6476 | 0.6381 | 0.6667 |
| Contrast | MTBLS28 (ESI+) | 0.4762 | 0.5238 | 0.4095 | 0.4667 | 0.5238 | 0.5619 | 0.5429 | 0.5619 | 0.4762 | 0.5143 |
| Contrast | MTBLS28 (ESI−) | 0.4381 | 0.3429 | 0.4667 | 0.3524 | 0.3524 | 0.3714 | 0.4000 | 0.3619 | 0.3524 | 0.3333 |
| Cubic Splines | MTBLS28 (ESI+) | 0.6095 | 0.6095 | 0.6762 | 0.6952 | 0.6095 | 0.7143 | 0.6476 | 0.7048 | 0.6762 | 0.7048 |
| Cubic Splines | MTBLS28 (ESI−) | 0.6667 | 0.6095 | 0.6381 | 0.6000 | 0.6476 | 0.6952 | 0.6952 | 0.6762 | 0.7238 | 0.6667 |
| Cyclic Loess | MTBLS28 (ESI+) | 0.6000 | 0.6190 | 0.6667 | 0.6381 | 0.6476 | 0.6476 | 0.6381 | 0.6762 | 0.7524 | 0.7238 |
| Cyclic Loess | MTBLS28 (ESI−) | 0.5905 | 0.6667 | 0.6571 | 0.6190 | 0.6476 | 0.6667 | 0.6857 | 0.6762 | 0.6857 | 0.6857 |
| Level Scaling | MTBLS28 (ESI+) | 0.5905 | 0.6190 | 0.6381 | 0.6476 | 0.6286 | 0.6476 | 0.6286 | 0.6381 | 0.6286 | 0.6095 |
| Level Scaling | MTBLS28 (ESI−) | 0.5714 | 0.5619 | 0.6095 | 0.6190 | 0.6000 | 0.6095 | 0.6667 | 0.6476 | 0.6381 | 0.6095 |
| Li-Wong | MTBLS28 (ESI+) | 0.5524 | 0.6095 | 0.5524 | 0.6000 | 0.5619 | 0.5714 | 0.6190 | 0.6286 | 0.6667 | 0.6762 |
| Li-Wong | MTBLS28 (ESI−) | 0.5048 | 0.6286 | 0.5429 | 0.6571 | 0.6476 | 0.6762 | 0.6381 | 0.5905 | 0.6476 | 0.6286 |
| Linear Baseline | MTBLS28 (ESI+) | 0.6095 | 0.6095 | 0.6476 | 0.7238 | 0.6095 | 0.6286 | 0.6667 | 0.6762 | 0.6952 | 0.6286 |
| Linear Baseline | MTBLS28 (ESI−) | 0.5619 | 0.5619 | 0.5905 | 0.6095 | 0.6190 | 0.6000 | 0.6762 | 0.6571 | 0.6286 | 0.6476 |
| Log Transformation | MTBLS28 (ESI+) | 0.6476 | 0.6857 | 0.6952 | 0.6476 | 0.7048 | 0.6667 | 0.6857 | 0.6952 | 0.6762 | 0.6952 |
| Log Transformation | MTBLS28 (ESI−) | 0.6095 | 0.6286 | 0.6190 | 0.6762 | 0.6095 | 0.6571 | 0.6857 | 0.6190 | 0.6952 | 0.6476 |
| MSTUS | MTBLS28 (ESI+) | 0.5905 | 0.6095 | 0.6286 | 0.6476 | 0.6000 | 0.6571 | 0.6667 | 0.6762 | 0.6667 | 0.6762 |
| MSTUS | MTBLS28 (ESI−) | 0.5619 | 0.5714 | 0.5619 | 0.6000 | 0.6095 | 0.6286 | 0.6762 | 0.6762 | 0.6667 | 0.6476 |
| Pareto Scaling | MTBLS28 (ESI+) | 0.6190 | 0.6571 | 0.6190 | 0.5905 | 0.6476 | 0.6667 | 0.6476 | 0.6952 | 0.6381 | 0.6667 |
| Pareto Scaling | MTBLS28 (ESI−) | 0.5714 | 0.5619 | 0.5905 | 0.6095 | 0.6000 | 0.6190 | 0.6667 | 0.6381 | 0.6381 | 0.6476 |
| Power Scaling | MTBLS28 (ESI+) | 0.6000 | 0.6571 | 0.5905 | 0.6190 | 0.6000 | 0.6286 | 0.6952 | 0.6476 | 0.7048 | 0.6952 |
| Power Scaling | MTBLS28 (ESI−) | 0.5810 | 0.6381 | 0.5905 | 0.5714 | 0.5810 | 0.6095 | 0.6476 | 0.6095 | 0.6286 | 0.6476 |
| PQN | MTBLS28 (ESI+) | 0.5619 | 0.6381 | 0.6476 | 0.6286 | 0.6381 | 0.6667 | 0.6762 | 0.6952 | 0.7524 | 0.7333 |
| PQN | MTBLS28 (ESI−) | 0.6000 | 0.6095 | 0.6476 | 0.6381 | 0.6571 | 0.6857 | 0.6667 | 0.7238 | 0.7429 | 0.6667 |
| Quantile | MTBLS28 (ESI+) | 0.6095 | 0.6190 | 0.6476 | 0.6952 | 0.6095 | 0.6476 | 0.6762 | 0.7143 | 0.7048 | 0.6857 |
| Quantile | MTBLS28 (ESI−) | 0.6667 | 0.6190 | 0.6381 | 0.6095 | 0.6286 | 0.6667 | 0.6857 | 0.6571 | 0.6381 | 0.7238 |
| Range Scaling | MTBLS28 (ESI+) | 0.5714 | 0.6190 | 0.6190 | 0.5905 | 0.6381 | 0.6000 | 0.6190 | 0.6476 | 0.6381 | 0.6952 |
| Range Scaling | MTBLS28 (ESI−) | 0.5810 | 0.5429 | 0.5714 | 0.5905 | 0.6000 | 0.6095 | 0.6190 | 0.6476 | 0.6667 | 0.6571 |
| Vast Scaling | MTBLS28 (ESI+) | 0.5524 | 0.6190 | 0.6286 | 0.5905 | 0.6381 | 0.5905 | 0.6286 | 0.6571 | 0.6476 | 0.6381 |
| Vast Scaling | MTBLS28 (ESI−) | 0.5333 | 0.5714 | 0.6095 | 0.6095 | 0.6095 | 0.5810 | 0.6571 | 0.6190 | 0.6667 | 0.6190 |
| VSN | MTBLS28 (ESI+) | 0.6381 | 0.6381 | 0.6286 | 0.7048 | 0.6476 | 0.7048 | 0.7048 | 0.6571 | 0.7524 | 0.7429 |
| VSN | MTBLS28 (ESI−) | 0.6571 | 0.6762 | 0.6476 | 0.6762 | 0.6476 | 0.6667 | 0.6762 | 0.7048 | 0.6857 | 0.6857 |
Performance was evaluated by the prediction accuracy (ACC) on the validation set, where ACC = (true positive + true negative)/(true positive + false positive + true negative + false negative).
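The ACC definition above translates directly into a few lines of Python. The sketch below is only an illustration of the formula: the function name and the confusion-matrix counts in the example are hypothetical and are not taken from the study.

```python
def accuracy(tp, fp, tn, fn):
    """ACC = (TP + TN) / (TP + FP + TN + FN)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts, chosen only to exercise the formula.
example_acc = accuracy(tp=40, fp=15, tn=30, fn=20)
print(round(example_acc, 4))  # 70 correct out of 105 -> 0.6667
```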
Figure 2. Normalization performance of 16 methods measured by receiver operating characteristic (ROC) curves based on four benchmark datasets: (a) MTBLS28 ESI+, (b) MTBLS28 ESI−, (c) MTBLS17 ESI+ and (d) MTBLS17 ESI−. The training dataset of (a) and (b) was composed of 900 samples (400 lung cancer patients and 500 healthy individuals), and that of (c) and (d) consisted of 170 samples (50 HCC patients and 120 people with cirrhosis). The grey diagonal represents an invalid model, whose area under the curve (AUC) equals 0.5. All lines were generated by LOESS regression.
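For readers unfamiliar with the AUC reported alongside the ROC curves, the following sketch computes it with the standard rank-based (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half. It is an illustrative helper, not the procedure used in the study; the function name and the example scores are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` uses 1 for the positive class and 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs.
perfect = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
uninformative = auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])
```

A model that scores every sample identically yields 0.5, which corresponds to the grey diagonal in the figure.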
Figure 3. Cluster analysis of the 16 normalization methods according to their AUC values (across 10 sample sizes) calculated on four benchmark datasets: (a) MTBLS28 ESI+, (b) MTBLS28 ESI−, (c) MTBLS17 ESI+ and (d) MTBLS17 ESI−. The data are presented in matrix format, in which columns represent training datasets of different sample sizes and rows represent normalization methods. Each cell in the heat map represents the AUC value of a normalization method trained on one specific training dataset. The cell with the highest AUC value is shown in pure blue, with lower AUC values gradually fading towards red (the lowest AUC value). Hierarchical clustering analyses were conducted using the Manhattan metric and Ward’s minimum variance algorithm.
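The clustering described in this caption can be sketched in plain Python. The code below builds Manhattan distances between method profiles and merges clusters with the Lance-Williams form of Ward's update applied directly to those distances. Combining the two in this way is an assumption, since the caption does not say how the metric and the linkage were paired. The input rows are ACC values for MTBLS28 (ESI+) at sample sizes 90, 180 and 270, used only as illustrative input; the figure itself is based on AUC values, which are not reproduced here. All function names are hypothetical.

```python
from itertools import combinations


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_manhattan(rows):
    """Agglomerative clustering; returns merges as (members_a, members_b, height)."""
    clusters = {i: (i,) for i in range(len(rows))}  # cluster id -> row indices
    dist = {frozenset((i, j)): manhattan(rows[i], rows[j])
            for i, j in combinations(clusters, 2)}
    merges = []
    next_id = len(rows)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        d_ab = dist[pair]
        n_a, n_b = len(clusters[a]), len(clusters[b])
        merges.append((clusters[a], clusters[b], d_ab))
        # Lance-Williams update (Ward) for every remaining cluster k.
        updated = {}
        for k in clusters:
            if k in (a, b):
                continue
            n_k = len(clusters[k])
            d_ak = dist[frozenset((a, k))]
            d_bk = dist[frozenset((b, k))]
            updated[k] = ((n_a + n_k) * d_ak + (n_b + n_k) * d_bk
                          - n_k * d_ab) / (n_a + n_b + n_k)
        # Drop distances involving the merged clusters, then register the new one.
        dist = {p: d for p, d in dist.items() if a not in p and b not in p}
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        for k, d in updated.items():
            dist[frozenset((next_id, k))] = d
        next_id += 1
    return merges


methods = ["VSN", "PQN", "Log Transformation", "Contrast"]
profiles = [
    [0.6381, 0.6381, 0.6286],  # VSN
    [0.5619, 0.6381, 0.6476],  # PQN
    [0.6476, 0.6857, 0.6952],  # Log Transformation
    [0.4762, 0.5238, 0.4095],  # Contrast
]
merges = ward_manhattan(profiles)
for left, right, height in merges:
    print([methods[i] for i in left], [methods[i] for i in right], round(height, 4))
```

On these four profiles Contrast is the last method to join the tree, while the other three merge with each other first.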
Figure 4. Method groups categorized according to the normalization performances across various sample sizes based on four benchmark datasets: (a) MTBLS28 ESI+, (b) MTBLS28 ESI−, (c) MTBLS17 ESI+ and (d) MTBLS17 ESI−. (G-A) superior performance group; (G-B1) good performance group including methods occasionally classified to top green area of Fig. 3; (G-B2) good performance group including methods consistently staying in middle blue area of Fig. 3; (C) poor performance group. All lines were generated by LOESS regression.
Figure 5. General operational procedure for using MetaPre.