Qingxia Yang, Yunxia Wang, Ying Zhang, Fengcheng Li, Weiqi Xia, Ying Zhou, Yunqing Qiu, Honglin Li, Feng Zhu.
Abstract
Biological processes (such as microbial growth and physiological responses) are usually dynamic and require monitoring of metabolic variation at different time-points. Moreover, current metabolomics shows a clear shift from case-control (N = 2) studies to multi-class (N > 2) problems, which is crucial for revealing the mechanisms underlying certain physiological processes, disease metastasis, etc. Time-course and multi-class metabolomics have attracted great attention, and data normalization is essential for removing unwanted biological/experimental variations in these studies. However, no tool (including NOREVA 1.0, which focuses only on case-control studies) is available for effectively assessing the performance of normalization methods on time-course/multi-class metabolomic data. Thus, NOREVA was updated to version 2.0 by (i) realizing normalization and evaluation of both time-course and multi-class metabolomic data, (ii) integrating 144 normalization methods of a recently proposed combination strategy and (iii) identifying the well-performing methods by comprehensively assessing the largest set of normalizations (168 in total, significantly more than the 24 in NOREVA 1.0). The significance of this update was extensively validated by case studies on benchmark datasets. In summary, NOREVA 2.0 is distinguished by its capability to identify well-performing normalization method(s) for time-course and multi-class metabolomics, which makes it an indispensable complement to other available tools. NOREVA can be accessed at https://idrblab.org/noreva/.
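The combination strategy builds new methods by pairing existing normalization steps. The minimal sketch below shows how such pairings can be enumerated. It assumes that each combination chains one sample-wise normalization (here total sum) with one metabolite-wise scaling (auto, range or level scaling, using their textbook definitions). The abbreviations AUT, LEV and SUM follow the Figure 3 caption. RAN is a hypothetical label, and all function names are illustrative. The source does not state the order in which the two steps are applied, so running the sample-wise step first is an assumption. EigenMS is not reproduced here.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List

Matrix = List[List[float]]  # rows = samples, columns = metabolites
Method = Callable[[Matrix], Matrix]


def total_sum(x: Matrix) -> Matrix:
    """Sample-wise: divide each intensity by the total of its sample."""
    out = []
    for row in x:
        s = sum(row)
        out.append([v / s for v in row])
    return out


def _scale_columns(x: Matrix, fn: Callable[[List[float]], List[float]]) -> Matrix:
    """Apply a per-metabolite (column-wise) transform, then return rows again."""
    cols = [fn(list(col)) for col in zip(*x)]
    return [list(row) for row in zip(*cols)]


def auto_scaling(x: Matrix) -> Matrix:
    """Mean-center each metabolite and divide by its standard deviation."""
    def f(col: List[float]) -> List[float]:
        m, s = mean(col), stdev(col)
        return [(v - m) / s for v in col]
    return _scale_columns(x, f)


def range_scaling(x: Matrix) -> Matrix:
    """Mean-center each metabolite and divide by its range (max - min)."""
    def f(col: List[float]) -> List[float]:
        m, r = mean(col), max(col) - min(col)
        return [(v - m) / r for v in col]
    return _scale_columns(x, f)


def level_scaling(x: Matrix) -> Matrix:
    """Mean-center each metabolite and divide by its mean."""
    def f(col: List[float]) -> List[float]:
        m = mean(col)
        return [(v - m) / m for v in col]
    return _scale_columns(x, f)


def combine(sample_wise: Method, metabolite_wise: Method) -> Method:
    """Chain two steps; applying the sample-wise step first is an assumption."""
    return lambda x: metabolite_wise(sample_wise(x))


SAMPLE_WISE: Dict[str, Method] = {"SUM": total_sum}
# "RAN" is a hypothetical abbreviation; see Supplementary Table S1 for the real ones.
METABOLITE_WISE: Dict[str, Method] = {
    "AUT": auto_scaling, "RAN": range_scaling, "LEV": level_scaling,
}

# Enumerate every pairing, as a larger pool of combined methods would be built.
COMBINATIONS: Dict[str, Method] = {
    f"{m}+{s}": combine(s_fn, m_fn)
    for m, m_fn in METABOLITE_WISE.items()
    for s, s_fn in SAMPLE_WISE.items()
}
```

With more entries in each dictionary, the number of combinations grows as the product of the two pool sizes. Calling `COMBINATIONS["AUT+SUM"](data)` on a samples-by-metabolites matrix returns the normalized matrix.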
Year: 2020 PMID: 32324219 PMCID: PMC7319444 DOI: 10.1093/nar/gkaa258
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Key features added to NOREVA 2.0, which realize the normalization and evaluation of both time-course and multi-class metabolomic data (left panel), integrate the normalization methods of the combination strategy proposed in a recent publication (6) (right panel), and identify the well-performing methods by assessing the largest set of normalizations to date (168 in total, significantly more than the 24 methods in NOREVA 1.0; middle panel).
Summary and comparison of the functions provided in NOREVA 2.0 and 1.0. A check mark (√) indicates that the corresponding function is available, while a cross (×) denotes that it is not.
| No. | Function | NOREVA 2.0 | NOREVA 1.0 |
|---|---|---|---|
| 1 | Identifying the well-performing normalizations using multiple criteria | √ | √ |
| 2 | Removing the overall unwanted variations using ISs/QCMs | √ | √ |
| 3 | Correcting the signal drifts based on QCSs and subsequent data normalization | √ | √ |
| 4 | Realizing the normalization and performance evaluation for time-course metabolomics | √ | × |
| 5 | Enabling the normalization and performance evaluation for multi-class metabolomics | √ | × |
| 6 | Integrating over one hundred new normalization methods based on the combination strategy | √ | × |
| 7 | Identifying the best-performing methods by comprehensively assessing the largest set of normalization methods | √ | × |
Eight benchmark datasets collected for the case studies: four time-course and four multi-class metabolomic benchmarks. The number of time-points or classes in each benchmark is given. GC–MS: gas chromatography–mass spectrometry; IS: internal standard; LC–MS: liquid chromatography–mass spectrometry; QCS: quality control sample
| Dataset ID & Platform | Remarks on Each Dataset | Dataset Description |
|---|---|---|
| MTBLS665 ( | Untargeted metabolomic dataset of 3 time-points without QCS & IS | 4,236 metabolites from people before |
| MTBLS518 ( | Untargeted metabolomic dataset of 7 time-points with QCS | 14,339 metabolites from |
| MTBLS319 ( | Untargeted metabolomic dataset of 3 time-points with IS | 116 metabolites from the mutation strains of |
| MTBLS656 ( | Targeted metabolomic dataset of 3 time-points without QCS & IS | 259 metabolites from healthy volunteers in a consecutive time-series sample collection at 0, 12 and 24 h |
| MTBLS59 ( | Untargeted metabolomic dataset of 4 classes without QCS & IS | 1,632 metabolites from 4 types of apple extracts (one control and three spiked with nine compounds at different concentrations) |
| MTBLS520 ( | Untargeted metabolomic dataset of 9 classes with QCS | 4,172 metabolites from 9 different bryophytes ( |
| MTBLS370 ( | Untargeted metabolomic dataset of 4 classes with IS | 885 extracellular metabolites from fresh medium, |
| MTBLS370 ( | Targeted metabolomic dataset of 4 classes without QCS & IS | 72 extracellular metabolites from fresh medium, |
The performances of representative normalization methods on different types of benchmarks, collectively assessed by four criteria. (A) Assessment results for three untargeted time-course benchmarks, without QCS & IS (73), with QCS (74) and with IS (75), and one targeted time-course benchmark (76); (B) assessment results for three untargeted multi-class benchmarks, without QCS & IS (72), with QCS (77) and with IS (78), and one targeted multi-class benchmark (78). Using the ‘Superior’, ‘Good’ and ‘Poor’ performance levels defined in the second section of MATERIALS AND METHODS, the background of each assessment result is colored green, light green and red, respectively. The abbreviations of the normalization methods are given in Supplementary Table S1.
Figure 2. Comparison of three normalization methods on the time-course benchmark MTBLS665 (73), based on a well-established metabolic marker (kynurenine) that is elevated in patient plasma after malaria infection and declines after treatment (80,81). (A) The normalization method (Mean) applied in the original study of the MTBLS665 benchmark (73); (B) the normalization method (range scaling+EigenMS) that NOREVA identified as consistently well-performing under all four criteria (Table 3A); (C) the normalization method (contrast+level scaling) that NOREVA identified as consistently poorly-performing under all four criteria (Table 3A). Violin plots illustrate the concentration distribution of kynurenine among individuals, and dots indicate the concentration of kynurenine in an individual at each time-point (T0, T1 and T2). All concentrations were scaled into the range between 0 and 1.
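The caption states only that concentrations were scaled into the range 0 to 1. The sketch below uses ordinary min-max scaling and pools all individuals and time-points of one panel before rescaling. Whether the original figure pooled values this way is an assumption, and the function name is illustrative.

```python
from typing import Dict, List


def scale_pooled(groups: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Min-max scale all time-points together onto [0, 1].

    Pooling keeps T0, T1 and T2 on one shared axis, so differences between
    time-points survive the rescaling.
    """
    pooled = [v for values in groups.values() for v in values]
    lo, hi = min(pooled), max(pooled)
    if hi == lo:
        raise ValueError("cannot rescale a constant set of values")
    return {t: [(v - lo) / (hi - lo) for v in values] for t, values in groups.items()}
```

Scaling each time-point separately would map every group onto the full 0 to 1 range. That would erase the rise and fall that the panels are meant to show.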
Figure 3. Comparison of two representative normalization methods based on nine spiked compounds. (A) The concentration distribution among the four studied groups after normalization using level scaling+EigenMS (LEV+EIG); (B) the concentration distribution among all studied groups after normalization using auto scaling+total sum (AUT+SUM). Based on the comprehensive performance assessment of all 168 normalization methods, LEV+EIG showed consistently Superior performance across all criteria, while AUT+SUM was identified as consistently poorly-performing under all four criteria (Table 3). Two of the nine spiked compounds (trans-resveratrol and cyanidin-3-galactoside) are not naturally present in the studied extract, so constant concentrations of each were spiked (0.4 and 0.57 mg/l, respectively). Six of the remaining seven compounds (catechin, phloridzin, epicatechin, quercetin-3-galactoside, quercetin-3-rhamnoside and quercetin-3-glucoside) were spiked into three groups at gradually increasing concentrations (from the control to increases of 20%, then 40% and finally 100%). The last compound (quercetin) was also spiked at varying concentrations (from the control to increases of 20%, then 40% and finally 40%).
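The spiking design fixes a known ordering of concentrations across the four groups. The sketch below is a hypothetical check, not one of NOREVA's criteria, that scores how well one normalized metabolite preserves that ordering. It counts the fraction of group pairs whose medians are ordered as the design predicts. The design vectors come from the caption, and the function and constant names are illustrative.

```python
from itertools import combinations
from statistics import median
from typing import Sequence

# Percent increase over the control group, taken from the Figure 3 caption.
GRADUAL = [0, 20, 40, 100]   # the six gradually spiked compounds
QUERCETIN = [0, 20, 40, 40]  # quercetin


def spike_concordance(groups: Sequence[Sequence[float]], design: Sequence[float]) -> float:
    """Fraction of group pairs whose median order matches the spiking design.

    Pairs with equal designed levels are skipped, since no order is expected.
    """
    medians = [median(g) for g in groups]
    agree = total = 0
    for i, j in combinations(range(len(design)), 2):
        if design[i] == design[j]:
            continue
        total += 1
        if (medians[i] - medians[j]) * (design[i] - design[j]) > 0:
            agree += 1
    if total == 0:
        raise ValueError("design has no pair with different spiking levels")
    return agree / total
```

Because only the order of group medians is compared, the score is unaffected by any metabolite-wise step that mean-centers and then divides by a positive constant.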
Figure 4. Comprehensive assessment of all normalization methods (the top 100 are shown) based on the collective evaluation using four criteria. The assessment outcomes for the time-course benchmarks (A) MTBLS665 without QCS & IS (73) and (B) MTBLS518 with QCS (74), and for the multi-class benchmarks (C) MTBLS59 without QCS & IS (72) and (D) MTBLS520 with QCS (77), were ranked and colored by performance. As described in the second section of MATERIALS AND METHODS, the background of each evaluation result is shown in green, light green and red for Superior, Good and Poor performance, respectively. The abbreviations of the normalization methods are given in Supplementary Table S1. Criteria Ca, Cb, Cc and Cd were measured by PMAD, purity, CW and AUC, respectively.
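The caption names PMAD and purity as the measures behind criteria Ca and Cb. The sketch below implements both under two assumptions. PMAD is taken to be the average within-group median absolute deviation over all metabolites; the exact NOREVA formula, including any scaling constant, is not given here. Purity is the standard clustering purity. CW and AUC are not sketched, and all function names are illustrative.

```python
from collections import Counter, defaultdict
from statistics import mean, median
from typing import Dict, List, Sequence


def mad(values: Sequence[float]) -> float:
    """Median absolute deviation from the median (no consistency constant)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def pmad(groups: Dict[str, List[List[float]]]) -> float:
    """Pooled MAD: average within-group MAD over every metabolite and group.

    `groups` maps a class or time-point label to its samples
    (rows = samples, columns = metabolites). Lower means less
    intragroup variation after normalization.
    """
    per_group_mads: List[float] = []
    for rows in groups.values():
        per_group_mads.extend(mad(col) for col in zip(*rows))
    return mean(per_group_mads)


def purity(cluster_ids: Sequence[int], labels: Sequence[str]) -> float:
    """Share of samples that belong to the majority class of their cluster."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    by_cluster: Dict[int, Counter] = defaultdict(Counter)
    for c, y in zip(cluster_ids, labels):
        by_cluster[c][y] += 1
    majority = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return majority / len(labels)
```

A purity of 1.0 means every cluster contains samples from a single class or time-point. In this sketch, the cluster assignments are assumed to come from any clustering of the normalized data.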