Ron Wehrens, Jos A Hageman, Fred van Eeuwijk, Rik Kooke, Pádraic J Flood, Erik Wijnker, Joost J B Keurentjes, Arjen Lommen, Henriëtte D L M van Eekelen, Robert D Hall, Roland Mumm, Ric C H de Vos.
Abstract
INTRODUCTION: Batch effects in large untargeted metabolomics experiments are almost unavoidable, especially when sensitive detection techniques like mass spectrometry (MS) are employed. To obtain peak intensities that are comparable across all batches, corrections need to be performed. Since non-detects, i.e., signals with an intensity too low to be detected with certainty, are common in metabolomics studies, the batch correction methods need to take these into account.
Keywords: Arabidopsis thaliana; Batch correction; Mass spectrometry; Non-detects; Untargeted metabolomics
Year: 2016 PMID: 27073351 PMCID: PMC4796354 DOI: 10.1007/s11306-016-1015-8
Source DB: PubMed Journal: Metabolomics ISSN: 1573-3882 Impact factor: 4.290
Overview of batch correction methods considered in this paper
| Method | Based on | Non-detects | Methodology |
|---|---|---|---|
| Q | QCs | NA | LS regression |
| Qc | QCs | NA | Censored regression |
| Q0 | QCs | 0 | LS regression |
| Q1 | QCs | LOD/2 | LS regression |
| Q2 | QCs | LOD | LS regression |
| S | Study | NA | LS regression |
| Sc | Study | NA | Censored regression |
| S0 | Study | 0 | LS regression |
| S1 | Study | LOD/2 | LS regression |
| S2 | Study | LOD | LS regression |
| R0 | QCs | 0 | PCA |
| R1 | QCs | LOD/2 | PCA |
| R2 | QCs | LOD | PCA |
Methods “Q” are based on different forms of regression using the QCs, methods “S” on regressions using the study samples, and methods “R” on the RUV method, a PCA of the QCs. The “Non-detects” column shows whether non-detects are handled as missing values (NA) or imputed with a single value (0, LOD/2, or LOD).
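The four ways of handling non-detects listed in the table can be illustrated with a minimal sketch. It assumes non-detects are encoded as NaN and that a single limit of detection (`lod`) applies to the metabolite; the function name `impute_nondetects` and the toy values are hypothetical and not taken from the paper.

```python
import numpy as np

def impute_nondetects(intensities, lod, strategy):
    """Handle non-detects (encoded as NaN) as in the 'Non-detects' column.

    strategy is one of "NA", "0", "LOD/2", "LOD" (labels assumed for this sketch).
    """
    fill = {"NA": np.nan, "0": 0.0, "LOD/2": lod / 2.0, "LOD": lod}[strategy]
    x = np.asarray(intensities, dtype=float)
    # Detected values are left untouched; only NaN entries are replaced.
    return np.where(np.isnan(x), fill, x)

# Toy intensities with two non-detects and an assumed LOD of 2.0.
raw = [5.0, np.nan, 7.0, np.nan]
for strategy in ("NA", "0", "LOD/2", "LOD"):
    print(strategy, impute_nondetects(raw, lod=2.0, strategy=strategy))
```

With the "NA" option the non-detects stay missing, so any subsequent regression has to skip them or treat them as censored; the other three options yield complete data that an ordinary least-squares fit can use directly.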
Fig. 1 Data for a single metabolite measured in two batches of 80 samples each. a Uncorrected data: there is a clear overall intensity difference between the batches, and a gradual intensity decrease within both batches. QCs are indicated by red dots, study samples by circles. Correction lines fitted through the QCs in the individual batches are shown in red. b Intensities after correction
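The correction pictured in Fig. 1 can be sketched roughly as follows: per batch, a straight line is fitted through the QC intensities as a function of injection order, the fitted line is subtracted from all samples in that batch, and the overall QC mean is added back. This is only an illustration under assumptions; the simulated data, the choice of the overall QC mean as target level, and the name `correct_batches` are hypothetical and are not the authors' implementation.

```python
import numpy as np

def correct_batches(intensity, order, batch, is_qc):
    """Per batch: fit a line through the QCs, subtract it, re-centre on the QC mean."""
    intensity = np.asarray(intensity, dtype=float)
    order = np.asarray(order, dtype=float)
    batch = np.asarray(batch)
    is_qc = np.asarray(is_qc, dtype=bool)

    target = np.nanmean(intensity[is_qc])      # assumed common reference level
    corrected = np.full_like(intensity, np.nan)
    for b in np.unique(batch):
        in_batch = batch == b
        fit_idx = in_batch & is_qc & ~np.isnan(intensity)
        if fit_idx.sum() < 2:
            continue                           # no line can be fitted; batch stays NaN
        slope, intercept = np.polyfit(order[fit_idx], intensity[fit_idx], 1)
        fitted = slope * order[in_batch] + intercept
        corrected[in_batch] = intensity[in_batch] - fitted + target
    return corrected

# Simulated example: two batches of 80 injections, every 10th injection a QC.
rng = np.random.default_rng(0)
order = np.arange(160)
batch = np.repeat([1, 2], 80)
is_qc = order % 10 == 0
drift = -0.01 * (order % 80)                   # gradual decrease within each batch
offset = np.where(batch == 1, 1.5, 0.0)        # overall difference between batches
intensity = 10.0 + offset + drift + rng.normal(0.0, 0.05, 160)

corrected = correct_batches(intensity, order, batch, is_qc)
print(intensity[batch == 1].mean() - intensity[batch == 2].mean())
print(corrected[batch == 1].mean() - corrected[batch == 2].mean())
```

Batches with fewer than two detected QCs are left uncorrected here, which is one simple way to represent cases where a linear correction cannot be computed.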
Fig. 2 PCA plots of the LC–MS data for the Arabidopsis hapmap population (data set I). a Shows the uncorrected data where the different batches can clearly be recognized, especially batches 1 and 2. b Shows, as an example of what can be achieved, the result after correction with strategy Q
Fig. 3 Repeatabilities for individual metabolites. Uncorrected data on the x axis; corrected data (strategy Q) on the y axis. In almost all cases repeatabilities show an improvement upon correction
Fig. 4 Comparison of the performance of the batch correction methods for the LC–MS Arabidopsis hapmap data set. The best values are in the top left corner: low values for the PCA distance criterion on the x axis, and high repeatabilities (y axis)
Fig. 5 Results of the corrections for the hapmap GC–MS data. a Corrections based on batch information only (strategies Q and S). b Batch information as well as injection sequence are used in the correction with the S strategies. The values for the RUV corrections and the uncorrected data are the same in both panels
Fig. 6 Correction results for the diallel study data set. a Corrections based only on batch averages; b corrections based on batch and injection order information. In both panels the points for the RUV corrections and uncorrected data are identical
The percentage of cases (metabolite/batch combinations) for which correction is impossible for the three data sets and the correction strategies considered
| Strategy | Data set I (%) | Data set II (%) | Data set III (%) |
|---|---|---|---|
| Q (ave) | – | 29.2 | 14.3 |
| Q (lin) | 37.1 | – | 58.0 |
| S (ave) | – | 5.6 | 1.3 |
| S (lin) | 9.0 | 11.3 | 2.3 |
| R | 0.0 | 0.0 | 0.0 |
Injection order is not taken into account in the rows denoted “ave”; it is taken into account in the rows denoted “lin”.
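A percentage like those in the table above could be computed along the following lines. The criterion used here is an assumption, since this record does not state it: a metabolite/batch combination is counted as uncorrectable when fewer than `min_detected` reference values (QCs or study samples, depending on the strategy) are detected in that batch. The function name, the threshold values, and the toy matrix are hypothetical.

```python
import numpy as np

def pct_uncorrectable(ref_values, batch, min_detected):
    """Percentage of metabolite/batch combinations with too few detected values.

    ref_values: (n_samples, n_metabolites) array of the reference samples only,
                with NaN marking non-detects.
    batch:      batch label for each row of ref_values.
    """
    ref_values = np.asarray(ref_values, dtype=float)
    batch = np.asarray(batch)
    # Count detected values per batch and metabolite -> (n_batches, n_metabolites).
    detected = np.array([
        (~np.isnan(ref_values[batch == b])).sum(axis=0)
        for b in np.unique(batch)
    ])
    return 100.0 * np.mean(detected < min_detected)

# Toy data: 2 batches x 2 metabolites = 4 combinations.
ref = np.array([[1.0, np.nan],
                [2.0, np.nan],
                [np.nan, 3.0],
                [4.0, np.nan]])
batches = np.array([1, 1, 2, 2])

print(pct_uncorrectable(ref, batches, min_detected=1))  # assumed minimum for an average
print(pct_uncorrectable(ref, batches, min_detected=2))  # assumed minimum for a line
```

Under this assumed criterion, a fit that uses injection order needs more detected values per batch than a batch average does, so it can never be applicable in more combinations than the average-based fit.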