Nico Borgsmüller, Yoann Gloaguen, Tobias Opialla, Eric Blanc, Emilie Sicard, Anne-Lise Royer, Bruno Le Bizec, Stéphanie Durand, Carole Migné, Mélanie Pétéra, Estelle Pujos-Guillot, Franck Giacomoni, Yann Guitton, Dieter Beule, Jennifer Kirwan.
Abstract
Lack of reliable peak detection impedes automated analysis of large-scale gas chromatography-mass spectrometry (GC-MS) metabolomics datasets. The performance and output of individual peak-picking algorithms can differ widely depending on the algorithmic approach, the parameters, and the data acquisition method, which makes it difficult to compare algorithms. Here we present a workflow for improved peak picking (WiPP), a parameter-optimising, multi-algorithm peak detection workflow for GC-MS metabolomics. WiPP evaluates the quality of detected peaks using a machine learning-based classification scheme built on seven peak classes. The quality information returned by the classifier for each individual peak is merged with results from different peak detection algorithms to create one final high-quality peak set for immediate downstream analysis. Medium- and low-quality peaks are kept for further inspection. By applying WiPP to standard compound mixes and a complex biological dataset, we demonstrate that peak detection is improved through a novel way of assigning peak quality, automated parameter optimisation, and integration of results across the different embedded peak-picking algorithms. Furthermore, our approach can provide an impartial performance comparison of different peak-picking algorithms. WiPP is freely available on GitHub (https://github.com/bihealth/WiPP) under the MIT licence.
Keywords: gas chromatography-mass spectrometry (GC-MS); machine learning; metabolomics; parameter optimisation; peak classification; peak detection; pre-processing workflow; support vector machine
Year: 2019 PMID: 31438611 PMCID: PMC6780109 DOI: 10.3390/metabo9090171
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Figure 1. Number of peaks detected by individual algorithms on the high, medium, and low concentrations of the standard mix dataset 1 in low resolution (A,B) and high resolution (C,D). (A,C): Number of unique high-quality peaks as classified by workflow for improved peak picking (WiPP) and their algorithm of origin. (B,D): Proportion of peaks detected by centWave and matchedFilter that were rejected by at least one of the quality filters in WiPP.
Figure 2. Number of metabolites detected and annotated manually or automatically by WiPP compared to the number of compounds present in the three concentrations of dataset 1.
Comparison of the results reported in the published study with those produced using the WiPP automated workflow. (X) Data could not be automatically computed. (–) Missing data.
| ID | Identified (WiPP) | Fold Change (Study) | Fold Change (WiPP) |
|---|---|---|---|
| Glutamic Acid | + | 1.9 | 1.89 |
| α-tocopherol | + | 1.5 | 1.36 |
| Valine | + | 1.5 | 1.52 |
| Citric Acid | + | −1.3 | −1.20 |
| Sorbose | + | −2.4 | −1.66 |
| Cholesterol | + | 1.1 | 1.10 |
| Lactic Acid | + | −1.3 | X |
| Leucine | + | 1.6 | X |
| Isoleucine | + | 1.5 | X |
| Hexose | + | – | −1.61 |
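As a quick check on the table, the rows that carry both fold-change values can be compared directly. The Python sketch below uses only the six numeric pairs listed above; the variable and function names are illustrative and are not part of WiPP.

```python
# Fold changes copied from the table above as (study, WiPP) pairs.
# Rows marked X or – are left out because one of the two values is unavailable.
fold_changes = {
    "Glutamic Acid": (1.9, 1.89),
    "α-tocopherol": (1.5, 1.36),
    "Valine": (1.5, 1.52),
    "Citric Acid": (-1.3, -1.20),
    "Sorbose": (-2.4, -1.66),
    "Cholesterol": (1.1, 1.10),
}


def same_direction(study: float, wipp: float) -> bool:
    """True when both fold changes point the same way (both up or both down)."""
    return (study > 0) == (wipp > 0)


def abs_difference(study: float, wipp: float) -> float:
    """Absolute gap between the two reported fold changes."""
    return abs(study - wipp)


agreement = {name: same_direction(s, w) for name, (s, w) in fold_changes.items()}
differences = {
    name: round(abs_difference(s, w), 2) for name, (s, w) in fold_changes.items()
}
largest_gap = max(differences, key=differences.get)

for name in fold_changes:
    print(f"{name}: same direction={agreement[name]}, |difference|={differences[name]}")
print("Largest gap:", largest_gap)
```

All six pairs agree in direction, and the largest gap between the two values is for sorbose.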
Figure 3. Schematic representation of the peak sets. (A) Chromatographic representation of peaks detected by the two peak picking algorithms and accepted or rejected by WiPP. An individual ID is assigned to each peak. Peaks 4 and 5 are erroneously detected by algorithm 1 as one single merged peak, hence the light blue colour between the distinct peaks properly detected by algorithm 2. (B) Peaks called by peak picking algorithms 1 and 2 compared to the true peak set of a dataset before parameter optimisation. (C) Peaks called by algorithms 1 and 2 after parameter optimisation. (D) Peaks accepted and rejected by WiPP compared to the true peak set. Circled numbers represent the peak IDs from subfigure A and are placed in their respective regions in peak space.
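The peak-space view in panels B to D can be mimicked with plain set operations. The Python sketch below is a toy illustration: the peak IDs and set memberships are hypothetical and are not read from the figure, and precision and recall are standard metrics chosen here for illustration, not quantities reported in the source.

```python
from typing import Set, Tuple


def precision_recall(detected: Set[int], true_peaks: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) of a detected peak set against the true set."""
    true_positives = detected & true_peaks
    precision = len(true_positives) / len(detected) if detected else 0.0
    recall = len(true_positives) / len(true_peaks) if true_peaks else 0.0
    return precision, recall


# Hypothetical peak IDs, chosen only to illustrate the set logic.
true_peaks = {1, 2, 3, 4, 5}
algorithm_1 = {1, 2, 6}        # peak 6 is a false positive
algorithm_2 = {2, 3, 4, 5, 7}  # peak 7 is a false positive

# Everything called by either algorithm.
combined = algorithm_1 | algorithm_2

# Assumed outcome of a quality filter that rejects the two spurious peaks.
rejected_by_filter = {6, 7}
accepted = combined - rejected_by_filter

print("algorithm 1:", precision_recall(algorithm_1, true_peaks))
print("combined:   ", precision_recall(combined, true_peaks))
print("accepted:   ", precision_recall(accepted, true_peaks))
```

In this toy case, combining both algorithms recovers every true peak, and the assumed filter then removes the false positives without losing any true peak.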
Figure 4. Schematic representation of the seven peak classes defined in WiPP. For clarity purposes, only one m/z is represented here. (A) Apex shifted to the left. (B) Centred apex. (C) Apex shifted to the right. (D) Noise. (E) Peak with wide margins to window borders. (F) Peak exceeds window borders. (G) Merged/shoulder peak.
Figure 5. Flowchart of the WiPP method consisting of six steps. Steps 1 to 3 consisted of generating the training data and training the classifiers using a calibration dataset. Step 4 optimised the parameters of individual algorithms using an optimisation dataset and the trained classifiers. Step 5 ran the optimised peak detection algorithms on the full biological dataset. Step 6 classified, filtered, and merged the outputs of individual peak picking algorithms to generate a high-quality peak set.
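Step 6 (classify, filter, merge) can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the `Peak` record, the quality labels, the retention-time tolerance, and the rule of keeping the earliest peak among near-duplicates are hypothetical and are not taken from the WiPP code base.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Peak:
    retention_time: float  # seconds (hypothetical unit)
    algorithm: str         # which peak picker reported the peak
    quality: str           # label assumed to come from a trained classifier


def merge_high_quality(peaks: Iterable[Peak], rt_tolerance: float = 0.5) -> List[Peak]:
    """Keep high-quality peaks and collapse near-duplicates across algorithms.

    Two peaks are treated as the same peak when their retention times differ
    by at most rt_tolerance; the earlier one (after sorting) is kept.
    """
    high = sorted(
        (p for p in peaks if p.quality == "high"),
        key=lambda p: p.retention_time,
    )
    merged: List[Peak] = []
    for peak in high:
        if merged and peak.retention_time - merged[-1].retention_time <= rt_tolerance:
            continue  # duplicate of the previously kept peak
        merged.append(peak)
    return merged


# Hypothetical candidate peaks from two algorithms.
candidates = [
    Peak(100.0, "centWave", "high"),
    Peak(100.2, "matchedFilter", "high"),  # same peak, second algorithm
    Peak(150.0, "matchedFilter", "high"),
    Peak(175.0, "centWave", "low"),        # filtered out by quality
]
final_peaks = merge_high_quality(candidates)
print([(p.retention_time, p.algorithm) for p in final_peaks])
```

Sorting by retention time before merging keeps the duplicate check to a single comparison with the last kept peak. A real workflow would also need to compare spectra, which this sketch ignores.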