Erhan Kenar, Holger Franken, Sara Forcisi, Kilian Wörmann, Hans-Ulrich Häring, Rainer Lehmann, Philippe Schmitt-Kopplin, Andreas Zell, Oliver Kohlbacher.
Abstract
Liquid chromatography coupled to mass spectrometry (LC-MS) has become a standard technology in metabolomics. In particular, label-free quantification based on LC-MS is easily amenable to large-scale studies and thus well suited to clinical metabolomics. Large-scale studies, however, require automated processing of the large and complex LC-MS datasets. We present a novel algorithm for the detection of mass traces and their aggregation into features (i.e. all signals caused by the same analyte species) that is computationally efficient and sensitive and that leads to reproducible quantification results. The algorithm is based on a sensitive detection of mass traces, which are then assembled into features based on mass-to-charge spacing, co-elution information, and a support vector machine-based classifier able to identify potential metabolite isotope patterns. The algorithm is not limited to metabolites but is applicable to a wide range of small molecules (e.g. lipidomics, peptidomics), as well as to other separation technologies. We assessed the algorithm's robustness with regard to varying noise levels on synthetic data and then validated the approach on experimental data investigating human plasma samples. We obtained excellent results in a fully automated data-processing pipeline with respect to both accuracy and reproducibility. Compared with state-of-the-art algorithms, ours achieved higher precision and recall. The algorithm is available as part of the open-source software package OpenMS and runs on all major operating systems.
Year: 2013 PMID: 24176773 PMCID: PMC3879626 DOI: 10.1074/mcp.M113.031278
Source DB: PubMed Journal: Mol Cell Proteomics ISSN: 1535-9476 Impact factor: 5.911
Fig. 1. General procedure of feature finding. A, starting with the most intense peaks (magenta dots), potential mass traces are extended with peaks compatible in m/z back and forth in retention time. B, each mass trace is smoothed to facilitate the determination of chromatographic maxima. Multimodal elution profiles are split into smaller mass traces with respect to the number of maxima. C, based on this set of mass traces, potential feature hypotheses are generated and scored according to their compatibility with theoretical isotope patterns. D, finally, the best-scoring hypotheses are assembled to features.
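Step A of the procedure can be illustrated with a minimal sketch. It is not the OpenMS implementation: the function names, the 20 ppm tolerance, the `max_gaps` stopping rule, and the intensity-weighted running centroid are all assumptions chosen for illustration, and the spectra are invented.

```python
from typing import List, Optional, Tuple

Scan = List[Tuple[float, float]]  # one spectrum: (m/z, intensity) pairs


def closest_peak(scan: Scan, mz: float, ppm_tol: float) -> Optional[Tuple[float, float]]:
    """Return the peak nearest to mz that lies within ppm_tol, or None."""
    best = None
    for peak in scan:
        dev_ppm = abs(peak[0] - mz) / mz * 1e6
        if dev_ppm <= ppm_tol and (best is None or abs(peak[0] - mz) < abs(best[0] - mz)):
            best = peak
    return best


def extend_mass_trace(scans: List[Scan], seed_scan: int, seed_mz: float,
                      ppm_tol: float = 20.0, max_gaps: int = 1):
    """Grow a trace from a seed peak backward and forward in retention time.

    Hypothetical stopping rule: give up in one direction once more than
    max_gaps consecutive scans contain no compatible peak.
    """
    seed = closest_peak(scans[seed_scan], seed_mz, ppm_tol)
    if seed is None:
        return []
    trace = [(seed_scan, seed[0], seed[1])]
    for step in (-1, 1):  # -1: earlier scans, +1: later scans
        # Intensity-weighted running centroid of the m/z values seen so far.
        mz_sum, weight = seed[0] * seed[1], seed[1]
        gaps, i = 0, seed_scan + step
        while 0 <= i < len(scans) and gaps <= max_gaps:
            hit = closest_peak(scans[i], mz_sum / weight, ppm_tol)
            if hit is None:
                gaps += 1
            else:
                gaps = 0
                trace.append((i, hit[0], hit[1]))
                mz_sum += hit[0] * hit[1]
                weight += hit[1]
            i += step
    return sorted(trace)


# Invented example: the most intense peak sits in scan 2 at m/z 300.000.
scans = [
    [(100.0, 5.0)],
    [(300.001, 50.0), (150.0, 3.0)],
    [(300.000, 200.0)],
    [(300.002, 80.0)],
    [(500.0, 4.0)],
]
trace = extend_mass_trace(scans, seed_scan=2, seed_mz=300.000, max_gaps=0)
```

Here the trace picks up the compatible peaks in scans 1 and 3 and stops at scans 0 and 4, where no peak falls within the tolerance.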
Standard compounds of the spike-in experiments
| Standard | CAS-RN | m/z | Retention time (min) |
|---|---|---|---|
| Propionyl- | 1182037-75-7 | 221.15751 | 1.4 |
| Nialamide | 51-12-7 | 299.15025 | 5.7 |
| Sulfadimethoxine-d6 | 73068-02-7 | 317.11851 | 8.4 |
| Reserpine | 50-55-5 | 609.28065 | 10.6 |
| Terfenadine | 50679-08-8 | 472.32099 | 12.4 |
| Hexadecanoyl- | 1334532-26-1 | 403.36096 | 16.6 |
| Octadecanoyl- | N.A. | 431.39227 | 18.6 |
Fig. 2. Relationship between stock solution concentrations and feature intensities for reserpine and terfenadine. For each concentration, an error bar depicts the standard deviation of the triplicate intensities. The solid line connects the mean intensities obtained from each of the 11 triplicates. For each compound, a line (dashed) was fit via linear regression to the 33 concentration-intensity data points, and its goodness of fit is reported as R². The inlaid plots show a close-up of the low-concentration range between 0 and 0.5 ppm.
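The regression described in this caption can be sketched as an ordinary least-squares fit with the coefficient of determination R². The function name `fit_line` and the data points are hypothetical; the measured concentrations and intensities are not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Invented, perfectly linear data: intensity = 1 + 2 * concentration.
concentrations = [0.0, 1.0, 2.0, 3.0]
intensities = [1.0, 3.0, 5.0, 7.0]
slope, intercept, r_squared = fit_line(concentrations, intensities)
```

For real triplicate data, all 33 concentration-intensity pairs would be passed in at once rather than the per-concentration means, matching the caption.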
Fig. 3. Numbers of reproducible features yielded by our algorithm and XCMS/CAMERA in the spike-in human plasma dataset. Each bin of the histogram represents the number of measurements in which a feature was reproducibly detected. The bars show the absolute numbers of features that were detected with the respective reproducibility.
Fig. 4. Intensity variation between features that were matched in at least half of the 33 spike-in human plasma measurements. For each matched feature, we computed the coefficient of variation of its intensities over the respective measurements. All matched features were assigned to one of three bins according to the order of magnitude of their intensity. For each bin, the median coefficient of variation was determined from the underlying distribution and is depicted in the bar plot. We conducted one-sided Wilcoxon tests to investigate whether the observed difference in the median coefficients of variation among the methods was significant. The results from the pairwise comparisons are marked according to significance level (asterisks). To enable a direct comparison of FeatureFinderMetabo and XCMS/CAMERA feature intensities, we normalized all measurements by quantile normalization.
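The per-feature coefficient of variation and its summary per intensity bin can be sketched as follows. This is an illustration under stated assumptions: the caption does not say how the three bins were bounded or whether the sample or population standard deviation was used, so the sketch bins by the floor of log10 of the mean intensity and uses the sample standard deviation. The intensities are invented, and the quantile normalization and Wilcoxon tests are not covered.

```python
import math
from collections import defaultdict
from statistics import mean, median, stdev


def coefficient_of_variation(intensities):
    """Sample standard deviation divided by the mean (an assumed definition)."""
    return stdev(intensities) / mean(intensities)


def median_cv_by_magnitude(features):
    """Group features by order of magnitude of their mean intensity.

    features: one list of intensities per matched feature.
    Returns {order of magnitude: median coefficient of variation}.
    """
    bins = defaultdict(list)
    for intensities in features:
        magnitude = math.floor(math.log10(mean(intensities)))
        bins[magnitude].append(coefficient_of_variation(intensities))
    return {mag: median(cvs) for mag, cvs in sorted(bins.items())}


# Invented intensities for three matched features.
features = [
    [150.0, 150.0, 150.0],     # order of magnitude 2, no variation
    [1800.0, 2000.0, 2200.0],  # order of magnitude 3
    [2500.0, 2500.0, 2500.0],  # order of magnitude 3, no variation
]
summary = median_cv_by_magnitude(features)
```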
Performance scores computed for the simulated plant metabolite dataset (the algorithmic parameters were set appropriately to the simulated characteristics of the data)
| Method | Recall | Precision | F-score |
|---|---|---|---|
| FeatureFinderMetabo | 0.96 | 0.97 | 0.97 |
| XCMS/CAMERA | 0.88 | 0.37 | 0.52 |
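As a consistency check on the table, the F-score can be recomputed from precision and recall, assuming it is the balanced harmonic mean (F1); the source does not state the formula explicitly. Because the inputs below are the rounded table values, the recomputed FeatureFinderMetabo score (about 0.965) can differ in the last digit from the reported 0.97.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the table above.
ffm = f_score(precision=0.97, recall=0.96)   # FeatureFinderMetabo
xcms = f_score(precision=0.37, recall=0.88)  # XCMS/CAMERA
print(f"FeatureFinderMetabo: {ffm:.3f}, XCMS/CAMERA: {xcms:.3f}")
```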