Eric Bach, Simon Rogers, John Williamson, Juho Rousu.
Abstract
MOTIVATION: Identification of small molecules in a biological sample remains a major bottleneck in molecular biology, despite a decade of rapid development of computational approaches for predicting molecular structures using mass spectrometry (MS) data. Recently, there has been increasing interest in utilizing other information sources, such as liquid chromatography (LC) retention time (RT), to improve identifications that are based solely on MS information, such as precursor mass-per-charge and tandem mass spectrometry (MS2).
Year: 2021 PMID: 33244585 PMCID: PMC8289373 DOI: 10.1093/bioinformatics/btaa998
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Workflow of our framework and its main components. (a) Data acquisition in an LC-MS experiment resulting in a set of (MS2, RT)-tuples of unknown molecules. (b) Illustration of the underlying graphical model. (c) Ensemble of spanning trees to approximate the MRF and their integration using averaged marginals. (d) Output to the user: ranked molecular candidate lists based on the approximated marginals. (e) Incorporation of the predicted retention orders for a particular assignment for z via the edge potential function. (f) Illustration of the RankSVM model.
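The tree-ensemble idea in panels (c) to (e) can be illustrated with a minimal sketch. It is not the authors' implementation: the exact form of `edge_potential`, the steepness parameter `k`, the random-attachment tree sampler, and all toy scores are assumptions made for illustration, and the marginals are computed by brute-force enumeration rather than message passing.

```python
import itertools
import math
import random


def sigmoid(x, k=1.0):
    """Logistic function; k is a hypothetical steepness parameter."""
    return 1.0 / (1.0 + math.exp(-k * x))


def edge_potential(rt_i, rt_j, pref_i, pref_j, k=1.0):
    """Assumed form: close to 1 when the predicted elution order of two
    candidates agrees with the observed RT order of their features."""
    sign = (rt_j > rt_i) - (rt_j < rt_i)
    return sigmoid(sign * (pref_j - pref_i), k)


def random_tree(n, rng):
    """Random spanning tree over n nodes by random attachment.
    Note: this does not sample uniformly from all spanning trees."""
    order = list(range(n))
    rng.shuffle(order)
    return [(order[rng.randrange(i)], order[i]) for i in range(1, n)]


def tree_marginals(ms2, pref, rts, edges):
    """Sum-marginals of a tree-structured MRF by brute-force enumeration
    (toy inputs only; real code would use message passing)."""
    marg = [[0.0] * len(scores) for scores in ms2]
    total = 0.0
    for z in itertools.product(*(range(len(s)) for s in ms2)):
        weight = 1.0
        for i, zi in enumerate(z):
            weight *= ms2[i][zi]  # node potential: MS2 score
        for i, j in edges:
            weight *= edge_potential(rts[i], rts[j], pref[i][z[i]], pref[j][z[j]])
        total += weight
        for i, zi in enumerate(z):
            marg[i][zi] += weight
    return [[m / total for m in row] for row in marg]


def averaged_marginals(ms2, pref, rts, n_trees=8, seed=0):
    """Average the per-tree marginals over an ensemble of random trees."""
    rng = random.Random(seed)
    acc = [[0.0] * len(scores) for scores in ms2]
    for _ in range(n_trees):
        tree = random_tree(len(ms2), rng)
        for i, row in enumerate(tree_marginals(ms2, pref, rts, tree)):
            for c, value in enumerate(row):
                acc[i][c] += value / n_trees
    return acc


# Toy data (hypothetical): three features with observed RTs, per-candidate
# MS2 scores, and predicted retention-order preference values.
rts = [1.0, 2.0, 3.0]
ms2 = [[0.5, 0.5], [1.0], [0.6, 0.4]]
pref = [[0.0, 5.0], [2.0], [1.0, 4.0]]
marginals = averaged_marginals(ms2, pref, rts)
```

In this toy input the two candidates of the first feature have identical MS2 scores, so any difference in their averaged marginals comes only from the retention-order term.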
Summary of the datasets used for the evaluation of our score-integration framework
| Dataset | Ionization | Mass spectra info.: MS1 info. | Mass spectra info.: #MS2 | Molecular candidates: Tot. #Cand. | Molecular candidates: Median #Cand. | Chromatography: Column | Chromatography: Eluent |
|---|---|---|---|---|---|---|---|
| CASMI 2016 | Negative | Precursor m/z | 81 | 74 589 | 420 | Phenomenex Kinetex EVO C18 | H2O → MeOH (both 0.1% formic acid) |
| CASMI 2016 | Positive | Precursor m/z | 127 | 183 633 | 919 | Phenomenex Kinetex EVO C18 | H2O → MeOH (both 0.1% formic acid) |
| EA (Massbank) | Negative | Precursor m/z | 154 | 75 107 | 119.5 | Waters XBridge C18 | H2O → MeOH (both 0.1% formic acid) |
| EA (Massbank) | Positive | Precursor m/z | 319 | 215 893 | 246 | Waters XBridge C18 | H2O → MeOH (both 0.1% formic acid) |
Note: Molecular candidates were extracted from ChemSpider. CASMI: ±5 ppm window around the monoisotopic exact mass of the correct candidate. EA: molecular formula (MF) of the correct candidate.
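As a small illustration of the ±5 ppm mass window mentioned in the note, the following sketch checks whether a candidate's mass falls inside the window. The function name and the example masses are hypothetical and are not taken from the datasets.

```python
def within_ppm(candidate_mass, reference_mass, tol_ppm=5.0):
    """True if candidate_mass deviates from reference_mass by at most tol_ppm parts per million."""
    deviation_ppm = abs(candidate_mass - reference_mass) / reference_mass * 1e6
    return deviation_ppm <= tol_ppm


# Hypothetical masses: at 300 Da, a 5 ppm window is 0.0015 Da wide on each side.
inside = within_ppm(300.001, 300.0)   # about 3.3 ppm away
outside = within_ppm(300.002, 300.0)  # about 6.7 ppm away
```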
Fig. 2. Top-k accuracies, averaged across all datasets and ionizations, plotted against the number of random spanning trees (L) used for the approximation. The baseline using only MS2 information is plotted in black. The sigmoid function is used in the score integration. The differences between the average performance of the sum- and max-marginals across the L values are significant (P < 0.001 for Top-1 and Top-20, two-sided Wilcoxon signed-rank test).
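For readers unfamiliar with the metric, top-k accuracy can be sketched as follows. The tie handling (ties counted against the correct candidate) and the example scores are assumptions for illustration; the evaluation behind the figures may treat ties differently.

```python
def rank_of_correct(scores, correct_index):
    """1-based rank of the correct candidate; tied scores count against it."""
    target = scores[correct_index]
    return sum(s >= target for s in scores)


def top_k_accuracy(ranks, k):
    """Percentage of spectra whose correct candidate is ranked within the top k."""
    return 100.0 * sum(r <= k for r in ranks) / len(ranks)


# Hypothetical candidate scores for four spectra, with the index of the correct candidate.
examples = [
    ([0.1, 0.7, 0.2], 1),
    ([0.5, 0.3, 0.9, 0.4], 1),
    ([0.6, 0.8], 0),
    ([0.2, 0.9, 0.5, 0.1], 2),
]
ranks = [rank_of_correct(scores, idx) for scores, idx in examples]
top1 = top_k_accuracy(ranks, 1)
top2 = top_k_accuracy(ranks, 2)
top5 = top_k_accuracy(ranks, 5)
```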
Identification accuracies (top-k) for the different datasets and ionization modes
| Dataset | Method | Negative: Top-1 | Negative: Top-5 | Negative: Top-10 | Negative: Top-20 | Positive: Top-1 | Positive: Top-5 | Positive: Top-10 | Positive: Top-20 |
|---|---|---|---|---|---|---|---|---|---|
| CASMI 2016 | MS2 + RT (our) | | 47.2 (***) | 57.0 (**) | 70.1 (***) | | | | |
| CASMI 2016 | MS2 + RT (Chain-graph) | 13.2 (***) | | | 69.4 (***) | 11.9 | 36.5 | 50.2 (***) | 60.7 (***) |
| CASMI 2016 | MS2 + RT (MetFrag 2.2) | 14.0 (***) | 42.0 | 55.5 | | 13.7 (***) | 36.2 | 46.2 | 57.5 |
| CASMI 2016 | Only MS2 | 11.1 | 44.2 | 55.3 | 68.0 | 11.8 | 37.3 | 47.0 | 58.3 |
| EA Massbank | MS2 + RT (our) | 28.7 (***) | | | 83.6 (***) | | | | |
| EA Massbank | MS2 + RT (Chain-graph) | 27.2 (***) | 59.5 (***) | 72.4 (***) | 81.8 (***) | 23.9 (***) | 59.2 | 70.1 | 79.1 (***) |
| EA Massbank | MS2 + RT (MetFrag 2.2) | | 59.2 (***) | 73.6 (***) | | 24.0 (***) | 59.0 | 69.5 | 77.1 |
| EA Massbank | Only MS2 | 22.8 | 57.6 | 69.5 | 78.5 | 21.2 | 59.0 | 69.7 | 77.6 |
Note: The table compares our score-integration framework (MS2 + RT (our)) against the baseline (Only MS2), MetFrag 2.2 with predicted RT, and the Chain-graph model. The best performance for each dataset and ionization is indicated in bold font. Stars (*) mark a significant improvement over the baseline, calculated using a one-sided Wilcoxon signed-rank test on the sample top-k accuracies (P < 0.05 (*), P < 0.01 (**) and P < 0.001 (***)).
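The one-sided Wilcoxon signed-rank test named in the note can be sketched in plain Python for small paired samples. This is an illustrative exact version, not the procedure used to produce the table: it drops zero differences, assigns average ranks to ties, and enumerates all sign patterns, so it only scales to a handful of pairs. The accuracy values are hypothetical.

```python
import itertools


def wilcoxon_one_sided(x, y):
    """Exact one-sided Wilcoxon signed-rank test of H1: x tends to exceed y.

    Returns (W+, p-value). Zero differences are dropped and tied absolute
    differences receive average ranks. Enumerates 2^n sign patterns.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 every sign pattern is equally likely; count those at least as extreme.
    hits = sum(
        1
        for signs in itertools.product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, hits / 2 ** n


# Hypothetical top-1 accuracies (%) on six random samples: method vs. baseline.
method = [12.0, 13.5, 11.0, 14.2, 12.8, 13.1]
baseline = [11.0, 12.0, 11.5, 12.1, 12.0, 12.2]
w_plus, p_value = wilcoxon_one_sided(method, baseline)
```

With these toy numbers only one of the six differences is negative and it has the smallest magnitude, which gives W+ = 20 and an exact one-sided P-value of 2/64.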
Pairwise test for significant improvement of the MS2 + RT score-integration methods: Our, MetFrag 2.2 and Chain-graph
| Method (MS2 + RT) | Top-1: vs. Chain-graph | Top-1: vs. MetFrag 2.2 | Top-20: vs. Chain-graph | Top-20: vs. MetFrag 2.2 |
|---|---|---|---|---|
| Our | | | | |
| Chain-graph | - | n.s. | - | |
| MetFrag 2.2 | | - | n.s. | - |
Note: We show the P-values for testing the improvement of the row method over the column method using a one-sided Wilcoxon signed-rank test. The test is performed over all top-k accuracy samples (all datasets and ionization modes). MetFrag 2.2 and Chain-graph could not significantly outperform our framework. Non-significant P-values are marked with ‘n.s.’.
Top-k accuracies averaged across all datasets for two MS2-scorers
| MS2-scorer | Method | Top-1 | Top-5 | Top-10 | Top-20 |
|---|---|---|---|---|---|
| MetFrag | MS2 + RT | 21.3 | 52.9 | 64.0 | 74.3 |
| MetFrag | Only MS2 | 16.7 | 49.5 | 60.4 | 70.6 |
| IOKR | MS2 + RT | 26.7 | 52.1 | 62.5 | 70.3 |
| IOKR | Only MS2 | 25.1 | 49.5 | 60.3 | 67.6 |
Top-k accuracies averaged on the CASMI data (pos. & neg.) using either MetFrag or IOKR as MS2-scorer for two different candidate sets: ‘All’ molecules queried using a mass window; only those with ‘correct molecular formula’
| Candidate set | Method | MetFrag: Top-1 | MetFrag: Top-5 | MetFrag: Top-10 | MetFrag: Top-20 | IOKR: Top-1 | IOKR: Top-5 | IOKR: Top-10 | IOKR: Top-20 |
|---|---|---|---|---|---|---|---|---|---|
| All | MS2 + RT | 14.6 (***) | 44.0 (***) | 54.6 (***) | 66.5 (***) | 26.0 (***) | 48.0 (***) | 60.0 (***) | 69.1 (***) |
| All | Only MS2 | 11.4 | 40.7 | 51.2 | 63.2 | 24.4 | 46.0 | 58.4 | 65.5 |
| Correct MF | MS2 + RT | 17.7 (***) | 48.4 (***) | 59.8 (***) | 71.0 (***) | 30.6 | 52.3 | 66.2 (***) | 75.1 (*) |
| Correct MF | Only MS2 | 13.1 | 46.0 | 56.9 | 68.7 | 30.6 | 53.9 | 65.3 | 74.8 |
Note: Stars (*) mark a significant improvement over the Only MS2 baseline (see Table 2 for details on the significance test).
Fig. 3. Top-k accuracies and improvements averaged across all datasets. Plots for different percentages of available MS2 spectra: 0% (only MS1 and RT) to 100% of MS2 spectra (previous experiments).