Huibin Shen, Kai Dührkop, Sebastian Böcker, Juho Rousu.
Abstract
MOTIVATION: Metabolite identification from tandem mass spectrometric data is a key task in metabolomics. Various computational methods have been proposed for the identification of metabolites from tandem mass spectra. Fragmentation tree methods explore the space of possible ways in which the metabolite can fragment, and base the metabolite identification on scoring of these fragmentation trees. Machine learning methods have been used to map mass spectra to molecular fingerprints; predicted fingerprints, in turn, can be used to score candidate molecular structures.
Year: 2014 PMID: 24931979 PMCID: PMC4058957 DOI: 10.1093/bioinformatics/btu275
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The metabolite identification framework through MKL. First, we construct the fragmentation tree from the MS/MS spectrum. Second, we compute kernels for both MS/MS data and fragmentation trees. Third, MKL is used to combine kernels and predict molecular fingerprints. Finally, fingerprints are used for molecular structure database retrieval.
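The kernel-combination step in the framework can be illustrated with the simplest possible baseline: normalize each kernel matrix and take a weighted sum, with uniform weights by default. This is only a sketch under stated assumptions. The function names, the linear kernel, and the toy feature values are hypothetical, and the MKL algorithms evaluated in the paper learn the weights rather than fixing them.

```python
from math import sqrt

def linear_kernel(X):
    """Gram matrix of inner products between the rows of X."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

def normalize_kernel(K):
    """Cosine-normalize K so that every diagonal entry equals 1."""
    n = len(K)
    return [[K[i][j] / sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]

def combine_kernels(kernels, weights=None):
    """Weighted sum of normalized kernels; uniform weights if none are given."""
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)
    normalized = [normalize_kernel(K) for K in kernels]
    n = len(normalized[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, normalized))
             for j in range(n)] for i in range(n)]

# Hypothetical toy features for three spectra, seen through two "views".
X_spectrum = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]  # stand-in for MS/MS features
X_tree = [[1.0, 1.0], [1.0, 0.0], [0.0, 2.0]]                     # stand-in for tree features

K_combined = combine_kernels([linear_kernel(X_spectrum), linear_kernel(X_tree)])
```

The combined matrix can then be passed to any kernel method (for example, one classifier per fingerprint bit) in place of a single kernel.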
Micro-average performance of individual kernels
| Acc (%) | F1 (%) | Acc (%) | F1 (%) | |
|---|---|---|---|---|
| 79.5 ± 0.5 | 69.9 ± 0.9 | 78.9 ± 1.0 | 69.0 ± 2.2 | |
| 79.4 ± 0.3 | 69.6 ± 0.4 | 78.5 ± 1.2 | 68.4 ± 2.7 | |
| 77.8 ± 0.5 | 66.8 ± 0.7 | 77.4 ± 1.0 | 66.7 ± 2.0 | |
| 81.6 ± 0.8 | 73.2 ± 1.1 | 78.6 ± 1.0 | 68.4 ± 1.2 | |
| 78.4 ± 0.6 | 68.5 ± 0.8 | 76.7 ± 0.9 | 65.4 ± 1.6 | |
| 80.3 ± 0.7 | 71.1 ± 0.8 | 79.8 ± 1.0 | 70.5 ± 0.9 | |
| 80.6 ± 0.5 | 71.6 ± 0.7 | 78.7 ± 1.4 | 68.9 ± 2.4 | |
| 78.7 ± 0.7 | 68.4 ± 1.2 | 76.4 ± 1.0 | 65.5 ± 1.1 | |
| 72.9 ± 0.3 | 58.8 ± 0.5 | 72.2 ± 0.6 | 57.9 ± 0.5 | |
| 74.9 ± 0.4 | 61.9 ± 0.8 | 77.8 ± 0.8 | 67.2 ± 2.0 | |
| 76.7 ± 0.6 | 64.0 ± 0.7 | 72.9 ± 1.1 | 58.6 ± 1.2 | |
PPK is the method from Heinonen et al., which we compare against.
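For readers unfamiliar with micro-averaging, the sketch below shows one common way to compute the two reported metrics for fingerprint prediction: every (molecule, fingerprint bit) pair is pooled into a single count of true positives, false positives and false negatives before the scores are computed. That this matches the paper's exact evaluation protocol is an assumption, and the function name and toy fingerprints are hypothetical.

```python
def micro_accuracy_f1(y_true, y_pred):
    """Micro-averaged accuracy and F1 over all (sample, bit) pairs."""
    tp = fp = fn = correct = total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            correct += (t == p)
            tp += (t == 1 and p == 1)
            fp += (t == 0 and p == 1)
            fn += (t == 1 and p == 0)
    accuracy = correct / total
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Hypothetical true and predicted fingerprints for two molecules (4 bits each).
y_true = [[1, 0, 1, 0], [0, 1, 1, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 1]]
acc, f1 = micro_accuracy_f1(y_true, y_pred)
```

Because fingerprint bits are often sparse, accuracy alone can look high for a predictor that outputs mostly zeros, which is why F1 is reported alongside it.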
Micro-average performance of MKL algorithms
| Acc (%) | F1 (%) | Acc (%) | F1 (%) | |
|---|---|---|---|---|
| 85.0 ± 0.6 | 78.3 ± 0.7 | 82.2 ± 0.6 | 73.9 ± 1.5 | |
| | 78.6 ± 0.7 | 82.4 ± 0.7 | 74.4 ± 1.4 | |
| 85.0 ± 0.5 | | | | |
| 84.9 ± 0.5 | 77.8 ± 0.5 | 82.1 ± 0.6 | 74.0 ± 0.7 | |
| 84.7 ± 0.5 | 77.5 ± 0.5 | 82.2 ± 0.5 | 74.0 ± 0.9 | |
| | 78.5 ± 0.7 | 82.4 ± 0.6 | 74.4 ± 1.3 | |
| | 78.5 ± 0.8 | 82.3 ± 0.6 | 74.2 ± 1.0 | |
| 85.1 ± 0.6 | 78.5 ± 0.7 | 82.3 ± 0.6 | 74.1 ± 1.3 | |
Sign test for the performance of MKL algorithms on the METLIN and MassBank datasets
| −− | − | ++ | ++ | −− | −− | ||||
| ++ | ++ | ++ | + | ++ | |||||
| + | ++ | ++ | + | ||||||
| −− | −− | −− | ++ | −− | −− | −− | |||
| −− | −− | −− | −− | −− | −− | −− | |||
| − | ++ | ++ | |||||||
| ++ | ++ | ++ | |||||||
| ++ | −− | − | ++ | ++ | |||||
| − | −− | + | + | + | |||||
| + | −− | + | ++ | ++ | ++ | ||||
| ++ | ++ | ++ | ++ | ++ | ++ | ++ | |||
| − | − | −− | − | −− | − | ||||
| − | −− | −− | − | − | |||||
| −− | + | + | |||||||
| −− | −− | ++ | + | − | |||||
| − | −− | −− | + | + | |||||
| − | + | −− | −− | −− | |||||
| + | ++ | ++ | |||||||
| + | ++ | ||||||||
| −− | − | + | −− | −− | |||||
| – | −− | −− | − | −− | −− | −− | |||
| ++ | ++ | ++ | |||||||
| ++ | ++ | ++ | |||||||
| ++ | ++ | ||||||||
| − | + | ||||||||
| + | ++ | ++ | |||||||
| ++ | ++ | ||||||||
| −− | −− | −− | −− | – | |||||
| −− | −− | ||||||||
| ++ | |||||||||
| ++ | |||||||||
| – | + |
‘+’ indicates that the method in the row is better than the method in the column (‘−’ otherwise) with a significance P-value between 0.01 and 0.05; a blank indicates no significance. Similarly, ‘++’ and ‘−−’ indicate significance with a P-value < 0.01. The upper table is for accuracy and the lower table is for F1.
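The symbols in the table can be derived from a paired sign test. The sketch below implements a standard exact two-sided sign test and maps the resulting P-value to the notation described above. How the paper paired its observations (for example, per fingerprint bit or per fold) is not stated here, so the inputs are hypothetical paired scores, and ASCII `-` stands in for the minus sign.

```python
from math import comb

def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test on paired scores; ties are discarded."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for the two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def significance_symbol(scores_a, scores_b):
    """Return '++', '+', '', '-' or '--' for method A (row) versus B (column)."""
    p = sign_test_p_value(scores_a, scores_b)
    if p >= 0.05:
        return ""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    sign = "+" if wins > losses else "-"
    return sign * 2 if p < 0.01 else sign

# Hypothetical example: method A beats method B on all 7 paired comparisons.
symbol = significance_symbol([0.9] * 7, [0.8] * 7)
```

With seven wins out of seven, the two-sided P-value is 2/128, which falls between 0.01 and 0.05 and therefore yields a single `+`.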
Fig. 2. (a and b) show the performance for identification when searching KEGG using a 300-ppm mass window with predicted molecular fingerprints, with fingerprints trained with the METLIN and MassBank datasets, respectively. NUM denotes the number of candidate molecules returned per query. (c and d) show the proportion of data that were correctly identified in the top 1 rank against a series of mass windows.
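The retrieval step behind these plots can be sketched as a mass filter followed by fingerprint-based ranking. The code below is a minimal illustration under several assumptions: the window is treated as a symmetric tolerance of ±ppm around the query mass, candidates are scored by the number of fingerprint bits that agree with the prediction, and all names, masses and fingerprints are hypothetical. The scoring function actually used in the paper may differ.

```python
def within_ppm(query_mass, candidate_mass, ppm):
    """True if the candidate mass lies within +/- ppm of the query mass."""
    return abs(candidate_mass - query_mass) <= query_mass * ppm * 1e-6

def rank_candidates(query_mass, predicted_fp, candidates, ppm=300.0):
    """Filter candidates by mass window, then rank by matching fingerprint bits."""
    scored = []
    for name, mass, fingerprint in candidates:
        if within_ppm(query_mass, mass, ppm):
            score = sum(p == c for p, c in zip(predicted_fp, fingerprint))
            scored.append((name, score))
    # Highest number of matching bits first.
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical candidate database entries: (name, mass, fingerprint).
candidates = [
    ("cand_A", 180.0634, [1, 0, 1, 1]),
    ("cand_B", 180.0500, [1, 1, 1, 0]),
    ("cand_C", 181.0700, [1, 0, 1, 1]),  # outside the 300-ppm window
]
ranking = rank_candidates(180.0630, [1, 0, 1, 1], candidates)
```

A narrower window returns fewer candidates per query, which is the trade-off that panels (c) and (d) explore.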