Emma L Schymanski, Christoph Ruttkies, Martin Krauss, Céline Brouard, Tobias Kind, Kai Dührkop, Felicity Allen, Arpana Vaniya, Dries Verdegem, Sebastian Böcker, Juho Rousu, Huibin Shen, Hiroshi Tsugawa, Tanvir Sajed, Oliver Fiehn, Bart Ghesquière, Steffen Neumann.
Abstract
BACKGROUND: The fourth round of the Critical Assessment of Small Molecule Identification (CASMI) Contest (www.casmi-contest.org) was held in 2016, with two new categories for automated methods. This article covers the 208 challenges in Categories 2 and 3, without and with metadata, from organization, participation, results and post-contest evaluation of CASMI 2016 through to perspectives for future contests and small molecule annotation/identification.
Keywords: Compound identification; High resolution mass spectrometry; In silico fragmentation; Metabolomics; Structure elucidation
Year: 2017 PMID: 29086042 PMCID: PMC5368104 DOI: 10.1186/s13321-017-0207-1
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Overlapping challenges between Category 1 and Categories 2 and 3
| Name | Category 1 | Categories 2 and 3 | Mode |
|---|---|---|---|
| Creatinine | Challenge-010 | Challenge-084 | Positive |
| Anthrone | Challenge-011 | Challenge-162 | Positive |
| Flavone | Challenge-012 | Challenge-166 | Positive |
| Medroxyprogesterone | Challenge-013 | Challenge-184 | Positive |
| Abietic acid | Challenge-014 | Challenge-207 | Positive |
| Estrone-3-( | Challenge-015 | Challenge-034 | Negative |
| Alizarin | Challenge-016 | Challenge-045 | Negative |
| Thyroxine | Challenge-017 | Challenge-048 | Negative |
| Purpurin | Challenge-018 | Challenge-054 | Negative |
| Monensin | Challenge-019 | Challenge-079 | Negative |
Results summary for Categories 2 and 3: medal tally and other statistics
The first, second and third place by “Gold medals” (used to declare CASMI winners) are highlighted in red, orange and yellow, respectively. The best value per statistic is marked in bold
Results summary for additional Category 2 entries
| | Allen | | Brouard | | | Dührkop | | Ruttkies | | Vaniya | Verdegem | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | CFM_retrain | | CSI:IOKR_AR* | CSI:IOKR_U | | CSI:FID_leaveout* | MetFrag* | MetFrag+CFM* | | | MAGMa* |
| Top 1 Neg. | 12 | 12 | 9 | 9 | 8 | 0 | 0 | 9 | | 14 | 8 | 7 |
| Top 1 Pos. | 27 | 28 | 53 | 69 | 50 | | 36 | 15 | 21 | 32 | 16 | 14 |
| Top 1 | 39 | 40 | 62 | | 58 | 70 | 36 | 24 | 41 | 46 | 24 | 21 |
| Top 3 | 77 | 73 | 93 | | 95 | 90 | 70 | 60 | 84 | 79 | 59 | 51 |
| Top 10 | 123 | 116 | 118 | | 118 | 100 | 88 | 108 | 127 | 101 | 105 | 106 |
| Mean rank | 47.98 | 44.53 | 127.3 | 95.09 | 123.3 | 25.17 | 52.02 | 51.92 | 33.97 | | 70.79 | 70.24 |
| Med. rank | 6 | 7 | 5.25 | 4 | 5 | | 3 | 8.75 | 6 | 3 | 9.8 | 9.8 |
| Mean RRP | 0.906 | 0.917 | 0.874 | 0.887 | 0.857 | | 0.931 | 0.905 | 0.915 | 0.804 | 0.88 | 0.88 |
| Med. RRP | 0.987 | 0.985 | 0.988 | 0.993 | 0.98 | | 0.995 | 0.98 | 0.991 | 0.922 | 0.972 | 0.969 |
| Gold | 53 | 52 | 73 | | 70 | 74 | 41 | 32 | 51 | 61 | 35 | 31 |
| Formula 1 | 1957 | 1900 | 2276 | | 2237 | 2156 | 1596 | 1593 | 2058 | 1867 | 1524 | 1463 |
| Medal Sc. | 275 | 269 | 375 | | 371 | 396 | 252 | 198 | 292 | 305 | 195 | 175 |
| Q_10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1.4 |
| Q_25 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 3 | 2 | 1 | 3 | 3.5 |
| Q_50 | 6 | 7 | 5.25 | 4 | 5 | 1 | 3 | 8.75 | 6 | 3 | 9.8 | 9.8 |
| Q_75 | 36.25 | 27.63 | 55.5 | 36 | 78.75 | 6 | 17 | 37.88 | 25 | 17 | 66.1 | 64.5 |
| Q_90 | 121.8 | 104.6 | 192.9 | 134.9 | 288.9 | 37.5 | 72.4 | 120.9 | 87.65 | 68.75 | 187.1 | 148.5 |
The column headers of entries used in Table 2 are given in italics. The best value per statistic is marked in bold. * indicates internal and post-competition submissions. Med. = median. Q_X indicates the Xth quantile
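As an illustration of how the rank-based rows in such a summary can be derived, the following is a minimal Python sketch, not the contest's evaluation code. It assumes one rank per challenge for a given entry and uses linear interpolation between order statistics for Q_X. The interpolation rule, the function names and the example ranks are assumptions, since the note above only states that Q_X is the Xth quantile.

```python
from statistics import mean


def quantile(values, q):
    """Return the q-quantile (0 <= q <= 1) using linear interpolation.

    The interpolation convention is an assumption of this sketch.
    """
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def summarize_ranks(ranks):
    """Summarize the per-challenge ranks of one entry."""
    summary = {
        "Top 1": sum(r <= 1 for r in ranks),
        "Top 3": sum(r <= 3 for r in ranks),
        "Top 10": sum(r <= 10 for r in ranks),
        "Mean rank": mean(ranks),
        "Med. rank": quantile(ranks, 0.50),
    }
    for x in (10, 25, 50, 75, 90):
        summary[f"Q_{x}"] = quantile(ranks, x / 100)
    return summary


# Hypothetical ranks for ten challenges (not contest data).
example = summarize_ranks([1, 1, 2, 3, 5, 8, 13, 21, 34, 55])
```

By construction the median rank and Q_50 coincide in this sketch, which matches the Med. rank and Q_50 rows of the table wherever both values are present.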
Results summary for additional Category 3 entries
| | Allen | | Kind | Ruttkies | | |
|---|---|---|---|---|---|---|
| | CFM orig +DB | | | MetFrag+RT+Refs* | MetFrag+CFM+RT+Refs* | MetFrag+CFM+RT+Refs+MoNA* |
| Top 1 | 117 | 120 | 146 | 162 | | 155 |
| Top 3 | 159 | 160 | 162 | | 180 | 182 |
| Top 10 | 182 | 182 | 174 | 191 | | 194 |
| Mean rank | 14 | 13.62 | 6.4 | 7.04 | 5.39 | |
| Median rank | | | | | | |
| Mean RRP | 0.969 | 0.971 | 0.904 | 0.987 | 0.989 | |
| Median RRP | | | | | | |
| Gold | 124 | 128 | 148 | 168 | | 167 |
| Formula 1 | 3798 | 3861 | 4011 | 4469 | | 4437 |
| Medal score | 687 | 700 | 766 | 855 | | 840 |
| Q_10 | 1 | 1 | 1 | 1 | 1 | 1 |
| Q_25 | 1 | 1 | 1 | 1 | 1 | 1 |
| Q_50 | 1 | 1 | 1 | 1 | 1 | 1 |
| Q_75 | 3 | 3 | 2 | 1 | 1 | 2 |
| Q_90 | 13.7 | 14.0 | 15.0 | 5.0 | 5.0 | 4.3 |
The column headers of entries used in Table 2 are given in italics. The best value per statistic is marked in bold. * indicates internal and post-competition submissions. Q_X indicates the Xth quantile
Contribution of Metadata to the results
| | RT | MoNA | Lowest CSID | Refs | Combined MS/MS | Combined MS/MS+RT+Refs | Combined MS/MS+RT+Refs+MoNA |
|---|---|---|---|---|---|---|---|
| Top 1 | 1 | 70 | 113 | 143 | 82 | | |
| Top 3 | 5 | 87 | 158 | 177 | 126 | 183 | |
| Top 10 | 20 | 104 | 177 | | 166 | 194 | 195 |
| Mean rank | 504.5 | 238.3 | 37.7 | | 13.4 | 3.9 | 3.7 |
| Median rank | 135 | 10.25 | | | 2 | | |
| Mean RRP | 0.576 | 0.780 | 0.959 | | 0.955 | 0.990 | 0.991 |
| Median RRP | 0.630 | 0.977 | | | 0.998 | | |
The first four columns contain submissions formed using just one type of metadata. The “Combined MS/MS” column was formed by equally weighting all Category 2 entries from Table 2. The last two columns combine this with retention time and references, without and with MoNA, respectively
The best value per statistic is marked in bold
Comparison of Categories 1, 2 and 3 results for the overlapping challenges in Category 1
The median ranks of Categories 1 and 3 (highlighted) are remarkably similar
Fig. 1 Heat map of CASMI Challenges 1–81 (negative mode). Both Category 2 (green labels on the right) and 3 (blue labels) participants are included. Missing values (correct solution missed, or no submission for a challenge) were replaced with the number of candidates for that challenge. Ranks are log-scaled from good (blue) to poor (red). Team Dührkop was omitted as they did not submit for any challenge, while CSI:IOKR_AR and CFM_retrain were omitted as these were identical with their original submissions. An interactive version of this plot with legible challenge numbers is available from [47]
Fig. 2 Heat map of CASMI Challenges 82–208 (positive mode). Both Category 2 (green labels on the right) and 3 (blue labels) participants are included. Missing values (correct solution missed, or no submission for a challenge) were replaced with the number of candidates for that challenge. Ranks are log-scaled from good (blue) to poor (red). An interactive version with legible challenge numbers is available from [48]
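The handling of missing values and the log scaling described in the heat map captions can be sketched as follows. This is a minimal Python illustration rather than the plotting code behind the figures: the base-10 logarithm, the use of `None` to mark a missing value, the function names and the example numbers are assumptions.

```python
import math


def heatmap_value(rank, n_candidates):
    """Return the log-scaled rank for one heat-map cell.

    rank is None when the correct solution was missed or nothing was
    submitted; the number of candidates is then used in its place.
    The base-10 logarithm is an assumption of this sketch.
    """
    effective_rank = n_candidates if rank is None else rank
    return math.log10(effective_rank)


def heatmap_row(ranks, candidate_counts):
    """Convert one participant's ranks into heat-map values, challenge by challenge."""
    return [heatmap_value(r, n) for r, n in zip(ranks, candidate_counts)]


# Hypothetical ranks and candidate counts for three challenges (not contest data).
row = heatmap_row([1, None, 10], [250, 1000, 40])
```

A rank of 1 maps to 0, the best end of the colour scale, while a missed challenge with 1000 candidates maps to 3, the same value that a rank of 1000 would give.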
Global leaveout analysis for additional Category 2 entries, including only challenges where the correct answer was not in any training set
| | Allen | | Brouard | | | Dührkop | | Ruttkies | | Vaniya | Verdegem | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | CFM_re-train | | CSI_IOKR_AR* | CSI_IOKR_U | | CSI:FID_leaveout* | MetFrag* | MetFrag+CFM* | | | MAGMa* |
| Top 1 Neg. | 6 | 6 | 6 | 6 | 4 | 0 | 0 | 4 | 10 | 7 | 4 | 3 |
| Top 1 Pos. | 4 | 9 | 9 | 10 | 7 | 9 | 13 | 1 | 3 | 3 | 2 | 2 |
| Top 1 | 10 | 15 | 15 | | 11 | 9 | 13 | 5 | 13 | 10 | 6 | 5 |
| Top 3 | 23 | 24 | 29 | 26 | 27 | 17 | 23 | 16 | 27 | 25 | 16 | 14 |
| Top 10 | 46 | 40 | 45 | 46 | 40 | 25 | 32 | 39 | 47 | 38 | 35 | 35 |
| Mean rank | 52.57 | 64.05 | 106.5 | 97.84 | 99.92 | 52.81 | 41.48 | 68.38 | 37.16 | 28.7 | 76.75 | 100.4 |
| Med. rank | 10 | 12.5 | 8 | 10 | 12 | 7 | 3 | 14.5 | 8 | 7.5 | 23.5 | 20.5 |
| Mean RRP | 0.863 | 0.872 | 0.849 | 0.856 | 0.837 | 0.891 | 0.91 | 0.863 | 0.878 | 0.738 | 0.832 | 0.811 |
| Med. RRP | 0.966 | 0.961 | 0.963 | 0.967 | 0.956 | 0.981 | 0.993 | 0.942 | 0.972 | 0.806 | 0.924 | 0.902 |
| Gold | 18 | 21 | 23 | | 19 | 11 | 17 | 7 | 18 | 18 | 10 | 9 |
| F1 score | 628 | 654 | | 691 | 632 | 403 | 557 | 484 | 707 | 594 | 462 | 434 |
| Medal Sc. | 79 | 94 | | 98 | 91 | 59 | 87 | 50 | 95 | 85 | 46 | 46 |
n = 43 (negative) and n = 44 (positive)
The best value for selected statistics is marked in bold
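The selection rule behind the leaveout analysis, keeping only challenges whose correct answer appears in no training set, can be sketched in Python as below. This is an illustrative assumption of how such a filter might look, not the authors' code: the structure identifiers, the challenge identifiers in the example and the function name are hypothetical, and how two structures are judged identical is left open.

```python
def leaveout_challenges(correct_answers, training_sets):
    """Return the challenges whose correct answer is in no training set.

    correct_answers: dict mapping challenge id -> structure identifier.
    training_sets: iterable of sets of structure identifiers, one per method.
    """
    seen_in_training = set().union(*training_sets)
    return sorted(
        challenge
        for challenge, structure in correct_answers.items()
        if structure not in seen_in_training
    )


# Hypothetical identifiers, for illustration only.
answers = {
    "challenge-a": "structure-1",
    "challenge-b": "structure-2",
    "challenge-c": "structure-3",
}
training = [{"structure-1", "structure-9"}, {"structure-3"}]
kept = leaveout_challenges(answers, training)
```

Taking the union over all training sets means a challenge is dropped as soon as any one method has seen its correct answer, which is the "global" reading of the caption above.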
Fig. 3 The distribution of references for CASMI 2016 candidates
Fig. 4 The influence of metadata on CASMI 2016. First seven groups: light green, MS/MS information only (Category 2); dark green, with metadata (Category 3 participants). Note that these are plotted according to the number of Top 1 ranks, not wins. Next four groups: dark green, metadata only. Last group: light green is the equally weighted combination of the six individual Category 2 entries and dark green is this plus metadata, as shown in Table 5