Ivana Blaženović1,2,3, Tobias Kind3, Hrvoje Torbašinović4, Slobodan Obrenović4, Sajjan S Mehta3, Hiroshi Tsugawa5, Tobias Wermuth3, Nicolas Schauer2, Martina Jahn1, Rebekka Biedendieck1, Dieter Jahn1, Oliver Fiehn6,7.
Abstract
In mass spectrometry-based untargeted metabolomics, rarely more than 30% of the compounds are identified. Without the true identity of these molecules, it is impossible to draw conclusions about the biological mechanisms, pathway relationships and provenance of compounds. At present, the only way to address this discrepancy is to use in silico fragmentation software to identify unknown compounds by comparing theoretical MS/MS fragmentations of candidate structures with experimental tandem mass spectra (MS/MS) and ranking the candidates accordingly. We compared the performance of four publicly available in silico fragmentation algorithms (MetFragCL, CFM-ID, MAGMa+ and MS-FINDER) that participated in the 2016 CASMI challenge. We found that optimizing the use of metadata, weighting factors and the manner of combining different tools ultimately determined the outcome of each method. We comprehensively analysed how the outputs of different tools could be combined and reached a final success rate of 93% for the training data and 87% for the challenge data, using a combination of MAGMa+, CFM-ID and compound importance information along with MS/MS matching. Matching MS/MS spectra against MS/MS libraries without using any in silico tool yielded 60% correct hits, showing that in silico methods are still important.
Keywords: Compound identification; In silico fragmentation; MS/MS; Mass spectrometry; Metabolomics; Structure elucidation
Year: 2017 PMID: 29086039 PMCID: PMC5445034 DOI: 10.1186/s13321-017-0219-x
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Structure elucidation workflow of small molecules. a In silico fragmentation can be used to rank candidate structures for unknown MS/MS spectra by matching theoretical fragments to experimental MS/MS spectra. b The voting/consensus approach combines the output of multiple in silico fragmentation tools and uses compound and MS/MS database lookups to further boost compound ranks.
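The voting/consensus idea in panel b can be sketched as a rank-aggregation step. The following is a minimal Python illustration under stated assumptions, not the authors' scoring scheme: each tool contributes a Borda-style vote derived from its candidate ranking, and fixed bonuses (the hypothetical parameters `db_bonus` and `msms_bonus`) are added for candidates found in compound databases or with an MS/MS library match. The optional per-tool `weights` stand in for the weighting factors mentioned in the abstract; their default values here are arbitrary.

```python
from collections import defaultdict


def consensus_rank(tool_rankings, in_database, msms_match,
                   db_bonus=1.0, msms_bonus=2.0, weights=None):
    """Hypothetical voting/consensus ranking of candidate structures.

    tool_rankings: dict mapping tool name -> list of candidate IDs, best first.
    in_database:   set of candidate IDs found in compound databases.
    msms_match:    set of candidate IDs with an MS/MS library hit.
    db_bonus, msms_bonus, weights: illustrative values, not from the paper.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for tool, ranking in tool_rankings.items():
        w = weights.get(tool, 1.0)
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # Borda-style vote: top candidate gets w, the last one gets w / n
            scores[candidate] += w * (n - position) / n
    for candidate in scores:
        # Metadata boosts applied on top of the in silico votes
        if candidate in in_database:
            scores[candidate] += db_bonus
        if candidate in msms_match:
            scores[candidate] += msms_bonus
    # Highest score first; ties broken by candidate ID for reproducibility
    return sorted(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    rankings = {"MetFrag": ["A", "B", "C"], "CFM-ID": ["B", "A", "C"]}
    print(consensus_rank(rankings, in_database={"B"}, msms_match=set()))
```

In this toy case the in silico votes alone leave A and B tied; the database bonus for B breaks the tie, which mirrors how metadata lookups can reorder candidates that the fragmentation tools score similarly.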
Fig. 2 Principal component analysis of the molecular descriptor space of the training and validation sets. Individual outliers show compounds found only in one of the data sets; overlapping dots show very similar compounds.
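A descriptor-space PCA like the one in Fig. 2 can be sketched with NumPy. The descriptors used are not listed here, so the matrices below are random placeholder data whose row counts borrow the spectrum counts of the two sets; autoscaling each descriptor before the SVD is an assumption, not a documented step of the original analysis.

```python
import numpy as np


def pca_scores(descriptors, n_components=2):
    """Project compounds onto the first principal components.

    descriptors: (n_compounds, n_descriptors) array-like of molecular descriptors.
    Returns the PC scores and the fraction of variance each component explains.
    """
    X = np.asarray(descriptors, dtype=float)
    # Autoscale: centre each descriptor and divide by its standard deviation
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant descriptors unscaled (they become all-zero)
    Z = (X - X.mean(axis=0)) / std
    # SVD of the centred matrix: U * S gives the PC scores of each compound
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder descriptors for a training and a validation set
    training = rng.normal(size=(312, 8))
    validation = rng.normal(loc=0.5, size=(208, 8))
    scores, explained = pca_scores(np.vstack([training, validation]))
    train_scores, valid_scores = scores[:312], scores[312:]
    print(train_scores.shape, valid_scores.shape, explained.round(3))
```

Fitting the PCA on the stacked sets and then splitting the scores keeps both sets in one shared coordinate system, which is what makes overlapping and outlying points comparable in a plot.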
Results for the training data of the CASMI 2016 contest
| # | Tools | Top hits | Top 5 | Top 10 | Top 20 |
|---|---|---|---|---|---|
| 1 | MetFrag + CFM-ID + DB + MS/MS Voting/consensus | 290 | 304 | 305 | 306 |
| 2 | CFM-ID + ID_sorted + MAGMa(+) + DB + MS/MS Voting/consensus | 289 | 304 | 306 | 308 |
| 3 | MetFrag + ID_sorted + DB + MS/MS Voting/consensus | 288 | 305 | 306 | 308 |
| 4 | MetFrag + DB + MS/MS | 288 | 305 | 305 | 307 |
| 5 | MAGMa(+) + ID_sorted + DB + MS/MS Voting/consensus | 288 | 304 | 307 | 309 |
| 6 | CFM-ID + ID_sorted + MAGMa(+) + MetFrag + DB + MS/MS Voting/consensus | 288 | 304 | 305 | 308 |
| 7 | MetFrag + CFM-ID + MAGMa(+) + DB + MS/MS Voting/consensus | 288 | 304 | 305 | 307 |
| 8 | CFM-ID + MAGMa(+) + DB + MS/MS Voting/consensus | 288 | 303 | 306 | 307 |
| 9 | MetFrag + MAGMa(+) + DB + MS/MS Voting/consensus | 288 | 303 | 305 | 307 |
| 10 | CFM-ID + ID_sorted + DB + MS/MS Voting/consensus | 287 | 304 | 306 | 308 |
| 11 | CFM-ID + DB + MS/MS | 287 | 304 | 304 | 306 |
| 12 | ID_sorted + DB + MS/MS | 286 | 306 | 306 | 308 |
| 13 | MetFrag + MS-FINDER + DB + MS/MS Voting/consensus | 286 | 302 | 305 | 307 |
| 14 | MS-FINDER + CFM-ID + DB + MS/MS Voting/consensus | 286 | 301 | 304 | 305 |
| 15 | MAGMa(+) + DB + MS/MS | 286 | 301 | 302 | 303 |
| 16 | MetFrag + MS-FINDER + CFM-ID + DB + MS/MS Voting/consensus | 285 | 303 | 305 | 307 |
| 17 | MS-FINDER + ID_sorted + DB + MS/MS Voting/consensus | 285 | 302 | 306 | 307 |
| 18 | MetFrag + MS-FINDER + CFM-ID + MAGMa(+) + DB + MS/MS Voting/consensus | 285 | 302 | 305 | 307 |
| 19 | MS-FINDER + DB + MS/MS | 285 | 300 | 302 | 303 |
| 20 | CFM-ID + ID_sorted + MAGMa(+) + MetFrag + MS-FINDER + DB + MS/MS Voting/consensus | 284 | 303 | 306 | 307 |
| 21 | MetFrag + MS-FINDER + MAGMa(+) + DB + MS/MS Voting/consensus | 284 | 302 | 306 | 306 |
| 22 | MS-FINDER + MAGMa(+) + DB + MS/MS Voting/consensus | 284 | 301 | 305 | 306 |
| 23 | MS-FINDER + CFM-ID + MAGMa(+) + DB + MS/MS Voting/consensus | 283 | 302 | 305 | 305 |
| 24 | MetFrag + CFM-ID + DB Voting/consensus | 243 | 291 | 296 | 304 |
| 25 | MetFrag + MS-FINDER + CFM-ID + DB Voting/consensus | 242 | 289 | 298 | 301 |
| 26 | MetFrag + CFM-ID + MAGMa(+) + DB Voting/consensus | 240 | 290 | 297 | 304 |
| 27 | MS-FINDER + DB | 239 | 284 | 294 | 296 |
| 28 | MetFrag + DB | 238 | 290 | 296 | 301 |
| 29 | MS-FINDER + CFM-ID + DB Voting/consensus | 238 | 287 | 297 | 298 |
| 30 | MS-FINDER + CFM-ID + MAGMa(+) + DB Voting/consensus | 237 | 288 | 298 | 300 |
| 31 | CFM-ID + MAGMa(+) + DB Voting/consensus | 236 | 289 | 298 | 303 |
| 32 | MetFrag + MS-FINDER + DB Voting/consensus | 236 | 289 | 297 | 300 |
| 33 | MetFrag + MS-FINDER + MAGMa(+) + DB Voting/consensus | 236 | 288 | 298 | 300 |
| 34 | MAGMa(+) + DB | 236 | 287 | 294 | 299 |
| 35 | CFM-ID + DB | 236 | 286 | 295 | 302 |
| 36 | MetFrag + MAGMa(+) + DB Voting/consensus | 235 | 290 | 298 | 301 |
| 37 | MS-FINDER + MAGMa(+) + DB Voting/consensus | 235 | 288 | 298 | 299 |
| 38 | ID_sorted + DB | 227 | 291 | 301 | 303 |
| 39 | Randomize + DB + MS/MS | 195 | 273 | 289 | 305 |
| 40 | Randomize + DB | 193 | 268 | 283 | 298 |
| 41 | ID_sorted | 143 | 249 | 267 | 270 |
| 42 | MetFrag + CFM-ID in silico Voting/consensus | 69 | 155 | 194 | 230 |
| 43 | MetFrag + CFM-ID + MAGMa(+) in silico Voting/consensus | 62 | 154 | 187 | 228 |
| 44 | MetFrag + MS-FINDER + CFM-ID + MAGMa(+) in silico Voting/consensus | 62 | 145 | 180 | 228 |
| 45 | MetFrag + MS-FINDER + CFM-ID in silico Voting/consensus | 58 | 145 | 179 | 221 |
| 46 | MS-FINDER + CFM-ID + MAGMa(+) in silico Voting/consensus | 58 | 133 | 170 | 213 |
| 47 | CFM-ID + MAGMa(+) in silico Voting/consensus | 55 | 134 | 179 | 221 |
| 48 | MetFrag in silico only | 52 | 134 | 171 | 210 |
| 49 | MetFrag + MAGMa(+) in silico Voting/consensus | 52 | 133 | 171 | 210 |
| 50 | MAGMa(+) in silico only | 50 | 121 | 151 | 189 |
| 51 | MS-FINDER + CFM-ID in silico Voting/consensus | 50 | 111 | 141 | 188 |
| 52 | MetFrag + MS-FINDER + MAGMa(+) in silico Voting/consensus | 49 | 128 | 153 | 210 |
| 53 | CFM-ID in silico only | 48 | 124 | 170 | 209 |
| 54 | MS-FINDER + MAGMa(+) in silico Voting/consensus | 44 | 105 | 135 | 183 |
| 55 | MetFrag + MS-FINDER in silico Voting/consensus | 43 | 120 | 143 | 178 |
| 56 | MS-FINDER in silico only | 32 | 86 | 117 | 145 |
| 57 | Randomize | 4 | 13 | 27 | 46 |
‘MetFrag’ (MetFragCL), ‘CFM-ID’, ‘MAGMa(+)’ (MAGMa+) and ‘MS-FINDER’ designate results obtained by the in silico fragmentation software tools. ‘DB’ designates priority ranking by presence in chemical and biochemical databases. ‘MS/MS’ designates presence in MS/MS libraries with a dot-product similarity score above 400. The 312 MS/MS spectra of the CASMI 2016 training data were used.
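Each Top k column counts the spectra for which the correct structure appears at rank k or better. A minimal sketch of that tally, and of converting a count into the success rates quoted in the abstract (290 of 312 training spectra here, and 181 of the 208 spectra in the challenge-data table), follows; the `ranks` input format is a hypothetical choice.

```python
def top_k_counts(ranks, cutoffs=(1, 5, 10, 20)):
    """Count spectra whose correct structure is ranked at or above each cutoff.

    ranks: rank of the correct structure for each spectrum (1 = best),
           or None if the correct structure was not among the candidates.
    """
    return {k: sum(1 for r in ranks if r is not None and r <= k) for k in cutoffs}


def success_rate(hits, total):
    """Percentage of spectra with a correct hit."""
    return 100.0 * hits / total


if __name__ == "__main__":
    print(top_k_counts([1, 3, 7, 15, None, 25]))  # {1: 1, 5: 2, 10: 3, 20: 4}
    print(round(success_rate(290, 312)))  # best training combination -> 93
    print(round(success_rate(181, 208)))  # best challenge combination -> 87
```

Because the counts are cumulative, each row of the table is non-decreasing from Top hits to Top 20.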
Sensitivity, ω, calculated for each tool (MetFragCL, CFM-ID, MAGMa+ and MS-FINDER) based on the correctly assigned structures in the top rank, top 5, top 10 and top 20 using the training data set of 312 MS/MS spectra
| # | Tools | Top hits | Top 5 | Top 10 | Top 20 |
|---|---|---|---|---|---|
| 1 | MetFragCL in silico only | 0.1666 | 0.4294 | 0.548 | 0.673 |
| 2 | MAGMa+ in silico only | 0.1602 | 0.3878 | 0.4839 | 0.6057 |
| 3 | CFM-ID in silico only | 0.1538 | 0.3974 | 0.5448 | 0.6698 |
| 4 | MS-FINDER in silico only | 0.10256 | 0.27564 | 0.3750 | 0.4647 |
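The sensitivity was calculated as ω = TP/(TP + FN), where TP (true positives) is the number of spectra whose correct structure was ranked within the given cutoff and FN (false negatives) is the number of spectra for which it was not. The sensitivities calculated on the training data were then applied to the challenge data set.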
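With 312 training spectra, TP is the Top k count from the training table and FN = 312 - TP, so ω is the fraction of spectra solved at each cutoff. The sketch below recomputes the sensitivity table from the in silico-only counts of the training table. The printed values agree with the table to four decimals except where the table appears to truncate rather than round (e.g. 52/312 ≈ 0.16667 is listed as 0.1666).

```python
def sensitivity(true_positives, false_negatives):
    """omega = TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


N_TRAINING = 312  # training spectra in CASMI 2016

# In silico-only Top 1 / 5 / 10 / 20 counts from the training-data table
TOP_K_HITS = {
    "MetFragCL": (52, 134, 171, 210),
    "MAGMa+": (50, 121, 151, 189),
    "CFM-ID": (48, 124, 170, 209),
    "MS-FINDER": (32, 86, 117, 145),
}

if __name__ == "__main__":
    for tool, hits in TOP_K_HITS.items():
        omegas = [sensitivity(tp, N_TRAINING - tp) for tp in hits]
        print(f"{tool:10s}", " ".join(f"{w:.4f}" for w in omegas))
```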
Fig. 3 Comparison of the accuracy of compound annotations obtained by in silico fragmentation tools alone and in combination with metadata for both CASMI data sets
Results for the challenge (validation) data of the CASMI 2016 contest
| # | Tools | Top hits | Top 5 | Top 10 | Top 20 |
|---|---|---|---|---|---|
| 1 | CFM-ID + ID_sorted + DB + MS/MS Voting/consensus | 181 | 194 | 201 | 204 |
| 2 | CFM-ID + ID_sorted + MAGMa(+) + DB + MS/MS Voting/consensus | 180 | 195 | 200 | 205 |
| 3 | CFM-ID + ID_sorted + MAGMa(+) + MetFrag + DB + MS/MS Voting/consensus | 180 | 194 | 200 | 204 |
| 4 | CFM-ID + DB + MS/MS | 180 | 193 | 199 | 201 |
| 5 | MAGMa(+) + ID_sorted + DB + MS/MS Voting/consensus | 180 | 193 | 197 | 201 |
| 6 | CFM-ID + MAGMa(+) + DB + MS/MS Voting/consensus | 180 | 192 | 195 | 202 |
| 7 | MetFrag + MAGMa(+) + DB + MS/MS Voting/consensus | 180 | 188 | 194 | 198 |
| 8 | MAGMa(+) + DB + MS/MS | 180 | 188 | 192 | 198 |
| 9 | MetFrag + CFM-ID + MAGMa(+) + DB + MS/MS Voting/consensus | 179 | 190 | 196 | 201 |
| 10 | MetFrag + CFM-ID + DB + MS/MS Voting/consensus | 178 | 192 | 199 | 203 |
| 11 | CFM-ID + ID_sorted + MAGMa(+) + MetFrag + MS-FINDER + DB + MS/MS Voting/consensus | 175 | 191 | 200 | 203 |
| 12 | MetFrag + MS-FINDER + CFM-ID + DB + MS/MS Voting/consensus | 175 | 189 | 194 | 200 |
| 13 | MetFrag + MS-FINDER + CFM-ID + MAGMa(+) + DB + MS/MS Voting/consensus | 175 | 189 | 194 | 200 |
| 14 | MS-FINDER + ID_sorted + DB + MS/MS Voting/consensus | 175 | 189 | 194 | 199 |
| 15 | MS-FINDER + CFM-ID + MAGMa(+) + DB + MS/MS Voting/consensus | 175 | 188 | 196 | 201 |
| 16 | MetFrag + MS-FINDER + MAGMa(+) + DB + MS/MS Voting/consensus | 175 | 186 | 191 | 197 |
| 17 | MS-FINDER + MAGMa(+) + DB + MS/MS Voting/consensus | 175 | 185 | 190 | 195 |
| 18 | ID_sorted + DB + MS/MS | 174 | 195 | 198 | 204 |
| 19 | MetFrag + ID_sorted + DB + MS/MS Voting/consensus | 174 | 194 | 199 | 203 |
| 20 | MS-FINDER + CFM-ID + DB + MS/MS Voting/consensus | 174 | 189 | 195 | 201 |
| 21 | MetFrag + DB + MS/MS | 174 | 189 | 192 | 197 |
| 22 | MetFrag + MS-FINDER + DB + MS/MS Voting/consensus | 174 | 187 | 190 | 197 |
| 23 | MS-FINDER + DB + MS/MS | 174 | 184 | 185 | 191 |
| 24 | MetFrag + CFM-ID + MAGMa(+) + DB Voting/consensus | 151 | 184 | 192 | 198 |
| 25 | CFM-ID + DB | 151 | 183 | 191 | 197 |
| 26 | MetFrag + MS-FINDER + CFM-ID + DB Voting/consensus | 151 | 180 | 191 | 198 |
| 27 | MS-FINDER + CFM-ID + MAGMa(+) + DB Voting/consensus | 151 | 179 | 191 | 198 |
| 28 | CFM-ID + MAGMa(+) + DB Voting/consensus | 150 | 184 | 189 | 199 |
| 29 | MetFrag + MAGMa(+) + DB Voting/consensus | 150 | 181 | 189 | 194 |
| 30 | MetFrag + MS-FINDER + MAGMa(+) + DB Voting/consensus | 150 | 178 | 186 | 193 |
| 31 | MS-FINDER + MAGMa(+) + DB Voting/consensus | 150 | 174 | 183 | 191 |
| 32 | MetFrag + CFM-ID + DB Voting/consensus | 149 | 186 | 196 | 201 |
| 33 | MAGMa(+) + DB | 149 | 180 | 185 | 193 |
| 34 | MS-FINDER + CFM-ID + DB Voting/consensus | 149 | 179 | 189 | 199 |
| 35 | MS-FINDER + DB | 148 | 173 | 178 | 186 |
| 36 | MetFrag + DB | 147 | 185 | 188 | 194 |
| 37 | MetFrag + MS-FINDER + DB Voting/consensus | 147 | 178 | 184 | 193 |
| 38 | ID_sorted + DB | 134 | 188 | 194 | 202 |
| 39 | Randomize + DB + MS/MS | 123 | 184 | 189 | 197 |
| 40 | Randomize + DB | 119 | 176 | 180 | 189 |
| 41 | ID_sorted | 106 | 169 | 177 | 186 |
| 42 | MetFrag in silico | 53 | 92 | 111 | 137 |
| 43 | MetFrag + MS-FINDER + CFM-ID in silico Voting/consensus | 51 | 95 | 129 | 151 |
| 44 | MetFrag + CFM-ID in silico Voting/consensus | 47 | 102 | 129 | 153 |
| 45 | MetFrag + MS-FINDER + CFM-ID + MAGMa(+) in silico Voting/consensus | 46 | 97 | 128 | 152 |
| 46 | MetFrag + CFM-ID + MAGMa(+) in silico Voting/consensus | 42 | 104 | 126 | 150 |
| 47 | CFM-ID + MAGMa(+) in silico Voting/consensus | 39 | 94 | 123 | 148 |
| 48 | MetFrag + MAGMa(+) in silico Voting/consensus | 39 | 90 | 111 | 128 |
| 49 | MetFrag + MS-FINDER + MAGMa(+) in silico Voting/consensus | 38 | 79 | 117 | 138 |
| 50 | MS-FINDER + CFM-ID + MAGMa(+) in silico Voting/consensus | 34 | 97 | 127 | 147 |
| 51 | MetFrag + MS-FINDER in silico Voting/consensus | 33 | 76 | 103 | 125 |
| 52 | MS-FINDER + MAGMa(+) in silico Voting/consensus | 32 | 69 | 93 | 119 |
| 53 | MS-FINDER + CFM-ID in silico Voting/consensus | 30 | 76 | 110 | 139 |
| 54 | CFM-ID in silico (dot product) | 29 | 76 | 104 | 122 |
| 55 | MAGMa(+) in silico | 28 | 72 | 98 | 117 |
| 56 | MS-FINDER in silico | 23 | 57 | 79 | 93 |
| 57 | Randomize | 20 | 27 | 28 | 121 |
‘MetFrag’ (MetFragCL), ‘CFM-ID’, ‘MAGMa(+)’ (MAGMa+) and ‘MS-FINDER’ designate results obtained by the in silico fragmentation software tools. ‘DB’ designates priority ranking by presence in chemical and biochemical databases. ‘MS/MS’ designates presence in MS/MS libraries with a dot-product similarity score above 400. The 208 MS/MS spectra of the CASMI 2016 challenge data were used.
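The footnotes define an MS/MS library hit as a dot-product similarity above 400, but the exact scoring function is not specified here. A common convention is a cosine score between matched peaks scaled to 0-1000, which the sketch below assumes, together with a hypothetical 0.01 m/z matching tolerance and greedy peak pairing. Library search tools often add m/z and intensity weighting, which this plain version omits.

```python
import math


def dot_product_score(query, library, tolerance=0.01):
    """Cosine similarity of two centroided spectra, scaled to 0-1000.

    query, library: lists of (mz, intensity) peaks.
    tolerance: assumed m/z window for pairing peaks.
    """
    used = set()
    numerator = 0.0
    for mz_q, int_q in query:
        # Greedily pair with the closest unused library peak inside the window
        candidates = [(abs(mz_q - mz_l), j) for j, (mz_l, _) in enumerate(library)
                      if j not in used and abs(mz_q - mz_l) <= tolerance]
        if candidates:
            _, j = min(candidates)
            used.add(j)
            numerator += int_q * library[j][1]
    # Norms over all peaks, so unmatched peaks lower the score
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_l = math.sqrt(sum(i * i for _, i in library))
    if norm_q == 0 or norm_l == 0:
        return 0.0
    return 1000.0 * numerator / (norm_q * norm_l)


def is_library_hit(query, library, threshold=400.0):
    """Library hit if the dot-product score exceeds the threshold."""
    return dot_product_score(query, library) > threshold


if __name__ == "__main__":
    query = [(100.0, 1.0), (150.0, 1.0)]
    library = [(100.0, 1.0), (200.0, 1.0)]
    print(round(dot_product_score(query, library)))  # 500
```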