Yannick Djoumbou-Feunang, Allison Pon, Naama Karu, Jiamin Zheng, Carin Li, David Arndt, Maheswor Gautam, Felicity Allen, David S Wishart.
Abstract
Metabolite identification in untargeted metabolomics is often hampered by the lack of experimentally collected reference spectra from tandem mass spectrometry (MS/MS). To circumvent this problem, Competitive Fragmentation Modeling-ID (CFM-ID) was developed to accurately predict electrospray ionization-MS/MS (ESI-MS/MS) spectra from chemical structures and to aid in compound identification via MS/MS spectral matching. While earlier versions of CFM-ID performed very well overall, their performance in predicting the MS/MS spectra of certain classes of compounds, including many lipids, was quite poor. Furthermore, CFM-ID's compound identification capabilities were limited because it neither used experimentally available MS/MS spectra nor exploited metadata in its spectral matching algorithm. Here, we describe significant improvements to CFM-ID's performance and speed. These include (1) the implementation of a rule-based fragmentation approach for lipid MS/MS spectral prediction, which greatly improves the speed and accuracy of CFM-ID; (2) the inclusion of experimental MS/MS spectra and other metadata to enhance CFM-ID's compound identification abilities; (3) the development of new scoring functions that improve CFM-ID's accuracy by 21.1%; and (4) the implementation of a chemical classification algorithm that correctly classifies unknown chemicals (based on their MS/MS spectra) in >80% of cases. This improved version, called CFM-ID 3.0, is freely available as a web server. Its source code is also accessible online.
Keywords: MS spectral prediction; combinatorial fragmentation; liquid chromatography; mass spectrometry; metabolite identification; rule-based fragmentation; structure-based chemical classification
Year: 2019 PMID: 31013937 PMCID: PMC6523630 DOI: 10.3390/metabo9040072
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Figure 1. Fragmentation patterns of phosphatidylcholines obtained from their [M+H]+ precursor ions. Of all resulting fragments, only the precursor ion is observed at each of the three energy levels. The fragment ion C5H14NO4P+ (red arrow), corresponding to phosphocholine, is observed at 20 and 40 eV, and the remaining fragments are observed only at 40 eV.
Number of fragmentation rules and adduct types covered for each chemical category.
| Lipid Class | Number of Covered Rules | Covered Adduct Types |
|---|---|---|
| 1-Monoacylglycerols | 8 | [M+Li]+; [M+NH4]+ |
| 2-Monoacylglycerols | 11 | [M+H]+; [M+NH4]+; [M+Na]+ |
| 1,2-Diacylglycerols | 10 | [M+NH4]+; [M+Na]+ |
| Triacylglycerols | 19 | [M+Na]+; [M+NH4]+; [M+Li]+ |
| Phosphatidic acids | 22 | [M+H]+; [M+Na]+; [M−H]− |
| Phosphatidylcholines | 41 | [M+H]+; [M+Na]+; [M+Li]+; [M+Cl]− |
| Phosphatidylethanolamines | 24 | [M+H]+; [M+Na]+; [M−H]− |
| Lysophosphatidylcholines | 29 | [M+H]+; [M+Na]+; [M+Li]+; [M+Cl]− |
| Lysophosphatidic acids | 12 | [M+H]+; [M−H]− |
| Phosphatidylserines | 28 | [M+H]+; [M+Li]+; [M+Na]+; [M−H]− |
| Ceramides | 17 | [M+H]+; [M+Li]+; [M−H]− |
| Sphingomyelins | 13 | [M+H]+; [M+Li]+; [M+Na]+ |
| Cardiolipins | 13 | [M−2H]2− |
| Phosphatidylglycerols | 11 | [M−H]− |
| Lysophosphatidylglycerols | 7 | [M−H]− |
| Plasmanyl-PC | 17 | [M+H]+; [M+Cl]− |
| Plasmenyl-PC | 17 | [M+H]+; [M+Cl]− |
| 1-Alkanylglycerophosphocholines | 15 | [M+H]+; [M+Cl]−; [M+Na]+ |
| 1-Alkenylglycerophosphocholines | 13 | [M+H]+; [M+Cl]− |
| Phosphatidylinositols | 9 | [M−H]− |
| Lysophosphatidylinositols | 8 | [M−H]− |
| Total | 344 | 50 |
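The Total row can be cross-checked by summing the per-class values. The sketch below transcribes the table into a dictionary (the name `LIPID_COVERAGE` and the helper `summarize` are illustrative, and adduct strings are written with ASCII hyphens). Under this transcription, the figure of 50 appears to count class-adduct pairs rather than distinct adduct types.

```python
# Rule counts and adduct lists transcribed from the table above.
# Adduct strings use ASCII hyphens; the cardiolipin entry is a normalized label.
LIPID_COVERAGE = {
    "1-Monoacylglycerols": (8, ["[M+Li]+", "[M+NH4]+"]),
    "2-Monoacylglycerols": (11, ["[M+H]+", "[M+NH4]+", "[M+Na]+"]),
    "1,2-Diacylglycerols": (10, ["[M+NH4]+", "[M+Na]+"]),
    "Triacylglycerols": (19, ["[M+Na]+", "[M+NH4]+", "[M+Li]+"]),
    "Phosphatidic acids": (22, ["[M+H]+", "[M+Na]+", "[M-H]-"]),
    "Phosphatidylcholines": (41, ["[M+H]+", "[M+Na]+", "[M+Li]+", "[M+Cl]-"]),
    "Phosphatidylethanolamines": (24, ["[M+H]+", "[M+Na]+", "[M-H]-"]),
    "Lysophosphatidylcholines": (29, ["[M+H]+", "[M+Na]+", "[M+Li]+", "[M+Cl]-"]),
    "Lysophosphatidic acids": (12, ["[M+H]+", "[M-H]-"]),
    "Phosphatidylserines": (28, ["[M+H]+", "[M+Li]+", "[M+Na]+", "[M-H]-"]),
    "Ceramides": (17, ["[M+H]+", "[M+Li]+", "[M-H]-"]),
    "Sphingomyelins": (13, ["[M+H]+", "[M+Li]+", "[M+Na]+"]),
    "Cardiolipins": (13, ["[M-2H]2-"]),
    "Phosphatidylglycerols": (11, ["[M-H]-"]),
    "Lysophosphatidylglycerols": (7, ["[M-H]-"]),
    "Plasmanyl-PC": (17, ["[M+H]+", "[M+Cl]-"]),
    "Plasmenyl-PC": (17, ["[M+H]+", "[M+Cl]-"]),
    "1-Alkanylglycerophosphocholines": (15, ["[M+H]+", "[M+Cl]-", "[M+Na]+"]),
    "1-Alkenylglycerophosphocholines": (13, ["[M+H]+", "[M+Cl]-"]),
    "Phosphatidylinositols": (9, ["[M-H]-"]),
    "Lysophosphatidylinositols": (8, ["[M-H]-"]),
}


def summarize(coverage):
    """Return (total rules, class-adduct pairs, sorted distinct adducts)."""
    total_rules = sum(rules for rules, _ in coverage.values())
    total_pairs = sum(len(adducts) for _, adducts in coverage.values())
    distinct = sorted({a for _, adducts in coverage.values() for a in adducts})
    return total_rules, total_pairs, distinct


total_rules, total_pairs, distinct_adducts = summarize(LIPID_COVERAGE)
print(total_rules, total_pairs, len(distinct_adducts))
```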
Statistics for the Competitive Fragmentation Modeling-ID (CFM-ID) 3.0 spectral database.
| Feature | Value |
|---|---|
| Total number of unique compounds | 229,084 |
| Total number of unique ESI-MS/MS spectra | 397,679 |
| Total number of experimental ESI-MS/MS spectra | 87,570 |
| Total number of predicted ESI-MS/MS spectra | 310,109 |
| Number of compounds with ≥1 experimental ESI-MS/MS spectra | 13,537 |
| Number of compounds with ≥1 predicted ESI-MS/MS spectra | 108,972 |
| Number of compounds with ≥2 citations | 229,084 |
| Average number of citations per compound | 272 |
| Number of compounds with chemical classification assignments | 229,084 |
| Average number of chemical category assignments/compound | 25 |
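A quick consistency check on these statistics: the experimental and predicted spectra should add up to the total number of spectra. The sketch below uses only values from the table; the dictionary keys and function names are illustrative.

```python
# Values transcribed from the statistics table above.
DB_STATS = {
    "compounds": 229_084,
    "total_spectra": 397_679,
    "experimental_spectra": 87_570,
    "predicted_spectra": 310_109,
}


def spectra_partition_holds(stats):
    """True if experimental + predicted spectra equal the reported total."""
    return (
        stats["experimental_spectra"] + stats["predicted_spectra"]
        == stats["total_spectra"]
    )


def experimental_fraction(stats):
    """Share of all spectra in the database that are experimental."""
    return stats["experimental_spectra"] / stats["total_spectra"]


print(spectra_partition_holds(DB_STATS))
print(f"{experimental_fraction(DB_STATS):.1%} of spectra are experimental")
```

By this count, roughly one in five spectra in the database is experimental and the rest are predicted.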
Figure 2. Head-to-tail plots of experimental and predicted electrospray ionization-tandem mass spectrometry (ESI-MS/MS) spectra of PC(16:0/16:0). (a) Head-to-tail plot showing an experimental ESI-MS/MS spectrum of dipalmitoyl phosphatidylcholine (PC(16:0/16:0)) measured at 40 eV, and the matching ESI-MS/MS spectrum predicted by CFM-ID 2.0. The computed spectral similarity score is 0.07. (b) Head-to-tail plot showing an experimental ESI-MS/MS spectrum of dipalmitoyl phosphatidylcholine measured in positive ion mode ([M+H]+) at 40 eV, and the matching ESI-MS/MS spectrum predicted by CFM-ID 3.0. The computed spectral similarity score is 0.88. (c) Head-to-tail plot showing an experimental ESI-MS/MS spectrum of dipalmitoyl phosphatidylcholine measured in positive ion mode ([M+H]+) at 40 eV, and the matching ESI-MS/MS spectrum predicted by LipidBlast. The computed spectral similarity score is 0.13.
Figure 3. Head-to-tail plots of experimental and predicted ESI-MS/MS spectra of PS(16:0/18:1(9Z)). (a) Head-to-tail plot showing an experimental ESI-MS/MS spectrum of 1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-L-serine (PS(16:0/18:1(9Z))) measured at 40 eV, and the matching ESI-MS/MS spectrum predicted by CFM-ID 2.0. The computed spectral similarity score is 0.10. (b) Head-to-tail plot showing an experimental ESI-MS/MS spectrum of 1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-L-serine (PS(16:0/18:1(9Z))) measured at 40 eV, and the matching ESI-MS/MS spectrum predicted by CFM-ID 3.0. The computed spectral similarity score is 0.92. (c) Head-to-tail plot showing an experimental ESI-MS/MS spectrum of 1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-L-serine (PS(16:0/18:1(9Z))) measured at 40 eV, and the matching ESI-MS/MS spectrum predicted by LipidBlast. The computed spectral similarity score is 0.91.
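The captions above report a spectral similarity score for each head-to-tail comparison, but this excerpt does not state how that score is computed. As a rough illustration only, the sketch below assumes a cosine (normalized dot-product) similarity over binned peak intensities, which is a common choice for comparing MS/MS spectra and not necessarily the function used by CFM-ID. The function names, the 0.01 m/z bin width, and the toy peak lists are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def bin_peaks(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks that fall in the same m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Cosine similarity of two peak lists: 1.0 for identical, 0.0 for disjoint."""
    a = bin_peaks(spec_a, bin_width)
    b = bin_peaks(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical peak lists (m/z, relative intensity); not data from the paper.
experimental = [(100.00, 100.0), (250.00, 35.0), (400.00, 10.0)]
predicted = [(100.00, 100.0), (250.00, 20.0), (320.00, 5.0)]

print(round(cosine_similarity(experimental, predicted), 3))
```

Fixed-width binning can split peaks that sit close to a bin edge, so practical implementations usually match peaks within an m/z tolerance instead.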
Computed spectral similarity scores between experimental and predicted ESI-MS/MS spectra at three energy levels (10, 20, and 40 eV). The scores are higher with the rule-based approach (CFM-ID 3.0) than with the combinatorial one (CFM-ID 2.0), indicating improved prediction of lipid ESI-MS/MS spectra. The spectral similarities of the LipidBlast-generated consensus spectra further illustrate this trend. When available, the same LipidBlast-generated consensus spectrum was used for comparisons at each energy level. N/A marks cases where (1) the adduct type was not covered by CFM-ID 2.0 at all, or (2) the adduct type was not covered by LipidBlast for the chemical class to which the test compound belongs.
| Compound | Adduct | Energy (eV) | CFM-ID 3.0 (Score) | CFM-ID 2.0 (Score) | LipidBlast (Score) |
|---|---|---|---|---|---|
| PA(16:0/18:1(9Z)) | [M−H]− | 10 | 1.00 | 0.36 | 0.00 |
| PS(16:0/18:1(9Z)) | [M−H]− | 10 | 1.00 | 0.31 | 0.00 |
| CL(18:0/18:0/18:0/18:0) | [M−2H]2− | 10 | 0.98 | N/A | 0.00 |
| DG(18:0/20:4/0:0) | [M+Na]+ | 10 | 0.92 | 0.00 | N/A |
| PA(16:0/18:1(9Z)) | [M−H]− | 20 | 0.55 | 0.02 | 0.00 |
| PS(16:0/18:1(9Z)) | [M−H]− | 20 | 0.98 | 0.03 | 0.00 |
| CL(18:0/18:0/18:0/18:0) | [M−2H]2− | 20 | 0.97 | N/A | 0.12 |
| DG(18:0/20:4/0:0) | [M+Na]+ | 20 | 0.93 | 0.00 | N/A |
| PA(16:0/18:1(9Z)) | [M−H]− | 40 | 0.96 | 0.03 | 0.90 |
| PS(16:0/18:1(9Z)) | [M−H]− | 40 | 0.92 | 0.10 | 0.91 |
| CL(18:0/18:0/18:0/18:0) | [M−2H]2− | 40 | 0.91 | N/A | 0.89 |
| DG(18:0/20:4/0:0) | [M+Na]+ | 40 | 0.18 | 0.00 | N/A |
| PC(16:0/16:0) | [M+H]+ | 40 | 0.88 | 0.07 | 0.13 |
| TG(18:1/18:1/18:2) | [M+NH4]+ | 40 | 0.78 | 0.01 | 0.84 |
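To summarize the table, one can average each tool's scores over the rows where a score is available and list the rows where CFM-ID 3.0 does not have the highest score. The sketch below transcribes the table, with `None` standing in for N/A; the variable and function names are illustrative.

```python
# (compound, energy in eV, CFM-ID 3.0, CFM-ID 2.0, LipidBlast); None stands for N/A.
SCORES = [
    ("PA(16:0/18:1(9Z))", 10, 1.00, 0.36, 0.00),
    ("PS(16:0/18:1(9Z))", 10, 1.00, 0.31, 0.00),
    ("CL(18:0/18:0/18:0/18:0)", 10, 0.98, None, 0.00),
    ("DG(18:0/20:4/0:0)", 10, 0.92, 0.00, None),
    ("PA(16:0/18:1(9Z))", 20, 0.55, 0.02, 0.00),
    ("PS(16:0/18:1(9Z))", 20, 0.98, 0.03, 0.00),
    ("CL(18:0/18:0/18:0/18:0)", 20, 0.97, None, 0.12),
    ("DG(18:0/20:4/0:0)", 20, 0.93, 0.00, None),
    ("PA(16:0/18:1(9Z))", 40, 0.96, 0.03, 0.90),
    ("PS(16:0/18:1(9Z))", 40, 0.92, 0.10, 0.91),
    ("CL(18:0/18:0/18:0/18:0)", 40, 0.91, None, 0.89),
    ("DG(18:0/20:4/0:0)", 40, 0.18, 0.00, None),
    ("PC(16:0/16:0)", 40, 0.88, 0.07, 0.13),
    ("TG(18:1/18:1/18:2)", 40, 0.78, 0.01, 0.84),
]


def mean_available(values):
    """Mean of the values that are not None (N/A entries are skipped)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


def rows_where_cfm3_not_best(rows):
    """Rows in which another tool scores strictly higher than CFM-ID 3.0."""
    return [
        (name, energy)
        for name, energy, cfm3, cfm2, lipidblast in rows
        if any(other is not None and other > cfm3 for other in (cfm2, lipidblast))
    ]


mean_cfm3 = mean_available([r[2] for r in SCORES])
mean_cfm2 = mean_available([r[3] for r in SCORES])
mean_lipidblast = mean_available([r[4] for r in SCORES])
print(f"{mean_cfm3:.2f} {mean_cfm2:.2f} {mean_lipidblast:.2f}")
print(rows_where_cfm3_not_best(SCORES))
```

Because the N/A rows differ between tools, the three means are computed over different subsets of rows and are only a rough comparison.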
Comparison of the CFM-ID 3.0, CFM-ID 2.0, and MS-FINDER scoring functions for the identification of 185 compounds from 208 ESI-MS/MS spectra. Reported are the total number of challenges in which the corresponding implementation of the scoring function ranked the query compound in the top 1, top 3, and top 10. The average and median ranks for the query compound are also reported. A chemical classification is assessed as correct if the predicted category matches a category originally assigned by ClassyFire. N/A, not applicable; * performance when applied over the expanded spectral library database including the 208 experimental ESI-MS/MS spectra from the CASMI 2016 contest (category 3).
| Version | # Top 1 | # Top 3 | # Top 10 | Average Rank | Median Rank | # Correct Classifications |
|---|---|---|---|---|---|---|
|  | 149 | 194 | 204 | 1.8 | 1 | 168 |
|  | 123 | 171 | 201 | 2.4 | 1 | N/A |
|  | 120 | 160 | 182 | 13.64 | 1 | N/A |
|  | 146 | 162 | 174 | 6.4 | 1 | N/A |
|  | 208 | 208 | 208 | 1 | 1 | N/A |
|  | 208 | 208 | 208 | 1 | 1 | N/A |
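The counts in this table can be converted into top-k rates over the 208 query spectra. The sketch below does this for each row in table order; because the Version labels are not present in this copy, rows are referred to by position only. It also computes the relative change in top-1 count between the first two rows, which comes to about 21.1% and matches the accuracy improvement quoted in the abstract. Reading those two rows as the new and the earlier scoring function is an assumption, not something this table states. All names in the code are illustrative.

```python
N_SPECTRA = 208  # number of query ESI-MS/MS spectra, from the caption

# (top 1, top 3, top 10) counts, in the order the rows appear in the table.
RANK_COUNTS = [
    (149, 194, 204),
    (123, 171, 201),
    (120, 160, 182),
    (146, 162, 174),
    (208, 208, 208),
    (208, 208, 208),
]


def top_k_rates(counts, n=N_SPECTRA):
    """Convert top-1/top-3/top-10 counts into fractions of all query spectra."""
    return tuple(c / n for c in counts)


def relative_gain(new, old):
    """Relative change from old to new, e.g. 0.10 for a 10% increase."""
    return (new - old) / old


for position, counts in enumerate(RANK_COUNTS, start=1):
    rates = ", ".join(f"{r:.1%}" for r in top_k_rates(counts))
    print(f"row {position}: {rates}")

# Assumption: rows 1 and 2 are the new and the earlier scoring function.
print(f"top-1 gain, row 1 vs row 2: {relative_gain(149, 123):.1%}")
```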