Adriano Rutz, Miwa Dounoue-Kubo, Simon Ollivier, Jonathan Bisson, Mohsen Bagheri, Tongchai Saesong, Samad Nejad Ebrahimi, Kornkanok Ingkaninan, Jean-Luc Wolfender, Pierre-Marie Allard.
Abstract
Mass spectrometry (MS) offers unrivalled sensitivity for the metabolite profiling of complex biological matrices encountered in natural products (NP) research. The massive and complex sets of spectral data generated by such platforms require computational approaches for their interpretation. Within such approaches, computational metabolite annotation automatically links spectral data to candidate structures via a score, which is usually established between the acquired data and experimental or theoretical spectral databases (DB). This process yields several candidate structures for each MS feature. However, at this stage, obtaining a high level of annotation confidence remains a challenge, notably because of the extensive chemodiversity of specialized metabolomes. The design of a metascore is a way to capture complementary experimental attributes and improve the annotation process. Here, we show that integrating the taxonomic position of the biological source of the analyzed samples and of the candidate structures enhances confidence in metabolite annotation. A script is proposed to automatically input such information at various granularity levels (species, genus, and family) and to complement the score obtained between experimental spectral data and the output of available computational metabolite annotation tools (ISDB-DNP, MS-Finder, Sirius). In all cases, considering the taxonomic distance allowed an efficient re-ranking of the candidate structures, leading to a systematic enhancement of the recall and precision rates of the tools (1.5- to 7-fold increase in the F1 score). Our results clearly demonstrate the importance of considering taxonomic information in the annotation of specialized metabolites. This requires access to structural data systematically documented with biological origin, both for new and previously reported NPs. In this respect, the establishment of an open structural DB of specialized metabolites and their associated metadata, particularly biological sources, is timely and critical for the NP research community.
Keywords: chemotaxonomy; computational metabolomics; metabolite annotation; natural products; scoring system; specialized metabolome; taxonomic distance
Year: 2019 PMID: 31708947 PMCID: PMC6824209 DOI: 10.3389/fpls.2019.01329
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. Conceptual overview of a possible metascoring system for specialized metabolite annotation incorporating (1) spectral similarity or fingerprint similarity, (2) taxonomic distance between the biological source of the queried spectra and that of the candidate annotations, (3) structural consistency within a cluster (see da Silva et al., 2018), and (4) physico-chemical consistency (see Ruttkies et al., 2016; Bach et al., 2018). A factor (w) allows a relative weight to be attributed to each individual score, modulating its contribution to the overall score.
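One possible reading of this metascore, assuming a simple linear combination (the caption does not specify the functional form), is a weighted sum of the four component scores for a candidate c. The symbols below are illustrative names and do not come from the article.

```latex
S_{\mathrm{meta}}(c) =
    w_{\mathrm{spec}}  \, s_{\mathrm{spec}}(c)
  + w_{\mathrm{taxo}}  \, s_{\mathrm{taxo}}(c)
  + w_{\mathrm{clust}} \, s_{\mathrm{clust}}(c)
  + w_{\mathrm{chem}}  \, s_{\mathrm{chem}}(c)
```

Under this assumption, setting a weight to zero removes the corresponding attribute from the ranking, and the taxonomically informed scoring described here corresponds to the case where only the spectral and taxonomic terms are non-zero.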
Figure 2. General outline of the taxonomically informed scoring system. Candidate structures are complemented with their biological sources at the family, genus, and species levels, when available. A score, inversely proportional to the taxonomic distance between the biological source of the standard compound and that of the candidate compound, is assigned for each level (family, genus, species) at which the biological source of the candidate structure matches that of the standard. The maximum score for each candidate is then added to its spectral score to yield a complemented spectral score. Finally, candidates are re-ranked according to the complemented spectral score.
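The outline above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' script: the class and function names are hypothetical, the family and genus scores (0.81 and 1.62) and the candidate values are taken from the ISDB-DNP output table in this record, and the species score (2.43) is a placeholder assumption because no species-level match appears in that table. The species of the analysed sample is not stated here, so it is left unset.

```python
from dataclasses import dataclass
from typing import Optional

# Family and genus scores are the values shown in the ISDB-DNP output table.
# The species score is a hypothetical placeholder (not given in this text).
LEVEL_SCORES = {"family": 0.81, "genus": 1.62, "species": 2.43}


@dataclass
class Candidate:
    name: str
    family: Optional[str]
    genus: Optional[str]
    species: Optional[str]
    normalized_spectral_score: float


def max_taxo_score(cand, query, scores=LEVEL_SCORES):
    """Return the highest score among the taxonomic levels that match."""
    matched = [
        scores[level]
        for level in ("family", "genus", "species")
        if getattr(cand, level) is not None
        and getattr(cand, level) == query.get(level)
    ]
    return max(matched, default=0.0)


def rerank(candidates, query, scores=LEVEL_SCORES):
    """Sort candidates by normalized spectral score plus maximum taxonomic score."""
    scored = [
        (c.name, round(c.normalized_spectral_score + max_taxo_score(c, query, scores), 2))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Biological source of the queried sample; species is not stated in this text.
QUERY = {"family": "Papaveraceae", "genus": "Glaucium", "species": None}

# Candidates listed in their initial (spectral) order: ranks 1, 2, 9, 11, 28.
CANDIDATES = [
    Candidate("Secosarcocapnidine Me ether, N-De-Me", "Papaveraceae",
              "Sarcocapnos", "Sarcocapnos crassifolia", 0.32),
    Candidate("Secocularidine Me ether, N-de-Me", "Papaveraceae",
              "Ceratocapnos", "Ceratocapnos claviculata", 0.29),
    Candidate("Predicentrine", "Papaveraceae",
              "Glaucium", "Glaucium oxylobum", 0.23),
    Candidate("Isocorydine", "Papaveraceae", "Glaucium", None, 0.22),
    Candidate("Isocorypalmine", "Papaveraceae",
              "Glaucium", "Glaucium fimbrilligerum", 0.14),
]

for final_rank, (name, score) in enumerate(rerank(CANDIDATES, QUERY), start=1):
    print(final_rank, name, score)
```

With these inputs, the genus-level match lifts the three Glaucium candidates above the two candidates that match only at the family level, which reproduces the final order and combined scores reported in the table.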
Figure 3. Characteristics of the benchmarking dataset. (A) Comparative distribution of accurate masses of entries in the DNP and in the benchmarking dataset. (B) Distribution of biological sources of entries in the benchmarking dataset at the family level (cutoff at 15 entries per family). (C) Comparative distribution of chemical classes (ClassyFire Class level) within the DNP and the benchmarking set.
Figure 4. (A) Influence of the taxonomically informed scoring (TIS) on the F1 score of each metabolite annotation tool. The F1 score offers a global evaluation of the precision and recall rates of the annotation process; the higher, the better. (B) Venn diagrams representing common and unique correct annotations of each tool at rank 1 before and after the taxonomically informed scoring step.
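For reference, the F1 score is the harmonic mean of precision and recall. The short Python sketch below shows the standard calculation; the function name and the counts in the example are hypothetical and are not results from the article, which does not detail here how correct and incorrect annotations were counted.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 30/100 = 0.3, recall = 30/50 = 0.6.
print(round(f1_score(true_positives=30, false_positives=70, false_negatives=20), 2))
```

Because the harmonic mean is dominated by the smaller of the two rates, a tool improves its F1 score only if re-ranking raises precision or recall without sacrificing too much of the other.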
Figure 5. Results of the Bayesian optimization converge toward the optimal combination of scores required for a maximal number of correct annotations ranked at the first position. This is observed for four randomly degraded training sets (first optimization round displayed). The sp, gen, and fam axes correspond to the score given when a match is found at the species, genus, or family level, respectively. Results confirm that the applied scores should be inversely proportional to the taxonomic distance between the biological source associated with the queried spectra and the biological source of the candidate structures.
Output of the taxonomically informed scoring annotation using ISDB-DNP for feature m/z 342.1670 at 1.42 min. Predicentrine, the correct annotation, which was initially ranked at the 9th position, is ranked at the first position after taxonomically informed scoring.
| ClusterID | Short InChIKey | Molecule Name | Family | Genus | Species | Family Score | Genus Score | Species Score | Max Taxo Score | Spectral Score | Normalized Spectral Score | Combined Spectral + Taxo Score | Rank Initial | Rank Final |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1772 | OUTYMWDDJORZOH | Predicentrine | Papaveraceae | Glaucium | Glaucium oxylobum | 0.81 | 1.62 | 0 | 1.62 | 0.36 | 0.23 | 1.85 | 9 | 1 |
| 1772 | QELDJEKNFOQJOY | Isocorydine | Papaveraceae | Glaucium | NA | 0.81 | 1.62 | 0 | 1.62 | 0.34 | 0.22 | 1.84 | 11 | 2 |
| 1772 | KDFKJOFJHSVROC | Isocorypalmine | Papaveraceae | Glaucium | Glaucium fimbrilligerum | 0.81 | 1.62 | 0 | 1.62 | 0.3 | 0.14 | 1.76 | 28 | 3 |
| 1772 | JADHMUPTWPBTMT | Secosarcocapnidine Me ether, N-De-Me | Papaveraceae | Sarcocapnos | Sarcocapnos crassifolia | 0.81 | 0 | 0 | 0.81 | 0.4 | 0.32 | 1.13 | 1 | 4 |
| 1772 | WNBUTZHPPULVTP | Secocularidine Me ether, N-de-Me | Papaveraceae | Ceratocapnos | Ceratocapnos claviculata | 0.81 | 0 | 0 | 0.81 | 0.39 | 0.29 | 1.1 | 2 | 5 |