Sean R Johnson, Bernd Markus Lange.
Abstract
Various databases have been developed to aid in assigning structures to spectral peaks observed in metabolomics experiments. In this review article, we discuss the utility of currently available open-access spectral and chemical databases for natural products discovery. We also provide recommendations on how the research community can contribute to further improvements.
Keywords: gas chromatography; high performance liquid chromatography; mass spectrometry; metabolomics; natural product; nuclear magnetic resonance spectroscopy; secondary metabolite
Year: 2015 PMID: 25789275 PMCID: PMC4349186 DOI: 10.3389/fbioe.2015.00022
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. Typical workflow of (A) natural products discovery and (B) metabolomics projects. Common objectives include metabolite identification by searching spectral databases.
Figure 2. Examples of natural products with vastly different physicochemical properties to illustrate challenges for analytical chemistry. Molecular weight differences: (A) acetylsalicylic acid and (B) triscutin A. Polarity differences: (C) γ-aminobutyric acid and (D) paclitaxel (taxol®).
Online spectral databases for natural product identification.
| Database | Search parameters | Compounds | Spectra | Record format | Accepts submissions | Notes | Reference |
|---|---|---|---|---|---|---|---|
| BML-NMR | Name | 208 | 3,328 NMR | Vendor specific, MSI-XML | No | Each compound measured with 16 different NMR parameter sets | Ludwig et al. |
| BMRB | Name, mass, structure, 13C-NMR shifts, 1H-NMR shifts, HSQC-NMR peaks, other | 1,249 | 8,996 NMR | NMR-STAR, CSV, vendor specific | Yes | | Ulrich et al. |
| GMD | Name, mass, formula, functional group, MS peaks, retention index, other | 2,220 | 26,587 MS | NIST, JCAMP-DX, TagFinder, Target Search | Yes | GC retention indexes. Multiparameter search interface. Decision tree tool for substructure identification. API access | Wagner et al. |
| GNPS | MS2 in mzML format, name, adduct, other | >5,500 | 27,593 MS2 | mgf | Yes | Automated dereplication workflow from MS2 data | Unpublished |
| HMDB | Structure, mass, adduct, MS peaks, MS2 peaks, GC retention time, GC retention index, 1H-NMR shifts, 13C-NMR shifts, 2D TOCSY, 13C HSQC, other | 41,806 | 2,240 NMR; 1,220 MS; 8,176 MS2 | Text, vendor specific, NIST | No | Specific for human metabolites. Not all compounds have experimental spectra | Wishart et al. |
| MassBank | Structure, name, mass, formula, fragment, MS(n) peaks, neutral loss | >11,000 | 40,889 MS including MSn | MassBank | Yes | Many of the records also include detailed information about chromatographic conditions and retention times. Batch search available. SOAP API | Horai et al. |
| METLIN | Mass, adduct, fragment, name, formula, neutral loss, MS2 peaks | 240,515 | 61,872 MS2 | Not downloadable | No | Batch search available. Not all compounds have experimental spectra | Smith et al. |
| MMCD | Name, structure, NMR shifts and connectivity, mass, adduct | 20,306 | 5,256 NMR | Text, vendor specific | No | Batch search available. Can use multiple kinds of spectra in a single search. Not all compounds have experimental spectra | Cui et al. |
| NAPROC-13 | Name, chemical family, formula, mass, publication, 13C shift, and multiplicity | 20,297 | 20,297 NMR | Not downloadable | No | Iterative search where shifts can be added to the search one at a time. Search by shift connectivity | López-Pérez et al. |
| NMR ShiftDB | Name, formula, citation, structure, NMR shifts (multiple nuclei), experimental conditions | 42,838 | 50,883 NMR | CML, JCAMP-DX, tab separated, SQL | Yes | Lists NMR chemical shifts, but not peak size. Database is available from the SourceForge page | Steinbeck et al. |
| ReSpect | Mass, adduct, fragment, name, keyword, formula, MS(n) spectrum | 3,710 | 9,017 MSn spectra | MassBank | No | Specific for phytochemicals | Sawada et al. |
| SDBS | Name, formula, mass, IR peaks, 13C-NMR shifts, 1H-NMR shifts, MS peaks | 34,000 | 29,000 NMR; 24,700 MS | Not downloadable | No | Can use multiple kinds of spectra in a single search. Limit of 50 searches per day | Yamamoto et al. |
| Spektraris | Mass, retention time, relative retention time, MS(n) peaks, formula, 1H-NMR shifts, 13C-NMR shifts | 733 | 466 NMR; 1,445 MS; 1,181 MS2 | MassBank, tab separated, JCAMP-DX, vendor specific | Yes | Multiple parameter search interface. NMR data currently limited to taxane diterpenes | Cuthbertson et al. |
| SpinAssign | 13C-HSQC NMR shifts, 1H-NMR shifts, 13C-NMR shifts | 980 | 980 NMR | Not downloadable | No | Optimized for mixtures | Chikayama et al. |
“Compounds” count all compounds that could be returned as a search hit, including those for which no experimental spectra are available. “Spectra” counts only experimentally measured spectra. Databases surveyed in September and October 2014.
Overlap of coverage (in percent) among open-access chemical databases with a bulk download option (one-versus-one comparison).
| Database | Total number of compounds | BML-NMR | BMRB | GMD | GNPS | HMDB | MassBank | ReSpect | Spektraris |
|---|---|---|---|---|---|---|---|---|---|
| BML-NMR | 199 | - | 79 | 60 | 85 | 88 | 90 | 79 | 14 |
| BMRB | 1,159 | 14 | - | 29 | 35 | 39 | 52 | 27 | 7 |
| GMD | 879 | 14 | 38 | - | 48 | 62 | 66 | 35 | 10 |
| GNPS | 5,105 | 3 | 8 | 8 | - | 11 | 40 | 14 | 7 |
| HMDB | 1,046 | 17 | 43 | 52 | 54 | - | 68 | 38 | 9 |
| MassBank | 11,012 | 2 | 6 | 5 | 19 | 6 | - | 6 | 4 |
| ReSpect | 718 | 22 | 43 | 43 | 100 | 56 | 89 | - | 13 |
| Spektraris | 723 | 4 | 11 | 12 | 52 | 13 | 67 | 12 | - |
| Combined (unique) | 15,247 | | | | | | | | |
“Total” refers to the number of unique chemicals annotated with a structure and associated with at least one MS or NMR spectrum. Example: BMRB contains spectra for 1,159 structure-annotated unique chemical entities. Of these, 29% are also found in GMD. GMD contains spectra for 879 structure-annotated unique chemical entities. Of these, 38% are also found in BMRB.
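The one-versus-one comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: it assumes each database has already been reduced to a set of unique structure identifiers (for example, InChIKeys), and the function name `overlap_percent` and the toy identifiers are hypothetical.

```python
def overlap_percent(databases):
    """Return {row: {col: percent of row's compounds also found in col}}.

    `databases` maps a database name to a set of unique structure
    identifiers (assumed here; e.g. InChIKeys). The result is asymmetric
    because each percentage is relative to the size of the row database.
    """
    table = {}
    for row_name, row_ids in databases.items():
        table[row_name] = {}
        for col_name, col_ids in databases.items():
            # Skip the diagonal and avoid dividing by zero for empty sets.
            if row_name == col_name or not row_ids:
                continue
            shared = len(row_ids & col_ids)
            table[row_name][col_name] = round(100 * shared / len(row_ids))
    return table


# Toy data (hypothetical identifiers, not taken from any real database).
toy = {
    "A": {"k1", "k2", "k3", "k4"},
    "B": {"k3", "k4", "k5", "k6", "k7", "k8", "k9", "k10"},
}
result = overlap_percent(toy)
print(result)
```

In the toy data, half of A's compounds occur in B, but only a quarter of B's occur in A. The same asymmetry explains why the BMRB row reports 29% under GMD while the GMD row reports 38% under BMRB.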
Spectral data available in open-access chemical databases with a bulk download option.
| Database | MS | MS2 | 1H-NMR | 13C-NMR | 2D-NMR |
|---|---|---|---|---|---|
| BML-NMR | 0 | 0 | 199 | 0 | 199 |
| BMRB | 0 | 0 | 1,153 | 1,154 | 755 |
| GMD | 879 | 0 | 0 | 0 | 0 |
| GNPS | 0 | 5,105 | 0 | 0 | 0 |
| HMDB | 255 | 971 | 823 | 109 | 815 |
| MassBank | 9,241 | 2,736 | 0 | 0 | 0 |
| ReSpect | 0 | 718 | 0 | 0 | 0 |
| Spektraris | 482 | 311 | 240 | 216 | 0 |
| Combined (unique) | 9,651 | 6,333 | 1,829 | 1,383 | 1,183 |
The number of unique chemicals annotated with a structure and associated with at least one MS or NMR spectrum is given.
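The "Combined (unique)" row is smaller than the sum of each column because a compound present in several databases is counted once. For the MS column, the per-database counts add up to 879 + 255 + 9,241 + 482 = 10,857, whereas the combined value is 9,651. A minimal sketch of this kind of tally follows; the record layout `(database, compound_id, spectrum_type)` and the function name `count_unique` are assumptions made for illustration, not a format used by any of the databases.

```python
from collections import defaultdict


def count_unique(records):
    """Count unique compounds per (database, spectrum type) and combined.

    `records` is an iterable of (database, compound_id, spectrum_type)
    tuples (a hypothetical layout). Several spectra of the same compound
    in the same database count once, and the combined tally also merges
    compounds shared between databases.
    """
    per_database = defaultdict(set)
    combined = defaultdict(set)
    for database, compound_id, spectrum_type in records:
        per_database[(database, spectrum_type)].add(compound_id)
        combined[spectrum_type].add(compound_id)
    return (
        {key: len(ids) for key, ids in per_database.items()},
        {key: len(ids) for key, ids in combined.items()},
    )


# Toy records (hypothetical identifiers).
records = [
    ("A", "k1", "MS"),
    ("A", "k1", "MS"),   # second spectrum of the same compound
    ("A", "k2", "MS"),
    ("B", "k2", "MS"),   # compound shared with database A
    ("B", "k3", "MS"),
    ("B", "k3", "1H-NMR"),
]
per_database, combined = count_unique(records)
print(per_database)
print(combined)
```

Here databases A and B each hold two compounds with MS data, but the combined MS count is three because `k2` appears in both.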
Major compound classes represented in open-access spectral databases.
| Database | Major compound classes | Data source |
|---|---|---|
| BML-NMR | Focus on human metabolites | Discussion in Ludwig et al. |
| BMRB | Plant cell wall components (486) plus various other plant metabolites | Information on project website |
| HMDB | Focus on human metabolites. Of the 1,046 compounds associated with spectral data, 253 are lipids (e.g., fatty acids, steroids, and prenol lipids) and 174 are amino acid derivatives | Metabolite class annotation available |
| NAPROC-13 | Terpenoids are the best represented compound class (15,527). Other well-represented classes are steroids (729), flavonoids (1,769), “aromatics” (1,236), chromans (304), and lignans (294) | Metabolite class annotation available |
| ReSpect | Focus on plant metabolites. Flavonoids are the best represented class (1,360). Other well-represented classes are terpenoids (519), phenylpropanoids (341), alkaloids (256), amino acid derivatives (236), and glucosinolates (93) | Metabolite class annotation available |
| Spektraris | Focus on plant metabolites. MS spectra for alkaloids (>100), flavonoids (>80), lignans and phenylpropanoids (>60), and terpenoids (>50). NMR spectra for terpenoids (248) | All of the NMR spectra are for taxanes. The MS spectra are not annotated with compound class |