Joris J R Louwen, Justin J J van der Hooft.
Abstract
Microbial specialized metabolites are key mediators in host-microbiome interactions. Most of the chemical space produced by the microbiome currently remains unexplored and uncharacterized. This situation calls for new and improved methods to exploit the growing publicly available genomic and metabolomic data sets and connect the outcomes to structural and functional knowledge inferred from transcriptomics and proteomics experiments. Here, we first describe currently available approaches that support the comprehensive mining of metabolomics and genomics data. Next, we provide our vision on how to move forward toward the automated linking of omics data of specialized metabolites to their structures, biosynthesis pathways, producers, and functions.
Keywords: computational biology; computational metabolomics; data mining; genomics; integrative omics; mass spectrometry; microbiome; natural products; specialized metabolites
Year: 2021 PMID: 34427506 PMCID: PMC8407348 DOI: 10.1128/mSystems.00726-21
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
FIG 1 Current state-of-the-art ecosystem of genomics (left) and metabolomics (right) natural product research, brought together by paired omics approaches (middle). Genomes are mined for biosynthetic gene clusters (BGCs) through tools such as antiSMASH and DeepBGC, and BGCs with structurally characterized products are stored in databases like MIBiG. BGCs are clustered into families with BiG-SCAPE and BiG-SLiCE. To infer compound classes, molecular families, and substructures, metabolomes (represented by collections of MS/MS spectra) are mined with tools such as ClassyFire, GNPS, MS2LDA, and MolNetEnhancer. Structural annotations relevant for microbiome research are stored in databases such as NP Atlas and MotifDB, and reference spectra are available in repositories such as GNPS-MassIVE. Paired data stored in platforms such as the Paired Omics Data Platform (PoDP) combine the two sides, which facilitates multi-omics approaches such as NPLinker, which links gene cluster families (GCFs) to molecular families (MFs) through sample occurrence (also known as strain correlation) and feature-based matching.
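The sample-occurrence (strain correlation) idea named in the caption can be illustrated with a minimal sketch. It assumes each GCF and each MF is reduced to the set of strains in which it was observed, and it scores a candidate link by the Jaccard overlap of those sets. This is an illustrative stand-in, not the scoring used by NPLinker; the function names, identifiers, and strain labels are hypothetical.

```python
def cooccurrence_score(gcf_strains, mf_strains):
    """Jaccard overlap between the strain sets of a GCF and an MF.

    Assumption: plain set overlap is used here for illustration only;
    it is not the scoring function of any published tool.
    """
    gcf_strains, mf_strains = set(gcf_strains), set(mf_strains)
    union = gcf_strains | mf_strains
    if not union:
        return 0.0
    return len(gcf_strains & mf_strains) / len(union)


def rank_links(gcfs, mfs):
    """Score every GCF-MF pair and return them sorted best-first."""
    links = [
        (gcf_id, mf_id, cooccurrence_score(gcf_set, mf_set))
        for gcf_id, gcf_set in gcfs.items()
        for mf_id, mf_set in mfs.items()
    ]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical toy data: which strains carry each GCF / produce each MF.
gcfs = {
    "GCF_1": {"strainA", "strainB", "strainC"},
    "GCF_2": {"strainD"},
}
mfs = {
    "MF_1": {"strainA", "strainB", "strainC"},
    "MF_2": {"strainC", "strainD"},
}

ranked = rank_links(gcfs, mfs)
for gcf_id, mf_id, score in ranked:
    print(f"{gcf_id} <-> {mf_id}: {score:.2f}")
```

In this toy data, GCF_1 and MF_1 occur in exactly the same strains and therefore rank first. Real data sets would also need to account for strains that were sequenced but not profiled, and vice versa.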
FIG 2 Current and envisioned advances in multi-omics natural product discovery research. (A) Improved detection of subclusters and relevant natural product-related chemical compound classes in BGCs and MS/MS spectra will become possible based on machine learning-based computational tools. (B) We envision combining the existing BGC-metabolite matching approaches with substructure and chemical class predictions in platforms such as NPLinker. NPClassifier is a novel ML-based class predictor that considers both structural features and historical relationships between metabolites as defined by natural product researchers. (C) Mass spectral embeddings learned by Spec2Vec and trained with MS2DeepScore will enable fast and improved spectral similarity scoring. The bases for these mass spectral embeddings are the relationships between mass fragments and neutral losses based on their presence/absence in a large set of mass spectra. We expect that these embeddings will allow the rapid annotation of classes, substructures, or other labels such as pathways or functions based on clustering techniques. Finally, the developed workflows can also form the basis for improved comparative and repository-wide metabolomics approaches that highlight shared and novel chemistry produced by microbiomes.
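The presence/absence representation of mass fragments and neutral losses mentioned in panel C can be sketched as follows. The example turns each spectrum into a set of binned "peak" and "loss" tokens and compares two spectra with a cosine score over those binary token sets. This is a simplified bag-of-tokens stand-in under stated assumptions, not the learned embeddings of Spec2Vec or MS2DeepScore; the function names, binning precision, and m/z values are hypothetical.

```python
import math


def spectrum_tokens(precursor_mz, fragment_mzs, decimals=2):
    """Represent a spectrum as a set of fragment and neutral-loss tokens.

    Assumption: m/z values are binned by rounding to `decimals` places;
    a neutral loss is taken as precursor m/z minus fragment m/z.
    """
    tokens = set()
    for mz in fragment_mzs:
        tokens.add(f"peak@{mz:.{decimals}f}")
        loss = precursor_mz - mz
        if loss > 0:
            tokens.add(f"loss@{loss:.{decimals}f}")
    return tokens


def cosine_presence(tokens_a, tokens_b):
    """Cosine similarity between two binary presence/absence token sets."""
    if not tokens_a or not tokens_b:
        return 0.0
    shared = len(tokens_a & tokens_b)
    return shared / math.sqrt(len(tokens_a) * len(tokens_b))


# Hypothetical spectra: two precursors that differ by 14 Da but share
# one fragment and two neutral losses.
spec_a = spectrum_tokens(300.0, [100.0, 150.0, 282.0])
spec_b = spectrum_tokens(314.0, [100.0, 164.0, 296.0])

similarity = cosine_presence(spec_a, spec_b)
print(f"similarity: {similarity:.2f}")
```

Including neutral losses lets the two toy spectra score as related even though most of their fragment peaks differ. Learned embeddings go further by placing tokens that frequently co-occur across many spectra close together, which this sketch does not attempt.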