Alicia H Russell, Andrew W Truman.
Abstract
Genome mining is a computational method for the automatic detection and annotation of biosynthetic gene clusters (BGCs) from genomic data. This approach has been increasingly utilised in natural product (NP) discovery due to the large amount of sequencing data that is now available. Ribosomally synthesised and post-translationally modified peptides (RiPPs) are a class of structurally complex NPs with diverse bioactivities. RiPPs have recently been shown to occupy a much larger expanse of genomic and chemical space than previously appreciated, indicating that RiPP BGCs may often have been overlooked in past genome annotations. This review provides an overview of the genome mining tools that have been specifically developed to aid in the discovery of RiPP BGCs, which have been built from an increasing knowledge base of RiPP structures and biosynthesis. Given these recent advances, the application of targeted genome mining has great potential to accelerate the discovery of important molecules such as antimicrobial and anticancer agents whilst increasing our understanding of how these compounds are biosynthesised in nature.
Keywords: Antibiotic; Bioinformatics; Biosynthesis; Genome mining; Natural product; RiPP
Abbreviations: BGC, biosynthetic gene cluster; DNN, deep neural network; HMM, hidden Markov model; MS, mass spectrometry; NP, natural product; ORF, open reading frame; PTM, post-translational modification; RTE, RiPP tailoring enzyme; RiPP, ribosomally synthesised and post-translationally modified peptide
Year: 2020 PMID: 32728407 PMCID: PMC7369419 DOI: 10.1016/j.csbj.2020.06.032
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. Schematic of RiPP biosynthesis.
Fig. 2. Examples of RiPP natural products. A. Structures of a thiopeptide (thiostrepton), a recently discovered antibiotic (darobactin), a redox cofactor (pyrroloquinoline quinone, PQQ) and a thioviridamide-like molecule (prethioviridamide). B. Precursor peptides corresponding to these RiPPs, where core peptides are coloured red.
Summary of genome mining tools available for RiPPs.
| Tool | Function and RiPP classes | Platform | Input | Output |
| --- | --- | --- | --- | --- |
| BAGEL4 | BGC identification and annotation; multiple RiPP classes | Web | Sequence file (FASTA) or built-in set of publicly available genomes in the RefSeq database | HTML output showing BGC regions with gene annotations; sequence alignment with curated precursor peptides; downloadable GenBank files, FASTA files, gene tables and promoter/terminator information |
| antiSMASH5 | BGC identification, annotation and analysis; multiple RiPP classes | Web | Sequence file (FASTA, GenBank or EMBL) or NCBI nucleotide accession | HTML output showing BGC regions with gene annotations and predicted class; predicted precursor peptide (PP) and cleavage sites for some RiPP classes; downloadable GenBank files and other data for BGC regions; KnownClusterBlast analysis |
| PRISM4 | BGC identification, PP cleavage and PTM prediction; multiple RiPP classes | Web | Sequence file (FASTA or GenBank) | HTML output showing BGC regions with gene annotations and predicted class; predictions of core peptide and final structures; SMILES strings for predicted structures; FASTA sequences of BGCs |
| RiPPMiner | BGC identification and RiPP class; predictions of structure, cleavage and crosslinks; multiple RiPP classes | Web | Peptide: PP sequence (raw or FASTA) | HTML output with predicted structure and class; SMILES strings for predicted structures |
| | | | Genome: sequence file (FASTA) | HTML output showing identified clusters and annotations as well as peptide cleavage, crosslinks and structural predictions; SMILES strings of predicted structures; list of other small ORFs present in BGC |
| RODEO2 | RiPP BGC identification, PP identification and structural prediction; lasso peptides, lanthipeptides, thiopeptides and sactipeptides | Web or Python | List of bait protein accession numbers; optional: HMMs and configuration file | HTML files with BGC information and Pfam domain annotation; .csv files of PP sequences and BGC Pfam domains |
| RiPPER | PP and BGC recognition; class independent | Docker | List of bait protein accession numbers | GenBank files of retrieved BGCs annotated with short peptides; table of PP data; RODEO files for retrieved BGCs |
| NeuRiPP | PP recognition; class independent | Python | PP sequence file (FASTA) | File of sequences classified by NeuRiPP as positive PPs; separate file of non-RiPP peptides |
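Several tools in the table report small ORFs or short peptides found near RiPP tailoring genes as candidate precursor peptides. The following is a minimal sketch of that general idea only, assuming a plain six-frame scan for ATG-initiated ORFs within a length window. It is not the algorithm used by any of the listed tools; the function names and the default length thresholds are hypothetical, and alternative bacterial start codons are deliberately ignored.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def small_orfs(dna, min_aa=20, max_aa=120):
    """Yield (strand, start, peptide) for ATG-to-stop ORFs in all six frames.

    Hypothetical helper for illustration. `start` is a 0-based offset on the
    scanned strand (for "-" it refers to the reverse-complemented sequence).
    Only ATG is treated as a start codon, which is a simplifying assumption.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start, peptide = None, []
            for i in range(frame, len(seq) - 2, 3):
                aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X = ambiguous codon
                if start is None:
                    if aa == "M":
                        start, peptide = i, ["M"]
                elif aa == "*":
                    # Stop codon closes the ORF; keep it if it is "small".
                    if min_aa <= len(peptide) <= max_aa:
                        yield strand, start, "".join(peptide)
                    start, peptide = None, []
                else:
                    peptide.append(aa)
```

Calling `list(small_orfs(sequence))` on a BGC region returns candidate peptides that could then be passed to a dedicated classifier; real tools add gene-calling models, scoring and class-specific rules on top of a scan like this.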
Summary of MS-based mining tools available for RiPPs.
| Tool | Function and RiPP classes | Platform | Input | Output |
| --- | --- | --- | --- | --- |
| RiPPquest/MetaMiner | MS-guided genome mining, optimised for large datasets; multiple RiPP classes | Python or web (GNPS) | LC-MS/MS data file (MGF, mzXML, mzML or mzData) and sequence file (FASTA, antiSMASH GenBank output or BOA txt output) | .tsv files with information about identified peptides and RiPP class |
| Pep2Path | BGC identification from peptide MS data | Python | Comma-separated sequence of mass shifts or amino acids, and a sequence file (FASTA, GenBank or EMBL) | Table with best peptide matches |
| CycloNovo | Cyclopeptide identification and prediction | Python or web (GNPS) | MS data file (mzXML or MGF) | MGF file of identified cyclopeptide spectra; spectra listed with cyclopeptide scoring (txt); peptide sequencing reconstructions (txt) |
| DeepRiPP | PP structural and class predictions; BGC identification; multiple RiPP classes | Web | NLPPrecursor: PP sequence (FASTA) | NLPPrecursor: HTML output of predicted RiPP class and cleavage site |
| | | | BARLEY: core peptide sequence and RTE | BARLEY: HTML output of alignment with similar RiPPs and structure predictions |
| | | | CLAMS: MS data (mzML) | CLAMS: HTML output with list of MS peaks |
| | | | DeepRiPP (full): sequence file (FASTA) and optional MS file (mzML) | DeepRiPP (full): integrated HTML output of NLPPrecursor, BARLEY and CLAMS; attempted matching between structure prediction and MS data |
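The Pep2Path row lists a comma-separated sequence of mass shifts as input. As a rough illustration of how such mass shifts can be related to peptide sequence, the sketch below maps each shift to the unmodified proteinogenic residues whose monoisotopic mass falls within a tolerance, then looks for that tag in a candidate peptide. This is an assumption-laden toy and not Pep2Path's method: the function names and the 0.01 Da default tolerance are hypothetical, only the forward reading direction is checked, and post-translational modifications (which are the norm in RiPPs) are not modelled.

```python
# Monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def parse_shifts(text):
    """Parse a comma-separated string of mass shifts into floats."""
    return [float(tok) for tok in text.split(",") if tok.strip()]


def residues_for_shift(shift, tol=0.01):
    """Return residues whose mass lies within `tol` Da of `shift` (sorted)."""
    return sorted(
        aa for aa, mass in RESIDUE_MASS.items() if abs(mass - shift) <= tol
    )


def find_tag(peptide, shifts, tol=0.01):
    """Return 0-based start positions where the mass-shift tag fits `peptide`.

    Hypothetical helper: each consecutive shift must match the residue at the
    corresponding position, reading the peptide left to right only.
    """
    candidates = [set(residues_for_shift(s, tol)) for s in shifts]
    n = len(candidates)
    if n == 0:
        return []
    return [
        i
        for i in range(len(peptide) - n + 1)
        if all(peptide[i + j] in candidates[j] for j in range(n))
    ]
```

Leucine and isoleucine are isobaric, so a shift of about 113.084 Da returns both; distinguishing glutamine from lysine (about 0.036 Da apart) depends on the chosen tolerance and the instrument's mass accuracy.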
Summary of databases available for RiPPs and their BGCs.
| Database | Contents |
| --- | --- |
| ThioBase | Thiopeptide specific; structure and activity; BGCs and core peptide sequences; literature links |
| BACTIBASE | Structural and physicochemical properties of bacteriocins; literature and sequence database links |
| BAGEL database | RiPPs and bacteriocins; precursor peptide sequences; literature and sequence database links |
| RiPPMiner database | RiPP structures; precursor peptide sequences and modified residue details; literature links |
| IMG-ABC | NP BGC database from all genomes in IMG; all antiSMASH-identified NP classes; searchable by BGC class |
| MIBiG | Repository of NP BGCs; searchable by BGC class; structure and BGC details; literature links |
| antiSMASH database | antiSMASH outputs for sequenced bacterial genomes; all antiSMASH-identified NP classes; searchable by BGC class |
Fig. 3. Examples of RiPPs whose discovery was guided by the use of genome mining tools. The compound name, class and tool are listed alongside each structure.
Fig. 4. Overview of RiPP mining results for Streptomyces scabies 87–22. A. Genetic details of all RiPP BGCs identified by one or more tools. B. Summary of predictions made by each tool for a given RiPP BGC. Regions highlighted in red relate to predicted core peptides.
Fig. 5. Summary of structural predictions provided for lanthipeptide BGCs by antiSMASH, RiPPMiner, PRISM and DeepRiPP. A. Summary of predictions (note: both PRISM and DeepRiPP predict multiple possible RiPP products and only the first prediction is visualised here). B. Structures of two characterised lanthipeptides whose BGCs have homology to BGC1 and BGC2.