Kai Blin, Hyun Uk Kim, Marnix H Medema, Tilmann Weber.
Abstract
Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows core biosynthetic enzymes to be identified using genome mining strategies based on the sequence similarity of the enzymes/genes involved. However, mining for a variety of BGCs quickly reaches a level of complexity at which manual analysis is no longer feasible and automated genome mining pipelines, such as the antiSMASH software, are required. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats such as rule-based BGC detection, sequence and annotation quality, and cluster boundary prediction, all of which have to be considered when planning, performing and analyzing the results of genome mining studies.
Keywords: antiSMASH; antibiotics; biosynthetic gene cluster; genome mining; natural products; secondary metabolites
Year: 2019 PMID: 29112695 PMCID: PMC6781578 DOI: 10.1093/bib/bbx146
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
URLs of Web servers, Web tools and databases referred to in the review
| Tool | Functions | URL | Reference |
|---|---|---|---|
| antiSMASH 4 | Genome mining; BGC analysis; Domain analysis | | |
| antiSMASH database | BGC database | | |
| ARTS | Genome mining | | |
| BAGEL 3 | Genome mining | | |
| CASSIS | BGC boundary prediction | | |
| CRISPy-web | sgRNA design | | |
| eSNaPD v2 | Genome mining | | |
| FunGeneClusterS | BGC boundary prediction | | |
| fungiSMASH | Genome mining; BGC analysis; Domain analysis | | |
| GNP | Metabolomics | | |
| GRAPE/GARLIC | Genome mining | | |
| MIBiG | BGC database; reference data set | | |
| NaPDoS | Genome mining | | |
| NORINE | Nonribosomal peptide database | | |
| NP.searcher | Genome mining; Domain analysis | | |
| NRPSpredictor | Domain analysis | | |
| plantiSMASH | Genome mining; BGC analysis | | |
| PRISM 3 | Genome mining; BGC analysis; Domain analysis | | |
| RODEO | Genome mining; RiPP analysis | | |
| (SEARCHPKS)/SBSPKS v2 | Domain analysis; BGC database | | |
| Smiles2Monomers | Retro-biosynthetic monomer prediction | | |
| SMURF | Genome mining | | |
Figure 1. General workflow of an antiSMASH analysis of bacterial, fungal and plant genomes. Computational resources in the left and right boxes have been integrated with antiSMASH 4 for enhanced genome mining performance, whereas those in the box at the bottom correspond to third-party applications that use antiSMASH for the detection of BGCs.
A: BGC types detectable by pHMM-based rules with antiSMASH, PRISM and SMURF. B: Rule-independent methods to detect BGCs
A: Rule-based detection of gene clusters^a

| BGC-type | antiSMASH | PRISM/RiPP PRISM | SMURF |
|---|---|---|---|
| Aminocoumarins | X | X | |
| Aminoglycosides/ aminocyclitols | X | ||
| Antimetabolites | X | ||
| Aryl polyenes | X | X | |
| Autoinducing peptide | X | ||
| Bacteriocins | X | ||
| Beta-lactams | X | X | |
| Bottromycin | X | X | |
| Butyrolactones | X | X | |
| ClusterFinder fatty acid | X | ||
| ClusterFinder saccharide | X | ||
| ComX | X | ||
| Cyanobactins | X | X | |
| Ectoines | X | X | |
| Furan | X | X | |
| Fused (pheganomycin-like) | X | ||
| Glycocin | X | X | |
| Head-to-tail cyclized peptide | X | X | |
| Homoserine lactone | X | X | |
| Indoles | X | X | |
| Ladderane lipids | X | X | |
| Lantipeptides class I | X | X | |
| Lantipeptides class II | X | X | |
| Lantipeptides class III/IV | X | X | |
| Lasso peptide | X | X | |
| Linaridin | X | X | |
| Linear azol(in)e-containing | X | X | |
| Melanins | X | X | |
| Microcin | X | ||
| Microviridin | X | X | |
| Nonribosomal peptides | X | X | X |
| Nucleosides | X | ||
| Oligosaccharide | X | ||
| Other (unusual) PKS | X | ||
| Others | X | ||
| Phenazine | X | X | |
| Phosphoglycolipids | X | X | |
| Phosphonate | X | X | |
| Polyunsaturated fatty acids | X | ||
| Prochlorosin | X | ||
| Proteusin | X | X | |
| Sactipeptide | X | X | |
| Non-NRP siderophores | X | ||
| Streptide | X | ||
| Terpene | X | X | |
| Thiopeptides | X | X | |
| Thioviridamide | X | ||
| Trans-AT type I PKS | X | X | |
| Trifolitoxin | X | ||
| Type I PKS | X | X | X |
| Type II PKS | X | X | |
| Type III PKS | X | X | |
| YM-216391 | X | ||
B: Rule-independent methods

| Method | Principle | Implemented in | References |
|---|---|---|---|
| ClusterFinder | HMM-based classification of which PFAM domains are likely to be found inside or outside a BGC | antiSMASH | |
| EvoMining | Phylogenomic identification of enzymes with expanded substrate spectrum; such enzymes are often found in BGCs | EvoMining | |
| Resistance gene-based mining | Identification of potential antibiotic resistance genes; often such genes are part of BGCs to provide self-protection of the producing organism | ARTS | |
^a For details on the pHMMs and specific rules used by the different genome mining programs, please consult the original publications of antiSMASH [6, 32], PRISM [24, 34] or SMURF [28].
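
To make the idea of rule-based detection more concrete, the following is a minimal toy sketch, not the actual rule set or algorithm of antiSMASH, PRISM or SMURF. It assumes that each gene has already been annotated with pHMM domain hits, and that a rule simply lists the domains that must co-occur within a distance cutoff. The domain labels (`KS`, `AT`, `terpene_synthase`), the cutoffs, the `extension` value and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int
    end: int
    domains: frozenset  # labels of pHMM hits found in this gene (toy labels)


# Hypothetical rules: BGC type -> (domains that must all be present,
# maximum gap in bp allowed between neighbouring core genes).
RULES = {
    "Type I PKS": ({"KS", "AT"}, 20000),
    "Terpene": ({"terpene_synthase"}, 10000),
}


def _emit(bgc_type, group, required, extension):
    """Return a cluster call if the grouped core genes satisfy the rule."""
    if not group:
        return []
    found = set().union(*(g.domains for g in group))
    if not required <= found:
        return []  # e.g. a lone KS gene without an AT partner
    # Boundaries are only approximated by padding the core genes.
    start = max(0, min(g.start for g in group) - extension)
    end = max(g.end for g in group) + extension
    return [(bgc_type, start, end)]


def detect_clusters(genes, rules, extension=5000):
    """Group nearby core genes per rule and report padded cluster regions."""
    calls = []
    ordered = sorted(genes, key=lambda g: g.start)
    for bgc_type, (required, cutoff) in rules.items():
        core = [g for g in ordered if g.domains & required]
        group = []
        for gene in core:
            if group and gene.start - group[-1].end > cutoff:
                calls.extend(_emit(bgc_type, group, required, extension))
                group = []
            group.append(gene)
        calls.extend(_emit(bgc_type, group, required, extension))
    return calls


EXAMPLE_GENES = [
    Gene("g1", 1000, 4000, frozenset({"KS"})),
    Gene("g2", 4500, 7000, frozenset({"AT"})),
    Gene("g3", 60000, 62000, frozenset({"KS"})),  # isolated, rule not met
    Gene("g4", 90000, 91000, frozenset({"terpene_synthase"})),
]

if __name__ == "__main__":
    for call in detect_clusters(EXAMPLE_GENES, RULES):
        print(call)
```

In this sketch the reported region is simply the span of the core genes padded by a fixed `extension`, which illustrates why boundaries predicted this way are approximate, and why a cluster type for which no rule has been written cannot be found at all.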