
Mini review: Genome mining approaches for the identification of secondary metabolite biosynthetic gene clusters in Streptomyces.

Namil Lee¹, Soonkyu Hwang¹, Jihun Kim¹, Suhyung Cho¹, Bernhard Palsson²,³,⁴, Byung-Kwan Cho¹,⁵,⁶.

Abstract

Streptomyces are a large and valuable resource of bioactive and complex secondary metabolites, many of which have important clinical applications. With the advances in high throughput genome sequencing methods, various in silico genome mining strategies have been developed and applied to the mapping of the Streptomyces genome. These studies have revealed that Streptomyces possess an even more significant number of uncharacterized silent secondary metabolite biosynthetic gene clusters (smBGCs) than previously estimated. Linking smBGCs to their encoded products has played a critical role in the discovery of novel secondary metabolites, as well as, knowledge-based engineering of smBGCs to produce altered products. In this mini review, we discuss recent progress in Streptomyces genome sequencing and the application of genome mining approaches to identify and characterize smBGCs. Furthermore, we discuss several challenges that need to be overcome to accelerate the genome mining process and ultimately support the discovery of novel bioactive compounds.


Keywords: Biosynthetic gene clusters; Genome mining; Secondary metabolites; Streptomyces

Year: 2020 PMID: 32637051 PMCID: PMC7327026 DOI: 10.1016/j.csbj.2020.06.024

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

Streptomyces species are filamentous Gram-positive soil bacteria that belong to the largest genus of Actinobacteria. They are well known for their ability to produce a wide array of bioactive secondary metabolites with antiviral, antifungal, anticancer, immunosuppressive, and antibiotic functions. The large number of secondary metabolites produced by these bacteria allows them to compete in diverse microbial communities and survive in various habitats, including soils, rivers, lakes, and marine ecosystems [1]. Since the first report of antibiotic production by Streptomyces in the 1940s, a significant number of novel antibiotics have been characterized by screening soil Streptomyces for antimicrobial activity against target pathogens. Most of the currently available antibiotic classes were discovered in and produced from Streptomyces species isolated between 1940 and 1962. However, after two decades of success with traditional biochemical screening approaches, the rediscovery rate of known species and compounds increased continuously, because cultured soil microorganisms constitute less than 0.1% of all soil microorganisms, and reached 99% in the late 1980s, with no new classes of antibiotics being approved since [2], [3]. Meanwhile, with the rapid emergence of broad-spectrum antibiotic resistance, it is increasingly important to isolate novel classes of antimicrobial compounds. This search for new bioactive products has reinvigorated the field of Streptomyces research [4]. One bottleneck in the traditional screening methods is that Streptomyces often downregulate or inactivate secondary metabolite production under axenic laboratory culture conditions. Secondary metabolites are produced by the multienzyme complexes encoded by secondary metabolite biosynthetic gene clusters (smBGCs).
smBGCs generally contain whole pathways that facilitate precursor biosynthesis, assembly, modification, resistance, and regulation of their product. The expression of these clusters is tightly controlled by complex regulatory networks governed by biotic and abiotic stresses found in the bacteria’s natural habitat [5]. Therefore, only a small fraction of secondary metabolites can be produced under laboratory culture conditions, especially when the precise environmental stimuli needed to induce their synthesis are unknown. To fully realize the biosynthetic potential of Streptomyces, it is necessary to develop tools that identify all of the smBGCs encoded in Streptomyces genomes in their entirety, including those that are silent under laboratory conditions. With the recent advances in DNA sequencing technology, the number of fully sequenced Streptomyces genomes has increased exponentially [6], [7]. As a result, it has become increasingly necessary to develop a suite of bioinformatics tools that can be used to annotate and mine these genomes. Several bioinformatics tools have been developed to identify smBGCs within the genome, including BAGEL [8], ClustScan [9], CLUSEAN [10], NP.searcher [11], PRISM [12], and antiSMASH [13], with most of them relying on the highly conserved sequences within smBGCs to map their location. Genome mining approaches have revealed that each Streptomyces species possesses about 30 smBGCs, including many clusters whose products are not yet identified. These findings support the hypothesis that the biosynthetic potential of Streptomyces has been underestimated [14]. Genome mining approaches enable quick and easy prediction of smBGCs from Streptomyces genome data, but characterizing the predicted smBGCs still requires extensive laboratory work, including the activation of silent smBGCs, purification of the final products, and determination of their chemical structures.
Therefore, accelerating the process linking the product with their corresponding smBGCs is of paramount importance in the effort to advance our practical understanding of the secondary metabolite biosynthetic pathways of these bacteria. This mini review focuses on the genome mining approaches for smBGC identification from Streptomyces genome data and their utility in discovering novel bioactive compounds. We briefly describe the current status of the Streptomyces genome sequencing projects and the importance of high-quality genomic data for smBGC identification. Next, we introduce several in silico genome mining tools that have been developed for this purpose and describe the characterization process of several key examples. Finally, we highlight future challenges that need to be overcome for the efficient discovery of novel secondary metabolites from Streptomyces.

Current status of the Streptomyces genome sequencing projects

Features of Streptomyces genomes

Unlike most other bacteria, Streptomyces have large linear chromosomes with high G + C content. The origin of replication (oriC) is usually located at the center of the linear chromosome, and terminal inverted repeat sequences (TIRs), which bind covalently to the terminal proteins, are found at each end [15]. Interestingly, the lengths and sequences of the TIRs are highly variable between species, and the number of TIR iterations does not correlate with the size of the genome [16]. The most distinctive feature of the Streptomyces genome is its high degree of chromosomal instability, which leads to frequent spontaneous deletions and rearrangements, especially at the ends of the chromosome. For example, about 0.5% of the germinating spores of Streptomyces lividans undergo large deletions that remove up to 25% of the genome (~2 Mbp) under laboratory culture conditions [17]. As a result, essential genes related to cell maintenance, including transcription, translation, and DNA replication, are located in the “core” region of the chromosome. In contrast, conditionally adaptive genes, especially those related to secondary metabolism, are usually located within the “arm” regions of the chromosome [18]. This chromosomal plasticity results in a high degree of variation in the smBGCs, which could have been acquired via prevalent gene duplications and horizontal gene transfer with other Streptomyces species or other bacteria [19].

Currently available Streptomyces genomes

Since the whole genome sequences of Streptomyces coelicolor A3(2) and Streptomyces avermitilis were completed by shotgun sequencing in 2003 [20], [21], a number of Streptomyces genome sequences have been reported. Next Generation Sequencing (NGS) has revolutionized the field and enabled a drastic increase in the number of reported Streptomyces genomes since 2013 (Fig. 1A). According to the RefSeq database, a total of 1,749 Streptomyces genomes had been deposited as of the 6th of February 2020, and more than 73% of the genomes were sequenced by NGS techniques, such as Illumina, PacBio, 454, and MinION. The 1,749 Streptomyces genomes comprise 867 contig-level genomes (i.e., genomes that include only contigs), 646 scaffold-level genomes (i.e., genomes that include scaffolds and contigs), 36 chromosome-level genomes (i.e., genomes that include chromosomes, scaffolds, and contigs), and 200 complete genomes (Fig. 1A) [6]. Considering that the 36 chromosome-level genomes were assembled using ambiguous (N) bases, and that three of the complete genomes contained ambiguous (N) bases, high-quality Streptomyces genomes comprise only about 11% of the total Streptomyces genomes available. The lengths of the 236 scaffold and complete chromosomes ranged from 5.9 to 12.7 Mbp, with an average G + C content of 71.7% (Fig. 1B). Large genome size and unusually high G + C content are the representative features of the Streptomyces genome, as mentioned above. Interestingly, the shorter the chromosome, the higher the observed G + C content (Fig. 1B). This is probably because the G + C content of the “core” region is highly conserved between species, while the “arms” are less conserved and have relatively low G + C content.
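The proportion of high-quality genomes quoted above can be reproduced from the assembly-level counts given in the text. The short sketch below only restates that arithmetic; the variable names are illustrative, and "high-quality" is taken here to mean complete genomes without ambiguous (N) bases, as the paragraph implies.

```python
# Assembly-level counts quoted in the text (RefSeq, 6 February 2020).
assembly_counts = {
    "contig": 867,
    "scaffold": 646,
    "chromosome": 36,
    "complete": 200,
}
# Complete genomes reported to still contain ambiguous (N) bases.
complete_with_n = 3

total = sum(assembly_counts.values())
# Assumption: only complete genomes free of N bases count as high quality.
high_quality = assembly_counts["complete"] - complete_with_n
high_quality_pct = 100 * high_quality / total

print(f"total genomes: {total}")
print(f"high-quality genomes: {high_quality} ({high_quality_pct:.1f}%)")
```

The four assembly levels sum to the 1,749 genomes reported, and the 197 remaining complete genomes correspond to roughly 11% of the total.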

Fig. 1

Current status of Streptomyces genome sequencing. (A) Annual number of deposited Streptomyces genomes in the RefSeq database as of the 6th of Feb 2020. (B) Chromosome length and G + C content of 236 scaffold and complete level Streptomyces genomes.

Importance of high-quality genome sequences for genome mining

SmBGC prediction from contig-level genomes is usually unreliable because the genes of an smBGC are often scattered across several contigs. As described above, approximately 90% of the reported Streptomyces genomes are incomplete, containing varying degrees of contig or ambiguous sequence contributions. Securing high-quality Streptomyces genome sequences is challenging because of the low fidelity of current sequencing techniques when dealing with high G + C and repetitive sequences [7]. Furthermore, because the chromosome is linear, it is more difficult to evaluate the completeness of a genome assembly than for bacteria with circular chromosomes. Completeness of the genome is typically quantified using Benchmarking Universal Single-Copy Orthologs (BUSCO), which measures the number of copies of single-copy genes in the sequence data and provides a quantitative assessment of genome assemblies and gene sets [22]. In 2016, the completeness of 653 Streptomyces genomes was analyzed using BUSCO, which revealed that about 36% of Streptomyces genomes have poor completeness [23]. Given that the Streptomyces BUSCO marker set used in this analysis included only 40 genes, whereas the number of single-copy genes in the Streptomyces genome now stands at 352, even the reported completeness of the Streptomyces genomes needs to be reassessed. In addition to assembly completeness, the quality of the genome sequence (i.e., the quality of bases) is also important for determining smBGCs because it affects the accuracy of coding sequence (CDS) prediction. Most smBGCs are composed of long core biosynthetic genes (>5 kb) containing repetitive sequences, so an inaccurate genome sequence often results in frameshift errors during the prediction of CDSs within the smBGCs.
For instance, the genome sequence of Streptomyces clavuligerus ATCC 27064, which produces the β-lactam class antibiotic clavulanic acid, has been determined, but the quality of the reported genomes was poor and they contained a large number of ambiguous (N) sequences [24], [25]. Recently, a high-quality genome sequence for S. clavuligerus was obtained using PacBio and Illumina sequencing methods, revealing that 2,184 genes out of a total of 7,163 genes were mispredicted or not predicted in the previous low-quality genome sequences, including 47 genes encoded in smBGCs. Accurate CDS prediction often improves the functional annotation of genes. For example, in a low-quality genome, CRV15_02370, which is located in a terpene BGC, was annotated as an unknown lipoprotein. In the high-quality genome, the exact sequence of the tandem ambiguous (N) bases located upstream of CRV15_02370 was determined, resulting in the correction and re-annotation of CRV15_02370 as 1-hydroxy-2-methyl-2-butenyl 4-diphosphate reductase [26]. As in the case of S. clavuligerus, combining the PacBio sequencing method, which generates long reads of several kb, with the Illumina sequencing method, which has a low error rate, could be the solution for obtaining high-quality Streptomyces genomes [6], [26]. In addition, the Oxford Nanopore sequencing method, which dominates long-read sequencing together with the PacBio sequencing method, is more cost-effective and provides even longer reads (current record of 2.3 Mbp) than the PacBio sequencing method [27]. Thus, the Oxford Nanopore sequencing method is expected to be an attractive alternative to the PacBio sequencing method for securing high-quality Streptomyces genomes. Nevertheless, the functional annotation of high-quality Streptomyces genomes still yields a considerable number of hypothetical proteins because of the limited number of experimentally validated genes in the database. Indeed, about 24% of total S. clavuligerus genes and 25% of smBGC-encoded genes were annotated as unknown genes. Even worse, current automated gene annotation pipelines use incorrect annotations from previous genomes to annotate new genomes, because the public annotation databases are not updated when annotation errors are corrected, which spreads misinformation about the functional roles of genes [28]. These incomplete functional annotations have hampered accurate genome mining of smBGCs and the mechanistic understanding of secondary metabolite biosynthesis. Frequent and efficient updates of the smBGC database, supported by individual functional genomic studies, would mitigate these problems.
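The idea behind a single-copy-marker completeness check can be illustrated with a small tally. The sketch below is a simplified illustration and not the BUSCO implementation: it assumes that marker detection has already been done elsewhere, it ignores the "fragmented" category that BUSCO also reports, and the function name and marker identifiers are hypothetical.

```python
from collections import Counter


def completeness_summary(marker_ids, hits):
    """Classify each expected single-copy marker by how often it was detected.

    marker_ids: marker gene identifiers expected once per genome.
    hits: marker identifiers detected in the assembly, one entry per copy.
    """
    marker_ids = list(marker_ids)
    if not marker_ids:
        raise ValueError("at least one marker is required")
    copies = Counter(hits)
    summary = {"single": 0, "duplicated": 0, "missing": 0}
    for marker in marker_ids:
        n = copies.get(marker, 0)
        if n == 0:
            summary["missing"] += 1
        elif n == 1:
            summary["single"] += 1
        else:
            summary["duplicated"] += 1
    found = summary["single"] + summary["duplicated"]
    summary["percent_complete"] = 100 * found / len(marker_ids)
    return summary


# Hypothetical marker set and detection results, for illustration only.
markers = ["m1", "m2", "m3", "m4", "m5"]
detected = ["m1", "m2", "m2", "m4"]
result = completeness_summary(markers, detected)
print(result)
```

With a small marker set, each missing marker shifts the percentage considerably, which is one way to see why a completeness estimate based on 40 markers may differ from one based on 352.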

Genome mining for smBGCs

Classical approaches for the identification of smBGCs

The traditional method for identifying smBGCs in Streptomyces relies on identifying the secondary metabolites using chemistry-based methods, such as mass spectrometry and NMR, and then isolating the corresponding biosynthetic genes by randomized gene deletion or mutagenesis, followed by screening for nonproducing clones [29], [30], [31]. This tedious method was improved after the identification of conserved regions in the smBGCs, which could be used to screen for unknown smBGCs. Secondary metabolites have tremendous structural diversity, but the biosynthetic machineries for secondary metabolites, including assembling and tailoring enzymes, belong to the same highly conserved enzyme families [32]. In particular, polyketides (PKs) and non-ribosomally synthesized peptides (NRPs) are assembled by polyketide synthases (PKSs) and non-ribosomal peptide synthetases (NRPSs), respectively, which are large multi-modular enzyme complexes consisting of highly conserved domains and sequences. Designing probes based on these conserved sequences and screening for smBGCs using Southern blots was a popular and widely used approach for several decades. One example of this approach is the identification of the BGC of the aminocoumarin antibiotic clorobiocin, which was discovered by screening the cosmid library of Streptomyces roseochromogenes using two heterologous probes designed against the sequence of the novobiocin BGC [33].

In silico tools for genome mining of smBGCs

The development of in silico nucleotide and amino acid sequence alignment tools, such as BLAST, DIAMOND, and HMMER, enabled researchers to mine for novel smBGCs in databases and genome sequences using a conserved sequence, without the time-consuming process of performing a Southern blot. The first microbial natural-product biosynthetic loci database for in silico genome mining of smBGCs was DECIPHER, a proprietary database constructed by Ecopia Biosciences Inc. [34]. Since then, various free databases and tools for smBGC prediction have been developed, including BAGEL [8], ClustScan [9], CLUSEAN [10], and NP.searcher [11]. Many of these tools have already been comprehensively reviewed, and the recently released web portal called “Secondary Metabolite Bioinformatics Portal” provides a description of and manual for each of these mining tools and databases [35]. However, most of these tools are limited to the discovery of specific classes of secondary metabolites, such as PKs and NRPs. PRISM and antiSMASH are representative in silico tools for predicting various types of smBGCs [12], [13]. These tools predict smBGC types using sequence alignment-based profile Hidden Markov Models (HMMs) of genes that are specific for certain types of smBGCs. For example, antiSMASH identifies smBGCs based on the highly conserved core biosynthetic enzymes and evaluates the results using a set of manually curated BGC cluster rules, followed by discarding false positives using negative models (e.g., fatty acid synthases are homologous to PKSs). The latest version, PRISM version 3, can identify 22 different types of smBGCs, and antiSMASH version 5 can predict up to 52 different types of smBGCs. Both tools are user-friendly web applications that provide rapid gene annotation when bacterial genomes are submitted in FASTA format, making them popular tools in current mining studies.
These are the most widely used genome mining tools, but such rule-based tools are restricted to detecting smBGCs that are similar to known pathways. Accordingly, smBGC mining tools that use machine learning strategies, such as ClusterFinder and DeepBGC, have recently been developed to allow the identification of unknown smBGCs [36], [37]. However, current machine learning-based genome mining tools have a much higher false-positive rate than the rule-based tools. Moreover, these tools are trained with a set of known clusters (e.g., the MIBiG database) or a set of clusters predicted by one of the rule-based tools (e.g., antiSMASH); thus, it is still challenging to detect completely novel smBGCs. SmBGC mining of Streptomyces genomes using these in silico tools confirmed that the genetic potential of Streptomyces to produce secondary metabolites has been underestimated. According to a genome-wide study of Actinobacteria, the genome of each Streptomyces species possesses about 40 smBGCs [14], [38]. Considering that Streptomyces is the largest genus of Actinobacteria (approximately 700 valid species at present) and that the smBGCs of each Streptomyces species are highly different, Streptomyces are an inestimable resource for the discovery of novel bioactive compounds. In addition, a recent genome mining study of the 1,110 publicly available Streptomyces genomes suggested the importance of genome mining at the strain level, as it increases the likelihood that researchers discover useful derivatives of known secondary metabolites and expands the diversity of recognized secondary metabolites used in new mining approaches [14].
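The rule-based strategy described above, in which profile hits on core biosynthetic enzymes are grouped and then checked against cluster rules, can be sketched in a few lines. This is a toy illustration under stated assumptions and not the antiSMASH or PRISM algorithm: the domain labels, the two rules, the `max_gap` distance, and the `Gene` record are all hypothetical, and negative models for false positives are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int          # genomic start coordinate (bp)
    end: int            # genomic end coordinate (bp)
    domains: frozenset  # profile hits assigned to this gene (hypothetical labels)


# Hypothetical rules: a cluster type is called when all required
# domain labels occur within one group of neighbouring genes.
RULES = {
    "type I PKS": {"PKS_KS", "PKS_AT"},
    "NRPS": {"Condensation", "AMP-binding"},
}


def call_clusters(genes, max_gap=10_000):
    """Group genes with rule-relevant domains lying within max_gap bp of
    each other, then label each group with the rules it satisfies."""
    relevant = set().union(*RULES.values())
    anchors = sorted((g for g in genes if g.domains & relevant),
                     key=lambda g: g.start)
    groups, current = [], []
    for gene in anchors:
        # Start a new group when the gap to the previous anchor is too large.
        if current and gene.start - current[-1].end > max_gap:
            groups.append(current)
            current = []
        current.append(gene)
    if current:
        groups.append(current)

    calls = []
    for group in groups:
        found = set().union(*(g.domains for g in group))
        types = [name for name, required in RULES.items() if required <= found]
        if types:
            calls.append((group[0].start, group[-1].end, types))
    return calls


# Invented gene models, for illustration only.
genes = [
    Gene("g1", 1_000, 4_000, frozenset({"PKS_KS"})),
    Gene("g2", 4_500, 9_000, frozenset({"PKS_AT"})),
    Gene("g3", 200_000, 203_000, frozenset({"Condensation"})),
    Gene("g4", 500_000, 501_000, frozenset()),
]
calls = call_clusters(genes)
print(calls)
```

In this example only the first group satisfies a rule; the isolated condensation domain is not called, which mirrors why such tools can only report clusters that resemble pathways already encoded in their rules.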

Characterizing smBGCs identified by genome mining

Although genome mining approaches showcase the full biosynthetic potential of Streptomyces, the predictions are of little value unless the predicted smBGCs are linked to their products. In this section, we describe several examples of genome mining approaches that connect various metabolites with their corresponding smBGCs using (i) reverse (metabolites to genes) or (ii) forward (genes to metabolites) approaches. The reverse approach allows researchers to determine the BGCs of known secondary metabolites, and the forward approach identifies the products of novel smBGCs (Fig. 2).

Fig. 2

Overview of genome mining approaches to identify smBGCs in Streptomyces. Minimum Information about a Biosynthetic Gene cluster (MIBiG) is a repository for secondary metabolite biosynthetic gene clusters.

In the pre-genomic era, especially in the golden age of antibiotic discovery (1950 to 1960), many Streptomyces species were isolated from the environment and screened for antimicrobial activity. However, after isolation, only the antimicrobial compounds were identified via chemistry-based methods, and in most cases, the corresponding smBGCs were not determined because of the lack of information and relevant technologies, including DNA sequencing methods [39], [40]. In recent years, advances in genome mining tools have allowed researchers to adopt a reverse approach to determine the BGCs of known secondary metabolites produced by Streptomyces. These efforts have made it possible to identify and elucidate the biosynthetic pathways of various important secondary metabolites much faster and more efficiently than conventional randomized mutagenesis-based methods (Table 1). For example, anthracimycin, a macrolide antibiotic that exhibits antibacterial activity against methicillin-resistant Staphylococcus aureus and vancomycin-resistant enterococci [41], was isolated from Streptomyces sp. T676 in 1995, but its BGC could not be determined at the time [42]. Recently, the genome sequence of Streptomyces sp. T676 was determined, and two type I modular PKS gene clusters were identified by genome mining using antiSMASH. Through additional bioinformatics analysis, one PKS gene cluster was identified as the candidate pathway for the production of anthracimycin, and heterologous expression of this BGC in S. coelicolor resulted in the production of anthracimycin [42]. SmBGC information obtained from reverse approaches has expanded smBGC databases, increasing the accuracy of the genome mining tools and the number of predictable smBGC types.

Table 1

Selected examples of the reverse approach in smBGC genome mining from Streptomyces.

Strains	Genome mining methods	Compound name	Year	Ref.
Streptomyces chromofuscus ATCC 49982	PKS gene search	Herboxidiene	2012	[54]
Streptomyces netropsis CGMCC 4.1650	BLAST	Pyrroleamides	2014	[55]
Streptomyces sp. T676	antiSMASH	Anthracimycin	2015	[42]
Streptomyces paulus NRRL 8115	antiSMASH	Paulomycin	2015	[60]
Streptomyces olivaceus strain FXJ7.023	antiSMASH	Lobophorin	2016	[56]
Streptomyces sp. MSC090213JE08	antiSMASH	Ishigamide	2016	[57]
Streptomyces leeuwenhoekii DSM 42122	antiSMASH	Chaxamycin	2016	[58]
Streptomyces sp. CNR-698	BLASTP	Ammosamides	2016	[59]
Streptomyces anulatus 3533-SV4	RiPP gene search	Telomestatin	2017	[61]
Streptomyces lydicus A02	BLASTP and antiSMASH	Natamycin	2017	[62]
Streptomyces sp. MP131-18	antiSMASH	Lynamicins and spiroindimicins	2017	[63]
Streptomyces sp. SD85	antiSMASH	Sceliphrolactam	2018	[64]
Streptomyces sp. strain fd1-xmd	antiSMASH	Streptothricin and tunicamycin	2018	[65]
Streptomyces koyangensis SCSIO 5802	antiSMASH	Neoabyssomicin and abyssomicin	2018	[66]
Streptomyces olivaceus FXJ8.012	BLAST	Mycemycin	2018	[67]
Streptomyces sp. ATCC 14903	antiSMASH and BLAST	Actinonin	2018	[68]
Streptomyces aureofaciens ATCC 31442	antiSMASH	Triacsins	2018	[69]
Streptomyces lunaelactis MM109^T	antiSMASH	Ferroverdins and bagremycins	2019	[70]
Streptomyces nigrescens HEK616	BLASTP	Streptoaminals	2019	[71]
Streptomyces sp. Tu 4128	antiSMASH	Bagremycin	2019	[72]
Streptomyces caniferus CA-271066	antiSMASH	Caniferolides	2019	[73]
Streptomyces sp. S816	antiSMASH	Pentamycin	2019	[74]
Streptomyces humidus CA-100629	antiSMASH	Humidimycin	2020	[75]
Streptomyces cacaoi subsp. cacaoi NBRC 12748 T	antiSMASH and NRPSsp	Pentaminomycin	2020	[76]

Accumulated Streptomyces genome sequences and advanced genome mining tools provide opportunities for the forward approach to smBGC identification, in which researchers first identify a novel smBGC in the genome and then identify its product (Table 2). Curacozole is the first sequential oxazole/methyloxazole/thiazole ring-containing macrocyclic peptide compound identified using a genome mining-based approach. Genome mining of Streptomyces curacoi identified a new precursor peptide gene for ribosomally synthesized and post-translationally modified peptides (RiPPs). Purifying the product of this RiPP BGC and determining its structure using ESI-MS and NMR resulted in the discovery of a new cytotoxic compound, curacozole [43]. In the case of curacozole, structure prediction was helped by the fact that RiPP structures are comparatively easier to predict from genomic data than those of other secondary metabolites, because the entire sequence of the core peptide translated from the nucleotide sequence is generally retained in the final product. To overcome the low productivity of curacozole and allow its robust purification, S. curacoi was treated with rifampicin to induce mutations, one of which occurred within the RNA polymerase β subunit and increased the production of secondary metabolites. Thus, successful forward approaches for smBGC genome mining require two things: (i) a predictable draft structure of the final product and (ii) expression of the novel smBGC at a level high enough to produce detectable quantities of the secondary metabolite.
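Because the core peptide of a RiPP is largely retained in the final product, a first draft of the product can be read from the precursor peptide sequence. The sketch below is a minimal illustration that assumes the leader length is already known and uses the common association of Cys, Ser, and Thr with thiazole, oxazole, and methyloxazole rings in azole-containing peptides; the precursor sequence, leader length, and function name are invented for illustration and are not the curacozole precursor.

```python
# Residues that azole-forming RiPP pathways can convert into heterocycles.
HETEROCYCLE_PRECURSORS = {"C": "thiazole", "S": "oxazole", "T": "methyloxazole"}


def core_peptide_summary(precursor, leader_length):
    """Split a precursor peptide into leader and core, and list the core
    residues that are candidates for heterocyclization."""
    if not 0 <= leader_length < len(precursor):
        raise ValueError("leader_length must be shorter than the precursor")
    core = precursor[leader_length:]
    candidates = [
        (position, residue, HETEROCYCLE_PRECURSORS[residue])
        for position, residue in enumerate(core, start=1)
        if residue in HETEROCYCLE_PRECURSORS
    ]
    return core, candidates


# Invented precursor: 8-residue leader followed by an 8-residue core.
precursor = "MKELNDAAFSTCSTCV"
core, candidates = core_peptide_summary(precursor, leader_length=8)
print(core)
for position, residue, ring in candidates:
    print(position, residue, ring)
```

Which of the candidate residues are actually modified still has to be established experimentally, for example by MS and NMR as in the curacozole study.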

Table 2

Selected examples of the forward approach in smBGC genome mining from Streptomyces.

Strains	Genome mining methods	Compound name	Year	Ref.
Streptomyces coelicolor M145	NRPS gene search	Coelichelin	2005	[77]
Streptomyces coelicolor M145	Type III PKS gene search	Germicidin	2006	[78]
Streptomyces venezuelae ATCC 10712	Lanthipeptides gene search	Venezuelin	2010	[79]
Streptomyces ambofaciens ATCC 23877	SEARCHPKS and SEARCHGTr	Stambomycins	2011	[80]
Streptomyces coeruleorubidus	BLASTP	Pacidamycin	2011	[81]
Streptomyces sp. W007	BLASTP	Angucyclinone antibiotics	2012	[82]
Streptomyces peucetius ATCC 27952	NRPS gene search	Siderophore	2013	[83]
Streptomyces sp. SANK 60404	BLASTP	Cembrane	2013	[84]
Streptomyces viridochromogenes DSM 40736	RiPPquest	Informatipeptin	2014	[85]
Streptomyces collinus Tü 365	antiSMASH	Streptocolin	2015	[86]
Streptomyces leeuwenhoekii strain C58	antiSMASH	Chaxapeptin	2015	[87]
Streptomyces chartreusis AN1542	BLASTP	Complestatin	2016	[88]
Streptomyces venezuelae ATCC 10712	BLASTP	Venemycin	2016	[89]
Streptomyces kebangsaanensis	antiSMASH	Phenazine antibiotic	2017	[90]
Streptomyces atratus SCSIO ZH16	antiSMASH	Ilamycins	2017	[91]
Streptomyces argillaceus ATCC 12956	antiSMASH	Argimycins P	2017	[92]
Streptomyces lavendulae FRI-5	antiSMASH	New diol-containing polyketide	2017	[93]
Streptomyces sp. YIM 130001	antiSMASH	Thiopeptide Antibiotic	2018	[94]
Streptomyces avermitilis KA-320	PKS gene search	Phthoxazolin A	2018	[95]
Streptomyces actuosus ATCC 25421	antiSMASH	Avermipeptin Analogue	2018	[96]
Streptomyces sp. YIM 130001	antiSMASH	Geninthiocin B	2018	[94]
Streptomyces sp. DUT11	antiSMASH and BLAST	Tunicamycin	2018	[97]
Streptomyces curacoi NBRC 12761 ^T	antiSMASH and BLAST	Curacozole (cytotoxic peptide)	2019	[43]
Streptomyces albus subsp. chlorinus NRRL B-24108	antiSMASH	Nybomycin	2018	[98]
Streptomyces isolates ICC1 and ICC4	antiSMASH	2′,5′–dimethoxyflavone and nordentatin	2019	[99]
Streptomyces hawaiiensis NRRL 15010	antiSMASH and BLAST	Acyldepsipeptide (ADEP)	2019	[100]
Streptomyces atratus SCSIO ZH16	antiSMASH	Atratumycin	2019	[101]
Streptomyces leeuwenhoekii C34^T	antiSMASH	Leepeptin	2019	[102]
Streptomyces olivaceus SCSIO T05	antiSMASH	Lobophorin CR4	2019	[103]
Streptomyces sp. Tü6314	antiSMASH	Streptoketides	2020	[104]

There are several computational methods to predict the putative products of smBGCs, which use databases of experimentally characterized smBGCs as a reference, especially for PKSs and NRPSs. These methods follow the basic rules of structure prediction: the substrate specificities of the catalytic domains in the PKS and NRPS modules are used to construct the backbone structure of the product, tailoring domains are then identified to estimate further modifications or cyclization of the compound, and the results are mapped back to the database to give the user an idea of the secondary metabolite produced by the unknown smBGC. The comprehensive genome mining tools antiSMASH and PRISM also provide chemical structure predictions of putative products from unknown smBGCs [44]. The accuracy of chemistry prediction depends on the algorithm and the database used to predict the catalytic domains of the enzyme and the substrate specificity of the domains. When PRISM version 1 was released, it was the only tool capable of predicting the chemical structure of type II PKs, and its chemistry prediction accuracy for NRPs and type I PKs was also much higher than that of antiSMASH version 3.0 or NP.searcher [45]. After further improvement, PRISM version 3 became available for chemical structure prediction of products arising from non-modular biosynthetic paradigms, including RiPPs, aminocoumarins, antimetabolites, bisindoles, and phosphonate-containing natural products [12]. AntiSMASH also improved its chemistry prediction when updated to version 4.0, but it provides conservative structure predictions compared to PRISM, which generates a wide range of combinatorial libraries of predicted structures by considering the uncertainty of tailoring sites [46].
Although the chemistry prediction accuracy of the most recent versions, PRISM version 4 and antiSMASH version 5.0, has never been compared, both tools can be used according to the user's research purposes. Despite the aforementioned advances in chemistry prediction, the lack of information on tailoring enzymes and the frequent assignment of neighboring smBGCs as hybrid smBGCs mean that chemistry predictions still require experimental validation. To fulfill the second requirement for forward approaches, several other technologies have been integrated into the genome mining approach to increase secondary metabolite production or activate silent smBGCs. Since most smBGCs of Streptomyces are silent under laboratory culture conditions, the expression level of an smBGC has to be altered so that enough of the secondary metabolite is produced before the metabolite can be linked to its smBGC. One method relies on the treatment of cultures with elicitors or mutagens to increase the expression of smBGCs, as in the case of curacozole discovery. Genome engineering is also a suitable method for inducing silent smBGCs. For example, one study used CRISPR-Cas9 to introduce constitutive promoters into silent novel smBGC loci, forcing the production of unique metabolites that were then evaluated by NMR [47]. Since smBGCs consist of dozens of genes, most studies have engineered the expression level of global or cluster-specific regulatory genes to efficiently activate the entire cluster. Genome engineering of Streptomyces for the characterization of silent smBGCs has been strengthened by the development of synthetic biology tools for Streptomyces [48]. However, genome engineering is not always applicable because of the difficulty of manipulating the genomes of these bacteria and their slow growth rates. Heterologous expression of silent smBGCs in other Streptomyces is a suitable alternative [49].
To enable this, significant effort has been put into the construction of Streptomyces chassis strains, which have reduced chemical diversity as a result of the removal of their endogenous smBGCs and can therefore be used as heterologous expression hosts for the characterization of novel smBGCs with reduced confounding effects [50]. The forward approach is significantly more challenging than the reverse approach when it comes to the chemical characterization of secondary metabolites. Notably, the existence of a large number of completely unknown genes, which may encode enzymes catalyzing the product tailoring steps, limits the accuracy of predictions in the forward approach. As the smBGC database constantly expands along with the accumulation of individual functional genomics experiments, the forward approach will continue to evolve, and it has the most potential for the isolation and identification of novel bioactive compounds from Streptomyces.
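The structure prediction logic discussed in this section, a backbone assembled from the predicted substrate of each module followed by uncertain tailoring steps, can be illustrated with a small enumeration. This is a sketch under stated assumptions and not the PRISM or antiSMASH algorithm: substrates are represented as plain labels rather than chemical structures, every tailoring step is assumed to be independent and optional, and the module substrates, tailoring names, and function names are hypothetical.

```python
from itertools import product


def backbone_from_modules(module_substrates):
    """Join the predicted substrate of each assembly-line module, in order."""
    return "-".join(module_substrates)


def enumerate_candidates(module_substrates, optional_tailoring):
    """List every combination of applied and unapplied optional tailoring
    steps for a single predicted backbone."""
    backbone = backbone_from_modules(module_substrates)
    candidates = []
    for applied in product([False, True], repeat=len(optional_tailoring)):
        steps = tuple(step for step, on in zip(optional_tailoring, applied) if on)
        candidates.append((backbone, steps))
    return candidates


# Hypothetical three-module NRPS with two uncertain tailoring reactions.
modules = ["Ser", "Val", "Orn"]
tailoring = ["N-methylation", "hydroxylation"]
library = enumerate_candidates(modules, tailoring)
for backbone, steps in library:
    print(backbone, steps)
```

The library doubles with each uncertain tailoring step, which gives a sense of the trade-off between a conservative single prediction and a combinatorial set of candidate structures that must be narrowed down experimentally.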

Summary and outlook

In this mini review, we discussed the current status of Streptomyces genome sequencing data and in silico genome mining tools for smBGC prediction. Technical advances in DNA sequencing and the rapid development of in silico genome mining tools demonstrate that the biosynthetic potential of Streptomyces has been vastly underestimated. We then showed, with several examples, that mining smBGCs from Streptomyces genomes and characterizing their corresponding products using forward and reverse approaches are feasible. Reverse approaches link known secondary metabolites to their corresponding smBGCs and expand the current smBGC databases, enhancing the accuracy and versatility of in silico genome mining tools. In contrast, forward approaches enable the discovery of novel bioactive compounds from Streptomyces, securing new drug candidates. The important lesson from the genome mining examples is that the major bottlenecks in this process are the limited ability to detect poorly characterized classes of smBGCs and the difficulty of determining the final products of detected smBGCs. Several challenges must be overcome to enable the efficient discovery of novel secondary metabolites from Streptomyces. The mechanistic understanding of secondary metabolite biosynthesis needed for accurate in silico structure prediction of putative smBGC products is still lacking, despite the knowledge accumulated so far. At the same time, the induction of silent smBGCs to experimentally validate the structure of final products remains difficult, which calls for a focus on the development of synthetic biology tools for genome engineering and on the construction of a Streptomyces chassis strain to facilitate heterologous expression. The ultimate use of Streptomyces smBGC information obtained from genome mining approaches will be the knowledge-based repurposing of smBGCs to produce derivatives of the original products or non-natural compounds that benefit human health and industry.
Recently, several groups have constructed new assembly lines for the production of fuels and synthetic industrial compounds by rearranging PKS and NRPS modules [51], [52]. In addition, ClusterCAD, an in silico toolkit for designing novel PKS assembly lines, has been developed and applied in several retro-biosynthesis studies [53]. If genome mining and the characterization of smBGC products are repeated in a positive feedback cycle, the accumulated knowledge could ultimately be used to design and generate synthetic BGCs for the production of novel bioactive compounds.

CRediT authorship contribution statement

Namil Lee: Conceptualization, Formal analysis, Writing - original draft, Writing - review & editing. Soonkyu Hwang: Formal analysis. Jihun Kim: Formal analysis. Suhyung Cho: Writing - original draft. Bernhard Palsson: Writing - original draft. Byung-Kwan Cho: Conceptualization, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

26 in total

1. Bioinformatic comparison of three Embleya species and description of steffimycins production by Embleya sp. NF3.

Authors: Karol Rodríguez-Peña; Maria Paula Gómez-Román; Martha Lydia Macías-Rubalcava; Leticia Rocha-Zavaleta; Romina Rodríguez-Sanoja; Sergio Sánchez
Journal: Appl Microbiol Biotechnol Date: 2022-04-11 Impact factor: 4.813

2. Genomic Organization of Streptomyces flavotricini NGL1 and Streptomyces erythrochromogenes HMS4 Reveals Differential Plant Beneficial Attributes and Laccase Production Capabilities.

Authors: Richa Salwan; Randhir Kaur; Vivek Sharma
Journal: Mol Biotechnol Date: 2021-11-16 Impact factor: 2.695

3. Genome Mining and Metabolomics Unveil Pseudonochelin: A Siderophore Containing 5-Aminosalicylate from a Marine-Derived Pseudonocardia sp. Bacterium.

Authors: Fan Zhang; René F Ramos Alvarenga; Kurt Throckmorton; Shaurya Chanana; Doug R Braun; Jen Fossen; Miao Zhao; Sue McCrone; Mary Kay Harper; Scott R Rajski; Warren E Rose; David R Andes; Michael G Thomas; Tim S Bugni
Journal: Org Lett Date: 2022-06-01 Impact factor: 6.072

Review 4. Targeted Large-Scale Genome Mining and Candidate Prioritization for Natural Product Discovery.

Authors: Jessie James Limlingan Malit; Hiu Yu Cherie Leung; Pei-Yuan Qian
Journal: Mar Drugs Date: 2022-06-16 Impact factor: 6.085

Review 5. Marine Microbial-Derived Resource Exploration: Uncovering the Hidden Potential of Marine Carotenoids.

Authors: Ray Steven; Zalfa Humaira; Yosua Natanael; Fenny M Dwivany; Joko P Trinugroho; Ari Dwijayanti; Tati Kristianti; Trina Ekawati Tallei; Talha Bin Emran; Heewon Jeon; Fahad A Alhumaydhi; Ocky Karna Radjasa; Bonglee Kim
Journal: Mar Drugs Date: 2022-05-26 Impact factor: 6.085

6. Unique Physiological and Genetic Features of Ofloxacin-Resistant Streptomyces Mutants.

Authors: Kanata Hoshino; Ryoko Hamauzu; Hiroyuki Nakagawa; Shinya Kodani; Takeshi Hosaka
Journal: Appl Environ Microbiol Date: 2021-12-22 Impact factor: 5.005

Review 7. Recent Advances in Discovery of Lead Structures from Microbial Natural Products: Genomics- and Metabolomics-Guided Acceleration.

Authors: Linda Sukmarini
Journal: Molecules Date: 2021-04-27 Impact factor: 4.411

8. Multi-omics Comparative Analysis of Streptomyces Mutants Obtained by Iterative Atmosphere and Room-Temperature Plasma Mutagenesis.

Authors: Tan Liu; Zhiyong Huang; Xi Gui; Wei Xiang; Yubo Jin; Jun Chen; Jing Zhao
Journal: Front Microbiol Date: 2021-01-28 Impact factor: 5.640

9. StreptomeDB 3.0: an updated compendium of streptomycetes natural products.

Authors: Aurélien F A Moumbock; Mingjie Gao; Ammar Qaseem; Jianyu Li; Pascal A Kirchner; Bakoh Ndingkokhar; Boris D Bekono; Conrad V Simoben; Smith B Babiaka; Yvette I Malange; Florian Sauter; Paul Zierep; Fidele Ntie-Kang; Stefan Günther
Journal: Nucleic Acids Res Date: 2021-01-08 Impact factor: 16.971

Review 10. Synthetic biology approaches to actinomycete strain improvement.

Authors: Rainer Breitling; Martina Avbelj; Oksana Bilyk; Francesco Del Carratore; Alessandro Filisetti; Erik K R Hanko; Marianna Iorio; Rosario Pérez Redondo; Fernando Reyes; Michelle Rudden; Emmanuele Severi; Lucija Slemc; Kamila Schmidt; Dominic R Whittall; Stefano Donadio; Antonio Rodríguez García; Olga Genilloud; Gregor Kosec; Davide De Lucrezia; Hrvoje Petković; Gavin Thomas; Eriko Takano
Journal: FEMS Microbiol Lett Date: 2021-06-11 Impact factor: 2.742