
Thirty complete Streptomyces genome sequences for mining novel secondary metabolite biosynthetic gene clusters.

Namil Lee1, Woori Kim1, Soonkyu Hwang1, Yongjae Lee1, Suhyung Cho1, Bernhard Palsson2,3,4, Byung-Kwan Cho5,6,7.   

Abstract

Streptomyces are Gram-positive bacteria of significant industrial importance due to their ability to produce a wide range of antibiotics and bioactive secondary metabolites. Recent advances in genome mining have revealed that Streptomyces genomes possess a large number of unexplored silent secondary metabolite biosynthetic gene clusters (smBGCs). This indicates that Streptomyces genomes continue to be an invaluable source for new drug discovery. Here, we present high-quality genome sequences of 22 Streptomyces species and eight different Streptomyces venezuelae strains assembled by a hybrid strategy exploiting both long-read and short-read genome sequencing methods. The assembled genomes have at least 97.4% gene space completeness and total lengths ranging from 6.7 to 10.1 Mbp. Annotation identified on average 7,000 protein coding genes, 20 rRNAs, and 68 tRNAs per genome. In silico prediction of smBGCs identified a total of 922 clusters, including many clusters whose products are unknown. We anticipate that the availability of these genomes will accelerate discovery of novel secondary metabolites from Streptomyces and elucidate complex smBGC regulation.


Year:  2020        PMID: 32054853      PMCID: PMC7018776          DOI: 10.1038/s41597-020-0395-9

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

With the rapid emergence of antimicrobial resistance (AMR) to all major classes of antibiotics and the declining number of candidate new antibiotics, there is a pressing need to discover novel antibacterial compounds[1]. Streptomyces, soil-dwelling Gram-positive bacteria, continue to be promising producers of clinically important secondary metabolites, including not only antibiotics but also antiviral, antifungal, antiparasitic, antitumor, and immunosuppressive compounds[2]. Streptomyces are distinguished by their complex life cycle and the high G + C content (often over 70%) of their linear genomes[3,4]. Traditionally, drug discovery from Streptomyces has relied on bioactivity screening followed by mass spectrometry- and NMR-based molecular identification[5]. However, recent genomics-based approaches have revealed that most secondary metabolite biosynthetic gene clusters (smBGCs) of streptomycetes are inactive under laboratory conditions, suggesting that the capacity of streptomycetes to produce secondary metabolites has been underestimated[5,6]. On average, each Streptomyces species has the genetic potential to produce more than 30 secondary metabolites, which are diverse and differ between species[7,8]. Because Streptomyces is the largest genus of Actinobacteria, with approximately 900 species characterized so far, streptomycetes are a valuable resource for the discovery of novel secondary metabolites[9]. smBGCs, especially polyketide synthase and non-ribosomal peptide synthetase types, often contain extraordinarily long genes (>5 kb) encoding multi-modular enzymes with repetitive domain structures. Accurate gene annotation based on high-quality genome sequences is therefore essential for the precise identification of smBGCs[10]. For example, annotation of the high-quality S. clavuligerus genome revealed that 30% of its 7,163 protein coding genes had been incorrectly annotated in the previous draft genome, which contained ambiguous and inaccurate nucleotides[11]. In addition, high-quality genome sequences are essential for multi-omics analysis, which facilitates understanding of the complex regulation of smBGCs and rational engineering to increase secondary metabolite production[11,12]. Of the 1,614 streptomycete genomes deposited in the NCBI Assembly database as of 9 December 2019, only 189 and 35 assemblies were designated as complete genome level and chromosome level, respectively. More than 86% of assemblies were draft-quality genome sequences containing multiple fragmented contigs or ambiguous sequences[4,13-15]. One of the main obstacles to obtaining high-quality genomic information for streptomycetes is the low fidelity of sequencing techniques on high G + C genomes and on frequent repetitive sequences such as terminal inverted repeats[13]. In addition, because streptomycetes have linear chromosomes, it is difficult to confirm the completeness of an assembled chromosome. In this study, we present high-quality genome sequences of 30 streptomycetes, increasing the total number of reported complete Streptomyces genomes by about 10%. The target streptomycetes were 22 Streptomyces type strains and eight different Streptomyces venezuelae strains, most of which are currently used as industrial strains for producing various bioactive compounds. We applied a hybrid assembly strategy combining long-read (PacBio) and short-read (Illumina) sequencing to obtain complete genome sequences.
PacBio sequencing produces reads several kb long that can read through low-complexity regions, enabling the assembly of repetitive regions that are difficult to assemble from Illumina reads, even at high coverage[16]. However, Illumina reads have a lower error rate than PacBio reads, and contigs assembled from Illumina reads are not simply a subset of those assembled from PacBio reads[13,17]. Combining PacBio and Illumina sequencing therefore yields more complete genomes by overcoming the shortcomings of each method. Using PacBio (0.5 to 3.0 Gbp) and Illumina (0.46 to 5.18 Gbp) sequencing data, we assembled streptomycete genomes of 6.7 to 10.1 Mbp, most of which consist of a single chromosome, with an average G + C content of 72%. Inaccurate sequences in the assembled genomes were corrected using Illumina reads. The complete streptomycete genomes have at least 97.4% gene space completeness, and on average 7,000 protein coding genes, 20 rRNAs, and 68 tRNAs were annotated per genome. Finally, based on the complete genome sequences and annotations, we predicted a total of 922 smBGCs. The complete genome sequences and newly identified smBGCs in this study should provide a fundamental resource for understanding the genetic basis of streptomycetes and for discovering novel secondary metabolites.
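The G + C content quoted for each assembly is the fraction of G and C among the called bases. The Python function below is a minimal sketch of that calculation; the name `gc_content` is hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption, since the text does not say how they were treated.

```python
def gc_content(seq: str) -> float:
    """Return the G + C content of a nucleotide sequence as a percentage.

    Assumption: ambiguous bases (e.g. N) are excluded from the denominator,
    so they neither raise nor lower the estimate.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    if called == 0:
        raise ValueError("sequence contains no unambiguous A/C/G/T bases")
    return 100.0 * gc / called


# Toy GC-rich fragment (not real Streptomyces sequence): 10 G/C out of 12 called bases.
print(round(gc_content("GGCGCCATGCNNGC"), 1))  # 83.3
```

Applied to a whole assembly, the same ratio gives the per-genome values reported in Table 1.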

Methods

Genomic DNA (gDNA) extraction

A total of 30 streptomycetes were purchased from the Korean Collection for Type Cultures (KCTC, Korea). Each stock was inoculated into 50 mL of liquid culture medium containing 0.16 g mL−1 of glass beads (3 ± 0.3 mm diameter) in a 250 mL baffled flask and grown at 30 °C in an orbital shaker at 200 rpm. Each strain was grown in one of four culture media: R5(–) medium (25 mM TES (pH 7.2), 103 g L−1 sucrose, 1% (w/v) glucose, 5 g L−1 yeast extract, 10.12 g L−1 MgCl2∙6H2O, 0.25 g L−1 K2SO4, 0.1 g L−1 casamino acids, 0.08 g L−1 ZnCl2, 0.4 mg L−1 FeCl3, 0.02 mg L−1 CuCl2∙2H2O, 0.02 mg L−1 MnCl2∙4H2O, 0.02 mg L−1 Na2B4O7∙10H2O, and 0.02 mg L−1 (NH4)6Mo7O24∙4H2O), 1 × sporulation medium (3.33 g L−1 glucose, 1 g L−1 yeast extract, 1 g L−1 beef extract, 2 g L−1 tryptose, and 0.006 g L−1 FeSO4∙7H2O), YEME medium (340 g L−1 sucrose, 10 g L−1 glucose, 3 g L−1 yeast extract, 5 g L−1 bacto peptone, and 3 g L−1 Oxoid malt extract), or MYM medium (4 g L−1 maltose, 4 g L−1 yeast extract, and 10 g L−1 malt extract). For gDNA extraction, 25 mL of culture was harvested at the exponential growth phase and washed twice with the same volume of 10 mM EDTA, followed by lysozyme (10 mg mL−1) treatment at 37 °C for 45 min. gDNA was extracted using a Wizard Genomic DNA Purification Kit (Promega, Madison, WI, USA) according to the manufacturer's instructions. The quality and quantity of the extracted gDNA were evaluated by 1% agarose gel electrophoresis and NanoDrop (Thermo Fisher Scientific, Waltham, MA, USA), respectively.
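The media are specified per litre, whereas each culture used 50 mL with 0.16 g mL−1 of glass beads. A small Python sketch of the scaling arithmetic, using the MYM recipe as an example; the names `MYM_G_PER_L` and `scale_recipe` are hypothetical, and components given as % (w/v) or mM (as in R5(–)) would first need converting to g L−1.

```python
# MYM medium as listed in the Methods, in grams per litre.
MYM_G_PER_L = {
    "maltose": 4.0,
    "yeast extract": 4.0,
    "malt extract": 10.0,
}

GLASS_BEADS_G_PER_ML = 0.16  # glass beads added per mL of culture


def scale_recipe(recipe_g_per_l: dict[str, float], volume_ml: float) -> dict[str, float]:
    """Return the grams of each component needed for `volume_ml` of medium."""
    litres = volume_ml / 1000.0
    return {name: conc * litres for name, conc in recipe_g_per_l.items()}


volume_ml = 50.0
for name, grams in scale_recipe(MYM_G_PER_L, volume_ml).items():
    print(f"{name}: {grams:.2f} g")
print(f"glass beads: {GLASS_BEADS_G_PER_ML * volume_ml:.1f} g")
```

For a 50 mL culture this gives 0.20 g each of maltose and yeast extract, 0.50 g of malt extract, and 8.0 g of glass beads.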

Short-read (Illumina) genome sequencing

To construct the short-read sequencing library, 2.5 μg of gDNA was sheared to approximately 350 bp with a Covaris instrument (Covaris Inc., Woburn, MA, USA) under the following conditions: power 175, duty factor 20%, cycles per burst 200, time 23 s, 8 times. The library was constructed using a TruSeq DNA PCR-Free LT kit (Illumina Inc., San Diego, CA, USA) following the manufacturer's instructions. Briefly, the fragmented DNA was cleaned up and end-repaired, followed by adaptor ligation and bead-based size selection for fragments of 400 to 500 bp. Library quantity was measured using the Qubit® dsDNA HS Assay Kit (Thermo Fisher Scientific), and library size was determined using an Agilent 2200 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Of the constructed libraries, 29 were sequenced on a HiSeq 2500 (Illumina Inc.) as 100 bp single-end reads, and the remaining library, for S. tsukubaensis, was sequenced on a MiSeq v2 (Illumina Inc.) with a 50 bp single-read recipe. In total, 0.46 to 5.18 Gbp of raw sequence data were obtained per strain, and read quality was examined using the sequencing QC report function of CLC Genomics Workbench version 6.5.1 (CLC bio, Denmark) (Online-only Table 1 and Fig. 1a).
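The 0.46 to 5.18 Gbp range follows from read count × read length in Online-only Table 1, and Fig. 1a reports quality on the Phred scale, where a score Q corresponds to an error probability of 10^(−Q/10). A minimal Python sketch of both calculations; the function names are hypothetical.

```python
def raw_yield_gbp(n_reads: int, mean_read_length_bp: float) -> float:
    """Raw sequence yield in Gbp: number of reads x mean read length."""
    return n_reads * mean_read_length_bp / 1e9


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to a base-call error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Smallest and largest Illumina runs in Online-only Table 1.
print(round(raw_yield_gbp(9_181_843, 50), 2))    # S. tsukubaensis, 50 bp MiSeq: 0.46
print(round(raw_yield_gbp(51_795_644, 100), 2))  # S. venezuelae ATCC 14585: 5.18
# A Q30 base call has roughly a 1-in-1,000 chance of being wrong.
print(phred_to_error_prob(30))
```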
Online-only Table 1

Summary of PacBio and Illumina genome sequencing data for 30 streptomycetes.

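No. | Species | Strain | Platform | Raw reads (No.) | Mean raw read length (bp) | Clean reads (No.) | Mean clean read length (bp) | SRA accession
1 | Streptomyces clavuligerus | ATCC 27064 | Illumina | 9,853,205 | 100 | 9,852,872 | 100 | SRR9192366
1 | Streptomyces clavuligerus | ATCC 27064 | PacBio | 150,292 | 6,572 | 60,834 | 15,473 | SRR9290551
2 | Streptomyces tsukubaensis | NRRL 18488 | Illumina | 9,181,843 | 50 | 9,176,299 | 50 | SRR9192343
2 | Streptomyces tsukubaensis | NRRL 18488 | PacBio | 150,292 | 8,163 | 93,481 | 12,715 | SRR9290550
3 | Streptomyces galilaeus | ATCC 14969 | Illumina | 15,603,988 | 100 | 15,603,366 | 100 | SRR9192357
3 | Streptomyces galilaeus | ATCC 14969 | PacBio | 150,292 | 5,458 | 95,691 | 8,339 | SRR9290549
4 | Streptomyces nitrosporeus | ATCC 12769 | Illumina | 15,858,551 | 100 | 15,857,973 | 100 | SRR9192358
4 | Streptomyces nitrosporeus | ATCC 12769 | PacBio | 150,292 | 5,007 | 101,572 | 7,222 | SRR9290548
5 | Streptomyces subrutilus | ATCC 27467 | Illumina | 23,157,475 | 100 | 23,157,337 | 100 | SRR9192359
5 | Streptomyces subrutilus | ATCC 27467 | PacBio | 150,292 | 8,757 | 92,569 | 13,591 | SRR9290547
6 | Streptomyces viridosporus | T7A (ATCC 39115) | Illumina | 24,919,393 | 100 | 24,918,458 | 100 | SRR9192360
6 | Streptomyces viridosporus | T7A (ATCC 39115) | PacBio | 150,292 | 7,910 | 92,139 | 12,571 | SRR9290546
7 | Streptomyces kanamyceticus | ATCC 12853 | Illumina | 14,877,606 | 100 | 14,877,044 | 100 | SRR9192361
7 | Streptomyces kanamyceticus | ATCC 12853 | PacBio | 150,292 | 5,044 | 59,627 | 12,291 | SRR9290545
8 | Streptomyces aureofaciens | ATCC 11989 | Illumina | 19,155,832 | 100 | 19,155,718 | 100 | SRR9192362
8 | Streptomyces aureofaciens | ATCC 11989 | PacBio | 150,292 | 5,890 | 88,462 | 9,681 | SRR9290544
9 | Streptomyces prasinus | ATCC 13879 | Illumina | 18,655,418 | 100 | 18,654,718 | 100 | SRR9192388
9 | Streptomyces prasinus | ATCC 13879 | PacBio | 150,292 | 5,560 | 84,107 | 9,288 | SRR9290553
10 | Streptomyces fradiae | ATCC 10745 | Illumina | 9,446,513 | 100 | 9,445,832 | 100 | SRR9192354
10 | Streptomyces fradiae | ATCC 10745 | PacBio | 150,292 | 6,843 | 94,568 | 10,538 | SRR9290561
11 | Streptomyces alboniger | ATCC 12461 | Illumina | 16,425,356 | 100 | 16,425,241 | 100 | SRR9192355
11 | Streptomyces alboniger | ATCC 12461 | PacBio | 300,584 | 3,638 | 159,270 | 6,351 | SRR9290560
12 | Streptomyces coeruleorubidus | ATCC 13740 | Illumina | 17,546,981 | 100 | 17,546,866 | 100 | SRR9192339
12 | Streptomyces coeruleorubidus | ATCC 13740 | PacBio | 300,584 | 2,836 | 121,377 | 6,164 | SRR9290563
13 | Streptomyces cinereoruber | ATCC 19740 | Illumina | 20,182,754 | 100 | 20,181,890 | 100 | SRR9192340
13 | Streptomyces cinereoruber | ATCC 19740 | PacBio | 150,292 | 3,319 | 52,279 | 9,266 | SRR9290562
14 | Streptomyces nodosus | ATCC 14899 | Illumina | 14,034,927 | 100 | 14,034,843 | 100 | SRR9192341
14 | Streptomyces nodosus | ATCC 14899 | PacBio | 150,292 | 8,387 | 94,719 | 12,692 | SRR9290557
15 | Streptomyces vinaceus | ATCC 27476 | Illumina | 6,612,160 | 100 | 6,610,219 | 100 | SRR9192335
15 | Streptomyces vinaceus | ATCC 27476 | PacBio | 150,292 | 4,984 | 75,928 | 9,432 | SRR9290559
16 | Streptomyces platensis | ATCC 23948 | Illumina | 14,819,357 | 100 | 14,819,243 | 100 | SRR9192336
16 | Streptomyces platensis | ATCC 23948 | PacBio | 150,292 | 4,869 | 72,573 | 9,619 | SRR9290558
17 | Streptomyces spectabilis | ATCC 27465 | Illumina | 18,717,018 | 100 | 18,716,280 | 100 | SRR9192337
17 | Streptomyces spectabilis | ATCC 27465 | PacBio | 300,584 | 10,049 | 202,966 | 14,047 | SRR9290555
18 | Streptomyces chartreusis | ATCC 14922 | Illumina | 16,988,966 | 100 | 16,988,850 | 100 | SRR9192338
18 | Streptomyces chartreusis | ATCC 14922 | PacBio | 150,292 | 8,813 | 88,869 | 14,106 | SRR9290554
19 | Streptomyces rimosus | ATCC 10970 | Illumina | 25,109,758 | 100 | 25,108,764 | 100 | SRR9192333
19 | Streptomyces rimosus | ATCC 10970 | PacBio | 300,584 | 7,066 | 164,891 | 12,342 | SRR9290564
20 | Streptomyces albofaciens | ATCC 23873 | Illumina | 19,360,703 | 100 | 19,360,564 | 100 | SRR9192364
20 | Streptomyces albofaciens | ATCC 23873 | PacBio | 150,292 | 11,091 | 110,521 | 14,419 | SRR9290552
21 | Streptomyces filamentosus | ATCC 23958 | Illumina | 21,181,460 | 100 | 21,180,615 | 100 | SRR9192342
21 | Streptomyces filamentosus | ATCC 23958 | PacBio | 150,292 | 4,805 | 70,677 | 8,178 | SRR9290556
22 | Streptomyces venezuelae | ATCC 10712 | Illumina | 18,667,122 | 100 | 18,664,663 | 100 | SRR9192374
22 | Streptomyces venezuelae | ATCC 10712 | PacBio | 150,292 | 9,101 | 87,860 | 15,328 | SRR9290565
23 | Streptomyces venezuelae | ATCC 21113 | Illumina | 21,561,533 | 100 | 21,559,491 | 100 | SRR9192334
23 | Streptomyces venezuelae | ATCC 21113 | PacBio | 150,292 | 10,148 | 104,695 | 14,180 | SRR9290566
24 | Streptomyces venezuelae | ATCC 10595 | Illumina | 16,798,923 | 100 | 16,797,236 | 100 | SRR9192369
24 | Streptomyces venezuelae | ATCC 10595 | PacBio | 150,292 | 7,829 | 92,681 | 12,197 | SRR9290567
25 | Streptomyces venezuelae | ATCC 15068 | Illumina | 15,310,620 | 100 | 15,308,905 | 100 | SRR9192368
25 | Streptomyces venezuelae | ATCC 15068 | PacBio | 150,292 | 10,754 | 107,412 | 14,515 | SRR9290568
26 | Streptomyces venezuelae | ATCC 14583 | Illumina | 19,423,668 | 100 | 19,421,332 | 100 | SRR9192371
26 | Streptomyces venezuelae | ATCC 14583 | PacBio | 150,292 | 10,888 | 106,813 | 14,647 | SRR9290569
27 | Streptomyces venezuelae | ATCC 14584 | Illumina | 15,447,783 | 100 | 15,446,240 | 100 | SRR9192370
27 | Streptomyces venezuelae | ATCC 14584 | PacBio | 150,292 | 8,844 | 100,523 | 12,580 | SRR9290540
28 | Streptomyces venezuelae | ATCC 14585 | Illumina | 51,795,644 | 100 | 51,791,331 | 100 | SRR9192373
28 | Streptomyces venezuelae | ATCC 14585 | PacBio | 150,292 | 8,275 | 96,243 | 12,388 | SRR9290541
29 | Streptomyces venezuelae | ATCC 21782 | Illumina | 19,569,337 | 100 | 19,567,069 | 100 | SRR9192372
29 | Streptomyces venezuelae | ATCC 21782 | PacBio | 150,292 | 6,539 | 70,655 | 13,089 | SRR9290542
30 | Streptomyces venezuelae | ATCC 21018 | Illumina | 16,469,406 | 100 | 16,467,790 | 100 | SRR9192375
30 | Streptomyces venezuelae | ATCC 21018 | PacBio | 150,292 | 10,754 | 107,412 | 14,515 | SRR9290543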
Fig. 1

Quality of the genome sequencing data. (a) Distribution of Illumina read quality based on Phred score. (b) Read quality distribution of PacBio reads. The black line indicates the total number of bases in reads whose read quality is greater than the corresponding value on the x-axis.


Long-read (PacBio) genome sequencing

A total of 5 μg of gDNA was used as input for PacBio sequencing library preparation. The library was constructed with the PacBio SMRTbell™ Template Prep Kit (Pacific Biosciences, Menlo Park, CA, USA) following the manufacturer's instructions. Fragments smaller than 20 kbp were removed using the BluePippin size selection system (Sage Science, Beverly, MA, USA), and the libraries were validated using an Agilent 2100 Bioanalyzer (Agilent Technologies). Final SMRTbell libraries were sequenced using one or two SMRT cells with P6-C4 chemistry (DNA Sequencing Reagent 4.0) on the PacBio RS II platform (Pacific Biosciences). Approximately 0.5 to 3.0 Gbp of raw sequence data were generated per strain (Online-only Table 1).
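Dividing the total number of sequenced bases by the genome size gives an approximate fold coverage. A rough Python sketch for S. spectabilis (two SMRT cells), using the read counts and mean read lengths from Online-only Table 1 and the corrected scaffold length from Table 1; treating read count × mean length as total bases is an approximation, and the function name `mean_depth` is hypothetical.

```python
def mean_depth(total_bases: int, genome_size_bp: int) -> float:
    """Approximate fold coverage: total sequenced bases / genome size."""
    return total_bases / genome_size_bp


# S. spectabilis values transcribed from Online-only Table 1 and Table 1.
raw_bases = 300_584 * 10_049    # raw reads x mean raw read length
clean_bases = 202_966 * 14_047  # clean reads x mean clean read length
genome_bp = 9_807_160           # scaffold length after correction

print(f"raw:   {raw_bases / 1e9:.2f} Gbp, ~{mean_depth(raw_bases, genome_bp):.0f}x")
print(f"clean: {clean_bases / 1e9:.2f} Gbp, ~{mean_depth(clean_bases, genome_bp):.0f}x")
```

This strain sits at the top of the 0.5 to 3.0 Gbp range, at roughly 300-fold raw coverage; single-cell runs on smaller genomes give proportionally less.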

Genome assembly

Raw PacBio reads were filtered to retain only those with a read quality value greater than 0.75 and a length greater than 50 bp (Fig. 1b). The filtered reads were assembled with the Hierarchical Genome Assembly Process workflow (HGAP, version 2.3), including consensus polishing with Quiver[18]. For each assembled contig, error correction was performed based on the estimated genome size and average coverage. Raw Illumina reads were quality-trimmed using CLC Genomics Workbench version 6.5.1 (ambiguous limit 2 and quality limit 0.05) and assembled using its de novo assembly function with default parameters. To extend the assembled contigs, all assembled PacBio and Illumina contigs were aligned using MAUVE 2.4.0[19] and linked using the GAP5 program (Staden package)[20].
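The read-filtering thresholds amount to a simple predicate on each read. The Python sketch below only illustrates those two cut-offs; in the actual workflow the filtering was done within the PacBio/HGAP pipeline, and the `Read` record and helper names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical minimal record for one PacBio read."""
    name: str
    length_bp: int
    read_quality: float  # PacBio read quality on a 0-1 scale


MIN_READ_QUALITY = 0.75  # keep reads with quality strictly greater than 0.75
MIN_LENGTH_BP = 50       # keep reads strictly longer than 50 bp


def passes_filter(read: Read) -> bool:
    return read.read_quality > MIN_READ_QUALITY and read.length_bp > MIN_LENGTH_BP


def filter_reads(reads: list[Read]) -> list[Read]:
    return [r for r in reads if passes_filter(r)]


reads = [
    Read("r1", 12_000, 0.86),  # kept
    Read("r2", 9_500, 0.74),   # dropped: quality too low
    Read("r3", 40, 0.90),      # dropped: too short
]
print([r.name for r in filter_reads(reads)])  # ['r1']
```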

Genome correction

Quality-trimmed Illumina reads were mapped to the assembled genome using CLC Genomics Workbench version 6.5.1 (mismatch cost 2, insertion cost 3, deletion cost 3, length fraction 0.9, and similarity fraction 0.9). At conflicting positions where more than 80% of the mapped Illumina reads supported a different sequence, the assembly was corrected to the Illumina sequence (Table 1). In addition, the percentage of Illumina reads mapped to the assembled genome was used as an indicator of its completeness (Table 1 and Fig. 2b). Gene space completeness was estimated using BUSCO v3 (Table 2)[21].
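The 80% rule for resolving conflicts can be expressed per position as a frequency test on the Illumina pileup. The following Python sketch is an illustration of that rule only, not the CLC Genomics Workbench implementation; the function name is hypothetical, and alleles are plain strings so that a deletion can be written as '-' and an insertion as a multi-base string.

```python
from collections import Counter

CONSENSUS_THRESHOLD = 0.80  # Illumina support required to overwrite the assembly


def corrected_base(assembly_base: str, illumina_alleles: list[str]) -> str:
    """Return the allele to keep at one conflicting position.

    `illumina_alleles` holds the allele observed in each Illumina read mapped
    over the position. If the most frequent allele differs from the assembly
    and is supported by more than 80% of those reads, it replaces the assembly
    allele; otherwise the assembly allele is kept.
    """
    if not illumina_alleles:
        return assembly_base
    allele, count = Counter(illumina_alleles).most_common(1)[0]
    if allele != assembly_base and count / len(illumina_alleles) > CONSENSUS_THRESHOLD:
        return allele
    return assembly_base


print(corrected_base("A", ["G"] * 9 + ["A"]))      # 'G' (90% support)
print(corrected_base("A", ["G"] * 7 + ["A"] * 3))  # 'A' (70% support)
```

Note that exactly 80% support does not trigger a correction, matching the "more than 80%" wording.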
Table 1

The statistics of genome assembly and correction.

No. | Species | Final scaffolds (No.) | Scaffold length before correction (bp) | Mapped Illumina reads (%) | Conflict positions (No.) | Added bases (No.) | Deleted bases (No.) | Scaffold length after correction (bp) | G + C content (%) | Assembly accession
1 | Streptomyces clavuligerus | 2 | 6,748,589 and 1,795,496 | 71.16 and 14.03 | 7 | 4 | 3 | 6,748,591 and 1,795,495 | 72.5 | GCA_005519465.1
2 | Streptomyces tsukubaensis | 1 | 7,963,727 | 95.13 | 15 | 15 | 0 | 7,963,742 | 71.9 | GCA_003932715.1
3 | Streptomyces galilaeus | 1 | 7,756,176 | 90.56 | 51 | 34 | 16 | 7,756,194 | 71.4 | GCA_008704575.1
4 | Streptomyces nitrosporeus | 1 | 7,581,543 | 93.50 | 51 | 35 | 16 | 7,581,562 | 72.2 | GCA_008704555.1
5 | Streptomyces subrutilus | 1 | 7,604,705 | 96.41 | 286 | 269 | 0 | 7,604,974 | 73.4 | GCA_008704535.1
6 | Streptomyces viridosporus T7A | 1 | 7,280,447 | 90.44 | 90 | 89 | 0 | 7,280,536 | 72.6 | GCA_008704515.1
7 | Streptomyces kanamyceticus | 1 | 10,133,525 | 99.09 | 376 | 375 | 3 | 10,133,897 | 71.0 | GCA_008704495.1
8 | Streptomyces aureofaciens | 1 | 7,757,873 | 84.86 | 16 | 9 | 5 | 7,757,877 | 72.6 | GCA_008704475.1
9 | Streptomyces prasinus | 1 | 7,646,576 | 89.70 | 1,025 | 1,021 | 5 | 7,647,592 | 72.0 | GCA_008704445.1
10 | Streptomyces fradiae | 1 | 6,725,574 | 97.63 | 5 | 5 | 0 | 6,725,579 | 74.7 | GCA_008704425.1
11 | Streptomyces alboniger | 1 | 7,962,594 | 99.12 | 193 | 193 | 1 | 7,962,786 | 71.2 | GCA_008704395.1
12 | Streptomyces coeruleorubidus | 1 | 9,334,399 | 99.67 | 1,297 | 1,299 | 0 | 9,335,698 | 71.1 | GCA_008705135.1
13 | Streptomyces cinereoruber | 1 | 7,516,474 | 99.74 | 178 | 178 | 0 | 7,516,652 | 72.9 | GCA_009299385.1
14 | Streptomyces nodosus | 1 | 7,772,564 | 99.51 | 26 | 25 | 2 | 7,772,587 | 70.9 | GCA_008704995.1
15 | Streptomyces vinaceus | 1 | 7,673,329 | 92.46 | 180 | 180 | 0 | 7,673,509 | 72.3 | GCA_008704935.1
16 | Streptomyces platensis | 1 | 8,500,673 | 99.75 | 354 | 352 | 13 | 8,501,012 | 71.1 | GCA_008704855.1
17 | Streptomyces spectabilis | 1 | 9,806,222 | 95.30 | 934 | 938 | 0 | 9,807,160 | 72.4 | GCA_008704795.1
18 | Streptomyces chartreusis | 1 | 9,911,637 | 98.42 | 461 | 461 | 0 | 9,912,098 | 71.0 | GCA_008704715.1
19 | Streptomyces rimosus | 1 | 9,361,132 | 96.22 | 22 | 22 | 0 | 9,361,154 | 72.0 | GCA_008704655.1
20 | Streptomyces albofaciens | 2 | 4,757,761 and 4,494,336 | 53.36 and 45.53 | 504 | 501 | 3 | 4,757,978 and 4,494,617 | 72.3 | GCA_008634025.1
21 | Streptomyces filamentosus | 2 | 5,742,252 and 2,129,928 | 75.22 and 24.28 | 3,218 | 3,228 | 1 | 5,744,022 and 2,131,385 | 73.6 | GCA_008634015.1
22 | Streptomyces venezuelae ATCC 10712 | 1 | 8,223,439 | 99.84 | 96 | 81 | 15 | 8,223,505 | 72.5 | GCA_008639165.1
23 | Streptomyces venezuelae ATCC 21113 | 1 | 7,893,622 | 99.85 | 173 | 181 | 0 | 7,893,803 | 72.5 | GCA_008639045.1
24 | Streptomyces venezuelae ATCC 10595 | 1 | 7,871,449 | 95.50 | 35 | 34 | 3 | 7,871,480 | 72.5 | GCA_008705255.1
25 | Streptomyces venezuelae ATCC 15068 | 1 | 8,557,615 | 99.71 | 587 | 587 | 0 | 8,558,202 | 71.9 | GCA_008642375.1
26 | Streptomyces venezuelae ATCC 14583 | 1 | 8,018,461 | 87.17 | 29 | 27 | 4 | 8,018,484 | 71.3 | GCA_008642355.1
27 | Streptomyces venezuelae ATCC 14584 | 1 | 8,941,823 | 99.00 | 255 | 255 | 0 | 8,942,078 | 71.2 | GCA_008642315.1
28 | Streptomyces venezuelae ATCC 14585 | 1 | 8,048,139 | 82.34 | 64 | 41 | 26 | 8,048,154 | 71.3 | GCA_008642335.1
29 | Streptomyces venezuelae ATCC 21782 | 1 | 7,525,235 | 90.50 | 87 | 87 | 0 | 7,525,322 | 71.9 | GCA_008642295.1
30 | Streptomyces venezuelae ATCC 21018 | 1 | 7,746,214 | 91.61 | 59 | 57 | 4 | 7,746,267 | 72.1 | GCA_008642275.1
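Each row of Table 1 should satisfy a simple bookkeeping identity: the corrected length equals the original length plus inserted bases minus deleted bases. Substitutions do not change length, so the number of conflict positions need not equal added + deleted. A small Python consistency check on a few rows transcribed from Table 1; the function and dictionary names are hypothetical.

```python
def corrected_length(before_bp: int, added: int, deleted: int) -> int:
    """Scaffold length after correction = length before + added bases - deleted bases."""
    return before_bp + added - deleted


# (length before, added, deleted, reported length after), transcribed from Table 1.
# For the two-scaffold S. filamentosus assembly, lengths are summed over both scaffolds.
ROWS = {
    "S. galilaeus": (7_756_176, 34, 16, 7_756_194),
    "S. kanamyceticus": (10_133_525, 375, 3, 10_133_897),
    "S. venezuelae ATCC 14585": (8_048_139, 41, 26, 8_048_154),
    "S. filamentosus": (5_742_252 + 2_129_928, 3_228, 1, 5_744_022 + 2_131_385),
}

for name, (before, added, deleted, after) in ROWS.items():
    status = "consistent" if corrected_length(before, added, deleted) == after else "MISMATCH"
    print(f"{name}: {status}")
```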
Fig. 2

Genome assembly of 30 streptomycetes. (a) Strategy for genome assembly and corrections. (b) Profile of Illumina reads mapped on assembled genomes. Data were visualized using SignalMap (Roche NimbleGen, Inc.). Red line indicates the average Illumina read coverage of all genomic positions.

Table 2

Gene space completeness of completed genomes.

No. | Species | Complete and single-copy | Complete and duplicated | Fragmented | Missing | Total | Gene space completeness (%)
1 | Streptomyces clavuligerus | 343 | 0 | 0 | 9 | 352 | 97.4
2 | Streptomyces tsukubaensis | 350 | 0 | 0 | 2 | 352 | 99.4
3 | Streptomyces galilaeus | 351 | 0 | 0 | 1 | 352 | 99.7
4 | Streptomyces nitrosporeus | 352 | 0 | 0 | 0 | 352 | 100.0
5 | Streptomyces subrutilus | 349 | 0 | 0 | 3 | 352 | 99.1
6 | Streptomyces viridosporus T7A | 351 | 0 | 0 | 1 | 352 | 99.7
7 | Streptomyces kanamyceticus | 352 | 0 | 0 | 0 | 352 | 100.0
8 | Streptomyces aureofaciens | 350 | 0 | 0 | 2 | 352 | 99.4
9 | Streptomyces prasinus | 350 | 0 | 0 | 2 | 352 | 99.4
10 | Streptomyces fradiae | 351 | 0 | 0 | 1 | 352 | 99.7
11 | Streptomyces alboniger | 351 | 0 | 0 | 1 | 352 | 99.7
12 | Streptomyces coeruleorubidus | 351 | 0 | 0 | 1 | 352 | 99.7
13 | Streptomyces cinereoruber | 351 | 0 | 0 | 1 | 352 | 99.7
14 | Streptomyces nodosus | 350 | 0 | 1 | 1 | 352 | 99.4
15 | Streptomyces vinaceus | 349 | 0 | 1 | 2 | 352 | 99.1
16 | Streptomyces platensis | 351 | 0 | 0 | 1 | 352 | 99.7
17 | Streptomyces spectabilis | 350 | 0 | 1 | 1 | 352 | 99.4
18 | Streptomyces chartreusis | 351 | 0 | 0 | 1 | 352 | 99.7
19 | Streptomyces rimosus | 351 | 0 | 0 | 1 | 352 | 99.7
20 | Streptomyces albofaciens | 346 | 4 | 0 | 2 | 352 | 99.4
21 | Streptomyces filamentosus | 351 | 0 | 0 | 1 | 352 | 99.7
22 | Streptomyces venezuelae ATCC 10712 | 352 | 0 | 0 | 0 | 352 | 100.0
23 | Streptomyces venezuelae ATCC 21113 | 352 | 0 | 0 | 0 | 352 | 100.0
24 | Streptomyces venezuelae ATCC 10595 | 352 | 0 | 0 | 0 | 352 | 100.0
25 | Streptomyces venezuelae ATCC 15068 | 351 | 0 | 0 | 1 | 352 | 99.7
26 | Streptomyces venezuelae ATCC 14583 | 351 | 0 | 0 | 1 | 352 | 99.7
27 | Streptomyces venezuelae ATCC 14584 | 351 | 0 | 0 | 1 | 352 | 99.7
28 | Streptomyces venezuelae ATCC 14585 | 351 | 0 | 0 | 1 | 352 | 99.7
29 | Streptomyces venezuelae ATCC 21782 | 349 | 0 | 0 | 3 | 352 | 99.1
30 | Streptomyces venezuelae ATCC 21018 | 350 | 0 | 0 | 2 | 352 | 99.4
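The gene space completeness column in Table 2 is consistent with the usual BUSCO definition: complete BUSCOs (single-copy plus duplicated) divided by the total number searched. A minimal Python sketch; the function name is hypothetical.

```python
def gene_space_completeness(single: int, duplicated: int, total: int) -> float:
    """Percentage of complete BUSCOs (single-copy + duplicated) out of all BUSCOs searched."""
    return 100.0 * (single + duplicated) / total


print(round(gene_space_completeness(343, 0, 352), 1))  # S. clavuligerus: 97.4
print(round(gene_space_completeness(346, 4, 352), 1))  # S. albofaciens: 99.4
```

Fragmented BUSCOs are not counted as complete, which is why S. nodosus (350 complete, 1 fragmented) is reported as 99.4% rather than 99.7%.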

Genome annotation and secondary metabolite biosynthetic gene cluster prediction

The complete genome sequences were submitted to the NCBI GenBank database and annotated with the latest version of the NCBI Prokaryotic Genome Annotation Pipeline (PGAP)[22]. Using the GenBank-formatted file of each genome as input, secondary metabolite biosynthetic gene clusters were predicted with antiSMASH 4.0[23].

Data Records

Raw reads from short-read (Illumina) and long-read (PacBio) sequencing were deposited in the NCBI Sequence Read Archive (SRA) (Online-only Table 1)[24,25]. The 30 complete genome sequences were deposited in GenBank via NCBI's submission portal (Table 3)[26-55]. Detailed information on the 922 predicted smBGCs in the 30 streptomycete genomes has been deposited in Figshare[56].
Table 3

Summary of genome annotation.

No. | Species | CDS (No.) | rRNA (No.) | tRNA (No.) | Genome accession | BioProject accession
1 | Streptomyces clavuligerus | 6,880 | 18 | 66 | CP027858 | PRJNA414136
2 | Streptomyces tsukubaensis | 6,376 | 18 | 66 | CP020700 | PRJNA382016
3 | Streptomyces galilaeus | 6,725 | 18 | 76 | CP023703 | PRJNA412292
4 | Streptomyces nitrosporeus | 6,364 | 18 | 74 | CP023702 | PRJNA412292
5 | Streptomyces subrutilus | 6,431 | 21 | 68 | CP023701 | PRJNA412292
6 | Streptomyces viridosporus T7A | 6,211 | 18 | 70 | CP023700 | PRJNA412292
7 | Streptomyces kanamyceticus | 8,384 | 18 | 66 | CP023699 | PRJNA412292
8 | Streptomyces aureofaciens | 6,453 | 33 | 71 | CP023698 | PRJNA412292
9 | Streptomyces prasinus | 6,263 | 18 | 68 | CP023697 | PRJNA412292
10 | Streptomyces fradiae | 5,465 | 18 | 65 | CP023696 | PRJNA412292
11 | Streptomyces alboniger | 6,613 | 18 | 67 | CP023695 | PRJNA412292
12 | Streptomyces coeruleorubidus | 8,058 | 18 | 67 | CP023694 | PRJNA412292
13 | Streptomyces cinereoruber | 6,392 | 18 | 69 | CP023693 | PRJNA412292
14 | Streptomyces nodosus | 6,491 | 18 | 68 | CP023747 | PRJNA412292
15 | Streptomyces vinaceus | 6,603 | 21 | 68 | CP023692 | PRJNA412292
16 | Streptomyces platensis | 7,032 | 21 | 67 | CP023691 | PRJNA412292
17 | Streptomyces spectabilis | 8,212 | 18 | 65 | CP023690 | PRJNA412292
18 | Streptomyces chartreusis | 8,396 | 18 | 71 | CP023689 | PRJNA412292
19 | Streptomyces rimosus | 7,756 | 21 | 68 | CP023688 | PRJNA412292
20 | Streptomyces albofaciens | 7,520 | 21 | 67 | PDCM00000000 | PRJNA412292
21 | Streptomyces filamentosus | 6,832 | 24 | 70 | PDCL00000000 | PRJNA412292
22 | Streptomyces venezuelae ATCC 10712 | 7,377 | 21 | 67 | CP029197 | PRJNA454547
23 | Streptomyces venezuelae ATCC 21113 | 6,987 | 21 | 67 | CP029196 | PRJNA454547
24 | Streptomyces venezuelae ATCC 10595 | 6,942 | 21 | 67 | CP029195 | PRJNA454547
25 | Streptomyces venezuelae ATCC 15068 | 7,700 | 21 | 69 | CP029194 | PRJNA454547
26 | Streptomyces venezuelae ATCC 14583 | 7,154 | 18 | 66 | CP029193 | PRJNA454547
27 | Streptomyces venezuelae ATCC 14584 | 7,832 | 18 | 65 | CP029192 | PRJNA454547
28 | Streptomyces venezuelae ATCC 14585 | 7,096 | 18 | 66 | CP029191 | PRJNA454547
29 | Streptomyces venezuelae ATCC 21782 | 6,655 | 18 | 69 | CP029190 | PRJNA454547
30 | Streptomyces venezuelae ATCC 21018 | 6,769 | 21 | 71 | CP029189 | PRJNA454547
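The per-genome averages quoted in the text (about 7,000 CDSs, 20 rRNAs, and 68 tRNAs) can be recomputed from the three count columns of Table 3. A short Python sketch using values transcribed from the table in row order; the variable names are illustrative.

```python
from statistics import mean

# Count columns transcribed from Table 3 (rows 1-30, in table order).
cds = [6880, 6376, 6725, 6364, 6431, 6211, 8384, 6453, 6263, 5465,
       6613, 8058, 6392, 6491, 6603, 7032, 8212, 8396, 7756, 7520,
       6832, 7377, 6987, 6942, 7700, 7154, 7832, 7096, 6655, 6769]
rrna = [18, 18, 18, 18, 21, 18, 18, 33, 18, 18,
        18, 18, 18, 18, 21, 21, 18, 18, 21, 21,
        24, 21, 21, 21, 21, 18, 18, 18, 18, 21]
trna = [66, 66, 76, 74, 68, 70, 66, 71, 68, 65,
        67, 67, 69, 68, 68, 67, 65, 71, 68, 67,
        70, 67, 67, 67, 69, 66, 65, 66, 69, 71]

print(f"CDS:  {mean(cds):.1f}")   # about 7,000
print(f"rRNA: {mean(rrna):.1f}")  # about 20
print(f"tRNA: {mean(trna):.1f}")  # about 68
```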

Technical Validation

Streptomyces have drawn considerable attention because of their ability to produce various clinically important secondary metabolites. A total of 30 streptomycete genomes were sequenced using both PacBio and Illumina sequencing to elucidate their biosynthetic potential. After read cleaning, on average 98,380 PacBio reads with a mean length of 11,725 bp and 18,223,235 Illumina reads of 100 bp (50 bp for S. tsukubaensis) were obtained per strain (Fig. 1a,b and Online-only Table 1). By assembling reads from the two platforms with HGAP, CLC Genomics Workbench, MAUVE, and GAP5, single linear scaffolds of 6.7 to 10.1 Mbp with an average G + C content of 72% were obtained for 27 streptomycetes, whereas two scaffolds were constructed for each of the remaining three: S. clavuligerus (6.7 and 1.8 Mbp), S. albofaciens (4.8 and 4.5 Mbp), and S. filamentosus (5.7 and 2.1 Mbp) (Table 1). S. clavuligerus has been reported to carry a large linear plasmid of 1.8 Mbp, so its chromosome was correctly assembled as a single scaffold with the plasmid as the second scaffold, whereas the S. albofaciens and S. filamentosus genomes appear to have been split into two scaffolds[11,57]. To increase the accuracy of the assembled genome sequences, conflicting sites at which more than 80% of the mapped Illumina reads supported the Illumina sequence were corrected to that sequence (Table 1). Approximately 96.32% of Illumina reads were successfully mapped to the corresponding genomes (Table 1 and Fig. 2b). Genome completeness was assessed with BUSCO using a total of 352 orthologue groups from the Actinobacteria dataset[21]. Twenty-nine genomes have at least 99.1% gene space completeness, and the S. clavuligerus genome has 97.4% gene space completeness (Table 2). Annotation with NCBI PGAP identified on average 7,000 protein coding genes, 20 rRNAs, and 68 tRNAs per genome (Table 3).
Finally, based on the annotation, a total of 922 smBGCs were predicted in the 30 streptomycete genomes (Fig. 3). Detailed information, such as the genomic position, type, and putative product of each smBGC, is publicly available in Figshare[56].
Fig. 3

Secondary metabolite biosynthetic gene clusters in 30 complete streptomycetes genomes.

Measurement(s): DNA • genome • sequence_assembly • sequence feature annotation
Technology Type(s): DNA sequencing • sequence assembly process • sequence annotation
Factor Type(s): strain
Sample Characteristic - Organism: Streptomyces