Namil Lee, Woori Kim, Soonkyu Hwang, Yongjae Lee, Suhyung Cho, Bernhard Palsson, Byung-Kwan Cho
Abstract
Streptomyces are Gram-positive bacteria of significant industrial importance due to their ability to produce a wide range of antibiotics and bioactive secondary metabolites. Recent advances in genome mining have revealed that Streptomyces genomes possess a large number of unexplored, silent secondary metabolite biosynthetic gene clusters (smBGCs), indicating that these genomes remain an invaluable source for new drug discovery. Here, we present high-quality genome sequences of 22 Streptomyces species and eight different Streptomyces venezuelae strains, assembled by a hybrid strategy that combines long-read and short-read genome sequencing. The assembled genomes have at least 97.4% gene space completeness and total lengths ranging from 6.7 to 10.1 Mbp. Annotation identified, on average, about 7,000 protein-coding genes, 20 rRNAs, and 68 tRNAs per genome. In silico prediction of smBGCs identified a total of 922 clusters, including many whose products are unknown. We anticipate that the availability of these genomes will accelerate the discovery of novel secondary metabolites from Streptomyces and help elucidate complex smBGC regulation.
Year: 2020 PMID: 32054853 PMCID: PMC7018776 DOI: 10.1038/s41597-020-0395-9
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Summary of PacBio and Illumina genome sequencing data for 30 streptomycetes.
| No. | Strain | Platform | Raw reads (No.) | Mean raw read length (bp) | Clean reads (No.) | Mean clean read length (bp) | SRA accession number |
|---|---|---|---|---|---|---|---|
| 1 | ATCC 27064 | Illumina | 9,853,205 | 100 | 9,852,872 | 100 | SRR9192366 |
|  |  | PacBio | 150,292 | 6,572 | 60,834 | 15,473 | SRR9290551 |
| 2 | NRRL 18488 | Illumina | 9,181,843 | 50 | 9,176,299 | 50 | SRR9192343 |
|  |  | PacBio | 150,292 | 8,163 | 93,481 | 12,715 | SRR9290550 |
| 3 | ATCC 14969 | Illumina | 15,603,988 | 100 | 15,603,366 | 100 | SRR9192357 |
|  |  | PacBio | 150,292 | 5,458 | 95,691 | 8,339 | SRR9290549 |
| 4 | ATCC 12769 | Illumina | 15,858,551 | 100 | 15,857,973 | 100 | SRR9192358 |
|  |  | PacBio | 150,292 | 5,007 | 101,572 | 7,222 | SRR9290548 |
| 5 | ATCC 27467 | Illumina | 23,157,475 | 100 | 23,157,337 | 100 | SRR9192359 |
|  |  | PacBio | 150,292 | 8,757 | 92,569 | 13,591 | SRR9290547 |
| 6 | ATCC 39115 | Illumina | 24,919,393 | 100 | 24,918,458 | 100 | SRR9192360 |
|  |  | PacBio | 150,292 | 7,910 | 92,139 | 12,571 | SRR9290546 |
| 7 | ATCC 12853 | Illumina | 14,877,606 | 100 | 14,877,044 | 100 | SRR9192361 |
|  |  | PacBio | 150,292 | 5,044 | 59,627 | 12,291 | SRR9290545 |
| 8 | ATCC 11989 | Illumina | 19,155,832 | 100 | 19,155,718 | 100 | SRR9192362 |
|  |  | PacBio | 150,292 | 5,890 | 88,462 | 9,681 | SRR9290544 |
| 9 | ATCC 13879 | Illumina | 18,655,418 | 100 | 18,654,718 | 100 | SRR9192388 |
|  |  | PacBio | 150,292 | 5,560 | 84,107 | 9,288 | SRR9290553 |
| 10 | ATCC 10745 | Illumina | 9,446,513 | 100 | 9,445,832 | 100 | SRR9192354 |
|  |  | PacBio | 150,292 | 6,843 | 94,568 | 10,538 | SRR9290561 |
| 11 | ATCC 12461 | Illumina | 16,425,356 | 100 | 16,425,241 | 100 | SRR9192355 |
|  |  | PacBio | 300,584 | 3,638 | 159,270 | 6,351 | SRR9290560 |
| 12 | ATCC 13740 | Illumina | 17,546,981 | 100 | 17,546,866 | 100 | SRR9192339 |
|  |  | PacBio | 300,584 | 2,836 | 121,377 | 6,164 | SRR9290563 |
| 13 | ATCC 19740 | Illumina | 20,182,754 | 100 | 20,181,890 | 100 | SRR9192340 |
|  |  | PacBio | 150,292 | 3,319 | 52,279 | 9,266 | SRR9290562 |
| 14 | ATCC 14899 | Illumina | 14,034,927 | 100 | 14,034,843 | 100 | SRR9192341 |
|  |  | PacBio | 150,292 | 8,387 | 94,719 | 12,692 | SRR9290557 |
| 15 | ATCC 27476 | Illumina | 6,612,160 | 100 | 6,610,219 | 100 | SRR9192335 |
|  |  | PacBio | 150,292 | 4,984 | 75,928 | 9,432 | SRR9290559 |
| 16 | ATCC 23948 | Illumina | 14,819,357 | 100 | 14,819,243 | 100 | SRR9192336 |
|  |  | PacBio | 150,292 | 4,869 | 72,573 | 9,619 | SRR9290558 |
| 17 | ATCC 27465 | Illumina | 18,717,018 | 100 | 18,716,280 | 100 | SRR9192337 |
|  |  | PacBio | 300,584 | 10,049 | 202,966 | 14,047 | SRR9290555 |
| 18 | ATCC 14922 | Illumina | 16,988,966 | 100 | 16,988,850 | 100 | SRR9192338 |
|  |  | PacBio | 150,292 | 8,813 | 88,869 | 14,106 | SRR9290554 |
| 19 | ATCC 10970 | Illumina | 25,109,758 | 100 | 25,108,764 | 100 | SRR9192333 |
|  |  | PacBio | 300,584 | 7,066 | 164,891 | 12,342 | SRR9290564 |
| 20 | ATCC 23873 | Illumina | 19,360,703 | 100 | 19,360,564 | 100 | SRR9192364 |
|  |  | PacBio | 150,292 | 11,091 | 110,521 | 14,419 | SRR9290552 |
| 21 | ATCC 23958 | Illumina | 21,181,460 | 100 | 21,180,615 | 100 | SRR9192342 |
|  |  | PacBio | 150,292 | 4,805 | 70,677 | 8,178 | SRR9290556 |
| 22 | ATCC 10712 | Illumina | 18,667,122 | 100 | 18,664,663 | 100 | SRR9192374 |
|  |  | PacBio | 150,292 | 9,101 | 87,860 | 15,328 | SRR9290565 |
| 23 | ATCC 21113 | Illumina | 21,561,533 | 100 | 21,559,491 | 100 | SRR9192334 |
|  |  | PacBio | 150,292 | 10,148 | 104,695 | 14,180 | SRR9290566 |
| 24 | ATCC 10595 | Illumina | 16,798,923 | 100 | 16,797,236 | 100 | SRR9192369 |
|  |  | PacBio | 150,292 | 7,829 | 92,681 | 12,197 | SRR9290567 |
| 25 | ATCC 15068 | Illumina | 15,310,620 | 100 | 15,308,905 | 100 | SRR9192368 |
|  |  | PacBio | 150,292 | 10,754 | 107,412 | 14,515 | SRR9290568 |
| 26 | ATCC 14583 | Illumina | 19,423,668 | 100 | 19,421,332 | 100 | SRR9192371 |
|  |  | PacBio | 150,292 | 10,888 | 106,813 | 14,647 | SRR9290569 |
| 27 | ATCC 14584 | Illumina | 15,447,783 | 100 | 15,446,240 | 100 | SRR9192370 |
|  |  | PacBio | 150,292 | 8,844 | 100,523 | 12,580 | SRR9290540 |
| 28 | ATCC 14585 | Illumina | 51,795,644 | 100 | 51,791,331 | 100 | SRR9192373 |
|  |  | PacBio | 150,292 | 8,275 | 96,243 | 12,388 | SRR9290541 |
| 29 | ATCC 21782 | Illumina | 19,569,337 | 100 | 19,567,069 | 100 | SRR9192372 |
|  |  | PacBio | 150,292 | 6,539 | 70,655 | 13,089 | SRR9290542 |
| 30 | ATCC 21018 | Illumina | 16,469,406 | 100 | 16,467,790 | 100 | SRR9192375 |
|  |  | PacBio | 150,292 | 10,754 | 107,412 | 14,515 | SRR9290543 |
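The sequencing table can be turned into a rough long-read depth estimate. Multiplying the number of clean PacBio reads by their mean length approximates the total clean bases (only approximately, because the table rounds the mean), and dividing by the genome length gives mean depth. The sketch below is illustrative only: it assumes genome length equals the corrected scaffold length from the assembly statistics table (summed for genomes with two scaffolds), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PacBioRun:
    strain: str
    clean_reads: int        # "Clean reads (No.)"
    mean_clean_length: int  # "Mean clean read length (bp)", rounded in the table
    genome_length: int      # assumption: sum of corrected scaffold lengths (bp)


def approx_clean_bases(run: PacBioRun) -> int:
    # total bases ~= read count x mean length (approximate: the mean is rounded)
    return run.clean_reads * run.mean_clean_length


def approx_depth(run: PacBioRun) -> float:
    # mean depth = total clean bases / genome length
    return approx_clean_bases(run) / run.genome_length


# Values copied from rows 1, 4 and 7 of the sequencing and assembly tables.
runs = [
    PacBioRun("ATCC 27064", 60_834, 15_473, 6_748_591 + 1_795_495),
    PacBioRun("ATCC 12769", 101_572, 7_222, 7_581_562),
    PacBioRun("ATCC 12853", 59_627, 12_291, 10_133_897),
]

for r in runs:
    print(f"{r.strain}: ~{approx_depth(r):.0f}x")
```

For these three rows the estimates come out at roughly 110x, 97x and 72x. The same arithmetic applies to the Illumina rows, but whether the read counts there are per pair or per read is not stated in this section, so treat any Illumina depth derived this way with caution.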
Fig. 1 Quality of the genome sequencing data. (a) Distribution of Illumina read quality based on Phred score. (b) Read quality distribution of PacBio reads. The black line indicates the total number of bases in reads whose quality is greater than the corresponding read quality value on the x-axis.
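The black line in panel (b) is a reverse cumulative curve: for each quality value q, it sums the lengths of all reads with quality above q. A minimal sketch of that computation follows, assuming each read is summarised as a hypothetical (length, quality) pair with quality on a 0 to 1 scale; it sorts once and uses suffix sums so every threshold is answered with a binary search.

```python
from bisect import bisect_right
from collections.abc import Sequence
from itertools import accumulate


def bases_above_quality(
    reads: Sequence[tuple[int, float]], thresholds: Sequence[float]
) -> list[int]:
    """For each threshold q, return the total length of reads whose quality is > q."""
    # Sort by quality so that "quality > q" is always a contiguous suffix.
    ordered = sorted(reads, key=lambda r: r[1])
    quals = [q for _, q in ordered]
    lengths = [length for length, _ in ordered]
    # suffix[i] = total length of ordered[i:]; the trailing 0 covers "no read qualifies".
    suffix = list(accumulate(reversed(lengths)))[::-1] + [0]
    # bisect_right returns the index of the first read with quality strictly greater than t.
    return [suffix[bisect_right(quals, t)] for t in thresholds]


# Hypothetical (read length in bp, read quality) pairs, not data from this study.
reads = [(12_000, 0.86), (8_000, 0.80), (15_000, 0.90), (3_000, 0.75)]
print(bases_above_quality(reads, [0.75, 0.80, 0.85, 0.90]))
```

With the toy reads this prints `[35000, 27000, 27000, 0]`; the curve is non-increasing by construction, which matches the shape described in the caption.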
Statistics of genome assembly and correction.
| No. | Final scaffolds (No.) | Scaffold length before correction (bp) | Mapped Illumina reads (%) | Conflict positions (No.) | Added bases (No.) | Deleted bases (No.) | Scaffold length after correction (bp) | G + C content (%) | Assembly accession number |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 6,748,589 and 1,795,496 | 71.16 and 14.03 | 7 | 4 | 3 | 6,748,591 and 1,795,495 | 72.5 | GCA_005519465.1 |
| 2 | 1 | 7,963,727 | 95.13 | 15 | 15 | 0 | 7,963,742 | 71.9 | GCA_003932715.1 |
| 3 | 1 | 7,756,176 | 90.56 | 51 | 34 | 16 | 7,756,194 | 71.4 | GCA_008704575.1 |
| 4 | 1 | 7,581,543 | 93.50 | 51 | 35 | 16 | 7,581,562 | 72.2 | GCA_008704555.1 |
| 5 | 1 | 7,604,705 | 96.41 | 286 | 269 | 0 | 7,604,974 | 73.4 | GCA_008704535.1 |
| 6 | 1 | 7,280,447 | 90.44 | 90 | 89 | 0 | 7,280,536 | 72.6 | GCA_008704515.1 |
| 7 | 1 | 10,133,525 | 99.09 | 376 | 375 | 3 | 10,133,897 | 71.0 | GCA_008704495.1 |
| 8 | 1 | 7,757,873 | 84.86 | 16 | 9 | 5 | 7,757,877 | 72.6 | GCA_008704475.1 |
| 9 | 1 | 7,646,576 | 89.70 | 1,025 | 1,021 | 5 | 7,647,592 | 72.0 | GCA_008704445.1 |
| 10 | 1 | 6,725,574 | 97.63 | 5 | 5 | 0 | 6,725,579 | 74.7 | GCA_008704425.1 |
| 11 | 1 | 7,962,594 | 99.12 | 193 | 193 | 1 | 7,962,786 | 71.2 | GCA_008704395.1 |
| 12 | 1 | 9,334,399 | 99.67 | 1,297 | 1,299 | 0 | 9,335,698 | 71.1 | GCA_008705135.1 |
| 13 | 1 | 7,516,474 | 99.74 | 178 | 178 | 0 | 7,516,652 | 72.9 | GCA_009299385.1 |
| 14 | 1 | 7,772,564 | 99.51 | 26 | 25 | 2 | 7,772,587 | 70.9 | GCA_008704995.1 |
| 15 | 1 | 7,673,329 | 92.46 | 180 | 180 | 0 | 7,673,509 | 72.3 | GCA_008704935.1 |
| 16 | 1 | 8,500,673 | 99.75 | 354 | 352 | 13 | 8,501,012 | 71.1 | GCA_008704855.1 |
| 17 | 1 | 9,806,222 | 95.30 | 934 | 938 | 0 | 9,807,160 | 72.4 | GCA_008704795.1 |
| 18 | 1 | 9,911,637 | 98.42 | 461 | 461 | 0 | 9,912,098 | 71.0 | GCA_008704715.1 |
| 19 | 1 | 9,361,132 | 96.22 | 22 | 22 | 0 | 9,361,154 | 72.0 | GCA_008704655.1 |
| 20 | 2 | 4,757,761 and 4,494,336 | 53.36 and 45.53 | 504 | 501 | 3 | 4,757,978 and 4,494,617 | 72.3 | GCA_008634025.1 |
| 21 | 2 | 5,742,252 and 2,129,928 | 75.22 and 24.28 | 3,218 | 3,228 | 1 | 5,744,022 and 2,131,385 | 73.6 | GCA_008634015.1 |
| 22 | 1 | 8,223,439 | 99.84 | 96 | 81 | 15 | 8,223,505 | 72.5 | GCA_008639165.1 |
| 23 | 1 | 7,893,622 | 99.85 | 173 | 181 | 0 | 7,893,803 | 72.5 | GCA_008639045.1 |
| 24 | 1 | 7,871,449 | 95.50 | 35 | 34 | 3 | 7,871,480 | 72.5 | GCA_008705255.1 |
| 25 | 1 | 8,557,615 | 99.71 | 587 | 587 | 0 | 8,558,202 | 71.9 | GCA_008642375.1 |
| 26 | 1 | 8,018,461 | 87.17 | 29 | 27 | 4 | 8,018,484 | 71.3 | GCA_008642355.1 |
| 27 | 1 | 8,941,823 | 99.00 | 255 | 255 | 0 | 8,942,078 | 71.2 | GCA_008642315.1 |
| 28 | 1 | 8,048,139 | 82.34 | 64 | 41 | 26 | 8,048,154 | 71.3 | GCA_008642335.1 |
| 29 | 1 | 7,525,235 | 90.50 | 87 | 87 | 0 | 7,525,322 | 71.9 | GCA_008642295.1 |
| 30 | 1 | 7,746,214 | 91.61 | 59 | 57 | 4 | 7,746,267 | 72.1 | GCA_008642275.1 |
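Each row of the correction table obeys a simple bookkeeping identity: the total corrected length equals the total length before correction, plus added bases, minus deleted bases (summed over both scaffolds where a genome has two). The sketch below checks that identity for a few rows; the `Correction` record and `net_change_consistent` helper are hypothetical names used only for illustration. Note that the conflict-position count is not simply added plus deleted bases in this table, so the sketch does not test it.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    before: tuple[int, ...]  # scaffold length(s) before correction (bp)
    added: int               # "Added bases (No.)"
    deleted: int             # "Deleted bases (No.)"
    after: tuple[int, ...]   # scaffold length(s) after correction (bp)


def net_change_consistent(c: Correction) -> bool:
    # total after = total before + inserted bases - deleted bases
    return sum(c.after) == sum(c.before) + c.added - c.deleted


# Rows 1, 7 and 21 of the assembly table (rows 1 and 21 have two scaffolds).
rows = {
    1: Correction((6_748_589, 1_795_496), 4, 3, (6_748_591, 1_795_495)),
    7: Correction((10_133_525,), 375, 3, (10_133_897,)),
    21: Correction((5_742_252, 2_129_928), 3_228, 1, (5_744_022, 2_131_385)),
}

for no, c in rows.items():
    print(no, net_change_consistent(c))
```

All three rows print `True`; a transcription error in any of the four numbers would flip the result to `False`, which makes this a cheap sanity check when copying such tables.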
Fig. 2 Genome assembly of 30 streptomycetes. (a) Strategy for genome assembly and correction. (b) Profile of Illumina reads mapped onto the assembled genomes. Data were visualized using SignalMap (Roche NimbleGen, Inc.). The red line indicates the average Illumina read coverage across all genomic positions.
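The red line in panel (b) is the mean of the per-base read depth. One common way to obtain per-base depth from aligned reads is a difference array over alignment intervals, sketched below under the assumption that alignments are given as hypothetical 0-based, half-open `(start, end)` pairs. The low-coverage rule (flagging positions below half the genome-wide mean) is an illustrative assumption, not a criterion stated in this section.

```python
from itertools import accumulate


def per_base_coverage(genome_length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Depth at each 0-based position from half-open [start, end) alignment intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1  # a read begins covering here
        diff[end] -= 1    # ...and stops covering here
    # The running sum of the difference array is the depth at each position.
    return list(accumulate(diff[:-1]))


def mean_coverage(cov: list[int]) -> float:
    return sum(cov) / len(cov)


def low_coverage_positions(cov: list[int], fraction: float = 0.5) -> list[int]:
    # Hypothetical QC rule: flag positions below `fraction` of the mean depth.
    threshold = fraction * mean_coverage(cov)
    return [i for i, depth in enumerate(cov) if depth < threshold]


# Toy 10-bp "genome" with four hypothetical alignments.
cov = per_base_coverage(10, [(0, 5), (2, 8), (3, 10), (6, 10)])
print(cov, mean_coverage(cov), low_coverage_positions(cov))
```

For the toy input the depth profile is `[1, 1, 2, 3, 3, 2, 3, 3, 2, 2]`, the mean is 2.2, and positions 0 and 1 are flagged. The difference array keeps the cost linear in genome length plus read count, which matters for multi-megabase genomes.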
Gene space completeness of completed genomes.
| No. | Complete and single-copy | Complete and duplicated | Fragmented | Missing | Total | Gene space completeness (%) |
|---|---|---|---|---|---|---|
| 1 | 343 | 0 | 0 | 9 | 352 | 97.4 |
| 2 | 350 | 0 | 0 | 2 | 352 | 99.4 |
| 3 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 4 | 352 | 0 | 0 | 0 | 352 | 100.0 |
| 5 | 349 | 0 | 0 | 3 | 352 | 99.1 |
| 6 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 7 | 352 | 0 | 0 | 0 | 352 | 100.0 |
| 8 | 350 | 0 | 0 | 2 | 352 | 99.4 |
| 9 | 350 | 0 | 0 | 2 | 352 | 99.4 |
| 10 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 11 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 12 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 13 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 14 | 350 | 0 | 1 | 1 | 352 | 99.4 |
| 15 | 349 | 0 | 1 | 2 | 352 | 99.1 |
| 16 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 17 | 350 | 0 | 1 | 1 | 352 | 99.4 |
| 18 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 19 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 20 | 346 | 4 | 0 | 2 | 352 | 99.4 |
| 21 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 22 | 352 | 0 | 0 | 0 | 352 | 100.0 |
| 23 | 352 | 0 | 0 | 0 | 352 | 100.0 |
| 24 | 352 | 0 | 0 | 0 | 352 | 100.0 |
| 25 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 26 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 27 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 28 | 351 | 0 | 0 | 1 | 352 | 99.7 |
| 29 | 349 | 0 | 0 | 3 | 352 | 99.1 |
| 30 | 350 | 0 | 0 | 2 | 352 | 99.4 |
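The completeness column is consistent with the usual BUSCO-style definition: complete markers (single-copy plus duplicated) divided by the total marker count, with fragmented and missing markers excluded. The tool and marker set are not named in this section, so that reading is an assumption; the sketch below simply reproduces the percentages for rows 1, 15 and 20 (row 20 is the only one with duplicated markers).

```python
def gene_space_completeness(single: int, duplicated: int, total: int) -> float:
    # Complete = single-copy + duplicated; fragmented and missing markers do not count.
    return 100.0 * (single + duplicated) / total


def categories_add_up(single: int, duplicated: int, fragmented: int, missing: int, total: int) -> bool:
    # Every marker should fall into exactly one of the four categories.
    return single + duplicated + fragmented + missing == total


# (single, duplicated, fragmented, missing, total) for rows 1, 15 and 20.
rows = {1: (343, 0, 0, 9, 352), 15: (349, 0, 1, 2, 352), 20: (346, 4, 0, 2, 352)}

for no, (s, d, f, m, t) in rows.items():
    print(no, categories_add_up(s, d, f, m, t), round(gene_space_completeness(s, d, t), 1))
```

Rounded to one decimal place this gives 97.4, 99.1 and 99.4, matching the table; the lowest value (row 1) is the 97.4% minimum cited in the abstract.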
Summary of genome annotation.
| No. | CDS (No.) | rRNA (No.) | tRNA (No.) | Genome accession number | BioProject accession number |
|---|---|---|---|---|---|
| 1 | 6,880 | 18 | 66 | CP027858 | PRJNA414136 |
| 2 | 6,376 | 18 | 66 | CP020700 | PRJNA382016 |
| 3 | 6,725 | 18 | 76 | CP023703 | PRJNA412292 |
| 4 | 6,364 | 18 | 74 | CP023702 | PRJNA412292 |
| 5 | 6,431 | 21 | 68 | CP023701 | PRJNA412292 |
| 6 | 6,211 | 18 | 70 | CP023700 | PRJNA412292 |
| 7 | 8,384 | 18 | 66 | CP023699 | PRJNA412292 |
| 8 | 6,453 | 33 | 71 | CP023698 | PRJNA412292 |
| 9 | 6,263 | 18 | 68 | CP023697 | PRJNA412292 |
| 10 | 5,465 | 18 | 65 | CP023696 | PRJNA412292 |
| 11 | 6,613 | 18 | 67 | CP023695 | PRJNA412292 |
| 12 | 8,058 | 18 | 67 | CP023694 | PRJNA412292 |
| 13 | 6,392 | 18 | 69 | CP023693 | PRJNA412292 |
| 14 | 6,491 | 18 | 68 | CP023747 | PRJNA412292 |
| 15 | 6,603 | 21 | 68 | CP023692 | PRJNA412292 |
| 16 | 7,032 | 21 | 67 | CP023691 | PRJNA412292 |
| 17 | 8,212 | 18 | 65 | CP023690 | PRJNA412292 |
| 18 | 8,396 | 18 | 71 | CP023689 | PRJNA412292 |
| 19 | 7,756 | 21 | 68 | CP023688 | PRJNA412292 |
| 20 | 7,520 | 21 | 67 | PDCM00000000 | PRJNA412292 |
| 21 | 6,832 | 24 | 70 | PDCL00000000 | PRJNA412292 |
| 22 | 7,377 | 21 | 67 | CP029197 | PRJNA454547 |
| 23 | 6,987 | 21 | 67 | CP029196 | PRJNA454547 |
| 24 | 6,942 | 21 | 67 | CP029195 | PRJNA454547 |
| 25 | 7,700 | 21 | 69 | CP029194 | PRJNA454547 |
| 26 | 7,154 | 18 | 66 | CP029193 | PRJNA454547 |
| 27 | 7,832 | 18 | 65 | CP029192 | PRJNA454547 |
| 28 | 7,096 | 18 | 66 | CP029191 | PRJNA454547 |
| 29 | 6,655 | 18 | 69 | CP029190 | PRJNA454547 |
| 30 | 6,769 | 21 | 71 | CP029189 | PRJNA454547 |
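The per-genome averages quoted in the abstract (about 7,000 protein-coding genes, 20 rRNAs and 68 tRNAs) can be recomputed from the annotation table. The sketch below copies the three count columns in row order and takes plain arithmetic means with Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean

# Columns copied from the annotation table, rows 1 to 30 in order.
cds = [6880, 6376, 6725, 6364, 6431, 6211, 8384, 6453, 6263, 5465,
       6613, 8058, 6392, 6491, 6603, 7032, 8212, 8396, 7756, 7520,
       6832, 7377, 6987, 6942, 7700, 7154, 7832, 7096, 6655, 6769]
rrna = [18, 18, 18, 18, 21, 18, 18, 33, 18, 18,
        18, 18, 18, 18, 21, 21, 18, 18, 21, 21,
        24, 21, 21, 21, 21, 18, 18, 18, 18, 21]
trna = [66, 66, 76, 74, 68, 70, 66, 71, 68, 65,
        67, 67, 69, 68, 68, 67, 65, 71, 68, 67,
        70, 67, 67, 67, 69, 66, 65, 66, 69, 71]

# Guard against a dropped or duplicated row when copying the table.
assert len(cds) == len(rrna) == len(trna) == 30

print(f"CDS:  {mean(cds):,.0f}")
print(f"rRNA: {mean(rrna):.1f}")
print(f"tRNA: {mean(trna):.1f}")
```

The means come out at about 6,999 CDSs, 19.7 rRNAs and 68.1 tRNAs per genome, which round to the figures given in the abstract.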
Fig. 3 Secondary metabolite biosynthetic gene clusters in the 30 complete streptomycete genomes.
| Measurement(s) | DNA • genome • sequence_assembly • sequence feature annotation |
|---|---|
| Technology Type(s) | DNA sequencing • sequence assembly process • sequence annotation |
| Factor Type(s) | strain |
| Sample Characteristic - Organism | Streptomyces |