Literature DB >> 32385251

Transcriptome and translatome profiles of Streptomyces species in different growth phases.

Woori Kim1, Soonkyu Hwang1, Namil Lee1, Yongjae Lee1, Suhyung Cho1, Bernhard Palsson2,3,4, Byung-Kwan Cho5,6,7.   

Abstract

Streptomyces are efficient producers of various bioactive compounds, which are mostly synthesized by their secondary metabolite biosynthetic gene clusters (smBGCs). The smBGCs are tightly controlled by complex regulatory systems at transcriptional and translational levels to effectively utilize precursors that are supplied by primary metabolism. Thus, dynamic changes in gene expression in response to cellular status at both the transcriptional and translational levels should be elucidated to directly reflect protein levels, rapid downstream responses, and cellular energy costs. In this study, RNA-Seq and ribosome profiling were performed for five industrially important Streptomyces species at different growth phases, for the deep sequencing of total mRNA, and only those mRNA fragments that are protected by translating ribosomes, respectively. Herein, 12.0 to 763.8 million raw reads were sufficiently obtained with high quality of more than 80% for the Phred score Q30 and high reproducibility. These data provide a comprehensive understanding of the transcriptional and translational landscape across the Streptomyces species and contribute to facilitating the rational engineering of secondary metabolite production.

Entities:  

Mesh:

Substances:

Year:  2020        PMID: 32385251      PMCID: PMC7210306          DOI: 10.1038/s41597-020-0476-9

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

Streptomyces, which comprise the largest genus of Actinobacteria, are huge natural reservoir of secondary metabolites, including antibiotics, immunosuppressants, and other medicinal compounds[1-6]. Recent advancements in high-throughput sequencing have led to the development of the genome mining approach, which implicates that the genome of each Streptomyces species has more than 30 secondary metabolite biosynthetic gene clusters (smBGCs) with potential to produce various unexplored secondary metabolites[2]. These secondary metabolites are synthesized by a series of enzymatic reactions, which depend on the supply of precursor molecules from primary metabolism, such as acetyl-coenzyme A and amino acids[7]. After active growth terminates, an overall metabolic transition occurs, which leads to the activation of secondary metabolite production[8,9]; this metabolic transition from primary to secondary metabolism is governed by multi-layered regulatory mechanisms at transcriptional, translational, and post-translational levels[10,11]. Thus, understanding the complex regulatory systems of the metabolic transition is important to enhance secondary metabolite production. The overall metabolic transition encompasses diverse genome-wide gene expression changes, which are regulated by signaling cascades from the pleiotropic regulators to pathway-specific regulators[8,10,12,13]. To understand the underlying molecular mechanisms of metabolic transitions, transcriptional changes that occur between growth phases have been studied[13-15]. For example, the time-series transcriptome analysis of Streptomyces coelicolor demonstrated that coherent genes that are involved in specific metabolism and their regulatory genes exhibit similar expression patterns during metabolic transitions; this suggests that primary metabolism-related genes are functionally connected to the smBGC genes through regulatory gene expression. Based on this suggestion, putative regulatory genes and their interconnected networks could be identified by screening genes that have similar expression patterns[13]. Bacteria can fine-tune gene expression both at the transcriptional and translational levels[16,17]. For example, Escherichia coli proteome analysis revealed that only approximately half of protein abundance is determined by transcriptional regulation, which indicates the existence of various post-transcriptional regulation[18]. In this regard, deciphering translational dynamics is important to understanding post-transcriptional regulations that are closely related to cellular protein levels[19]. Recently, ribosome profiling has been used to measure translational levels by deep sequencing of the ribosome-protected mRNA fragments (RPFs) at the position of the translating ribosome[20]. Several ribosome profiling studies in Streptomyces have been reported by our research group for S. coelicolor, S. clavuligerus, and S. lividans, which revealed translational buffering of secondary metabolism-related genes at a later growth phase and that translational abundance is more consistently maintained than transcript abundance[11,21,22]. Translational regulations are advantageous for the tight control of secondary metabolite biosynthesis, as translation requires the highest energy costs among all cellular reactions[19]. Moreover, the expression of smBGC-associated genes can be more rapidly regulated at the translational level than at the transcriptional level in response to dynamic environmental changes[23]. Given the dynamic relationship between transcription and translation, as exhibited by translational buffering[11], integrative analysis at both levels should unravel complex regulations in Streptomyces. However, transcriptomic and translatomic data have covered only a small portion of approximately 350 reported Streptomyces genomes, which have not been systematically validated at the multi-species level. In this study, we provide RNA-Seq and ribosome profiling data of five Streptomyces species at four different growth phases, followed by validation of the read quality. The species were S. avermitilis MA-4680, S. clavuligerus ATCC27064, S. lividans TK24, S. venezuelae ATCC15439, and S. tsukubaensis NRRL 18488, which are industrial strains that produce antifungal avermectin, β-lactamase inhibitor clavulanic acid, and immunosuppressant FK506, respectively[24-26]. S. lividans and S. venezuelae were characterized by their fast growth and ease of genetic manipulation, and have been employed as heterologous expression hosts[27-30]. An overview of the preparation of transcriptomic and translatomic data is illustrated in Fig. 1. A total of 12 to 83.5 million raw reads for RNA-Seq and 113 to 763.8 million raw reads for ribosome profiling were obtained. Although the RNA-Seq and ribosome profiling data of two species (S. clavuligerus[21] and S. lividans[22]) among the five species were already reported in previous studies by our research group, this study provided a uniformly processed and mapped dataset of all five species. This facilitates the efficiency of the comparative transcriptome and translatome analysis at multi-time points between multi-species. Further, understanding the transcriptional and translational regulatory mechanisms and developing regulatory synthetic parts, such as promoters, ribosome-binding sequences, 5′ untranslated regions, and terminators[4] from the dataset allows rational genome engineering for efficient secondary metabolite production by Streptomyces[11].
Fig. 1

Overall flow of RNA-Seq and ribosome profiling data construction of five Streptomyces species. (a) The sequencing library construction protocol for RNA-Seq and ribosome profiling. P5 and P7 were the PCR primers, Rd1 SP and Rd2 SP were the sequencing primers, and BC was the barcode sequence. (b) An overview of processing and mapping of the sequencing reads. The criteria or parameters are shown. The steps indicated with asterisk (*) are performed only for the ribosome profiling data. (c) The growth profile of five Streptomyces species in R5− medium. Sampling time points are represented by a grey dot, which are the early-exponential (E), transition (T), late-exponential (L), and stationary (S) points.

Overall flow of RNA-Seq and ribosome profiling data construction of five Streptomyces species. (a) The sequencing library construction protocol for RNA-Seq and ribosome profiling. P5 and P7 were the PCR primers, Rd1 SP and Rd2 SP were the sequencing primers, and BC was the barcode sequence. (b) An overview of processing and mapping of the sequencing reads. The criteria or parameters are shown. The steps indicated with asterisk (*) are performed only for the ribosome profiling data. (c) The growth profile of five Streptomyces species in R5− medium. Sampling time points are represented by a grey dot, which are the early-exponential (E), transition (T), late-exponential (L), and stationary (S) points.

Methods

Strains and cell growth

Streptomyces strains were inoculated from their 20% glycerol stock of spores into 50 mL of R5− liquid medium with 8 g of glass beads (3 ± 0.3 mm diameter) in a 250 mL baffled flask, grown at 30 °C, and pre-cultured at 250 rpm. The R5− liquid medium consists of 103 g L−1 sucrose, 0.25 g L−1 K2SO4, 10.12 g L−1 MgCl2∙6H2O, 10 g L−1 glucose, 0.1 g L−1 casamino acids, 5 g L−1 yeast extract, 5.73 g L−1 TES (pH 7.2), 0.08 mg L−1 ZnCl2, 0.4 mg L−1 FeCl3∙6H2O, 0.02 mg L−1 CuCl2∙2H2O, 0.02 mg L−1 MnCl2∙4H2O, 0.02 mg L−1 Na2B4O7∙10H2O, 0.02 mg L−1 (NH4)6Mo7O24∙4H2O, and 0.28 g L−1 NaOH. The grown mycelium was inoculated to fresh R5− medium with an initial optical density of 0.05 at 600 nm for the main culture as biological duplicates and grown under the previously mentioned conditions. The cells were sampled at four different time points based on the growth profile of each strain, as follows: early-exponential (E), transition (T), late-exponential (L), and stationary (S) phases. The E, T, L, and S time points were 13, 17, 19.5, and 33.5 h for S. avermitilis, 26, 80, 105.5, and 125 h for S. clavuligerus, 9.5, 14, 16, and 20 h for S. lividans, 12.5, 24.5, 30.5, and 48.5 h for S. venezuelae, and 15, 18.5, 28, and 48 h for S. tsukubaensis after inoculation, respectively (Fig. 1c). At the sampling time points for the ribosome profiling samples, thiostrepton (Sigma-Aldrich, St. Louis, MO, USA) was added to the cultures to a final concentration of 20 μM to compartment the translating ribosomes on the mRNA, which is a highly sensitive drug for Streptomyces compared to chloramphenicol or other drugs[31,32]. The cultures were then incubated for 5 min at 30 °C, and subsequently harvested for the construction of ribosome profiling libraries.

RNA-Seq library preparation and high-throughput sequencing

The overview of the library construction of RNA-Seq is illustrated in Fig. 1a. The harvested cells were washed with polysome buffer (20 mM Tris-HCl, pH 7.5; 140 mM NaCl and 5 mM MgCl2), and then resuspended with 500 μL lysis buffer (0.3 M sodium acetate, pH 5.2; 10 mM ethylenediaminetetraacetic acid and 1% Triton X-100). The resuspended cells were frozen with liquid nitrogen and grounded using a mortar and pestle. The ground mycelium was thawed and centrifuged at 4 °C for 10 min at 16,000 × g. The supernatant was collected and stored at −80 °C. Following the preparation of lysates from four growth phases as biological duplicates, the lysates were mixed with a solution of phenol:chloroform:isoamyl alcohol (25:24:1, v/v), and the mixtures were separated by centrifugation. DNA in the extracted RNA samples were removed by treatment with 2 μL DNase I (NEB, Ipswich, MA, USA), 5 μL 10 × DNase I buffer, and 1 μL SUPERase-In RNase Inhibitor (Thermo Scientific, Waltham, MA, USA). Lastly, the DNase I-treated RNA samples were purified using phenol:chloroform:isoamyl alcohol (25:24:1, v/v) and ethanol precipitation. To eliminate rRNAs in the recovered RNA samples, the Ribo-Zero rRNA Removal Kit for Bacteria (Epicentre, Madison, WI, USA) was used according to the manufacturer’s instructions. The quality of rRNA-depleted RNA samples was checked using 2% agarose gel electrophoresis. The suitable RNA samples were then used to construct RNA sequencing libraries using the TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, CA, USA). The size distributions of the final libraries were checked using the Agilent 2200 TapeStation System (Agilent, Santa Clara, CA, USA). The constructed libraries were sequenced on the HiSeq. 2500 platform using either a 100-bp (S. lividans, S. avermitilis, S. clavuligerus, and S. venezuelae) or 50-bp (S. tsukubaensis) single-end read recipe (Fig. 1a).

Data processing of RNA-Seq reads

Raw FASTQ files were processed using the CLC Genomics Workbench (CLC Bio, Aarhus, Denmark). Raw reads were trimmed by their overall quality (score: 0.05; maximum ambiguous nucleotides: (2) and length (minimum length: 15 nucleotides). The filtered reads were mapped to each reference genome sequence with the default parameters (mismatch cost: 2; insertion cost: 2; deletion cost: 3; length fraction: 0.9; similarity fraction: 0.9; and ignore non-specific matches). The accession number of each reference genome is as follows: S. avermitilis MA-4680 (NC_010572), S. clavuligerus ATCC27064 (chromosome NZ_CP027858, plasmid NZ_CP027859), S. lividans TK24 (NZ_CP009124), S. venezuelae ATCC15439 (CP013129), and S. tsukubaensis NRRL18488 (chromosome CP020700, plasmid CP020701, and CP020702). The statistics pertaining to quality trimming and reference mapping are summarized in Table 1. The number of uniquely mapped reads to each gene were counted using the RNA-Seq analysis tool in the CLC Genomics Workbench and the read counts were normalized using the DESeq. 2 package in R[33].
Table 1

Overall statistics of RNA-Seq data.

SpeciesGrowth phaseNumber of raw readsAverage length (bp)Number of trimmed_readPercentage trimmedTrimmed reads length (bp)Number of randomly mapped readsNumber of uniquely mapped readsPercentage of uniquely mapped reads (%)Raw read FASTQ accession
S. avermitilis MA-4680E115,222,70010115,222,324100.00100.914,743,00114,475,09495.09SRP158023
E215,540,30410115,539,540100.00100.814,842,96514,232,75291.59
T118,962,69510118,961,958100.00100.817,584,93116,820,05388.70
T218,054,98310118,054,302100.00100.916,337,75014,948,45582.80
L113,904,00510113,903,462100.00100.912,858,18212,238,05088.02
L216,814,30510116,813,651100.00100.815,778,54415,212,12790.47
S116,662,55210116,661,924100.00100.915,627,23414,761,49488.59
S216,278,76610116,278,123100.00100.914,643,70612,519,51476.91
S. clavuligerus ATCC 27064E114,798,62810114,798,315100.00100.811,098,6649,036,99561.07SRP188290
E214,979,23810114,978,853100.00100.810,822,6768,622,47957.56
T115,701,66910115,701,289100.00100.810,501,9559,056,07757.68
T212,420,95210112,420,654100.00100.810,776,12410,096,09781.28
L113,207,84610113,207,520100.00100.87,770,9867,283,39355.15
L213,782,30210113,782,042100.00100.97,706,7857,193,33752.19
S113,526,27010113,525,948100.00100.812,683,45712,292,00090.88
S213,272,33210113,272,058100.00100.612,663,76311,921,21089.82
S. lividans TK24E115,062,70510115,062,394100.00100.913,098,71712,182,99980.88PRJEB31507
E215,941,90110115,941,640100.00100.914,010,89712,726,79179.83
T114,403,25510114,402,994100.00100.912,594,96011,858,70882.34
T215,701,75910115,701,526100.00100.914,333,36413,320,93384.84
L116,081,29410116,080,979100.00100.814,679,57314,003,91187.08
L215,402,57710115,402,313100.00100.813,896,46412,784,44383.00
S115,650,34810115,650,033100.00100.914,016,14112,866,71282.22
S217,244,36010117,243,710100.00100.913,310,07510,371,37860.15
S. venezuelae ATCC15439E113,343,48210113,339,75299.97100.911,002,1609,468,99370.98PRJEB34219
E213,150,52110113,147,02099.97100.910,562,0039,986,72575.96
T114,479,41710114,474,26999.96100.913,219,13412,480,95386.23
T212,310,42710112,307,77099.98100.910,406,0229,456,88276.84
L112,192,70810112,173,06999.84100.910,371,4189,415,01977.34
L212,728,23510112,723,10999.96100.910,435,4488,569,12467.35
S113,022,12210113,019,96499.98100.910,770,4849,268,46371.19
S211,969,03110111,957,87299.91100.99,138,8468,090,65467.66
S. tsukubaensis NRRL18488E141,652,9475141,627,59599.9450.941,292,66931,773,09276.33SRP103795
E235,401,0185135,382,99399.9550.834,839,05034,058,31196.26
T153,758,5145153,721,96599.9350.852,441,94551,123,24495.16
T225,432,8365125,421,46299.9650.824,909,55324,095,21194.78
L183,469,0195183,456,28199.9850.682,904,13576,980,98192.24
L239,371,0045139,339,18399.9250.838,739,77136,079,74491.71
S178,596,6945178,587,49199.9950.677,714,23861,851,60578.70
S251,475,1675151,441,66699.9350.950,553,22644,055,75385.64
Overall statistics of RNA-Seq data.

Ribosome profiling library preparation and high-throughput sequencing

An overview on the library construction of ribosome profiling is illustrated in Fig. 1a [21]. The mycelium that was treated by thiostrepton was collected by centrifugation at 4 °C for 10 min at 3,000 × g, and the cell pellet was washed with 2 mL of polysome buffer that was composed of 20 mM Tris-HCl (pH 7.5), 140 mM NaCl, and 5 mM MgCl2 with 20 μM thiostrepton. The washed pellet was re-suspended in 1 mL of lysis buffer composed of 950 μL of polysome buffer and 50 μL of 20% Triton X-100 with 20 μM thiostrepton. The resuspended cells were dripped into a mortar filled with liquid nitrogen and then grounded with a pestle. The cell debris was removed by centrifugation at 4 °C for 5 min at 3,000 × g. The supernatant was further clarified and collected by centrifugation at 4 °C for 10 min at 16,000 × g. To digest RNA in the lysate (containing 50 μg RNA), the S. avermitilis and S. tsukubaensis samples were treated with 750 U of RNase I (Invitrogen, Waltharn, MA, USA) at 37 °C for 45 min, and the remaining strains were treated with 400 U of Micrococcal Nuclease (MNase) (NEB), 20 μl of 10× MNase buffer, and 2 μl of 100× Bovine Serum Albumin (BSA) (NEB) at 37 °C for 2 h. The samples were then loaded onto Illustra MicroSpin S-400 HR Columns (GE Healthcare Life Sciences, Marlborough, MA, USA) that were previously washed three times with 500 μL of washing buffer, which was composed of 50 mM Tris-HCl (pH 8.0), 250 mM NaCl, 50 mM MgCl2, 25 mM EGTA, and 1% Triton X-100. The column was centrifuged at 4 °C for 2 min at 400 × g, and the flow-through was further purified by a phenol-chloroform-isoamyl alcohol extraction and ethanol precipitation. rRNA was depleted with the Ribo-Zero rRNA Removal Kit (Epicentre) according to the manufacturer’s instructions. The ribosome-protected RNA fragments (RPF) of between 26 and 34 bp were separated by electrophoresis for 65 min at 200 V using 15% polyacrylamide TBE-urea gel (Invitrogen), and eluted in 400 μL of RNA gel extraction buffer, which was composed of 300 mM sodium acetate pH 5.5, 1 mM EDTA, and 0.25% (w/v) SDS. The samples were frozen for 30 min at −80 °C and then incubated at 37 °C for 4 h. The eluted RNAs were isolated by ethanol precipitation and purified once again with the RNeasy MinElute Column (Qiagen, Hilden, Germany) using the manufacturer’s protocol. The enriched RPFs were then denatured for 90 s at 80 °C and incubated for 1 h at 37 °C with 5 mL of 10× T4 Polynucleotide Kinase (PNK) buffer (NEB), 20 U of SUPERase-In RNase Inhibitor, and 10 U of T4 PNK (NEB) to dephosphorylate the 3′ end. The dephosphorylated RNAs were purified using the RNeasy MinElute Column (Qiagen). The sequencing library was constructed from the end-repaired RPFs using the NEBNext Multiplex Small RNA Library Prep Set for Illumina (NEB) according to the manufacturer’s instructions. The final library of approximately 150–160 bp was size-selected by gel electrophoresis for 90 min at 100 V using a 2% agarose gel that was dyed with SYBR Gold Nucleic Acid Gel Stain (Bio-Rad, Hercules, CA, USA). The concentration of the final library was measured using a Qubit 2.0 Fluorometer (Invitrogen) and the Qubit dsDNA HS Kit. The size distribution was assessed using the Agilent 2200 TapeStation System (Agilent). The constructed library was sequenced on the Illumina HiSeq. 2500 platform using the 50-bp single-end read recipe (Fig. 1b).

Data processing of ribosome profiling reads

The libraries of seven samples of S. avermitilis—except for the E1 sample—and six samples of S. venezuelae—except for the T2 and S2 samples—were prepared and sequenced twice to increase the output and merged for further data processing (Table 2). The sequencing results were de-multiplexed and processed by CLC Genomics Workbench (CLC Bio). A total of 113,065,267 to 763,831,282 raw reads were generated for each replicate and were exported in the FASTQ format for the data upload. The reads were then mapped to the PhiX control sequences (NCBI Genbank accession number: NC_001422) to eliminate the PhiX control reads with the following parameters: mismatch cost: 2; insertion cost: 3; deletion cost: 3; length fraction: 0.9; similarity fraction: 0.9; and non-specific matches were randomly mapped. A total of 112,376,633 to 661,109,040 reads were unmapped. As these reads were sequenced from the 5′ end of the enriched RPF to 50 bp downstream, which is longer than the size-selected RPF (26 to 34 bp), the 5′ end sequences of the 3′ adapter sequence of the NEBNext Multiplex Small RNA Library Prep Set for Illumina (NEB) were also included. To remove the adapter sequences from the reads prior to mapping, the sequences were trimmed by the following parameters: action: remove adapter; strand: minus; mismatch cost: 2; gap cost: 3; internal match minimum score: 3; and end match minimum score: 3. Ultimately, the removed adapter sequence was 5′−ATACGAGATNNNNNNCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTT−3′, in which the NNNNNN sequences were CACTGT, ATTGGC, TACAAG, and TTTCA for index 5, 6, 12, and 19, respectively. The reads were additionally trimmed based on their overall quality (score: 0.05, maximum ambiguous nucleotides: 2) and length (>15 bp). The trimming steps yielded 90.68 to 98.47% of the PhiX control unmapped reads. To confirm the data quality and reproducibility of the reads to analyze the translational abundance of the genes, the reads were mapped to their genome sequence. A total of 84,947,464 to 590,644,871 reads with an average read length of 25.8 to 33.7 bp were mapped with random mapping of non-specific matches, while 1,833,155 to 103,819,037 reads with an average read length of 25.8 to 33.1 bp were mapped with ignored mapping of non-specific matches (mismatch cost: 2; insertion cost: 3; deletion cost: 3; length fraction: 0.9; similarity cost: 0.9). The overall statistics of the data processing are summarized in Table 2. The mapped information was exported in a BAM file format, and the number of mapped reads at each genomic position was counted as the read count. Normalized read value and principal component analysis (PCA) plots were generated using the DESeq. 2 package in R[34].
Table 2

Overall statistics of ribosome profiling data.

SpeciesGrowth phaseNumber of raw readsNumber of PhiX_unmapped readNumber of trimmed_readTrimmed reads length (bp)Number of randomly mapped readsNumber of uniquely mapped readsUniquely mapped read length (bp)Number of mapped reads within CDSRaw read FASTQ accession
S. avermitilis MA-4680E1269,943,816213,953,123209,920,05829.3203,095,35812,355,91629.97,361,468SRP158023
E2219,153,779155,106,616150,989,30732.2115,492,3052,022,42932.71,638,497
T1230,159,662148,885,319144,318,01032.7119,536,7933,541,73732.82,455,564
T2266,436,355175,849,102170,463,26931.5139,053,0542,304,886321,665,882
L1315,653,269223,796,896217,299,27932.5171,944,7667,386,27331.74,018,623
L2308,070,582228,756,230221,698,27231.5169,555,5842,929,59431.82,099,075
S1353,167,764253,272,121245,481,92432.7184,855,68813,061,11629.63,594,618
S2314,771,387223,201,435216,515,33532.3169,100,7337,789,27329.62,170,481
S. clavuligerus ATCC27064E1295,724,334202,630,787196,272,52230.1186,099,31780,030,58329.68,017,879SRP188290
E2307,178,979220,741,829200,168,12425.9187,152,13461,281,64925.810,793,485
T1253,508,213169,638,278162,668,88329.5153,361,82089,299,62829.26,590,232
T2278,275,008192,923,207186,424,11629.5175,804,00787,504,70129.16,622,288
L1270,412,414177,901,515172,769,47229.5157,803,55487,274,40529.35,179,486
L2247,353,047173,740,843167,591,08429.3153,762,95080,385,68329.14,290,733
S1238,332,934174,931,608168,131,66229.2151,884,98885,485,768296,272,720
S2265,467,800177,945,831170,844,37229.3158,910,65887,634,49829.18,614,984
S. lividans TK24E1309,069,871221,703,188211,211,18229.6199,318,42397,211,45829.619,459,202PRJEB31507
E2296,200,898195,130,032185,499,27830.7173,681,37281,149,25730.413,680,737
T1275,143,588188,420,163183,178,56029.6173,135,37824,771,54629.17,284,890
T2212,032,571140,458,753136,973,07831.8125,039,75521,109,39130.47,941,585
L1263,274,610144,638,209142,426,16931.8113,452,1669,735,52531.26,343,211
L2224,511,134154,437,906150,790,27631.9137,882,47119,449,4013113,544,509
S1181,850,628120,826,462116,547,30432.296,190,17910,046,16530.97,412,208
S2297,413,784249,272,969244,109,07432.7233,266,66313,457,82533.18,447,573
S. venezuelae ATCC15439E1631,858,582536,439,531522,569,14733.2489,456,63969,000,62731.55,079,524SRX6932518 ~ SRX6932525
E2535,926,210429,105,453415,708,34333.8390,659,79840,255,23231.83,642,920
T1394,870,178340,483,910329,759,62732.4300,485,61240,691,93430.92,723,458
T2166,241,490161,092,945157,601,24831.8138,943,00135,483,94030.62,162,631
L1763,831,282661,109,040641,879,09232.2590,644,87171,846,63630.75,611,836
L2646,261,568533,891,315520,671,90232482,564,89367,853,12030.94,261,890
S1451,939,879378,474,029369,204,07831.6297,503,31552,715,12530.63,509,283
S2168,577,692164,169,059158,023,89331.1147,637,91634,179,59630.41,764,845
S. tsukubaensis NRRL 18488E1125,024,824123,919,014121,549,76130.1102,297,8212,307,78630.71,572,115SRP103795
E2124,528,713123,522,173120,322,9293097,672,9561,833,15531.41,313,616
T1132,160,059131,056,038126,769,30531.199,903,6198,779,98129.72,895,583
T2113,065,267112,376,633109,608,58230.684,947,4642,814,40929.51,142,882
L1162,942,510161,871,882157,083,40930.7137,909,62552,297,44829.39,275,687
L2166,664,595165,825,698161,666,87230.9140,291,70256,575,98529.17,533,975
S1146,036,258144,790,983138,367,94829.7115,443,2266,902,38629.41,325,229
S2199,958,654199,442,987192,788,03130171,209,443103,819,03728.94,429,568
Overall statistics of ribosome profiling data.

Data Records

Raw read FASTQ files, trimmed read FASTQ files, mapped read BAM files, and the gene expression text files of all samples were uploaded to the public databases (Tables 1 and 2). Raw read FASTQ files of RNA-Seq and ribosome profiling of three species (S. avermitilis, S. clavuligerus, S. tsukubaensis) were deposited at the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA)[35-37]. Raw read FASTQ files of RNA-Seq and ribosome profiling of S. lividans were deposited at the European Nucleotide Archive (ENA)[38]. Raw read FASTQ files of RNA-Seq of S. venezuelae were deposited at the ENA[39]. Raw read FASTQ files of ribosome profiling of S. venezuelae were deposited at the NCBI SRA[40-47]. Trimmed read FASTQ files and mapped read BAM files of the raw read FASTQ files in the NCBI SRA (RNA-Seq and ribosome profiling data of S. avermitilis, S. clavuligerus, S. tsukubaensis, and ribosome profiling data of S. venezuelae) were deposited at the ENA with a new accession[48]. Trimmed read FASTQ files and mapped read BAM files of the raw read FASTQ files in the ENA (RNA-Seq and ribosome profiling data of S. lividans, and RNA-Seq data of S. venezuelae) were also deposited at the ENA with the same accession as each corresponding raw read FASTQ file[38,39]. The gene expression profile as raw read counts of RNA-Seq and ribosome profiling data of S. avermitilis[49], S. clavuligerus[50], S. tsukubaensis[51], and ribosome profiling data of S. venezuelae[52] are available in a text file format in the Gene Expression Omnibus (GEO) database. Also, the gene expression profiles of all datasets (RNA-Seq and ribosome profiling of the five species), including raw read counts, DESeq. 2 normalized values, fold change values between growth phases, and p-values for the fold changes, are available in a text file format in the Figshare[53]. The raw read FASTQ data of S. clavuligerus in NCBI SRA[36] was published in the previous study[21]. Also, the raw read FASTQ data of S. lividans in ENA[38] was published in the previous study[22]. Note that the ribosome profiling data of Streptomyces griseus was uploaded under the same accession with those of S. venezuelae, but they are not described in this study.

Technical Validation

RNA-Seq read quality validation

A total of 40 RNA-Seq runs that were applied to five species at four time points as duplicates yielded on average 16,430,039 reads (S. avermitilis), 13,961,155 reads (S. clavuligerus), 15,686,025 reads (S. lividans), 12,899,493 reads (S. venezuelae), and 51,144,650 reads (S. tsukubaensis). After trimming the sequencing reads by quality score and nucleotide length, more than 99.8% of the sequencing reads remained, which indicated high-sequencing quality. The remaining reads were used as input to generate sequencing QC reports in the CLC Genomics Workbench to validate the quality of the reads. At first, the overall read lengths were extremely long, corresponding to the sequencing read recipe (Fig. 2a, Table 1). For the four species that were sequenced with the 100-bp read recipe, the percentage of read lengths that were over 100 bp was more than 97.9%, and for S. tsukubaensis, which was sequenced with the 50-bp read recipe, the percentage of read lengths that were over 50 bp was more than 93.8%. Further, more than 98.6% (S. avermitilis), 98.9% (S. clavuligerus), 98.9% (S. lividans), 98.9% (S. venezuelae), and 96.4% (S. tsukubaensis) of the total reads exhibited an average Phred score of greater than 30, which indicates 99.9% base call accuracy (Fig. 2b). In addition, the quality of each base of the obtained reads was examined. The overall base positions of the sequencing reads were highly covered, and even the lowest average values of the coverage were 97.5% (S. avermitilis), 97.6% (S. clavuligerus), 97.8% (S. lividans), 98.3% (S. venezuelae), and 95.4% (S. tsukubaensis) at the last position, respectively (Fig. 2c). Moreover, the median values of the Phred scores per base position of the reads were consistently high across reads, with 40 scores in four species and 38 scores in S. tsukubaensis (Fig. 2d). From these quality validation results, we validated the quality of all obtained RNA sequencing reads for subsequent analysis.
Fig. 2

Read quality analysis of RNA-Seq samples of five Streptomyces species at four growth phases. The replicate of each growth phase is represented as “1” or “2” after the growth phase. (a) Read length distribution of trimmed reads. (b) Distribution of average Phred scores of the trimmed reads. (c) The number of sequences that cover individual base positions normalized to the total number of sequences at each base position. (d) The distribution of the median Phred quality scores that were observed at each base position. (e) PCA plot of RNA-Seq mapped reads of each gene. (f) Violin and box plot of the log2 normalized expression values.

Read quality analysis of RNA-Seq samples of five Streptomyces species at four growth phases. The replicate of each growth phase is represented as “1” or “2” after the growth phase. (a) Read length distribution of trimmed reads. (b) Distribution of average Phred scores of the trimmed reads. (c) The number of sequences that cover individual base positions normalized to the total number of sequences at each base position. (d) The distribution of the median Phred quality scores that were observed at each base position. (e) PCA plot of RNA-Seq mapped reads of each gene. (f) Violin and box plot of the log2 normalized expression values.

Assessment of transcriptome data

The qualified reads were mapped to each reference genome with a uniquely mapped percentage that ranges from 76.91% to 95.09% (S. avermitilis), 52.19% to 90.88% (S. clavuligerus), 60.15% to 87.08% (S. lividans), 67.35% to 86.23% (S. venezuelae), and 76.33% to 96.26% (S. tsukubaensis) (Table 1). The number of uniquely mapped reads at each gene was counted and normalized using the DESeq. 2 package in R[33] to reduce variation between samples. Using the normalized values, principal component analysis (PCA) was performed, which validated the high reproducibility of the sequencing data (Fig. 2e). The distribution of log2 (DESeq normalized value + 1) broadly ranged from 0 to 20 in the different growth phase samples (Fig. 2f).

Ribosome profiling read quality validation

A total of 40 ribosome profiling reads were obtained from five Streptomyces species at four time points as duplicates. Unlike the RNA-Seq data, the trimmed reads were considered as raw sequences of the enriched RPF sequences, as additional PhiX control and adapter sequences that were involved in the ribosome profiling steps must be removed (Fig. 1a,b). Since the RPF fragments were selected by size, ranging from 26 to 34 bp, the 3′ end of the total 50 bp sequencing read contained non-RPF sequences, such as the 3′ adapter sequences, which do not represent the quality of the RPF reads. Thus, the QC reports on the trimmed reads were exported from CLC Genomics Workbench (Qiagen) to assess the quality of the RPF reads. The read length distribution exhibited a broad range from 20 to 40 bp, with one or two enriched peaks (Fig. 3a). The enriched peak sizes were comparable to the monosome-protected sizes, and they varied for different species, while they were more conserved for different growth phase samples of the same species. The differences in RNA degradation efficiency of RNase I or MNase across species may be the primary reason for the observed size differences[54]. Further, the read quality that was measured by the average Phred scores was generally high in all samples of the five species; the quality of more than 94% of the reads was higher than Q20, and more than 80% were higher than Q30 (Fig. 3b). Both per-sequence and per-base analyses of the read quality were observed. As the read size ranged mostly between 25 and 35 bp after adapter trimming, the base number coverage at each position of the 50 bp read dramatically decreased at the 3′ end (Fig. 3c). In terms of species, most of them exhibited the highest decline at 28 to 30 bp, which was consistent with the read length distribution (Fig. 3a). Given the base coverage, the median Phred score per base was demonstrated to be from 1 to 35 bp (Fig. 3d). The overall median of the quality score was approximately Q38, while the median score at the 5′ end was slightly lower than that of the middle section, and the score at the 3′ end of select species showed dramatic reductions. The low quality at the 3′ end may be due to some portions of identical long reads, which were somehow enriched during the size selection step of library construction, which stimulates wrong base calling. For the S. clavuligerus E2 sample, enriched peaks of less than 20 bp in length for approximately 10% of the total reads were unexpectedly found, along with decreased coverage at 15 bp, but these peaks did not seem to affect the overall read quality (Figs. 3a,c,d). Overall, most of the reads were shorter than 35 bp, and the read quality of all samples was high and suitable for downstream analyses.
Fig. 3

Read quality analysis of ribosome profiling samples of five Streptomyces species at the four growth phases. The replicate of each growth phase is represented as “1” or “2” after the growth phase. (a) Read length distribution of trimmed reads. (b) Distribution of average Phred scores of the trimmed reads. (c) The number of sequences that cover individual base positions normalized to the total number of sequences at each base position. (d) The distribution of median Phred quality scores that were observed at each base position. (e) PCA plot of ribosome profiling mapped reads of each gene. (f) Violin and box plot of the log2 normalized expression values.

Read quality analysis of ribosome profiling samples of five Streptomyces species at the four growth phases. The replicate of each growth phase is represented as “1” or “2” after the growth phase. (a) Read length distribution of trimmed reads. (b) Distribution of average Phred scores of the trimmed reads. (c) The number of sequences that cover individual base positions normalized to the total number of sequences at each base position. (d) The distribution of median Phred quality scores that were observed at each base position. (e) PCA plot of ribosome profiling mapped reads of each gene. (f) Violin and box plot of the log2 normalized expression values.

Assessment of translatome data

To examine the additional quality of the reads for the translational abundance of each gene, the trimmed reads were mapped to their corresponding genome. Based on the mapping parameter, some reads would be non-specifically aligned to more than one genomic position due to highly repetitive genomic regions, including rRNA genes. Approximately, 43.1 to 590.6 million reads (75.3 to 96.8% of the trimmed reads) were mapped when the non-specifically matched reads were randomly assigned to one of the mapped positions, while 1.8 to 103.9 million reads (1.3 to 54.9% of the reads) were uniquely mapped when the non-specifically matched reads were excluded (Table 2). These results suggest that the non-specifically matched reads were generally more than half of the total mapped reads. Further, the proportion of these reads varied in different samples even within the same species, which is because the rRNA was enriched during the monosome recovery step, and the efficiency of rRNA removal differed across samples[55]. S. clavuligerus showed the highest uniquely mapped read number and ratio among five species, with an average of 82.4 million reads (47% of the trimmed reads). The S. clavuligerus E2 sample showed a relatively lower mapped number (61.3 million reads, 30.6% of the trimmed reads) compared to other S. clavuligerus samples. S. venezuelae showed 34.2 to 69 million mapped reads (average 13.9% of the trimmed reads), respectively. The two early samples of S. lividans showed 97.2 and 81.1 million mapped reads (46 and 43.8% of the trimmed reads), respectively, while other samples showed low numbers (9.7 to 24.8 million mapped reads, 5.5 to 15.4% of the trimmed reads). S. tsukubaensis showed various ranges for the mapping read number; L2 and S2 samples showed 56 and 103 million mapped reads (35 and 53.9% of the trimmed reads), respectively, while other samples showed a lower number of mapped reads (1.9 to 8.8 million mapped reads, 1.5 to 6.9% of the trimmed reads). S. avermitilis showed the lowest number of mapped reads and ratio among the five species, with 2 to 13.1 million mapped reads (1.3 to 5.9% of the trimmed reads). Although the minimum mapped read number among the samples was 1.8 million, the numbers obtained are, based on several bacterial transcriptome studies, considered sufficient for analysis of the whole translational profile and differential expression levels of genes, as 1 to 5 million reads are suggested for high statistical significance[56-59]. Among the uniquely mapped reads, some reads were mapped within RNA genes, rather than protein-coding genes, which mostly corresponded to tRNA genes. These reads may be the fragments of tRNA and rRNA that were bound to the ribosome and then enriched during monosome recovery[60]. Therefore, further validation was performed using only the mapped reads of the protein-coding genes. A total of 1.1 to 19.5 million reads were mapped to protein-coding genes that were 4.3 to 81.0% of the uniquely mapped genes, which indicates a high ratio of tRNA gene-mapped reads (Table 2). To validate the mapped read quality, the reproducibility of the mapped read number among biological replicates was investigated by PCA. All replicates were found to exhibit high reproducibility (Fig. 3e). The mapped read quality for quantitative analysis, such as the differential translational abundance of genes during growth, was examined by the distribution of the normalized values at four different growth phases, as described in the “Methods” section. The overall log2 value (DESeq normalized value + 1) broadly ranged from 0 to 20, which was considered significant to analyze the translational abundance in different growth phases (Fig. 3f). In conclusion, the mapped reads were confirmed to exhibit high quality in terms of sequencing depth, reproducibility, and translational abundance.
Measurement(s)transcriptome • translation • translatome
Technology Type(s)RNA sequencing • Ribo-Seq
Factor Type(s)Growth phases
Sample Characteristic - OrganismStreptomyces avermitilis • Streptomyces clavuligerus • Streptomyces lividans • Streptomyces venezuelae • Streptomyces tsukubensis
  34 in total

Review 1.  Systems biology of antibiotic production by microorganisms.

Authors:  J Stefan Rokem; Anna Eliasson Lantz; Jens Nielsen
Journal:  Nat Prod Rep       Date:  2007-05-30       Impact factor: 13.423

Review 2.  Streptomyces morphogenetics: dissecting differentiation in a filamentous bacterium.

Authors:  Klas Flärdh; Mark J Buttner
Journal:  Nat Rev Microbiol       Date:  2009-01       Impact factor: 60.633

Review 3.  Importance of microbial natural products and the need to revitalize their discovery.

Authors:  Arnold L Demain
Journal:  J Ind Microbiol Biotechnol       Date:  2013-08-30       Impact factor: 3.346

Review 4.  Systems biology and biotechnology of Streptomyces species for the production of secondary metabolites.

Authors:  Kyu-Sang Hwang; Hyun Uk Kim; Pep Charusanti; Bernhard Ø Palsson; Sang Yup Lee
Journal:  Biotechnol Adv       Date:  2013-11-02       Impact factor: 14.227

Review 5.  1995 Colworth Prize Lecture. The regulation of antibiotic production in Streptomyces coelicolor A3(2).

Authors:  Mervyn Bibb
Journal:  Microbiology (Reading)       Date:  1996-06       Impact factor: 2.777

Review 6.  Synthetic Biology Tools for Novel Secondary Metabolite Discovery in Streptomyces.

Authors:  Namil Lee; Soonkyu Hwang; Yongjae Lee; Suhyung Cho; Bernhard Palsson; Byung-Kwan Cho
Journal:  J Microbiol Biotechnol       Date:  2019-05-28       Impact factor: 2.351

Review 7.  Antibiotics produced by Streptomyces.

Authors:  Rudi Emerson de Lima Procópio; Ingrid Reis da Silva; Mayra Kassawara Martins; João Lúcio de Azevedo; Janete Magali de Araújo
Journal:  Braz J Infect Dis       Date:  2012-09-11       Impact factor: 1.949

Review 8.  Primary metabolism and its control in streptomycetes: a most unusual group of bacteria.

Authors:  D A Hodgson
Journal:  Adv Microb Physiol       Date:  2000       Impact factor: 3.517

9.  Optimized submerged batch fermentation strategy for systems scale studies of metabolic switching in Streptomyces coelicolor A3(2).

Authors:  Alexander Wentzel; Per Bruheim; Anders Øverby; Øyvind M Jakobsen; Håvard Sletta; Walid A M Omara; David A Hodgson; Trond E Ellingsen
Journal:  BMC Syst Biol       Date:  2012-06-07

10.  The dynamic transcriptional and translational landscape of the model antibiotic producer Streptomyces coelicolor A3(2).

Authors:  Yujin Jeong; Ji-Nu Kim; Min Woo Kim; Giselda Bucca; Suhyung Cho; Yeo Joon Yoon; Byung-Gee Kim; Jung-Hye Roe; Sun Chang Kim; Colin P Smith; Byung-Kwan Cho
Journal:  Nat Commun       Date:  2016-06-02       Impact factor: 14.919

View more
  6 in total

Review 1.  Exploring Newer Biosynthetic Gene Clusters in Marine Microbial Prospecting.

Authors:  Manigundan Kaari; Radhakrishnan Manikkam; Abirami Baskaran
Journal:  Mar Biotechnol (NY)       Date:  2022-04-08       Impact factor: 3.619

2.  Ms1 RNA Interacts With the RNA Polymerase Core in Streptomyces coelicolor and Was Identified in Majority of Actinobacteria Using a Linguistic Gene Synteny Search.

Authors:  Viola Vaňková Hausnerová; Olga Marvalová; Michaela Šiková; Mahmoud Shoman; Jarmila Havelková; Milada Kambová; Martina Janoušková; Dilip Kumar; Petr Halada; Marek Schwarz; Libor Krásný; Jarmila Hnilicová; Josef Pánek
Journal:  Front Microbiol       Date:  2022-05-11       Impact factor: 6.064

Review 3.  Clavulanic Acid Production by Streptomyces clavuligerus: Insights from Systems Biology, Strain Engineering, and Downstream Processing.

Authors:  Víctor A López-Agudelo; David Gómez-Ríos; Howard Ramirez-Malule
Journal:  Antibiotics (Basel)       Date:  2021-01-18

4.  System-Level Analysis of Transcriptional and Translational Regulatory Elements in Streptomyces griseus.

Authors:  Soonkyu Hwang; Namil Lee; Donghui Choe; Yongjae Lee; Woori Kim; Ji Hun Kim; Gahyeon Kim; Hyeseong Kim; Neung-Ho Ahn; Byoung-Hee Lee; Bernhard O Palsson; Byung-Kwan Cho
Journal:  Front Bioeng Biotechnol       Date:  2022-02-25

Review 5.  Synthetic biology approaches to actinomycete strain improvement.

Authors:  Rainer Breitling; Martina Avbelj; Oksana Bilyk; Francesco Del Carratore; Alessandro Filisetti; Erik K R Hanko; Marianna Iorio; Rosario Pérez Redondo; Fernando Reyes; Michelle Rudden; Emmanuele Severi; Lucija Slemc; Kamila Schmidt; Dominic R Whittall; Stefano Donadio; Antonio Rodríguez García; Olga Genilloud; Gregor Kosec; Davide De Lucrezia; Hrvoje Petković; Gavin Thomas; Eriko Takano
Journal:  FEMS Microbiol Lett       Date:  2021-06-11       Impact factor: 2.742

6.  Genome-scale analysis of genetic regulatory elements in Streptomyces avermitilis MA-4680 using transcript boundary information.

Authors:  Yongjae Lee; Namil Lee; Soonkyu Hwang; Woori Kim; Suhyung Cho; Bernhard O Palsson; Byung-Kwan Cho
Journal:  BMC Genomics       Date:  2022-01-21       Impact factor: 3.969

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.