Xinfeng Li1, Han Mei1, Fang Chen1, Qing Tang1, Zhaoqing Yu1, Xiaojian Cao1, Binda T Andongma1, Shan-Ho Chou2, Jin He1.
Abstract
The non-pathogenic bacterium Mycobacterium smegmatis mc2155 has been widely used as a model organism in mycobacterial research, yet a detailed study of its transcriptional landscape has been lacking. Here we report its transcriptome, expression profiles and transcriptional structures, determined through growth-phase-dependent RNA sequencing (RNA-seq) and related experiments. We found: (1) 2,139 transcriptional start sites (TSSs) genome-wide, of which eight were randomly selected and further verified by 5′-RACE; (2) 2,233 independent monocistronic or polycistronic mRNAs in the transcriptome, whose operon/sub-operon structures are classified into five groups; (3) 47.50% (1016/2139) of genes were transcribed into leaderless mRNAs, with the TSSs of 41.3% (883/2139) of mRNAs overlapping the first base of the annotated start codon; the initial amino acids of the MSMEG_4921 and MSMEG_6422 proteins were identified by Edman degradation, indicating that leaderless features are distinctive and widespread in M. smegmatis mc2155; (4) 150 genes with potentially wrong structural annotation, of which 124 have been corrected; (5) eight highly active promoters, whose activities were further determined by β-galactosidase assays. These data integrate the transcriptional landscape with the genome information of the model organism mc2155 and lay a solid foundation for further work in Mycobacterium.
Keywords: Mycobacterium smegmatis; gene structural re-annotation; highly active promoter; leaderless mRNA; operon; sub-operon; transcriptional start site; transcriptome
Year: 2017 PMID: 29326668 PMCID: PMC5741613 DOI: 10.3389/fmicb.2017.02505
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Strains used in this study.
| mc2155 | Wild-type | Li and He, |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261-P | P | This study |
| mc2155/pMV261- | Edman sequencing | This study |
| mc2155/pMV261- | Edman sequencing | This study |
| Cloning host | Stored by our laboratory | |
| DH5α/pMD19- | Identification of the TSS of | This study |
| DH5α/pMD19- | Identification of the TSS of | This study |
| DH5α/pMD19- | Identification of the TSS of | This study |
| DH5α/pMD19- | Identification of the TSS of | This study |
| DH5α/pMD19- | Identification of the TSS of | This study |
| DH5α/pMD19- | Identification of the TSS of | This study |
| DH5α/pMD19- | Identification of the TSS of | This study |
| DH5α/pMD19- | Identification of the TSS of | This study |
Figure 1. Circular map of the M. smegmatis mc2155 genome and the corresponding transcriptome. Numbers outside the circle show genome coordinates. Moving inward, the next two rings show CDSs on the forward and reverse strands, respectively, with colors representing different COG categories. The inner three rings, colored black, blue and purple, represent the transcriptome maps at 16, 26, and 39 h, respectively; the uneven lines above and below the middle line of each ring represent expression levels higher or lower than average.
Figure 2. Gene expression levels in RNA-seq data. (A) Median expression levels, measured as RPKM, of all identified genes in the different growth phases. RPKM values of each sample were compared using the Wilcoxon test; *p < 0.05, **p < 0.01, ***p < 0.001. (B,C) Number of DEGs in each COG functional category at the early-stationary phase (26 h) and the mid-stationary phase (39 h), respectively. The COG functional categories are as follows: C, energy production and conversion; E, amino acid transport and metabolism; F, nucleotide transport and metabolism; G, carbohydrate transport and metabolism; H, coenzyme transport and metabolism; I, lipid transport and metabolism; J, translation, ribosomal structure and biogenesis; K, transcription; L, replication, recombination and repair; M, cell wall/membrane/envelope biogenesis; O, posttranslational modification, protein turnover, chaperones; P, inorganic ion transport and metabolism; Q, secondary metabolites biosynthesis, transport and catabolism; R, general function prediction only; S, function unknown; T, signal transduction mechanisms.
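For reference, the RPKM measure named in the Figure 2 caption can be sketched as follows. This is a minimal illustration of the standard definition (reads per kilobase of transcript per million mapped reads); the function name and the example counts are assumptions made for illustration and are not values from this study.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Illustrative numbers only: 500 reads on a 1,000 bp gene,
# in a library with 10 million mapped reads.
example_rpkm = rpkm(500, 1000, 10_000_000)
```

Normalizing by both gene length and library size is what makes values comparable across genes and across the samples from different growth phases.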
Figure 3. KEGG enrichment analysis of DEGs between mid-exponential and mid-stationary phases. The Y-axis shows the KEGG pathways and the X-axis shows the rich factor (rich factor = number of DEGs in the pathway / number of all genes in the background gene set). Dot color represents the Q-value of enrichment: red indicates high enrichment and blue indicates low enrichment. Pathway terms are sorted by Q-value in ascending order and are marked in bold and underlined when Q-value < 0.05. Dot size represents the number of enriched genes. (A) Top 30 up-regulated KEGG pathways. (B) Top 30 down-regulated KEGG pathways.
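The rich factor defined in the Figure 3 caption is a simple ratio, sketched below. The caption does not say whether the denominator counts all background genes or only those annotated to the same pathway; this sketch assumes the latter, which is the usual convention. The function name and the example counts are hypothetical.

```python
def rich_factor(degs_in_pathway: int, background_genes_in_pathway: int) -> float:
    """Fraction of a pathway's background genes that are differentially expressed.

    Assumption: the denominator is the number of background genes annotated
    to the same pathway.
    """
    if background_genes_in_pathway <= 0:
        raise ValueError("the pathway must contain at least one background gene")
    if not 0 <= degs_in_pathway <= background_genes_in_pathway:
        raise ValueError("DEG count must lie between 0 and the pathway size")
    return degs_in_pathway / background_genes_in_pathway


# Illustrative numbers only: 12 DEGs in a pathway with 48 annotated genes.
example_rich_factor = rich_factor(12, 48)
```

A larger rich factor means a larger share of the pathway is differentially expressed, which is why it is plotted alongside the Q-value and not used in place of it.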
Figure 4. Transcriptional maps of the MSMEG_3068-3078 loci. (A) Overview of strand-specific transcriptional maps of the MSMEG_3068-3078 loci. MSMEG_3068, MSMEG_3071-3074, and MSMEG_3078 reside on the positive strand, while MSMEG_3069-3070 and MSMEG_3075-3077 are located on the negative strand. (B) Single-nucleotide resolution transcriptional map of MSMEG_3071 and its TSS. (C) Single-nucleotide resolution transcriptional map of MSMEG_3070 and its TSS. Lines of different colors represent samples from different growth phases; −1 and −2 denote the two biological replicates. The same applies to the following figures.
Figure 5. Identification of the MSMEG_2196 TSS. (A) Overview of transcriptional maps of the MSMEG_2196-2199 loci. (B) TSS of MSMEG_2196 identified by 5′-RACE in this study. (C) Comparison of the MSMEG_2196 TSS identified from our RNA-seq data with that previously reported (Bharati et al., 2013). The newly identified TSS is colored red, and the putative −10 motif is boxed. The previously identified TSS is colored purple, but no conserved −10 motif can be found for it.
Figure 6. Distribution of 5′-UTR lengths. (A) Distribution of 5′-UTR lengths. (B) Percentage of 5′-UTRs by length. A 5′-UTR length <0 indicates a potentially mis-annotated gene.
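The relationship between TSS position, 5′-UTR length and the categories mentioned in the Figure 6 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: coordinates are 1-based, `start_codon` is the coordinate of the first base of the annotated start codon, and the `leaderless_max` cutoff is a hypothetical parameter because the cutoff used in the study is not given in this excerpt.

```python
def utr5_length(tss: int, start_codon: int, strand: str) -> int:
    """Signed 5'-UTR length: bases between the TSS and the first start-codon base."""
    if strand == "+":
        return start_codon - tss
    if strand == "-":
        # On the reverse strand, upstream means a higher genome coordinate.
        return tss - start_codon
    raise ValueError("strand must be '+' or '-'")


def classify_tss(tss: int, start_codon: int, strand: str,
                 leaderless_max: int = 0) -> str:
    """Classify a transcript by its 5'-UTR length.

    leaderless_max is an assumed cutoff; 0 means the TSS coincides with
    the first base of the annotated start codon.
    """
    length = utr5_length(tss, start_codon, strand)
    if length < 0:
        # TSS lies inside the annotated CDS: the annotation may be wrong.
        return "potentially mis-annotated"
    if length <= leaderless_max:
        return "leaderless"
    return "leadered"
```

For example, a forward-strand gene whose TSS falls 50 bases downstream of the annotated start codon has a 5′-UTR length of −50 and is flagged as potentially mis-annotated.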
Figure 7. N-terminal amino acid identification of MSMEG_4921 and MSMEG_6422. (A,B) Transcriptional maps of MSMEG_4921 and MSMEG_6422, respectively. (C,D) The five N-terminal amino acids of the MSMEG_4921 and MSMEG_6422 proteins, respectively. Both amino acid sequences are identical to the annotation files. Notably, the N-terminal initiator Met was removed by methionine aminopeptidase (MetAP), a processing step that is often crucial for the function and stability of proteins.
Figure 8. Transcriptional maps and gene structural re-annotation of MSMEG_1874 (A) and MSMEG_6901 (B). Nucleotides in red and red dotted lines indicate the start codons annotated by algorithm-based software. Nucleotides in green and green dotted lines indicate the TSSs determined by RNA-seq, which are also considered to mark the accurate start codons of the relevant genes.
Figure 9. Examples of the five groups of identified operons. The RT-PCR forward (F) and reverse (R) primers are indicated by red arrows. (A) Confirmed: DOOR annotated MSMEG_4268 and MSMEG_4267 as an operon, and our RNA-seq data and RT-PCR experiment agree. (B) Extended: DOOR annotated MSMEG_4307 and MSMEG_4306 as an operon, without MSMEG_4305 (red arrow); however, our RNA-seq data and RT-PCR experiment show that transcription extends to MSMEG_4305. (C) Dismissed: DOOR annotated MSMEG_1696 to MSMEG_1695 (green arrow) as an operon; however, our RNA-seq data and RT-PCR experiment indicate that MSMEG_1696 is excluded from the transcript. (D) New: our RNA-seq data identified new operons not found by DOOR, which was also supported by RT-PCR. (E) Alternative: MSMEG_6233 and MSMEG_6232 (purple arrow) were found to be co-transcribed, and MSMEG_6232 also appears to be transcribed from its own TSS.
Operon prediction in M. smegmatis mc2155.
| Confirmed | 1635 | 7 |
| Extended | 61 | 7 |
| Dismissed | 167 | 6 |
| New | 65 | 5 |
| Alternative | 273 | 2 |
Confirmed: RNA-seq annotated the same number of genes in the operon as DOOR;
Extended: RNA-seq annotated more genes in the operon than DOOR;
Dismissed: RNA-seq annotated fewer genes in the operon than DOOR;
New: the operon was not annotated by DOOR;
Alternative: sub-operon, in which a gene inside an operon was transcribed independently;
From each group, several genes were selected for RT-PCR.
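The five group definitions above can be read as a small decision rule, sketched below. This is an illustration under stated assumptions and not the authors' pipeline: the function name, the `has_internal_tss` flag, and the choice to check the Alternative group first are all hypothetical. The gene identifiers in the usage lines come from the Figure 9 caption, except `gene_A` and `gene_B`, which are placeholders.

```python
from typing import Optional, Sequence


def classify_operon(rnaseq_genes: Sequence[str],
                    door_genes: Optional[Sequence[str]] = None,
                    has_internal_tss: bool = False) -> str:
    """Assign an RNA-seq operon to one of the five groups relative to DOOR.

    Assumption: an operon with an independently transcribed internal gene
    is reported as "Alternative" regardless of its gene count.
    """
    if has_internal_tss:
        return "Alternative"
    if not door_genes:
        # DOOR has no operon annotation for these genes.
        return "New"
    if len(rnaseq_genes) > len(door_genes):
        return "Extended"
    if len(rnaseq_genes) < len(door_genes):
        return "Dismissed"
    return "Confirmed"


confirmed = classify_operon(["MSMEG_4268", "MSMEG_4267"],
                            ["MSMEG_4268", "MSMEG_4267"])
extended = classify_operon(["MSMEG_4307", "MSMEG_4306", "MSMEG_4305"],
                           ["MSMEG_4307", "MSMEG_4306"])
dismissed = classify_operon(["MSMEG_1695"], ["MSMEG_1696", "MSMEG_1695"])
new = classify_operon(["gene_A", "gene_B"])  # placeholder gene names
alternative = classify_operon(["MSMEG_6233", "MSMEG_6232"],
                              ["MSMEG_6233", "MSMEG_6232"],
                              has_internal_tss=True)
```

Comparing gene counts alone mirrors the wording of the definitions; a stricter comparison would also check that the gene sets overlap.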
Figure 10. Screening and identification of highly active promoters. P represents the mc2155/pMV261-P-lacZ strain, and likewise below. β-galactosidase activities of the nine strains (with Phsp60 as a control) are shown in (A–I). Data represent the averages of biological triplicates. Error bars indicate standard deviation.