Irnov Irnov, Cynthia M Sharma, Jörg Vogel, Wade C Winkler.
Abstract
Post-transcriptional regulatory mechanisms are widespread in bacteria. Interestingly, current published data hint that some of these mechanisms may be non-random with respect to their phylogenetic distribution. Although small, trans-acting regulatory RNAs commonly occur in bacterial genomes, they have been better characterized in Gram-negative bacteria, leaving the impression that they may be less important for Firmicutes. It has been presumed that Gram-positive bacteria, in particular the Firmicutes, are likely to utilize cis-acting regulatory RNAs located within the 5' mRNA leader region more often than trans-acting regulatory RNAs. In this analysis we catalog, by a deep sequencing-based approach, both classes of regulatory RNA candidates for Bacillus subtilis, the model microorganism for Firmicutes. We successfully recover most of the known small RNA regulators while also identifying a greater number of new candidate RNAs. We anticipate these data to be a broadly useful resource for analysis of post-transcriptional regulatory strategies in B. subtilis and other Firmicutes.
Year: 2010 PMID: 20525796 PMCID: PMC2965217 DOI: 10.1093/nar/gkq454
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Length distribution of B. subtilis 5′-leader regions. In total, 600 TSSs were identified based on the dRNA-seq analysis (Supplementary Table S1). The length of each putative leader region was calculated as the distance from the start of the cDNA reads to the start of the downstream coding region. The data are presented as a histogram with a bin width of 20 nt. Three individual leader regions longer than 450 nt (‘x1’) are denoted by the dashed line.
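The leader-length calculation and binning described in the Figure 1 legend can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the strand handling and the example coordinates are assumptions, and the distance is taken here as a simple coordinate difference between the TSS and the first coding nucleotide.

```python
from collections import Counter

def leader_length(tss: int, cds_start: int, strand: str) -> int:
    """Distance in nt from the TSS to the start of the coding region."""
    # On the minus strand the TSS has the larger genomic coordinate.
    return cds_start - tss if strand == "+" else tss - cds_start

def histogram(lengths, bin_width=20):
    """Count leader lengths per bin; keys are the lower edge of each bin."""
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    return dict(sorted(counts.items()))

# Hypothetical (TSS, CDS start, strand) triples, not taken from Table S1.
examples = [(1000, 1035, "+"), (5000, 4958, "-"), (9000, 9264, "+")]
lengths = [leader_length(*e) for e in examples]
print(histogram(lengths))  # {20: 1, 40: 1, 260: 1}
```

Keying each bin by its lower edge keeps the output easy to compare with a plotted histogram of the same bin width.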
Candidates for long 5′ leader regions
| 5′ UTR | UTR size (nt) | cDNA # | Downstream gene | References | Distribution |
|---|---|---|---|---|---|
| ncr2261 | 264 | 30 | | Gerth | 2,7,8 |
| ncr102 | 214 | 51 | | Boor | 2,7,8 |
| ncr776 | 212 | 30 | | Gao | 7 |
| ncr2328 | 199 | 91 | | Jin and Sonenshein, 1994 | 2,7 |
| ncr1471 | 189 | 19 | | This work | (–) |
| ncr2639 | 187 | 8 | | Bisicchia | 2 |
| ncr1422 | 183 | 8 | | This work | (–) |
| ncr921 | 180 | 748 | | Shazand | 2,7,8 |
| ncr1521 | 180 | 8 | | This work | (–) |
| ncr2264 | 179 | 19 | | This work | 2,7,8 |
| ncr1554 | 165 | 48 | | This work | 2,7 |
| ncr942 | 164 | 13 | | This work | 2,3,4,7,8,9 |
| ncr551 | 145 | 8 | | This work | 2,7,8 |
| ncr2017 | 143 | 8 | | This work | 2,7,8 |
| ncr2103 | 128 | 9 | | This work | 2,7 |
| ncr948 | 124 | 21 | | Resnekov | 2,7,8 |
| ncr812 | 123 | 12 | | This work | 2,12 |
| ncr2692 | 122 | 17 | | Mauel | (–) |
| ncr2879 | 121 | 65 | | This work | 2,7 |
| ncr95 | 121 | 21 | | This work | Diverse |
| ncr2755 | 111 | 34 | | This work | 2,7,8 |
| ncr2498 | 110 | 11 | | Rowland and Taber, 1996 | (–) |
| ncr1278 | 106 | 8 | | This work | 2 |
| ncr2243 | 104 | 156 | | Barrick | Diverse |
| ncr2896 | 104 | 14 | | Ogasawara | 1,2,3,4,7,8,9,12 |
| ncr665 | 258 | 16 | | This work | 2 |
| ncr1323 | 96 | 14 | | This work | (–) |
| ncr1443 | 420 | 617 | | This work | (–) |
a. Candidates for long 5′ leader regions (‘5′ UTR’) were selected based on cDNA signals from intergenic regions (of the enriched sample), which include reads that overlap with or end within 10 nt of the downstream gene. The 5′ UTR size is calculated as the distance from the start of the cDNA signals to the start of the coding region. All 5′ UTR sequences are included in Supplementary Figure S3.
b. cDNA # is calculated as the average number of cDNAs corresponding to the first 15 nt from the 5′-end of the overall peak in the enriched sample. Only potential UTRs represented by seven or more cDNA hits are shown in this table. The remaining candidates are listed in Supplementary Table S1.
c. The complete references can be found in Supplementary Materials.
d. Organisms in which BLAST hits could be detected are denoted as: (–) B. subtilis only, (1) Anoxybacillus flavithermus, (2) B. amyloliquefaciens, (3) B. anthracis, (4) B. cereus, (5) B. clausii, (6) B. intermedius, (7) B. licheniformis, (8) B. pumilus, (9) B. thuringiensis, (10) B. weihenstephanensis, (11) Brevibacillus brevis, (12) Geobacillus species, (13) Lysinibacillus sphaericus, (14) Paenibacillus species, (diverse) all of the above Bacillales families with the addition of Staphylococcus, Listeria, Streptococcus and Lactobacillus species.
e. Two TSSs were detected for clpX. The TSS described here is 180 nt upstream of the previously characterized start site (Gerth et al., 1996), which was also observed in our data set.
f. The TSS for pdhA detected herein is 139 nt upstream of the one previously characterized by Gao et al. (2002).
g. A long 5′ UTR upstream of rplU was predicted previously by Barrick et al. (2004) based on sequence conservation.
h. Two TSSs upstream of rpmH were observed, in agreement with Ogasawara et al. (1985).
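The two selection rules in the table notes (reads that overlap or end within 10 nt of the downstream gene, and a cDNA # of seven or more averaged over the first 15 nt of the peak) can be sketched as below. The function names, the plus-strand coordinate convention and the per-position coverage list are assumptions made for illustration; the notes do not specify how the calculation was implemented.

```python
from typing import Sequence

def reaches_gene(peak_end: int, gene_start: int, max_gap: int = 10) -> bool:
    """True if the peak overlaps the gene or ends within max_gap nt upstream of it."""
    # Plus-strand coordinates assumed; a negative gap means the reads overlap the gene.
    return gene_start - peak_end <= max_gap

def cdna_number(coverage: Sequence[int], window: int = 15) -> float:
    """Mean read count over the first `window` nt from the 5' end of the peak."""
    head = coverage[:window]
    if not head:
        raise ValueError("coverage must contain at least one position")
    return sum(head) / len(head)

def is_listed(coverage, peak_end, gene_start, min_cdna=7) -> bool:
    """Apply both criteria: proximity to the gene and the cDNA # threshold."""
    return reaches_gene(peak_end, gene_start) and cdna_number(coverage) >= min_cdna

# Hypothetical per-nucleotide read counts for one peak.
coverage = [8] * 15 + [3] * 100
print(cdna_number(coverage))                                # 8.0
print(is_listed(coverage, peak_end=1110, gene_start=1118))  # True
```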
Predicted sRNAs
| Peak | Length | Start | End | cDNA # | Prev gene | Next gene | Gene direction | Name | Distribution |
|---|---|---|---|---|---|---|---|---|---|
| ncr1160 | 70 | 2697037 | 2697106 | 920 | | | /−/+/−/ | – | (–) |
| ncr1159 | 50 | 2692882 | 2692931 | 352 | | | /−/+/−/ | – | (–) |
| ncr982 | 80 | 1917501 | 1917580 | 324 | | | /+/+/−/ | – | 8 |
| ncr1058 | 297 | 2273533 | 2273829 | 260 | | | /−/+/−/ | ncr46/bsrG | 2,7 |
| ncr1562 | 60 | 532583 | 532642 | 197 | | | /+/−/+/ | – | 8 |
| ncr1932 | 184 | 2273701 | 2273884 | 112 | | | /−/−/−/ | – | 2 |
| ncr1175 | 107 | 2773780 | 2773886 | 90 | | | /+/+/−/ | – | 2 |
| ncr1937 | 69 | 2283685 | 2283753 | 89 | | | /−/−/−/ | – | (–) |
| ncr2768 | 56 | 3852061 | 3852116 | 54 | | | /+/−/−/ | – | 2,6,8 |
| ncr724 | 56 | 1451260 | 1451315 | 46 | | | /+/+/+/ | – | (–) |
| ncr1857 | 260 | 2069869 | 2070128 | 44 | | | /−/−/−/ | bsrE | 2,3,4,7,8 |
| ncr1019 | 171 | 2069821 | 2069991 | 35 | | | /−/+/−/ | ncr39 | 2,7,8 |
| ncr1575 | 199 | 606407 | 606605 | 34 | | | /−/−/−/ | ncr10 | 2,3,4,9,10,13 |
| ncr2184 | 232 | 2779137 | 2779368 | 32 | | | /−/−/−/ | ncr60 | 2,7,8 |
| ncr471 | 173 | 820666 | 820838 | 24 | | | /−/+/+/ | – | (–) |
| ncr264 | 248 | 376678 | 376925 | 23 | | | /+/+/+/ | – | (–) |
| ncr952 | 151 | 1780404 | 1780554 | 20 | | | /+/+/−/ | – | 2 |
| ncr2424 | 58 | 3146126 | 3146183 | 20 | | | /−/−/−/ | – | 7 |
| ncr1915 | 59 | 2208755 | 2208813 | 18 | | | /−/−/−/ | – | (–) |
| ncr629 | 117 | 1233429 | 1233545 | 16 | | | /−/+/−/ | ncr22/rsaE | Diverse |
| ncr1015 | 120 | 2053989 | 2054108 | 14 | | | /−/+/−/ | – | 2,7,11,14 |
| ncr1155 | 233 | 2678729 | 2678961 | 13 | | | /+/+/−/ | ncr58/bsrH | 2,5,7,8,11 |
| ncr1241 | 128 | 3225697 | 3225824 | 13 | | | /−/+/−/ | – | 2,7 |
| ncr2299 | 99 | 2913485 | 2913583 | 13 | | | /−/−/−/ | – | (–) |
| ncr738 | 58 | 1467704 | 1467761 | 13 | | | /−/+/+/ | – | 2 |
| ncr969 | 58 | 1868404 | 1868461 | 12 | | | /+/+/+/ | – | 2 |
| ncr2637 | 74 | 3573045 | 3573118 | 11 | | | /−/−/−/ | – | 1,2,7,8 |
| ncr178 | 124 | 199857 | 199980 | 11 | | | /+/+/+/ | – | 2 |
| ncr560 | 230 | 1056390 | 1056619 | 11 | | | /−/+/+/ | ncr18 | 2 |
| ncr992 | 72 | 1925548 | 1925619 | 7 | | | /−/+/−/ | – | 2 |
| ncr620 | 100 | 1219702 | 1219801 | 7 | | | /−/+/+/ | – | 2,7 |
| ncr585 | 201 | 1150478 | 1150678 | 7 | | | /−/+/−/ | ncr20 | (–) |
| ncr1957 | 60 | 2316348 | 2316407 | 6 | | | /−/−/−/ | – | 2 |
a. Candidates for small RNAs were selected based on cDNA signals from intergenic regions (of the enriched sample) that do not correspond to a known gene and lie at least 25 nt from both the upstream and downstream genes. The length is measured from the start to the end of the cDNA signals.
b. Organisms in which BLAST hits could be detected are denoted as: (–) B. subtilis only, (1) Anoxybacillus flavithermus, (2) B. amyloliquefaciens, (3) B. anthracis, (4) B. cereus, (5) B. clausii, (6) B. intermedius, (7) B. licheniformis, (8) B. pumilus, (9) B. thuringiensis, (10) B. weihenstephanensis, (11) Brevibacillus brevis, (12) Geobacillus species, (13) Lysinibacillus sphaericus, (14) Paenibacillus species, (diverse) all of the above Bacillales families with the addition of Staphylococcus species.
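As a rough sketch of the sRNA selection rule in note a, the check below keeps a cDNA signal only if it lies at least 25 nt from both flanking genes, and computes its length from the start and end coordinates. The start and end values are taken from the ncr1160 row of the table; the flanking gene coordinates and the function names are hypothetical, and how the gap is counted at the boundaries is an assumption.

```python
def signal_length(start: int, end: int) -> int:
    """Length of a cDNA signal, counting both end positions."""
    return end - start + 1

def is_srna_candidate(start: int, end: int,
                      prev_gene_end: int, next_gene_start: int,
                      min_gap: int = 25) -> bool:
    """True if the signal is at least min_gap nt away from both flanking genes."""
    return (start - prev_gene_end >= min_gap) and (next_gene_start - end >= min_gap)

# ncr1160 coordinates from the table; the gene boundaries are invented for illustration.
print(signal_length(2697037, 2697106))                         # 70
print(is_srna_candidate(2697037, 2697106, 2697000, 2697200))   # True
print(is_srna_candidate(2697037, 2697106, 2697020, 2697200))   # False
```

The inclusive length formula reproduces the tabulated lengths (for example, 70 nt for ncr1160 and 50 nt for ncr1159).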
Predicted sRNAs exhibiting lowered abundance
| Peak | Length | Start | End | cDNA # | Prev gene | Next gene | Gene direction | Name | Distribution |
|---|---|---|---|---|---|---|---|---|---|
| ncr1855 | 94 | 2069075 | 2069168 | 5 | | | /−/−/−/ | – | 2,7,8 |
| ncr2507 | 85 | 3302792 | 3302876 | 5 | | | /−/−/+/ | – | (–) |
| ncr1670 | 68 | 1077246 | 1077313 | 5 | | | /−/−/+/ | – | 2,7 |
| ncr214 | 108 | 275609 | 275716 | 5 | | | /+/+/+/ | – | (–) |
| ncr2360 | 188 | 3036340 | 3036527 | 5 | | | /+/−/−/ | – | 2 |
| ncr2173 | 97 | 2734262 | 2734358 | 5 | | | /−/−/−/ | – | (–) |
| ncr2185 | 159 | 2780319 | 2780477 | 4 | | | /−/−/−/ | – | 2,7 |
| ncr2166 | 83 | 2678994 | 2679076 | 4 | | | /+/−/−/ | – | 2,7 |
| ncr1876 | 140 | 2099817 | 2099956 | 4 | | | /−/−/−/ | – | 2 |
| ncr1566 | 79 | 559532 | 559610 | 4 | | | /+/−/−/ | – | 2,7,8 |
| ncr976 | 52 | 1900528 | 1900579 | 4 | | | /+/+/−/ | – | (–) |
| ncr2160 | 259 | 2647405 | 2647663 | 3 | | | /−/−/−/ | – | 2,7,12 |
| ncr1421 | 223 | 3996388 | 3996610 | 3 | | | /+/+/+/ | – | (–) |
| ncr1755 | 187 | 1453368 | 1453554 | 3 | | | /+/−/+/ | ncr35 | (–) |
| ncr1733 | 107 | 1357727 | 1357833 | 3 | | | /+/−/−/ | ncr26 | (–) |
| ncr1118 | 86 | 2540930 | 2541015 | 3 | | | /−/+/+/ | – | 2,7 |
| ncr2339 | 57 | 2991183 | 2991239 | 3 | | | /−/−/−/ | – | (–) |
| ncr2665 | 77 | 3631679 | 3631755 | 3 | | | /−/−/−/ | – | (–) |
| ncr1935 | 104 | 2282621 | 2282724 | 2 | | | /+/−/−/ | – | (–) |
| ncr1052 | 183 | 2221800 | 2221982 | 2 | | | /−/+/+/ | ncr44 | (–) |
| ncr721 | 112 | 1446806 | 1446917 | 2 | | | /+/+/+/ | ncr34 | (–) |
| ncr181 | 158 | 204991 | 205148 | 2 | | | /+/+/+/ | ncr4 | (–) |
| ncr465 | 51 | 796025 | 796075 | 2 | | | /+/+/+/ | – | (–) |
| ncr2897 | 118 | 157 | 274 | 2 | start | | /+/+/+ | – | (–) |
| ncr2857 | 103 | 4122960 | 4123062 | 2 | | | /−/−/+/ | – | (–) |
| ncr977 | 106 | 1901991 | 1902096 | 2 | | | /−/+/+/ | – | (–) |
| ncr2752 | 256 | 3804713 | 3804968 | 2 | | | /−/−/−/ | ncr75 | 2,7,8 |
| ncr1565 | 61 | 554497 | 554557 | 2 | | | /+/−/+/ | – | (–) |
| ncr2658 | 60 | 3625573 | 3625632 | 2 | | | /−/−/−/ | – | 2 |
| ncr826 | 72 | 1596338 | 1596409 | 2 | | | /+/+/+/ | – | 2,8 |
| ncr1221 | 73 | 3072289 | 3072361 | 2 | | | /−/+/+/ | – | 2 |
| ncr2179 | 111 | 2752023 | 2752133 | 2 | | | /−/−/−/ | – | 2,7 |
a. Candidates for small RNAs were selected based on cDNA signals from intergenic regions (of the enriched sample) that do not correspond to a known gene and lie at least 25 nt from both the upstream and downstream genes. The length is measured from the start to the end of the cDNA signals.
b. Organisms in which BLAST hits could be detected are denoted as: (–) B. subtilis only, (1) Anoxybacillus flavithermus, (2) B. amyloliquefaciens, (3) B. anthracis, (4) B. cereus, (5) B. clausii, (6) B. intermedius, (7) B. licheniformis, (8) B. pumilus, (9) B. thuringiensis, (10) B. weihenstephanensis, (11) Brevibacillus brevis, (12) Geobacillus species, (13) Lysinibacillus sphaericus and (14) Paenibacillus species.
c. ‘Start’ indicates the origin of genome replication (0°).
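For readers working with the table programmatically, the numeric codes in the Distribution column can be expanded with the legend in note b. The sketch below is only a convenience lookup; the function and dictionary names are hypothetical, and it handles just the comma-separated codes and the "(–)" marker used in this table.

```python
# Legend transcribed from note b of the table.
ORGANISMS = {
    1: "Anoxybacillus flavithermus", 2: "B. amyloliquefaciens",
    3: "B. anthracis", 4: "B. cereus", 5: "B. clausii",
    6: "B. intermedius", 7: "B. licheniformis", 8: "B. pumilus",
    9: "B. thuringiensis", 10: "B. weihenstephanensis",
    11: "Brevibacillus brevis", 12: "Geobacillus species",
    13: "Lysinibacillus sphaericus", 14: "Paenibacillus species",
}

def decode_distribution(code: str) -> list:
    """Expand a Distribution cell such as '2,7,8' into organism names."""
    code = code.strip()
    if code in ("(–)", "(-)"):
        return ["B. subtilis only"]
    return [ORGANISMS[int(token)] for token in code.split(",")]

print(decode_distribution("2,7,8"))
print(decode_distribution("(–)"))
```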
Figure 2. Visualization of enriched cDNA reads for B. subtilis 6S RNAs. The upper panel shows the sequence and predicted secondary structure of 6S-1. The sequence shown in red is the region that RNAP transcribes, using 6S RNA as a template, into the short product RNA (pRNA). The bottom panel shows the genomic context and the distribution of cDNA reads mapped to both the 6S-1 and 6S-2 loci. Arrows denote the direction of transcription. The cDNA reads for 6S-1 and 6S-2 are shown on the same relative scale. In contrast, the cDNA reads corresponding to the pRNA are approximately 10-fold less abundant than those for 6S RNA and are therefore shown as a close-up for visualization purposes.
Figure 3. The expression, predicted secondary structure and genomic context of B. subtilis sRNA candidates: ncr1175 (A), ncr982 (B), ncr1241 (C) and ncr1015 (D). For each of these RNAs, the expression level was assessed by northern blotting using total RNA samples obtained from stationary phase cells cultured in minimal media. ‘Asterisk’ indicates that the size of the sRNA detected by northern blotting is in agreement with the size of the putative sRNA as predicted by the sequencing data. ‘Filled triangle’ denotes an sRNA with a different predicted size (e.g. due to processing or termination events). The genomic locus of each sRNA is shown with its enriched cDNA hits and the two flanking genes. The transcriptional unit is indicated by an arrow. An open circle denotes a potential intrinsic terminator. Candidate mRNA targets, as predicted by TargetRNA software, are also included in the figure. The region of the sRNA predicted to associate with the target mRNA is highlighted in gray. Secondary structures were predicted using RNAfold and RNAz software.
Figure 6. Novel TA systems predicted by deep sequencing analysis. (A) Genomic loci of three new TA systems in B. subtilis. In each system, the toxic protein (gray arrow) and the RNA antitoxin (black arrow) are arranged in a tail-to-tail configuration. Note that the txpA and ratA system had been previously characterized (46). (B) Northern blotting of the toxin and antitoxin RNAs. ‘Asterisk’ indicates that the size of the sRNA detected by northern blotting is in agreement with the sequencing data. ‘Filled triangle’ denotes an sRNA with a different predicted size (e.g. due to processing or termination events). The expression levels of bsrH and as-bsrH were too low to be detected by northern blotting in our analysis. (C) Putative sequences of the bsrE, bsrG and bsrH toxins. Predicted membrane-spanning regions are highlighted in gray. (D) Sequence alignment of the as-bsrE, as-bsrG and as-bsrH RNA antitoxins. Regions with base-pairing potential are shown in different colors and labeled P1–P4.
Figure 4. The expression, predicted secondary structure and genomic context of B. subtilis sRNA candidates: ncr1575 (A), ncr952 (B) and RsaE/ncr629 (C). For each of these RNAs, the expression level was assessed by northern blotting using total RNA samples obtained from stationary phase cells cultured in minimal media. ‘Asterisk’ indicates that the size of the sRNA detected by northern blotting is in agreement with the size of the putative sRNA as predicted by the sequencing data. ‘Filled triangle’ denotes an sRNA with a different predicted size (e.g. due to processing or termination events). The genomic locus of each sRNA is shown with its enriched cDNA hits and the two flanking genes. The transcriptional unit is indicated by an arrow. An open circle denotes a potential intrinsic terminator. Candidate mRNA targets, as predicted by TargetRNA software, are also included in the figure. The region of the sRNA predicted to associate with the target mRNA is highlighted in gray.
Figure 5. Putative sRNAs encoded within prophage regions. Sixteen sRNA candidates (denoted by arrows) originate from prophage or prophage-like regions (SPβ, skin, P6, P7) and are shown relative to their genomic location. Genes immediately upstream and downstream of each sRNA are also listed.
Predicted novel antisense RNA (asRNA) candidates
| Peak | Length | Start | End | cDNA # | Opposite gene | Overlap | asRNA direction | Name |
|---|---|---|---|---|---|---|---|---|
| ncr2706 | 47 | 3738263 | 3738309 | 114 | | N | − | |
| ncr1430 | 70 | 4035606 | 4035675 | 58 | | 5′ | + | |
| ncr1687 | 24 | 1154737 | 1154760 | 26 | | G | − | |
| ncr1296 | 236 | 3460215 | 3460450 | 26 | | G | + | shd102 |
| ncr1207 | 231 | 2997369 | 2997599 | 15 | | G | + | shd84 |
| ncr1334 | 103 | 3669968 | 3670070 | 12 | | 3′ | + | shd112 |
| ncr1812 | 239 | 1915034 | 1915272 | 11 | | 5′ | − | SurA |
| ncr1265 | 218 | 3307602 | 3307819 | 10 | | G | + | |
| ncr1193 | 130 | 2892567 | 2892696 | 10 | | G | + | shd83 |
| ncr2153 | 101 | 2641931 | 2642031 | 9 | | G | − | |
| ncr1383 | 204 | 3862340 | 3862543 | 8 | | 3′ | + | shd115 |
| ncr1135 | 160 | 2600156 | 2600315 | 7 | | 3′ | + | shd77 |
| ncr1186 | 17 | 2849374 | 2849390 | 7 | | G | + | |
| ncr1006 | 219 | 2002176 | 2002394 | 6 | | 5′ | + | |
| ncr1799 | 25 | 1775732 | 1775756 | 6 | | 5′ | − | |
| ncr1479 | 232 | 4213213 | 4213444 | 6 | | G | + | shd127 |
| ncr1046 | 71 | 2190676 | 2190746 | 5 | | G | + | shd60 |
| ncr1557 | 159 | 519216 | 519374 | 5 | | 3′ | − | shd23 |
| ncr2058 | 110 | 2483671 | 2483780 | 5 | | G | − | |
| ncr394 | 76 | 646225 | 646300 | 5 | | G | + | shd26 |
| ncr2160 | 259 | 2647405 | 2647663 | 3 | | G | − | |
| ncr1351 | 227 | 3747160 | 3747386 | 2 | | 3′ | + | |
| ncr1565 | 61 | 554497 | 554557 | 2 | | 3′ | − | |
| ncr2885 | 106 | 4186388 | 4186493 | 2 | | 3′ | − | |
| ncr1546 | 50 | 452734 | 452783 | 2 | | 3′ | − | |
| ncr507 | 30 | 924181 | 924210 | 2 | | 3′ | + | |
| ncr2410 | 249 | 3123766 | 3124014 | 2 | | 3′ | − | |
a. Candidates for antisense RNAs were selected based on cDNA signals (of the enriched sample) that either start within genes of the opposite orientation or end within 50 nt of such genes. The length is measured from the start to the end of the cDNA signals.
b. The overlapping region is classified as follows: G = the asRNA is fully complementary to the opposite gene. 5′ = the asRNA 5′ end is complementary to the 5′ region of the opposite gene. 3′ = the asRNA 3′ end is complementary to the 3′ region of the opposite gene. N = no overlap; the distance between the 3′-end of the asRNA and the opposite gene is less than 50 nt and there is no predicted intrinsic terminator.
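The overlap classes in note b can be expressed as a small coordinate test. The following is an illustrative sketch under stated assumptions, not the procedure used for the table: the function name and coordinates are hypothetical, the asRNA is assumed to lie on the strand opposite the gene, class labels use an ASCII apostrophe, and the intrinsic terminator condition for class N is not checked.

```python
def classify_overlap(as_start, as_end, gene_start, gene_end, gene_strand, max_gap=50):
    """Return 'G', "5'", "3'", 'N' or None for an asRNA opposite a gene.

    Coordinates are genomic with start <= end.
    """
    if gene_strand == "-":
        # Mirror the coordinates so the gene can be treated as plus-strand.
        as_start, as_end = -as_end, -as_start
        gene_start, gene_end = -gene_end, -gene_start
    # The gene now runs 5'->3' from gene_start to gene_end; the asRNA runs the
    # other way, so its 5' end is as_end and its 3' end is as_start.
    five_inside = gene_start <= as_end <= gene_end
    three_inside = gene_start <= as_start <= gene_end
    if five_inside and three_inside:
        return "G"    # asRNA lies entirely within the gene
    if five_inside:
        return "5'"   # asRNA starts inside and runs past the gene's 5' end
    if three_inside:
        return "3'"   # asRNA starts downstream and ends inside the gene's 3' region
    if 0 < as_start - gene_end < max_gap:
        return "N"    # no overlap, but the asRNA 3' end is within max_gap nt of the gene
    return None

# Hypothetical plus-strand gene spanning 1000-2000.
print(classify_overlap(1200, 1400, 1000, 2000, "+"))  # G
print(classify_overlap(900, 1100, 1000, 2000, "+"))   # 5'
print(classify_overlap(1900, 2100, 1000, 2000, "+"))  # 3'
print(classify_overlap(2030, 2100, 1000, 2000, "+"))  # N
```

Mirroring the coordinates for minus-strand genes keeps a single set of comparisons for both orientations. An asRNA that spans the whole gene falls outside the four classes and returns None here.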
Figure 7. Novel arrangement of an antisense RNA predicted to base pair with the bglP 5′ leader region, which includes a cis-acting regulatory RNA. IGB representation of enriched cDNA reads corresponding to ncr1430 (top) and the bglP UTR (bottom). Each transcriptional unit is represented by an arrow, and the potential intrinsic terminator region is shown by a circle. The ncr1430 RNA is predicted to base pair with the ribosome binding site of bglP (denoted by the gray box).