Juan A Subirana, Xavier Messeguer.
Abstract
Little is known about DNA tandem repeats across prokaryotes. We have recently described an enigmatic group of tandem repeats in bacterial genomes with a constant repeat size but variable sequence. These findings strongly suggest that tandem repeat size in some bacteria is under strong selective constraints. Here, we extend these studies and describe tandem repeats in a large set of Bacillus genomes. Some species have very few tandem repeats, while others have a large number. Most tandem repeats have a constant repeat size (either 52 or 20–21 nt) but a variable sequence. We characterize these intriguing tandem repeats in detail. Individual species have several families of tandem repeats with the same repeat length but different sequences. This result is in strong contrast with eukaryotes, where tandem repeats of many sizes are found in any species. We discuss the possibility that they are transcribed as small RNA molecules. They may also be involved in the stabilization of the nucleoid through interaction with proteins. We also show that the distribution of tandem repeats in different species has taxonomic significance. The data we present for all tandem repeats and their families in these bacterial species will be useful for further genomic studies.
Keywords: Bacillus; Bacillus coagulans; bacteria; bacterial nucleoid; non-coding DNA; satellites; small RNA; tandem repeats
Year: 2021 PMID: 34065296 PMCID: PMC8161180 DOI: 10.3390/ijms22105373
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Number of genomes as a function of the number of tandem repeats, in bins of 10 tandem repeats. Genomes with few tandem repeats (0–10) represent 40% of the total genomes analyzed.
Figure 2. Number of tandem repeats with each repeat length. All tandem repeats from all Bacillus genomes have been used. The bar at 61 nt includes all lengths over 60 nt. The distribution is clearly non-random: three types of repeat lengths predominate, namely 20–21 nt, 51–53 nt, and multiples of three nucleotides. A few repeats of size 40–41 nt are also present, which are related to the 20–21 nt tandem repeats.
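The length classes named in the Figure 2 caption can be illustrated with a short sketch. The function names, the order in which the classes are tested (20–21 and 51–53 before multiples of three, so that 21 and 51 fall in their named class), and the example lengths are all assumptions made for illustration; this is not the authors' procedure or data.

```python
from collections import Counter


def length_class(repeat_len: int) -> str:
    """Assign a repeat length (nt) to one of the classes named in the caption."""
    if 20 <= repeat_len <= 21:
        return "20-21"
    if 51 <= repeat_len <= 53:
        return "51-53"
    if repeat_len % 3 == 0:
        return "multiple of 3"
    return "other"


def histogram_bin(repeat_len: int, cap: int = 61) -> int:
    """Plotting bin: every length over 60 nt is pooled in the bar at 61."""
    return min(repeat_len, cap)


# Hypothetical repeat lengths, for illustration only (not data from the paper).
lengths = [20, 21, 52, 52, 53, 12, 60, 75, 41]
class_counts = Counter(length_class(n) for n in lengths)
bin_counts = Counter(histogram_bin(n) for n in lengths)
```

With real data, `lengths` would hold one entry per detected tandem repeat, and `bin_counts` would give the bar heights of a histogram like the one the caption describes.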
Main Bacillus 52 nt tandem repeat families. Sequence of repeats.
| Species | Number of | % 52 nt | Family | Consensus Repeat |
|---|---|---|---|---|
| | 64 | 73.4 | 37_52_10 | |
| | | | 61_52_6 | |
| | 53av | 90.8 | 1_52_139 | |
| | | | 2_52_35 | |
| | | | 8_52_18 | |
| | 64 | 71.9 | 22_52_12 | |
| | 59 | 88.1 | 52_51_7 | |
| | | | 92_54_4 | |
| | 80 | 73.7 | 42_52_9 | |
| | | | 77_52_5 | |
| | 38 | 84.2 | 79_52_5 | |
| | | | 96_54_4 | |
| | 70 | 78.6 | 4_52_21 | |
| | | | 5_52_21 | |
| | | | 30_53_11 | |
| New species | | | | |
| | 103 | 92.2 | 3_50_31 | |
| | | | 5_52_25 | |
| | | | 26_53_11 | |
| | 91 | 74.7 | 4_52_27 | |
| | | | 71_52_7 | |
| | | | 72_52_7 | |
| | 90av | 96.6 | 7_52_18 | |
| | 103 | 86.4 | 9_52_16 | |
| | | | 11_51_15 | |
| | | | 41_51_9 | |
| sp.m3-13 | 72 | 73.6 | 20_52_13 | |
| sp.SG-1 | 77 | 81.8 | 42_53_9 | |
| | 82 | 59.8 | Many | |
| | 95 | 88.4 | Many | |
In this table, the main families with the 52 nt repeat in each species are shown. We also include the results obtained in our previous study [3]. New species are those added in the present work. Species that contain many small tandem repeat families are indicated by “Many”. Tandem repeats in B. selenatarsenatis are shared with two closely related species: B. boroniphilus and B. subterraneus.
Characteristic signals in tandem repeats.
| Species | NCBI Code | Repeat (nt) | Characteristic |
|---|---|---|---|
| | NZ_CP026649.1 | 52 | TCTAYG |
| | NC_014829.1 | 52 | GGTCATCAT |
| | NZ_KV917374.1 | 52 | AAAgGGAAT |
| | NZ_KV440949.1 | 52 | TTTTC |
| | NZ_CP016020.1 | 21 | TCGCGG |
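Signals such as TCTAYG use IUPAC ambiguity codes (Y stands for C or T). A minimal sketch of how such a signal could be located in a sequence is given below. The helper names and the test sequence are hypothetical, and the lowercase g in AAAgGGAAT is simply treated as G here, which is an assumption because the table does not explain its meaning.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_pattern(motif: str) -> str:
    """Translate an IUPAC motif such as 'TCTAYG' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead reports a hit at every position where the motif starts.
    lookahead = re.compile(f"(?={motif_to_pattern(motif)})", re.IGNORECASE)
    return [m.start() for m in lookahead.finditer(sequence)]


# Hypothetical sequence for illustration only (not taken from any genome).
seq = "AATCTACGTTTCTATGCC"
positions = find_motif("TCTAYG", seq)  # matches TCTACG and TCTATG
```

The sketch scans one strand only; searching the reverse complement as well would be needed for a strand-independent count.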
Figure 3. Alignment of three B. coagulans genomes; NCBI codes: NC_015634.1, NZ_CP026649.1, and NZ_CP025437.1. Tandem repeats are plotted as vertical black lines with a thickness proportional to the tandem repeat length. The whole genomes are presented in the upper frame; there is an extensive overall alignment, but many small gaps are apparent. The lower frame shows a small amplified region (50 kb). Further examples are given in Figure S2. The correspondence of tandem repeats in different genomes is only approximate.
Coding features of Bacillus tandem repeats with a repeat length of 60 nt.
| NCBI Code | Species | Tandem Repeat | | Protein Gene | | |
|---|---|---|---|---|---|---|
| | | Start | Length | Start | Length | NCBI Code |
| NC_015634.1 | | 2718394 | 301 | 2718351 | 447 | WP_013860576 |
| NZ_CP026649.1 | | Heavily mutated | | 3215973 | 387 | WP_035183339 |
| NZ_LT603683.1 | | 580499 | 301 | 580438 | 438 | WP_065894177 |
| NC_006582.1 | | 3644328 | 241 | 3644228 | 402 | WP_011248345 |
| NC_017190.1 | | 438961 | 301 | 438923 | 369 | WP_014471456 |
| NC_014551.1 | | 456884 | 241 | 456846 | 309 | WP_013351072 |
| NC_006322.1 | | 540638 | 301 | 540531 | 441 | WP_011197566 |
| NZ_CP007640.1 | | 4062283 | 301 | 4062244 | 375 | WP_010789649 |
| NC_000964 | | 494545 | 301 | 494506 | 372 | WP_003246542 |
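The coordinates in this table can be used to check whether each tandem repeat lies inside its listed gene. The sketch below assumes that both lengths are in nucleotides, that repeat and gene share one coordinate system, and that an interval runs from `start` to `start + length`; none of these conventions is stated in the table. The row marked "Heavily mutated" is left out because it has no repeat coordinates.

```python
# Rows copied from the table:
# (genome, repeat start, repeat length, gene start, gene length)
ROWS = [
    ("NC_015634.1", 2718394, 301, 2718351, 447),
    ("NZ_LT603683.1", 580499, 301, 580438, 438),
    ("NC_006582.1", 3644328, 241, 3644228, 402),
    ("NC_017190.1", 438961, 301, 438923, 369),
    ("NC_014551.1", 456884, 241, 456846, 309),
    ("NC_006322.1", 540638, 301, 540531, 441),
    ("NZ_CP007640.1", 4062283, 301, 4062244, 375),
    ("NC_000964", 494545, 301, 494506, 372),
]


def repeat_inside_gene(tr_start: int, tr_len: int, gene_start: int, gene_len: int) -> bool:
    """True if the repeat interval is fully contained in the gene interval.

    Assumption: half-open intervals [start, start + length) on a shared axis.
    """
    return gene_start <= tr_start and tr_start + tr_len <= gene_start + gene_len


# Map each genome to whether its tandem repeat is contained in the listed gene.
inside = {genome: repeat_inside_gene(*coords) for genome, *coords in ROWS}
```

Under these assumptions, every repeat listed with coordinates falls entirely within its gene, which agrees with the table title describing these as coding features.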
Distribution of tandem repeats in different groups of Bacillus.
| NCBI Code | Species | Genome Size (Mb) | GC% | Number of Tandem Repeats with a Given Repeat Size | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | | | Total | 10–19 | 20–21 | 22–50 | 51–53 | >53 |
| | | | | | | | | | |
| NZ_KV917374.1 | | 5.43 | 36.4 | 91 | 7 | 7 | 3 | | 5 |
| NC_014829.1 | | 4.68 | 36.5 | 64 | 7 | 0 | 2 | | 8 |
| NC_015634.1 | | 3.07 | 47.3 | 26 | 0 | 0 | 1 | | 4 |
| NC_016023.1 | | 3.55 | 46.5 | 63 | 0 | 1 | 1 | | 4 |
| NZ_CP023704.1 | | 4.02 | 37.5 | 70 | 2 | 4 | 7 | | 2 |
| | | | | | | | | | |
| NC_022524.1 | | 4.88 | 46 | 50 | 2 | | 8 | | 1 |
| NZ_BASE01000145 | | 4.76 | 42.1 | 82 | 1 | | 1 | | 1 |
| | | | | | | | | | |
| NZ_CP016020.1 | | 4.36 | 36.5 | 32 | 2 | | 0 | 0 | 1 |
| NZ_CP011008.1 | | 5.52 | 39.8 | 40 | 3 | | 9 | 0 | 1 |
| NZ_CP017080.1 | | 5.01 | 42.3 | 38 | 1 | | 13 | 0 | 2 |
| | | | | | | | | | |
| NC_004722.1 | | 5.51 | 35.3 | 25 | 9 | 1 | | 0 | 0 |
| NZ_CP018931.1 | | 5.24 | 35.4 | 31 | 5 | 3 | | 0 | 2 |
| NZ_CP007512.1 | | 5.88 | 35.0 | 31 | 9 | 4 | | 0 | 3 |
| NC_003997.3 | | 5.23 | 35.4 | 19 | 3 | 5 | | 0 | 0 |
| NZ_CP009692.1 | | 5.64 | 35.4 | 26 | 7 | 2 | | 0 | 2 |
| | | | | | | | | | |
| NC_014103.1 | | 5.1 | 38.1 | | 7 | 1 | 0 | 0 | 0 |
| NC_017138.1 | | 5.08 | 38.1 | | 2 | 2 | 0 | 0 | 1 |
| NZ_CP011007.1 | | 3.88 | 41.5 | | 0 | 0 | 4 | 0 | 0 |
| NC_014551.1 | | 3.98 | 46.1 | | 1 | 0 | 0 | 0 | 2 |
| NC_000964.3 | | 4.22 | 43.5 | | 0 | 0 | 0 | 0 | 1 |
| NZ_CP012024.1 | | 3.38 | 40.8 | | | | | | |
| NZ_CP012502.1 | | 3.58 | 46.1 | | | | | | |
| NZ_CP017786.1 | | 3.64 | 41.5 | | | | | | |
Only a few examples of each group are shown. The characteristic feature of each group is highlighted in bold.
Tandem repeats in different phylogenetic groups of Bacillus.
| Group | Genome Size (Mb) | GC% | Number of Tandem Repeats | 52 nt Repeats |
|---|---|---|---|---|
| CEREUS | 5.5 | 35.4 | 25.7 | NO |
| SUBTILIS | 4.2 | 44.9 | 3.8 | NO |
| PUMILUS | 3.8 | 41.3 | 3.2 | NO |
| METHANOLICUS | 3.3–6.4 | 36–42 | 12–54 | Variable |
| MEGATERIUM | 3.9–5.5 | 35–38 | 2–7 | NO |
| SIMPLEX | 4.6–5.5 | 39–42 | 12–40 | NO |
| HALODURANS-A | 4.6 | 38.7 | 80.3 | YES |
| HALODURANS-B | 4.1 | 42.3 | 4 | NO |
| COAGULANS | 3.6 | 37–46 | 56 | YES |
| MISCELLANEOUS | 3.2–5.3 | 33–45 | 0–49 | Variable |
Average values are given when the group is homogeneous. Details are given in Supplementary Table S6. The SIMPLEX group is the only group characterized by the presence of abundant tandem repeats with a 21 nt repeat.
Figure 4. Models of transcribed RNA tandem repeats: (a) model of an RNA with nine repeats of 52 nt, prepared with RNA; (b) RNA sponge: fragment of folded RNA (five repeats) interacting with several proteins; (c) interaction of tandem repeat RNA (black) with messenger RNA (red); (d) interaction of tandem repeat RNA (black) with messenger RNA (red), facilitated by interaction with the Hfq protein hexamer; (e) model of the complex of a partially double-stranded RNA and the Hfq protein, PDB code 4V2S [29]; (f) direct interaction of an RNA tandem repeat with DNA.