Juan A Subirana, Xavier Messeguer.
Abstract
BACKGROUND: Satellites, or tandem repeats, are very abundant in many eukaryotic genomes. They have occasionally been reported in some prokaryotes, but to our knowledge there is no general comparative study of their occurrence. We therefore present an overview of the distribution and properties of satellites in a set of representative species. Our results provide novel insights into the evolutionary relationship between eukaryotes, Archaea and Bacteria.
Keywords: Archaea; Bacteria; Leptospira; Methanocella; Methanosarcina; Satellites; Tandem repeats
Year: 2019 PMID: 31533616 PMCID: PMC6749651 DOI: 10.1186/s12862-019-1504-2
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Fig. 1 Box-plots showing the distribution of satellite densities (Satellites/Mb) in the genomes of different prokaryotic groups. The numbers below the name indicate the number of species in each group. Data for Methanosarcina are not included. The miscellanea category in Archaea includes several groups of species with a small number of satellites (0–3 Satellites/Mb); in Bacteria we have merged all groups with a small number of species in the reference NCBI list. Median values in all cases are in the range 0–3 Satellites/Mb. Detailed data for all species are available in Additional file 1.
Table 1 Prokaryotic species with a large number of satellites (> 20)
| Species | Class | NCBI code | Size (Mb) | % CG | Nr sats | Sats/Mb | >30 (%) |
|---|---|---|---|---|---|---|---|
| Archaea | | | | | | | |
| Methanosarcina vacuolata | Methanomicrobia | NZ_CP009520 | 4.56 | 39.7 | 406 | 89 | 11.8 |
| Methanosarcina barkeri | Methanomicrobia | NZ_CP009517 | 4.56 | 39.1 | 379 | 83.1 | 4.2 |
| Methanosarcina barkeri | Methanomicrobia | NZ_CP009528 | 4.57 | 39.2 | 328 | 71.7 | 9.5 |
| Methanosarcina lacustris | Methanomicrobia | NZ_CP009515 | 4.14 | 41.8 | 249 | 60.1 | 6.83 |
| Methanosarcina acetivorans | Methanomicrobia | NC_003552.1 | 5.75 | 42.7 | 189 | 32.9 | 14.8 |
| Methanosarcina mazei | Methanomicrobia | NZ_CP009512 | 4.14 | 41.4 | 155 | 37.4 | 4.52 |
| Methanosarcina siciliae | Methanomicrobia | NZ_CP009506 | 5.02 | 42.9 | 124 | 24.7 | 12.9 |
| Methanosarcina thermophila | Methanomicrobia | NZ_CP009501 | 3.13 | 41.1 | 43 | 13.7 | 9.3 |
| Methanosarcina horonobensis | Methanomicrobia | NZ_CP009516 | 5.02 | 41.3 | 35 | 6.97 | 37.1 |
| Methanobrevibacter ruminantium | Methanobacteria | NC_013790.1 | 2.94 | 32.6 | 83 | 28.3 | 43.4 |
| Methanobrevibacter olleyae | Methanobacteria | NZ_CP014265 | 2.20 | 26.9 | 57 | 25.9 | 40.3 |
| Methanococcus voltae | Methanococci | NC_014222.1 | 1.94 | 28.6 | 48 | 24.8 | 0 |
| Halorubrum lacusprofundi | Halobacteria | NC_012029.1 | 2.74 | 63.9 | 24 | 8.77 | 12.5 |
| Natrialba magadii | Halobacteria | NC_013922.1 | 4.44 | 61.0 | 23 | 5.18 | 4.35 |
| Methanoculleus marisnigri | Methanomicrobia | NC_009051.1 | 2.48 | 62.1 | 21 | 8.47 | 71.4 |
| Methanobacterium paludis | Methanobacteria | NC_015574.1 | 2.55 | 35.7 | 20 | 7.85 | 20 |
| Bacteria | | | | | | | |
| Chloroflexus aurantiacus | Chloroflexi | NC_010175.1 | 5.26 | 56.7 | 76 | 14.4 | 42.1 |
| Burkholderia pseudomallei chrII | Betaproteobacteria | NC_006351.1 | 3.17 | 68.1 | 73 | 23.0 | 5.5 |
| Burkholderia mallei chrII | Betaproteobacteria | NC_006349.1 | 2.33 | 68.5 | 51 | 21.9 | 0 |
| Burkholderia mallei chrI | Betaproteobacteria | NC_006348.1 | 3.51 | 68.5 | 45 | 12.8 | 6.7 |
| Leptospira interrogans chrI | Spirochaetia | NC_004342.2 | 4.34 | 35 | 42 | 9.68 | 100 |
| Clostridioides difficile | Firmicutes | NC_009089.1 | 4.30 | 29.1 | 40 | 9.31 | 67.5 |
| Streptomyces coelicolor | Actinobacteria | NC_003888.3 | 9.05 | 72 | 38 | 4.20 | 21.1 |
| Rhodopirellula baltica | Planctomycetes | NC_005027.1 | 7.15 | 55.4 | 29 | 4.06 | 27.6 |
| Mycobacterium bovis | Actinobacteria | NC_002945.4 | 4.35 | 65.6 | 28 | 6.44 | 46.4 |
| Bacillus thuringiensis | Firmicutes | NC_005957.1 | 5.31 | 35.4 | 28 | 5.27 | 42.9 |
| Bacillus cereus | Firmicutes | NC_004722.1 | 5.43 | 35.3 | 25 | 4.61 | 56 |
| Amycolatopsis mediterranei | Actinobacteria | NC_014318.1 | 10.2 | 71.3 | 23 | 2.25 | 39.1 |
| Pseudomonas syringae | Gammaproteobacteria | NC_007005.1 | 6.09 | 59.2 | 22 | 3.61 | 77.3 |
| Mycobacterium tuberculosis | Actinobacteria | NC_000962.3 | 4.41 | 65.6 | 22 | 4.99 | 31.8 |
| Xanthomonas campestris | Gammaproteobacteria | NC_003902.1 | 5.08 | 65.1 | 21 | 4.14 | 23.8 |
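The Sats/Mb column appears to be the satellite count divided by the genome size in Mb. The following is a minimal sketch of that arithmetic, using three rows copied from the table. The function name `satellites_per_mb` is hypothetical and is not taken from the paper.

```python
def satellites_per_mb(n_satellites: int, genome_size_mb: float) -> float:
    """Satellite density: number of satellites per megabase of genome."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return n_satellites / genome_size_mb


# (species, number of satellites, genome size in Mb), copied from the table above
rows = [
    ("Methanosarcina vacuolata", 406, 4.56),
    ("Leptospira interrogans chrI", 42, 4.34),
    ("Amycolatopsis mediterranei", 23, 10.2),
]

# Densities rounded to two decimals for comparison with the Sats/Mb column
densities = {
    name: round(satellites_per_mb(count, size), 2) for name, count, size in rows
}

for name, density in densities.items():
    print(f"{name}: {density} satellites/Mb")
```

Small differences from the tabulated values can arise because the genome sizes in the table are themselves rounded.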
Fig. 2 Percentage of satellites with long repeats (over 30 nt) for all prokaryotic species which have more than 20 satellites. A list with details for all these species is given in Table 1. Bacteria are shown as red triangles, Methanosarcina as green dots and all other Archaea as blue dots. The value for the nematode Caenorhabditis elegans is represented as a black dot.
Fig. 3 Satellite length as a function of repeat size in Methanosarcina. All 1908 satellites found are represented. Fragments of genes coding for three types of amino acid repeats in proteins are found in the groups around 69, 100–150 and 260 repeat size, as discussed in the text. A large number of satellites related to microsatellites are apparent near the origin of the plot.
Main satellite families with long repeat lengths
| Archaea: Family | Archaea: Genus or species | Bacteria: Family | Bacteria: Genus or species |
|---|---|---|---|
| Fam_1_126_16 | | Fam_13_96_8 | Several species |
| Fam_15_246_6 | | Fam_14_39_8 | Several species |
| Fam_16_141_6 | | Fam_17_100_7 | |
| Fam_17_78_6 | | Fam_23_69_6 | |
| Fam_23_156_5 | | Fam_24_46_6 | |
| Fam_24_141_5 | | Fam_25_46_6 | |
| Fam_25_120_5 | | Fam_26_37_6 | |
| Fam_26_108_5 | | Fam_31_69_5 | |
| Fam_27_37_5 | | Fam_32_45_5 | |
| Fam_44_255_4 | | Fam_33_30_5 | |
| Fam_45_123_4 | | Fam_42_93_4 | |
| Fam_46_102_4 | | Fam_43_39_4 | |
| Fam_47_69_4 | | Fam_44_36_4 | |
| Fam_48_45_4 | | Fam_45_36_4 | |
| Fam_95_258_3 | | Fam_46_36_4 | |
| Fam_96_138_3 | | Fam_47_33_4 | |
| Fam_97_126_3 | | Fam_56_150_3 | |
| Fam_98_120_3 | | Fam_57_156_3 | |
| Fam_99_120_3 | | Fam_58_114_3 | |
| Fam_100_102_3 | | Fam_59_108_3 | |
| Fam_101_93_3 | | Fam_60_60_3 | |
| Fam_102_51_3 | | Fam_61_56_3 | |
| Fam_103_42_3 | | Fam_62_46_3 | |
| | | Fam_63_48_3 | Several species |
| | | Fam_64_45_3 | |
| | | Fam_65_42_3 | |
| | | Fam_66_39_3 | |
| | | Fam_67_39_3 | |
| | | Fam_68_36_3 | |
Satellite families are designated by a code of three numbers. The first number gives the rank of the family, ordered by the number of satellites it contains. The second number is the repeat length of the family. The third number is the number of satellites included in the family. When a family belongs to a single species, the species name is indicated. Note that most families have a repeat length that is a multiple of 3 nt, with the notable exceptions of L. interrogans and M. conradii. This list includes only families with at least 3 satellites and a repeat length over 30 nt. A complete list of all families is given in Additional file 7.
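The three-number family code described above can be decoded mechanically. The following is a minimal sketch, assuming the codes follow the pattern `Fam_<rank>_<repeat length>_<number of satellites>`. The names `SatelliteFamily`, `parse_family_code` and `is_multiple_of_three` are hypothetical helpers for illustration and are not part of the authors' software.

```python
import re
from typing import NamedTuple


class SatelliteFamily(NamedTuple):
    rank: int           # order of the family by number of satellites
    repeat_length: int  # repeat length in nt
    n_satellites: int   # number of satellites in the family


# Assumed code layout: Fam_<rank>_<repeat length>_<number of satellites>
_FAMILY_CODE = re.compile(r"^Fam_(\d+)_(\d+)_(\d+)$")


def parse_family_code(code: str) -> SatelliteFamily:
    """Split a family code such as 'Fam_13_96_8' into its three numbers."""
    match = _FAMILY_CODE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognised family code: {code!r}")
    rank, repeat_length, n_satellites = (int(g) for g in match.groups())
    return SatelliteFamily(rank, repeat_length, n_satellites)


def is_multiple_of_three(family: SatelliteFamily) -> bool:
    """True when the repeat length is a multiple of 3 nt."""
    return family.repeat_length % 3 == 0


for code in ("Fam_13_96_8", "Fam_17_100_7"):
    family = parse_family_code(code)
    print(code, family, is_multiple_of_three(family))
```

A check like `is_multiple_of_three` makes it easy to pick out the families whose repeat length is not a multiple of 3 nt, such as `Fam_17_100_7` with its 100 nt repeat.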