Hongxi Zhang1, Douyue Li1, Xiangyan Zhao1, Saichao Pan1, Xiaolong Wu1, Shan Peng1, Hanrou Huang1, Ruixue Shi1, Zhongyang Tan2.
Abstract
BACKGROUND: The ubiquitous presence of short tandem repeats (STRs) in virtually all genomes implies their functional relevance, yet a widely accepted definition of STRs has not been established. Previous studies have focused mainly on relatively long STRs and have generally excluded shorter repeats. Here, we adopted more generous criteria that include shorter repeats, which yields a much larger number of STRs that have not previously been analyzed. Using this definition, we analyzed the short repeats in 55 randomly selected segments from 55 randomly selected genomic sequences spanning a fairly wide range of species, covering animals, plants, fungi, protozoa, bacteria, archaea and viruses.
Year: 2020 PMID: 32807079 PMCID: PMC7430839 DOI: 10.1186/s12864-020-06949-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 STRs make up a high percentage of genomes, and genomes probably tend to produce repeats. a STR percentages of the 55 randomly selected reported segments and of the control group. The control sequences were generated by a program written in C and have the same number and composition of nucleotides as the 55 selected segments, but in random order. b Analysis of the contradiction between the disappearance of STRs and their high percentage in genomes
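The control group described for Fig. 1a can be illustrated with a composition-preserving shuffle. The following is a minimal sketch in Python, not the authors' C program: the function name `shuffled_control`, the `seed` parameter and the toy sequence are assumptions introduced here for illustration.

```python
import random
from collections import Counter


def shuffled_control(segment, seed=None):
    """Return a sequence with the same bases as `segment` in random order.

    Hypothetical helper: the original study used a C program whose
    details are not given in this record.
    """
    rng = random.Random(seed)  # seeded generator so a control can be reproduced
    bases = list(segment)
    rng.shuffle(bases)  # in-place shuffle; base counts are unchanged
    return "".join(bases)


segment = "ATATATGCGCAAAATTTT"  # toy sequence, not taken from the paper
control = shuffled_control(segment, seed=0)

# Same length and same base counts; only the order differs.
assert len(control) == len(segment)
assert Counter(control) == Counter(segment)
```

Because the shuffle keeps length and base composition fixed, any difference in STR percentage between a segment and its control reflects nucleotide order alone.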
Lengths (bp) of STRs by repeat unit type and number of iterations in the segment of the reported human reference X chromosome sequence at positions 144,822–231,384 bp
| Iteration | Mono^a | Di | Tri | Tetra | Penta | Hexa | Total |
|---|---|---|---|---|---|---|---|
| I2 | (18,128)^b | 10,040 | 3540 | 2056 | 1250 | 480 | 17,366 |
| I3 | 9702 | 1782 | 288 | 156 | 45 | 18 | 11,991 |
| I4 | 3844 | 368 | 12 | 112 | – | – | 4336 |
| I5 | 2095 | 120 | 15 | 20 | – | – | 2250 |
| I6 | 600 | 24 | 18 | 0 | – | – | 642 |
| I7 | 182 | 14 | –^c | 28 | – | – | 224 |
| I8 | 128 | 16 | – | 0 | – | – | 144 |
| I9 | 54 | 18 | – | 36 | – | – | 108 |
| I10 | 50 | 0 | – | – | – | – | 50 |
| I11 | 55 | 22 | – | – | – | – | 77 |
| I12 | 24 | – | – | – | – | – | 24 |
| I13 | 65 | – | – | – | – | – | 65 |
| I14 | 56 | – | – | – | – | – | 56 |
| I15 | 45 | – | – | – | – | – | 45 |
| I16 | 64 | – | – | – | – | – | 64 |
| I17 | 0 | – | – | – | – | – | 0 |
| I18 | 36 | – | – | – | – | – | 36 |
| I19 | 19 | – | – | – | – | – | 19 |
| I20 | 0 | – | – | – | – | – | 0 |
| I21 | 42 | – | – | – | – | – | 42 |
| I22 | 0 | – | – | – | – | – | 0 |
| I23 | 23 | – | – | – | – | – | 23 |
| I24 | 0 | – | – | – | – | – | 0 |
| I25 | 25 | – | – | – | – | – | 25 |
| I26 | – | – | – | – | – | – | – |
| I27 | – | – | – | – | – | – | – |
| I28 | – | – | – | – | – | – | – |
| Sum | 17,109 | 12,404 | 3873 | 2408 | 1295 | 498 | 37,587 |
a Mononucleotide repeat (Mono), Dinucleotide repeat (Di), Trinucleotide repeat (Tri), Tetranucleotide repeat (Tetra), Pentanucleotide repeat (Penta), Hexanucleotide repeat (Hexa)
b The length of mononucleotide repeats with two iterations was not included in these statistics and is shown for reference only
c Iterations beyond the largest one observed for a repeat unit type in the corresponding analyzed segment are shown as “–”
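Each cell in the table above is a total length in bp, that is, unit length × iterations × number of occurrences. A tally of this kind can be sketched as follows. This is only an illustration under stated assumptions: the exact counting rules of the study are not given in this record, so the greedy left-to-right scan, the longest-run tie-breaking, the filter on non-primitive units and all function and parameter names are hypothetical. The default `min_mono_iter=3` mirrors footnote b, which excludes mononucleotide repeats with two iterations.

```python
from collections import defaultdict


def is_primitive(unit):
    """True if `unit` is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(unit)
    return not any(
        k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)
    )


def tally_str_lengths(seq, max_unit=6, min_iter=2, min_mono_iter=3):
    """Sum STR lengths (bp), keyed by (unit length, iteration count).

    Assumed rule: at each position keep the longest tandem run over
    unit lengths 1..max_unit (ties go to the shorter unit), record it,
    then continue scanning after the run.
    """
    totals = defaultdict(int)
    i = 0
    while i < len(seq):
        best = None  # (span in bp, unit length, iterations)
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for this unit length
            if not is_primitive(unit):
                continue  # e.g. 'AA' is counted as a mononucleotide repeat
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == unit:
                n += 1  # count consecutive copies of the unit
            needed = min_mono_iter if k == 1 else min_iter
            if n >= needed and (best is None or n * k > best[0]):
                best = (n * k, k, n)
        if best is None:
            i += 1
        else:
            span, k, n = best
            totals[(k, n)] += span
            i += span
    return dict(totals)


toy = "AAAACGCGCGT"  # toy input, not from the paper
lengths = tally_str_lengths(toy)
# One A run of 4 iterations (4 bp) and CG repeated 3 times (6 bp).
assert lengths == {(1, 4): 4, (2, 3): 6}
```

In this sketch a key such as `(2, 3)` corresponds to the Di column of row I3, and summing the values over all keys gives the analogue of the table's grand total.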
Fig. 2 Straight strand models of semi-conservative replication and slippage. a The space occupied by a nucleotide; * indicates theoretical values (top). The stable straight model of semi-conservative replication (middle). Comparison of hydrogen bonds and 3′-5′ phosphodiester bonds (bottom) [55–57]; # indicates that the strength ratio was calculated by dividing the strength of a hydrogen bond by that of a phosphodiester bond. b The straight slippage models of mononucleotide, dinucleotide and trinucleotide repeats, which are impossible according to strict geometric calculation of the space of a nucleotide and the stability of hydrogen and phosphodiester bonds
Fig. 3 The DNA chain is highly curved or folded in the nucleus, and the curved slippage model is impossible. a Schematic diagram of the size of the nuclear space (top) [61]. The normal replication enzyme complex straightens the DNA chain, whereas a disturbed replication enzyme complex may cause the DNA molecule to return to a curved state (bottom). b The curved template slippage model, which is impossible according to strict geometric calculation of the space of a nucleotide and the stability of hydrogen and phosphodiester bonds (top). Mono- and dinucleotide repeats are unlikely to be produced in curved replicating strands (middle and bottom)
Fig. 4 Stable folded slippage models of the expansion of mononucleotide to hexanucleotide repeats, according to strict geometric calculation of the space of a nucleotide and the stability of hydrogen and phosphodiester bonds. Repeat units tend to be added to the replicating strand when the template strand is on the inner side of the respective folded slippage model. The bottom three sub-figures show the folded slippage models in three-dimensional helical form
Fig. 5 Stable folded slippage models of the contraction of mononucleotide to hexanucleotide repeats, according to strict geometric calculation of the space of a nucleotide and the stability of hydrogen and phosphodiester bonds. Repeat units tend to be lost from the replicating strand when the template strand is on the outer side of the respective folded slippage model. The bottom three sub-figures show the folded slippage models in three-dimensional helical form
Fig. 6 Repeat production inclines toward expansion. Fo and Fi refer to the forces required for the two template strands to bend, respectively. Fo > Fi means that the force required for the template strand to bend downward is greater than that required for it to bend upward, and Pe > Pc means that the template strand is more likely to bend upward than downward