Isidoro Feliciello, Željka Pezer, Dušan Kordiš, Branka Bruvo Mađarić, Đurđica Ugarković.
Abstract
Major human alpha satellite DNA repeats are preferentially assembled within (peri)centromeric regions but are also dispersed within euchromatin in the form of clustered or short single repeat arrays. To study the evolutionary history of single euchromatic human alpha satellite repeats (ARs), we analyzed their orthologous loci across primate genomes. The analysis reveals continuous insertion of euchromatic ARs throughout the evolutionary history of primates, starting with the ancestors of Simiformes (45-60 Ma) and continuing up to the ancestors of Homo. Once inserted, the euchromatic ARs were stably transmitted to the descendant species, some exhibiting copy number variation, whereas their sequence divergence followed the species phylogeny. Many euchromatic ARs have sequence characteristics of (peri)centromeric alpha repeats, suggesting heterochromatin as a source of dispersed euchromatic ARs. The majority of euchromatic ARs are inserted in the vicinity of other repetitive elements such as L1, Alu, and ERV or are embedded within them. Irrespective of the insertion context, each AR insertion seems to be unique, and once inserted, ARs do not seem to spread subsequently to new genomic locations. Despite the association with (retro)transposable elements, there is no indication that such elements play a role in AR proliferation. The presence of short duplications at most AR insertion sites suggests site-directed recombination between homologous motifs in ARs and in the target genomic sequence, probably mediated by extrachromosomal circular DNA, as a mechanism of spreading within euchromatin.
Keywords: euchromatin; evolution; heterochromatin; primates; proliferation; satellite DNA
Year: 2020 PMID: 33078196 PMCID: PMC7719264 DOI: 10.1093/gbe/evaa224
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
List of Euchromatic Human ARs Used in This Study
| AR No. | Size (monomers) | Monomer Type | Position (Associated Gene), Distance (bp) | Insertion Site Characteristics | Dating of Insertion | TS-Duplicated Sequence |
|---|---|---|---|---|---|---|
| sat_1 | 1.3 | — | Intergenic ( | Adjacent to L1PA10 | Simiformes | CTT |
| sat_83 | 2.4 | M1+, X | Intron ( | Btw L1MB8 and L2c | Simiformes | AGGAGT |
| sat_368 | 0.8 | — | Intergenic ( | Btw AluSz6 and L1ME1 | Simiformes | — |
|  | 1 | R1 | Intron ( | Adjacent to MIR (SINE) | Simiformes | GAA |
|  | 0.9 | R1 | Intron ( | Adjacent to MIR (SINE) | Simiformes | GAA |
| sat_496 | 1.4 | — | Intron ( | Btw AluSx1 and MLT1C (ERVL) | Simiformes | AAG |
| sat_623 | 0.5 | — | Intron ( | No adjacent repeats | Simiformes | — |
|  | 1 | Xm, X | Intergenic ( |  | Simiformes | TGA |
| sat_704 | 1.4 | — | Intergenic ( | No adjacent repeats | Simiformes | AGTG |
| sat_685 | 1.9 | — | Intron ( | Adjacent to L1PA11 | Simiformes | AAA |
| sat_730 | 0.5 | Um | Intergenic ( | Btw L1PA8A and simple repeat | Simiformes | ATGAAAAAAA |
| sat_721 | 0.9 | — | Intergenic ( | Btw TAn and ATGGn | Simiformes | ATAT |
| sat_722 | 1 | — | Intergenic (LOC105377247), 129,191 | Btw (ATAAT)n and FordPrefect hAT | Simiformes | GCTA |
| sat_825 | 0.7 | — | Intergenic ( | Btw L1MB2 and AluSx4 | Simiformes | GAAA |
| sat_826 | 0.8 | — | Intergenic ( | No adjacent repeats | Simiformes | — |
|  | 0.7 | M1+ | Intergenic ( |  | Simiformes | GCT |
| sat_828 | 1.7 | — | Intron ( | Adjacent to AluJr | Simiformes | GAAAAAG |
| sat_1122 | 0.5 | — | Intergenic ( | No adjacent repeats | Simiformes | GTGA |
|  | 1.5 | M1+ | Intron ( |  | Simiformes | ATC |
|  | 0.5 | — | Intron ( |  | Catarrhini | AT |
| sat_85 | 0.6 | — | Intron ( | Adjacent to MIR (SINE) | Catarrhini | — |
| sat_87 | 0.5 | — | Intron ( | Adjacent to MER5A hAT-Charlie DNA transp. | Catarrhini | GAG |
|  | 0.6 | M1+ | Intron ( |  | Catarrhini | — |
| sat_621 | 1.8 | M1+ | Intergenic ( | No adjacent repeats | Catarrhini | — |
|  | 0.7 | — | Intron ( |  | Catarrhini | AGA |
| sat_606 | 0.5 | — | Intron ( | No adjacent repeats | Catarrhini | CTT |
|  | 0.7 | — | Intron ( |  | Catarrhini | GCT |
| sat_705 | 1.4 | — | Intron ( | Adjacent to L1PA10 | Catarrhini | — |
| sat_373 | 0.9 | — | Intron ( | No adjacent repeats | Catarrhini | GTTTT |
| sat_706 | 3.4 | M1+ | Intergenic ( | Btw LTR18B and L2a | Catarrhini | TTA |
|  | 0.5 | — | Intergenic ( |  | Hominoidea | AAC |
| sat_86 | 0.3 | — | Intron ( | Adjacent to MER5A | Hominoidea | — |
|  | 2.7 | M1+ | Intergenic ( |  | Hominoidea | TCAC |
|  | 1.5 | — | Intron ( | Adjacent to AluYjk | Hominoidea | ACA |
|  | 0.9 | — | Intron ( | Adjacent to AluYjk | Hominoidea | ACA |
|  | 3.7 | — | Intron ( | Adjacent to AluYjk | Hominoidea | TCA |
|  | 1 | — | Intron ( | Adjacent to AluYjk | Hominoidea | TTGT |
|  | 1.2 | — | Intron ( | Adjacent to AluSp | Hominoidea | ACT |
|  | 0.8 | — | Intron ( | Adjacent to AluSp | Hominoidea | TAT |
|  | 0.8 | — | Intron ( | Adjacent to AluSp | Hominoidea | AGCT |
| sat_367 | 1.5 | — | Intergenic ( | No adjacent repeats | Hominoidea | TGT |
| sat_729 | 0.7 | M1+ | Intergenic ( | No adjacent repeats | Hominoidea | TGT |
| sat_378 | 1.8 | — | Intergenic ( | No adjacent repeats | Hominidae | AAG |
| sat_410 | 0.5 | — | Intergenic ( | Btw L1M3 and AluY | Hominidae | GAT |
|  | 5 | M1+ | Intergenic ( | Btw AluSq2 and AluSc | Hominidae | AC |
|  | 27 | M1+, Um | Intergenic ( | Btw AluSc and (TG)n | Hominidae | AC |
|  | 32 | M1+ | Intergenic ( | Btw MSTA (ERVL) and L1PA4 | Hominidae | TTG |
|  | 31 | M1+ | Intergenic ( | Btw L1PA4 and MER11C (ERVL) | Hominidae | — |
|  | 15 | M1+ | Intergenic ( | Adjacent to MER11C (ERVL) | Hominidae | TTG |
| sat_823 | 0.7 | — | Intergenic ( | Btw 2 L1PA7 | Hominidae | TAA |
|  | 11 | M1+ | Intergenic ( | Adjacent to AluY | Hominidae | AAACCTG |
|  | 26 | M1+ | Intergenic ( | Btw AluY and L1PA7 | Hominidae | TG |
| sat_864 | 2 | M1+ | Intergenic ( | Btw MER21C and MLT2B3 (ERVL) | Hominidae | TTGG |
| sat_59 | 0.5 | M1+ | Intergenic ( | Btw 2 MSTD-int ERVL | Homininae | CTA |
| sat_257 | 0.7 | — | Intron ( | Adjacent to L1PA8 | Homininae | CAT |
| sat_372 | 5.9 | M1+ | Intergenic ( | No adjacent repeats | Homininae | ACAT |
| sat_379 | 1.2 | Um | Intron ( | No adjacent repeats | Homininae | ACT |
| sat_607 | 0.9 | R2, Xm | Intergenic ( | Btw L1PA2 and Charlie8 | Homininae | AAC |
| sat_731 | 0.5 | M1+ | Intergenic ( | Adjacent to AluSx1 | Homininae | CTCCAA |
| sat_822 | 1.1 | M1+ | Intron ( | Btw AluSc and Alu | Homininae | CTT |
| sat_1123 | 4.1 | D2, 3M1 | Intron, LOC101928195-pseudogene | Btw MamGypLTR3a and L1PA3 | Homininae | AACAG |
| sat_1145 | 4.1 | D2, 3M1 | Intergenic (LOC101928381), ncRNA, 36,431 | Btw MamGypLTR3a and L1PA3 | Homininae | AACAG |
| sat_1146 | 0.9 | — | Intergenic ( | Adjacent to AluJo | Homininae | TGA |
|  | 1 | J1A | Intergenic ( |  | Hominini | GTA |
|  | 0.9 | R1 | Intergenic ( |  | Hominini | CAT |
|  | 67 | M1, R2, X | Intron ( | Adjacent to SVA_D–E | Homo | ACACTG |
|  | 49 | R1–2, M1, Um | Intron ( | Btw SVA_D–E and L1PA3 | Homo | GAA |
|  | 77 | D1–2, R, M | Intron ( | Adjacent to L1PA3 | Homo | ATAA |
Note.—Size is expressed as number of monomers, monomer types (if available, according to Shepelev et al. [2015]), position relative to the genes, and distances to the nearest genes are shown: negative distances mean 5′ position of a gene to the AR. The association with other repetitive elements at the insertion site and dating of the insertion is listed as well as target site (TS)-duplicated sequences at the insertion sites. For ARs inserted in other repetitive elements, position of insertion within the particular element is shown in parentheses. ARs embedded in other repeats are indicated in bold and clustered ARs in italic.
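The TS-duplicated sequences listed in the table are short direct repeats that flank an inserted AR. As a rough illustration of how such a duplication can be recognized, the sketch below returns the longest sequence that both ends the upstream flank and begins the downstream flank. The function name `find_tsd`, its length limits, and the example flanks are hypothetical and made up for illustration; this is not the procedure used in the study, and it assumes the duplication abuts the insert exactly on both sides.

```python
def find_tsd(upstream: str, downstream: str, min_len: int = 2, max_len: int = 12) -> str:
    """Return the longest direct repeat that ends `upstream` and starts `downstream`.

    `upstream` is the genomic sequence immediately 5' of the insert and
    `downstream` the sequence immediately 3' of it (both hypothetical inputs).
    An empty string means no duplication of at least `min_len` bases was found.
    """
    upstream = upstream.upper()
    downstream = downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    # Try the longest candidate first so the first match is the longest one.
    for k in range(longest, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


# Made-up flanks: the upstream flank ends in CTT and the downstream flank starts with CTT.
print(find_tsd("ACGTTGCACTT", "CTTGGATCCA"))
```

Real insertion sites can carry mismatches or a small offset between the repeat and the insert boundary, which an exact-match scan like this one would miss.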
Fig. 1. (a) The number of single human ARs inserted within euchromatin during evolutionary history is indicated in red on the phylogenetic tree of simians (Simiformes). Species for which a genome sequence is available are indicated: human (Homo sapiens), chimp (Pan troglodytes), bonobo (Pan paniscus), gorilla (Gorilla gorilla gorilla), orangutan (Pongo pygmaeus abelii), gibbon (Nomascus leucogenys), rhesus (Macaca mulatta), crab-eating macaque (Macaca fascicularis), baboon (Papio anubis), green monkey (Chlorocebus sabaeus), golden snub-nosed monkey (Rhinopithecus roxellana), proboscis monkey (Nasalis larvatus), marmoset (Callithrix jacchus), and squirrel monkey (Saimiri boliviensis). For alpha repeats sat_83, sat_706, and sat_372, which show CNV, the size of the repeats in each species is indicated. (b) Schematic representation of the intrastrand recombination responsible for CNV of alpha repeats sat_83 and sat_706. Homologous regions in AR sequences at which recombination occurred are indicated by different signs. In sat_83, there is a single recombination site, whereas in sat_706 there are two sites.
Fig. 2. ML trees based on human dispersed ARs (intronic sat_828 and sat_380-381 as well as intergenic sat_704 and sat_703) and their orthologous sequences in different primate assemblies: hg38-human, panTro6-chimp, panPan2-bonobo, gorGor5-gorilla, ponAbe3-orangutan, nomLeu3-gibbon, chlSab2-green monkey, macFas5-crab-eating macaque, rheMac-rhesus, papAnu4-baboon, rhiRox1-golden snub-nosed monkey, nasLar1-proboscis monkey, calJac3-marmoset, and saiBol1-squirrel monkey. Numbers on the nodes depict ML aLRT/NJ bootstrap support values.
Fig. 3. Repeat composition in 22 introns containing ARs (a), in 100 randomly chosen introns without ARs (b), and in the 20 kb region around intergenic ARs (c), together with the average proportion of repeats in the genome, introns with ARs, the 20 kb region around intergenic ARs, and randomly chosen introns (d). The value of n in parentheses denotes the total number of repetitive elements within the analyzed region.
Fig. 4. Organization of repetitive families in the vicinity of intergenic ARs sat_827 and sat_1122, as well as intronic sat_605, in the different primate species indicated in the phylogenetic tree. ARs are shown in red and other repetitive families are marked with different colors (see legend). Sat_827 and sat_605 are embedded in a Tigger DNA transposon and an Alu (SINE), respectively, whereas sat_1122 is inserted within a unique region.
Fig. 5. Model of the generation of dispersed ARs in euchromatin. The model postulates that, owing to intrastrand homologous recombination, ARs are excised from the tandemly arranged heterochromatic repeats in the form of extrachromosomal circular satellite DNA. Short segments of homology, indicated in yellow, between circularized alpha repeats and target regions in euchromatin are necessary for their insertion by site-specific homologous recombination. Once inserted, alpha repeats are not further spread throughout euchromatin.
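One simple way to picture why insertion of a circular molecule through a short segment of homology would leave a short duplication at the insertion site is a toy string model of a single crossover. In the sketch below, the function `integrate_circle` and all sequences are hypothetical; it assumes one exact copy of the homology segment in the target and in the circle, and it illustrates only the general geometry of circle integration, not the authors' analysis.

```python
def integrate_circle(target: str, circle: str, homology: str) -> str:
    """Toy single-crossover integration of a circular sequence into a linear target.

    `circle` is written from an arbitrary origin. Recombination within the shared
    `homology` segment linearizes the circle, so the product carries the rest of
    the circle flanked by two direct copies of `homology`.
    """
    if not homology or len(homology) > len(circle):
        raise ValueError("homology must be non-empty and no longer than the circle")
    t = target.find(homology)
    # Search the doubled string so a homology segment spanning the origin is found.
    c = (circle + circle).find(homology)
    if t == -1 or c == -1:
        raise ValueError("homology segment missing from target or circle")
    rotated = circle[c:] + circle[:c]   # circle rewritten to start at the homology
    body = rotated[len(homology):]      # circle content other than the homology
    return target[:t] + homology + body + homology + target[t + len(homology):]


# Made-up example: a 3-bp homology (TTG) shared by the target and the circle.
product = integrate_circle("GGGGTTGCCCC", "ACTTGAGA", "TTG")
print(product)  # GGGGTTGAGAACTTGCCCC
```

In this model the target contributes one copy of the homology and the circle contributes the other, so the inserted sequence ends up bracketed by a short direct repeat.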
List of Single ARs Located within Human Euchromatin Which Are Embedded within Other Repetitive Elements in the Ancestors of the Catarrhini, Simiformes, Hominoidea, and Hominini, As Well As a List of Repeats at Orthologous Regions in Those Primates Separated before the Insertion of the Particular Alpha Repeat Occurred
| Catarrhini | Marmoset | Squirrel Monkey |
|---|---|---|
| sat_605- |  |  |
| sat_497- |  |  |
| sat_60-ERV | ERV | ERV |
| sat_612-L1 | L1 | L1 |
|  |  |  |
| sat_827-Tigger | Tigger | Tigger |
| sat_703-L1 | — | L1 |
| sat_1147-L1 | — | — |
|  |  |  |
| sat_58-ERV | ERV | ERV |
| sat_358-ERV | ERV | ERV |
|  |  |  |
| sat_84-L1 | L1 | L1 |
| sat_613-ERV | ERV | ERV |