Karen H Miga, Ivan A Alexandrov.
Abstract
We are entering a new era in genomics where entire centromeric regions are accurately represented in human reference assemblies. Access to these high-resolution maps will enable new surveys of sequence and epigenetic variation in the population and offer new insight into satellite array genomics and centromere function. Here, we focus on the sequence organization and evolution of alpha satellites, which are credited as the genetic and genomic definition of human centromeres due to their interaction with inner kinetochore proteins and their importance in the development of human artificial chromosome assays. We provide an overview of alpha satellite repeat structure and array organization in the context of these high-quality reference data sets; discuss the emergence of variation-based surveys; and provide perspective on the role of this new source of genetic and epigenetic variation in the context of chromosome biology, genome instability, and human disease.
Keywords: centromere; epigenetics; genome; repeat; satellite DNA; variation
Year: 2021 PMID: 34813350 PMCID: PMC9549924 DOI: 10.1146/annurev-genet-071719-020519
Source DB: PubMed Journal: Annu Rev Genet ISSN: 0066-4197 Impact factor: 13.826
Figure 1. Structure and evolution of alpha satellite arrays. (a) Illustration of the general genomic organization of a human centromeric region, which includes one homogeneous core made of chromosome-specific HORs (red) and the imperfect symmetrical organization of smaller arrays of various other homogeneous HORs [pseudocentromeres or inactive HOR arrays (light gray)], divergent HORs [recent relic centromeres (dark gray)], and multiple distinct divergent monomeric arrays (older relic centromeres, with block colors indicating the phylogenetic assignments listed in Supplemental Table 1). These regions typically include other pericentromeric satellite classes [e.g., HSat1–HSat3 (teal)] and SDs. The entire centromeric region is defined by those sequences in the cenhap (48), presented as gray flanking regions extending into the p-arm and q-arm. Arrayed triangles indicate alpha satellite monomers and HORs of various lengths and structures composed of several different monomers. (b) Centromere X array haplotype maps, as determined from DXZ1 (S3CXH1L) HOR clustering and divergence data, provide evidence for block organization and a gradient of divergence throughout all the layers. Classification of haplotypes is determined by phylogenetic relationships of the DXZ1 HOR repeats, revealing three distinct larger haplotypes (gray, yellow, and light purple). The larger haplotype structure (three major branches on the phylogenetic tree of haplotype consensus HORs) can be further characterized into 14 DXZ1-HOR subgroupings representing individual haplotypes (6, 65). One subbranch (white) represented by one HOR is a hybrid between two other haplotypes. The numbers in parentheses indicate the number of HORs in each clade. The dot plot for the self-aligned DXZ1 array (lighter areas have higher homogeneity) and StV map with few variant HORs (white) are also shown. (c) Kinetochore selection model for satellite array evolution. 
This model (see Section 2.9) proposes that selfish selection operates on the array through the amplification of the repeat (light blue) due to the association with kinetochore (green) assembly, which locates itself on repeats to which it happens to have maximal affinity. Over time, the new satellite array (light blue) replaces the original satellite array (yellow), which shrinks progressively due to the ongoing deletion process. Centromeric arrays that are no longer associated with the kinetochore are considered dead and are arranged symmetrically, flanking the live arrays. Dead arrays are depicted as light gray (oldest region), dark gray (medium old), and adjacent yellow (newly inactivated dead alpha satellite array). Abbreviations: cenhap, centromere-spanning haplotype; HOR, higher-order repeat; HOR (L): live, or HOR array associated with kinetochore assembly; HSat, classical human satellites; SD, segmental duplication; StV, structural variant of a HOR. Figure adapted from data presented in Reference 6.
Figure 2. Epigenetic characterization of three complete centromeric arrays from T2T assemblies of chr1, chrX, and chr8. Access to complete and accurate assemblies of human centromeric regions provides a new opportunity to characterize all live alpha satellite HOR arrays [shown for D1Z7, chr1-SF1 (pink); DXZ1, chrX-SF3 (blue); and D8Z2, chr8-SF2 (purple)] and adjunct dead arrays. Further, these maps offer a high-resolution study of CENP-B-binding motifs (dark green represents repeats where the motif is in forward orientation and light green represents those with a motif in reverse orientation), and pJα-binding site sequences (light purple). Note that the regions enriched in reverse motifs indicate an inversion in centromere 1, the single unique event in all of the live centromeres. With the exception of centromere 8 (where CENP-B boxes and pJα are intermixed in the live array), live arrays within centromeric regions on chromosomes 1 and X contain CENP-B boxes, and flanking divergent monomeric regions contain pJα. The map of CpG methylation in ultralong Nanopore data obtained using long-read mapping protocols (previously described in 67) reveals dips in methylation that are coincident with sites of kinetochore assembly [illustrated with enrichment of CENP-A in native ChIP-seq data (52)]. Abbreviations: CENP-A, centromere protein A; CENP-B, centromere protein B; ChIP-seq, chromatin immunoprecipitation sequencing; chr, chromosome; HOR, higher-order repeat; SF, suprachromosomal family; T2T, telomere-to-telomere.
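The forward and reverse CENP-B motif orientations described in the Figure 2 legend can be illustrated with a minimal strand-aware motif scan. This is only a sketch under stated assumptions, not the annotation method used for the figure: the 17-bp CENP-B box core pattern (nTTCGnnnnAnnCGGGn) is taken from the general literature rather than from this text, and the function names `reverse_complement` and `find_cenpb_boxes` are hypothetical.

```python
import re

# ASSUMPTION: 17-bp CENP-B box core pattern nTTCGnnnnAnnCGGGn, which is not
# stated in this text. The lookahead lets overlapping matches be reported.
BOX_LENGTH = 17
CENPB_BOX = re.compile(r"(?=([ACGT]TTCG[ACGT]{4}A[ACGT]{2}CGGG[ACGT]))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cenpb_boxes(seq):
    """Return sorted (start, strand) tuples for CENP-B box matches.

    start is the 0-based position on the input (forward) sequence;
    strand is "+" for a forward-orientation motif and "-" for reverse.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+") for m in CENPB_BOX.finditer(seq)]
    # Scan the reverse complement, then map coordinates back to the input.
    for m in CENPB_BOX.finditer(reverse_complement(seq)):
        hits.append((n - (m.start() + BOX_LENGTH), "-"))
    return sorted(hits)


# Toy sequence (illustrative only): one forward-orientation box with flanks.
forward_example = "AAAA" + "CTTCGTTGGAAACGGGA" + "CCCC"
inverted_example = reverse_complement(forward_example)
```

On a real array, a run of "-" hits inside an otherwise "+" region would be the kind of signal the legend describes for the inversion in centromere 1.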