
Dynamic evolution of great ape Y chromosomes.

Monika Cechova1, Rahulsimham Vegesna1,2, Marta Tomaszkiewicz1, Robert S Harris1, Di Chen1, Samarth Rangavittal1,2, Paul Medvedev3,4, Kateryna D Makova5.   

Abstract

The mammalian male-specific Y chromosome plays a critical role in sex determination and male fertility. However, because of its repetitive and haploid nature, it is frequently absent from genome assemblies and remains enigmatic. The Y chromosomes of great apes represent a particular puzzle: their gene content is more similar between human and gorilla than between human and chimpanzee, even though human and chimpanzee share a more recent common ancestor. To solve this puzzle, here we constructed a dataset including Ys from all extant great ape genera. We generated assemblies of bonobo and orangutan Ys from short and long sequencing reads and aligned them with the publicly available human, chimpanzee, and gorilla Y assemblies. Analyzing this dataset, we found that the genus Pan, which includes chimpanzee and bonobo, experienced accelerated substitution rates. Pan also exhibited elevated gene death rates. These observations are consistent with high levels of sperm competition in Pan. Furthermore, we inferred that the great ape common ancestor already possessed multicopy sequences homologous to most human and chimpanzee palindromes. Nonetheless, each species also acquired distinct ampliconic sequences. We also detected increased chromatin contacts between and within palindromes (from Hi-C data), likely facilitating gene conversion and structural rearrangements. Our results highlight the dynamic mode of Y chromosome evolution and open avenues for studies of male-specific dispersal in endangered great ape species.
Copyright © 2020 the Author(s). Published by PNAS.

Keywords:  gene content evolution; palindromes; sex chromosomes

Year:  2020        PMID: 33020265      PMCID: PMC7585023          DOI: 10.1073/pnas.2001749117

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


The mammalian male-specific sex chromosome—the Y—is vital for sex determination and male fertility and is a useful marker for population genetics studies. It carries SRY, which encodes the testis-determining factor that initiates male sex determination (1). The human Y also harbors azoospermia factor regions, deletions of which can cause infertility (2). Y chromosome sequences have been used to analyze male dispersal (3) and hybridization sex bias (4) in natural populations. Thus, the Y is important biologically and its sequences have critical practical implications. Moreover, study of the Y is needed to obtain a complete picture of mammalian genome evolution. Yet, due to its repetitive and haploid nature, the Y has been assembled for only a handful of mammalian species (5). Among great apes, the Y has so far been assembled only in human (6), chimpanzee (7), and gorilla (8). A comparative study of these Y assemblies (8) uncovered some unexpected patterns which could not be explained with the data from three species alone. Despite a recent divergence of these species (∼7 million years ago [MYA]) (9), their Y chromosomes differ enormously in size and gene content, in sharp contrast to the stability of the rest of the genome. For example, the chimpanzee Y is only half the size of the human Y, and the percentage of gene families shared by these two chromosomes (68%) that split ∼6 MYA (9) is similar to that shared by human and chicken autosomes that split ∼310 MYA (7). Puzzlingly, in terms of shared genes and overall architecture, the human Y is more similar to the gorilla Y than to the chimpanzee Y even though human and chimpanzee have a more recent common ancestor (8). Y chromosomes from additional great ape species should be sequenced to understand whether high interspecific variability in gene content and architecture is characteristic of all great ape Ys. 
All great ape Y chromosomes studied thus far include pseudoautosomal regions (PARs), which recombine with the X chromosome, and male-specific X-degenerate, ampliconic, and heterochromatic regions, which evolve as a single linkage group (6–8). The X-degenerate regions are composed of segments with different levels of homology to the X chromosome, which are called strata, corresponding to stepwise losses of X-Y recombination. Because of the lack of recombination (except for occasional X-Y gene conversion) (10, 11), X-degenerate regions are expected to accumulate gene-disrupting mutations; however, this has not been examined in detail. The ampliconic regions consist of repetitive sequences that have >50% identity to each other and contain palindromes, i.e., inverted repeats (separated by a spacer) up to several megabases long, the arms of which are >99.9% identical (6). Palindromes are thought to evolve to allow intrachromosomal (Y-Y) gene conversion (12), which rescues the otherwise nonrecombining male-specific regions from deleterious mutations (13). We presently lack knowledge about how conserved palindrome sequences are across great apes. In general, X-degenerate regions are more conserved, whereas ampliconic regions are prone to rearrangements, and heterochromatic regions, which are rich in satellite repeats (14), evolve very rapidly among species (7, 8, 14). However, the evolution of great ape Y chromosomes outside of human, chimpanzee, and gorilla has not been explored. Known Y chromosome protein-coding genes are located in either the X-degenerate or ampliconic regions. X-degenerate genes (16 on the human Y) are single-copy, ubiquitously expressed genes with housekeeping functions (15). Multicopy ampliconic genes (nine gene families on the human Y, eight of which, all but TSPY, are located in palindromes) are expressed only in testis and function during spermatogenesis (6). 
Some human Y genes are deleted or pseudogenized in other great apes and thus are not essential for all species (8, 16). To illuminate genes essential for male reproduction in nonhuman great apes, all of which are endangered species, a cross-species analysis of Y-gene content evolution is needed. Here we compared the Y chromosomes in five species representing all four great ape genera: the human (Homo) lineage diverged from the chimpanzee (Pan), gorilla (Gorilla), and orangutan (Pongo) lineages ∼6, ∼7, and ∼13 MYA, respectively (9), and the bonobo and chimpanzee lineages (belonging to the genus Pan), which diverged ∼1 MYA (17). We produced draft assemblies of the bonobo and Sumatran orangutan Ys and combined them with the human, chimpanzee, and gorilla Y assemblies (6–8) to construct great ape Y multispecies alignments. This comprehensive dataset enabled us to answer several pivotal questions about the evolution of great ape Y chromosomes. First, we assessed lineage-specific substitution rates and identified species experiencing significant rate acceleration. Second, we determined interspecific gene content turnover. Third, we evaluated the conservation of palindromic sequences and examined chromatin interactions within ampliconic regions. Our results highlight the highly dynamic nature of great ape Y chromosome evolution.

Results

Assemblies.

To obtain Y chromosome assemblies for all major great ape lineages, we augmented publicly available human and chimpanzee assemblies (6, 7) by producing draft bonobo and Sumatran orangutan (henceforth called “orangutan”) assemblies and by improving the gorilla assembly (8) of Y male-specific regions. The resulting assemblies (henceforth called “Y assemblies”) were of high quality, as evidenced by their high degree of homology to the human and chimpanzee Ys and by the presence of sequences of most of the expected homologs (16) of human Y genes. They also were of sufficient continuity, particularly when the highly repetitive structure of the Y is taken into account.

Ampliconic and X-Degenerate Scaffolds.

To determine which scaffolds are ampliconic and which are X-degenerate in our bonobo, gorilla, and orangutan Y assemblies [such annotations are already available for the human and chimpanzee Ys (6, 7)], we developed a classifier which combines the copy count in the assemblies with mapping read depth information from whole-genome sequencing of male individuals. This approach was needed as ampliconic regions can be collapsed in assemblies based on next-generation sequencing data (5). Using this classifier, we identified 12.5, 10.0, and 14.5 Mb of X-degenerate scaffolds in bonobo, gorilla, and orangutan, respectively. The length of ampliconic regions was more variable: 10.8 Mb in bonobo, 4.0 Mb in gorilla, and 2.2 Mb in orangutan. Due to potential collapse of repeats, we might have underestimated the true lengths of ampliconic regions. However, their length estimates are expected to reflect their complexity: e.g., the complexity might be low in the orangutan Y, which is consistent with a high read depth in its Y ampliconic scaffolds and with its long gene-harboring repetitive arrays previously found cytogenetically (18).

Alignments.

We aligned the sequences of the Y chromosomes from five great ape species. The resulting multispecies alignment allowed us to identify species-specific sequences, sequences shared by all species, and sequences shared by some but not all species. These results were confirmed by pairwise alignments. For example, as was shown previously (8), the gorilla Y had the highest percentage of its sequence aligning to the human Y (75.5 and 89.6% from multispecies and pairwise alignments, respectively). In terms of sequence identity, the chimpanzee and bonobo Ys were most similar to each other (99.1 to 99.2% and ∼98% from multispecies and pairwise alignments, respectively), while the orangutan Y had the lowest identity to any other great ape Y chromosome (∼93 to 94% and ∼92% from multispecies and pairwise alignments, respectively). From multispecies alignments, the human Y was most similar in sequence to the chimpanzee or bonobo Ys (97.9 and 97.8%, respectively), less similar to the gorilla Y (97.2%), and the least similar to the orangutan Y (93.6%), in agreement with the accepted phylogeny of these species (9). The pairwise alignments confirmed this trend. These results argue against incomplete lineage sorting at the male-specific Y chromosome locus in great apes.

Substitution Rates on the Y.

We next asked whether the chimpanzee Y chromosome, the architecture and gene content of which differ drastically from the human and gorilla Ys (8), experienced an elevated substitution rate. Using our multispecies Y chromosome alignment, we estimated substitution rates along the branches of the great ape phylogenetic tree (Fig. 1). A similar analysis was performed using an alignment of autosomes (Fig. 1). A higher substitution rate on the Y than on the autosomes, i.e., male mutation bias (19), was found for each branch of the phylogeny (Fig. 1). Notably, the Y-to-autosomal substitution rate ratio was higher in the Pan lineage, including the chimpanzee (1.76) and bonobo (1.64) lineages and the lineage of their common ancestor (1.78), than in the human lineage (1.45). These trends did not change after correcting for ancestral polymorphism. We subsequently used a test akin to the relative rate test (20) to address whether the Pan lineage experienced more substitutions than the human lineage. Using gorilla as an outgroup, we observed a significantly higher number of substitutions between chimpanzee and gorilla than between human and gorilla. For autosomes, this number was 0.6% higher in the chimpanzee–gorilla than in the human–gorilla comparison, whereas for the Y, it was 7.9% higher (P < 1 × 10^−5 in both cases, χ² test on the contingency table). Similarly, we observed a higher number of substitutions between bonobo and gorilla than between human and gorilla. This number was 2.9% higher in the bonobo–gorilla than in the human–gorilla comparison for autosomes and as much as 9.6% higher for the Y (P < 1 × 10^−5 in both cases, χ² test on the contingency table). Thus, while the Pan lineage experienced an elevated substitution rate at both autosomes and the Y, this elevation was particularly strong on the Y.
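The comparison of substitution counts described above can be illustrated with a small sketch. The source does not give the layout of the contingency table, so the sketch assumes a 2 × 2 table whose rows are the two comparisons with gorilla and whose columns are substituted versus unchanged aligned sites. The function name `chi2_2x2` and all counts are hypothetical and are not taken from the paper.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its P value for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for illustration only (assumed table layout):
# row 1 = Pan-gorilla comparison, row 2 = human-gorilla comparison,
# columns = (sites with a substitution, sites without a substitution).
sub_pan, same_pan = 1080, 98920
sub_human, same_human = 1000, 99000

chi2, p_value = chi2_2x2(sub_pan, same_pan, sub_human, same_human)
excess_percent = 100.0 * (sub_pan - sub_human) / sub_human
print(f"excess substitutions: {excess_percent:.1f}%  chi2 = {chi2:.2f}  P = {p_value:.3g}")
```

With real data, the two rows would be filled from the same set of alignment blocks so that both comparisons cover identical sites.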
Fig. 1.

Phylogenetic tree of nucleotide sequences. (A) Y chromosome. (B) Autosomes. Branch lengths (substitutions per 100 sites) were estimated from multispecies alignment blocks including all five species.


Gene Content Evolution.

Utilizing sequence assemblies and testis expression data (21), we evaluated gene content and the rates of gene birth and death on the Y chromosomes of five great ape species. First, we examined the presence/absence of homologs of human Y chromosome genes (16 X-degenerate genes + 9 ampliconic gene families = 25 gene families; for multicopy ampliconic gene families, we did not study copy number variation, only the presence/absence of a family in a species). Such data were previously available for the chimpanzee Y, in which 7 of 25 human Y gene families became pseudogenized or deleted (7), and for the gorilla Y, in which only one gene family (VCY) of 25 is absent (8). Here, we compiled the data for bonobo and orangutan. From the 25 gene families present on the human Y, the bonobo Y lacked 7 (HSFY, PRY, TBL1Y, TXLNGY, USP9Y, VCY, and XKRY) and the orangutan Y lacked 5 (TXLNGY, CYorf15A, PRKY, USP9Y, and VCY). Second, our gene annotation pipeline did not identify novel genes in the bonobo and orangutan Y assemblies, similar to previous results for the chimpanzee (7) and gorilla (8) Ys. Thus, we obtained the complete information about gene family content on the Y chromosome in five great ape species. Using this information and utilizing the macaque Y chromosome (22) as an outgroup, we reconstructed gene content at ancestral nodes and studied the rates of gene birth and death (23) across the great ape phylogeny. Because X-degenerate and ampliconic genes might exhibit different trends, we analyzed them separately (Fig. 2). Considering gene births, none can occur for X-degenerate genes because they were present on the proto-sex chromosomes. Only one gene birth (VCY, in the human–chimpanzee–bonobo common ancestor) was observed for ampliconic genes, leading to overall low gene birth rates. 
Considering gene deaths, three ampliconic gene families and three X-degenerate genes were lost by the chimpanzee–bonobo common ancestor, leading to death rates of 0.095 and 0.049 events/MY, respectively. Bonobo lost an additional ampliconic gene, whereas chimpanzee lost an additional X-degenerate gene, leading to death rates of 0.182 and 0.080 events/MY, respectively. In contrast, no deaths of either ampliconic or X-degenerate genes were observed in human and gorilla. Orangutan did not experience any deaths of ampliconic genes, but lost four X-degenerate genes. Its X-degenerate gene death rate (0.021 events/MY) was still lower than that in the chimpanzee lineage (0.080 events/MY) or in the bonobo–chimpanzee common ancestor (0.049 events/MY). To summarize, the Pan genus exhibited the highest death rates for both X-degenerate and ampliconic genes across great apes. Additionally, we observed significantly higher nonsynonymous-to-synonymous rate ratios for four X-degenerate genes (DDX3Y, EIF1AY, PRKY, and ZFY) and one ampliconic gene (CDY) in bonobo, chimpanzee, and/or their common ancestor. However, none of these ratios was significantly greater than one, providing no evidence for positive selection.
Fig. 2.

Evolution of Y chromosome gene content in great apes. The reconstructed history of gene birth and death for X-degenerate (blue) and ampliconic (red) genes was overlaid on the great ape phylogenetic tree (not drawn to scale), using macaque as an outgroup. The rates of gene birth and death (in events per million years) are shown in parentheses. The list at the root includes the genes that were present in the common ancestor of great apes and macaque. In addition to most of the genes on the human Y, the macaque Y harbors the X-degenerate MXRA5Y gene, which we found to be deleted in orangutan and pseudogenized in bonobo, chimpanzee, gorilla, and human. We currently cannot find a full-length copy of the VCY gene in bonobo. TXLNGY and DDX3Y are also known as CYorf15B and DBY, respectively.


Conservation of Human and Chimpanzee Palindrome Sequences.

Did the palindromes (human palindromes are labeled with P and chimpanzee palindromes are labeled with C) now present on the human Y (P1 to P8) and chimpanzee Y (C1 to C19) evolve before or after the great ape lineages split? To answer this question, we identified the proportions of human and chimpanzee palindrome sequences that aligned to bonobo, orangutan, and gorilla Ys in our multispecies alignments (Fig. 3). Among human palindromes, P5 and P6 were the most conserved (covered by 86 to 96% of other great ape Y assemblies), whereas the majority of P3 sequences were human specific (covered by only 31 to 37% of other great ape Y assemblies). Nevertheless, the common ancestor of great apes likely already had substantial lengths of sequences homologous to P1, P2, and P4 to P8, and some sequences of P3 (Fig. 3). Chimpanzee palindromes C17, C18, and C19 are homologous to human palindromes P8, P7, and P6, respectively (7). Therefore, we focused on the other chimpanzee palindromes and, following ref. 7, divided them into five homologous groups: C1 (C1+C6+C8+C10+C14+C16), C2 (C2+C11+C15), C3 (C3+C12), C4 (C4+C13), and C5 (C5+C7+C9). The palindromes in the C3, C4, and C5 groups had substantial proportions (47 to 95%) of their sequences covered by alignments with other great ape Ys (Fig. 3). In contrast, most C2 sequences (85%) were shared only with bonobo, and a substantial proportion of C1 sequences was chimpanzee specific. Nonetheless, the common ancestor of great apes likely already had large amounts of sequences homologous to group C3, C4, and C5 palindromes and also some sequences homologous to group C1 and C2 palindromes (Fig. 3).
Fig. 3.

Evolution of sequences homologous to human and chimpanzee palindromes. (A) Heatmaps showing coverage for each palindrome in each species in the multispecies alignment (the last column includes all palindromes) and box plots representing copy number (natural log) of 1-kb windows which have homology with human or chimpanzee palindromes (the last box plot is for X-degenerate genes). (B) The great ape phylogenetic tree (not drawn to scale) with evolution of human (shown in blue) and chimpanzee (red) palindromic sequences overlaid on it. Palindrome names in boldface indicate that their sequences were present in two or more copies. Negative (-) and positive (+) signs indicate loss and gain of palindrome sequence (possibly only partial), respectively. Arrows represent gain (↑) or loss (↓) of palindrome copy number. If several equally parsimonious scenarios were possible, we conservatively assumed a later date of acquisition of the multicopy state for a palindrome.

To determine whether the bonobo, orangutan, and gorilla sequences homologous to human or chimpanzee palindromes were multicopy (i.e., present in more than one copy), and thus could have been arranged as palindromes in the common ancestor of great apes, we obtained their read depths from whole-genome sequencing of their respective males (Fig. 3). This approach was used because we expected that some palindromes were collapsed in our Y assemblies. We also used the data on the homology between human and chimpanzee palindromes summarized from the literature (6–8). Using maximum parsimony reconstruction, we concluded that sequences homologous to P4, P5, P8, and C4 and partial sequences homologous to P1, P2, and C2 were multicopy in the common ancestor of great apes (Fig. 3). Sequences homologous to P3, P6, and C1 were multicopy in the human–gorilla common ancestor, and those homologous to P7 and C5 were multicopy in the human–chimpanzee common ancestor (Fig. 3).
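The maximum parsimony step can be sketched with the bottom-up pass of the Fitch algorithm on the accepted great ape tree. This is only an illustration under stated assumptions: the source does not say which parsimony implementation was used, the binary coding (1 = multicopy, 0 = single copy or absent) is assumed, and the tip states below are hypothetical rather than the paper's data.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass on a nested-tuple binary tree.

    Returns (set of equally parsimonious states at this node,
             minimum number of state changes in the subtree).
    """
    if isinstance(tree, str):  # a leaf is identified by its species name
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Accepted topology: (((human, (chimpanzee, bonobo)), gorilla), orangutan)
TREE = ((("human", ("chimpanzee", "bonobo")), "gorilla"), "orangutan")

# Hypothetical tip states for one palindrome (1 = multicopy, 0 = not multicopy).
example_states = {
    "human": 1, "chimpanzee": 1, "bonobo": 1, "gorilla": 1, "orangutan": 0,
}

root_set, changes = fitch(TREE, example_states)
print(f"root states: {sorted(root_set)}  minimum changes: {changes}")
```

In this hypothetical case the root is ambiguous, since gain after the orangutan split and loss in orangutan each cost one change. The text resolves such ties by assuming the later acquisition of the multicopy state.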

Species-Specific Multicopy Sequences in Bonobo, Gorilla, and Orangutan.

In addition to finding sequences homologous to human and/or chimpanzee palindromes, we detected 9.36, 1.73, and 3.35 Mb of species-specific sequences in our bonobo, gorilla, and orangutan Y assemblies. By mapping male whole-genome sequencing reads to these sequences, we found that 81, 44, and 30% of them in bonobo, gorilla, and orangutan had a copy number of 2 or above. Thus, large portions of Y species-specific sequences are multicopy and might harbor species-specific palindromes.
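One common way to turn mapped read depth into a copy number estimate is to divide the depth of each window by the depth of regions known to be single copy. The sketch below follows that idea and is not the authors' pipeline: the choice of baseline (median depth over single-copy, e.g., X-degenerate, windows), the function names, and all depth values are assumptions made for illustration.

```python
from statistics import median


def copy_numbers(window_depths, single_copy_depths):
    """Estimate copy number per window as depth / median single-copy depth."""
    baseline = median(single_copy_depths)
    if baseline <= 0:
        raise ValueError("single-copy baseline depth must be positive")
    return [depth / baseline for depth in window_depths]


def multicopy_fraction(estimates, threshold=2.0):
    """Fraction of windows whose estimated copy number is at or above threshold."""
    if not estimates:
        return 0.0
    return sum(cn >= threshold for cn in estimates) / len(estimates)


# Hypothetical mean read depths (not data from the paper).
single_copy = [29.0, 30.0, 31.0]                  # assumed single-copy windows
species_specific = [30.0, 62.0, 95.0, 29.0, 61.0]  # assumed species-specific windows

estimates = copy_numbers(species_specific, single_copy)
print([round(cn, 2) for cn in estimates])
print(f"multicopy fraction: {multicopy_fraction(estimates):.0%}")
```

Real data would also need mappability and GC-content handling, which this sketch omits.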

Frequent Chromatin Interactions between and within Palindromes.

Because Y ampliconic regions undergo Y-Y gene conversion and nonallelic homologous recombination (NAHR) (13), we hypothesized that these processes are facilitated by increased chromatin interactions. To evaluate this, we studied chromatin interactions on the Y utilizing a statistical approach specifically developed for handling Hi-C data originating from repetitive sequences (24). We used publicly available Hi-C data generated for human and chimpanzee induced pluripotent stem cells (iPSCs) (25) and for human umbilical vein endothelial cells (26). We found prominent chromatin contacts both between and within palindromes located inside ampliconic regions on the Y (Fig. 4). In fact, the contacts in the human palindromic regions were significantly overrepresented when compared with the expectation based on the proportion of the Y occupied by palindromes (P < 0.001, permutation test with palindromic/nonpalindromic group categories), suggesting biological importance. Notably, we observed similar patterns for two different human cell types, as well as for both human and chimpanzee iPSCs (Fig. 4).
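A permutation test with palindromic/nonpalindromic categories can be sketched as follows. The sketch assumes equal-sized bins along the Y, each with a contact count and a palindromic/nonpalindromic label, and it shuffles the labels to build the null distribution. It does not reproduce the multimapping-aware approach of ref. 24, and the function name, binning, and counts are hypothetical.

```python
import random


def palindrome_contact_enrichment(contacts, is_palindromic, n_perm=999, seed=0):
    """One-sided label-permutation test.

    contacts: contact count per equal-sized bin (assumed layout).
    is_palindromic: True/False label per bin.
    Returns (observed share of contacts in palindromic bins, P value).
    """
    total = sum(contacts)
    if total == 0:
        raise ValueError("no contacts to test")

    def share(labels):
        return sum(c for c, flag in zip(contacts, labels) if flag) / total

    observed = share(is_palindromic)
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    labels = list(is_palindromic)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)            # keeps the number of palindromic bins fixed
        if share(labels) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the P value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical bins: 4 palindromic bins with many contacts, 16 other bins with few.
bin_contacts = [50, 48, 52, 47] + [5] * 16
bin_labels = [True] * 4 + [False] * 16

observed_share, p_value = palindrome_contact_enrichment(bin_contacts, bin_labels)
print(f"share of contacts in palindromic bins: {observed_share:.2f}  P = {p_value:.3f}")
```

Shuffling labels across bins of equal size makes the expected share equal to the proportion of bins labeled palindromic, which mirrors the expectation "based on the proportion of the Y occupied by palindromes".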
Fig. 4.

Chromatin contacts on the human and chimpanzee Y chromosomes, as evaluated from iPSCs. (A) Human Y chromosome contacts with palindromes (highlighted in light blue), pseudoautosomal regions (green), and centromere (red). The schematic representation of the sequencing classes on the Y chromosome is adapted from ref. 6. (B) Chimpanzee Y chromosome contacts with palindromes (highlighted in light blue). The schematic representation of the sequencing classes on the Y chromosome is adapted from ref. 7. (C) Chromatin interactions for the three largest palindromes on the human Y. To resolve ambiguity due to multimapping reads, each interaction was assigned a probability based on the fraction of reads supporting it. Palindrome arms are shown as blue arrows, and the spacer is shown as white space between them.

We also hypothesized that arms of the same palindrome interact with each other via chromatin contacts. Our analysis of human Hi-C data from iPSCs (25) suggests that palindrome arms are indeed colocalized, a pattern particularly prominent for the large palindromes P1 and P5 (Fig. 4). These results suggest that, in addition to the enrichment in the local interactions expected to be present in the Hi-C data (27), homologous regions of the two arms of a palindrome interact with each other with high frequency.

Discussion

Substitution Rates.

Higher substitution rates on the Y than on the autosomes, which we found across the great ape phylogeny, confirm another study (28) and are consistent with male mutation bias likely caused by a higher number of cell divisions in the male than in the female germline (19). Higher autosomal substitution rates that we detected in the Pan than in the Homo lineage corroborate yet another study (29) and can be explained by a shorter generation time in Pan. A higher Y-to-autosomal substitution ratio (i.e., stronger male mutation bias) in the Pan than in the Homo lineage, as observed by us here, could be due to several reasons. First, species with sperm competition produce more sperm and thus undergo a greater number of replication rounds, generating more mutations on the Y and potentially leading to stronger male mutation bias than species without sperm competition (19). Consistent with this expectation, chimpanzee and bonobo experience sperm competition and exhibit strong male mutation bias, as compared with no sperm competition (30) and weak male mutation bias in human and gorilla. Contradicting this expectation, orangutans have limited sperm competition (30), but exhibit strong male mutation bias. Second, a shorter spermatogenic cycle can increase the number of replication rounds per time unit and can elevate Y substitution rates, leading to stronger male mutation bias. In agreement with this explanation, the spermatogenic cycle is shorter in chimpanzee than in human (31, 32); the data are limited for other great apes. Third, a stronger male mutation bias would be expected in Pan than in Homo if the ratio of male-to-female generation times was respectively higher (33). However, the opposite is true: this ratio is higher in Homo than in Pan (33). Phylogenetic studies produce estimates of male mutation bias that might be affected by ancient genetic polymorphism in closely related species (28). 
Even though we corrected for this effect, our results should be taken with caution because of incomplete data on the sizes of ancestral great ape populations (34). Pedigree studies inferring male mutation bias are unaffected by ancient genetic polymorphism. One such study detected significantly higher male mutation bias in chimpanzee than in human (35), in agreement with our results, while another study found no significant differences in male mutation bias among great apes (36). These two studies analyzed only a handful of trios per species, and thus their conclusions should be reevaluated in larger studies.
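To make the link between the Y-to-autosomal ratio and male mutation bias concrete, the sketch below applies the classic relation Y/A = 2α/(1 + α), where α is the male-to-female mutation rate ratio. The relation follows from the Y spending all of its time in males and autosomes spending half of their time in each sex. Whether this exact formula was used in the study is an assumption, and the sketch ignores the ancestral polymorphism correction and generation-time effects discussed above. The input ratios are the ones reported in Results; the function name is hypothetical.

```python
def alpha_from_y_to_a(ratio):
    """Male-to-female mutation rate ratio implied by a Y-to-autosomal ratio.

    Solves ratio = 2 * alpha / (1 + alpha) for alpha, which is only defined
    for 0 < ratio < 2.
    """
    if not 0.0 < ratio < 2.0:
        raise ValueError("Y-to-autosomal ratio must lie strictly between 0 and 2")
    return ratio / (2.0 - ratio)


# Y-to-autosomal substitution rate ratios reported in the text.
reported_ratios = {
    "chimpanzee": 1.76,
    "bonobo": 1.64,
    "Pan common ancestor": 1.78,
    "human": 1.45,
}

for lineage, ratio in reported_ratios.items():
    print(f"{lineage}: Y/A = {ratio:.2f} -> alpha = {alpha_from_y_to_a(ratio):.2f}")
```

Because α grows steeply as the ratio approaches 2, modest differences in Y/A between Pan and human translate into large differences in the implied α, which is one reason such estimates should be read with caution.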

Ampliconic Sequences.

We found that substantial portions of most human palindromes, and of most chimpanzee palindrome groups, were likely multicopy (and thus potentially palindromic) in the common ancestor of great apes, suggesting conservation over >13 MY. Moreover, two of the three rhesus macaque palindromes are conserved with human palindromes P4 and P5 (22), indicating conservation over >25 MY. Our study also found species-specific amplification or loss of ampliconic sequences, indicating that their evolution is rapid. Thus, repetitive sequences constitute a biologically significant component of great ape Y chromosomes, and their multicopy state might be selected for. Ampliconic sequences are thought to have evolved multiple times in diverse species to enable Y-Y NAHR including intrachromosomal gene conversion and nonallelic crossing-over (reviewed in ref. 37). Y-Y NAHR can compensate for degeneration in the absence of interchromosomal recombination on the Y by removing deleterious mutations (38, 39), can decrease the drift-driven loss of less mutated alleles, can lead to concerted evolution of repeats (13), and can increase the fixation rate of beneficial mutations (37). Yet, despite its critical importance for the Y, how Y-Y NAHR occurs mechanistically is not well understood. Our analysis of Hi-C data suggested that ampliconic sequences and palindrome arms colocalize on the Y in both human and chimpanzee, potentially facilitating Y-Y NAHR. The latter process is frequently used to explain rapid evolution of the ampliconic gene families’ copy number (40), as well as structural rearrangements (41), some of which lead to spermatogenic failure, sex reversal, and Turner syndrome (42). Previous studies (e.g., reviewed in refs. 12, 13, 37) focused on the role of Y-Y recombination in preserving Y ampliconic gene families, which are critical for spermatogenesis and fertility (6), and suggested that this phenomenon explains the major adaptive role of palindromic sequences. 
However, two human palindromes, P6 and P7, do not harbor any known protein-coding genes (6) and are multicopy in most great ape species that we examined (Fig. 3). We hypothesize that conservation of these palindromes is driven not by spermatogenesis-related genes, but by elements regulating gene expression. Indeed, by analyzing ENCODE (43) datasets, we found candidate open-chromatin and protein-binding sites in P6 and P7. Interestingly, these sites were found in tissues other than testis, suggesting that they regulate expression of genes outside of the Y chromosome and echoing findings in Drosophila and mouse Y chromosomes (44, 45). Note that our observations should be considered preliminary because of the limitations (e.g., low read mappability) of studying regulatory elements in repetitive (in this case palindromic) regions and should be confirmed in future studies. We inferred that the gene content in the common ancestor of great apes likely was the same as is currently found in gorilla and included eight ampliconic and 16 X-degenerate genes (Fig. 2). Analyzing the data on the evolution of ampliconic gene content (Fig. 2), palindrome sequence (Fig. 3), and ampliconic gene copy number (fig. 2 in ref. 21) jointly, we can infer which ampliconic genes were present in the multicopy state in the great ape common ancestor. Our results suggest that such an ancestor had multicopy sequences homologous to P1, P2, P4, P5, and P8 (Fig. 3), which carry DAZ, BPY2, CDY, HSFY, XKRY, and VCY on the human Y (5). Except for VCY, which was likely acquired by the human–chimpanzee common ancestor, the remaining five genes were presumably present as multicopy gene families in the common ancestor of great apes, because three of them (DAZ, BPY2, and CDY) are present as multicopy in all great ape species (21), and the other two (HSFY and XKRY) are present in all great ape species but chimpanzee and bonobo (21), in which they were lost (Fig. 2). 
The macaque Y ampliconic region has the HSFY and CDY gene families located in palindromes that are homologous to human P4 and P5, respectively (22), providing further evidence of their ancient origins. Additionally, Bhowmick and colleagues (46) argued that major expansions of CDY, HSFY, TSPY, and XKRY families had already occurred in the common ancestor of Old World monkeys and apes. The RBMY gene family was likely present in a single-copy state in the common ancestor of great apes on palindrome P3 (the palindrome P3 sequence is present in a single copy in orangutan) (Fig. 3). However, in human, some RBMY copies are located outside of P3 in inverted repeat 2 (IR2) (6), which implies that this gene family was expanding in part independently of P3. Bhowmick and colleagues (46) suggested that the divergence between RBMY copies located in P3 and IR2 occurred in the common ancestor of Old World monkeys and apes. We discovered that there is only one gene family that was born across the entire great ape Y phylogeny: VCY was acquired by the common ancestor of human and chimpanzee. As a result, except for this branch, we found uniformly low rates of gene birth. A low rate of ampliconic gene birth contradicts predictions of high birth rate made in previous studies for such genes (37), but suggests that great ape radiation does not provide sufficient time for gene acquisition by ampliconic regions. Ampliconic regions on the Y chromosomes of several other mammals acquired such genes (47, 48); however, the timing of such acquisitions is unknown. We expected to observe a high death rate for X-degenerate genes, but a low death rate for ampliconic genes, because the former genes do not undergo Y-Y gene conversion and thus should accumulate deleterious mutations, whereas the latter genes are multicopy and can be rescued by Y-Y gene conversion. Unexpectedly, the rates of gene death were similar between ampliconic and X-degenerate genes.
Indeed, across the great ape Y phylogenetic tree, ∼44.4% of ampliconic gene families were either deleted or pseudogenized, as compared with ∼43.8% of X-degenerate genes. While our data did not support our hypothesis, other findings suggest that death of ampliconic genes is a gradual process. Indeed, ampliconic gene families dead in some great ape species have reduced copy number in other species (21, 40), lowering the chances for Y-Y gene conversion. The rates of gene death varied among great ape species. In particular, we observed high rates of death in the lineages of bonobo, chimpanzee, and their common ancestor. What could be the evolutionary forces driving such a high rate of gene death, likely operating in the Pan lineage continuously since its divergence from the human lineage? First, gene-disrupting or gene deletion mutations could be hitchhiking in haplotypes with beneficial mutations. Positive selection might be acting in the Pan lineage due to sperm competition (49). No gene deaths in the human and gorilla lineages, experiencing no sperm competition, and low gene death rates in orangutan, experiencing limited sperm competition, are consistent with this explanation. Our tests for positive selection acting at protein-coding genes did not produce significant results, although they indicated significantly elevated nonsynonymous-to-synonymous rate ratios for five genes in bonobo, chimpanzee, and/or their common ancestor. We might have limited power to detect positive selection from phylogenetic data collected for closely related species. Second, the Pan Y could have undergone stronger drift leading to fixation of variants lacking genes that were already in the process of becoming nonessential. At first sight, the existing data contradict this explanation, as nucleotide diversity for the chimpanzee and bonobo Y chromosomes was found to be high (50) and in fact higher than that for the gorilla and orangutan Ys (51). 
However, only a small number of orangutan and gorilla individuals were examined in the latter study (51), and thus this conclusion needs to be reevaluated in the future.

Future Directions.

Future studies should include sequencing of the Y chromosome for a substantial number of individuals per species, and such data are expected to provide a resolution between the evolutionary scenarios driving high substitution and gene death rates in the Pan lineage. Future investigations should also focus on deciphering the sequences of different copies and isoforms of ampliconic genes (52), which should allow one to examine natural selection potentially operating on those genes in more detail. Because chromatin organization depends on the tissue of origin (26), the high prevalence of intra-ampliconic contacts that we found in two somatic tissues should be confirmed in testis and sperm. Additionally, comparing chromatin organization and evolution of palindromes in the Y vs. X chromosomes should aid in understanding the unique role that repetitive regions might play on the Y. From a more applied perspective, the bonobo and orangutan Y assemblies presented here are useful for developing genetic markers to track male dispersal in these endangered species. This is of utmost importance because both species experience population decreases due to habitat loss. Therefore, our results are expected to be of great utility to conservation genetics efforts aimed at restoring these populations.

Materials and Methods

See the supplementary materials for details.

Assemblies and Alignments.

For bonobo and Sumatran orangutan, we generated and assembled (53) deep-coverage short sequencing reads from male individuals and identified putative Y contigs by mapping them against the corresponding female reference assemblies (54). These contigs were then scaffolded with mate-pair reads (55). The orangutan Y assembly was further improved by merging (56) with another high-quality assembly generated with 10× Genomics technology (57). The bonobo Y assembly was improved by additional scaffolding with long Y-enriched Pacific Biosciences reads (58, 59). We improved the continuity of the gorilla Y assembly by merging two previously published assemblies (8, 60). To remove PARs, we filtered each species-specific Y assembly against the corresponding female reference genome. Great ape Y assemblies were aligned with PROGRESSIVECACTUS (61). Substitution rates were estimated for alignment blocks containing all five species with the GTR model (62) implemented in PHYLOFIT (63).
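The contig-selection step above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that the per-contig number of bases aligned to the female reference has already been computed by some mapper, and the function name `putative_y_contigs`, its inputs, and the 20% cutoff are all hypothetical choices made for illustration. The idea is only that contigs largely absent from a female assembly are candidates for the male-specific Y.

```python
def putative_y_contigs(contig_lengths, female_aligned_bases, max_female_fraction=0.2):
    """Return names of contigs that are mostly absent from the female reference.

    contig_lengths: dict mapping contig name -> contig length in bases
    female_aligned_bases: dict mapping contig name -> bases aligned to the
        female reference (contigs missing from this dict count as 0)
    max_female_fraction: hypothetical cutoff, not taken from the paper
    """
    selected = []
    for name, length in contig_lengths.items():
        if length <= 0:
            continue  # skip malformed entries
        # Cap aligned bases at the contig length to guard against double counting.
        aligned = min(female_aligned_bases.get(name, 0), length)
        if aligned / length <= max_female_fraction:
            selected.append(name)
    return sorted(selected)


# Toy, made-up values: c1 is almost fully present in the female reference.
lengths = {"c1": 1000, "c2": 1000, "c3": 500}
aligned = {"c1": 950, "c2": 100}
candidates = putative_y_contigs(lengths, aligned)
```

With these toy inputs, `c2` and `c3` are retained and `c1` is discarded. A real analysis would need to choose the cutoff empirically and handle X-homologous and repetitive sequence more carefully.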

Gene Content Analysis.

To retrieve the bonobo and orangutan genes, we aligned the scaffolds from their Y chromosome assemblies to the respective species-specific or closest-species-specific reference coding sequences using BWA-MEM (64). Novel gene predictions were evaluated with AUGUSTUS (65). The evolutionary history of Y-gene content and gene birth and death rates was reconstructed using procedures in ref. 23.

Palindrome Analysis.

To analyze conservation of human and chimpanzee palindromes, we found all multispecies alignment blocks that overlap their coordinates and identified the percentage of nonrepetitive bases in such blocks per species. To evaluate the copy number of sequences homologous to human and chimpanzee palindromes in bonobo, gorilla, and orangutan, we mapped whole-genome male sequencing reads to the corresponding 1-kb windows which overlap intervals of the human and chimpanzee Y palindromes using BWA-MEM (64) and compared their read depth with that of single-copy X-degenerate genes. The copy number of species-specific ampliconic sequences in bonobo and orangutan was evaluated similarly. Regulatory factor-binding sites in human palindromes P6 and P7 were extracted from ENCODE (43). To analyze a potential enrichment of ampliconic interactions, Hi-C data (25, 26) were processed with mHi-C (24).
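The read-depth comparison described above can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes mean depths per 1-kb window and per single-copy X-degenerate gene are already available, uses the median of the single-copy depths as the baseline, and applies a hypothetical threshold of 1.5 to call a region multicopy. The function names and the example values are invented for illustration.

```python
from statistics import median


def estimate_copy_number(window_depths, single_copy_depths):
    """Estimate copy number per window as depth relative to a single-copy baseline.

    window_depths: mean read depth of each 1-kb window in the region of interest
    single_copy_depths: mean read depths of single-copy X-degenerate genes
    """
    if not single_copy_depths:
        raise ValueError("need at least one single-copy baseline depth")
    baseline = median(single_copy_depths)  # assumption: median as a robust baseline
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return [depth / baseline for depth in window_depths]


def is_multicopy(window_depths, single_copy_depths, threshold=1.5):
    """Call a region multicopy if its median estimated copy number reaches the threshold.

    The threshold of 1.5 is a hypothetical choice, not a value from the paper.
    """
    copies = estimate_copy_number(window_depths, single_copy_depths)
    return median(copies) >= threshold


# Made-up depths: windows at roughly twice the single-copy depth.
baseline_depths = [10.0, 11.0, 9.0]
palindrome_windows = [20.0, 22.0, 19.0]
copies = estimate_copy_number(palindrome_windows, baseline_depths)
```

Here the windows have about twice the baseline depth, so the region would be called multicopy under the assumed threshold. In practice, mappability and GC content can bias depth and would need to be accounted for.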
  62 in total

1.  Phylogenetic estimation of context-dependent substitution rates by maximum likelihood.

Authors:  Adam Siepel; David Haussler
Journal:  Mol Biol Evol       Date:  2003-12-05       Impact factor: 16.240

Review 2.  The Y chromosomes of the great apes.

Authors:  Pille Hallast; Mark A Jobling
Journal:  Hum Genet       Date:  2017-03-06       Impact factor: 4.132

3.  Reconstruction of highly heterogeneous gene-content evolution across the three domains of life.

Authors:  Wataru Iwasaki; Toshihisa Takagi
Journal:  Bioinformatics       Date:  2007-07-01       Impact factor: 6.937

4.  Sequencing the mouse Y chromosome reveals convergent gene acquisition and amplification on both sex chromosomes.

Authors:  Y Q Shirleen Soh; Jessica Alföldi; Tatyana Pyntikova; Laura G Brown; Tina Graves; Patrick J Minx; Robert S Fulton; Colin Kremitzki; Natalia Koutseva; Jacob L Mueller; Steve Rozen; Jennifer F Hughes; Elaine Owens; James E Womack; William J Murphy; Qing Cao; Pieter de Jong; Wesley C Warren; Richard K Wilson; Helen Skaletsky; David C Page
Journal:  Cell       Date:  2014-10-30       Impact factor: 41.582

5.  The divergence of chimpanzee species and subspecies as revealed in multipopulation isolation-with-migration analyses.

Authors:  Jody Hey
Journal:  Mol Biol Evol       Date:  2009-12-02       Impact factor: 16.240

6.  Direct estimation of mutations in great apes reconciles phylogenetic dating.

Authors:  Søren Besenbacher; Christina Hvilsom; Tomas Marques-Bonet; Thomas Mailund; Mikkel Heide Schierup
Journal:  Nat Ecol Evol       Date:  2019-01-21       Impact factor: 15.460

7.  Strict evolutionary conservation followed rapid gene loss on human and rhesus Y chromosomes.

Authors:  Jennifer F Hughes; Helen Skaletsky; Laura G Brown; Tatyana Pyntikova; Tina Graves; Robert S Fulton; Shannon Dugan; Yan Ding; Christian J Buhay; Colin Kremitzki; Qiaoyan Wang; Hua Shen; Michael Holder; Donna Villasana; Lynne V Nazareth; Andrew Cree; Laura Courtney; Joelle Veizer; Holland Kotkiewicz; Ting-Jan Cho; Natalia Koutseva; Steve Rozen; Donna M Muzny; Wesley C Warren; Richard A Gibbs; Richard K Wilson; David C Page
Journal:  Nature       Date:  2012-02-22       Impact factor: 49.962

8.  Inter-chromosomal contact networks provide insights into Mammalian chromatin organization.

Authors:  Stefanie Kaufmann; Christiane Fuchs; Mariya Gonik; Ekaterina E Khrameeva; Andrey A Mironov; Dmitrij Frishman
Journal:  PLoS One       Date:  2015-05-11       Impact factor: 3.240

9.  Correcting palindromes in long reads after whole-genome amplification.

Authors:  Sven Warris; Elio Schijlen; Henri van de Geest; Rahulsimham Vegesna; Thamara Hesselink; Bas Te Lintel Hekkert; Gabino Sanchez Perez; Paul Medvedev; Kateryna D Makova; Dick de Ridder
Journal:  BMC Genomics       Date:  2018-11-06       Impact factor: 3.969

10.  Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon.

Authors:  Kristoffer Sahlin; Marta Tomaszkiewicz; Kateryna D Makova; Paul Medvedev
Journal:  Nat Commun       Date:  2018-11-02       Impact factor: 14.919

