Marta Tomaszkiewicz, Samarth Rangavittal, Monika Cechova, Rebeca Campos Sanchez, Howard W Fescemyer, Robert Harris, Danling Ye, Patricia C M O'Brien, Rayan Chikhi, Oliver A Ryder, Malcolm A Ferguson-Smith, Paul Medvedev, Kateryna D Makova.
Abstract
The mammalian Y Chromosome sequence, critical for studying male fertility and dispersal, is enriched in repeats and palindromes and is thus the most difficult component of the genome to assemble. Previously, expensive and labor-intensive BAC-based techniques were used to sequence the Y for a handful of mammalian species. Here, we present a much faster and more affordable strategy for sequencing and assembling mammalian Y Chromosomes of sufficient quality for most comparative genomics analyses and for conservation genetics applications. The strategy combines flow sorting, short- and long-read genome and transcriptome sequencing, and droplet digital PCR with novel and existing computational methods. It can be used to reconstruct sex chromosomes in a heterogametic sex of any species. We applied our strategy to produce a draft of the gorilla Y sequence. The resulting assembly allowed us to refine gene content, evaluate copy number of ampliconic gene families, locate species-specific palindromes, examine the repetitive element content, and produce sequence alignments with human and chimpanzee Y Chromosomes. Our results inform the evolution of the hominine (human, chimpanzee, and gorilla) Y Chromosomes. Surprisingly, we found the gorilla Y Chromosome to be similar to the human Y Chromosome, but not to the chimpanzee Y Chromosome. Moreover, we have utilized the assembled gorilla Y Chromosome sequence to design genetic markers for studying the male-specific dispersal of this endangered species.
Year: 2016 PMID: 26934921 PMCID: PMC4817776 DOI: 10.1101/gr.199448.115
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Sequencing data summary
Figure 1. RecoverY, a novel algorithm for extracting Y Chromosome–specific reads from sequences of flow-sorted material. (A) The expected distribution of k-mer abundances. (B) The abundance of k-mers from paired-end flow-sorted gorilla Y sequencing data. The k-mers with an abundance greater than 100 are considered to be Y-specific or repetitive.
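The abundance cutoff described in this caption can be illustrated with a minimal sketch. This is not the RecoverY implementation: the function names, the `min_fraction` rule for deciding when a read is kept, and the toy reads are all assumptions made for illustration. Only the idea of treating k-mers above an abundance threshold (100 in the caption) as Y-specific or repetitive comes from the source.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def select_reads(reads, k, min_abundance, min_fraction=0.5):
    """Keep reads dominated by high-abundance k-mers.

    A k-mer is 'high abundance' when its count exceeds min_abundance.
    The min_fraction rule is a hypothetical choice for this sketch,
    not a parameter taken from the paper.
    """
    counts = count_kmers(reads, k)
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k
        high = sum(1 for km in read_kmers if counts[km] > min_abundance)
        if high / len(read_kmers) >= min_fraction:
            kept.append(read)
    return kept


# Toy data: five copies of an enriched read plus one low-abundance read.
toy_reads = ["ACGTACGT"] * 5 + ["TTTTGGGA"]
toy_kept = select_reads(toy_reads, k=4, min_abundance=3)
```

On real flow-sorted data the threshold would be read off the observed k-mer abundance distribution, as in panel B, rather than fixed in advance.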
Figure 2. (A) The global workflow applied for the Y Chromosome assembly (see text for details). Four assemblies in the dotted frame are nested within each other. The best assembly is framed in red. (Orange) Illumina data; (blue) PacBio data. All assemblies were filtered against the reference female genome. The total (including Ns) and unambiguous (non-N, shown in parentheses) lengths are shown. N50 is the contig/scaffold length such that contigs/scaffolds of that length or longer together contain at least half of the assembly length. (B) Gene and (C) palindrome recovery. The heatmaps show how sequences homologous to 25 human genes, eight human palindromes, and 12 chimpanzee-specific palindromes were recovered in the assemblies (see Methods). Genes lost on the chimpanzee Y are marked with an asterisk.
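The N50 definition in this caption can be made concrete with a short sketch. The function name and the example lengths are hypothetical; the calculation follows the caption's definition by sorting contigs from longest to shortest and returning the length at which the running total first reaches half of the assembly length.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues for odd totals.
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


# Hypothetical contig lengths (not taken from the paper).
example_n50 = n50([80, 70, 50, 40, 30, 20])  # total 290; 80 + 70 = 150 >= 145
```

Here `example_n50` is 70, because the two longest contigs already cover more than half of the 290 total bases.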
Figure 3. A comparison of the gene content among the hominine Y Chromosomes. (A) X-degenerate genes. (B) Ampliconic genes.
A comparison of the hominine Y Chromosomes
Figure 4. Sizes of ampliconic gene families on the hominine Y Chromosome. The number of functional genes was evaluated for 14 gorilla males using ddPCR (blue), evaluated for two human males using ddPCR and retrieved from the reference human genome sequence (orange), and retrieved from the chimpanzee reference genome sequence (green). For families with intraspecific size variation (Supplemental Table S12), size averages (numbers above bars) and ranges (error bars) are shown.
Figure 5. The workflow for sequencing mammalian Y Chromosomes.