Aylwyn Scally, Julien Y Dutheil, LaDeana W Hillier, Gregory E Jordan, Ian Goodhead, Javier Herrero, Asger Hobolth, Tuuli Lappalainen, Thomas Mailund, Tomas Marques-Bonet, Shane McCarthy, Stephen H Montgomery, Petra C Schwalie, Y Amy Tang, Michelle C Ward, Yali Xue, Bryndis Yngvadottir, Can Alkan, Lars N Andersen, Qasim Ayub, Edward V Ball, Kathryn Beal, Brenda J Bradley, Yuan Chen, Chris M Clee, Stephen Fitzgerald, Tina A Graves, Yong Gu, Paul Heath, Andreas Heger, Emre Karakoc, Anja Kolb-Kokocinski, Gavin K Laird, Gerton Lunter, Stephen Meader, Matthew Mort, James C Mullikin, Kasper Munch, Timothy D O'Connor, Andrew D Phillips, Javier Prado-Martinez, Anthony S Rogers, Saba Sajjadian, Dominic Schmidt, Katy Shaw, Jared T Simpson, Peter D Stenson, Daniel J Turner, Linda Vigilant, Albert J Vilella, Weldon Whitener, Baoli Zhu, David N Cooper, Pieter de Jong, Emmanouil T Dermitzakis, Evan E Eichler, Paul Flicek, Nick Goldman, Nicholas I Mundy, Zemin Ning, Duncan T Odom, Chris P Ponting, Michael A Quail, Oliver A Ryder, Stephen M Searle, Wesley C Warren, Richard K Wilson, Mikkel H Schierup, Jane Rogers, Chris Tyler-Smith, Richard Durbin.
Abstract
Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.
Year: 2012 PMID: 22398555 PMCID: PMC3303130 DOI: 10.1038/nature10842
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Figure 1. Speciation of the great apes
a, Phylogeny of the great ape family, showing the speciation of human (H), chimpanzee (C), gorilla (G) and orangutan (O). Horizontal lines indicate speciation times within the hominine subfamily and the sequence divergence time between human and orangutan. Interior grey lines illustrate an example of incomplete lineage sorting at a particular genetic locus – in this case (((C, G), H), O) rather than (((H, C), G), O). Below are mean nucleotide divergences between human and the other great apes from the EPO alignment. b, Great ape speciation and divergence times. Upper panel: solid lines show how times for the HC and HCG speciation events estimated by CoalHMM vary with average mutation rate; dashed lines show the corresponding average sequence divergence times, as well as the HO sequence divergence. Blue blocks represent hominid fossil species: each has a vertical extent spanning the range of dates estimated for it in the literature[13,50], and a horizontal position at the maximum mutation rate consistent both with its proposed phylogenetic position and the CoalHMM estimates (including some allowance for ancestral polymorphism in the case of Sivapithecus). The grey shaded region shows that an increase in mutation rate going back in time can accommodate present-day estimates, fossil hypotheses, and a mid-Miocene speciation for orangutan. Lower panel: estimates of the average mutation rate in present-day humans[10-12]; grey bars show 95% confidence intervals, with black lines at the means.
Assembly and annotation statistics
| Assembly | | Annotation | |
|---|---|---|---|
| Total length | 3,041,976,159 bp | Protein-coding genes | 20,962 |
| Contigs | 465,847 | Pseudogenes | 1,553 |
| Total contig length | 2,829,670,843 bp | RNA genes | 6,701 |
| Placed contig length | 2,712,844,129 bp | Gene exons | 237,216 |
| Unplaced contig length | 116,826,714 bp | Gene transcripts | 35,727 |
| Max contig length | 191,556 bp | lincRNA transcripts | 498 |
| Contig N50 | 11.8 kbp | | |
| Scaffolds | 22,164 | | |
| Max scaffold length | 10,247,101 bp | | |
| Scaffold N50 | 914 kbp | | |
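The contig and scaffold N50 values in the table follow the standard definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The following is a minimal sketch of that definition, not the pipeline used for the assembly; the function name `n50` and the toy lengths are illustrative assumptions. The final lines check only that the placed and unplaced contig lengths reported in the table sum to the total contig length.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (standard definition)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy contig lengths, for illustration only (not from the assembly).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))

# Consistency check on values reported in the table (in bp).
placed, unplaced, total_contig = 2_712_844_129, 116_826_714, 2_829_670_843
print(placed + unplaced == total_contig)
```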
Figure 2. Genome-wide ILS and selection
a, Variation in incomplete lineage sorting. Each vertical blue line represents the fraction of ILS between human, chimpanzee and gorilla estimated in a 1 Mbp region. Dashed black lines show the average ILS across the autosomes and on X; the red line shows the expected ILS on X, given the autosomal average and assuming neutral evolution. b, Reduction in ILS around protein coding genes. The blue line shows the mean rate of ILS sites normalised by mutation rate as a function of distance upstream or downstream of the nearest gene (see Supplementary Information). The horizontal dashed line indicates the average value outside 300 kbp from the nearest gene; error bars are s.e.m.
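One way to see how an expected X-chromosome ILS can follow from the autosomal average is a simple neutral coalescent argument; the authors' actual calculation is described in their Supplementary Information and may differ. The sketch below assumes that the ILS fraction equals (2/3)·exp(-T/(2N)) for a fixed interval T between the two speciation events, and that the effective population size on X is 3/4 of the autosomal value (equal numbers of breeding males and females). The function name `expected_x_ils`, the parameter `ne_ratio`, and the use of the genome-wide 30% figure as the autosomal input are assumptions made for illustration.

```python
def expected_x_ils(autosomal_ils, ne_ratio=0.75):
    """Expected ILS fraction on X under neutrality, given the autosomal ILS fraction.

    Assumes ILS = (2/3) * exp(-T / (2N)) and N_X = ne_ratio * N_autosome,
    with the same interval T between speciation events on both.
    """
    if not 0 < autosomal_ils <= 2 / 3:
        raise ValueError("autosomal_ils must be in (0, 2/3]")
    # Probability that the two lineages fail to coalesce between the speciation events.
    p_no_coalescence = 1.5 * autosomal_ils
    # A smaller N on X raises the coalescence rate by a factor of 1 / ne_ratio.
    return (2 / 3) * p_no_coalescence ** (1 / ne_ratio)


# Illustrative input: the 30% genome-wide figure quoted in the abstract.
print(round(expected_x_ils(0.30), 2))
```

Under these assumptions an autosomal ILS of 30% corresponds to an expected value of roughly 23% on X, so an observed X average well below that would point to something beyond the difference in effective population size.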
Figure 3. Differences in expression and regulation
a, Mean gene expression distance between human and chimpanzee as a function of the proportion of ILS sites per gene. Each point represents a sliding window of 900 genes (over genes ordered by ILS fraction); s.d. error limits are shown in grey. b, (top) Classification of CTCF sites in the gorilla (EB(JC)) and human (GM12878) LCLs on the basis of species-uniqueness; numbers of alignable CTCF binding sites are shown for each category; (bottom) sequence changes of CTCF motifs embedded in human-specific, shared and gorilla-specific CTCF binding sites located within shared CpG islands, species-specific CpG islands or outside CpG islands. Numbers of CTCF binding sites are shown for each CpG island category. Gorilla and human motif sequences are compared and represented as indels, disruptions (>4 bp gaps), and substitutions.
Figure 4. Gorilla species distribution and divergence
a, Distribution of gorilla species in Africa. The western species (Gorilla gorilla) comprises two subspecies: western lowland gorillas (G. gorilla gorilla) and Cross River gorillas (G. gorilla diehli). Similarly, the eastern species (Gorilla beringei) is subclassified into eastern lowland gorillas (G. beringei graueri) and mountain gorillas (G. beringei beringei). (Based on data in IUCN 2010.) b, Western lowland gorilla Kamilah, source of the reference assembly (photo JR). c, Eastern lowland gorilla Mukisi (photo M. Seres). d, Isolation-migration model of the western and eastern species. N_A, N_W and N_E are ancestral, western and eastern effective population sizes; m is the migration rate. e, Likelihood surface for migration and split time parameters in the isolation-migration model.
Nucleotide polymorphism in western and eastern gorillas
| Individual | Species | Heterozygous | Homozygous | hom:het ratio |
|---|---|---|---|---|
| Kamilah | western lowland | 0.189 | 0.0015 | - |
| EB(JC) | western lowland | 0.178 | 0.10 | 0.56 |
| Mukisi | eastern lowland | 0.076 | 0.19 | 2.5 |
Rates are based on variants detected by mapping sequence data to the gorilla reference and filtering sites by depth and mapping quality (Supplementary Information). The homozygosity rate for Kamilah is low (and is effectively an error rate) because her sequence was used for assembly. Reduced heterozygosity in Mukisi is not due to familial inbreeding, since there are no long homozygous stretches.
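The hom:het column is the homozygous rate divided by the heterozygous rate. As a quick arithmetic check, the sketch below recomputes it from the values in the table; the dictionary layout and the function name `hom_het_ratio` are illustrative assumptions. Kamilah is included only for completeness, since the table reports no ratio for her (her homozygous rate is effectively an error rate).

```python
# (heterozygous rate, homozygous rate) per individual, copied from the table above.
rates = {
    "Kamilah": (0.189, 0.0015),
    "EB(JC)": (0.178, 0.10),
    "Mukisi": (0.076, 0.19),
}


def hom_het_ratio(het, hom):
    """Ratio of homozygous to heterozygous variant rates."""
    return hom / het


for name, (het, hom) in rates.items():
    print(name, round(hom_het_ratio(het, hom), 2))
```

The recomputed ratios for EB(JC) and Mukisi match the table (0.56 and 2.5). The higher ratio in Mukisi reflects both lower heterozygosity and a higher rate of homozygous differences from the western lowland reference.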