
Insights into hominid evolution from the gorilla genome sequence.

Aylwyn Scally, Julien Y Dutheil, LaDeana W Hillier, Gregory E Jordan, Ian Goodhead, Javier Herrero, Asger Hobolth, Tuuli Lappalainen, Thomas Mailund, Tomas Marques-Bonet, Shane McCarthy, Stephen H Montgomery, Petra C Schwalie, Y Amy Tang, Michelle C Ward, Yali Xue, Bryndis Yngvadottir, Can Alkan, Lars N Andersen, Qasim Ayub, Edward V Ball, Kathryn Beal, Brenda J Bradley, Yuan Chen, Chris M Clee, Stephen Fitzgerald, Tina A Graves, Yong Gu, Paul Heath, Andreas Heger, Emre Karakoc, Anja Kolb-Kokocinski, Gavin K Laird, Gerton Lunter, Stephen Meader, Matthew Mort, James C Mullikin, Kasper Munch, Timothy D O'Connor, Andrew D Phillips, Javier Prado-Martinez, Anthony S Rogers, Saba Sajjadian, Dominic Schmidt, Katy Shaw, Jared T Simpson, Peter D Stenson, Daniel J Turner, Linda Vigilant, Albert J Vilella, Weldon Whitener, Baoli Zhu, David N Cooper, Pieter de Jong, Emmanouil T Dermitzakis, Evan E Eichler, Paul Flicek, Nick Goldman, Nicholas I Mundy, Zemin Ning, Duncan T Odom, Chris P Ponting, Michael A Quail, Oliver A Ryder, Stephen M Searle, Wesley C Warren, Richard K Wilson, Mikkel H Schierup, Jane Rogers, Chris Tyler-Smith, Richard Durbin.

Abstract

Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.

Year:  2012        PMID: 22398555      PMCID: PMC3303130          DOI: 10.1038/nature10842

Source DB:  PubMed          Journal:  Nature        ISSN: 0028-0836            Impact factor:   49.962


Humans share many elements of their anatomy and physiology with both gorillas and chimpanzees, and our similarity to these species was emphasised by Darwin and Huxley in the first evolutionary accounts of human origins[1]. Molecular studies confirmed that we are closer to the African apes than to orangutans, and on average closer to chimpanzees than gorillas[2] (Fig. 1a). Subsequent analyses have explored functional differences between the great apes and their relevance to human evolution, assisted recently by reference genome sequences for chimpanzee[3] and orangutan[4]. Here we provide a reference assembly and initial analysis of the gorilla genome sequence, establishing a foundation for the further study of great ape evolution and genetics.
Figure 1

Speciation of the great apes

a, Phylogeny of the great ape family, showing the speciation of human (H), chimpanzee (C), gorilla (G) and orangutan (O). Horizontal lines indicate speciation times within the hominine subfamily and the sequence divergence time between human and orangutan. Interior grey lines illustrate an example of incomplete lineage sorting at a particular genetic locus – in this case (((C, G), H), O) rather than (((H, C), G), O). Below are mean nucleotide divergences between human and the other great apes from the EPO alignment. b, Great ape speciation and divergence times. Upper panel: solid lines show how times for the HC and HCG speciation events estimated by CoalHMM vary with average mutation rate; dashed lines show the corresponding average sequence divergence times, as well as the HO sequence divergence. Blue blocks represent hominid fossil species: each has a vertical extent spanning the range of dates estimated for it in the literature[13,50], and a horizontal position at the maximum mutation rate consistent both with its proposed phylogenetic position and the CoalHMM estimates (including some allowance for ancestral polymorphism in the case of Sivapithecus). The grey shaded region shows that an increase in mutation rate going back in time can accommodate present-day estimates, fossil hypotheses, and a mid-Miocene speciation for orangutan. Lower panel: estimates of the average mutation rate in present-day humans[10-12]; grey bars show 95% confidence intervals, with black lines at the means.

Recent technological developments have dramatically reduced the costs of sequencing, but the assembly of a whole vertebrate genome remains a challenging computational problem. We generated a reference assembly from a single female western lowland gorilla (Gorilla gorilla gorilla) named Kamilah, using 5.4 Gbp of capillary sequence combined with 166.8 Gbp of Illumina read pairs (see Methods Summary). Genes, transcripts and predictions of gene orthologues and paralogues were annotated by Ensembl[5], and additional analysis found evidence for 498 functional long (> 200 bp) intergenic RNA transcripts. Table 1 summarizes the assembly and annotation properties. An assessment of assembly quality using finished fosmid sequences found that typical (N50) stretches of error-free sequence are 7.2 kbp in length, with errors tending to be clustered in repetitive regions. Outside RepeatMasked regions and away from contig ends, the total rate of single-base and indel errors is 0.13 per kbp. See Supplementary Information for further details.
Table 1

Assembly and annotation statistics

Assembly                                    Annotation
Total length            3,041,976,159 bp    Protein-coding genes    20,962
Contigs                 465,847             Pseudogenes             1,553
Total contig length     2,829,670,843 bp    RNA genes               6,701
Placed contig length    2,712,844,129 bp    Gene exons              237,216
Unplaced contig length  116,826,714 bp      Gene transcripts        35,727
Max contig length       191,556 bp          lincRNA transcripts     498
Contig N50              11.8 kbp
Scaffolds               22,164
Max scaffold length     10,247,101 bp
Scaffold N50            914 kbp

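
The N50 values quoted for contigs, scaffolds and error-free stretches can be read as follows: sort the pieces from longest to shortest and report the length of the piece at which the running total first reaches half of the summed length. The sketch below is a minimal illustration of that definition; the function name and the toy lengths are hypothetical and are not taken from the gorilla assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that pieces of length >= L together
    cover at least half of the total summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one length")


# Toy contig lengths in bp (illustrative only, not gorilla data).
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)
```

Because the statistic weights pieces by length, a few long scaffolds can give a high N50 even when the assembly also contains many short fragments, which is why contig count and maximum length are reported alongside it.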
We also collected less extensive sequence data for three other gorillas, to enable a comparison of species within the Gorilla genus. Gorillas survive today only within several isolated and endangered populations whose evolutionary relationships are uncertain. In addition to Kamilah, our analysis included two western lowland gorillas, Kwanza (male) and EB(JC) (female), and one eastern lowland, Mukisi (male).

Speciation of the great apes

We included the Kamilah assembly with human, chimpanzee, orangutan and macaque in a 5-way whole genome alignment using the Ensembl EPO pipeline[6] (Table ST3.2). Filtering out low-quality regions of the chimpanzee assembly and regions with many alignment gaps, we obtained 2.01 Gbp of 1:1:1:1 great ape orthologous alignment blocks, to which we then applied a coalescent inference model, CoalHMM, to estimate the timescales and population sizes involved in the speciation of the hominines (African great apes; see Table ST1.1 for terminology), with orangutan as an outgroup (Supplementary Information). Two issues need to be addressed in interpreting the results from CoalHMM (Table ST4.2). Firstly, the results themselves are obtained in units of sequence divergence rather than years, and so need to be scaled by an appropriate yearly mutation rate. Secondly, as with any model, CoalHMM makes several simplifying assumptions whose consequences we need to understand in the context of realistic demography. We discuss these issues in turn. Using a rate of 10⁻⁹ mutations per bp per year, derived from fossil calibration of the human-macaque sequence divergence and as used in previous calculations, CoalHMM’s results would correspond to speciation time estimates THC and THCG of 3.7 and 5.95 Mya respectively (Fig. 1b). These dates are consistent with other recent molecular estimates[7,8], but are at variance with certain aspects of the fossil record, including several fossils which have been proposed (though not universally accepted[13]) to be hominins, and therefore to postdate the human-chimpanzee split (Fig. 1b). Indeed the relationship between molecular and fossil evidence has remained difficult to resolve despite the accumulation of genetic data[9].
Direct estimates of the per-generation mutation rate in modern human populations, based on the incidence of disease-causing mutations[10] or sequencing of familial trios[11,12], indicate that a lower value of 0.5-0.6 × 10⁻⁹ bp⁻¹ y⁻¹ is plausible (based on average hominine generation times of 20 to 25 y). This would give substantially older estimates of approximately 6 and 10 Mya for THC and THCG, potentially in better agreement with the fossil record. However, this timetable for hominine speciation must also be reconciled with older events such as the speciation of orangutan, which is thought to have occurred no earlier than the Middle Miocene (12-16 Mya), as fossil apes prior to that differ substantially from what we might expect of an early great ape[14]. This is possible if we allow for mutation rates changing over time, with a mutation rate of around 1 × 10⁻⁹ bp⁻¹ y⁻¹ in the common ancestor of great apes, decreasing to lower values in all extant species (Fig. 1b). Comparable changes in mutation rate have been observed previously in primate evolution on larger timescales, including an approximately 30% branch length decrease in humans compared to baboons since their common ancestor[15]. A decrease within the great apes is also a predicted consequence of the observed increase in body sizes over this time period and the association of small size with shorter generation times in other primates[16], and is consistent with deviations from a molecular clock seen in sequence divergences of the great apes and macaque (Table ST3.3). We discuss these and other constraints on estimates of great ape speciation times in the Supplementary Information. However, we note that Sahelanthropus and Chororapithecus remain difficult to incorporate in this model, and can be accommodated as hominin and gorillin genera only if most of the decrease occurred early in great ape evolution.
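
The dependence of the dates on the assumed mutation rate is a simple rescaling: a time inferred in units of sequence divergence converts to years in inverse proportion to the yearly rate. The following sketch illustrates this using only the figures quoted in the text (3.7 and 5.95 Mya at 10⁻⁹ per bp per year, and the lower range of 0.5-0.6 × 10⁻⁹). The function and variable names are hypothetical, and a single constant rate is assumed, which the text itself argues may not hold.

```python
def rescale_time(time_mya, assumed_rate, new_rate):
    """Convert a date obtained under one yearly mutation rate to another.

    The underlying sequence divergence is fixed, so the time scales
    as assumed_rate / new_rate.
    """
    return time_mya * assumed_rate / new_rate


FOSSIL_CALIBRATED_RATE = 1.0e-9  # per bp per year, as quoted in the text
T_HC_MYA = 3.7                   # human-chimpanzee speciation at that rate
T_HCG_MYA = 5.95                 # human-chimpanzee-gorilla speciation

# Dates under the lower, pedigree-based range of rates quoted in the text.
rescaled = {
    rate: (
        rescale_time(T_HC_MYA, FOSSIL_CALIBRATED_RATE, rate),
        rescale_time(T_HCG_MYA, FOSSIL_CALIBRATED_RATE, rate),
    )
    for rate in (0.5e-9, 0.6e-9)
}
```

At 0.6 × 10⁻⁹ this gives roughly 6.2 and 9.9 Mya, in line with the "approximately 6 and 10 Mya" stated above; at 0.5 × 10⁻⁹ the same calculation gives 7.4 and 11.9 Mya.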
An alternative explanation for the apparent discrepancy in fossil and genetic dates (leaving aside the issue of whether fossil taxa have been correctly placed) is that ancestral demography may have affected the genetic inferences. Certainly CoalHMM’s model does not fit the data in all respects. Perhaps most importantly, it assumes that ancestral population sizes are constant in time and that no gene flow occurred between separated populations, approximations that may not hold in reality. Simulations (details in Supplementary Information) suggest that an ancestral population bottleneck would have had limited impact on the inference of THC, its influence being captured largely by changes in the model’s effective population size. Under conditions of genetic exchange between populations after the main separation of the chimpanzee and human lineages, the speciation time estimated by CoalHMM represents an average weighted by gene flow over the period of separation. This means in some cases it can be substantially older than the date of most recent exchange. However it would only be more recent than the speciation time inferred from fossils if there had been strong gene flow between populations after the development of derived fossil characteristics. To the extent that this is plausible, for example as part of a non-allopatric speciation process, it constitutes an alternative explanation for the dating discrepancy without requiring a change in mutation rate. In summary, although whole genome comparisons can be strongly conclusive about the ordering of speciation events, the inability to observe past mutation rates means that the timing of events from genetic data remains uncertain. In our view, possible variation in mutation rates allows hominid genomic data to be consistent with values of THC from 5.5 to 7 Mya and THCG from 8.5 to 12 Mya, with ancestral demographic structure potentially adding inherent ambiguity to both events. 
Better resolution may come from further integrated analysis of fossil and genetic evidence.

Incomplete lineage sorting and selection

The genealogy relating human (H), chimpanzee (C) and gorilla (G) varies between loci across the genome. CoalHMM explicitly models this and infers the genealogy at each position: either the standard ((H,C),G) relationship or the alternatives ((H,G),C) or ((C,G),H), which are the consequences of incomplete lineage sorting (ILS) in the ancestral HC population. We can use the pattern of ILS to explore evolutionary forces during the HCG speciation period. Across the genome we find 30% of bases exhibiting ILS, with no significant difference between the number sorting as ((H,G),C) and ((C,G),H). However, the fraction of ILS varies with respect to genomic position (Fig. 2a) by more than expected under a model of genome-wide neutral evolution (Fig. SF5.1). This variation reflects local differences in the ancestral effective population size Ne during the period between the gorilla and chimpanzee speciation events, most likely due to natural selection reducing Ne and making ILS less likely. Within coding exons mean ILS drops to 22%, and the suppression of ILS extends out to several hundred kbp from coding genes, evident even in raw site patterns before any model inference (Fig. 2b). An analysis of ILS sites in human segmental duplications suggests that assembly errors do not contribute significantly to this signal (Supplementary Information). We therefore attribute it to the effects of linkage around selected mutations, most likely in the form of background selection[17], observing that it is greater around genes with lower dN/dS ratios (Fig. SF8.4). Given that more than 90% of the genome lies within 300 kbp of a coding gene, and noting the similar phenomenon reported for recent human evolution[11], this supports the suggestion that selection has affected almost all of the genome throughout hominid evolution[18].
Figure 2

Genome-wide ILS and selection

a, Variation in incomplete lineage sorting. Each vertical blue line represents the fraction of ILS between human, chimpanzee and gorilla estimated in a 1 Mbp region. Dashed black lines show the average ILS across the autosomes and on X; the red line shows the expected ILS on X, given the autosomal average and assuming neutral evolution. b, Reduction in ILS around protein coding genes. The blue line shows the mean rate of ILS sites normalised by mutation rate as a function of distance upstream or downstream of the nearest gene (see Supplementary Information). The horizontal dashed line indicates the average value outside 300 kbp from the nearest gene; error bars are s.e.m.
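
The link between ILS and ancestral Ne can be made concrete with the textbook three-species coalescent approximation, in which the probability of a genealogy that disagrees with the species tree is (2/3)·exp(−t / 2Ne), where t is the number of generations between the two speciation events. The sketch below is not CoalHMM, which fits a hidden Markov model along the alignment; it only applies this approximation to the 30% and 22% figures quoted in the text. The function names are hypothetical, and the X-chromosome calculation assumes neutrality and an X-linked Ne of three quarters of the autosomal value; whether the expected value plotted in Fig. 2a was obtained in exactly this way is not stated here.

```python
import math


def ils_fraction(gap_generations, ne):
    """Probability of incomplete lineage sorting for three species.

    gap_generations: generations between the two speciation events.
    ne: diploid effective population size of the ancestral population.
    """
    return (2.0 / 3.0) * math.exp(-gap_generations / (2.0 * ne))


def coalescent_units(ils):
    """Invert the formula: return t / (2 Ne) implied by an ILS fraction."""
    return -math.log(1.5 * ils)


# Genome-wide ILS of 30% (from the text) implies this many coalescent units.
autosomal_units = coalescent_units(0.30)

# Neutral expectation on X if Ne is 3/4 of the autosomal value: the same
# time gap then corresponds to 4/3 as many coalescent units.
expected_x_ils = (2.0 / 3.0) * math.exp(-autosomal_units * 4.0 / 3.0)

# ILS of 22% in coding exons (from the text) implies a smaller local Ne.
relative_ne_exons = autosomal_units / coalescent_units(0.22)
```

Under these assumptions the neutral expectation for X is an ILS fraction of about 0.23, and the drop to 22% within coding exons corresponds to a local Ne of about 72% of the genome-wide value.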

In fitting the transitions between genealogies along the alignment, CoalHMM also estimates a regional recombination rate. This is primarily sensitive to ancestral crossover events prior to HC speciation, yet despite the expectation of rapid turnover in recombination hotspots[19], averaged over 1 Mbp windows there is a good correlation with estimates from present-day crossovers in humans (R = 0.49; p < 10⁻¹³; Fig. SF5.5), consistent with the conservation of recombination rates between humans and chimpanzees on the 1 Mbp scale[19]. As expected, we see reduced ILS (Fig. 2a) and HC sequence divergence dHC (Fig. SF6.1) on the X chromosome, corresponding to a difference in Ne between X and the autosomes within the ancestral HC population. Several factors can contribute to this difference[20], notably the X chromosome’s haploidy in males, which reduces Ne on X to ¾ of its autosomal value, enhances purifying selection in males, and reduces the recombination rate, thereby increasing the effect of selection via linkage. However, sequence divergence is additionally affected by the mutation rate, which is higher in males than in females, further reducing the relative divergence observed on X[21]. Incorporating the ancestral Ne estimates from CoalHMM, we estimate a ratio of 0.87 ± 0.09 between average mutation rates on X and the autosomes on the HC lineage, corresponding to a male/female mutation rate bias α = 2.3 ± 0.4 (details in Supplementary Information). Previous estimates of α in hominids have ranged from 2 to 7[22,23]. It is possible that some of the higher values, having been estimated from sequence divergence only and in smaller data sets, were inflated by underestimating the suppression of ancestral Ne on X, in particular due to purifying selection. Our calculation of α assumes that a single speciation time applies across the genome, attributing differences between the X chromosome and autosomes to the factors mentioned above.
Patterson et al.[24] proposed an alternative model involving complex speciation, with more recent HC ancestry on X than elsewhere. Given potential confounding factors in demography, selection, mutation rate bias and admixture, our analyses do not discriminate between these models; however if the effective HC separation time on X is indeed reduced in this way it would imply a still lower value of α.
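
The step from an X-to-autosome mutation rate ratio to a male bias α can be illustrated with the standard relation based on the time each chromosome spends in each sex: an X chromosome is carried by females two thirds of the time, an autosome half of the time, so the ratio is (2/3)(2 + α)/(1 + α). The sketch below inverts that relation. It is an assumption that this is the form used for the estimate in the text (the details are in the Supplementary Information), and the function names are hypothetical.

```python
def x_to_autosome_ratio(alpha):
    """Expected ratio of X to autosomal mutation rates for male bias alpha.

    alpha is the male mutation rate divided by the female mutation rate.
    """
    return (2.0 / 3.0) * (2.0 + alpha) / (1.0 + alpha)


def alpha_from_ratio(ratio):
    """Invert x_to_autosome_ratio; valid only for 2/3 < ratio < 4/3."""
    if not (2.0 / 3.0 < ratio < 4.0 / 3.0):
        raise ValueError("ratio must lie strictly between 2/3 and 4/3")
    return (4.0 - 3.0 * ratio) / (3.0 * ratio - 2.0)


# Ratio of 0.87 quoted in the text for the HC lineage.
alpha_hc = alpha_from_ratio(0.87)
```

With a ratio of 0.87 this relation gives α of about 2.3, matching the central value quoted above. A lower ratio moves α upward, and the relation becomes steep as the ratio approaches 2/3, so small errors in the ratio translate into large changes in α.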

Functional sequence evolution

We looked for loss or gain of unique autosomal sequence within humans, chimpanzees and gorillas by comparing raw sequence data for each in the context of their reference assemblies (Supplementary Information). The total amount is small: 3-7 Mbp per species, distributed genome-wide in fragments no more than a few kbp in length (Table ST7.1). The vast majority (97%) of such material was also found either in orangutan or a more distant primate, indicating loss, and consistent with the expectation that gain is driven primarily by duplication (which our analysis excludes). Some fragments found only in one species overlap coding exons in annotated genes: 6 genes in human, 5 in chimpanzee and 9 in gorilla (Tables ST7.2,3,4), the majority being associated with olfactory receptor proteins or other rapidly-evolving functions such as male fertility and immune response. We did not assemble a gorilla Y chromosome, but by mapping ~6x reads from the male gorillas Kwanza and Mukisi to the human Y we identified several regions in which human single-copy material is missing in gorilla, comprising almost 10% of the accessible male-specific region. Across the Y chromosome there is considerable variation in the copy number of shared material, and the pattern of coverage is quite different from that of reads from a male bonobo mapped in the same way (Fig. SF7.1). Some missing or depleted material overlaps coding genes (Table ST7.5) including, for example, VCY, a gene expressed specifically in male germ cells which has two copies in human and chimpanzee but apparently only one in gorilla (Supplementary Information). The resulting picture is consistent with rapid structural evolution of the Y chromosome in the great apes, as previously seen in the chimpanzee-human comparison[25].

Protein evolution

The EPO primate alignment was filtered to produce a high-quality genome-wide set of 11,538 alignments representing orthologous primate coding sequences, which were then scored with codon-based evolutionary models for likelihoods of acceleration or deceleration of the ratio dN/dS of nonsynonymous to synonymous mutation rates in the terminal lineages, ancestral branch, and entire hominine subfamily (Supplementary Information). We find that genes with accelerated rates of evolution across hominines are enriched for functions associated with sensory perception, particularly in relation to hearing and brain development (Table ST8.4G,H). For example, among the most strongly accelerated genes are OTOF (p = 0.0056), LOXHD1 (p < 0.01) and GPR98 (p = 0.0056), which are all associated with diseases causing human deafness (Table ST8.5). GPR98, which also shows significant evidence of positive selection under the branch-site test (p = 0.0081), is highly expressed in the developing central nervous system. The gene with the strongest evidence for acceleration along the branch leading to hominines is RNF213 (branch-site p < 2.9 × 10⁻⁹), a gene associated with Moyamoya disease in which blood flow to the brain is restricted due to arterial stenosis[26]. Given that oxygen and glucose consumption scales with total neuron number[27], RNF213 may have played a role in facilitating the evolution of larger brains. Together, these observations are consistent with a major role for adaptive modifications in brain development and sensory perception in hominine evolution.
Turning to lineage-specific selection pressures, we find relatively similar numbers of accelerated genes in humans, chimpanzees and gorillas (663, 562 and 535 respectively at nominal p < 0.05, Table ST8.3A) and genome-wide dN/dS ratios (0.256, 0.249, and 0.239 in purifying sites, Table ST8.6). These numbers, which reflect variation in historical effective population sizes as well as environmental pressures, reveal a largely uniform landscape of recent hominine gene evolution, in accordance with previously-published analyses in human and chimpanzee[3,28] (Table ST8.7). Genes with accelerated rates of evolution along the gorilla lineage are most enriched for a number of developmental terms, including ear, hair follicle, gonad, and brain development, and sensory perception of sound. Among the most significantly accelerated genes in gorilla is EVPL (p < 2.2 × 10⁻⁵), which encodes a component of the cornified envelope of keratinocytes, and may be related to increased cornification of knuckle pads in gorilla[29]. Interestingly, gorilla and human both yielded brain-associated terms enriched for accelerated genes, but chimpanzee did not (Table ST8.4A-C). Genes expressed in the brain or involved in its development have not typically been associated with positive selection in primates, but our results show that multiple great ape lineages show elevated dN/dS in brain-related genes when evaluated against a primate background. We also identified cases of pairwise parallel evolution among hominines. Human and chimpanzee show the largest amount, with significantly more shared accelerations than expected by chance, while gorilla shares more parallel acceleration with human than with chimpanzee across a range of significance thresholds (Figure SF8.3).
Genes involving hearing are enriched in parallel accelerations for all three pairs, but most strongly in gorilla-human (Table ST8.4D-F), calling into question a previous link made between accelerated evolution of auditory genes in humans and language evolution[28]. It is also interesting to note that ear morphology is one of the few external traits in which humans are more similar to gorillas than to chimpanzees[30]. Next we considered gene loss and gain. We found 84 cases of gene loss in gorilla due to the acquisition of a premature stop codon, requiring there to be no close paralogue (Table ST8.8); for example, TEX14, an intercellular bridge protein essential for spermatogenesis in mice. Genome-wide analysis of gene gain is confounded by the difficulty in assembly of closely related paralogues. We therefore resequenced, by finishing overlapping fosmids, three gene clusters known to be under rapid adaptive evolution in primates: the growth hormone cluster[31], the PRM clusters involved in sperm function and the APOBEC cluster implicated in molecular adaptation to viral defence. In the growth hormone cluster we observed four chorionic somatomammotropin (CSH) genes in gorilla compared to three in humans and chimpanzees, with a novel highly similar pair of CSH-like genes in gorilla that share a 3′ end similar to human growth hormone GH2, suggesting a complex evolutionary history as in other primates[31]. We saw sequence but not gene copy number changes in the PRM and APOBEC clusters (Supplementary Information). In several cases, a protein variant thought to cause inherited disease in humans[32] is the only version found in all three gorillas for which we have genome-wide sequence data (Table ST8.9). Striking examples are the dementia-associated variant Arg432Cys in the growth factor PGRN and the hypertrophic cardiomyopathy-associated variant Arg153His in the muscle Z disc protein TCAP, both of which were corroborated by additional capillary sequencing (Table ST8.10). 
Why variants that appear to cause disease in humans might be associated with a normal phenotype in gorillas is unknown; possible explanations are compensatory molecular changes elsewhere, or differing environmental conditions. Such variants have also been found in both the chimpanzee and macaque genomes[3,33].

Gene transcription and regulation

We carried out an analysis of hominine transcriptome variation using total RNA extracted and sequenced from lymphoblastoid cell lines (LCLs) of one gorilla, two chimpanzees and two bonobos (Supplementary Information), and published RNA sequence data for eight human individuals[34]. After quantifying reads mapping to exons and genes in each species, we calculated the degree of species-specific expression and splicing in 9,746 1:1:1 expressed orthologous genes. On average, human and chimpanzee expression were more similar to each other than either was to gorilla (Fig. SF10.2). However this effect is reduced in genes with a higher proportion of ILS sites, which tend to show greater expression distance between humans and chimpanzees (Fig. 3a). More generally, patterns seen in the relative expression distances between the three species showed a significant overlap with those derived from genomic lineage sorting (p = 0.026; Table ST10.4), demonstrating that ILS can be reflected in functional differences between primate species.
Figure 3

Differences in expression and regulation

a, Mean gene expression distance between human and chimpanzee as a function of the proportion of ILS sites per gene. Each point represents a sliding window of 900 genes (over genes ordered by ILS fraction); s.d. error limits are shown in grey. b, (top) Classification of CTCF sites in the gorilla (EB(JC)) and human (GM12878) LCLs on the basis of species-uniqueness; numbers of alignable CTCF binding sites are shown for each category; (bottom) sequence changes of CTCF motifs embedded in human-specific, shared and gorilla-specific CTCF binding sites located within shared CpG islands, species-specific CpG islands or outside CpG islands. Numbers of CTCF binding sites are shown for each CpG island category. Gorilla and human motif sequences are compared and represented as indels, disruptions (>4 bp gaps), and substitutions.

We also explored species-specific variation in splicing[35], by calculating the variance in differential expression of orthologous exons within each gene. In total we found 7% of genes whose between-species variance is significant at the 1% level (based on the distribution of within-human variances, Fig. SF10.5). For example, Fig. SF10.6 illustrates gorilla-specific splicing in the SQLE gene, involved in steroid metabolism. We further investigated great ape regulatory evolution by comparing the binding in human and gorilla of CTCF, a protein essential to vertebrate development involved in transcriptional regulation, chromatin loop formation, and protein scaffolding[36]. We performed ChIP-seq of CTCF in a gorilla LCL (from EB(JC)), and compared this with matched human experiments[37], using the EPO alignments to identify species-specific and shared binding regions (Fig. 3b and Supplementary Information). Consistent with previous results reporting strong CTCF binding conservation[38], and in contrast to the rapid turnover of some other transcription factor binding sites[39], we found that approximately 70% of gorilla CTCF binding regions are shared with human. This compares with around 80% pairwise overlaps between three human LCLs (Fig. SF11.1A). Binding regions that are shared among all three human individuals are three times more likely to be shared with gorilla than individual-specific regions (Fig. SF11.1B). The genomic changes leading to loss of CTCF binding differ between regions within CpG islands and those in the rest of the genome. Losses of CTCF binding outside CpG islands and within species-specific CpG regions co-occur with sequence changes in the binding motif, but for shared CpG islands most binding losses have no corresponding motif sequence change (Fig. 3b). It is possible that DNA methylation differences are driving this effect, as CTCF binding can be abolished by methylation of specific target regions[36].
Alternatively, CTCF binding within CpG islands may also depend more on other regulators’ binding and less on the CTCF motif itself.

Genetic diversity within Gorilla

Recent studies of molecular and morphological diversity within the Gorilla genus have supported a classification into two species, eastern (G. beringei) and western (G. gorilla)[40], with both species further divided into subspecies (Fig. 4a). Although separated today by over 1000 km, it has been suggested that gene flow has occurred between the eastern and western species since divergence[41]. To investigate this, we collected reduced representation sequence data (Supplementary Information) for another female western lowland gorilla, EB(JC), and a male eastern lowland gorilla, Mukisi.
Figure 4

Gorilla species distribution and divergence

a, Distribution of gorilla species in Africa. The western species (Gorilla gorilla) comprises two subspecies: western lowland gorillas (G. gorilla gorilla) and Cross River gorillas (G. gorilla diehli). Similarly, the eastern species (Gorilla beringei) is subclassified into eastern lowland gorillas (G. beringei graueri) and mountain gorillas (G. beringei beringei). (Based on data in IUCN 2010.) b, Western lowland gorilla Kamilah, source of the reference assembly (photo JR). c, Eastern lowland gorilla Mukisi (photo M. Seres). d, Isolation-migration model of the western and eastern species. NA, NW and NE are ancestral, western and eastern effective population sizes; m is the migration rate. e, Likelihood surface for migration and split time parameters in the isolation-migration model.

Table 2 summarizes the sequence diversity in these individuals and in Kamilah, based on alignment of sequence data to the gorilla assembly. The ratio of homozygous to heterozygous variant rates for EB(JC) (close to 0.5) is consistent with her coming from the same population as Kamilah (Supplementary Information), and her rate of heterozygosity matches Kamilah’s. Mukisi, on the other hand, has twice the rate of homozygous differences from the assembly, consistent with his coming from a separate population. Furthermore, heterozygosity in Mukisi is much lower, suggesting a reduced population size in the eastern species. This agrees with previous studies based on fewer loci[41], and also with estimates of present-day numbers in the wild, which indicate that whereas the western lowland subspecies may number up to 200,000 individuals, the eastern population as a whole is around ten times smaller[42,43]. Because it manifests in genetic diversity, this disparity must have existed for many millennia, and cannot have resulted solely from the current pressure of human activity in central Africa or recent outbreaks of the Ebola virus.
Table 2

Nucleotide polymorphism in western and eastern gorillas

Individual  Species          Heterozygous site rate (%)  Homozygous site rate (%)  hom:het ratio
Kamilah     western lowland  0.189                       0.0015                    -
EB(JC)      western lowland  0.178                       0.10                      0.56
Mukisi      eastern lowland  0.076                       0.19                      2.5

Rates are based on variants detected by mapping sequence data to the gorilla reference and filtering sites by depth and mapping quality (Supplementary Information). The homozygosity rate for Kamilah is low (and is effectively an error rate) because her sequence was used for assembly. Reduced heterozygosity in Mukisi is not due to familial inbreeding, since there are no long homozygous stretches.
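As a quick check on the last column of Table 2, the hom:het ratios can be recomputed from the two site rates. The sketch below uses only the values printed in the table; the variable and function names are illustrative and not part of the original analysis. Kamilah is left out because her homozygous rate is effectively an error rate.

```python
# Site rates (%) copied from Table 2; names here are illustrative only.
rates = {
    "EB(JC)": {"het": 0.178, "hom": 0.10},
    "Mukisi": {"het": 0.076, "hom": 0.19},
}


def hom_het_ratio(hom, het):
    """Ratio of homozygous to heterozygous variant site rates."""
    return hom / het


ratios = {name: hom_het_ratio(r["hom"], r["het"]) for name, r in rates.items()}

for name, ratio in ratios.items():
    print(f"{name}: hom:het = {ratio:.2f}")
```

The recomputed values agree with the table to the precision shown there: about 0.56 for EB(JC) and 2.5 for Mukisi.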

Based on an alignment of the EB(JC) and Mukisi data to the human reference sequence and comparing high confidence genotype calls for the two individuals, we estimate a mean sequence divergence time between them of 1.75 Mya. However, the pattern of shared heterozygosity is not consistent with a clean split between western and eastern gorillas (Supplementary Information). Under a model which allows symmetric genetic exchange between the populations after an initial split (Fig. 4d; Supplementary Information), the maximum likelihood species split time is ~0.5 Mya with moderate subsequent exchange of ~0.2 individuals per generation each way between breeding pools, totalling ~5,000 in each direction over 0.5 My (Fig. 4e). Different model assumptions and parameterisations would lead to different values. More extensive sampling and sequencing of both gorilla populations will afford better resolution of this issue.
We also collected whole-genome sequence data from an additional male western lowland gorilla ‘Kwanza’ at 12x, and further whole genome sequence data for (eastern) Mukisi at 7x (Supplementary Information). Differences between the western gorillas and Mukisi represent a combination of inter-individual and inter-species variants. These include 1,615 non-synonymous SNPs in 1,326 genes, seven of which have more than four amino acid differences each (Table ST12.2), among which are two olfactory receptor genes and EMR3, implicated in immune and inflammatory responses[44]. Nineteen of the genes annotated in Kamilah carry an apparently homozygous premature stop codon in Mukisi. These include the gene encoding the seminal fluid protein SEMG2, implicated in sperm competition and known to be inactivated in some gorillas, where sperm competition is rare[45]. Both EMR3 and SEMG2 were corroborated by additional sequencing (Tables ST12.3, ST12.4).
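The migration figures quoted above for the isolation-migration model (~0.2 individuals per generation each way, ~5,000 in each direction over 0.5 My) can be related by simple arithmetic. The sketch below back-calculates the number of generations and the generation time these figures imply; the generation time is not stated in this section, so the value obtained is an inference from the quoted numbers rather than a reported parameter, and the variable names are illustrative.

```python
# Values quoted in the text for the isolation-migration model.
migrants_per_generation = 0.2   # individuals per generation, each way
total_migrants = 5_000          # individuals in each direction since the split
split_time_years = 0.5e6        # maximum likelihood split time, ~0.5 My

# Number of generations implied by the two migration figures.
implied_generations = total_migrants / migrants_per_generation

# Generation time implied by spreading those generations over the split time.
# This is inferred from the quoted numbers, not stated in this section.
implied_generation_time = split_time_years / implied_generations

print(f"implied generations: {implied_generations:,.0f}")
print(f"implied generation time: {implied_generation_time:.0f} years")
```

Under these figures the totals are mutually consistent with roughly 25,000 generations of about 20 years each.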
Finally, we investigated genomic duplication in gorilla using a whole genome shotgun sequence detection method applied to data from the western gorillas Kamilah and Kwanza (Supplementary Information). This revealed a level of private segmental duplication (0.9 Mbp and 1.5 Mbp in the two gorillas) well outside the range found in pairwise comparisons of humans (Fig. SF13.1), where a value of ~100 kbp is typical between any two individuals[46]. These results suggest greater copy number diversity in gorillas than in humans, consistent with previous observations in the great apes[47].
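To put "well outside the range" in proportion, the private duplication totals can be expressed as multiples of the typical human pairwise value. This is a minimal sketch using only the three figures quoted above; the text does not say which total belongs to which gorilla, so the labels below are placeholders.

```python
# Figures quoted in the text, converted to kbp.
TYPICAL_HUMAN_PAIR_KBP = 100    # ~100 kbp between any two humans
private_dup_kbp = {
    "gorilla_1": 900,           # 0.9 Mbp (placeholder label)
    "gorilla_2": 1500,          # 1.5 Mbp (placeholder label)
}

# Fold difference relative to the typical human pairwise value.
fold_over_human = {
    label: kbp / TYPICAL_HUMAN_PAIR_KBP for label, kbp in private_dup_kbp.items()
}

for label, fold in fold_over_human.items():
    print(f"{label}: {fold:.0f}x the typical human pairwise value")
```

On these numbers the two gorillas carry roughly 9 and 15 times as much private segmental duplication as is typical between two humans.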

Conclusion

Since the Middle Miocene, an epoch of abundance and diversity for apes throughout Eurasia and Africa, the prevailing pattern of ape evolution has been one of fragmentation and extinction[48]. The present-day distribution of non-human great apes, existing only as endangered and subdivided populations in equatorial forest refugia[43], is a legacy of that process. Even humans, now spread around the world and occupying habitats previously inaccessible to any primate, bear the genetic legacy of past population crises. All other branches of the genus Homo have passed into extinction. It may be that in the condition of Gorilla, Pan and Pongo we see some echo of our own ancestors prior to the last 100,000 years, and perhaps a condition experienced many times over several million years of evolution. It is notable that species within at least three of these genera continued to exchange genetic material long after separation[4,49], a disposition that may have aided their survival in the face of diminishing numbers. As well as teaching us about human evolution, the study of the great apes connects us to a time when our existence was more tenuous, and in doing so, highlights the importance of protecting and conserving these remarkable species.

Methods summary

Assembly

We constructed a hybrid de novo assembly combining 5.4 Gbp of capillary read pairs with the contigs from an initial short read assembly of 166.8 Gbp of Illumina paired reads. Improvements in long-range structure were then guided by human homology, placing contigs into scaffolds wherever read pairs confirmed collinearity between gorilla and human. Base-pair contiguity was improved by local reassembly within each scaffold, merging or extending contigs using Illumina read pairs. Finally we used additional Kamilah BAC and fosmid end pair capillary sequences to provide longer range scaffolding. Base errors were corrected by mapping all Illumina reads back to the assembly and rectifying apparent homozygous variants, while recording the location of heterozygous sites. Further details and other methods are described in Supplementary Information.
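The base-error correction step described above (rectifying apparent homozygous variants while recording heterozygous sites) can be illustrated with a toy example. The sketch below is not the pipeline used for the assembly: the function name, the input representation (one string of read bases per position) and the thresholds `hom_frac`, `het_range` and `min_depth` are all assumptions chosen for illustration.

```python
from collections import Counter


def polish(reference, pileups, hom_frac=0.9, het_range=(0.25, 0.75), min_depth=4):
    """Toy consensus polishing; thresholds are illustrative assumptions.

    reference: assembled sequence as a string.
    pileups:   one string per position holding the read bases mapped there.
    Returns the corrected sequence and the positions recorded as heterozygous.
    """
    corrected = list(reference)
    het_sites = []
    for pos, (ref_base, bases) in enumerate(zip(reference, pileups)):
        if len(bases) < min_depth:
            continue  # too little evidence to call anything at this site
        counts = Counter(bases)
        # Most frequent base that differs from the assembly, if any.
        alt_base, alt_count = max(
            ((b, c) for b, c in counts.items() if b != ref_base),
            key=lambda bc: bc[1],
            default=(None, 0),
        )
        frac = alt_count / len(bases)
        if frac >= hom_frac:
            corrected[pos] = alt_base   # apparent homozygous variant: fix the base
        elif het_range[0] <= frac <= het_range[1]:
            het_sites.append(pos)       # apparent heterozygous site: record only
    return "".join(corrected), het_sites


polished, hets = polish("ACGT", ["AAAAAA", "TTTTTT", "GGGAAA", "TT"])
print(polished, hets)
```

In this example the second position is rewritten because every read disagrees with the assembly, the third is recorded as heterozygous because the reads split evenly, and the fourth is skipped for lack of depth.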
  40 in total

1.  Molecular evolution of GH in primates: characterisation of the GH genes from slow loris and marmoset defines an episode of rapid evolutionary change.

Authors:  O C Wallis; Y P Zhang; M Wallis
Journal:  J Mol Endocrinol       Date:  2001-06       Impact factor: 5.098

2.  A fine-scale map of recombination rates and hotspots across the human genome.

Authors:  Simon Myers; Leonardo Bottolo; Colin Freeman; Gil McVean; Peter Donnelly
Journal:  Science       Date:  2005-10-14       Impact factor: 47.728

Review 3.  CTCF: master weaver of the genome.

Authors:  Jennifer E Phillips; Victor G Corces
Journal:  Cell       Date:  2009-06-26       Impact factor: 41.582

4.  Copy number variation analysis in the great apes reveals species-specific patterns of structural variation.

Authors:  Elodie Gazave; Fleur Darré; Carlos Morcillo-Suarez; Natalia Petit-Marty; Angel Carreño; Urko M Marigorta; Oliver A Ryder; Antoine Blancher; Mariano Rocchi; Elena Bosch; Carl Baker; Tomàs Marquès-Bonet; Evan E Eichler; Arcadi Navarro
Journal:  Genome Res       Date:  2011-08-08       Impact factor: 9.043

5.  Evolutionary and biomedical insights from the rhesus macaque genome.

Authors:  Richard A Gibbs; Jeffrey Rogers; Michael G Katze; Roger Bumgarner; George M Weinstock; Elaine R Mardis; Karin A Remington; Robert L Strausberg; J Craig Venter; Richard K Wilson; Mark A Batzer; Carlos D Bustamante; Evan E Eichler; Matthew W Hahn; Ross C Hardison; Kateryna D Makova; Webb Miller; Aleksandar Milosavljevic; Robert E Palermo; Adam Siepel; James M Sikela; Tony Attaway; Stephanie Bell; Kelly E Bernard; Christian J Buhay; Mimi N Chandrabose; Marvin Dao; Clay Davis; Kimberly D Delehaunty; Yan Ding; Huyen H Dinh; Shannon Dugan-Rocha; Lucinda A Fulton; Ramatu Ayiesha Gabisi; Toni T Garner; Jennifer Godfrey; Alicia C Hawes; Judith Hernandez; Sandra Hines; Michael Holder; Jennifer Hume; Shalini N Jhangiani; Vandita Joshi; Ziad Mohid Khan; Ewen F Kirkness; Andrew Cree; R Gerald Fowler; Sandra Lee; Lora R Lewis; Zhangwan Li; Yih-Shin Liu; Stephanie M Moore; Donna Muzny; Lynne V Nazareth; Dinh Ngoc Ngo; Geoffrey O Okwuonu; Grace Pai; David Parker; Heidie A Paul; Cynthia Pfannkoch; Craig S Pohl; Yu-Hui Rogers; San Juana Ruiz; Aniko Sabo; Jireh Santibanez; Brian W Schneider; Scott M Smith; Erica Sodergren; Amanda F Svatek; Teresa R Utterback; Selina Vattathil; Wesley Warren; Courtney Sherell White; Asif T Chinwalla; Yucheng Feng; Aaron L Halpern; Ladeana W Hillier; Xiaoqiu Huang; Pat Minx; Joanne O Nelson; Kymberlie H Pepin; Xiang Qin; Granger G Sutton; Eli Venter; Brian P Walenz; John W Wallis; Kim C Worley; Shiaw-Pyng Yang; Steven M Jones; Marco A Marra; Mariano Rocchi; Jacqueline E Schein; Robert Baertsch; Laura Clarke; Miklós Csürös; Jarret Glasscock; R Alan Harris; Paul Havlak; Andrew R Jackson; Huaiyang Jiang; Yue Liu; David N Messina; Yufeng Shen; Henry Xing-Zhi Song; Todd Wylie; Lan Zhang; Ewan Birney; Kyudong Han; Miriam K Konkel; Jungnam Lee; Arian F A Smit; Brygg Ullmer; Hui Wang; Jinchuan Xing; Richard Burhans; Ze Cheng; John E Karro; Jian Ma; Brian Raney; Xinwei She; Michael J Cox; Jeffery P Demuth; Laura J Dumas; Sang-Gook Han; Janet Hopkins; 
Anis Karimpour-Fard; Young H Kim; Jonathan R Pollack; Tomas Vinar; Charles Addo-Quaye; Jeremiah Degenhardt; Alexandra Denby; Melissa J Hubisz; Amit Indap; Carolin Kosiol; Bruce T Lahn; Heather A Lawson; Alison Marklein; Rasmus Nielsen; Eric J Vallender; Andrew G Clark; Betsy Ferguson; Ryan D Hernandez; Kashif Hirani; Hildegard Kehrer-Sawatzki; Jessica Kolb; Shobha Patil; Ling-Ling Pu; Yanru Ren; David Glenn Smith; David A Wheeler; Ian Schenck; Edward V Ball; Rui Chen; David N Cooper; Belinda Giardine; Fan Hsu; W James Kent; Arthur Lesk; David L Nelson; William E O'brien; Kay Prüfer; Peter D Stenson; James C Wallace; Hui Ke; Xiao-Ming Liu; Peng Wang; Andy Peng Xiang; Fan Yang; Galt P Barber; David Haussler; Donna Karolchik; Andy D Kern; Robert M Kuhn; Kayla E Smith; Ann S Zwieg
Journal:  Science       Date:  2007-04-13       Impact factor: 47.728

6.  Estimation of hominoid ancestral population sizes under bayesian coalescent models incorporating mutation rate variation and sequencing errors.

Authors:  Ralph Burgess; Ziheng Yang
Journal:  Mol Biol Evol       Date:  2008-07-04       Impact factor: 16.240

7.  Heterogeneous genomic molecular clocks in primates.

Authors:  Seong-Ho Kim; Navin Elango; Charles Warden; Eric Vigoda; Soojin V Yi
Journal:  PLoS Genet       Date:  2006-08-11       Impact factor: 5.917

8.  Personalized copy number and segmental duplication maps using next-generation sequencing.

Authors:  Can Alkan; Jeffrey M Kidd; Tomas Marques-Bonet; Gozde Aksay; Francesca Antonacci; Fereydoun Hormozdiari; Jacob O Kitzman; Carl Baker; Maika Malig; Onur Mutlu; S Cenk Sahinalp; Richard A Gibbs; Evan E Eichler
Journal:  Nat Genet       Date:  2009-08-30       Impact factor: 38.330

9.  Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content.

Authors:  Jennifer F Hughes; Helen Skaletsky; Tatyana Pyntikova; Tina A Graves; Saskia K M van Daalen; Patrick J Minx; Robert S Fulton; Sean D McGrath; Devin P Locke; Cynthia Friedman; Barbara J Trask; Elaine R Mardis; Wesley C Warren; Sjoerd Repping; Steve Rozen; Richard K Wilson; David C Page
Journal:  Nature       Date:  2010-01-13       Impact factor: 49.962

10.  Ensembl 2009.

Authors:  T J P Hubbard; B L Aken; S Ayling; B Ballester; K Beal; E Bragin; S Brent; Y Chen; P Clapham; L Clarke; G Coates; S Fairley; S Fitzgerald; J Fernandez-Banet; L Gordon; S Graf; S Haider; M Hammond; R Holland; K Howe; A Jenkinson; N Johnson; A Kahari; D Keefe; S Keenan; R Kinsella; F Kokocinski; E Kulesha; D Lawson; I Longden; K Megy; P Meidl; B Overduin; A Parker; B Pritchard; D Rios; M Schuster; G Slater; D Smedley; W Spooner; G Spudich; S Trevanion; A Vilella; J Vogel; S White; S Wilder; A Zadissa; E Birney; F Cunningham; V Curwen; R Durbin; X M Fernandez-Suarez; J Herrero; A Kasprzyk; G Proctor; J Smith; S Searle; P Flicek
Journal:  Nucleic Acids Res       Date:  2008-11-25       Impact factor: 16.971
