Ilan Gronau, Melissa J Hubisz, Brad Gulko, Charles G Danko, Adam Siepel.
Abstract
Whole-genome sequences provide a rich source of information about human evolution. Here we describe an effort to estimate key evolutionary parameters based on the whole-genome sequences of six individuals from diverse human populations. We used a Bayesian, coalescent-based approach to obtain information about ancestral population sizes, divergence times and migration rates from inferred genealogies at many neutrally evolving loci across the genome. We introduce new methods for accommodating gene flow between populations and integrating over possible phasings of diploid genotypes. We also describe a custom pipeline for genotype inference to mitigate biases from heterogeneous sequencing technologies and coverage levels. Our analysis indicates that the San population of southern Africa diverged from other human populations approximately 108-157 thousand years ago, that Eurasians diverged from an ancestral African population 38-64 thousand years ago, and that the effective population size of the ancestors of all modern humans was ∼9,000.
Year: 2011 PMID: 21926973 PMCID: PMC3245873 DOI: 10.1038/ng.937
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Individual Genomes Analyzed in this Paper
| Genome | Population | Technology | Reads | Redundancy | Coverage | Depth | HQC |
|---|---|---|---|---|---|---|---|
| Venter | European | Sanger | 800bp PE | 7.5 | 0.912 | 8.4 | 0.577 |
| NA18507 | Yoruban | Illumina | 35bp PE | 40.6 | 0.900 | 41.1 | 0.672 |
| YH | Han Chinese | Illumina | 35bp PE | 36 | 0.896 | 25.4 | 0.671 |
| SJK | Korean | Illumina | 36, 75bp | 28.95 | 0.903 | 19.7 | 0.672 |
| ABT | Bantu | SOLiD | 49bp | >30 | 0.874 | 21.4 | 0.641 |
| KB1 | San | Illumina | 76bp | 23.1 | 0.901 | 23.6 | 0.621 |
Genome identifiers are surnames of sequenced individuals (Venter), identifiers for Coriell DNA samples (NA18507), or abbreviations introduced in published papers (YH, SJK, ABT, and KB1).
Sequencing technology: Sanger = Sanger (capillary) sequencing, Illumina = Illumina GenomeAnalyzer, SOLiD = SOLiD system by Applied Biosystems.
Reads: average read length in bp, and whether paired-end (PE) reads were used.
Redundancy: sequencing redundancy, or fold coverage, as reported in the published paper.
Coverage: fraction of genome covered by uniquely aligned reads, according to the pipeline used here.
Actual depth: average number of uniquely aligned reads at positions having at least one uniquely aligned read. Excludes duplicate reads.
High quality coverage: fraction of genome covered by aligned reads that pass data quality filters.
KB1 was sequenced using both the 454 and Illumina methods, but this analysis used the more abundant Illumina data.
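The coverage and depth definitions above can be made concrete with a small sketch. It assumes a hypothetical list `depths` that holds, for each genomic position, the number of uniquely aligned reads after duplicate removal. The function names and toy values are illustrative only and are not taken from the pipeline used in the paper.

```python
def coverage_fraction(depths):
    """Fraction of positions covered by at least one uniquely aligned read."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d > 0)
    return covered / len(depths)


def actual_depth(depths):
    """Mean read count over positions with at least one uniquely aligned read."""
    covered = [d for d in depths if d > 0]
    if not covered:
        return 0.0
    return sum(covered) / len(covered)


# Toy example (hypothetical data): 10 positions, 2 of them uncovered.
depths = [0, 3, 5, 4, 0, 6, 2, 4, 4, 4]
print(coverage_fraction(depths))  # 0.8
print(actual_depth(depths))       # 4.0
```

Note that the depth average excludes uncovered positions, which is why a genome can have a reported depth above its nominal fold redundancy multiplied by its coverage fraction.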
Fig. 1. Population phylogeny and genealogies
The population phylogeny assumed in this study, with one diploid genome per population (see Table 1) and a haploid chimpanzee outgroup. The Yoruban and Bantu individuals were included in the analysis as alternative African ingroups (denoted X), because their relationship to one another was uncertain (Supplementary Note). The free parameters in our model include the five population divergence times (τ) and ten effective population sizes (θ), all expressed in units of expected mutations per site. Various “migration bands” (gray arrow), allowing for gene flow between populations, were also considered, with the (constant) migration rates along these bands also treated as free parameters. The two parameters of primary interest were the San (τKHEXS) and African-Eurasian (τKHEX) divergence times. Absolute divergence times (in years) and effective population sizes (in numbers of individuals) were obtained by assuming a human-chimpanzee average genomic divergence time of 5.6–7.6 Mya, with a point estimate of 6.5 Mya.
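The calibration step can be written out explicitly. The relations below are a sketch under the standard coalescent scaling, which this caption does not spell out. The symbols μ (mutations per site per year), g (generation time in years) and τ_div (human-chimpanzee divergence in expected mutations per site) are labels introduced here for illustration and may differ from the paper's own notation.

```latex
\mu = \frac{\tau_{\mathrm{div}}}{T_{\mathrm{div}}}, \qquad
T = \frac{\tau}{\mu} = \tau \, \frac{T_{\mathrm{div}}}{\tau_{\mathrm{div}}}, \qquad
N = \frac{\theta}{4 \mu g} \quad \text{(assuming } \theta = 4 N \mu g \text{)}
```

Under these assumptions, every calibrated time and population size is proportional to the assumed T_div, so the 5.6–7.6 Mya calibration range translates directly into a proportional range for the absolute estimates.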
Fig. 2. Results of simulation study
Simulations assumed a population tree like the one shown in Fig. 1 and plausible divergence times, population sizes, and migration scenarios (Supplementary Note). (a) Accuracy of estimated African-Eurasian (τKHEX) and San (τKHEXS) divergence times without migration. Dotted lines indicate the values assumed for the simulations and each boxplot summarizes posterior mean estimates in six separate runs of G-PhoCS. Results are shown for correctly phased data (gold) and integration over unknown phasings (red). A random phasing procedure produced substantially poorer results (Supplementary Fig. 2). Most estimates fall within 10% of the true value, except for the smallest assumed divergence times, where weak information in the data leads to an upward bias. (b) Accuracy of the estimated San divergence time (τKHEXS) and the Yoruban/Bantu population size (θX) in simulations with four levels of constant-rate migration (denoted 0, 1, 2, and 3, in order of increasing strength) from population S to population X. Ratios of estimated to true values are shown when migration is not (blue) and is (red) allowed in the model. Each boxplot summarizes twelve runs. Notice that there is a pronounced bias when migration is present but is not modeled, but this bias is eliminated when migration is added to the model. Simulated and estimated migration rates (measured in expected number of migrants per generation) are shown at right. See Supplementary Figs. 2 & 3 for complete results.
Fig. 3. Parameter estimates from real data
Estimates of (a) population divergence times, (b) migration rates, and (c) effective population sizes obtained for various scenarios. In (a) and (c), both mutation-scaled (left) and calibrated (right) y-axes are shown (with a calibration of Tdiv = 6.5 Mya). Results are shown for scenarios with either the Yoruban or Bantu ingroup X, and with or without a migration band between X and the San. Panel (b) shows estimated migration rates for fourteen different migration bands. Only the Yoruban-San (Y-S) and Bantu-San (B-S) migration scenarios are strongly supported. In all panels, each bar represents the mean estimate and 95% credible interval of a single representative run of the program. See Supplementary Tables 2 & 3 and Supplementary Fig. 4 for complete results.
Estimated Divergence Times, with Migration
| Divergence Event | Ingroup (X) | Raw Estimate (× 10⁻⁴ mutations per site) | Calibrated (kya), Tdiv = 5.6 Mya | Calibrated (kya), Tdiv = 6.5 Mya | Calibrated (kya), Tdiv = 7.6 Mya |
|---|---|---|---|---|---|
| San (τKHEXS) | Yoruban | 0.91 (0.89–0.94) | 113 (110–116) | 131 (127–135) | 153 (149–157) |
| San (τKHEXS) | Bantu | 0.90 (0.88–0.93) | 111 (108–114) | 129 (126–133) | 151 (147–155) |
| AE (τKHEX) | Yoruban | 0.33 (0.31–0.34) | 40 (38–42) | 47 (44–49) | 55 (51–57) |
| AE (τKHEX) | Bantu | 0.37 (0.35–0.38) | 46 (43–47) | 53 (50–55) | 62 (59–64) |
Raw and calibrated estimates for the San (τKHEXS) and African-Eurasian (AE) (τKHEX) divergence times. Separate results are shown for the Yoruban and Bantu representatives of the African ingroup population X. In all cases, a migration band between the San and the African ingroup X was included in the model. Raw estimates (mean and 95% Bayesian credible intervals) are given in units of expected mutations per site × 10⁻⁴. Calibrated estimates are given in thousands of years (kya), for three different human-chimpanzee calibrations (Tdiv = {5.6, 6.5, 7.6} Mya).
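Because the calibrated times are obtained by scaling the raw estimates with the assumed human-chimpanzee divergence time, they should be proportional to Tdiv. The sketch below checks this against the table using only its published point estimates. The function name `rescale_kya` is hypothetical, and strict proportionality is an assumption of the sketch rather than a statement of the authors' exact procedure.

```python
def rescale_kya(estimate_kya, tdiv_from_mya, tdiv_to_mya):
    """Rescale a calibrated time from one Tdiv calibration to another.

    Assumes the calibrated time is proportional to Tdiv for a fixed raw estimate.
    """
    return estimate_kya * tdiv_to_mya / tdiv_from_mya


# Point estimates (kya) from the table at Tdiv = 5.6, 6.5 and 7.6 Mya.
table = {
    ("San", "Yoruban"): (113, 131, 153),
    ("San", "Bantu"): (111, 129, 151),
    ("AE", "Yoruban"): (40, 47, 55),
    ("AE", "Bantu"): (46, 53, 62),
}

for key, (t56, t65, t76) in table.items():
    pred65 = rescale_kya(t56, 5.6, 6.5)
    pred76 = rescale_kya(t56, 5.6, 7.6)
    # Print predicted vs. published values for the two other calibrations.
    print(key, round(pred65, 1), t65, round(pred76, 1), t76)
```

For all four rows the rescaled values agree with the published ones to within 1 kya, which is consistent with the table entries having been rounded to whole thousands of years.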