Jody Hey
Abstract
The founding of New World populations by Asian peoples is the focus of considerable archaeological and genetic research, and there persist important questions on when and how these events occurred. Genetic data offer great potential for the study of human population history, but there are significant challenges in discerning distinct demographic processes. A new method for the study of diverging populations was applied to questions on the founding and history of Amerind-speaking Native American populations. The model permits estimation of founding population sizes, changes in population size, time of population formation, and gene flow. Analyses of data from nine loci are consistent with the general portrait that has emerged from archaeological and other kinds of evidence. The estimated effective size of the founding population for the New World is fewer than 80 individuals, approximately 1% of the effective size of the estimated ancestral Asian population. By adding a splitting parameter to population divergence models it becomes possible to develop detailed portraits of human demographic history. Analyses of Asian and New World data support a model of a recent founding of the New World by a population of quite small effective size.
Year: 2005 PMID: 15898833 PMCID: PMC1131883 DOI: 10.1371/journal.pbio.0030193
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
Figure 1. Isolation with Migration Models
(A) The basic IM model. The demographic terms are effective population sizes (N1, N2, and NA), gene flow rates (m1 and m2), and population splitting time (t). Also shown are parameters scaled by the neutral mutation rate (u), as they are actually used in the model fitting. Terms for basic demographic parameters, including N, m, t, and u, are not italicized. Note that the migration parameters are identified by the source of migrants as time goes backward in the coalescent. In other words, the migration rate from population 1 to population 2 (i.e., m1) actually corresponds to the movement of genes from population 2 to population 1 as time moves forward.
(B) The IM model with changing population size. An additional parameter, s, is the fraction of NA that forms N1 (i.e., the fraction 1 − s gives rise to N2).
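The role of the splitting parameter s in panel (B) can be illustrated with a minimal sketch. The function name and the numeric values below are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def founding_sizes(n_ancestral: float, s: float) -> tuple[float, float]:
    """Split an ancestral effective size NA into two founding sizes.

    s is the fraction of NA that forms population 1;
    the remaining fraction (1 - s) gives rise to population 2.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in the interval [0, 1]")
    return s * n_ancestral, (1.0 - s) * n_ancestral


# Hypothetical inputs, for illustration only.
n1_founding, n2_founding = founding_sizes(n_ancestral=1000.0, s=0.2)
print(n1_founding, n2_founding)
```

In the model with changing population size, these founding sizes are the starting points from which each population then grows or shrinks toward its present-day size.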
Parameter Summary and Description
Figure 2. Approximate Geographic Locations and Sample Sizes per Location for Each Locus Listed in Table 1
In some cases the locations are actual sampling locations; in other cases they are the approximate center of the geographic region occupied by the ethnic groups identified in the original references (Table 1).
Information on Loci Used in the Study
See Dataset S1 and Protocol S1 for more detail.
a The inheritance scalar was set to reflect the expected effective population size experienced by a locus relative to an autosome, assuming equal sex ratios and variance in reproductive success: autosomal loci, 1.0; X-linked loci, 0.75; maternally or paternally inherited loci, 0.25.
b The percentage of basepairs that differ between a human and a chimpanzee sequence.
c FST is the proportion of variation that lies between samples pooled for Asia and the New World for each locus [70,71]. When divergence is low, calculation may yield a negative value (Undef).
d Data from [72]. The β-globin locus falls near a recombination hotspot [73]. Of the 3,011 bases of a large population genetic study of the β-globin region [72], the 5′ half shows ample evidence of historical recombination by the four-gamete criterion [38], whereas the 3′ half that was used for this study showed no evidence of historical recombination. Divergence from chimpanzees was measured over this region from the available chimpanzee sequence [74].
e Full-length mtDNA sequences were used [75,76]. Because of the need for an absence of homoplasy by the computer program fitting the model, control region sequences were removed and only transversion differences were used.
f Concatenated data from several noncoding regions of the nonrecombining portion of the Y chromosomes (NRY) [48]. Human-chimpanzee divergence for the NRY was estimated from 4,758 noncoding basepairs of the SMCY locus [77].
g Data from [78,79].
h Data from [80,81].
i Haplotypes were determined over multiple points across this locus [82]. A data summary was provided by Yvonne Thorstenson. The region used for this analysis included pieces scattered over 96 kilobasepairs that showed no evidence of recombination in Asian and New World samples. This locus was not included in the estimate of mutation rate per year because of length ambiguity of the sampled sequence and uncertainty over human-chimpanzee divergence.
j Data from [83].
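Several of the footnotes above describe simple computations: the inheritance scalar (a), FST (c), the four-gamete criterion (d), and the restriction to transversions (e). The sketch below illustrates each one under stated assumptions. All function names are hypothetical. The FST estimator shown, 1 − (within-population diversity)/(between-population diversity), is one common choice and is not necessarily the estimator of [70,71]. How the scalar enters the model fitting is also an assumption here: it is simply multiplied into a population-size parameter.

```python
from itertools import combinations
from typing import Optional

# Footnote a: relative effective size of a locus compared with an autosome.
INHERITANCE_SCALAR = {
    "autosomal": 1.0,
    "x_linked": 0.75,
    "uniparental": 0.25,  # maternally or paternally inherited loci
}


def locus_theta(theta_autosomal: float, inheritance: str) -> float:
    """Scale an autosomal population-size parameter for a given locus type (assumed usage)."""
    return theta_autosomal * INHERITANCE_SCALAR[inheritance]


def fst_from_diversity(pi_within: float, pi_between: float) -> Optional[float]:
    """Footnote c (assumed estimator): 1 - pi_within / pi_between."""
    if pi_between <= 0:
        return None  # no between-sample variation, so the ratio is undefined
    return 1.0 - pi_within / pi_between


def report_fst(value: Optional[float]) -> str:
    """Report negative or undefined values as 'Undef', as in the table."""
    return "Undef" if value is None or value < 0 else f"{value:.3f}"


def four_gamete_violations(haplotypes: list[str]) -> list[tuple[int, int]]:
    """Footnote d: pairs of biallelic sites at which all four gametes are observed."""
    n_sites = len(haplotypes[0])
    violations = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # evidence of recombination (or recurrent mutation)
            violations.append((i, j))
    return violations


PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_transversions(seq1: str, seq2: str) -> int:
    """Footnote e: count purine <-> pyrimidine differences between aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)
        for a, b in zip(seq1.upper(), seq2.upper())
    )
```

A region such as the 3′ half of the β-globin data would correspond to an empty list from `four_gamete_violations`, assuming the sites are coded as two-state characters.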
Figure 3. Marginal Posterior Probability Densities
Probability densities for each of the parameters described in Figure 1 are shown, as follows: (A) θ1; (B) θ2; (C) θA; (D) t (i.e., t/u); (E) t shown on a scale of years over the range corresponding to a maximum t value of 0.2; (F) s; (G) m1; and (H) m2. The analysis in which a high upper limit on the prior distribution for t was used is identified as “high t,” while those analyses with a smaller upper limit on the prior distribution of t are identified as “low t.” Each curve is based upon the results of multiple simulations over millions of Markov chain updates (see Materials and Methods), and is plotted over the specified prior range of that parameter.
Model Parameter Estimates
Parameter estimates are shown for three models described in the text. For those parameters in which the complete posterior distribution appeared to be estimated, the 90% highest posterior density interval was also determined and given as a range (in parentheses). This range is the shortest interval that contains 90% of the probability.
a The location of the highest value of t is at the right margin of the distribution. The location of the secondary peak is also given in parentheses.
NA, not applicable
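The 90% highest posterior density interval described above is the shortest interval that contains 90% of the probability. A minimal sketch of that idea follows, assuming the posterior is represented by a list of sampled values. This is an assumption: the study may instead have worked from the estimated density curve. The function name is hypothetical.

```python
import math


def hpd_interval(samples: list[float], mass: float = 0.90) -> tuple[float, float]:
    """Return the shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    best_lo, best_hi = xs[0], xs[k - 1]
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi
```

Unlike an equal-tailed interval, this window can sit against one edge of the range. That matters for parameters such as t, whose highest value lies at the right margin of the distribution.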
Estimates of Demographic Quantities
The conversion of model parameters to demographic terms is described in “Analyses” in Materials and Methods.
a The estimated time associated with the highest value of t, which is at the right margin of the distribution. The estimated time associated with the secondary peak is given in parentheses.
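The conversion from mutation-scaled model parameters to demographic quantities can be sketched as follows. The sketch assumes the conventions commonly used with isolation-with-migration models, θ = 4Nu and a splitting time scaled as t·u. The exact procedure used in the study is given in its Materials and Methods and is not reproduced here. All function names and numeric values are hypothetical.

```python
def effective_size(theta: float, u_per_generation: float) -> float:
    """Effective population size from theta = 4 * N * u (assumed convention)."""
    return theta / (4.0 * u_per_generation)


def split_time_years(t_scaled: float, u_per_year: float) -> float:
    """Splitting time in years from a mutation-scaled time t * u (assumed convention)."""
    return t_scaled / u_per_year


# Hypothetical inputs, for illustration only (not values from the study).
u_per_year = 1e-5            # mutations per locus per year
generation_years = 20.0      # assumed generation time in years
u_per_generation = u_per_year * generation_years

print(effective_size(theta=8.0, u_per_generation=u_per_generation))
print(split_time_years(t_scaled=0.1, u_per_year=u_per_year))
```

Because both conversions divide by the mutation rate, any uncertainty in that rate carries through proportionally to the estimated sizes and times.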
Contrasting Observed and Expected Levels of Variation
Shown, both within and between populations, are the values of the average number of differences between pairs of sequences.
Exp, expected; Obs, observed
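The observed values in this comparison are averages of the number of differences between pairs of sequences, taken within a population or between two populations. A minimal sketch of that calculation for aligned sequences is shown below. The function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def n_differences(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_within(seqs: list[str]) -> float:
    """Average pairwise differences among sequences from one population."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(seqs1: list[str], seqs2: list[str]) -> float:
    """Average pairwise differences between sequences from two populations."""
    pairs = list(product(seqs1, seqs2))
    if not pairs:
        raise ValueError("both populations need at least one sequence")
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


# Toy data, for illustration only.
pop1 = ["AAAA", "AAAT"]
pop2 = ["TTAA", "TTAT"]
print(mean_within(pop1), mean_between(pop1, pop2))
```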
Figure 4. The Marginal Densities Obtained by Fitting the Model with Population Size Change to Simulated Data
The input parameters for the simulations were as follows: (A) θ1 = 10; (B) θ2 = 10; (C) θA = 10; (D) t = 2.5; (E) s = 0.2; (F) m1 = 0.04; (G) m2 = 0.2; and t = 5 (t/2NA = 0.5). For each simulated dataset, coalescent simulations were done for each of 20 loci with identical mutation rates under an infinite sites mutation model, each with sample sizes of 10 for each of the two populations. Each simulated dataset was analyzed using wide uniform prior distributions for each parameter. Each analysis began with a burn-in period of 300,000 steps followed by a primary chain of 3 million to 10 million steps. The curves for parameters θ1 through m2 are shown in (A) through (G), respectively. For each figure, the true parameter value used in the simulations is shown as a black vertical bar, and the mean of the estimates for the 20 simulations (based on peak locations) is shown as a gray vertical bar.