Paul Verdu (1), Noah A. Rosenberg. (1) Department of Biology, Stanford University, Stanford, California 94305, USA. verdu@stanford.edu
Abstract
Admixed populations have been used for inferring migrations, detecting natural selection, and finding disease genes. These applications often use a simple statistical model of admixture rather than a modeling perspective that incorporates a more realistic history of the admixture process. Here, we develop a general model of admixture that mechanistically accounts for complex historical admixture processes. We consider two source populations contributing to the ancestry of a hybrid population, potentially with variable contributions across generations. For a random individual in the hybrid population at a given point in time, we study the fraction of genetic admixture originating from a specific one of the source populations by computing its moments as functions of time and of introgression parameters. We show that very different admixture processes can produce identical mean admixture proportions, but that such processes produce different values for the variance of the admixture proportion. When introgression parameters from each source population are constant over time, the long-term limit of the expectation of the admixture proportion depends only on the ratio of the introgression parameters. The variance of admixture decreases quickly over time after the source populations stop contributing to the hybrid population, but remains substantial when the contributions are ongoing. Our approach will facilitate the understanding of admixture mechanisms, illustrating how the moments of the distribution of admixture proportions can be informative about the historical admixture processes contributing to the genetic diversity of hybrid populations.
Exchanges of genes between two or more mutually isolated populations can result in new admixed or hybrid populations. For nearly 80 years, statistical models have been used to estimate the proportions of the genetic ancestry of an admixed population that are derived from the various parental source populations (Bernstein 1931; Roberts and Hiorns 1965; Long and Smouse 1983; Long 1991; Chakraborty ; Bertorelle and Excoffier 1998; Pritchard ; Chikhi ; Wang 2003; Tang ) and, more recently, to determine the probable ancestral origins of chromosomal segments within individual genomes (Ungerer ; Falush ; Hoggart ; Patterson ; Baird 2006; Tang ; Sankararaman ; Bercovici and Geiger 2009; Price ). Admixed human populations have been employed in assessing patterns of migration and genetic structure (Parra ; Seldin ; Wang ; Silva-Zolezzi ), detecting natural selection (Workman ; Cavalli-Sforza and Bodmer 1971; Chakraborty and Weiss 1988; Tang ; Oleksyk ; Lohmueller ), and identifying phenotypically important genes through admixture-mapping strategies (McKeigue 1998, 2005; Halder and Shriver 2003; Reich and Patterson 2005; Smith and O’Brien 2005; Buerkle and Lexer 2008; Seldin ).

Many recent methods consider admixed populations as statistical combinations of the source populations, treating allele frequencies in a hybrid population as linear combinations of allele frequencies in the source groups. While this perspective is informative in diverse applications for describing the current structure of admixed populations, it does not mechanistically account for the inherent complexity of admixture processes. In the case of humans, throughout history, previously isolated populations have come into contact through colonization waves, forced displacements, and population migrations. Moreover, admixture processes have often been influenced by sociocultural rules on intermarriage in contexts of ethnic conflict or discrimination, slavery, and clan or caste systems.
Such complex histories of social behaviors have produced a variety of patterns of genetic variation in different admixed groups (Parra , 2003; Bonilla ; Bedoya ; Chaix ; Wang ; Halder ; Tishkoff ; Verdu ; Bryc ).

Mechanistic perspectives that seek to describe the history of admixture processes through time rather than estimating admixture proportions from the source populations in descriptive statistical models have been part of some recent studies of admixture (Briscoe ; Stephens ; Pfaff ), and they have been used to make theoretical predictions of admixture proportions as well as of Wright’s fixation index $F_{ST}$ and statistics measuring linkage disequilibrium (Chakraborty and Weiss 1988; Long 1991; Guo ). Most of these approaches have relied on models with a relatively simple dynamic considering a single admixture event between populations, rather than on models that investigate a more complex history of admixture processes.

Ewens and Spielman (1995) proposed a mechanistic admixture model that incorporated multiple admixture events involving multiple source populations. This model has been used primarily to evaluate the influence of population subdivision and admixture on the performance of the transmission-disequilibrium test (Ewens and Spielman 1995) and to examine linkage disequilibrium statistics (Guo ). However, complex mechanistic models have not been used to directly determine the influence of admixture histories on the admixture patterns of hybrid populations.

In this article, expanding on the models of Ewens and Spielman (1995) and Guo , we develop a general mechanistic model of a historical admixture process. We first introduce the model, the most general form of which considers m source populations that contribute to the ancestry of a hybrid population. We treat the fraction of genetic admixture in the hybrid population originating from a specific source population as a random variable, whose distribution we study over time in the m = 2 case.
We next examine the expectation, variance, and higher moments of the admixture fraction as functions of time and of the introgression parameters, and we consider in detail a special case in which admixture is constant across generations. Finally, we conclude with a discussion of the implications of the work for empirical studies of admixture.
The Model
We describe a version of our mechanistic admixture model in which the number of source populations is two. The generalization to m source populations is straightforward, and we provide it in supporting information, File S1, Figure S1, and Table S1.

Define population H (“hybrid”) as a population consisting of immigrant individuals from two isolated source populations, S1 and S2, and hybrid individuals who have ancestors from both S1 and S2. The hybrid population can be viewed as having a separate location or status from S1 and S2, so that individuals within H can interbreed with each other and with new immigrants that come from the source populations.

We let $s_{1,g}$, $s_{2,g}$, and $h_g$ be the fractional contributions of populations S1, S2, and H to the hybrid population H at generation g + 1. That is, for a randomly chosen individual in H at generation g + 1, the probabilities that a randomly chosen parent of the individual derives from populations S1, S2, and H are $s_{1,g}$, $s_{2,g}$, and $h_g$, respectively. These probabilities can differ across generations, but for all g ≥ 0, the parameters $s_{1,g}$, $s_{2,g}$, and $h_g$ take values in [0, 1] and satisfy $s_{1,g} + s_{2,g} + h_g = 1$. At generation 0, the hybrid population is not yet formed. Therefore, $h_0 = 0$ and $s_{1,0} + s_{2,0} = 1$. Hence, considering the period through generation g, in addition to g itself, this model has 2g − 1 independent parameters: one introgression proportion in the first generation and two introgression proportions in each of the next g − 1 generations. A diagram of the model appears in Figure 1.
Figure 1
Diagram of a mechanistic model of admixture involving two isolated source populations.
Admixture fractions for a random individual in the hybrid population
We focus on a key quantity in admixed populations, namely the fraction of admixture from one of the source populations for a random individual in H at a randomly chosen locus. This fraction represents the proportion of the genome of a randomly chosen individual in H that ultimately traces to a specific source.

We indicate the possible sources for the (unordered) parents of an individual in H by S1S1, S1S2, S1H, S2H, HH, and S2S2. An individual in generation g ≥ 1 has one of several possible types of parents, each with some probability dependent on the parameters $s_{1,g-1}$, $s_{2,g-1}$, and $h_{g-1}$ (Table 1). If the parents have different ancestries, we do not distinguish the order of the two parents, so that, for example, “S1H” does not convey which specific parent is from population S1 and which is from H.
Table 1
Possible pairs of parents for a random individual in the hybrid population H at generation g, and their probabilities
| Populations of origin of the parents of a random individual in population H at generation g ≥ 1 | Probability |
| --- | --- |
| S1 and S1 | $s_{1,g-1}^2$ |
| S1 and H (or H and S1) | $2 s_{1,g-1} h_{g-1}$ |
| S1 and S2 (or S2 and S1) | $2 s_{1,g-1} s_{2,g-1}$ |
| H and H | $h_{g-1}^2$ |
| S2 and H (or H and S2) | $2 s_{2,g-1} h_{g-1}$ |
| S2 and S2 | $s_{2,g-1}^2$ |
Note that at generation 0, h0 = 0 because the hybrid population is not yet formed.
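As a quick consistency check on Table 1, the six parent-pair probabilities can be tabulated and summed. The sketch below is illustrative only: the function name `parent_pair_probabilities` and the example values are assumptions, and exact rational arithmetic is used so that the constraint that the three contributions sum to 1 can be tested exactly.

```python
from fractions import Fraction


def parent_pair_probabilities(s1, s2, h):
    """Probabilities of the unordered parent pairs of Table 1.

    s1, s2, h stand for s_{1,g-1}, s_{2,g-1}, h_{g-1} (hypothetical argument names).
    """
    if min(s1, s2, h) < 0 or s1 + s2 + h != 1:
        raise ValueError("need s1, s2, h >= 0 with s1 + s2 + h = 1")
    return {
        "S1S1": s1 * s1,
        "S1H": 2 * s1 * h,   # factor 2: either parent may be the S1 parent
        "S1S2": 2 * s1 * s2,
        "HH": h * h,
        "S2H": 2 * s2 * h,
        "S2S2": s2 * s2,
    }


# Illustrative values (an assumption, not taken from a specific figure).
probs = parent_pair_probabilities(Fraction(1, 5), Fraction(1, 5), Fraction(3, 5))
total = sum(probs.values())  # equals (s1 + s2 + h)^2 = 1
```

The total equals 1 because the six terms are the expansion of the square of the sum of the three contributions.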
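Let Y be a random variable indicating the source populations of the parents of a random individual in H. Let $H_{1,g}$ be the admixture fraction from source population S1 for a random individual in population H at a random locus at generation g. Because the hybrid population is not yet formed at generation 0, $h_0 = 0$, and $H_{1,0}$ is not defined. Using Table 1, we can write a recursion relation to calculate $H_{1,g}$ for all g ≥ 1. For the first generation (g = 1), we have

$$H_{1,1} = \begin{cases} 1 & \text{if } Y = S_1S_1, \\ \tfrac{1}{2} & \text{if } Y = S_1S_2, \\ 0 & \text{if } Y = S_2S_2. \end{cases} \tag{1}$$

For all subsequent generations (g ≥ 2), we have

$$H_{1,g} = \begin{cases} 1 & \text{if } Y = S_1S_1, \\ \tfrac{1}{2}\left(1 + H_{1,g-1}\right) & \text{if } Y = S_1H, \\ \tfrac{1}{2} & \text{if } Y = S_1S_2, \\ \tfrac{1}{2}\left(H_{1,g-1}^{(1)} + H_{1,g-1}^{(2)}\right) & \text{if } Y = HH, \\ \tfrac{1}{2} H_{1,g-1} & \text{if } Y = S_2H, \\ 0 & \text{if } Y = S_2S_2. \end{cases} \tag{2}$$

Here, $H_{1,g-1}^{(1)}$ and $H_{1,g-1}^{(2)}$ are fractions of ancestry from source population S1 for the two parents of a hybrid individual at generation g with Y = HH. We use the superscripts (1) and (2) only to indicate that $H_{1,g-1}^{(1)}$ and $H_{1,g-1}^{(2)}$ are separate independent and identically distributed (IID) random variables, so that if an individual in population H at generation g has two parents from H, the admixture fraction is distributed as the mean of the admixture fractions for two IID random individuals from H in the previous generation.

Equations 1 and 2 allow us to analyze the behavior of the admixture fraction from a source population for a random individual in the hybrid population, as a function of the time g and the parameters $s_{1,i}$, $s_{2,i}$, and $h_i$ for i = 1, 2, … , g − 1. Under our model, the set of possible values of $H_{1,g}$ is

$$Q_g = \left\{0, \frac{1}{2^g}, \frac{2}{2^g}, \ldots, \frac{2^g - 1}{2^g}, 1\right\}.$$

Using Equations 1 and 2, we can show that for a value q in the set $Q_g$, the probability $P(H_{1,g} = q)$ that a random individual in the hybrid population at generation g has admixture fraction q can be computed using the following recursion relation. For the first generation (g = 1), we have

$$P(H_{1,1} = 0) = s_{2,0}^2, \quad P\left(H_{1,1} = \tfrac{1}{2}\right) = 2 s_{1,0} s_{2,0}, \quad \text{and} \quad P(H_{1,1} = 1) = s_{1,0}^2. \tag{3}$$

For all subsequent generations (g ≥ 2), for q in $Q_g$,

$$\begin{aligned} P(H_{1,g} = q) ={}& s_{1,g-1}^2 I_1(q) + 2 s_{1,g-1} h_{g-1} P(H_{1,g-1} = 2q - 1) + 2 s_{1,g-1} s_{2,g-1} I_{1/2}(q) \\ &+ h_{g-1}^2 \sum_{r \in Q_{g-1}} P(H_{1,g-1} = r)\, P(H_{1,g-1} = 2q - r) + 2 s_{2,g-1} h_{g-1} P(H_{1,g-1} = 2q) + s_{2,g-1}^2 I_0(q), \end{aligned} \tag{4}$$

where the function I is defined for all values of q in $Q_g$ and equals

$$I_x(q) = \begin{cases} 1 & \text{if } q = x, \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

$P(H_{1,g} = q)$ is zero when q is not in $Q_g$; in particular, terms of Equation 4 whose argument falls outside $Q_{g-1}$ vanish.

We can use Equations 3–5 to examine the evolution of the distribution of $H_{1,g}$ across generations.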
For five scenarios in which the admixture process is constant after the founding of population H ($s_{1,g} = s_1$ and $s_{2,g} = s_2$ for all g ≥ 1), Figure 2 plots the complete set of values of $P(H_{1,g})$ for the first six generations.
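The distribution of the admixture fraction can also be propagated numerically, one generation at a time, by applying the parent-pair rules of Table 1 directly. The following is a minimal sketch under the constant-admixture assumption used for Figure 2; the function names `next_generation` and `admixture_distribution` are hypothetical, and exact fractions are used so that the support stays on multiples of $1/2^g$.

```python
from collections import defaultdict
from fractions import Fraction


def next_generation(dist, s1, s2):
    """One step of the recursion: map {q: P(H_{1,g-1} = q)} to {q: P(H_{1,g} = q)}."""
    h = 1 - s1 - s2
    new = defaultdict(Fraction)
    new[Fraction(1)] += s1 * s1          # parents S1S1
    new[Fraction(1, 2)] += 2 * s1 * s2   # parents S1S2
    new[Fraction(0)] += s2 * s2          # parents S2S2
    for q, p in dist.items():
        new[(1 + q) / 2] += 2 * s1 * h * p   # parents S1H
        new[q / 2] += 2 * s2 * h * p         # parents S2H
        for r, pr in dist.items():
            new[(q + r) / 2] += h * h * p * pr  # parents HH (two IID draws)
    return {q: p for q, p in new.items() if p != 0}


def admixture_distribution(founding, later, generations):
    """Distribution of H_{1,g} at g = generations (g >= 1).

    founding = (s_{1,0}, s_{2,0}); later = (s1, s2), held constant for g >= 1.
    """
    s10, s20 = founding
    s1, s2 = later
    dist = {Fraction(1): s10 * s10, Fraction(1, 2): 2 * s10 * s20, Fraction(0): s20 * s20}
    dist = {q: p for q, p in dist.items() if p != 0}
    for _ in range(2, generations + 1):
        dist = next_generation(dist, s1, s2)
    return dist


# Scenario of Figure 2A: symmetric founding, no later contributions.
half = Fraction(1, 2)
dist_a = admixture_distribution((half, half), (Fraction(0), Fraction(0)), 6)
mean_a = sum(q * p for q, p in dist_a.items())
mode_a = max(dist_a, key=dist_a.get)
```

For this symmetric scenario the computed distribution is symmetric around 1/2 with its mode at 1/2, in line with the description of Figure 2A.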
Figure 2
Probability distribution of the admixture fraction from source population S1 [$P(H_{1,g})$, Equations 3–5] for a random individual in the hybrid population at each of several points in time. Rows correspond to values of g. Columns correspond to five constant admixture processes ($s_{1,g} = s_1$ and $s_{2,g} = s_2$ for all g ≥ 1). (A) Population H is founded at generation 0 with equal proportions from source populations S1 and S2 ($s_{1,0} = s_{2,0} = 0.5$). Subsequently, neither source population contributes to H ($s_1 = s_2 = 0$). (B) Population H is founded with equal proportions from S1 and S2 ($s_{1,0} = s_{2,0} = 0.5$). Subsequently, the source populations contribute equally to H ($s_1 = s_2 = 0.2$). (C) Population H is founded with equal proportions from S1 and S2 ($s_{1,0} = s_{2,0} = 0.5$). Subsequently, the source populations contribute unequally to H, with S2 contributing more than S1 ($s_1 = 0.0001$, $s_2 = 0.2$). (D) Population H is founded at generation 0 with a greater contribution from S1 than S2 ($s_{1,0} = 0.8$, $s_{2,0} = 0.2$). Subsequently, the source populations contribute unequally to population H, with S2 contributing more than S1 ($s_1 = 0.0001$, $s_2 = 0.2$). (E) Population H is founded at generation 0 with a greater contribution from S1 than S2 ($s_{1,0} = 0.8$, $s_{2,0} = 0.2$). Subsequently, the source populations contribute unequally to population H, with S1 contributing more than S2 ($s_1 = 0.2$, $s_2 = 0.0001$).
In Figure 2A, we consider a scenario in which the hybrid population H is founded with equal contributions from source populations S1 and S2 ($s_{1,0} = s_{2,0} = 1/2$), which do not subsequently contribute to H ($s_{1,g} = s_{2,g} = 0$ for all g ≥ 1). The probability $P(H_{1,g})$ that a random individual in H exhibits a given fraction of admixture from S1 is distributed symmetrically around 1/2 at each generation, with a single mode at $H_{1,g} = 1/2$ for each of the first six generations.
This pattern arises from the fact that after a symmetric founding event, in the absence of immigration, no new input enters the admixed population from either source, and the distribution remains symmetric.

Figure 2B considers an admixture process with the same starting conditions as in the previous case ($s_{1,0} = s_{2,0} = 1/2$), in which the subsequent contributions from the source populations S1 and S2 to the hybrid population H are symmetric and constant across generations as before, but nonzero ($s_{1,g} = s_{2,g} \neq 0$ for all g ≥ 1). In this case, because the two source populations make equal contributions at each generation, the distribution of $H_{1,g}$ continues to be symmetric around 1/2. Instead of being unimodal as in the previous case, however, it is now multimodal. This multimodality arises from the fact that in a scenario with continuing gene flow from the sources, new modes arise as the new immigrants mate with individuals whose admixture fractions lie near preexisting modes.

In Figure 2C, we consider an admixture process with a symmetric founding of population H as before ($s_{1,0} = s_{2,0} = 1/2$), in which the subsequent contributions from the source populations S1 and S2 are nonzero and constant across generations, but with S2 contributing more than S1 at each generation ($0 \neq s_{1,g} < s_{2,g}$ for all g ≥ 1). In this case, the distribution of $H_{1,g}$ is no longer symmetric around 1/2. Instead, it is shifted toward smaller values after the founding of the hybrid population H. This pattern arises from the fact that in a scenario with continuing gene flow in which many more individuals immigrate into H from S2 than from S1 at each generation, matings between new immigrants and admixed individuals are more likely to occur with immigrants from S2 than with immigrants from S1.
Thus, after the symmetric founding of population H, the probability of randomly drawing an individual in H with a high fraction of admixture from population S1 is lower than the probability of drawing an individual with a low fraction of admixture from S1.

Figure 2D considers an admixture process in which population S1 contributes more than population S2 to the founding of population H ($s_{1,0} > s_{2,0} \neq 0$), but with the same subsequent constant admixture process as in Figure 2C ($0 \neq s_{1,g} < s_{2,g}$ for all g ≥ 1). In this case, the distribution of $H_{1,g}$ is no longer symmetric around 1/2 at generation 1, but is shifted toward higher values of the admixture fraction from source population S1. Nevertheless, as in Figure 2C, the distribution of $H_{1,g}$ shifts toward zero in the subsequent generations. As in Figure 2C, in each generation, admixed individuals in population H are more likely to mate with new immigrants from S2 than with new immigrants from S1.

Finally, in Figure 2E, we consider a process in which the source population S1 contributes more than population S2 to the hybrid population not only in the founding of population H ($s_{1,0} > s_{2,0} \neq 0$) but also in each subsequent generation ($s_{1,g} > s_{2,g} \neq 0$ for all g ≥ 1). In this case, the distribution of $H_{1,g}$ is shifted toward high values of the admixture fraction from population S1. Unlike in Figure 2, C and D, in Figure 2E, an individual in population H is more likely to mate with a new immigrant from S1 than with a new immigrant from S2 at each generation following the founding of population H. Thus, unlike in Figure 2, C and D, generation after generation, the probability of randomly drawing an individual in population H with a high fraction of admixture from S1 is higher than that of drawing an individual with a low fraction of admixture from S1.

This collection of scenarios illustrates three main points.
First, if contributions to the admixed population occur only in the first generation, then the long-term level of admixture continues to reflect the initial conditions. Second, the same starting conditions can lead to quite different long-term patterns, depending on the subsequent contributions to the hybrid population. Third, with constant contributions at each generation, the starting conditions influence the speed with which the distribution of admixture tends toward its long-term distribution, but do not predict the qualitative form of this distribution.
Moments of the admixture fraction for a random individual in the hybrid population
Analysis of the moments of the distribution of admixture as a function of time g can provide a way of understanding features of the distribution and its determinants in the historical admixture process itself. We can utilize the recursion in Equations 1 and 2 to obtain recursions for the expectation, variance, and higher moments of $H_{1,g}$ as functions of g and $s_{1,i}$, $s_{2,i}$, and $h_i$, for i = 1, 2, … , g − 1. We first obtain a recursion for the expectation $E[H_{1,g}]$. Next, we generalize the method used for finding the expectation, and we obtain a recursion for the kth moment, $E[H_{1,g}^k]$. Using the case of k = 2, we obtain a recursion for the variance $V[H_{1,g}]$.
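Expectation of $H_{1,g}$:
Using the law of total expectation, we can obtain an expression for the expectation $E[H_{1,g}]$ as a function of conditional expectations for different possible pairs of parents Y for a random individual in population H at generation g:

$$E[H_{1,g}] = \sum_{y} P(Y = y)\, E[H_{1,g} \mid Y = y], \quad y \in \{S_1S_1, S_1H, S_1S_2, HH, S_2H, S_2S_2\}. \tag{6}$$

For the first generation, because parents cannot derive from population H, we have

$$E[H_{1,1}] = P(Y = S_1S_1)\, E[H_{1,1} \mid Y = S_1S_1] + P(Y = S_1S_2)\, E[H_{1,1} \mid Y = S_1S_2] + P(Y = S_2S_2)\, E[H_{1,1} \mid Y = S_2S_2]. \tag{7}$$

Using Equations 1 and 2,

$$E[H_{1,1}] = s_{1,0}^2 + s_{1,0} s_{2,0}, \tag{8}$$

and for all subsequent generations (g ≥ 2),

$$E[H_{1,g}] = s_{1,g-1}^2 + 2 s_{1,g-1} h_{g-1} E\left[\frac{1 + H_{1,g-1}}{2}\right] + s_{1,g-1} s_{2,g-1} + h_{g-1}^2 E\left[\frac{H_{1,g-1}^{(1)} + H_{1,g-1}^{(2)}}{2}\right] + 2 s_{2,g-1} h_{g-1} E\left[\frac{H_{1,g-1}}{2}\right]. \tag{9}$$

Recalling that for all g ≥ 0, $s_{1,g} + s_{2,g} + h_g = 1$, that $h_0 = 0$, and that for all g ≥ 2, $H_{1,g-1}^{(1)}$ and $H_{1,g-1}^{(2)}$ are IID random variables, we can simplify the recursion expression. For g = 1,

$$E[H_{1,1}] = s_{1,0}, \tag{10}$$

and for all subsequent generations (g ≥ 2), we have

$$E[H_{1,g}] = s_{1,g-1} + h_{g-1} E[H_{1,g-1}]. \tag{11}$$

This result demonstrates that for a random individual in the hybrid population H, the expectation of the admixture fraction from population S1 in one generation is a linear function of the corresponding expectation in the previous generation.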
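The linear recursion for the expectation is simple to iterate for arbitrary, time-varying contributions. The sketch below is an illustration rather than part of the original analysis: `expected_admixture` is a hypothetical helper, and the two histories are example inputs chosen so that a single founding pulse and an ongoing admixture process yield the same mean in every generation.

```python
from fractions import Fraction


def expected_admixture(s1_seq, h_seq):
    """Return [E[H_{1,1}], ..., E[H_{1,g}]] from s_{1,0..g-1} and h_{0..g-1}.

    With h_0 = 0, the update E_g = s_{1,g-1} + h_{g-1} * E_{g-1} also covers
    g = 1, where it reduces to E_1 = s_{1,0}.
    """
    if h_seq[0] != 0:
        raise ValueError("h_0 must be 0: population H does not exist at generation 0")
    expectations = []
    for s1, h in zip(s1_seq, h_seq):
        previous = expectations[-1] if expectations else 0
        expectations.append(s1 + h * previous)
    return expectations


third = Fraction(1, 3)
# History 1 (assumed example): a single pulse with s_{1,0} = 1/3, then no input.
pulse = expected_admixture([third, 0, 0, 0, 0], [0, 1, 1, 1, 1])
# History 2 (assumed example): same founding, then s1 = 1/25 and s2 = 2/25 each generation.
s1, h = Fraction(1, 25), Fraction(22, 25)
ongoing = expected_admixture([third, s1, s1, s1, s1], [0, h, h, h, h])
```

Both histories give a mean of 1/3 at every generation, which illustrates why the mean alone cannot distinguish between them.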
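Moments of $H_{1,g}$:
Using a similar computation to that employed in obtaining the recursion for the expected admixture, we can write recursions for higher moments of the admixture fraction $H_{1,g}$. For the first generation (g = 1), we have for k ≥ 1,

$$H_{1,1}^k = \begin{cases} 1 & \text{if } Y = S_1S_1, \\ \left(\tfrac{1}{2}\right)^k & \text{if } Y = S_1S_2, \\ 0 & \text{if } Y = S_2S_2. \end{cases} \tag{12}$$

For all g ≥ 2, we have

$$H_{1,g}^k = \begin{cases} 1 & \text{if } Y = S_1S_1, \\ \left(\tfrac{1 + H_{1,g-1}}{2}\right)^k & \text{if } Y = S_1H, \\ \left(\tfrac{1}{2}\right)^k & \text{if } Y = S_1S_2, \\ \left(\tfrac{H_{1,g-1}^{(1)} + H_{1,g-1}^{(2)}}{2}\right)^k & \text{if } Y = HH, \\ \left(\tfrac{H_{1,g-1}}{2}\right)^k & \text{if } Y = S_2H, \\ 0 & \text{if } Y = S_2S_2, \end{cases} \tag{13}$$

where $H_{1,g-1}^{(1)}$ and $H_{1,g-1}^{(2)}$ represent IID random variables for the fractions of ancestry from source population S1 for two hybrid individuals in generation g − 1.

Using the law of total expectation, for k ≥ 1, we have for the first generation (g = 1)

$$E[H_{1,1}^k] = s_{1,0}^2 + 2 s_{1,0} s_{2,0} \left(\tfrac{1}{2}\right)^k. \tag{14}$$

For g ≥ 2, we have

$$E[H_{1,g}^k] = s_{1,g-1}^2 + 2 s_{1,g-1} h_{g-1} E\left[\left(\tfrac{1 + H_{1,g-1}}{2}\right)^k\right] + 2 s_{1,g-1} s_{2,g-1} \left(\tfrac{1}{2}\right)^k + h_{g-1}^2 E\left[\left(\tfrac{H_{1,g-1}^{(1)} + H_{1,g-1}^{(2)}}{2}\right)^k\right] + 2 s_{2,g-1} h_{g-1} E\left[\left(\tfrac{H_{1,g-1}}{2}\right)^k\right]. \tag{15}$$

Recalling that for all g ≥ 0, $s_{1,g} + s_{2,g} + h_g = 1$, that $h_0 = 0$, and that for all g ≥ 2, $H_{1,g-1}^{(1)}$ and $H_{1,g-1}^{(2)}$ are IID random variables, we can use the binomial theorem to obtain a simplified recursion for the moments of $H_{1,g}$. For the first generation, we have

$$E[H_{1,1}^k] = s_{1,0}^2 + \frac{s_{1,0} s_{2,0}}{2^{k-1}}. \tag{16}$$

For g ≥ 2,

$$E[H_{1,g}^k] = s_{1,g-1}^2 + \frac{s_{1,g-1} s_{2,g-1}}{2^{k-1}} + \frac{s_{1,g-1} h_{g-1}}{2^{k-1}} \sum_{i=0}^{k} \binom{k}{i} E[H_{1,g-1}^i] + \frac{h_{g-1}^2}{2^k} \sum_{i=0}^{k} \binom{k}{i} E[H_{1,g-1}^i]\, E[H_{1,g-1}^{k-i}] + \frac{s_{2,g-1} h_{g-1}}{2^{k-1}} E[H_{1,g-1}^k]. \tag{17}$$

Note that by simplifying Equation 16 with k = 1, we obtain for the first generation

$$E[H_{1,1}] = s_{1,0}^2 + s_{1,0} s_{2,0} = s_{1,0}, \tag{18}$$

which matches Equation 10. Simplifying Equation 17 with k = 1 using the fact that $s_{1,g} + s_{2,g} + h_g = 1$ for all g ≥ 0, for all subsequent generations (g ≥ 2), we obtain

$$E[H_{1,g}] = s_{1,g-1} + h_{g-1} E[H_{1,g-1}], \tag{19}$$

which matches Equation 11.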
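The moment recursion can be evaluated by carrying the full list of moments up to order k from one generation to the next, because the kth moment at generation g depends on all lower-order moments at generation g − 1. The sketch below assumes constant contributions after the founding; the function name `moments` and the example inputs are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def moments(founding, later, generations, k_max):
    """Return [E[H^0], E[H^1], ..., E[H^k_max]] for H = H_{1,g}, g = generations.

    founding = (s_{1,0}, s_{2,0}); later = (s1, s2), held constant for g >= 1.
    """
    s10, s20 = founding
    s1, s2 = later
    h = 1 - s1 - s2
    # Generation 1: H is 1, 1/2, or 0 with the founding probabilities.
    m = [Fraction(1)] + [s10 * s10 + 2 * s10 * s20 * Fraction(1, 2**k)
                         for k in range(1, k_max + 1)]
    for _ in range(2, generations + 1):
        new = [Fraction(1)]
        for k in range(1, k_max + 1):
            scale = Fraction(1, 2**k)
            # E[(1 + H)^k] via the binomial theorem (parents S1H).
            one_plus = sum(comb(k, i) * m[i] for i in range(k + 1))
            # E[(H' + H'')^k] for IID H', H'' (parents HH).
            pair = sum(comb(k, i) * m[i] * m[k - i] for i in range(k + 1))
            new.append(s1 * s1 + scale * (2 * s1 * h * one_plus + 2 * s1 * s2
                                          + h * h * pair + 2 * s2 * h * m[k]))
        m = new
    return m


half = Fraction(1, 2)
# Symmetric founding, no later contributions, generation 2, moments up to order 3.
m_pulse = moments((half, half), (0, 0), 2, 3)
# Ongoing contributions (assumed example values); the first moment follows E = s1 + h * E.
m_ongoing = moments((Fraction(1, 3), Fraction(2, 3)), (Fraction(1, 25), Fraction(2, 25)), 5, 2)
```

In the first example the second moment is 5/16, which corresponds to a variance of 5/16 − 1/4 = 1/16.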
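Variance of $H_{1,g}$:
When k = 2, Equations 16 and 17 provide a recursion relation for the second moment of $H_{1,g}$. For the first generation, because $s_{1,0} + s_{2,0} = 1$, we have

$$E[H_{1,1}^2] = \frac{s_{1,0}(1 + s_{1,0})}{2}. \tag{20}$$

For subsequent generations (g ≥ 2), because $s_{1,g} + s_{2,g} + h_g = 1$ for all g ≥ 0, we obtain

$$E[H_{1,g}^2] = \frac{s_{1,g-1}(1 + s_{1,g-1})}{2} + s_{1,g-1} h_{g-1} E[H_{1,g-1}] + \frac{h_{g-1}^2}{2} \left(E[H_{1,g-1}]\right)^2 + \frac{h_{g-1}}{2} E[H_{1,g-1}^2]. \tag{21}$$

With the relationship $V[H_{1,g}] = E[H_{1,g}^2] - (E[H_{1,g}])^2$, and using Equations 10, 11, 20, and 21, we obtain a recursion for the variance of $H_{1,g}$. For the first generation (g = 1), we have

$$V[H_{1,1}] = \frac{s_{1,0}(1 - s_{1,0})}{2}, \tag{22}$$

and for g ≥ 2,

$$V[H_{1,g}] = \frac{s_{1,g-1}(1 - s_{1,g-1})}{2} - s_{1,g-1} h_{g-1} E[H_{1,g-1}] + \frac{h_{g-1}(1 - h_{g-1})}{2} \left(E[H_{1,g-1}]\right)^2 + \frac{h_{g-1}}{2} V[H_{1,g-1}]. \tag{23}$$

This recursion for the variance of the admixture fraction utilizes the variance in the previous generation, along with the expectation in the previous generation and its square.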
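To see how the variance can separate histories that share a mean, the mean and variance recursions can be iterated together. The sketch below is a hedged illustration under constant contributions after the founding; `mean_and_variance` and the two example histories are assumptions introduced here, not values from the figures.

```python
from fractions import Fraction


def mean_and_variance(s10, later, generations):
    """Return (E[H_{1,g}], V[H_{1,g}]) at g = generations.

    s10 = s_{1,0}; later = (s1, s2), held constant for g >= 1.
    """
    s1, s2 = later
    h = 1 - s1 - s2
    mean = s10
    var = s10 * (1 - s10) / 2
    for _ in range(2, generations + 1):
        # The variance update uses the previous generation's mean, so do it first.
        var = s1 * (1 - s1) / 2 - s1 * h * mean + h * (1 - h) * mean**2 / 2 + h * var / 2
        mean = s1 + h * mean
    return mean, var


third = Fraction(1, 3)
# Single founding pulse (assumed example): no contributions after generation 0.
mean_pulse, var_pulse = mean_and_variance(third, (Fraction(0), Fraction(0)), 10)
# Ongoing contributions (assumed example) chosen to keep the mean at 1/3.
mean_ongoing, var_ongoing = mean_and_variance(third, (Fraction(1, 25), Fraction(2, 25)), 10)
```

After 10 generations both histories have mean 1/3, but the variance under the single pulse has halved each generation, whereas the variance under ongoing contributions remains roughly a hundred times larger.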
Special Case: Constant Admixture after the Founding of the Hybrid Population
Using our recursions for the moments of the admixture fraction $H_{1,g}$, we can examine particular cases in which $s_{1,g}$, $s_{2,g}$, and $h_g$ are specified. Here we consider a special case that reflects a constant process in which admixture occurs in the same way from one generation to the next after the founding of the hybrid population. In this section, we specify that all introgression parameters are constant in time after the founding of population H ($s_{1,g} = s_1$, $s_{2,g} = s_2$, and $h_g = h$ for all g ≥ 1). We first consider a case in which no admixture from source populations S1 and S2 occurs after the founding of the hybrid population.
A single admixture event
Suppose that source populations S1 and S2 do not contribute to the hybrid population after its founding ($s_1 = s_2 = 0$, and $h = 1$). As before, because at generation 0 the hybrid population is not yet formed, we specify that $h_0 = 0$ and $s_{1,0} + s_{2,0} = 1$, with $s_{1,0}$ and $s_{2,0}$ both taking values in (0, 1).

Under this scenario, we can simplify Equations 10 and 11 for the expected admixture from population S1. Because $s_1 = s_2 = 0$ and $h = 1$, for all g ≥ 1, we have

$$E[H_{1,g}] = s_{1,0}. \tag{24}$$

When admixture occurs only in the initial generation, the expected admixture fraction for a random individual in the hybrid population at any generation depends only on the initial contribution from source population S1.

Using Equations 22 and 23, $V[H_{1,g}]$ follows the recursion relation of a geometric sequence with ratio 1/2 and initial value $V[H_{1,1}] = s_{1,0}(1 - s_{1,0})/2$. Therefore, for g ≥ 1,

$$V[H_{1,g}] = \frac{s_{1,0}(1 - s_{1,0})}{2^g}. \tag{25}$$

The variance decreases monotonically as a function of g and is smaller when the initial contribution $s_{1,0}$ from source population S1 is farther away from 1/2.

The scenario in Figure 2A, in which $s_{1,0} = s_{2,0} = 1/2$ and $s_{1,g} = s_{2,g} = 0$ for all g ≥ 1, provides an example of the setting considered here. In Figure 2A, the distribution of the admixture fraction for a random individual in H becomes increasingly concentrated near 1/2 as time progresses. As predicted by Equation 24, the mean admixture is constant over time with a value of 1/2. As predicted by Equation 25, the variance decreases over time; it eventually approaches zero, so that the admixture fraction for a random individual approaches the mean. This phenomenon can be attributed to the fact that except during the founding event, each mating in the population involves two individuals from the hybrid population itself; no new source of admixture draws the admixture fraction toward extreme values of 0 or 1.
Thus, with admixture values equal to the mean of those of their parents, offspring individuals are likely to have intermediate admixture within the unit interval.

It is noteworthy that if admixture occurs in a single event, then Equations 24 and 25 provide a basis for estimating the time of the event from the observed mean and variance of admixture. Given mean M and variance V (with V ≠ 0), Equations 24 and 25 yield

$$g = \log_2 \frac{M(1 - M)}{V}. \tag{26}$$

It can be seen from Equation 26 that for a fixed mean, a smaller variance indicates a larger value of g and therefore a longer time since admixture, and for a fixed variance, a smaller value of M(1 − M) indicates a shorter time since admixture.
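The time estimate for a single admixture event is a one-line computation once the mean and variance are known. The sketch below assumes the single-event model holds exactly and ignores sampling error in M and V; the function name `generations_since_single_admixture` and the example numbers are hypothetical.

```python
import math


def generations_since_single_admixture(mean, variance):
    """Solve V = M(1 - M) / 2^g for g under the single-event model."""
    if not 0 < mean < 1:
        raise ValueError("mean admixture must lie strictly between 0 and 1")
    if variance <= 0:
        raise ValueError("variance must be positive")
    return math.log2(mean * (1 - mean) / variance)


# Assumed example: mean 0.3 and a variance generated by 8 generations of halving.
example_mean = 0.3
example_variance = 0.3 * 0.7 / 2**8
g_estimate = generations_since_single_admixture(example_mean, example_variance)
```

Because the variance halves each generation, each halving of the observed variance at a fixed mean adds one generation to the estimate.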
Nonzero combined contribution from the source populations at each generation
In this section, we consider values of $s_1$ and $s_2$ in [0, 1] and values of h in (0, 1). As before, because at generation 0 the hybrid population is not yet formed, $h_0 = 0$ and $s_{1,0} + s_{2,0} = 1$. This set of assumptions corresponds to a process with a nonzero combined contribution of populations S1 and S2 to H in each generation ($s_1 + s_2 \neq 0$ because $h \neq 1$), although we do allow one or the other contribution to be zero ($s_1 = 0$ and $s_2 \neq 0$, or $s_1 \neq 0$ and $s_2 = 0$). The contribution of population H to itself in each generation is nonzero ($h \neq 0$).

Applying Equations 10 and 11, the recursion relation for $E[H_{1,g}]$ can be simplified. For the first generation (g = 1), we have

$$E[H_{1,1}] = s_{1,0}. \tag{27}$$

For all subsequent generations (g ≥ 2),

$$E[H_{1,g}] = s_1 + h E[H_{1,g-1}]. \tag{28}$$

This equation is a nonhomogeneous first-order recurrence of the form

$$x_g = \lambda x_{g-1} + \psi(g), \tag{29}$$

with initial condition $E[H_{1,1}] = s_{1,0}$, where $\psi(g) = s_1$ and $\lambda = h$. Because we consider an admixture process that is constant from one generation to the next and we assume $h \neq 0$ and $h \neq 1$, we can apply Theorem 3.1.2 of Cull et al. (2005) to Equation 29 to obtain the unique solution for $E[H_{1,g}]$:

$$E[H_{1,g}] = s_{1,0} h^{g-1} + s_1 \frac{1 - h^{g-1}}{1 - h}. \tag{30}$$

Figures 3 and 4 illustrate the expected admixture fraction as a function of g under constant admixture, as determined in Equation 30. In Figure 3, we can see that in three admixture scenarios with different parameter values for the founding of the hybrid population H, but with identical introgression parameters constant in the subsequent generations, the expected admixture fraction from the source population S1 approaches the same long-term limit. Moreover, in Figure 4, considering three scenarios with identical founding parameter values ($s_{1,0}$ and $s_{2,0}$), but different values for the introgression parameters $s_1$ and $s_2$ in the subsequent generations with identical ratios $s_1/s_2$, the expected admixture fraction also approaches the same long-term limit.
Figure 3
Founding effect: expectation of the admixture fraction from source population S1 ($E[H_{1,g}]$, Equation 30) for a random individual in the hybrid population H, when the admixture process is constant over time. Three scenarios are shown: population H founded exclusively by source population S1, population H founded by both source populations S1 and S2 in equal proportions, and population H founded exclusively by source population S2. The subsequent admixture process is identical among the three scenarios with different starting conditions and is constant over time after the founding of population H at the first generation: $s_1 = 0.04$ and $s_2 = 0.08$. For all three scenarios, using Equation 31, the long-term limit of the admixture proportion from population S1 is 1/3.
Figure 4
Ratio effect: expectation of the admixture fraction from source population S1 ($E[H_{1,g}]$, Equation 30) for a random individual in the hybrid population H, when the admixture process is constant over time. Three scenarios are shown with the same initial founding event for population H ($s_{1,0} = 1$, $s_{2,0} = 0$). After this founding event, the three scenarios have different proportions of descent from source populations S1 and S2 in H (constant at each generation), but with the same ratio $s_1/s_2 = 2/3$. For all three scenarios, using Equation 31, the long-term limit of the admixture proportion from population S1 is 2/5.
Using Equation 30 and the relation $s_1 + s_2 + h = 1$ with h ∈ (0, 1), we can compute the long-term limit of $E[H_{1,g}]$ as g → ∞:

$$\lim_{g \to \infty} E[H_{1,g}] = \frac{s_1}{1 - h} = \frac{s_1}{s_1 + s_2}. \tag{31}$$

Equation 31 demonstrates that the starting conditions ($s_{1,0}$ and $s_{2,0}$) for the founding of the hybrid population H do not influence the long-term limiting expectation, as observed in Figure 3.
The limiting expected admixture in Equation 31 can be rewritten as $1 - 1/(1 + s_1/s_2)$, showing that the limiting expectation is determined only by the ratio of the constant contributions from populations S1 and S2, as observed in Figure 4.

Using Equation 31, we can plot the long-term limit of the expected admixture fraction from source population S1 as a function of the introgression parameters $s_1$ and $s_2$ (Figure 5). When the admixture process is constant over time, for a given value of $s_2$, the long-term expectation of the admixture fraction from the source population S1 increases monotonically with $s_1$. Because the long-term limit depends only on the ratio $s_1/s_2$, different introgression proportions as well as different founding scenarios for population H can lead, in the long term, to the same expected admixture fractions in H.
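The founding effect and the ratio effect can both be checked numerically by comparing the recursion for the expectation with its closed-form solution and its limit. The sketch below is illustrative: the function names are hypothetical, the first set of inputs follows the values quoted in the caption of Figure 3, and the three parameter pairs with ratio 2/3 are assumed examples rather than the values actually plotted in Figure 4.

```python
import math


def expectation_by_recursion(s10, s1, s2, g):
    """Iterate E_g = s1 + h * E_{g-1} from E_1 = s_{1,0}."""
    h = 1 - s1 - s2
    e = s10
    for _ in range(2, g + 1):
        e = s1 + h * e
    return e


def expectation_closed_form(s10, s1, s2, g):
    """Geometric-series solution of the same recursion (requires 0 < h < 1)."""
    h = 1 - s1 - s2
    return s10 * h ** (g - 1) + s1 * (1 - h ** (g - 1)) / (1 - h)


def long_term_limit(s1, s2):
    """Limit of E[H_{1,g}] as g grows: s1 / (s1 + s2)."""
    return s1 / (s1 + s2)


# Founding effect: three foundings, same later process (s1 = 0.04, s2 = 0.08).
founding_runs = [expectation_closed_form(s10, 0.04, 0.08, 500) for s10 in (1.0, 0.5, 0.0)]
# Ratio effect: same founding, three assumed (s1, s2) pairs with s1 / s2 = 2 / 3.
ratio_runs = [expectation_closed_form(1.0, s1, s2, 500)
              for s1, s2 in ((0.02, 0.03), (0.1, 0.15), (0.2, 0.3))]
```

Larger combined contributions approach the limit faster, since the distance to the limit shrinks by a factor of h each generation.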
Figure 5
Long-term limit of the expectation of the admixture fraction from source population S1 (lim g→∞ E[H1,g], Equation 31) as a function of the introgression parameters s1 and s2 when the admixture process is constant over time.
When the admixture process is constant across generations, we can employ the same methods used for obtaining the expectation of H1,g to obtain a solution for the second moment E[H1,g^2]. In this case, Equation 20 gives the second moment for the first generation (g = 1), and Equation 21 gives it for g ≥ 2. As was true in the case of Equation 28, the equation for g ≥ 2 is a nonhomogeneous first-order recurrence. Here, the initial condition is the first-generation value E[H1,1^2], the coefficient is λ = h/2, and the nonhomogeneous term for all g ≥ 2 is given by Equation 35, which can be simplified using Equation 30.
Because h ≠ 0 and h ≠ 1, Theorem 3.1.2 of Cull et al. (2005) applies in the same way as in the computation of E[H1,g], producing a unique solution for E[H1,g^2]. Decomposing the summation in this solution and summing separate geometric series gives a closed form. With the relationship V[H1,g] = E[H1,g^2] − (E[H1,g])^2, and using Equations 30 and 38, we obtain the variance of H1,g (Equation 43).
We can simplify Equation 43 to obtain expressions for V[H1,g] without the remaining summation. For all values of h in (0, 1) with h ≠ 1/2, the simplification follows by summing the geometric series from Equation 43, with A5 = 2hA4/(1 − 2h). For h = 1/2, Equation 43 gives a separate expression.
Figures 6 and 7 illustrate the variance of the admixture fraction under the special case of constant admixture, computing Equation 43 for different sets of values of the introgression parameters. Figure 6 shows that in three scenarios with different founding parameter values (s1,0 and s2,0), because the admixture process is constant over time and identical among the scenarios, the variance of the admixture fraction from one of the source populations approaches the same long-term limit. In Figure 7, considering two admixture scenarios with identical founding events but opposite constant admixture processes, the variance of the admixture fraction also approaches the same limit.
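The variance trajectory can also be iterated directly instead of using a closed form. The sketch below is a reconstruction under stated assumptions, not the authors' equations: it assumes that an individual's admixture fraction is the mean of the fractions of two parents drawn independently from S1, S2, and H with probabilities s1, s2, and h, which yields a second-moment recurrence with coefficient h/2. The function name and return layout are hypothetical.

```python
def admixture_moments(s1_0, s1, s2, generations):
    """Return (means, variances) of H1,g for g = 1..generations.

    Assumed model: H1,g is the average of two independently drawn parental
    values, each equal to 1 (parent from S1), 0 (parent from S2), or an
    independent copy of H1,g-1 (parent from H).
    """
    h = 1.0 - s1 - s2
    mean = s1_0
    # g = 1: both parents come from S1 or S2, so E[H^2] = s1_0 * (1 + s1_0) / 2.
    second = s1_0 * (1.0 + s1_0) / 2.0
    means = [mean]
    variances = [second - mean ** 2]
    for _ in range(generations - 1):
        new_mean = s1 + h * mean
        # Recurrence with coefficient h/2; the remaining terms form the
        # nonhomogeneous part, written using the updated expectation.
        second = (h / 2.0) * second + s1 / 2.0 + new_mean ** 2 / 2.0
        mean = new_mean
        means.append(mean)
        variances.append(second - mean ** 2)
    return means, variances


# Founding parameters of Figure 6 with s1 = 0.01 and s2 = 0.05.
for s1_0 in (1.0, 0.8, 0.0):
    means, variances = admixture_moments(s1_0, 0.01, 0.05, 2000)
    print(s1_0, round(max(variances), 6), round(variances[-1], 6))
```

With these parameters, all three founding scenarios approach the same limiting variance, and the scenario founded exclusively by S1 starts at zero variance, rises to a peak, and then declines.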
Figure 6
Founding effect: variance of the admixture fraction from source population S1 (V[H1,g], Equation 43) for a random individual in the hybrid population H, when the admixture process is constant over time. Three scenarios are shown: population H founded exclusively by source population S1, population H founded by both source populations S1 and S2 in unequal proportions, with population S1 contributing more than population S2 (s1,0 = 0.8, s2,0 = 0.2), and population H founded exclusively by source population S2. The subsequent admixture process is identical among the three scenarios with different starting conditions and is constant over time after the founding of population H at the first generation: s1 = 0.01 and s2 = 0.05. For all three scenarios, using Equation 47, the long-term limit of the variance of the admixture proportion from population S1 is 5/636 ≈ 0.008.
Figure 7
Opposite admixture processes: variance of the admixture fraction from source population S1 (V[H1,g], Equation 43) for a random individual in the hybrid population H, when the admixture process is constant over time. Two scenarios are shown with the same initial founding event (s1,0 = 1, s2,0 = 0). After this founding event, the scenarios have opposite proportions of descent from source populations S1 and S2 in H, constant at each generation. For both scenarios, using Equation 47, the long-term limit of the variance of the admixture proportion from population S1 is 60/5523 ≈ 0.011.
In Figures 6 and 7, we can see that for some sets of values of s1,0, s1, and s2, the variance of the admixture fraction from one of the source populations increases monotonically from the beginning of the admixture process until it reaches a maximal value and then decreases monotonically to its long-term limit. In these cases, at the beginning of the admixture process, the source populations introduce considerable variance to the distribution of the admixture fraction for a random individual in H. After a certain amount of time, the proportion of matings that involve members of the hybrid population H with similar admixture fractions increases, reducing the proportion of matings that generate offspring admixture fractions at opposite extremes. Additional matings then occur among individuals with similar admixture, ultimately decreasing the variance of the admixture fraction until V[H1,g] approaches its long-term limit.
Because h ≠ 0 and h ≠ 1, we can compute the long-term limit of V[H1,g] as g → ∞ using Equation 43. We obtain, for all values of h in (0, 1),
$$\lim_{g \to \infty} V[H_{1,g}] = \frac{s_1 s_2}{(1 - h)(2 - h)}. \tag{46}$$
The starting conditions do not influence the long-term limit, as observed in Figure 6.
Recalling that s1 + s2 + h = 1, an alternative representation for Equation 46 is
$$\lim_{g \to \infty} V[H_{1,g}] = \frac{s_1 s_2}{(s_1 + s_2)(1 + s_1 + s_2)}. \tag{47}$$
It is possible to see from Equation 47 that if s1 + s2 is fixed, then the limiting variance is greater when both source populations contribute similarly to the hybrid population (s1 ≈ s2) than when one source population contributes more than the other (s1 ≫ s2 or s2 ≫ s1). Additionally, for a fixed ratio s1/s2, the variance is greater when the combined contribution from both source populations, s1 + s2, is greater. This result is sensible, as continuing contributions from the source populations generate individuals with admixture fractions at opposite extremes, thereby increasing the variance of admixture fractions.
Using Equation 47, we can plot the long-term limit of the variance of admixture proportions as a function of s1 and s2 (Figure 8). Figure 8 illustrates that the long-term limit of V[H1,g] is greatest when s1 = s2 and s1 + s2 ≈ 1 (and thus h ≈ 0). This scenario corresponds to an admixture process in which the admixed individuals in population H contribute little to the next generation, and the population H is largely founded anew at each generation from the source populations S1 and S2, with identical proportions.
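The two comparisons drawn from the limiting variance can be verified with exact rational arithmetic. This sketch assumes the limiting variance has the form s1 s2 / ((s1 + s2)(1 + s1 + s2)), a form adopted here because it reproduces the value 5/636 quoted for Figure 6. The function name and the comparison values are hypothetical.

```python
from fractions import Fraction


def limiting_variance(s1, s2):
    """Assumed long-term variance: s1*s2 / ((s1 + s2) * (1 + s1 + s2))."""
    return (s1 * s2) / ((s1 + s2) * (1 + s1 + s2))


# Parameters of Figure 6: s1 = 0.01, s2 = 0.05.
print(limiting_variance(Fraction(1, 100), Fraction(5, 100)))  # 5/636

# Fixed total s1 + s2 = 1/10: equal contributions versus unequal ones.
balanced = limiting_variance(Fraction(1, 20), Fraction(1, 20))
skewed = limiting_variance(Fraction(1, 100), Fraction(9, 100))
print(balanced > skewed)

# Fixed ratio s1/s2 = 1: small versus large combined contribution.
small_total = limiting_variance(Fraction(1, 20), Fraction(1, 20))
large_total = limiting_variance(Fraction(1, 4), Fraction(1, 4))
print(large_total > small_total)
```

Using `Fraction` keeps the check exact, so the Figure 6 value is recovered as a ratio rather than as a rounded decimal.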
Figure 8
Long-term limit of the variance of the admixture fraction from source population S1 (lim g→∞ V[H1,g], Equation 47) as a function of the introgression parameters s1 and s2 when the admixture process is constant over time.
When s1 + s2 → 0, with s1/s2 held constant, h → 1 and Equation 47 gives
$$\lim_{s_1 + s_2 \to 0} \, \lim_{g \to \infty} V[H_{1,g}] = 0. \tag{48}$$
This scenario corresponds to an admixture process in which populations S1 and S2 found the hybrid population H at the first generation and contribute little in subsequent generations. It tends toward the special case in which the source populations do not further contribute to the hybrid population after the founding event. The result in Equation 48 is consistent with the corresponding limit of Equation 25 for the case of no continuing admixture.
Considering our results on the expectation and variance of the admixture fraction together, although different admixture proportions that are constant and nonzero across generations can lead in the long term to the same expected fraction of admixture, such parameter values can produce different variances. The long-term limiting expectation and variance do not depend on the conditions of the founding event of the hybrid population H; they depend only on the subsequent constant admixture process.
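The vanishing limit can be made explicit by separating the ratio from the total contribution. The following worked step is a sketch, assuming the limiting variance has the form s1 s2 / ((s1 + s2)(1 + s1 + s2)). The symbols r and c are introduced here for illustration and do not appear in the original text.

```latex
% Assumed notation: r = s_1/s_2 (held fixed), c = s_1 + s_2 = 1 - h.
\begin{align*}
s_1 &= \frac{r c}{1 + r}, \qquad s_2 = \frac{c}{1 + r}, \\
\lim_{g \to \infty} V[H_{1,g}]
  &= \frac{s_1 s_2}{(s_1 + s_2)(1 + s_1 + s_2)}
   = \frac{r c^2 / (1 + r)^2}{c \, (1 + c)}
   = \frac{r}{(1 + r)^2} \cdot \frac{c}{1 + c}
   \;\xrightarrow[c \to 0]{}\; 0 .
\end{align*}
```

In this factored form, the first factor depends only on the ratio and is largest at r = 1, while the second depends only on the combined contribution and increases with c. For example, r = 1/5 and c = 3/50 give (5/36)(3/53) = 5/636.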
Discussion
Our study provides a theoretical framework for analyzing complex admixture processes that involve dynamic contributions of mutually isolated source populations to the ancestry of a hybrid population. Using our mechanistic approach, we have analytically derived recursions for the expectation, variance, and higher moments of the admixture fractions in a hybrid population. In the special case of constant admixture, we have solved the recursions and analyzed the behavior of the expectation and variance.
An important observable quantity that can be estimated in modern admixed populations and used for understanding historical aspects of the admixture process is the mean admixture fraction from a source population. For a hybrid population, this quantity provides a simple summary of its overall level of admixture. However, when a hybrid population is founded in a single admixture event, we have found that the mean admixture fraction is constant across generations and is therefore uninformative about the time since the founding of the hybrid population. When the source populations contribute in a constant manner to the hybrid population after the founding event, very different admixture processes can produce identical expected admixture fractions in the long term.
The behavior of the variance of the admixture fraction is more complex than that of the expectation. First, the variance is not constant in time, and therefore it does contain information about the time since the founding event. Second, the limiting variance can differ quite substantially for processes with the same limiting expectation, with the limit depending on the magnitude of the ongoing contributions from the source populations. Third, a low variance is characteristic of an admixture process that occurred as a single event, whereas higher variance occurs when admixture is ongoing. These results suggest that in addition to the mean admixture, other easily measured quantities such as the variance and higher moments of the admixture fraction are likely to be informative, together with the mean, in statistical procedures for estimating the parameters of the historical admixture model that gives rise to a hybrid population.
Numerous statistical methods have been developed to estimate the admixture proportions from given source populations in hybrid populations using, for instance, maximum likelihood (Wang 2003; Tang ; Alexander ), least squares (Roberts and Hiorns 1965; Long and Smouse 1983), coalescence times (Bertorelle and Excoffier 1998), Bayesian approaches (Pritchard ; Corander ; Patterson ), and principal components analysis (Paschou ; McVean 2009; Bryc ). Although many of these methods do estimate a composite parameter representing the time since initial admixture, they generally do not use a full mechanistic approach and have largely not tried to reconstruct the history of the admixture process.
Our model incorporates a general variation over time in the relative contributions of the source populations to the hybrid population. Owing to the potentially large number of parameters in a general case with arbitrary changes in admixture with time, it is unclear when the full history of admixture will be identifiable from genetic data. Indeed, as we have focused on the mean and variance of admixture in special cases of constant admixture processes, it is also uncertain how much information will be available for estimation from higher moments in a complex case with more parameters. However, our model is flexible enough to accommodate reductions in the number of parameters through assumptions of constant admixture over periods of many generations or over the entire history of the model. It is thus likely that identifiability can be achieved at least in some cases.
The initial theoretical framework that we have developed can be expanded to account for additional aspects of the admixture process for hybrid populations. For instance, in File S1, we extend the approach to consider m potential source populations, deriving general expressions for the moments of the random fraction of admixture originating from any specific one of the m source populations. However, we have not modeled sex-specific contributions from the source populations or assortative mating between hybrid individuals on the basis of their admixture fractions (Risch ). Further, while we have considered the distribution of the admixture fraction across individuals in a hybrid population, we have studied admixture only pointwise in the genome, and we have not investigated variation in admixture across the genome of a random individual. The distribution of the length of chromosomal segments ultimately tracing to a particular source population, and other variables that could potentially be examined in a recombination-based model, could provide a useful additional set of quantities to consider beyond those available in our current formulation.
Finally, we have not accounted for genetic drift in the founding populations over the course of the admixture process, a phenomenon that can confound the accurate estimation of admixture proportions (Long 1991). In the future, all of these factors can be incorporated by extending our initial mechanistic admixture model. The various extensions will make it possible to draw more information from genetic data to shed light on the complex mechanisms underlying observed genetic variation in hybrid individuals and populations.