
Investigating population history using temporal genetic differentiation.

Pontus Skoglund, Per Sjödin, Tobias Skoglund, Martin Lascoux, Mattias Jakobsson.

Abstract

The rapid advance of sequencing technology, coupled with improvements in molecular methods for obtaining genetic data from ancient sources, holds the promise of producing a wealth of genomic data from time-separated individuals. However, the population-genetic properties of time-structured samples have not been extensively explored. Here, we consider the implications of temporal sampling for analyses of genetic differentiation and use a temporal coalescent framework to show that complex historical events such as size reductions, population replacements, and transient genetic barriers between populations leave a footprint of genetic differentiation that can be traced through history using temporal samples. Our results emphasize explicit consideration of the temporal structure when making inferences and indicate that genomic data from ancient individuals will greatly increase our ability to reconstruct population history.
© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  FST; ancient DNA; genetic differentiation; population structure; time-serial sampling

Year:  2014        PMID: 24939468      PMCID: PMC4137715          DOI: 10.1093/molbev/msu192

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

Recent advances in molecular genetics have opened up the possibility of using temporal genetic samples to answer biological questions, including studies focusing on viruses (Rodrigo and Felsenstein 1999) and studies of animal and human remains (Shapiro and Hofreiter 2014). DNA extraction from fossils or ancient material was pioneered some 3 decades ago (Higuchi et al. 1984; Pääbo 1985), but the field of ancient DNA has been plagued by problems such as contamination from modern-day DNA, postmortem DNA damage, and low levels of endogenous DNA. However, many problems have been resolved in the last few years. For example, the high frequency of postmortem damage in ancient DNA sequences (Briggs et al. 2007) can be difficult to distinguish from biological polymorphisms, but experimental solutions have been developed, such as using enzymes to repair damaged nucleotides (Briggs et al. 2010). Likewise, problems arising from contamination from present-day individuals can be circumvented using these same postmortem damage patterns (Krause et al. 2010; Meyer et al. 2014; Skoglund et al. 2014), coupled with an assessment of whether the DNA originates from a single individual (Green et al. 2010; Krause et al. 2010). These advances have resulted in a remarkable development, exemplified by the explosion in genomic studies of ancient hominid remains such as the sequencing of the Neandertal genome (Green et al. 2010; Prufer et al. 2014), the Denisova genome (Reich et al. 2010; Meyer et al. 2012), and genomic investigations of several prehistoric humans (Rasmussen et al. 2010; Keller et al. 2012; Sánchez-Quinto et al. 2012; Skoglund et al. 2012; Raghavan et al. 2014). There are even isolated examples of DNA preservation in fossils that are hundreds of thousands of years old (Orlando et al. 2013; Meyer et al. 2014).
The new sequencing technologies have been instrumental for this development simply because they work with massive amounts of short-fragmented DNA, which is the state in which we find postmortem DNA. Theoretical aspects of temporal genetic differentiation have not been extensively investigated even though many of the classical population-genetic parameters, such as Wright’s F-statistics (Wright 1949), stem from temporal models. For example, temporal differences between ancient samples, as well as between ancient samples and modern-day ones, complicate interpretations of population-genetic structure. Even in the absence of population structure, genetic drift is expected to produce genetic differences between genetic data from different points in time (Krimbas and Tsakas 1971; Waples 1989; Nordborg 1998; Anderson et al. 2000; Wang 2001; Berthier et al. 2002; Beaumont 2003; Depaulis et al. 2009; Nyström et al. 2012), which in practice makes separating historical scenarios of replacement and genetic drift difficult (Nordborg 1998; Serre et al. 2004; Haak et al. 2005; Castroviejo-Fisher et al. 2011; Sjödin et al. 2014). However, it may be desirable to use the temporal structure within a sample to make inferences, because time-structured data offer a new dimension of information for learning about the demographic history. That important information can be extracted from temporal samples is illustrated by the long tradition of using variance in allele frequencies between multi-individual samples from discrete time points to infer effective population size (Krimbas and Tsakas 1971; Waples 1989; Anderson et al. 2000; Wang 2001; Berthier et al. 2002; Beaumont 2003) and methods for using single-locus nonrecombining markers, such as mitochondrial DNA, to infer population size changes (Drummond et al. 2005; Ramakrishnan et al. 2005; Chan et al. 2006; Drummond and Rambaut 2007; Ramakrishnan and Hadly 2009; Navascues et al. 2010; Ho and Shapiro 2011). 
Furthermore, the coalescent model (Kingman 1982) is readily adapted to accommodate time-serial samples (Rodrigo and Felsenstein 1999), and several simulation tools that handle temporal samples have been developed (Anderson et al. 2005; Jakobsson 2009; Excoffier and Foll 2011). However, the use of genomic data from temporal samples for inferring more complex population histories remains largely unexplored. As the quality and quantity of ancient genomic data are increasing, we need a better understanding of how temporal structure affects genetic differentiation and diversity. In this article, we first illustrate how temporal structure relates to classical models of population structure by calculating Wright’s fixation index, FST, in simple demographic models, which provides an intuitive understanding of the problem at hand. Second, we demonstrate that genetic data from temporal samples can greatly aid inferences of population history by highlighting several instances where wide temporal sampling can provide insights that would be hard to obtain otherwise.

Fundamental Properties of Temporal Genetic Structure

Genetic drift results in differentiation between structured populations (Wright 1940, 1951). In a coalescent framework (Kingman 1982; Hudson 1990; Slatkin 1991), genetic differentiation between populations can be viewed as the effect of a shorter expected time of coalescence for lineages from the same population, E[TW], compared with the expected time of coalescence for lineages from different populations, E[TB]. A fundamental metric of genetic differentiation in structured populations is Wright’s fixation index FST which, in coalescent terms, corresponds to 1−[E[TW]/((E[TW] + E[TB])/2)], where E[TW] and E[TB] are averaged across populations and comparisons (Slatkin 1991). Taking mutations into account, this can be expressed in terms of probabilities of identity by descent (IBD) such as FST = (fw − fb)/(1 − fb). Here, fb is the probability of IBD for lineages picked from different populations and fw is the probability of IBD for lineages picked from the same population (averaged over the different populations). For instance, if f1 and f2 are the probabilities of IBD in two different groups 1 and 2, then fw = (f1 + f2)/2. In this article, we consider FST for models where samples are drawn from two time points and compare this situation to a model where the two samples are drawn from different populations that diverged at some point in the past (fig. 1).
Fig. 1. Additivity of genetic drift can result in equivalent genetic differentiation (FST) under temporal structure and population divergence. (A and B) Thirty individuals are sampled from two populations (15 individuals from each population) that diverged at a given time in the past. In (A), both samples are taken at the present, and in (B), one of the samples is taken at 0.5 × 2Ne generations before present and the other sample is taken at present. (C) Thirty individuals are sampled from two discrete time points (15 individuals from each time point) in the history of a continuous population. In all three scenarios, the total time that passes between each sample is T = 2Ne generations. The 15 individuals in each sample are illustrated as a series of red circles or a series of blue circles.

If the population size N is constant, the probability of IBD in both the temporal model and the divergence model for lineages picked from the same population is

$$f_w = \frac{1/(2N)}{1/(2N) + 2\mu} = \frac{1}{1 + \theta},$$

where µ is the mutation rate per site per generation and θ = 4Nµ. This is simply the probability that two lineages coalesce before a mutation occurs (2µ is the mutation rate in the two lineages [ignoring µ² terms] and (2N)⁻¹ is the coalescence rate). As for the probability of IBD between populations, in the divergence model (fig. 1A) it is

$$f_b = \frac{e^{-\theta (T_1 + T_2)/2}}{1 + \theta},$$

where T1 = t1/2N and T2 = t2/2N and t1 and t2 are the times (in generations) to the split of the two populations. This expression is derived from considering that neither lineage can have a mutation before they reach the ancestral population, and once in the ancestral population, they must coalesce before a mutation occurs (as above). Applying the same argument for the temporal model (fig. 1C), two lineages sampled t generations apart will be IBD if there is no mutation in the younger lineage during t generations and, once in the ancestral population, the two lineages coalesce before either of them mutates. Hence,

$$f_b = \frac{e^{-\theta T/2}}{1 + \theta},$$

where T = t/2N.
If T = T1 + T2, then FST in the temporal and divergence models is the same and

$$F_{ST} = \frac{f_w - f_b}{1 - f_b} = \frac{1 - e^{-\theta T/2}}{1 + \theta - e^{-\theta T/2}}.$$

Note that this extends naturally to models with both divergence and temporal samples, such as the model in figure 1B. However, this simple relationship between temporal structure and divergence models only holds when the population size is constant. When the population size is not constant, FST in the temporal and divergence models is expected to be equal only under very specific conditions (see supplementary fig. S1, Supplementary Material online).
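
The equivalence can be checked numerically. The following is a minimal sketch, assuming the continuous-time approximations that follow from the verbal argument above (within-population IBD of 1/(1 + θ), and between-sample IBD reduced by a factor exp(−θT/2) for a total separation T in units of 2N generations). The function names, the value θ = 0.1, and the branch lengths assigned to the three panels of figure 1 are illustrative assumptions, not taken from the original analysis.

```python
import math

def ibd_within(theta):
    # Two lineages sampled together: coalescence (rate 1) must precede a
    # mutation on either lineage (total rate theta); time in units of 2N.
    return 1.0 / (1.0 + theta)

def ibd_between(theta, separation):
    # No mutation along `separation` units of single-lineage branch length
    # (rate theta / 2 per lineage), then IBD in the shared ancestral population.
    return math.exp(-theta * separation / 2.0) * ibd_within(theta)

def fst(f_w, f_b):
    # F_ST expressed through probabilities of identity by descent.
    return (f_w - f_b) / (1.0 - f_b)

def fst_divergence(theta, t1, t2):
    # t1, t2: scaled times from each sample back to the population split.
    return fst(ibd_within(theta), ibd_between(theta, t1 + t2))

def fst_temporal(theta, t):
    # t: scaled time between two samples from one continuous population.
    return fst(ibd_within(theta), ibd_between(theta, t))

theta = 0.1                                # illustrative value (assumption)
fst_a = fst_divergence(theta, 0.5, 0.5)    # both samples at present
fst_b = fst_divergence(theta, 0.75, 0.25)  # one sample 0.5 units before present
fst_c = fst_temporal(theta, 1.0)           # purely temporal samples
print(fst_a, fst_b, fst_c)
```

All three calls use a total separation of one unit of 2N generations, so under these assumptions they return the same value.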

Nei’s Estimator of Divergence Time between Populations

Based on a result from Nei (1973), it is commonly stated that expected FST in a divergence model with constant size equals $1 - e^{-T/2}$ (letting T denote the total time in coalescent units that separates two populations as above). This result was derived under a very specific model assumption, namely that all polymorphisms were present in the ancestral population. Furthermore, this only applies when sampling times are equal, because for a temporal model where polymorphisms were present in the ancestral population, we find instead that $F_{ST} = (1 - e^{-T})/2$ (supplementary material, Supplementary Material online). Curiously, simulations highlight the generality with which FST responds to genetic drift under constant-size scenarios, because Nei’s case with ascertainment of polymorphic loci in the ancestral population of the divergence model ($F_{ST} = 1 - e^{-T/2}$) corresponds to the temporal case if ascertainment of polymorphisms is performed at the midpoint between the two temporal samples (fig. 2). Likewise, the expectation when polymorphisms are ascertained in the ancestral population of the temporal model ($F_{ST} = (1 - e^{-T})/2$) corresponds to ascertaining in one of the two populations of the divergence model (fig. 2).
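
One way to see where the two expressions come from is through the decay of expected heterozygosity. The sketch below is not the supplementary derivation; it assumes no new mutations after ascertainment, that heterozygosity within a population decays as exp(−t) over t units of 2N generations, and that heterozygosity between samples stays at its ancestral value. The function names and the starting heterozygosity `h0` are illustrative.

```python
import math

def fst_from_heterozygosity(h_within, h_between):
    # Equivalent to (f_w - f_b) / (1 - f_b) with H = 1 - f.
    return 1.0 - h_within / h_between

def fst_split(total_time, h0=0.5):
    # Divergence model: each population drifts for total_time / 2
    # after the split, so both lose heterozygosity equally.
    h_within = h0 * math.exp(-total_time / 2.0)
    return fst_from_heterozygosity(h_within, h0)

def fst_temporal(total_time, h0=0.5):
    # Temporal model: the older sample is the ascertained (ancestral)
    # population; only the younger sample has drifted, for total_time.
    h_old = h0
    h_young = h0 * math.exp(-total_time)
    return fst_from_heterozygosity((h_old + h_young) / 2.0, h0)

for t in (0.1, 0.5, 1.0):
    print(t, fst_split(t), fst_temporal(t))
```

Under these assumptions the split model reproduces 1 − exp(−T/2) and the temporal model reproduces (1 − exp(−T))/2, and the temporal value is never larger than the split value for the same T.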
Fig. 2. Dependence of FST in temporal and divergence models conditioning on the allele being polymorphic. (A) FST as a function of T, the total time that separates two populations (two times the population divergence time) or the time that separates two samples in a model of samples taken at two different time points. The gray line shows the function $F_{ST} = 1 - e^{-T/2}$ (Nei 1973). (B) The models used for simulating population-genetic data and computing FST. The split model illustrates a population split T/2 time units in the past, and the temporal model illustrates a single population (of constant size) from which samples have been taken at two time points. Arrows point to where, in time, sites have been ascertained for variation (see main text for a full description of the procedure).

FST and the Combined Effect of Migration and Temporal Structure

We study the effect of migration by considering a simple island/stepping-stone model with two populations/demes of equal size N and a symmetric migration rate, m, between them and with the two populations being sampled t generations apart. In this case, FST can be shown to be a function of θ = 4Nµ, M = 2Nm, and T = t/2N (the explicit expression is derived in the supplementary material, Supplementary Material online). As M increases, this expression converges to the formula for FST in a pure temporal model with a population of constant but twice as large effective population size (so that the scaled mutation rate is larger by a factor 2, whereas the scaled time is half as large, compare to eq. 5 above). Intuitively, increasing the migration rate lowers FST, whereas an increase in time between the sampled time points increases FST (fig. 3). Importantly, for a fixed value of FST (and θ), there is no definite solution in terms of M because this will depend on T, so that FST is not a direct measure of migration rate under this model unless T is known (which requires that N and t are known). This is similar to the difficulty associated with differentiating between population split time and migration in spatial divergence models (Nielsen and Wakeley 2001).
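
The qualitative behavior described above can be reproduced with a small calculation. The following is a sketch under stated assumptions rather than the supplementary derivation itself: it uses a continuous-time approximation in units of 2N generations, with coalescence rate 1 for two lineages in the same deme, mutation rate θ/2 per lineage, and migration rate M per lineage, and it assumes the younger sample comes from one deme and the older sample from the other. The recursion, the function names, and the parameter values are illustrative.

```python
import math

def ibd_two_deme(theta, M):
    # Equilibrium IBD for two contemporaneous lineages, from the recursions
    #   f_same = (1 + 2M f_diff) / (1 + theta + 2M)
    #   f_diff = 2M f_same / (theta + 2M)
    # Requires theta > 0.
    denom = theta + theta ** 2 + 4.0 * M * theta + 2.0 * M
    return (theta + 2.0 * M) / denom, 2.0 * M / denom

def fst_temporal_island(theta, M, T):
    f_same, f_diff = ibd_two_deme(theta, M)
    # Probability that the younger lineage, traced back T time units, sits in
    # the deme of the older sample (two-state symmetric Markov chain).
    p_switched = 0.5 * (1.0 - math.exp(-2.0 * M * T))
    # No mutation on the younger lineage over T, then IBD given the demes.
    f_b = math.exp(-theta * T / 2.0) * (
        p_switched * f_same + (1.0 - p_switched) * f_diff
    )
    f_w = f_same
    return (f_w - f_b) / (1.0 - f_b)

theta = 0.1  # value used for illustration
for M in (0.1, 1.0, 10.0):
    print(M, [round(fst_temporal_island(theta, M, T), 4) for T in (0.0, 0.5, 1.0)])
```

In this sketch, FST rises with T and falls with M, and for very large M it approaches the constant-size temporal value with θ doubled and T halved, which matches the limit described in the text.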
Fig. 3. Dependence of FST in a simple temporal island model. The model consists of two equally sized populations with symmetric migration rate. The X axis shows the separation in (scaled) time between the samples. θ (the scaled mutation rate) is set to 0.1. The continuous lines show how FST for models with different (scaled) migration rates depends on time separation. The dotted lines show FST if the samples are not separated in time or, equivalently, if the effective population size is infinitely large. The red line is the limit when migration is infinitely fast. In this case, the model is identical to a purely temporal model with a doubled (and constant) population size (the larger population size implies that θ is twice as large). The dark blue line is the limit when there is no migration while the light blue line at log(M)=0 is included for reference.

Results

The simple theoretical models considered above indicate that both temporal structure and spatial structure affect FST in a rather similar manner but that their effects are sufficiently different to prompt caution in interpretations of FST, in particular for cases where temporal samples are involved. To investigate more complex scenarios of continuous sampling over time, we now turn to a simulation-based approach.

Stepping-Stone Migration Model with Temporal Samples

In contrast to isolation models, stepping-stone migration models, in which populations (demes) are connected in one- or two-dimensional landscapes, typically result in continuous differentiation between individuals rather than discrete genetic clusters of individuals (Novembre and Stephens 2008; Engelhardt and Stephens 2010). Given that temporal genetic structure also affects expected coalescence times between lineages, it can be expected that temporal differentiation would display similar behavior. To demonstrate this phenomenon, we designed a temporal simulation algorithm (see Materials and Methods) based on Hudson’s “ms” coalescence simulation software (Hudson 2002) and simulated a model with 100 demes in a two-dimensional habitat (10 × 10 lattice) with stepping-stone migration. We used 4Nem = 2, where m is the fraction of each subpopulation made up of new migrants each generation (note that scaling in ms is slightly different to the theory above) and sampled one haploid individual from each deme at ten time points separated by t = 4Ne generations, creating a three-dimensional model comprising the two spatial dimensions and the temporal dimension (fig. 4A). Because of the increased complexity of the data, pairwise comparisons such as FST are poorly suited to analyze the results. Instead, we used principal component analysis (PCA) to summarize and visualize the resulting population-genetic data (see Materials and Methods). PCA and FST have strong conceptual connections, with principal components (PCs) being closely related to the average coalescent times between pairs of haploid genomes (McVean 2009). We find that the first three PCs mirror the three dimensions of the model (three-dimensional Procrustes correlation: 0.984, P < 10⁻⁵) (fig. 4B). Specifically, PC1 and PC2 represented isolation-by-distance in the two-dimensional habitat, whereas PC3 represented temporal differentiation (supplementary fig. S2, Supplementary Material online), but this order of PCs will depend on the relative magnitudes of the scaled migration rate and genetic drift between time points (McVean 2009).
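
To make the PCA step concrete, the following is a minimal sketch of computing individual PC scores from a haploid genotype matrix with a singular value decomposition. It is not the pipeline described in Materials and Methods: the function name `pca_scores`, the choice to center but not rescale each site, and the toy six-individual data set are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    # Rows are individuals, columns are sites coded 0/1 (haploid).
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]   # drop monomorphic sites
    g = g - g.mean(axis=0)        # center each site on its mean frequency
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    # Scores of each individual on the leading principal components.
    return u[:, :n_components] * s[:n_components]

# Toy data: ten sites fixed for different alleles in two groups of three
# individuals, plus one private (singleton) site per individual.
group_sites = np.repeat([[1], [1], [1], [0], [0], [0]], 10, axis=1)
private_sites = np.eye(6, dtype=int)
toy = np.hstack([group_sites, private_sites])

scores = pca_scores(toy)
print(np.round(scores[:, 0], 3))
```

In this toy example PC1 separates the two groups, with the three individuals of each group receiving scores of the same sign. The sign itself is arbitrary, as in any PCA.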
Fig. 4. Genetic differentiation mirrors the sampling scheme in a model with both temporal and spatial structure. (A) Two-dimensional stepping-stone migration model from which ten haploid individuals were sampled at different historical time points. (B) PCA of data generated under the model. Large symbols correspond to high PC1 and PC3 values but low PC2 values.

Temporal Genetic Differentiation Can Be Informative about Complex Population Histories

As illustrated in figure 1C, genetic differentiation can also occur in the absence of any spatial structure, that is, in samples taken at different time points from a single continuous (unstructured) population. To investigate temporal differentiation more closely, we simulated a single continuous population with an effective population size of 5,000 diploid individuals and a generation time of 25 years, sampling 20 diploid individuals from the present, and an additional 20 diploid individuals evenly distributed over the period 500–10,000 years ago with a 500-year interval between each sampled individual (fig. 5A). In a PCA, we see that PC1 captures the temporal genetic differentiation, separating the samples from the most recent to the most ancient as a monotonic (but not linear) cline, where individuals close in time are also more genetically similar (fig. 5D and supplementary fig. S3, Supplementary Material online). To investigate the effect of population size fluctuations (i.e., fluctuations in the magnitude of genetic drift), we reduced the population to a tenth of its original size between 5,000 and 5,500 years before present. Under this sampling scheme, the bottleneck is easily detected as a discontinuation in the monotonic cline (fig. 5B and E and supplementary fig. S3, Supplementary Material online).
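
A back-of-the-envelope calculation shows why the bottleneck appears as a break in the cline. The sketch below assumes that the drift accumulated between two consecutive samples scales as the number of generations divided by 2Ne, and it uses the parameters stated above (500-year sampling intervals, 25-year generations, Ne = 5,000, reduced tenfold between 5,000 and 5,500 years ago). The function name and the indexing of intervals are illustrative.

```python
def drift_per_interval(interval_years, generation_time, ne_by_interval):
    # Scaled drift (generations / 2Ne) accumulated in each sampling interval.
    generations = interval_years / generation_time
    return [generations / (2.0 * ne) for ne in ne_by_interval]

n_intervals = 20                 # 0-500, 500-1,000, ..., 9,500-10,000 years ago
constant = [5000] * n_intervals
bottleneck = list(constant)
bottleneck[10] = 500             # interval spanning 5,000-5,500 years ago

drift_constant = drift_per_interval(500, 25, constant)
drift_bottleneck = drift_per_interval(500, 25, bottleneck)
print(drift_constant[10], drift_bottleneck[10])
```

Under this approximation, the single bottleneck interval contributes ten times as much drift as any other interval, which is consistent with the jump between the individuals that flank the event.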
Fig. 5. Temporal sampling distinguishes genetic drift from population structure. (A) Constant population size model. (B) Bottleneck model. (C) Replacement model. (D) PC1 stratified by sample time under the constant population size model. (E) PC1 stratified by sample time under the bottleneck model. (F) PC1 stratified by sample time under the replacement model. Each colored circle corresponds to a single sampled individual except for the large circles at time zero, which correspond to 20 sampled individuals in A, B, and C (in D, E, and F, the 20 individuals sampled at time zero end up on top of each other). FST between samples from before and after the bottleneck/replacement events at 5,500 years ago fails to distinguish between the models (FST = 0.0154 ± 0.0003 and 0.0153 ± 0.0003, respectively, see fig. 6).

Fig. 6. Genetic differentiation between temporal sample groups. (A) FST computed on aggregated sample groups is unable to differentiate the bottleneck and replacement models. “Moderns”: 20 samples from time 0. “Young”: 10 samples from time 0–5,000 years ago. “Old”: 10 samples from 5,500 to 10,000 years ago. (B) FST between individuals adjacent in time is able to detect a sudden increase in FST between the pair of individuals that flank the demographic event (both bottleneck and replacement), but we are unable to separate the replacement and bottleneck scenarios. Standard errors are not shown but ranged between 0.002 and 0.003. (C) FST between 20 modern individuals and each ancient individual. Standard errors are not shown but ranged between 0.0010 and 0.0014.

We also simulated population-genetic data under a divergence model of two populations that diverged 10,000 years ago (fig. 5C). Ten ancient individuals were sampled at different time points between 10,000 and 5,500 years from one population, and 30 individuals were sampled from the other population, between 5,000 years ago and the present (ten ancient individuals spread out in time and 20 present-day individuals). This simulation could correspond to a scenario where the older population was replaced with new colonizers from another population. In the simulated data, individuals sampled before the replacement event show a trajectory along PC1 through time that is angled away from the individuals in the population that replaced the previous population (fig. 5F and supplementary fig. S3, Supplementary Material online). In contrast, in the bottleneck scenario, the sampled individuals from before the bottleneck event show a trajectory along PC1 as a function of time that is angled toward the individuals in the descendant population (fig. 5E and supplementary fig. S3, Supplementary Material online). However, FST between the ancient individuals from before and after the event was indistinguishable under the bottleneck model and the replacement model (0.0154 ± 0.0003 and 0.0153 ± 0.0003, respectively; fig. 6). To complement the PCA approach, we reconstructed maximum-likelihood trees (supplementary fig. S4, Supplementary Material online) using the covariance in allele frequencies between individuals (Pickrell and Pritchard 2012) and other pairwise FST comparisons (fig. 6). This analysis gave different results depending on the way the samples were obtained. The two scenarios were (again) indistinguishable if the samples were grouped into three separate temporal samples. 
In contrast, if the full temporal structure was accounted for so that each sample was treated independently, the maximum-likelihood trees revealed a difference between the bottleneck model and the replacement model. These observations illustrate that many inference tools can lead to incorrect conclusions for temporally sampled data, and they emphasize the importance of considering detailed temporal sampling structure for distinguishing between bottleneck and replacement models. They also illustrate that the considerable power to distinguish different models that we report is not directly linked to the use of PCA methods but is mainly due to the temporal sampling schemes.

Transient Genetic Barriers

To study more complex population models, we simulated a population split model which involved two populations (A and B) that diverged 8,000 years ago. We kept the same simulation parameters and temporal sampling scheme as above but assigned the 4 ancient individuals from 3,500, 4,500, 5,500, and 6,500 years ago to population B and the remaining 16 ancient individuals to population A (fig. 7A). Strikingly, the population split event is readily identifiable when PC1 is stratified by sampling time (fig. 7C and supplementary fig. S3, Supplementary Material online). In a further modification of the model, we simulated secondary admixture between the two populations 3,000 years before present and where 75% of the genetic material of the recent population was contributed by population A and 25% was contributed by population B (fig. 7B). A plot of PC1 versus sampling time shows the two series of individuals represented by samples from the two populations becoming more similar as time approaches the time of admixture (fig. 7D and supplementary fig. S3, Supplementary Material online), suggesting that transient genetic barriers can be investigated using continuous temporal genetic data.
Fig. 7. Temporal sampling can be used to detect transient genetic barriers. (A) Split model. (B) Split-admixture model. (C) PC1 stratified by time for data simulated under the split model. (D) PC1 stratified by time for data simulated under the split-admixture model. Each colored circle corresponds to a single sampled individual except for the large circles at time zero, which correspond to 20 sampled individuals in A and B (in C and D, the 20 individuals sampled at time zero overlap). The 4 ancient individuals from 3,500, 4,500, 5,500, and 6,500 years ago (marked circles) were sampled from population B (bottom population in the model illustrations) and the remaining 16 ancient individuals were sampled from population A (top population in the model illustrations).

Approximate Bayesian Computation Using Temporal Genetic Differentiation

The observation that some statistics of temporal genetic differentiation can recapitulate population history suggests that those statistics can be used to infer population history in more formal settings. We used approximate Bayesian computation (Tavare et al. 1997; Pritchard et al. 1999; Beaumont et al. 2002) to exemplify that temporal genetic data can be used to infer parameters of a demographic model based solely on PC1 loadings of sampled individuals as summary statistics. We applied this approach to a data set consisting of 44 Siberian Woolly Mammoth samples spanning 50,000 years and genotyped at four microsatellites (Nyström et al. 2012). The original study used conventional summary statistics and aggregated temporal groups to show that a population size reduction during the Holocene transition could explain the fact that two temporal groups were genetically differentiated. Here, we expand the inference to a three-parameter model (fig. 8): The time of change, the effective population size before the change, and the effective population size after the change, allowing for either a reduction or expansion in population size (see Materials and Methods). The estimated posterior distribution indicates a size reduction at around 11,200 years ago with the effective population size in the more recent time period being approximately ten times smaller than before the change (table 1 and fig. 8). The inferred timing of this population size reduction coincides with the isolation of Wrangel Island from the Siberian mainland (Vartanyan et al. 1993) and thus corroborates the hypothesis that this restriction of the habitat triggered a founder event in the resident mammoth population (Nyström et al. 2010, 2012).
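
The logic of approximate Bayesian computation can be illustrated with plain rejection sampling. The sketch below is generic and deliberately simple: it does not implement the mammoth size-change model or the PC1 summary statistics, and the procedure in Materials and Methods may differ (for example, by using regression adjustment). The toy simulator, the uniform prior on a single parameter, the tolerance, and all function names are assumptions made for illustration.

```python
import random

def abc_rejection(observed, prior_sampler, simulator, distance,
                  tolerance, n_sims, rng):
    # Keep parameter draws whose simulated summary lies close to the data.
    accepted = []
    for _ in range(n_sims):
        params = prior_sampler(rng)
        simulated = simulator(params, rng)
        if distance(simulated, observed) <= tolerance:
            accepted.append(params)
    return accepted

def toy_prior(rng):
    # Uniform prior on a single hypothetical parameter.
    return rng.uniform(0.0, 10.0)

def toy_simulator(mu, rng, n=50):
    # Summary statistic: mean of n noisy observations centered on mu.
    return sum(rng.gauss(mu, 1.0) for _ in range(n)) / n

rng = random.Random(1)
posterior = abc_rejection(
    observed=3.0,
    prior_sampler=toy_prior,
    simulator=toy_simulator,
    distance=lambda a, b: abs(a - b),
    tolerance=0.2,
    n_sims=5000,
    rng=rng,
)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), posterior_mean)
```

In an application such as the one described here, the prior sampler would draw the three demographic parameters, the simulator would generate temporal genetic data and return the summary statistics, and the distance would compare the simulated and observed summaries.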
Fig. 8. Approximate Bayesian Computation of Woolly Mammoth demographic history using PC1 loadings as summary statistics. (A) Plot of PC1 versus the age of the mammoth individuals. (B) Illustration of the three-parameter model of instantaneous size change. (C) Estimated posterior distribution for the time of size change. (D) Estimated posterior distributions of effective population size before and after the size change. Prior distributions in (C) and (D) are shown by the gray lines.

Table 1. ABC Inference of Northeast Siberian Woolly Mammoth Demographic History Using PC Loadings as Summary Statistic.

Parameter | Prior (Uniform) | Posterior Mode | Posterior 95% CI
Ne before change | 200–50,000 | 23,500 | 16,900–29,400
Ne after change | 200–50,000 | 1,800 | 1,000–8,300
Time of change (years ago) | 3,000–40,000 | 11,200 | 5,100–23,100

Note.—ABC, approximate Bayesian computation. Ne is the effective population size.
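The inference summarized in Table 1 follows the usual ABC recipe: draw parameters from the prior, simulate data, and retain the draws whose summary statistics fall closest to the observed ones. The sketch below illustrates only that rejection step. It does not reproduce the analysis reported here: the simulator is a toy Gaussian model (a hypothetical stand-in for a coalescent simulator), the summary statistics are a mean and a standard deviation rather than PC1 loadings, and the local linear regression adjustment is omitted. All function names are illustrative.

```python
import math
import random


def abc_rejection(observed, simulate, sample_prior, n_sims, keep_fraction, rng):
    """Plain rejection ABC: keep the prior draws whose simulated summary
    statistics are closest (Euclidean distance) to the observed ones."""
    draws = []
    for _ in range(n_sims):
        theta = sample_prior(rng)
        stats = simulate(theta, rng)
        draws.append((math.dist(observed, stats), theta))
    draws.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(n_sims * keep_fraction))
    return [theta for _, theta in draws[:n_keep]]


# Toy stand-in for a demographic simulator (hypothetical): the summary
# statistics are the mean and standard deviation of 50 Gaussian draws
# centred on the unknown parameter theta.
def toy_simulate(theta, rng):
    xs = [rng.gauss(theta, 1.0) for _ in range(50)]
    mean = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))
    return (mean, sd)


def toy_prior(rng):
    # Uniform prior, mirroring the uniform priors listed in Table 1.
    return rng.uniform(0.0, 10.0)


rng = random.Random(1)
observed_stats = toy_simulate(4.0, rng)  # plays the role of the empirical data
accepted = abc_rejection(observed_stats, toy_simulate, toy_prior,
                         n_sims=5000, keep_fraction=0.02, rng=rng)
posterior_mean = sum(accepted) / len(accepted)
print(f"{len(accepted)} accepted draws, posterior mean {posterior_mean:.2f}")
```

The retained draws approximate the posterior. In a real application the tolerance (here the closest 2%) trades off accuracy against the number of accepted simulations.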


Pitfalls When Comparing Ancient Genomes to Modern Populations

A common situation is that a single ancient genome is available from a certain time point, and the goal is to investigate the historical relationship between the ancient individual and present-day populations. To investigate the differentiation between a single ancient genome and more recent populations, we simulated ten individuals from each of two populations (A and B) that diverged 20,000 years ago (Ne = 10,000) and a single 18,000-year-old individual from the lineage leading to population B (fig. 9). Using PCA, we found that PC1 captures the spatial differentiation between populations A and B, whereas PC2 captures the temporal differentiation between the ancient sample and the modern sample (fig. 9C). The ancient sample appears closer to population B, recapitulating the population history. However, when we modified the model to include a 10-fold population size reduction in population B after the time of sampling of the ancient genome (15,000 years ago; fig. 9B), the ancient sample instead clustered closer to population A (fig. 9D), despite the fact that the ancient individual was sampled from the population that is ancestral to the extant sample from population B. This pattern arises because less time (on the coalescent scale) has passed between the ancient sample and the extant sample from population A than between the ancient sample and the extant sample from population B. Correspondingly, the genetic differentiation between the ancient individual and the extant sample from population A (FST = 0.030 ± 0.001) was smaller than the genetic differentiation between the ancient sample and the extant sample from population B (FST = 0.055 ± 0.001). Thus, if the demographic history was unknown, one could mistakenly conclude that the ancient sample shares a more recent genetic history with population A, solely due to the different magnitudes of genetic drift.
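The coalescent-scale argument can be checked with simple arithmetic. The sketch below adds up the time separating the ancient individual from each present-day sample in units of 2Ne generations, using the divergence time, sampling time, and Ne values given in the text and the 25-year generation time stated in Materials and Methods. It assumes that the reduced size (Ne = 1,000) persists from 15,000 years ago to the present, and the helper name is illustrative. The totals measure elapsed drift only and are not predictions of the reported FST values.

```python
def coalescent_time(segments, generation_time):
    """Sum elapsed time in units of 2*Ne generations over a path made of
    (years, Ne) segments with piecewise-constant effective size."""
    return sum(years / generation_time / (2.0 * ne) for years, ne in segments)


GEN = 25  # years per generation, as assumed in Materials and Methods

# Path from the 18,000-year-old individual to the present-day sample of A:
# back 2,000 years to the split at 20,000 years, then forward 20,000 years.
to_a = coalescent_time([(2_000, 10_000), (20_000, 10_000)], GEN)

# Path to the present-day sample of B without the size reduction.
to_b_constant = coalescent_time([(18_000, 10_000)], GEN)

# Path to B with a 10-fold reduction (Ne = 1,000) from 15,000 years ago on.
to_b_bottleneck = coalescent_time([(3_000, 10_000), (15_000, 1_000)], GEN)

print(f"ancient -> A:              {to_a:.3f}")             # 0.044
print(f"ancient -> B (constant):   {to_b_constant:.3f}")    # 0.036
print(f"ancient -> B (bottleneck): {to_b_bottleneck:.3f}")  # 0.306
```

With constant size, the path to B is slightly shorter than the path to A (0.036 versus 0.044), whereas the size reduction makes the path to B roughly seven times longer than the path to A. This matches the direction of the switch seen in the PCA.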
Indeed, the parameter of historical interest is often the degree of shared history, that is, the amount of shared genetic drift and not the relative degrees of differentiation. Accordingly, we were able to identify the correct topology (fig. 9E and F) using concordance tests (Schlebusch et al. 2012; Skoglund et al. 2012) and D statistics (Reich et al. 2009; Durand et al. 2011; Patterson et al. 2012) that are less sensitive to lineage-specific genetic drift.
Fig. 9. Comparing a single ancient genome to modern populations. (A) Population divergence model with constant effective population size. (B) Population divergence with a 10-fold population size reduction postdating the ancient individual. (C) PCA of 100,000 SNPs simulated under the model in (A). (D) PCA of 100,000 SNPs simulated under the model in (B). (E) Population topology inferred using C tests and D tests based on 100,000 independent SNPs simulated under the model in (A), and (F) population topology inferred using C tests and D tests based on 100,000 independent SNPs simulated under the model in (B). Values for the C-statistic are only positive for the correct topology, and absolute values of the D-statistic are lowest for the correct topology. The tree topologies displayed in (E) and (F) represent the three possible topologies tested and the larger trees represent the true topology (and also the one supported by the statistics). The gray circles in (E) and (F) represent an outgroup individual constructed from the ancestral alleles of each simulated locus. For details on these tests, see Materials and Methods.


Discussion

The main insight that arises from our analyses is that wide temporal sampling provides information that can be hard to attain using modern-day data alone or more clustered temporal groups. The importance of wide temporal sampling could also explain previous results suggesting that not much statistical power is gained solely by adding one or a few temporal sample groups (Mourier et al. 2012). Spatial sampling structure can also have a substantial impact on inferences of population history using modern-day data (Serre and Pääbo 2004; Rosenberg et al. 2005; Chikhi et al. 2010; DeGiorgio and Rosenberg 2013), in which case differentiating between the relative contributions of migration and genetic drift due to population divergence is a serious challenge (Nielsen and Wakeley 2001). In contrast to the many similarities between spatial and temporal structure that we have highlighted, the possibility of migration in the different dimensions represents a fundamental difference, because migration of lineages is not possible in the temporal dimension (except in the case of overlapping generations or seed bank models, see Kaj et al. [2001]), resulting in a more constrained set of models that may be consistent with a particular pattern of genetic variation. One of the enduring challenges in population-genetic analysis of ancient DNA is whether some observed level of genetic differentiation between temporal sample groups is the result of genetic drift (possibly enhanced by a bottleneck) or the result of a replacement of the older population with new colonizers from another population (Nordborg 1998). We show that this question can be addressed by considering the trajectory of genetic relatedness within a temporal sample that spans the time of the putative event.
Additional hypotheses about population history that are difficult to address with genetic data from one or a few time points but that can be addressed with wide temporal samples include the timing of bottlenecks and transient genetic barriers. Conventional inference of the timing of population size reductions usually requires assumptions about mutation rate and/or recombination rate (Ramakrishnan et al. 2005; Voight et al. 2005; Li and Durbin 2011; Mourier et al. 2012; Sheehan et al. 2013). As illustrated, for example, in figure 5E, the use of continuously distributed temporal data allows accurate identification of the time of population size reduction that is robust to assumptions about mutation and recombination rates. For these reasons, ancient genomic data promise to advance our understanding of the recent evolutionary history of many species.

Materials and Methods

To investigate temporal structure under an infinitely many-sites mutation model and population structure (see also Excoffier and Foll 2011; Skoglund et al. 2011), we developed a temporal coalescent simulation algorithm based on Hudson’s (2002) ms. The idea here is to use the versatility of ms to simulate a genealogy but use in-house custom code for the mutation process to accommodate different branch lengths due to temporal structure. The algorithm proceeded as follows: For a sample of L ancient diploid individuals, we instruct the program ms to create 2L isolated subpopulations and sample a single lineage from each. At the desired time t_h of each historical sample, each of the 2L subpopulations is joined (command “-ej”) with the population to which it belongs. From the gene tree output of ms (command “-T”), we subtract t_h from the external branch of each ancient sample and add a single mutation on the resulting genealogy with probability equal to branch length (Hudson 1990) using custom code. For example, if there is one individual to be sampled at time 0.3 and five additional individuals at time 0.4, two lineages are joined to the population at time 0.3 and the remaining ten lineages join the population at time 0.4. To increase precision, we modified the source code of ms to produce 12 decimal digits for each branch in the gene tree output. The custom code is available upon request. We validated the algorithm by comparison with COMPASS (Jakobsson 2009), which allows temporal samples but not from multiple populations. Under the model in figure 1A, we obtained identical estimates of FST = 0.337 ± 0.001 for both algorithms, as well as highly similar site frequency spectra (supplementary fig. S5, Supplementary Material online).
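The steps of the algorithm can be sketched in a few lines. This is an illustration under stated assumptions and not the custom code referred to above. The function names and the gene tree representation (a list of branch length and leaf set pairs) are hypothetical, sampling times are assumed to be already expressed in the time units that ms expects, and "probability equal to branch length" is read as choosing a branch with probability proportional to its length. Only the ms options named in the text (-T and -ej) plus the standard -I option for defining subpopulations are used.

```python
import random


def ms_args_for_temporal_sample(n_modern, ancient_times, n_reps=1):
    """Argument list for ms in which every ancient gene copy starts in its own
    isolated subpopulation and joins population 1 at its sampling time.

    ancient_times holds one sampling time per ancient diploid individual.
    """
    lineage_times = [t for t in ancient_times for _ in range(2)]  # 2 copies each
    n_pops = 1 + len(lineage_times)
    n_sam = n_modern + len(lineage_times)
    args = ["ms", str(n_sam), str(n_reps), "-T",
            "-I", str(n_pops), str(n_modern)] + ["1"] * len(lineage_times)
    for offset, t in enumerate(lineage_times):
        args += ["-ej", str(t), str(offset + 2), "1"]  # subpop -> population 1
    return args


def trim_external_branches(branches, sample_time):
    """Subtract each ancient leaf's sampling time from its external branch.

    branches: list of (length, frozenset_of_leaf_labels) pairs for one gene
    tree; sample_time maps leaf -> sampling time (absent means time 0).
    """
    trimmed = []
    for length, leaves in branches:
        if len(leaves) == 1:  # external branch
            (leaf,) = leaves
            length -= sample_time.get(leaf, 0.0)
        trimmed.append((length, leaves))
    return trimmed


def place_mutation(trimmed, rng):
    """Pick one branch with probability proportional to its length and return
    the leaves below it, i.e. the gene copies carrying the derived allele."""
    weights = [length for length, _ in trimmed]
    _, carriers = rng.choices(trimmed, weights=weights, k=1)[0]
    return carriers


# The example from the text: one individual at time 0.3 and five at time 0.4,
# plus ten present-day gene copies (the modern sample size is an assumption).
example_args = ms_args_for_temporal_sample(10, [0.3] + [0.4] * 5)
print(" ".join(example_args))

# Toy gene tree ((a, c), b) with c sampled at time 0.3. The tree from ms
# measures c's external branch from time 0, so 0.3 must be removed from it.
toy_tree = [(0.5, frozenset("a")), (0.5, frozenset("c")),
            (1.0, frozenset("b")), (0.5, frozenset("ac"))]
trimmed_tree = trim_external_branches(toy_tree, {"c": 0.3})
carriers = place_mutation(trimmed_tree, random.Random(7))
```

In the printed command, two "-ej" events occur at time 0.3 and ten at time 0.4, matching the worked example in the text. Parsing the Newick output of ms into the branch list is left out of this sketch.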
For all simulations above, except the two-dimensional spatial lattice and the Woolly Mammoth analysis (see below), we simulated 2 × 100,000 independent (unlinked) SNPs for each individual and combined pairs of lineages to create a diploid genotype for each individual. When time is given in years, we assumed a 25-year generation time, except in the case of the Woolly Mammoth, where we assumed 15 years as in Nyström et al. (2012).
FST was estimated using equation (5.3) in Weir (1996) with standard errors estimated using a block jackknife, dropping blocks of 1,000 loci in turn. PCA was performed using the prcomp function in R 2.11.1 (R Development Core Team 2010). Except in the case of microsatellites and the three-dimensional stepping-stone model with temporal samples, we used the normalization suggested by Patterson et al. (2006). For the 44 mammoth individuals in Nyström et al. (2012) that had no missing data for the four microsatellites, we considered each unique microsatellite allele to be a separate marker, which was given a count of 0, 1, or 2 copies in each individual. Maximum-likelihood trees were inferred using TreeMix version 1.11 (Pickrell and Pritchard 2012) assuming no migration and using a block size of 1,000 SNPs for estimating standard errors.
To confirm the relationship between temporal structure and divergence models, we estimated FST between two samples of 15 diploid individuals each for three simulated demographic models with a constant effective population size (Ne) (fig. 1). In the first model (A), both samples were from the same time point but from two populations that had diverged T = 0.5 time units into the past (fig. 1A). The second model (B) assumed that one sample was T = 0.5 time units older than the other and that the two samples were from different populations that diverged T = 0.75 time units into the past (fig. 1B).
The third model assumed a single continuous population but with one sample T = 1.0 coalescent time units (2Ne generations) older than the other (fig. 1C). Most importantly, in all three models, the total coalescent time that passes as one follows the history from one sample to the other is T = 1.0. In all three models, we also estimated FST to be approximately 0.33 (0.337 ± 0.001 [±1 standard error], 0.334 ± 0.001, and 0.335 ± 0.001, respectively).
To simulate microsatellite data, we implemented a stepwise mutation model with μ = 10⁻³ for COMPASS (Jakobsson 2009) as in Nyström et al. (2012), where each mutation event either adds or subtracts (with equal probability) one unit from an arbitrarily chosen starting length (100). After this simulation, we considered each simulated (unique) microsatellite allele as its own marker, which was counted as above, and used that information as input for the PCA. We used the PC1 loading of each individual as summary statistic (in total a vector of 44 summary statistics). We simulated 100,000 replicates, of which the 0.2% with the smallest Euclidean distance to the empirical PC1 loadings were used to obtain posterior distributions using local linear regression (Beaumont et al. 2002) after log transformation as implemented in the abc R package (Csillery et al. 2012).
To investigate the population topology inferred from single individuals, we applied tests that utilize sharing of derived alleles. D-statistics were computed using a strategy of sampling a single haploid gene copy from each population (Reich et al. 2009; Durand et al. 2011; Patterson et al. 2012). We tested all three possible topologies that could be constructed using four taxa: Population A, population B, the ancient individual, and an outgroup individual (gray symbol in fig. 9E and F) that was taken to carry the ancestral allele (which is given in the ms simulations).
Specifically, the topologies tested were (Outgroup, (Ancient, (population A, population B))); (Outgroup, (population A, (Ancient, population B))); and (Outgroup, (population B, (Ancient, population A))). For a proposed topology of the form (Outgroup, (J, (Y, Z))), we denote the count of all observations of a shared derived allele (“B”) for J and Y that is absent from Outgroup and Z by “ABBA” (here “B” symbolizes the derived state and “A” the ancestral state), and the count of all observations of a shared derived allele for J and Z that is absent from Outgroup and Y by “BABA.” The D-statistic is given by $D = (\mathrm{ABBA} - \mathrm{BABA}) / (\mathrm{ABBA} + \mathrm{BABA})$, and a deviation from zero suggests a violation of the proposed topology.
We computed concordance statistics (Schlebusch et al. 2012; Skoglund et al. 2012) using the same data and testing the same topologies, but these tests also use the configuration where Z and Y share a derived allele that is absent from Outgroup and J, which we denote “AABB.” The concordance statistic C is computed from the AABB count together with the ABBA and BABA counts, and positive values of C indicate concordance with the proposed topology.
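As a concrete illustration of the counting scheme, the sketch below computes D for one proposed topology from four aligned haploid gene copies, using the ABBA and BABA definitions given above. The function name and the 0/1 allele coding (0 = ancestral, 1 = derived) are assumptions made for the example, and the standard ratio form of D from the cited references is used. The concordance statistic is not implemented here.

```python
def d_statistic(outgroup, j, y, z):
    """D for the proposed topology (Outgroup, (J, (Y, Z))).

    Each argument is a sequence of alleles (0 = ancestral, 1 = derived), one
    haploid gene copy per population, aligned by site.
    Returns (D, ABBA count, BABA count).
    """
    abba = baba = 0
    for o, a, b, c in zip(outgroup, j, y, z):
        if o != 0:
            continue  # the outgroup must carry the ancestral allele
        if a == 1 and b == 1 and c == 0:
            abba += 1  # J and Y share the derived allele, Z lacks it
        elif a == 1 and c == 1 and b == 0:
            baba += 1  # J and Z share the derived allele, Y lacks it
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba), abba, baba


# Six toy sites: three ABBA, one BABA, one AABB, and one uninformative site.
outgroup = [0, 0, 0, 0, 0, 0]
j_copy = [1, 1, 1, 1, 0, 1]
y_copy = [1, 1, 1, 0, 1, 1]
z_copy = [0, 0, 0, 1, 1, 1]
d_value, n_abba, n_baba = d_statistic(outgroup, j_copy, y_copy, z_copy)
print(d_value, n_abba, n_baba)  # 0.5 3 1
```

In practice the statistic would be evaluated for each of the three candidate topologies, with J, Y, and Z permuted accordingly.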

Supplementary Material

Supplementary material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
  67 in total

1.  Population growth of human Y chromosomes: a study of Y chromosome microsatellites.

Authors:  J K Pritchard; M T Seielstad; A Perez-Lezaun; M W Feldman
Journal:  Mol Biol Evol       Date:  1999-12       Impact factor: 16.240

2.  Statistical guidelines for detecting past population shifts using ancient DNA.

Authors:  Tobias Mourier; Simon Y W Ho; M Thomas P Gilbert; Eske Willerslev; Ludovic Orlando
Journal:  Mol Biol Evol       Date:  2012-03-16       Impact factor: 16.240

3.  Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse.

Authors:  Ludovic Orlando; Aurélien Ginolhac; Guojie Zhang; Duane Froese; Anders Albrechtsen; Mathias Stiller; Mikkel Schubert; Enrico Cappellini; Bent Petersen; Ida Moltke; Philip L F Johnson; Matteo Fumagalli; Julia T Vilstrup; Maanasa Raghavan; Thorfinn Korneliussen; Anna-Sapfo Malaspinas; Josef Vogt; Damian Szklarczyk; Christian D Kelstrup; Jakob Vinther; Andrei Dolocan; Jesper Stenderup; Amhed M V Velazquez; James Cahill; Morten Rasmussen; Xiaoli Wang; Jiumeng Min; Grant D Zazula; Andaine Seguin-Orlando; Cecilie Mortensen; Kim Magnussen; John F Thompson; Jacobo Weinstock; Kristian Gregersen; Knut H Røed; Véra Eisenmann; Carl J Rubin; Donald C Miller; Douglas F Antczak; Mads F Bertelsen; Søren Brunak; Khaled A S Al-Rasheid; Oliver Ryder; Leif Andersson; John Mundy; Anders Krogh; M Thomas P Gilbert; Kurt Kjær; Thomas Sicheritz-Ponten; Lars Juhl Jensen; Jesper V Olsen; Michael Hofreiter; Rasmus Nielsen; Beth Shapiro; Jun Wang; Eske Willerslev
Journal:  Nature       Date:  2013-06-26       Impact factor: 49.962

4.  On the probability of Neanderthal ancestry.

Authors:  M Nordborg
Journal:  Am J Hum Genet       Date:  1998-10       Impact factor: 11.025

5.  Microsatellite genotyping reveals end-Pleistocene decline in mammoth autosomal genetic variation.

Authors:  Veronica Nyström; Joanne Humphrey; Pontus Skoglund; Niall J McKeown; Sergey Vartanyan; Paul W Shaw; Kerstin Lidén; Mattias Jakobsson; Ian Barnes; Anders Angerbjörn; Adrian Lister; Love Dalén
Journal:  Mol Ecol       Date:  2012-03-23       Impact factor: 6.185

6.  Temporal genetic change in the last remaining population of woolly mammoth.

Authors:  Veronica Nyström; Love Dalén; Sergey Vartanyan; Kerstin Lidén; Nils Ryman; Anders Angerbjörn
Journal:  Proc Biol Sci       Date:  2010-03-31       Impact factor: 5.349

7.  Molecular cloning of Ancient Egyptian mummy DNA.

Authors:  S Pääbo
Journal:  Nature       Date:  1985 Apr 18-24       Impact factor: 49.962

8.  Population structure and eigenanalysis.

Authors:  Nick Patterson; Alkes L Price; David Reich
Journal:  PLoS Genet       Date:  2006-12       Impact factor: 5.917

9.  The complete genome sequence of a Neanderthal from the Altai Mountains.

Authors:  Kay Prüfer; Fernando Racimo; Nick Patterson; Flora Jay; Sriram Sankararaman; Susanna Sawyer; Anja Heinze; Gabriel Renaud; Peter H Sudmant; Cesare de Filippo; Heng Li; Swapan Mallick; Michael Dannemann; Qiaomei Fu; Martin Kircher; Martin Kuhlwilm; Michael Lachmann; Matthias Meyer; Matthias Ongyerth; Michael Siebauer; Christoph Theunert; Arti Tandon; Priya Moorjani; Joseph Pickrell; James C Mullikin; Samuel H Vohr; Richard E Green; Ines Hellmann; Philip L F Johnson; Hélène Blanche; Howard Cann; Jacob O Kitzman; Jay Shendure; Evan E Eichler; Ed S Lein; Trygve E Bakken; Liubov V Golovanova; Vladimir B Doronichev; Michael V Shunkov; Anatoli P Derevianko; Bence Viola; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo
Journal:  Nature       Date:  2013-12-18       Impact factor: 49.962

10.  Using classical population genetics tools with heterochroneous data: time matters!

Authors:  Frantz Depaulis; Ludovic Orlando; Catherine Hänni
Journal:  PLoS One       Date:  2009-05-14       Impact factor: 3.240

