
Human genetic admixture through the lens of population genomics.

Shyamalika Gopalan¹, Samuel Pattillo Smith²,³, Katharine Korunes¹, Iman Hamid¹, Sohini Ramachandran²,³,⁴, Amy Goldberg¹.

Abstract

Over the past 50 years, geneticists have made great strides in understanding how our species' evolutionary history gave rise to current patterns of human genetic diversity classically summarized by Lewontin in his 1972 paper, 'The Apportionment of Human Diversity'. One evolutionary process that requires special attention in both population genetics and statistical genetics is admixture: gene flow between two or more previously separated source populations to form a new admixed population. The admixture process introduces ancestry-based structure into patterns of genetic variation within and between populations, which in turn influences the inference of demographic histories, identification of genetic targets of selection and prediction of complex traits. In this review, we outline some challenges for admixture population genetics, including limitations of applying methods designed for populations without recent admixture to the study of admixed populations. We highlight recent studies and methodological advances that aim to overcome such challenges, leveraging genomic signatures of admixture that occurred in the past tens of generations to gain insights into human history, natural selection and complex trait architecture. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.


Keywords: admixture; genetic diversity; population genetics


Year: 2022 PMID: 35430881 PMCID: PMC9014191 DOI: 10.1098/rstb.2020.0410

Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.671

Introduction

In his foundational 1972 study ‘The Apportionment of Human Diversity’, Richard Lewontin demonstrated that the majority of human genetic diversity at a single locus is contained within, rather than between, populations using polymorphism data from a global sample [1]. The field continues to strive to understand the evolutionary processes that shape this important empirical observation. Notably, genomic data have revealed the extent to which one such process—genetic admixture—has been ubiquitous throughout human history and can shape the distribution of human genetic diversity in ways different from those predicted by classic population genetic models [2-5]. Here we focus on admixture as a population-level process, whereby gene flow occurs between previously diverged source populations, producing new populations with ancestry from multiple source populations. We discuss how recent research on genetic admixture has extended our understanding of the distribution of human genetic variation. Beyond the allele-frequency-based summaries of variation studied by Lewontin [1], variation in admixed populations can be summarized based on ancestry from source populations. These ancestry patterns may vary between admixed populations formed by the same source populations, between individuals within an admixed population, and across loci within an admixed individual (figure 1). Population geneticists have long recognized that studying admixed human groups provides opportunities to learn about evolutionary forces [2,3]. Despite this early interest, inclusion of admixed populations in genetic studies is variable by research goal. Whereas the demographic and selective histories of admixed populations are well-studied, phenotypic and medical studies of admixed populations have lagged behind relative to studies of single-ancestry populations. For example, admixed populations are underrepresented in biobank datasets [7,8]. 
The lack of medical genomic samples and the frequent need for admixture-specific methods lead to admixed populations often being excluded from these studies [7-10]. Additionally, in practice, defining admixture in humans is highly context dependent, affected by social structures that influence population or self identification, as well as methodological limits on detecting admixture from genomic data (box 1).

Figure 1

Ancestry in admixed populations varies at multiple genetic scales, with variance among individuals and within individual genomes. We show examples of global and local ancestry inferred from phased 1000 Genomes Project data for populations of the Americas and Caribbean. Global ancestry was estimated using unsupervised ADMIXTURE analysis, including additional populations of European (Iberians (IBS) and Tuscans in Italy (TSI)) and West African (Esan (ESN), Mandinka (GWD), Mende (MSL) and Yoruba (YRI)) ancestry for reference. We show (a) population-level and (b) individual-level estimates of global ancestry across Mexican ancestry (MXL), Peruvian (PEL), Colombian (CLM), Puerto Rican (PUR), African ancestry (ASW) and Barbadian (ACB) populations; barplots illustrating these estimates for K = 3 were made using pong [6]. (c) Local ancestry as inferred by RFMix [7] for two example individuals (HG01149 and NA19776) who have similar global ancestry proportions, and belong to the CLM and MXL populations, respectively. For these analyses, we retained only SNPs marked ‘PASS’ and removed all individuals who were noted to have an up to third degree relative in the 1000 Genomes Project phase 3 pedigree file, leaving 998 individuals for analysis. We then filtered SNPs for missingness (greater than 5%) and low minor allele frequency (less than 1%) across all populations, and Hardy–Weinberg disequilibrium (p-value < 0.000001) within populations. For our ADMIXTURE analyses, we also removed SNPs in linkage disequilibrium (using the PLINK command --indep-pairwise 50 10 0.1), which left 698 408 SNPs for analysis. We ran the ADMIXTURE algorithm for K = 3 unsupervised using the default settings and a random seed. Pong identified a single mode across 30 replicates. To estimate local ancestry, we used the missingness, minor allele frequency and Hardy–Weinberg filtered phased genotype dataset. 
We designated individuals with high levels (over 99%) of global West African (AFR), Amerindigenous (AMR) and European (EUR) ancestry, as determined by our ADMIXTURE analysis, as reference groups for those respective ancestries. We ran RFMix v. 2.03 for the target Colombian and Mexican ancestry individuals using the HapMap GRCh37 genetic map lifted over to GRCh38, a maximum of two expectation-maximization iterations, and otherwise default parameters. (Online version in colour.)

Box 1

Discussions of genetic admixture are, implicitly or explicitly, predicated on the idea that meaningful genetic differences exist between discretized human groups. In practice, the term ‘admixed’ can vary to encompass a range of spatial and temporal processes of gene flow between previously isolated groups. In this review, as noted in §1, we have focused on recent admixture occurring within the past tens of generations. However, extensive gene flow between groups is a hallmark of recent human evolution; when examined in enough detail, nearly all populations can be described as descended from a combination of multiple ancestries. Similar to other discussions on delineating population definitions and boundaries, because there are no strict criteria that determine which populations should be considered admixed from a genetic perspective, classification of a population as admixed is often dependent on the context of the line of inquiry [11-14]. Additionally, while the effects of genetic admixture can be observed in individual genomes, it is conceptualized as a demographic process that acts on populations. For example, for a recent two-way admixture pulse between populations A and B, high variance in these ancestry components across individuals is expected; under neutrality, there will be individuals in the admixed population that derive 0%, and those that derive 100%, of their ancestry from population A (figure 1) [15,16]. 
As demonstrated in the bottom panel of figure 1, any individual locus in a genome from an admixed population can only contribute partially to inference of the demographic history of the admixed group due to the substantial variance in how ancestral diversity is distributed across genomes from an admixed population.

Methodological issues and patterns of genetic diversity are not the only factors shaping our understanding of genetic admixture; a long record of societies’ and scientists’ use of largely superficial characteristics to classify human groups also plays a role. Geneticists and anthropologists have long wrestled with the various field-specific and lay definitions of ‘population’, ‘ancestry’, ‘ethnicity’ and ‘race’, which interact and intersect with each other in complicated ways [11-14,17]. In discussions of human genetic admixture, it becomes especially important to emphasize that these categories do not map onto each other one-to-one, and that race and ethnicity, in particular, describe classifications based on social phenomena. The variation in ancestry within individuals in admixed populations, shown in figure 1, illustrates this and can be an effective tool in illustrating the difference between genetic ancestry, phenotypes and self-identified race and/or ethnicity.

Many population genetics methods and analytical results are based on assumptions about populations that do not hold under recent admixture. Under a model of isolation, metrics of genomic diversity often have well-defined theoretical expectations with respect to fundamental parameters of the population's evolutionary history. However, many of these relationships are unclear, with admixture introducing blocks of linked ancestral haplotypes each with potentially different patterns of variation based on the history of their source populations. 
That is, admixture changes both linkage structure and allele frequency distributions, which is often not accounted for in traditional inference methods developed without consideration of admixture. Studying the ancestry patterns of present-day admixed groups has revealed information about the demographic histories of their source populations, including those that are uncommon in unadmixed form today [18,19]. For example, high-resolution genetic maps have been constructed based on the frequency of estimated local ancestry switchpoints (i.e. where local ancestry changes from one source to another along a single chromosome), which contains information about recombination rates along the genome [20,21]. Admixed genomes have also enabled the discovery of variant–trait associations and improvements in genetic risk prediction models beyond the associations identified and predictions that have been made using the ancestral populations [22-26]. Recent methodological improvements have increased the efficiency and performance of local ancestry calling (i.e. the assignment of genomic segments to their population of origin; some early scalable algorithmic implementations are given in [7,27-29]; figure 1c). These advances have enabled the use of local ancestry patterns in admixed populations to infer demographic history, adaptation and the genetic bases of complex traits.

Here, we consider three inferential problems based on studying patterns of genetic variation produced by admixture: inferring population history, identifying adaptive mutations, and discovering complex trait associations and predicting traits using admixed genomes. We summarize recent progress in the field, highlight as yet unresolved issues, and outline potential avenues of future research on the genetics of admixed populations. We focus on recent admixture between modern human populations, roughly corresponding to admixed populations founded within the last tens of generations; Witt et al. in this issue consider ancient admixture events with archaic humans and their consequences for human genetic variation [31].

Estimating genetic diversity and ancestry in admixed populations

Well before polymorphism data could be generated at a genome-wide scale, several methods of measuring genetic diversity had already been proposed, including heterozygosity and nucleotide diversity [32,33]. By connecting these to theoretical population genetics models, summaries of genetic variation can provide insight into the evolutionary forces acting on populations. However, inferring population history from genetic data is highly dependent on how groups are defined, a choice made by the researchers (box 1). Recent admixture complicates the quantification and analysis of genetic diversity, and can, therefore, affect traditional summaries of diversity in unexpected ways. In his 1972 paper, Lewontin discusses his choices of a genetic diversity measure at some length, ultimately settling on one that is analogous to heterozygosity [1]. Relevant to genetic admixture, Lewontin specifically notes that: ‘a collection of individuals made by pooling two populations ought always to be more diverse than the average of their separate diversities, unless the two populations are identical in composition' (p. 338). In this statement, Lewontin describes expectations of diversity in a set of pooled haplotypes originating from individuals of distinct ancestries, as might result from sampling schemes that combine populations in genetic analysis. This quote also gives insight into how admixture may impact the measures of genetic variation that Lewontin considers. These ideas were revisited in a recent study that explores patterns of heterozygosity in admixed populations [34]. The authors theoretically demonstrate that the heterozygosity of an admixed population is predicted by the heterozygosities of its source populations, the FST between them and the admixture contributions [34]. FST, which has taken the place of entropy partitioning statistics that Lewontin [1] used, can also be informative about the parameters of the admixture process, as Boca and Rosenberg demonstrated [35]. 
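As a minimal single-locus sketch of the heterozygosity relationship described above, consider a biallelic locus with allele frequencies p1 and p2 in two sources and a contribution alpha from source 1. The function names and the example frequencies are illustrative assumptions, and the squared frequency difference stands in for the between-source differentiation term; this is not the exact multi-locus expression derived in [34].

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def admixed_heterozygosity(p1, p2, alpha):
    """Heterozygosity of a population drawing a fraction alpha from source 1.

    Returns a tuple (direct, decomposed):
      direct     : 2p(1-p) evaluated at the mixed allele frequency
      decomposed : weighted mean of the source heterozygosities plus a
                   between-source term 2 * alpha * (1 - alpha) * (p1 - p2)**2
    The two values agree algebraically; both are returned to make that visible.
    """
    p_adm = alpha * p1 + (1.0 - alpha) * p2  # linear mixing of allele frequencies
    direct = heterozygosity(p_adm)
    within = alpha * heterozygosity(p1) + (1.0 - alpha) * heterozygosity(p2)
    between = 2.0 * alpha * (1.0 - alpha) * (p1 - p2) ** 2
    return direct, within + between


if __name__ == "__main__":
    # Hypothetical source frequencies chosen only for illustration.
    print(admixed_heterozygosity(p1=0.9, p2=0.2, alpha=0.5))
```

Because the between-source term is never negative, the mixed population is at least as diverse as the weighted average of its sources, with equality only when p1 equals p2. This is consistent with the statement quoted from Lewontin above.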
These studies illustrate how traditional measures of genetic diversity can be repurposed to improve our understanding of the admixture process. Beyond within- and between-population estimates of genetic diversity and ancestry, admixed populations introduce another class of summaries of genetic variation: tracts of the genome within individuals that originate from each ancestry source [15,17,36]. In figure 1, we illustrate three hierarchical categories of genetic ancestry variation in admixed populations from the 1000 Genomes Project [37] from the Americas, who have African, European and Amerindigenous ancestry. First, given similar continental source ancestries, admixed populations can vary in their average proportions from each source (figure 1a). Second, individuals within an admixed population may vary in their genome-wide, or ‘global’, ancestry proportions (figure 1b). Third, individuals with similar source ancestry contributions and admixture histories may vary by ‘local’ ancestry across genomic loci (figure 1c). At each level, these patterns of diversity contain information about admixture and post-admixture processes. In practice, genetic ancestry of individuals from admixed populations is not fully known and is inferred, often using reference panels that are collated to represent the source populations [4,27-30,38]. In the following sections, we discuss aspects of human evolution that are commonly inferred from patterns of genetic variation in admixed populations, particularly genetic ancestry. The performance of these methods is predicated on accurate estimates of global and local ancestry. The quality of ancestry estimates depends on a variety of sampling and evolutionary scenarios [39]. 
A recent study of the admixed Ashkenazi Jewish population noted that the lack of differentiation between European and Middle Eastern haplotypes made accurate local ancestry inference challenging, reducing their power to infer the parameters of the admixture process [40] and demonstrating the complexity in defining admixed populations, as these populations are often not considered admixed. The authors suggest that these issues might be mitigated by incorporating uncertainty in local ancestry estimates into complex demographic scenarios. Lawson et al. [39] demonstrate multiple avenues for potential over- or misinterpretation of global ancestry estimates from a commonly used suite of model-based methods based on the Pritchard–Stephens–Donnelly model of mixed membership across latent clusters. For example, they found that multiple qualitatively different evolutionary scenarios produced similar global ancestry estimates in the admixed population, and uneven sample sizes between populations may influence ancestry estimates. Notably, many methods, especially for local ancestry, rely on the use of reference panels of modern populations as proxies for the source populations, which may not fully represent the populations that existed at the time of admixture, and have uneven global representation.

Inferring population history

The admixture history of a population, such as the timing and source contribution levels, leaves predictable patterns of genetic variation within and between individuals from the admixed population [15,16,37,41-43]. Empirical genetic analyses can, therefore, be used to infer the histories that produced observed genetic variation. Under a simple admixture scenario, the allele frequency of a locus in the admixed population is expected to be the average of the allele frequencies in source populations weighted by their contribution levels [44-46]. That is, the admixture contribution levels from the sources can be estimated from the allele frequencies of the admixed and source populations. Estimation of ancestry proportions under this model of admixture often relies on identifying a subset of loci with particularly large allele frequency differences between the source populations, known as Ancestry Informative Markers (AIMs) [47]. With further developments in genome sequencing increasing the density of loci across genomes, recent methods often incorporate linkage information or model small allele frequency changes over many loci, producing estimates of global ancestry proportions, as well as local ancestry along an admixed individual's genome [4,27-30,38]. Mechanistic models of admixture complement empirical studies to improve our intuition of admixture dynamics and interpretation of empirical results [15,42,48-51]. Related model-based inference frameworks have been developed to estimate parameters of population history. Patterns of global and local ancestry within and between individuals are informative about admixture histories. For example, over time, recombination tends to break up local ancestry tracts; therefore, longer tracts generally indicate more recent contributions from source populations to an admixed population and may be used to infer the timing of admixture [36,42,52-58]. 
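The weighted-average relationship between admixed and source allele frequencies described earlier in this passage can be sketched as a simple least-squares estimate. This is an illustrative simplification with assumed names and a two-source model. It ignores sampling error and drift, and it is not the procedure of any specific method cited here.

```python
def estimate_admixture_proportion(p_admixed, p_source_a, p_source_b):
    """Least-squares estimate of the contribution alpha from source A.

    Per-locus model: p_admixed = alpha * p_a + (1 - alpha) * p_b,
    which rearranges to (p_admixed - p_b) = alpha * (p_a - p_b).
    """
    numerator = 0.0
    denominator = 0.0
    for p_adm, p_a, p_b in zip(p_admixed, p_source_a, p_source_b):
        delta = p_a - p_b                    # source differentiation at this locus
        numerator += (p_adm - p_b) * delta
        denominator += delta * delta
    if denominator == 0.0:
        raise ValueError("sources have identical allele frequencies at every locus")
    return numerator / denominator
```

Loci with large frequency differences between the sources dominate both sums, which is one way to see why ancestry informative markers carry most of the information in this setting.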
Similarly, as random mating leads to the averaging of ancestry proportions across individuals as they produce the next generation, the variance in global ancestry within the admixed population decreases over time as well [15]. Summaries of variation that are not explicitly based on local or global ancestry, such as linkage disequilibrium, can also be informative of the timing of admixture as populations with differentiated allele frequencies mix. With two-way admixture, high-frequency variants from each source will be strongly correlated with each other in the first-generation admixed population, regardless of their respective locations in the genome and degree of physical linkage. Over time, recombination will erode these correlations to generate a pattern of non-random association of pairs of loci that decay over genomic distance. Several methods leverage these characteristic decay curves to estimate the age of a pulse of admixture [4,59,60], and extensions of these methods infer admixture parameters under models that include continuous gene flow, multiple waves or assortative mating [61-63]. Similarly, sociocultural practices that govern mate choice or sex-specific contributions from the source populations will leave signatures in patterns of genetic ancestry. Individual behaviours such as mating preferences or long-range migration can exhibit ancestry biases in which the ancestry patterns in the subset of the population that migrates are not representative of the whole admixed population, potentially driven by correlations between ancestry and visible traits like skin pigmentation or socioeconomic differences [64-69]. Simple models of admixture often assume that individuals mate randomly; however, admixed human populations show evidence of positive assortative mating, with mating pairs often correlated in global ancestry proportion [67,70-72]. 
Recent methods have sought to test for ancestry-based assortative mating by developing frameworks to infer parental ancestries from phased haplotypes within a single individual [73-75]. When not accounted for, nonrandom mating patterns can bias inference of admixture parameters [62,76]. Additionally, based on the sex-specific inheritance of the X chromosome (where females inherit two copies, one from each parent, while males inherit one X chromosome maternally and their Y chromosome paternally), comparisons of X-chromosomal and autosomal ancestry proportions have been used to infer sex-biased admixture in ancient and modern human populations [49,50,77-80]. These differences in female and male contribution levels from the sources may be indicative of complex social interactions that govern mating behaviors between the admixing human populations, such as dominance structures associated with colonization. Differences in ancestry proportion across the geographical span of a population or populations with shared ancestry components have been used to infer ancestry-biased migration patterns, which may be driven by social cues. For example, ancestry-biased migration, often combined with other mating dynamics, has been proposed as a process shaping regional variation in African ancestry proportions across the USA [65,81,82]. Similarly, temporal changes in ancestry proportion within a population may be caused by time-varying social dynamics. Spear et al. [69] found a significant increase in Amerindigenous ancestry in Mexican American populations over time, potentially owing to differences in ancestry in the migrating population over time and fecundity correlated with ancestry. Sufficiently accounting for these spatial and temporal dynamics of the admixture process presents an exciting challenge. 
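The comparison of X-chromosomal and autosomal ancestry mentioned above can be illustrated with a deliberately simplified calculation. It assumes a single admixture event and uses the long-run weighting in which females carry two-thirds of X chromosomes. The function names are hypothetical, and real analyses need to account for time since admixture, the sex of sampled individuals and estimation uncertainty.

```python
def expected_ancestry(s_f, s_m):
    """Expected ancestry from one source given sex-specific contributions.

    s_f, s_m : fraction of founding females / males drawn from that source.
    Autosomes weight the sexes equally; the X chromosome weights females 2:1
    (an equilibrium approximation, assumed here for illustration).
    """
    autosomal = (s_f + s_m) / 2.0
    x_chromosomal = (2.0 * s_f + s_m) / 3.0
    return autosomal, x_chromosomal


def infer_sex_specific_contributions(autosomal, x_chromosomal):
    """Invert the two linear equations above to recover (s_f, s_m)."""
    s_f = 3.0 * x_chromosomal - 2.0 * autosomal
    s_m = 4.0 * autosomal - 3.0 * x_chromosomal
    return s_f, s_m
```

Estimates outside the interval from 0 to 1 are possible with noisy inputs, which would suggest that this simple model does not fit the data.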
One solution to address admixture processes that vary over space or time involves simulation-based demographic inference frameworks, such as approximate Bayesian computation and machine learning-based approaches. For example, MetHis is an approximate Bayesian computation-based approach for inference under complex two-way admixture models [48,83]. An advantage of simulation-based demographic inference methods over models that use a likelihood is that they can handle arbitrarily complicated admixture scenarios, accommodate any calculable feature of genomic data (such as tracts that are identical by descent (IBD) and runs of homozygosity (ROH)), and even conduct summary-statistic-free inference [84]. Continued work to extend these methods will enable disentangling the myriad of historical, evolutionary and socio-cultural factors contributing to human admixture processes. Studying the genomes of admixed populations can also provide insight into the genetic origins and demographic histories of their founding populations, particularly for source ancestries that are no longer commonly represented by an extant single-ancestry population [18,19]. An increasingly popular approach is to first estimate local ancestry, then separately apply classic single-population methods on the subsets of the genome that are inferred to be from each source. This is exemplified by the ancestry-specific PCA (ASPCA) method, which performs PCA separately for each contributing source ancestry, as identified by local ancestry inference methods. This approach has revealed previously unappreciated variation in the European and Amerindigenous ancestry sources of admixed Latinos across Mexico [85], the Caribbean [86] and South America [87]. Local ancestry inference can also be used to unravel source-specific historical population size dynamics. The process of admixture often involves bottlenecks at the time of founding, the timing and strength of which Browning et al. [88] demonstrated can be inferred using ancestry-specific IBD. This approach combines estimates of local ancestry and IBD for admixed groups to estimate the past effective population sizes of each of the source ancestries. They found ancestry-specific population size changes, including variable bottleneck severity. Moving forward, combining ancestry-based inference with patterns of homozygosity and IBD may help elucidate these complex and dynamic population histories. For example, homozygosity and IBD are shaped by the relationships between mating pairs, which are in turn influenced by sociocultural processes [65,67,86,89,90]. However, we lack theoretical expectations for the distributions of ROH and IBD segments after admixture, which may break up local patterns of homozygosity while also involving major changes in genome-wide variation due to the mixing of previously isolated populations. Recent empirical explorations suggest that, in particular, ROH in admixed populations reflect both contributions from source populations and post-admixture population dynamics.
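As a worked illustration of the linkage disequilibrium decay curves discussed earlier in this section, the sketch below fits a single exponential to a decay curve. It assumes a single admixture pulse, under which admixture LD between loci separated by d Morgans is commonly approximated as A · exp(−t · d) for t generations since admixture. The function name and the synthetic inputs are assumptions for illustration. Published methods use weighted statistics and more careful fitting than this log-linear regression.

```python
import math


def estimate_admixture_time(distances_morgans, ld_values):
    """Fit LD(d) = A * exp(-t * d) by least squares on log(LD).

    distances_morgans : genetic distances between locus pairs, in Morgans
    ld_values         : strictly positive admixture-LD values at those distances
    Returns (t, A): the decay rate in generations and the fitted amplitude.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]  # linearize the exponential
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)
```

Continuous gene flow or multiple waves would produce a mixture of exponentials rather than a single one, so a poor fit of this simple curve may itself be informative.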

Detecting selection

Adaptation to biotic and abiotic environments leaves signatures in patterns of human genetic variation that can be used to identify adaptive loci and infer their selection history [91-94]. However, admixture can confound this inferential process and obscure the detection of genomic targets of selection by producing genetic signatures that are classically interpreted as signatures of selection [95-98]. Additionally, the long-range geographical movement of people associated with recent admixture may introduce novel selective pressures. Under certain scenarios, selection may indeed be easier to detect in admixed populations than in single-ancestry populations with the additional information provided by ancestry patterns [99-102]. That is, inferring selection from admixed genomes poses unique challenges, but also opportunities for new insights into human adaptation. As described previously, admixed populations are often considered as a linear combination of their sources such that the expected allele frequency of a locus is an average of the allele frequencies in each source population at that locus weighted by their proportional contribution to the admixed population. Loci that dramatically differ from this expectation are candidates for loci under selection (reviewed in Adams & Ward [45], and Chakraborty [3]). Outlier methods have been used to detect selection with a variety of summary statistics in single-ancestry populations, including early work by Lewontin and Krakauer [103], and more recently, IBD or ROH. However, non-equilibrium demographic processes such as bottlenecks and gene flow can change the distribution of these statistics across the genome, leading to false positives or complicating interpretation of these outlier methods [104-107]. When using methods not specifically developed for admixed populations, admixture can lead to both increased false-positive rates and decreased power to detect both pre-admixture selection (i.e. selection that happened in the source populations) and post-admixture selection [96]. Recent methods often leverage ancestry information to detect post-admixture adaptation, independently based on ancestry distributions, or in combination with other classic summary statistics [99-102,108-110]. When selective pressures are shared between admixed populations and one of their sources, admixture-mediated adaptation may occur through contributions of an adaptive allele from that source population. This may be a particularly rapid mode of adaptation because the allele is often introduced into the admixed population at intermediate to high frequency (proportional to the admixture contribution from that source), decreasing stochastic loss. If the adaptive allele is common in one source population but rare in the other(s), then as that allele rises in frequency in the admixed population, so will the corresponding local ancestry at that locus. This observation has led to a common method to detect post-admixture selection: scanning for outliers in local ancestry compared to genome-wide ancestry. Empirical studies have identified numerous candidate regions under selection post-admixture using ancestry outlier methods [108,110-112]; however, this approach has several limitations. The distribution of local ancestry within a population is influenced by a complex interplay of selective and demographic histories, and current theoretical understanding is limited, making the choice of cutoff for identifying outliers somewhat arbitrary [113]. More fundamentally, an ancestry-outlier approach is only suitable in situations where the allele frequencies in the source populations differ substantially, which couples allele frequency changes with a single source's ancestry. 
In figure 2 we demonstrate this coupling by simulating admixture with equal contributions from two sources, followed by 12 generations of strong selection (s = 0.05) at the adaptive locus; the proportion of simulations in which the adaptive locus is an outlier increases with increasing FST between the sources. Additionally, the power of outlier approaches to localize adaptive loci depends on the length of the admixture tract containing the locus, and therefore the selection history. Finally, while useful for identifying adaptive loci, these methods must be combined with other information or simulations to infer parameters of the population's history such as the strength or timing of selection.
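The local ancestry outlier scan described in this section can be sketched in a few lines. The function name, the input format (one population-level ancestry frequency per locus) and the default cutoff of three standard deviations are assumptions for illustration only. As noted above, the choice of cutoff is somewhat arbitrary, and this sketch does not model the demographic processes that also shape the ancestry distribution.

```python
import statistics


def ancestry_outliers(local_ancestry_freq, n_sd=3.0):
    """Return indices of loci whose local ancestry frequency deviates from
    the genome-wide mean by more than n_sd standard deviations.

    local_ancestry_freq : per-locus frequency of ancestry from one source,
                          averaged over sampled haplotypes
    n_sd                : hypothetical cutoff in units of standard deviation
    """
    mean = statistics.mean(local_ancestry_freq)
    sd = statistics.pstdev(local_ancestry_freq)  # spread across loci
    if sd == 0.0:
        return []  # no variation across loci, so nothing can be an outlier
    return [
        i
        for i, freq in enumerate(local_ancestry_freq)
        if abs(freq - mean) > n_sd * sd
    ]
```

A flagged locus is only a candidate: neighbouring loci on the same ancestry tract will tend to be flagged together, and drift alone can produce deviations of this size.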

Figure 2

Ancestry outlier tests for post-admixture selection are underpowered when source differentiation is low. We examine how FST between two source populations at a selected locus affects the power of a local ancestry outlier approach to detect selection. Whole-genome simulations were conducted in SLiM [114]. We simulated 50 sets of 10 000 individuals under a two-way admixture model with equal contributions from the sources, with Population A contributing an allele that is under strong selection (s = 0.05) in the admixed population for 12 generations. For increasing values of FST along the x axis, we plot (a) the proportion of simulations in which the selected locus would be classified as an ‘outlier’ in local ancestry frequency from Population A for multiple genome-wide thresholds, and (b) the rank of the selected locus among all loci genome-wide for ancestry from Population A. Even with relatively strong selection and complete differentiation between source ancestries (i.e. FST = 1) at the selected locus, it frequently failed to appear as a Population A ancestry outlier, potentially because selection had not had long enough to act, resulting in other loci having higher local ancestry frequencies in the population by chance. Similarly, the rank (with all loci ordered by frequency of local ancestry from Population A) of the selected locus increases with increasing differentiation between source populations at the locus. We simulated 6 diploid individuals per source population, and use the (potentially multiple) allele frequency combinations that produce the five values of FST plotted, specifying that Population A's frequency was equal to or higher than Population B's. From these, we randomly chose a starting allele frequency combination for the source populations for each of the 50 simulations. (Online version in colour.) 
Ongoing work extends initial implementations of ancestry-outlier approaches to study post-admixture selection, and often uses simulations to improve interpretation and test power [99-102,115,116]. These methods have recovered classic examples of selected loci from the genomes of admixed populations and inferred the timing, strength and repeatability of selection under different scenarios. For example, our work in Hamid et al. [100] found signatures of adaptation to malaria via the DARC gene in the admixed population of Cabo Verde based on long, high-frequency African ancestry tracts. Hamid et al. [100] further used simulation-based inference to estimate the strength of selection. This study's findings reinforced others that have identified post-admixture selection pressure to retain African ancestry at DARC, a known malaria susceptibility locus, in multiple admixed populations on multiple timescales [99,112,117-119]. The study also provides an example of combining ancestry-specific summary statistics with simulations to both localize selection and infer parameters of the selection history. While these recent studies using empirically driven summary statistics have proven informative in certain scenarios, more work is needed to develop expectations of the distributions of ancestry under models of selection with admixture. Indeed, recent work has suggested perhaps unexpected relationships between ancestry tract lengths, allele frequencies and selection history, emphasizing the need for additional theory [120].

Understanding complex trait architecture and predicting genetic traits

For decades, human genetics research has aspired to make personalized medical therapies a reality by improving the prediction of traits from genetic data. Although the genetic prediction of traits has progressed in recent years, its benefits for personalized medicine may currently be limited to individuals of European ancestry [121]. Genome-wide association studies (GWAS), in which variants across the genome are tested individually for statistical association with a trait of interest, have been the standard framework for studying the genetic basis of complex traits for over 15 years. GWAS have also formed the statistical foundation for polygenic scores (PGS), in which complex quantitative traits (e.g. height or cholesterol level) are predicted under Fisher's infinitesimal model using the sum of an individual's observed genotypes weighted by GWAS-inferred effect sizes. Admixture complicates the identification of genetic underpinnings of complex traits. For example, GWAS generally assume that there are no systematic differences in the genetic variation of the study cohort except in those variants that underlie the trait of interest. Yet patterns of ancestry vary widely across individuals within an admixed population, both at the genome-wide level and within regions of the genome, as shown in figure 1. Local ancestry block structures induced by admixture processes cannot be controlled for using genome-wide ancestry (e.g. principal components) as covariates, as is standard practice in GWAS, and as a result GWAS of admixed populations may have inflated error rates [10,22,122,123]. That is, admixture introduces complex population structure and linkage blocks that, if unaccounted for, can produce false-positive variant–trait associations. Recent research has shown that variant-level effect sizes on a given trait estimated from GWAS tend to be ancestry- or even study-specific [124-127]. 
This severely limits the ability to use effect sizes estimated in a sample from one ancestry to predict trait levels in a sample from a different ancestry, which generally results in poor trait prediction accuracy for individuals who were not part of the discovery GWAS, even if from the same ancestry [69,124,126,128]. Increasingly, research suggests that by excluding individuals from admixed populations (as well as from non-admixed minority populations), geneticists are discarding a rich source of genomic information [26]. PGS accuracy could be improved with more comprehensive sequencing of cohorts of non-European ancestry [7,125,129,130], but this must be coupled with new methods tailored to admixed populations and the patterns of linkage disequilibrium and allele frequency variation that arise from their population histories (see also Fish et al. [131]). Furthermore, source ancestry contributions to admixed populations and their dynamics within admixed populations can change over time, leading to temporal variation in effect size estimates [69]. All of these factors can contribute to a loss of predictive power in individuals of admixed ancestry, even when accounting for local ancestry and using high-quality effect size estimates for all source ancestries [132,133]. In an effort to address these challenges for predicting traits in admixed populations, new frameworks are being developed to improve the performance of PGS in individuals from admixed populations, such as including local ancestry-based principal components to correct for heterogeneous patterns of population structure along the genome or subdividing the cohort by genome-wide ancestry and taking a meta-analysis approach [121].
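The PGS definition given above, a sum of observed genotypes weighted by GWAS-inferred effect sizes, can be written out as a minimal sketch. The function name `polygenic_score` and the toy genotype and effect-size values are hypothetical and chosen only for illustration. The sketch assumes additive effects and genotypes coded as counts of the effect allele (0, 1 or 2), and it does not model ancestry-specific effect sizes.

```python
def polygenic_score(genotypes, effect_sizes):
    """Return sum_j beta_j * g_j for one individual.

    genotypes: effect-allele counts (0, 1 or 2) at each variant.
    effect_sizes: per-variant effect estimates from a discovery GWAS.
    """
    if len(genotypes) != len(effect_sizes):
        raise ValueError("genotypes and effect_sizes must have equal length")
    return sum(beta * g for g, beta in zip(genotypes, effect_sizes))


# Toy example (made-up values): three variants, one individual.
effect_sizes = [0.5, -0.25, 1.0]
individual = [2, 1, 0]
score = polygenic_score(individual, effect_sizes)  # 0.5*2 - 0.25*1 + 1.0*0 = 0.75
```

Because the score is a fixed linear combination, any mismatch between the discovery cohort's effect sizes and those appropriate for the target individual's ancestry carries directly into the prediction, which is one way to see why the transferability problems described above matter.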

Conclusion

Though it was not the focus of his paper, Lewontin [1] acknowledged the role of admixture in shaping distributions of genetic variation and included admixed populations in his analyses (see also box 1). In the intervening fifty years, population genetics research has continued to shed light on the importance of admixture processes for genetic variation and complex trait architectures. In certain scenarios, studying admixed populations may provide insight into general human evolutionary processes (for example, recombination as in [20,21]) and history beyond admixture itself because of the added information from ancestry-based statistics. Multiple future directions in research on admixture will extend our understanding of human evolution and the distribution of human genetic variation. First, there is a need for more theory regarding how natural selection interacts with admixed population histories (but see [120]). Figure 2, as well as multiple recent studies [101,102], show that common summary statistics to detect selection in admixed populations have variable power and often unclear interpretations. Moving beyond simple implementations of ancestry-outlier approaches, which provide a list of candidate loci, may also be useful for developing methods to infer the selective history of adaptive loci. Second, the study of admixed populations is often based on contrasting genetic variation in admixed populations against that of reference populations for source ancestries, even if accurate references are not available. Reference-free methods have proven useful for estimating global ancestry, for example in unsupervised implementations of ADMIXTURE and STRUCTURE, yet remain rare for local ancestry assignment (but see [134]). Finally, methods have thus far focused on positive selection primarily at single loci, and more work is needed to study other directions or genetic architectures under selection, such as background and polygenic selection. 
An important step for interpreting signals of adaptation is understanding the genetic basis of traits. Towards this goal, multiple recent studies have focused on methods for predicting quantitative traits in admixed populations [69,125,128,129,135], and offer new insight into how admixture linkage disequilibrium specifically confounds the identification of shared genetic associations. Prioritization of sampling from admixed populations for association studies would increase power to accurately estimate effect sizes for these groups rather than relying on GWAS results from proxies for their ancestral sources [121,122,129].

123 in total

1. On the number of segregating sites in genetical models without recombination.

Authors: G A Watterson
Journal: Theor Popul Biol Date: 1975-04 Impact factor: 1.570

2. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference.

Authors: Brian K Maples; Simon Gravel; Eimear E Kenny; Carlos D Bustamante
Journal: Am J Hum Genet Date: 2013-08-01 Impact factor: 11.025

3. Inference of multiple-wave population admixture by modeling decay of linkage disequilibrium with polynomial functions.

Authors: Y Zhou; K Yuan; Y Yu; X Ni; P Xie; E P Xing; S Xu
Journal: Heredity (Edinb) Date: 2017-02-15 Impact factor: 3.821

4. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America.

Authors: Etienne Patin; Marie Lopez; Rebecca Grollemund; Paul Verdu; Christine Harmant; Hélène Quach; Guillaume Laval; George H Perry; Luis B Barreiro; Alain Froment; Evelyne Heyer; Achille Massougbodji; Cesar Fortes-Lima; Florence Migot-Nabias; Gil Bellis; Jean-Michel Dugoujon; Joana B Pereira; Verónica Fernandes; Luisa Pereira; Lolke Van der Veen; Patrick Mouguiama-Daouda; Carlos D Bustamante; Jean-Marie Hombert; Lluís Quintana-Murci
Journal: Science Date: 2017-05-05 Impact factor: 47.728

5. Mathematical model for studying genetic variation in terms of restriction endonucleases.

Authors: M Nei; W H Li
Journal: Proc Natl Acad Sci U S A Date: 1979-10 Impact factor: 11.205

6. The transmission/disequilibrium test: history, subdivision, and admixture.

Authors: W J Ewens; R S Spielman
Journal: Am J Hum Genet Date: 1995-08 Impact factor: 11.025

7. Inference on admixture fractions in a mechanistic model of recurrent admixture.

Authors: Erkan Ozge Buzbas; Paul Verdu
Journal: Theor Popul Biol Date: 2018-03-28 Impact factor: 1.570

8. Strong selection during the last millennium for African ancestry in the admixed population of Madagascar.

Authors: Denis Pierron; Margit Heiske; Harilanto Razafindrazaka; Veronica Pereda-Loth; Jazmin Sanchez; Omar Alva; Amal Arachiche; Anne Boland; Robert Olaso; Jean-Francois Deleuze; Francois-Xavier Ricaut; Jean-Aimé Rakotoarisoa; Chantal Radimilahy; Mark Stoneking; Thierry Letellier
Journal: Nat Commun Date: 2018-03-02 Impact factor: 14.919

9. Length Distribution of Ancestral Tracks under a General Admixture Model and Its Applications in Population History Inference.

Authors: Xumin Ni; Xiong Yang; Wei Guo; Kai Yuan; Ying Zhou; Zhiming Ma; Shuhua Xu
Journal: Sci Rep Date: 2016-01-28 Impact factor: 4.379