Shyamalika Gopalan, Samuel Pattillo Smith, Katharine Korunes, Iman Hamid, Sohini Ramachandran, Amy Goldberg.
Abstract
Over the past 50 years, geneticists have made great strides in understanding how our species' evolutionary history gave rise to current patterns of human genetic diversity classically summarized by Lewontin in his 1972 paper, 'The Apportionment of Human Diversity'. One evolutionary process that requires special attention in both population genetics and statistical genetics is admixture: gene flow between two or more previously separated source populations to form a new admixed population. The admixture process introduces ancestry-based structure into patterns of genetic variation within and between populations, which in turn influences the inference of demographic histories, identification of genetic targets of selection and prediction of complex traits. In this review, we outline some challenges for admixture population genetics, including limitations of applying methods designed for populations without recent admixture to the study of admixed populations. We highlight recent studies and methodological advances that aim to overcome such challenges, leveraging genomic signatures of admixture that occurred in the past tens of generations to gain insights into human history, natural selection and complex trait architecture. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.
Keywords: admixture; genetic diversity; population genetics
Year: 2022 PMID: 35430881 PMCID: PMC9014191 DOI: 10.1098/rstb.2020.0410
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.671
Figure 1. Ancestry in admixed populations varies at multiple genetic scales, with variance among individuals and within individual genomes. We show examples of global and local ancestry inferred from phased 1000 Genomes Project data for populations of the Americas and Caribbean. Global ancestry was estimated using unsupervised ADMIXTURE analysis, including additional populations of European (Iberians (IBS) and Tuscans in Italy (TSI)) and West African (Esan (ESN), Mandinka (GWD), Mende (MSL) and Yoruba (YRI)) ancestry for reference. We show (a) population-level and (b) individual-level estimates of global ancestry across Mexican ancestry (MXL), Peruvian (PEL), Colombian (CLM), Puerto Rican (PUR), African ancestry (ASW) and Barbadian (ACB) populations; barplots illustrating these estimates for K = 3 were made using pong [6]. (c) Local ancestry as inferred by RFMix [7] for two example individuals (HG01149 and NA19776) who have similar global ancestry proportions, and belong to the CLM and MXL populations, respectively. For these analyses, we retained only SNPs marked ‘PASS’ and removed all individuals noted to have a relative of up to the third degree in the 1000 Genomes Project phase 3 pedigree file, leaving 998 individuals for analysis. We then filtered SNPs for missingness (greater than 5%) and low minor allele frequency (less than 1%) across all populations, and for Hardy–Weinberg disequilibrium (p-value < 0.000001) within populations. For our ADMIXTURE analyses, we also removed SNPs in linkage disequilibrium (using the PLINK command --indep-pairwise 50 10 0.1), which left 698 408 SNPs for analysis. We ran ADMIXTURE unsupervised for K = 3 using the default settings and a random seed. Pong identified a single mode across 30 replicates. To estimate local ancestry, we used the phased genotype dataset filtered for missingness, minor allele frequency and Hardy–Weinberg disequilibrium.
We designated individuals with high levels (over 99%) of global West African (AFR), Amerindigenous (AMR) and European (EUR) ancestry, as determined by our ADMIXTURE analysis, as reference groups for those respective ancestries. We ran RFMix v. 2.03 for the target Colombian and Mexican ancestry individuals using the HapMap GRCh37 genetic map lifted over to GRCh38, a maximum of two expectation-maximization iterations, and otherwise default parameters. (Online version in colour.)
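The reference-designation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `designate_references`, the dictionary layout and the sample IDs are hypothetical, and it assumes the global ancestry proportions have already been read from an ADMIXTURE Q matrix into one tuple per individual.

```python
def designate_references(q_rows, threshold=0.99):
    """Group individuals whose largest ancestry proportion exceeds `threshold`.

    q_rows: dict mapping a sample ID to a tuple of K ancestry proportions
            (one row of an ADMIXTURE Q matrix; layout assumed for illustration).
    Returns a dict mapping component index -> list of sample IDs.
    """
    references = {}
    for sample_id, proportions in q_rows.items():
        # Index of the ancestry component with the largest proportion.
        top = max(range(len(proportions)), key=lambda k: proportions[k])
        # "Over 99%" is read here as strictly greater than the threshold.
        if proportions[top] > threshold:
            references.setdefault(top, []).append(sample_id)
    return references


# Hypothetical K = 3 proportions for four made-up individuals.
q = {
    "s1": (0.995, 0.003, 0.002),
    "s2": (0.400, 0.350, 0.250),
    "s3": (0.001, 0.998, 0.001),
    "s4": (0.990, 0.005, 0.005),  # exactly 0.99, so not "over 99%"
}
refs = designate_references(q)
print(refs)  # {0: ['s1'], 1: ['s3']}
```

Because the ADMIXTURE run is unsupervised, the component indices carry no labels of their own; mapping each index to AFR, AMR or EUR would have to be done by inspecting which reference populations load on which component.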
Figure 2. Ancestry outlier tests for post-admixture selection are underpowered when source differentiation is low. We examine how FST between two source populations at a selected locus affects the power of a local ancestry outlier approach to detect selection. Whole-genome simulations were conducted in SLiM [114]. We simulated 50 sets of 10 000 individuals under a two-way admixture model with equal contributions from the sources, with Population A contributing an allele that is under strong selection (s = 0.05) in the admixed population for 12 generations. For increasing values of FST along the x axis, we plot (a) the proportion of simulations in which the selected locus would be classified as an ‘outlier’ in local ancestry frequency from Population A for multiple genome-wide thresholds, and (b) the rank of the selected locus among all loci genome-wide for ancestry from Population A. Even with relatively strong selection and complete differentiation between source ancestries (i.e. FST = 1) at the selected locus, it frequently failed to appear as a Population A ancestry outlier, potentially because selection had not had long enough to act, resulting in other loci having higher local ancestry frequencies in the population by chance. Similarly, the rank (with all loci ordered by frequency of local ancestry from Population A) of the selected locus increases with increasing differentiation between source populations at the locus. We simulated six diploid individuals per source population, and used the (potentially multiple) allele frequency combinations that produce the five values of FST plotted, specifying that Population A's frequency was equal to or higher than Population B's. From these, we randomly chose a starting allele frequency combination for the source populations for each of the 50 simulations. (Online version in colour.)
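A rough sketch of the quantities in this caption may help. The caption does not state which FST estimator or which outlier thresholds were used, so the code below assumes Wright's FST for two equally weighted populations at a biallelic locus, and a mean plus n standard deviations cutoff; all function names are hypothetical. With six diploid individuals per source there are 12 chromosomes, so allele frequencies are multiples of 1/12.

```python
from statistics import mean, stdev


def fst_two_pops(p_a, p_b):
    """Wright's FST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p_a + p_b) / 2
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                          # locus is monomorphic overall
    h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2  # mean within-pop
    return (h_t - h_s) / h_t


def frequency_pairs_for_fst(target, n_chrom=12, tol=1e-9):
    """Enumerate (p_a, p_b) with p_a >= p_b whose FST matches `target`."""
    pairs = []
    for a in range(n_chrom + 1):
        for b in range(a + 1):              # enforces p_a >= p_b
            if abs(fst_two_pops(a / n_chrom, b / n_chrom) - target) < tol:
                pairs.append((a / n_chrom, b / n_chrom))
    return pairs


def is_outlier(ancestry_freqs, idx, n_sd=3.0):
    """Flag locus `idx` if its Population A ancestry frequency exceeds
    the genome-wide mean by more than `n_sd` standard deviations
    (an assumed threshold, chosen only for illustration)."""
    cutoff = mean(ancestry_freqs) + n_sd * stdev(ancestry_freqs)
    return ancestry_freqs[idx] > cutoff


def rank_of(ancestry_freqs, idx):
    """Rank of locus `idx`, where 1 means the highest ancestry frequency."""
    return 1 + sum(f > ancestry_freqs[idx] for f in ancestry_freqs)


print(frequency_pairs_for_fst(1.0))         # [(1.0, 0.0)]

# Toy genome: 20 neutral loci at 0.5 and one locus (index 20) at 0.9.
freqs = [0.5] * 20 + [0.9]
print(is_outlier(freqs, 20), rank_of(freqs, 20))  # True 1
```

Under these assumptions only one frequency combination gives FST = 1 (fixed in Population A, absent in Population B), whereas intermediate FST values can arise from several combinations, which is consistent with the caption's note that a combination was drawn at random for each simulation. The ranking convention here (1 = highest frequency) is a choice made for the sketch and may differ from the one plotted in the figure.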