Burak Yelmen, Mayukh Mondal, Davide Marnetto, Ajai K Pathak, Francesco Montinaro, Irene Gallego Romero, Toomas Kivisild, Mait Metspalu, Luca Pagani.
Abstract
Genetic variation in contemporary South Asian populations follows a northwest to southeast decreasing cline of shared West Eurasian ancestry. A growing body of ancient DNA evidence is being used to build increasingly realistic models of demographic changes in the last few thousand years. Through high-quality modern genomes, these models can be tested for gene- and genome-level deviations. Using local ancestry deconvolution and masking, we reconstructed population-specific surrogates of the two main ancestral components for more than 500 samples from 25 South Asian populations and showed our approach to be robust via coalescent simulations. Our f3 and f4 statistics-based estimates reveal that the reconstructed haplotypes are good proxies for the source populations that admixed in the area and point to complex interpopulation relationships within the West Eurasian component, compatible with multiple waves of arrival, as opposed to a simpler one-wave scenario. Our approach also provides reliable local haplotypes for future downstream analyses. As one such example, the local ancestry deconvolution in South Asians reveals opposite selective pressures on two pigmentation genes (SLC45A2 and SLC24A5) that are common or fixed in West Eurasians, suggesting post-admixture purifying and positive selection signals, respectively.
Keywords: South Asia; ancestry deconvolution; post-admixture selection; skin color
Year: 2019 PMID: 30952160 PMCID: PMC6657728 DOI: 10.1093/molbev/msz037
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. PCA of ARBs (sources: French and Paniya; target: all South Asian populations) and modern populations. In order not to bias the PCA with an excess of South Asian samples, a single haploid ARB_N and ARB_S (colored) was selected for each group based on its proximity to the medians of the PC1 and PC2 values of that group obtained from an initial PCA (supplementary fig. 19, Supplementary Material online, see Materials and Methods). Insets show a zoom on ARB_N and ARB_S samples, respectively.
Fig. 2. qpGraph model of N and S ancestries. The qpGraph reported here places the deconvoluted N and S ancestries (represented here by Gujaratis_N and Irula_S, respectively) within the broader scenario of within-Eurasia splits and subsequent admixtures. Alternative trees involving a different Onge/Han/S split order, or a reverse Steppe_EMBA – S admixture direction, yielded f4 outliers and an overall poorer fit. Final score: 3,061.001, degrees of freedom: 2, no f2 outliers, no f4 outliers, worst f-stat: 1.747.
Fig. 3. Derived frequency differences between source and deconvoluted populations. In the scatter plot, we report the difference in derived frequency between GIH_N and GIH_S against the same difference between the source populations TSI and ITU: all SNPs from 1000 Genomes sequences are included in the cloud. Phenotype-informative SNPs reported in supplementary table 3, Supplementary Material online, are provided as dots. The vertical and horizontal gray lines delimit the top and bottom 2.5% most extreme values in each axis independently. The oblique red and gray lines represent, respectively, the best fit linear regression and the diagonal.
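The comparison described in the caption above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the input layout (one derived-allele frequency per SNP for each of four groups), and the interpolation rule used for the percentile cutoffs are all assumptions made for the example.

```python
from statistics import mean

def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def ols_line(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def compare_frequency_differences(src_a, src_b, dec_a, dec_b, tail=2.5):
    """Hypothetical helper: per-SNP frequency differences on both axes.

    src_a, src_b: derived frequencies in the two source populations (x axis).
    dec_a, dec_b: derived frequencies in the two deconvoluted components (y axis).
    """
    x = [a - b for a, b in zip(src_a, src_b)]
    y = [a - b for a, b in zip(dec_a, dec_b)]
    # Cutoffs are computed on each axis independently, as in the caption.
    x_cut = (percentile(x, tail), percentile(x, 100 - tail))
    y_cut = (percentile(y, tail), percentile(y, 100 - tail))
    slope, intercept = ols_line(x, y)
    return x, y, x_cut, y_cut, slope, intercept
```

SNPs falling outside the cutoffs on the vertical axis but not on the horizontal one are those whose frequency difference between the deconvoluted components is larger than the source populations alone would suggest.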
Top Five Hits for N- and S-Related Selected Regions.
| Component | Position (Chr:Start–End) | Number of Populations with Significant Value | Genes (±50-kb Region) | Number of Ten-SNP Regions |
|---|---|---|---|---|
| N | 3:9,363,925–9,595,374 | 22 (percentile = 99.9949) | | 2 |
| N | 6:84,399,772–85,572,756 | 21 (percentile = 99.9814) | | 2 |
| N | 6:30,079,993–30,257,693 | 21 (percentile = 99.9814) | | 1 |
| N | 14:97,636,701–97,715,909 | 19 (percentile = 99.9383) | | 1 |
| N | 19:23,930,879–24,368,053 | 18 (percentile = 99.9195) | | 1 |
| S | 5:33,944,217–34,032,014 | −21 (percentile = 0.0057) | | 2 |
| S | 20:652,097–694,894 | −16 (percentile = 0.038) | | 1 |
| S | 8:116,208,407–116,308,464 | −16 (percentile = 0.038) | | 1 |
| S | 9:12,276,668–12,460,256 | −15 (percentile = 0.0757) | | 2 |
| S | 8:54,578,044–55,071,319 | −14 (percentile = 0.1268) | | 1 |
Note.—Only populations with admixture imbalance score beyond 2.5 percentile were counted. Positive values represent N excess and negative values represent S excess.
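The counting rule in the note above can be illustrated with a short sketch. It is one plausible reading only: the input layout (a list of per-window admixture imbalance scores for each population), the function names, and the rank-based percentile are assumptions for illustration, and ties are not treated specially.

```python
def rank_percentiles(scores):
    """Percentile rank (0-100) of each score within its own list."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for r, i in enumerate(order):
        # Mid-rank convention: the lowest score sits at 100 * 0.5 / n.
        ranks[i] = 100.0 * (r + 0.5) / n
    return ranks

def count_extreme_populations(scores_by_pop, tail=2.5):
    """Hypothetical helper: per window, count populations in either tail.

    scores_by_pop maps a population name to its per-window scores; all
    lists are assumed to cover the same windows in the same order.
    Returns (n_excess, s_excess), two lists of per-window counts.
    """
    n_windows = len(next(iter(scores_by_pop.values())))
    n_excess = [0] * n_windows
    s_excess = [0] * n_windows
    for scores in scores_by_pop.values():
        # Percentiles are taken within each population's own distribution.
        for w, pct in enumerate(rank_percentiles(scores)):
            if pct >= 100 - tail:
                n_excess[w] += 1   # high score: excess of N haplotypes
            elif pct <= tail:
                s_excess[w] += 1   # low score: excess of S haplotypes
    return n_excess, s_excess
```

Under this reading, a window would be reported with a positive count when populations fall in the upper tail and with a negative count when they fall in the lower tail, matching the sign convention of the table.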
Fig. 4. Admixture imbalance (AHFD) percentile values of ten SNP windows including the rs1426654 (SLC24A5, squares) and rs16891982 (SLC45A2, dots) markers. The window including the SLC45A2 marker is in the bottom (green) percentile of most South Asian populations, hence showing a significant excess of the S haplotypes compared with the genome-wide average per population. The window including the SLC24A5 marker shows instead a more balanced pattern, showing moderate to high scores (orange to purple, hence toward an excess of N haplotypes) in West and North South Asia and orange to green (hence toward a prevalence of S haplotypes) in East and South Asia.