Burak Yelmen1,2, Davide Marnetto1, Ludovica Molinaro1,2, Rodrigo Flores1, Mayukh Mondal1, Luca Pagani1,3.
Abstract
Detecting natural selection signals in admixed populations can be problematic, since the source of the signal typically predates the admixture event. On the one hand, developments in ancient DNA (aDNA) over the last decade now make it possible to study various source populations as they were before a particular admixture event. On the other hand, aDNA availability is limited to certain geographical regions, and the sample sizes and data quality may not be sufficient for selection analysis in many cases. In this study, we explore possible ways to improve the detection of pre-admixture signals in admixed populations using a local ancestry inference approach. We used masked haplotypes for the population branch statistic (PBS), and full haplotypes constructed following our approach from Yelmen et al. (2019) for cross-population extended haplotype homozygosity (XP-EHH), and used forward simulations to test the power of our analysis. The PBS results on simulated data showed that using masked haplotypes obtained from ancestry deconvolution, instead of the admixed population itself, might improve detection quality. In contrast, XP-EHH performed better on the admixed population than with the local ancestry method. We also report a correlation between the XP-EHH scores of source and admixed populations, suggesting that haplotype-based approaches must be used cautiously for recently admixed populations. Finally, we performed PBS on real South Asian populations masked with local ancestry deconvolution and report here the first possible selection signals on the autochthonous South Asian component of contemporary South Asian populations.
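For readers unfamiliar with PBS, the sketch below shows the usual construction: pairwise Fst values are converted into divergence times T = −log(1 − Fst), and the branch specific to the focal population is (T_AB + T_AC − T_BC) / 2. The abstract does not specify the Fst estimator, so the plug-in Hudson-style estimator, the function names, and the allele frequencies here are illustrative assumptions; in the masked setting the focal frequencies would come from the ancestry-masked haplotypes, which is likewise assumed rather than taken from the paper's code.

```python
import math


def hudson_fst(p1: float, p2: float) -> float:
    """Plug-in Hudson-style Fst for one biallelic SNP (assumed estimator).

    Equals (p1 - p2)^2 / (p1(1 - p2) + p2(1 - p1)), with no sample-size correction.
    """
    between = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    if between == 0.0:  # both populations fixed for the same allele
        return 0.0
    return (p1 - p2) ** 2 / between


def branch_length(fst: float, eps: float = 1e-6) -> float:
    """Transform Fst into a divergence time T = -log(1 - Fst)."""
    fst = min(max(fst, 0.0), 1.0 - eps)  # cap so fixed differences stay finite
    return -math.log(1.0 - fst)


def pbs(p_focal: float, p_ref1: float, p_ref2: float) -> float:
    """PBS of the focal population A against references B and C."""
    t_ab = branch_length(hudson_fst(p_focal, p_ref1))
    t_ac = branch_length(hudson_fst(p_focal, p_ref2))
    t_bc = branch_length(hudson_fst(p_ref1, p_ref2))
    return (t_ab + t_ac - t_bc) / 2.0


# Hypothetical frequencies: the focal population has drifted far from two
# references that are close to each other, so its branch is long.
print(round(pbs(0.9, 0.2, 0.25), 3))
```

A long focal branch relative to the reference pair is what PBS treats as a candidate selection signal; identical frequencies in all three populations give a PBS of zero.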
Keywords: PBS; South Asia; XP-EHH; admixed populations; local ancestry inference; natural selection
Year: 2021 PMID: 33638983 PMCID: PMC8046333 DOI: 10.1093/gbe/evab039
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. SNP-by-SNP PBS comparison using Spearman's correlation for SNPs with PBS values above the 99% threshold (SNPs selected based on Han and French source population scores). (a) Han vs naive (correlation coefficient: 0.409, 95% confidence interval: 0.377–0.441, p-value <2.2e−16, n = 2,529), Han vs MASK_S_ELAI (correlation coefficient: 0.510, 95% confidence interval: 0.479–0.542, p-value <2.2e−16, n = 2,529), and Han vs MASK_S_PCAdmix (correlation coefficient: 0.210, 95% confidence interval: 0.169–0.252, p-value <2.2e−16, n = 2,529). (b) French vs naive (correlation coefficient: 0.400, 95% confidence interval: 0.367–0.436, p-value <2.2e−16, n = 2,530), French vs MASK_N_ELAI (correlation coefficient: 0.452, 95% confidence interval: 0.422–0.483, p-value <2.2e−16, n = 2,530), and French vs MASK_N_PCAdmix (correlation coefficient: 0.394, 95% confidence interval: 0.354–0.432, p-value <2.2e−16, n = 2,530).
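The correlation coefficients in Fig. 1 are Spearman's rho, i.e. the Pearson correlation of the ranks of the two PBS vectors. A minimal sketch follows, assuming tied scores receive the mean of their ranks. The caption does not say how the confidence intervals were obtained, so the Fisher z-transform interval below is only one common approximation and is not claimed to be the authors' method; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


def fisher_z_ci(rho: float, n: int,
                z_crit: float = 1.959963984540054) -> Tuple[float, float]:
    """Approximate 95% CI via the Fisher z-transform (assumes |rho| < 1, n > 3)."""
    z = math.atanh(rho)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical per-SNP PBS scores in a source and in a masked population.
source_pbs = [0.12, 0.40, 0.33, 0.95, 0.51]
masked_pbs = [0.10, 0.38, 0.41, 0.80, 0.47]
print(spearman(source_pbs, masked_pbs))
```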
Comparison of Possible Selection Signals Based on Mean PBS Scores for 50-kb Windows (Windows above the 99.5% Threshold Noted as Positive), with TP (True Positives), FP (False Positives), FN (False Negatives), TPR (Measuring the Fraction of Correctly Identified Positives), and FDR (Measuring the Expected Fraction of FPs) Indicators for Each Tested Population (Admixed Naive along with PCAdmix- and ELAI-Masked Populations; see Materials & Methods for Details) Compared with the True Source Population
| Population | True source | TP | FP | FN | TPR | FDR |
|---|---|---|---|---|---|---|
| Naive | Han | 25 | 39 | 29 | 0.46 | 0.61 |
| MASK_S_ELAI | Han | 27 | 23 | 27 | 0.50 | 0.46 |
| MASK_S_PCAdmix | Han | 25 | 18 | 29 | 0.46 | 0.42 |
| Naive | French | 50 | 33 | 60 | 0.45 | 0.40 |
| MASK_N_ELAI | French | 41 | 21 | 69 | 0.37 | 0.34 |
| MASK_N_PCAdmix | French | 68 | 38 | 42 | 0.62 | 0.36 |
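The indicators in the table follow the standard definitions TPR = TP / (TP + FN) and FDR = FP / (TP + FP); for example, the first Naive row gives 25 / (25 + 29) ≈ 0.46 and 39 / (25 + 39) ≈ 0.61, matching the table. The sketch below shows one way to obtain such counts, assuming non-overlapping 50-kb windows anchored at position 0 and an empirical 99.5% quantile with linear interpolation; the caption does not state the windowing or quantile conventions, so these choices and all function names are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Iterable, Sequence, Set, Tuple


def window_means(positions: Sequence[int], scores: Sequence[float],
                 window_bp: int = 50_000) -> Dict[int, float]:
    """Mean per-SNP score in non-overlapping windows, keyed by window index."""
    bins: Dict[int, list] = {}
    for pos, score in zip(positions, scores):
        bins.setdefault(pos // window_bp, []).append(score)
    return {w: mean(vals) for w, vals in bins.items()}


def quantile(values: Iterable[float], q: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    idx = q * (len(xs) - 1)
    lo, hi = math.floor(idx), math.ceil(idx)
    return xs[lo] + (xs[hi] - xs[lo]) * (idx - lo)


def positive_windows(means: Dict[int, float], q: float = 0.995) -> Set[int]:
    """Windows whose mean score lies above the q-th quantile of all windows."""
    cutoff = quantile(means.values(), q)
    return {w for w, m in means.items() if m > cutoff}


def confusion(predicted: Set[int], truth: Set[int]) -> Tuple[int, int, int]:
    """TP, FP, FN, where 'truth' holds the windows positive in the true source."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    return tp, fp, fn


def tpr(tp: int, fn: int) -> float:
    return tp / (tp + fn)  # fraction of true positives that were recovered


def fdr(tp: int, fp: int) -> float:
    return fp / (tp + fp)  # fraction of positive calls that are false
```

In use, `truth` would be `positive_windows(window_means(...))` computed on the source population and `predicted` the same call on the naive or masked admixed population.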
Fig. 2. PBS on masked South Asian genomes (MASK_S), with the dashed line marking the 99.9% threshold. Genes within 50 kb of the highest peaks are annotated.
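The annotation step described for Fig. 2 can be sketched as: keep SNPs whose PBS exceeds the 99.9% empirical quantile, then report genes overlapping a ±50 kb interval around each one. The caption does not say whether the threshold was applied to SNPs or windows, nor which gene annotation was used, so the `Gene` record, the interval-overlap rule, and the gene names in the example are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # gene start coordinate (inclusive)
    end: int    # gene end coordinate (inclusive)


def quantile(values: Sequence[float], q: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    idx = q * (len(xs) - 1)
    lo, hi = math.floor(idx), math.ceil(idx)
    return xs[lo] + (xs[hi] - xs[lo]) * (idx - lo)


def annotate_peaks(snps: Sequence[Tuple[str, int, float]], genes: Sequence[Gene],
                   q: float = 0.999,
                   flank_bp: int = 50_000) -> List[Tuple[str, int, List[str]]]:
    """For each SNP above the q-th PBS quantile, list genes within +/- flank_bp."""
    cutoff = quantile([score for _, _, score in snps], q)
    hits = []
    for chrom, pos, score in snps:
        if score <= cutoff:
            continue
        lo, hi = pos - flank_bp, pos + flank_bp
        # A gene is "nearby" if its span overlaps [lo, hi] on the same chromosome.
        nearby = sorted(g.name for g in genes
                        if g.chrom == chrom and g.start <= hi and g.end >= lo)
        hits.append((chrom, pos, nearby))
    return hits
```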