Caitlin Uren, Eileen G Hoal, Marlo Möller.
Abstract
BACKGROUND: Global and local ancestry inference in admixed human populations can be performed using computational tools implementing distinct algorithms. These tools have been developed, and their accuracy tested, largely on populations with relatively straightforward admixture histories, but little is known about how well they perform in more complex admixture scenarios.
Keywords: ADMIXTURE; Local ancestry inference; Population genetics; RFMix; South Africa
Year: 2020 PMID: 32264823 PMCID: PMC7140372 DOI: 10.1186/s12863-020-00845-3
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Average admixture proportions
| Ancestry | Previously Reported (Uren et al. 2016) (%) | Simulation (%) | ADMIXTURE (unsupervised) (%) | ADMIXTURE (supervised) (%) | RFMix (%) |
|---|---|---|---|---|---|
| Bantu-speaking African | 32 | 26 (95% CI: 25–28) | 33 (95% CI: 32–35) | 31 (95% CI: 30–33) | 27 (95% CI: 26–30) |
| KhoeSan | 30 | 33 (95% CI: 31–36) | 25 (95% CI: 23–27) | 34 (95% CI: 31–37) | 33 (95% CI: 30–36) |
| European | 19 | 23 (95% CI: 21–25) | 26 (95% CI: 24–29) | 21 (95% CI: 19–24) | 22 (95% CI: 20–24) |
| East Asian | 7 | 6 (95% CI: 5–9) | 7 (95% CI: 5–9) | 6 (95% CI: 5–8) | 6 (95% CI: 5–9) |
| South East Asian | 12 | 12 (95% CI: 10–15) | 9 (95% CI: 8–12) | 8 (95% CI: 7–11) | 12 (95% CI: 10–14) |
Fig. 1 Comparison between observed global ancestry proportions and “true” proportions, showing that RFMix performs more accurately than ADMIXTURE in ancestry determination. Admixture proportions calculated by ADMIXTURE are in red (unsupervised) and black (supervised), and those calculated by RFMix are in blue. Root mean square errors for every comparison are shown.
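The root mean square error mentioned in the Fig. 1 caption can be illustrated with the averages in the table above. This is a minimal sketch that treats the Simulation column as the "true" values and compares each tool's column against it. It works on the five population-level averages only, which is an assumption made for illustration; the figure itself may be based on per-individual estimates, so these numbers should not be read as the values plotted there. The function name `rmse` and the variable names are hypothetical.

```python
import math

# Average admixture proportions (%) copied from the table above, in the order:
# Bantu-speaking African, KhoeSan, European, East Asian, South East Asian.
simulation = [26, 33, 23, 6, 12]  # treated here as the "true" proportions
estimates = {
    "ADMIXTURE (unsupervised)": [33, 25, 26, 7, 9],
    "ADMIXTURE (supervised)": [31, 34, 21, 6, 8],
    "RFMix": [27, 33, 22, 6, 12],
}


def rmse(estimated, truth):
    """Root mean square error between two equal-length lists of proportions."""
    if len(estimated) != len(truth):
        raise ValueError("estimated and truth must have the same length")
    squared_errors = [(e - t) ** 2 for e, t in zip(estimated, truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# RMSE of each tool's averages against the simulated averages.
rmse_by_tool = {tool: rmse(values, simulation) for tool, values in estimates.items()}

for tool, value in rmse_by_tool.items():
    print(f"{tool}: RMSE = {value:.2f} percentage points")
```

On these averages RFMix has the smallest error (about 0.63 percentage points, against about 3.03 for supervised and 5.14 for unsupervised ADMIXTURE), which agrees with the ordering described in the caption.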
Fig. 2 Boxplot showing the robustness of RFMix when using inaccurate time since admixture estimates. Times since admixture of 10 (red), 15 (green) and 20 (blue) generations are shown. The median (bold horizontal line) and the upper and lower quartiles are shown. Data outside this range are plotted as outliers. The differences in accuracies across generations for each ancestry were assessed using a Wilcoxon non-parametric test. All statistically significant p values (< 0.01) are shown.
Fig. 3 Boxplot showing the accuracy with which RFMix assigns an ancestral origin to a genetic region, stratified by reference population. The median (bold horizontal line) and the upper and lower quartiles are shown. Data outside this range are plotted as outliers. The differences in accuracies across ancestries were assessed using a Wilcoxon non-parametric test. All statistically significant p values (< 0.01) are shown.
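The caption does not spell out how accuracy is computed, so the following is only a sketch under a stated assumption: accuracy for an ancestry is taken as the fraction of positions with that true (simulated) ancestry to which the tool assigns the same label. The function `accuracy_by_ancestry` and the toy label lists are hypothetical and are not data from the study.

```python
from collections import defaultdict


def accuracy_by_ancestry(true_labels, inferred_labels):
    """Per-ancestry accuracy: share of positions of each true ancestry called correctly."""
    if len(true_labels) != len(inferred_labels):
        raise ValueError("label lists must have the same length")
    totals = defaultdict(int)   # positions observed per true ancestry
    correct = defaultdict(int)  # positions where the inferred label matches
    for truth, call in zip(true_labels, inferred_labels):
        totals[truth] += 1
        if truth == call:
            correct[truth] += 1
    return {ancestry: correct[ancestry] / totals[ancestry] for ancestry in totals}


# Hypothetical toy haplotype (illustrative labels only, not study data).
true_labels = [
    "KhoeSan", "KhoeSan", "KhoeSan", "KhoeSan",
    "European", "European",
    "East Asian", "East Asian",
]
inferred_labels = [
    "KhoeSan", "KhoeSan", "KhoeSan", "European",
    "European", "European",
    "East Asian", "European",
]

result = accuracy_by_ancestry(true_labels, inferred_labels)
for ancestry, accuracy in result.items():
    print(f"{ancestry}: {accuracy:.2f}")
```

Computing one such value per simulated individual and per ancestry would give the kind of distribution that a boxplot stratified by reference population summarises.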
Population characteristics of the final merged dataset
| Population | Number of individuals included |
|---|---|
| KhoeSan (Nama and ≠Khomani San) | 284 |
| European (British) | 79 |
| African (Yoruba and Luhya) | 35 |
| East Asian (Han) | 50 |
| South East Asian (Gujarati) | 103 |
Fig. 4 Computational workflow. The full dataset (n = 499) was divided into a dataset used for the simulation (n = 55) and a dataset used for global ancestry inference (GAI) and local ancestry inference (LAI) (n = 444). Once the simulated SAC population was generated (including global and local ancestry estimations), these true values were compared to the values produced by ADMIXTURE and RFMix. For details, please see the methods section.