Emile R Chimusa, Michelle Daya, Marlo Möller, Raj Ramesar, Brenna M Henn, Paul D van Helden, Nicola J Mulder, Eileen G Hoal.
Abstract
Admixed populations can make an important contribution to the discovery of disease susceptibility genes if the parental populations exhibit substantial variation in susceptibility. Admixture mapping has been used successfully, but is not designed to cope with populations that have more than two or three ancestral populations. The inference of admixture proportions and local ancestry and the imputation of missing genotypes in admixed populations are crucial in both understanding variation in disease and identifying novel disease loci. These inferences make use of reference populations, and accuracy depends on the choice of ancestral populations. Using an insufficient or inaccurate ancestral panel can result in erroneously inferred ancestry and affect the detection power of GWAS and meta-analysis when using imputation. Current algorithms are inadequate for multi-way admixed populations. To address these challenges we developed PROXYANC, an approach to select the best proxy ancestral populations. From the simulation of a multi-way admixed population we demonstrate the capability and accuracy of PROXYANC and illustrate the importance of the choice of ancestry in both estimating admixture proportions and imputing missing genotypes. We applied this approach to a complex, uniquely admixed South African population. Using genome-wide SNP data from over 764 individuals, we accurately estimate the genetic contributions from the best ancestral populations: isiXhosa [Formula: see text], ‡Khomani SAN [Formula: see text], European [Formula: see text], Indian [Formula: see text], and Chinese [Formula: see text]. We also demonstrate that the ancestral allele frequency differences correlate with increased linkage disequilibrium in the South African population, which originates from admixture events rather than population bottlenecks. 
NOMENCLATURE: The collective term for people of mixed ancestry in southern Africa is "Coloured," and this is officially recognized in South Africa as a census term, and for self-classification. Whilst we acknowledge that some cultures may use this term in a derogatory manner, these connotations are not present in South Africa, and are certainly not intended here.
Year: 2013 PMID: 24066090 PMCID: PMC3774743 DOI: 10.1371/journal.pone.0073971
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Proxy Ancestry Score: results from simulated data.
| Populations | PScore | Standard Error | Z |
| African non-click Speaking Group | |||
| isiXhosa | −0.124 | 1.138 | 219.793 |
| Bantu South Africa | −0.015 | 0.001 | 28.648 |
| Yoruba | −0.010 | 0.001 | 27.101 |
| Kongo | −0.008 | 0.001 | 40.658 |
| Herero | −0.008 | 0.001 | 28.306 |
| South Asia Group | |||
| Gujarati | 0.015 | 0.007 | 223.504 |
| Pathan | −0.007 | 0.001 | 26.427 |
| Druze | −0.008 | 0.001 | 22.115 |
| East Asia Group | |||
| CHD | −0.001 | 0.003 | 118.144 |
| Dai | −0.008 | 0.001 | 30.695 |
| Daur | −0.007 | 0.001 | 42.628 |
| Japanese | −0.008 | 0.001 | 26.487 |
| European Group | |||
| CEU | 0.019 | 0.009 | 274.700 |
| Russian | −0.008 | 0.001 | 33.347 |
| Italian | −0.008 | 0.001 | 30.793 |
| French | −0.008 | 0.001 | 30.716 |
| African click-speaking Group | |||
| ‡Khomani | 0.010 | 0.007 | 174.846 |
| Ju|’hoan | −0.007 | 0.001 | 35.968 |
| Bushmen | −0.007 | 0.001 | 34.664 |
| SAN | −0.008 | 0.001 | 25.196 |
Proxy-ancestry scores for 5 distinct pools, computed on the simulated data: African non-click-speaking (isiXhosa, Bantu South Africa, Yoruba, Kongo, Herero), South Asian (Gujarati, Pathan, Druze), East Asian (CHD, Dai, Daur, Japanese), European (CEU, Russian, Italian, French) and click-speaking (‡Khomani, Ju|’hoan, Bushmen, SAN) groups. The results indicate that the highest-scoring populations in each pool are CEU, ‡Khomani, isiXhosa, Chinese (CHD) and Gujarati.
Top 16 linear combinations that minimize the objective function between simulated data and a combination of 5 reference populations.
| Population Linear Combination | F | Standard error | 95% CI |
| (isiXhosa, Gujarati, CHD, CEU, ‡Khomani) | −0.00075 | 0.0005599 | (−0.001, 0.0005) |
| (isiXhosa, GIH, CHD, CEU, SAN) | −0.00058 | 0.0005599 | (−0.001, 0.0005) |
| (isiXhosa, GIH, CHD, Italian, SAN) | −0.00057 | 0.0005599 | (−0.001, 0.0005) |
| (isiXhosa, GIH, CHD, Italian, ‡Khomani) | −0.00054 | 0.0005599 | (−0.001, 0.005) |
| (isiXhosa, GIH, Japanese, Italian, SAN) | −0.00053 | 0.0005586 | (−0.001, 0.0005) |
| (isiXhosa, GIH, Japanese, Italian, ‡Khomani) | −0.00054 | 0.0005586 | (−0.001, 0.0005) |
| (isiXhosa, GIH, Japanese, CEU, SAN) | −0.00051 | 0.0005585 | (−0.001, 0.0005) |
| (isiXhosa, GIH, Japanese, CEU, ‡Khomani) | −0.00054 | 0.0005586 | (−0.001, 0.0005) |
| (Yoruba, GIH, CHD, Italian, SAN) | −0.000371 | 0.0001110 | (−0.0005, −0.0001) |
| (Yoruba, GIH, CHD, Italian, ‡Khomani) | −0.000361 | 0.0001110 | (−0.0005, −0.0001) |
| (Yoruba, GIH, CHD, CEU, SAN) | −0.000371 | 0.0001110 | (−0.0005, −0.0001) |
| (Yoruba, GIH, CHD, CEU, ‡Khomani) | −0.000372 | 0.0001110 | (−0.0005, −0.0001) |
| (Yoruba, GIH, Japanese, Italian, SAN) | −0.000362 | 0.0001085 | (−0.0005, −0.0001) |
| (Yoruba, GIH, Japanese, Italian, ‡Khomani) | −0.000365 | 0.0001085 | (−0.0006, −0.0001) |
| (Yoruba, GIH, Japanese, CEU, SAN) | −0.000362 | 0.0001085 | (−0.0005, −0.0001) |
| (Yoruba, GIH, Japanese, CEU, ‡Khomani) | −0.000362 | 0.0001085 | (−0.0005, −0.0001) |
The top linear combination is CEU, ‡Khomani, isiXhosa, Chinese (CHD) and Gujarati, consistent with Table 1 and with our simulation scheme.
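The search behind this table takes one candidate population from each of the 5 pools and scores every resulting combination. The sketch below illustrates only that enumeration. The scoring function is a stand-in (mean squared residual of an unconstrained least-squares fit to allele frequencies), not PROXYANC's objective function F, and the function name `rank_combinations` and its input layout are hypothetical.

```python
from itertools import product

import numpy as np


def rank_combinations(target_freq, pools):
    """Rank every one-population-per-pool combination by fit to the target.

    target_freq: allele frequencies of the admixed sample, one per SNP.
    pools: list of dicts, each mapping population name -> allele-frequency array.
    Returns a list of (residual, names) tuples, best (smallest residual) first.
    NOTE: the residual is an illustrative stand-in, not PROXYANC's F.
    """
    y = np.asarray(target_freq, dtype=float)
    results = []
    # One candidate from each pool, all pools crossed with each other.
    for combo in product(*(pool.items() for pool in pools)):
        names = tuple(name for name, _ in combo)
        X = np.column_stack([np.asarray(freq, dtype=float) for _, freq in combo])
        # Unconstrained least-squares weights for this combination.
        weights, *_ = np.linalg.lstsq(X, y, rcond=None)
        residual = float(np.mean((y - X @ weights) ** 2))
        results.append((residual, names))
    return sorted(results)
```

With 5 pools of 4 to 5 populations each, the number of combinations stays in the low thousands, so exhaustive enumeration is cheap compared with the per-combination scoring.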
f3 statistic: the signal of admixture in the simulated data.
| Pop 1 | Pop 2 | Target | f3 | Standard Error | Z |
| CEU | SAN | Simulated data | −0.00827 | 0.00149 | −5.57 |
| CEU | CHD | Simulated data | 0.01321 | 0.00085 | 15.58 |
| CEU | Gujarati | Simulated data | 0.02476 | 0.00079 | 31.33 |
| CEU | Herero | Simulated data | −0.00586 | 0.00140 | −4.18 |
| CEU | isiXhosa | Simulated data | −0.01748 | 0.00049 | −36.0 |
| CEU | ‡Khomani | Simulated data | −0.0163 | 0.00051 | −32.13 |
| CEU | Pathan | Simulated data | −0.00602 | 0.00156 | −3.86 |
| CEU | Russian | Simulated data | −0.00451 | 0.00137 | −3.29 |
| CHD | SAN | Simulated data | −0.00289 | 0.00208 | −1.39 |
| CHD | Gujarati | Simulated data | 0.02148 | 0.000794 | 27.134 |
| CHD | isiXhosa | Simulated data | −0.01389 | 0.00057 | −24.19 |
| CHD | Italian | Simulated data | −0.00178 | 0.00166 | −1.07 |
| CHD | Japanese | Simulated data | −0.00352 | 0.00157 | −2.24 |
| CHD | ‡Khomani | Simulated data | −0.01133 | 0.00058 | −19.53 |
| CHD | Pathan | Simulated data | −0.00308 | 0.00163 | −1.89 |
| CHD | Russian | Simulated data | −0.00111 | 0.00167 | −0.7 |
| Gujarati | isiXhosa | Simulated data | −0.01537 | 0.00049 | −31.34 |
| Gujarati | ‡Khomani | Simulated data | −0.01452 | 0.00051 | −28.27 |
| ‡Khomani | Druze | Simulated data | −0.00139 | 0.00106 | −1.321 |
| ‡Khomani | French | Simulated data | −0.00151 | 0.00098 | −1.54 |
| ‡Khomani | Herero | Simulated data | −0.00084 | 0.00105 | −0.80 |
| ‡Khomani | isiXhosa | Simulated data | 0.00247 | 0.00036 | 6.79 |
| ‡Khomani | Italian | Simulated data | −0.00128 | 0.00103 | −1.24 |
| ‡Khomani | Japanese | Simulated data | −0.00042 | 0.00104 | −0.40 |
| ‡Khomani | Kongo | Simulated data | −0.00076 | 0.00096 | −0.79 |
| ‡Khomani | Pathan | Simulated data | −0.00023 | 0.00107 | −0.22 |
| ‡Khomani | Russian | Simulated data | −0.0011 | 0.00097 | −1.1 |
f3 statistic: the signal of admixture in the simulated data, computed for pairs of candidate ancestral populations. The data were simulated as a 5-way admixture of ‡Khomani, isiXhosa, Chinese (CHD), Gujarati Indian and CEU. For this 5-way admixed population, the pair-wise statistic gives no clear evidence for or against admixture.
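For readers unfamiliar with the f3 column, the sketch below shows the basic form of the statistic, f3(Target; Pop 1, Pop 2), as the per-SNP mean of (c − a)(c − b) over allele frequencies, with a delete-one block jackknife for the standard error and Z = f3 / SE. It is a simplified illustration: it omits the finite-sample bias correction used in practice, the block count is an arbitrary assumption, and the function names are hypothetical. A significantly negative value suggests that the target is admixed between populations related to Pop 1 and Pop 2.

```python
import numpy as np


def f3(target, pop1, pop2):
    """Uncorrected f3(target; pop1, pop2) from per-SNP allele frequencies."""
    c, a, b = (np.asarray(x, dtype=float) for x in (target, pop1, pop2))
    return float(np.mean((c - a) * (c - b)))


def f3_block_jackknife(target, pop1, pop2, n_blocks=10):
    """Return (estimate, standard error, Z) using a delete-one block jackknife.

    SNPs are split into n_blocks contiguous blocks (an illustrative choice);
    the variance formula assumes blocks of roughly equal size.
    """
    c, a, b = (np.asarray(x, dtype=float) for x in (target, pop1, pop2))
    per_snp = (c - a) * (c - b)
    total, n = per_snp.sum(), per_snp.size
    blocks = np.array_split(per_snp, n_blocks)
    # Estimate recomputed with each block left out in turn.
    leave_one_out = np.array(
        [(total - blk.sum()) / (n - blk.size) for blk in blocks]
    )
    estimate = total / n
    variance = (n_blocks - 1) / n_blocks * np.sum(
        (leave_one_out - leave_one_out.mean()) ** 2
    )
    se = float(np.sqrt(variance))
    return float(estimate), se, float(estimate / se)
```

If the target's frequencies are an exact 50/50 mix of the two sources, each per-SNP term equals −0.25 (a − b)², so the statistic is negative by construction.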
Figure 1. Comparison of true individual admixture proportions versus those estimated using appropriate and inappropriate proxy ancestry.
(A) Estimated individual ancestry using the best proxy ancestry (isiXhosa: blue) and using an inappropriate proxy ancestry (Yoruba: red), each plotted with the true individual ancestry (black) of the 750 admixed individuals obtained from the simulation (see Materials and Methods). (B) True ancestry versus the individual ancestry estimated from the best proxy ancestry (isiXhosa) and from the inappropriate proxy ancestry (Yoruba).
Figure 2. Genotype call rate when imputing missing genotypes for the simulated data.
2044 SNPs were imputed for the simulated data using 4 sets of reference populations. Panels: Black (CEU, CHD, GIH, isiXhosa, ‡Khomani); Green (initial samples from CEU, CHD, GIH, isiXhosa, ‡Khomani); Blue (all populations used to evaluate PROXYANC; see Materials and Methods); Red (Russia, Japanese, Palestine, Yoruba and Ju|’hoan). A panel of the best proxy ancestral populations of a multi-way admixed population can impute missing genotypes about as accurately as a panel of all available reference populations. Using the correct proxy ancestral populations may also reduce the computational cost of imputation, because the imputation engine chooses the best haplotype from fewer reference populations.
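As a rough illustration of the quantity plotted, the sketch below computes a genotype call rate from imputed genotype posterior probabilities, using one common definition: a genotype counts as called when its largest posterior probability reaches a threshold. The threshold of 0.9, the array layout and the function name are assumptions made for this example; the source does not state how the call rate was computed.

```python
import numpy as np


def call_rate(posteriors, threshold=0.9):
    """Fraction of imputed genotypes whose best posterior reaches the threshold.

    posteriors: array of shape (n_individuals, n_snps, 3) holding the
    probabilities of the three genotypes (assumed layout).
    threshold: calling threshold (0.9 is an illustrative choice).
    """
    p = np.asarray(posteriors, dtype=float)
    called = p.max(axis=-1) >= threshold
    return float(called.mean())
```

Comparing this value across reference panels, SNP by SNP or overall, gives the kind of panel-versus-panel comparison the figure describes.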
Proxy Ancestry Score: results from the South African Coloured (SAC) data.
| Populations | PScore | Standard Error | Z |
| South Asia Group | |||
| Kalash | −0.003 | 0.001 | 1483.76 |
| Gujarati | 0.003 | 0.001 | 2224.43 |
| Pathan | −0.002 | 0.001 | 1511.30 |
| African Non-Click Speaking Group | |||
| Fulani | 0.001 | 0.002 | 1822.48 |
| Bantu South Africa | 0.001 | 0.001 | 1822.48 |
| Yoruba | 0.004 | 0.001 | 2282.03 |
| Tswana | 0.003 | 0.001 | 2237.05 |
| isiXhosa | 0.003 | 0.001 | 2350.63 |
| Bamoun | −0.002 | 0.001 | 1769.27 |
| Brong | 0.001 | 0.001 | 2013.24 |
| Herero | 0.002 | 0.001 | 2180.48 |
| African Click-speaking Group | |||
| SAN | 0.002 | 0.001 | 2150.70 |
| Hadza | −0.003 | 0.001 | 1783.85 |
| Sandawe | 0.001 | 0.001 | 2064.319 |
| Bushmen | −0.003 | 0.001 | 1784.10 |
| Ju|’hoan | 0.003 | 0.002 | 2206.76 |
| ‡Khomani | 0.007 | 0.001 | 2612.07 |
| East Asia Group | |||
| She | −0.007 | 0.001 | 1181.64 |
| Dai | −0.003 | 0.001 | 1579.25 |
| Daur | −0.004 | 0.001 | 1329.53 |
| CHB | −0.003 | 0.001 | 1523.72 |
| CHD | −0.003 | 0.001 | 1544.38 |
| Japanese | −0.003 | 0.001 | 1443.25 |
| European Group | |||
| Sardinia | −0.003 | 0.001 | 1463.5 |
| Bergamo | −0.001 | 0.001 | 1668.56 |
| CEU | 0.000 | 0.001 | 1891.314 |
| Russian | −0.002 | 0.001 | 1535.53 |
| French | −0.001 | 0.001 | 1723.62 |
Proxy-ancestry score for 5 distinct pools, including African non-click speaking group, East Asian, European, click-speaking group and South Asian populations using the SAC data. The result shows that the highest scores are from CEU, ‡Khomani, isiXhosa, Chinese and Gujarati in the relevant pool.
F as an Objective Function: results from South African Coloured data.
| Population Linear Combination | F | Standard error | 95% CI |
| (Gujarati, Sotho, ‡Khomani, CHB, CEU) | −0.0042 | 0.0010 | (−0.006, −0.0025) |
| (Gujarati, Sotho, ‡Khomani, CHB, Russian) | −0.0042 | 0.00102 | (−0.006, −0.0023) |
| (Gujarati, Sotho, ‡Khomani, CHD, CEU) | −0.0042 | 0.00101 | (−0.006, −0.0023) |
| (Gujarati, Sotho, ‡Khomani, CHD, Russian) | −0.0042 | 0.00101 | (−0.006, −0.0023) |
| (Gujarati, isiXhosa, ‡Khomani, CHB, CEU) | −0.00374 | 0.00060 | (−0.005, −0.003) |
| (Gujarati, isiXhosa, ‡Khomani, CHB, Russian) | −0.00374 | 0.00060 | (−0.005, −0.003) |
| (Gujarati, isiXhosa, ‡Khomani, CHD, CEU)* | −0.00374 | 0.00060 | (−0.005, −0.003) |
| (Gujarati, isiXhosa, ‡Khomani, CHD, Russian) | −0.00374 | 0.00060 | (−0.005, −0.003) |
| (Gujarati, Brong, ‡Khomani, CHB, CEU) | −0.02483 | 0.00605 | (−0.037, −0.013) |
| (Gujarati, Brong, ‡Khomani, CHB, Russian) | −0.02483 | 0.00605 | (−0.037, −0.013) |
| (Gujarati, Brong, ‡Khomani, CHD, CEU) | −0.02483 | 0.00605 | (−0.037, −0.013) |
| (Gujarati, Brong, ‡Khomani, CHD, Russian) | −0.02483 | 0.00605 | (−0.037, −0.013) |
Top 12 linear combinations that minimize the objective function between SAC data and a combination of 5 pools of reference populations. The top linear combination is CEU, ‡Khomani, isiXhosa, Chinese (CHD) and Gujarati, consistent with Table 4.
Summary mean and standard error on proportion of ancestral populations contributing to the genetic make-up of the South African Coloureds.
This table displays the mean and the standard errors of ancestral proportions with the best proxy ancestors obtained from PROXYANC, with the reference populations panel used in De Wit et al. [19] and the SAC’s ancestral proportions reported in Patterson et al. [20].
Figure 3. Individual ancestry proportion and PCA based on 47863 autosomal SNPs in the SAC data.
(A) Population clustering analysis of the SAC using the currently selected best proxy ancestors as the reference panel (top) and the reference panel used in De Wit et al. [19]. (B) Difference in individual ancestry between the panel of selected best proxy ancestral populations of the SAC and the reference panel used in De Wit et al. [19]. The plot shows a large difference in the African ancestry of the SAC between the two analyses, suggesting that the choice of African proxy ancestry for the SAC is critical and sensitive because most African populations are diverse and closely related. (C) PCA on autosomal SNPs. The first and second principal components both show strong genetic differentiation between the five proxy ancestral groups, with the SAC lying within their convex hull.
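Panel (C) rests on a principal component analysis of the genotype matrix. The sketch below shows a minimal version of that projection, assuming genotypes coded as 0, 1 or 2 copies of an allele with no missing values. It only mean-centres each SNP; the per-SNP variance scaling often applied in population-genetic PCA is left out, and the function name is hypothetical.

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """Project individuals onto the leading principal components.

    genotypes: array of shape (n_individuals, n_snps), coded 0/1/2 (assumed).
    Returns an (n_individuals, n_components) array of PC scores.
    """
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)  # centre each SNP column
    # Thin SVD: rows of Vt are the principal axes, U * S the scores.
    U, S, Vt = np.linalg.svd(G, full_matrices=False)
    return U[:, :n_components] * S[:n_components]
```

Plotting the first two score columns, coloured by population label, yields the kind of scatter in which an admixed group falls between its source groups.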
Correlation between maximum expected admixture LD and the observed LD in the SAC.
| Pair-wise populations | P-value |
| (CHD, Gujarati) | |
| (isiXhosa, Gujarati) | |
| (CEU, CHD) | 0.92 |
| (CHD, ‡Khomani) | |
| (‡Khomani, isiXhosa) | |
| (‡Khomani, Gujarati) | |
| (CEU, Gujarati) | |
| (CEU, | |
| (CHD, isiXhosa) | |
| (CEU, isiXhosa) | |
P-value obtained from the correlation between expected admixture LD from each pair of proxy ancestral group with respect to the observed LD in the SAC.
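To make the comparison concrete, the sketch below uses a standard textbook result for a single two-way admixture event: immediately after mixing, the LD between two loci is D = m(1 − m)(p1A − p2A)(p1B − p2B), where m is the admixture fraction and the p terms are allele frequencies in the two source populations. The factor m(1 − m) peaks at 0.25 when m = 0.5. This is an assumption chosen for illustration, not necessarily the expression used in the paper, and both function names are hypothetical.

```python
import numpy as np


def expected_admixture_ld(m, p1_a, p2_a, p1_b, p2_b):
    """Expected LD right after two-way admixture (textbook approximation).

    m: admixture fraction from source population 1.
    p1_a, p2_a: allele frequencies at locus A in source populations 1 and 2.
    p1_b, p2_b: allele frequencies at locus B in source populations 1 and 2.
    Scalars or equal-length arrays (one entry per SNP pair) are accepted.
    """
    delta_a = np.asarray(p1_a, dtype=float) - np.asarray(p2_a, dtype=float)
    delta_b = np.asarray(p1_b, dtype=float) - np.asarray(p2_b, dtype=float)
    return m * (1.0 - m) * delta_a * delta_b


def ld_correlation(expected, observed):
    """Pearson correlation between expected and observed LD across SNP pairs."""
    return float(np.corrcoef(expected, observed)[0, 1])
```

Computing the expected values for each pair of proxy ancestral groups and correlating them with the LD observed in the admixed sample gives one correlation per pair, matching the layout of the table.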