Michelle Daya, Lize van der Merwe, Ushma Galal, Marlo Möller, Muneeb Salie, Emile R Chimusa, Joshua M Galanter, Paul D van Helden, Brenna M Henn, Chris R Gignoux, Eileen Hoal.
Abstract
Admixture is a well-known confounder in genetic association studies. If genome-wide data are not available, as is the case for candidate gene studies, ancestry informative markers (AIMs) are required to adjust for admixture. The predominant population group in the Western Cape, South Africa, is the admixed group known as the South African Coloured (SAC). A small set of AIMs optimized to distinguish between the five source populations of this group (African San, African non-San, European, South Asian, and East Asian) will enable researchers to cost-effectively reduce false-positive findings that result from ignoring admixture in genetic association studies of the population. Using genome-wide data to find SNPs with large allele frequency differences between the source populations of the SAC, as quantified by Rosenberg et al.'s In-statistic, we developed a panel of AIMs by experimenting with various selection strategies. Subsets of different sizes were evaluated by measuring the correlation between ancestry proportions estimated by each AIM subset and ancestry proportions estimated using genome-wide data. We show that a panel of 96 AIMs can be used to assess ancestry proportions and to adjust for the confounding effect of the complex five-way admixture that occurred in the South African Coloured population.
Year: 2013 PMID: 24376522 PMCID: PMC3869660 DOI: 10.1371/journal.pone.0082224
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Source population data.
| Source population | Group | Description | Source | Platform | Size |
| African San (san) | kho | ‡Khomani San from Northern Cape, South Africa | Henn 2011 | Illumina 550K | 14 |
| | bus | Juu San from South Namibia | Henn 2011 | Illumina 650K & 1M | 9 |
| | khs | Juǀ'hoansi San from North Namibia | Private | Affymetrix 6.0 | 22 |
| African non-San (afr) | brong | Ghana | Henn 2011 | Affymetrix 500K | 8 |
| | kongo | Atlantic coast of Congo | Henn 2011 | Affymetrix 500K | 9 |
| | igbo | Southeastern Nigeria | Henn 2011 | Affymetrix 500K | 15 |
| | fang | Equatorial Guinea | Henn 2011 | Affymetrix 500K | 15 |
| | bulala | Central Chad | Henn 2011 | Affymetrix 500K | 15 |
| | mada | West Cameroon | Henn 2011 | Affymetrix 500K | 12 |
| | hausa | West Nigeria | Henn 2011 | Affymetrix 500K | 12 |
| | bamoun | West Cameroon | Henn 2011 | Affymetrix 500K | 18 |
| European (eur) | CEU | Utah residents with Northern and Western European ancestry, USA | HapMap3 | Release 3 | 111 |
| | TSI | Italians from Italy | HapMap3 | Release 3 | 102 |
| South Asian (sas) | GIH | Gujarati Indians from Houston, Texas, USA | HapMap3 | Release 3 | 97 |
| East Asian (eas) | CHD | Chinese from Metropolitan Denver, Colorado, USA | HapMap3 | Release 3 | 106 |
| | JPT | Japanese from Tokyo, Japan | HapMap3 | Release 3 | 113 |
Data sets used to represent the five source populations of the South African Coloured population. The sample size reflects the group size after relative pairs have been removed. Henn et al. [52] merged the Juu San data from the Human Genome Diversity Project (HGDP) and Schuster et al. [53], and the African non-San data from Bryc et al. [54].
Figure 1. Admixture proportion correlation versus number of AIMs in set.
Correlation between admixture proportions estimated using AIMs and proportions estimated using genome-wide data, using AIM sets of increasing size (increments of 25) for the Cape Town study group (n = 733). A proportion of the SNPs in each set of AIMs was selected using the multiple In-statistic, indicated in each panel as a percentage, while the remaining SNPs were selected using the pairwise In-statistic, as described in the Methods section.
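As a rough illustration of the statistic named in this caption, the sketch below computes Rosenberg et al.'s informativeness for assignment for a single biallelic SNP, both across all source populations (the "multiple" form) and for one pair of populations (the "pairwise" form). It assumes the usual definition, in which each allele contributes the entropy of the mean allele frequency minus the mean of the per-population entropies. The function names and allele frequencies are hypothetical, and the sketch does not reproduce how the paper combined the two rankings.

```python
import math


def _xlogx(p):
    """Return p * ln(p), using the convention 0 * ln(0) = 0."""
    return p * math.log(p) if p > 0 else 0.0


def informativeness(freqs):
    """Informativeness for assignment of one biallelic SNP.

    freqs: frequency of one allele in each of K source populations.
    """
    k = len(freqs)
    total = 0.0
    # Sum the contribution of both alleles of the SNP.
    for allele in (freqs, [1.0 - f for f in freqs]):
        mean = sum(allele) / k
        total += -_xlogx(mean) + sum(_xlogx(p) for p in allele) / k
    return total


def pairwise_informativeness(freqs, a, b):
    """Same statistic restricted to populations with indices a and b."""
    return informativeness([freqs[a], freqs[b]])


# Hypothetical allele frequencies, ordered san, afr, eur, sas, eas.
snps = {
    "snp_a": [0.90, 0.10, 0.10, 0.10, 0.10],
    "snp_b": [0.50, 0.50, 0.50, 0.50, 0.50],
    "snp_c": [0.20, 0.25, 0.80, 0.75, 0.30],
}

# Rank SNPs by the multiple-population statistic, most informative first.
ranked = sorted(snps, key=lambda s: informativeness(snps[s]), reverse=True)
print(ranked)

# Pairwise statistic for European (index 2) versus South Asian (index 3).
eur_vs_sas = {s: pairwise_informativeness(f, 2, 3) for s, f in snps.items()}
print(eur_vs_sas)
```

A SNP with identical frequencies in every population scores zero, while a SNP that is fixed for different alleles in two populations reaches the two-population maximum of ln 2.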
Correlation and RMSE of 96 and 120 AIMs.
| | 96 panel | | 120 panel | |
| Ancestry | Correlation | RMSE | Correlation | RMSE |
| African San | 0.7565 | 0.0684 | 0.7905 | 0.0621 |
| African non-San | 0.7930 | 0.0774 | 0.8160 | 0.0719 |
| European | 0.8019 | 0.0554 | 0.8150 | 0.0535 |
| South Asian | 0.4808 | 0.0658 | 0.5283 | 0.0625 |
| East Asian | 0.5665 | 0.0560 | 0.5822 | 0.0522 |
Correlation and root mean squared error (RMSE) between ancestry proportions estimated using the 96 and 120 AIM panels respectively and proportions estimated using genome-wide data, for the Cape Town study group (n = 733).
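The two agreement measures in this table can be computed per ancestry component along the following lines. This is a minimal sketch that assumes the correlation is Pearson's coefficient, which the text here does not state explicitly; the function names and the per-individual proportions are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical proportions of one ancestry component for five individuals:
# genome-wide estimates versus estimates from an AIM panel.
genome_wide = [0.31, 0.25, 0.40, 0.18, 0.35]
aim_panel = [0.28, 0.30, 0.37, 0.22, 0.33]

print(round(pearson(genome_wide, aim_panel), 4))
print(round(rmse(genome_wide, aim_panel), 4))
```

The correlation captures whether individuals are ranked consistently by the two estimates, while the RMSE captures the typical absolute size of the disagreement, so the two can diverge when an ancestry component has little variation in the study group.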
Figure 2. Barplots of ancestry proportions estimated using genome-wide data and using AIMs.
In the first panel ancestry proportions were estimated using genome-wide data. The admixed study group (sac) is ordered by proportions of African San, African non-San, European, South Asian and East Asian ancestry. In the second panel ancestry proportions were estimated using 96 AIMs. Individuals appear in the same order as in the first panel.
Figure 3. Boxplots of ancestry proportions of the Cape Town study group.
Boxplots of ancestry proportions estimated using genome-wide data and using the panel of 96 AIMs, shown per source population for the Cape Town study group (n = 733).
Correlation for different admixed study groups.
| Study group | Number of AIMs | African San | African non-San | European | South Asian | East Asian |
| Colesberg (n = 20) | 84 | 0.7661 | 0.8437 | 0.8996 | 0.4675 | 0.4731 |
| Karretjie (n = 20) | 84 | 0.8436 | 0.7007 | 0.7724 | 0.5590 | 0.1815 |
| Wellington (n = 20) | 84 | 0.7252 | 0.7102 | 0.8008 | 0.6783 | 0.3311 |
| Upington (n = 21) | 76 | 0.8747 | 0.6304 | 0.8739 | 0.3777 | 0.3426 |
Correlation between ancestry proportions estimated using AIMs and proportions estimated using genome-wide data, for small admixed study groups from different geographic locations. The number of AIMs reflects the number of markers in the 120 panel that were found in the genome-wide data sets of the study groups.
Ancestry proportion distribution.
| Study group | Data set | African San | African non-San | European | South Asian | East Asian |
| Cape Town (n = 733) | Chip | 0.31 (0.23–0.39) | 0.26 (0.18–0.37) | 0.18 (0.10–0.26) | 0.12 (0.08–0.17) | 0.07 (0.04–0.10) |
| | 96 AIMs | 0.31 (0.21–0.40) | 0.26 (0.16–0.40) | 0.17 (0.09–0.27) | 0.10 (0.02–0.18) | 0.09 (0.03–0.16) |
| | 120 AIMs | 0.31 (0.22–0.40) | 0.27 (0.16–0.39) | 0.17 (0.09–0.27) | 0.11 (0.03–0.19) | 0.08 (0.03–0.15) |
| Colesberg (n = 20) | Chip | 0.33 (0.25–0.40) | 0.29 (0.21–0.40) | 0.18 (0.10–0.29) | 0.05 (0.03–0.09) | 0.05 (0.02–0.07) |
| | 84 AIMs | 0.31 (0.24–0.35) | 0.27 (0.18–0.46) | 0.17 (0.03–0.29) | 0.07 (0.03–0.19) | 0.01 (0.00–0.05) |
| Karretjie (n = 20) | Chip | 0.69 (0.57–0.77) | 0.20 (0.15–0.23) | 0.08 (0.04–0.12) | 0.03 (0.01–0.04) | 0.02 (0.01–0.04) |
| | 84 AIMs | 0.66 (0.59–0.74) | 0.17 (0.08–0.27) | 0.04 (0.01–0.16) | 0.03 (0.00–0.06) | 0.00 (0.00–0.02) |
| Wellington (n = 20) | Chip | 0.13 (0.12–0.15) | 0.21 (0.19–0.23) | 0.29 (0.24–0.31) | 0.17 (0.12–0.23) | 0.17 (0.15–0.18) |
| | 84 AIMs | 0.14 (0.04–0.25) | 0.22 (0.14–0.33) | 0.28 (0.19–0.37) | 0.10 (0.03–0.16) | 0.19 (0.11–0.26) |
| Upington (n = 21) | Chip | 0.61 (0.47–0.72) | 0.11 (0.08–0.17) | 0.13 (0.10–0.23) | 0.04 (0.01–0.09) | 0.02 (0.01–0.06) |
| | 76 AIMs | 0.62 (0.43–0.67) | 0.08 (0.02–0.17) | 0.18 (0.07–0.26) | 0.02 (0.00–0.07) | 0.00 (0.00–0.07) |
Median and IQR of the ancestry proportions estimated using genome-wide data and AIMs, per admixed study group.
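Each cell in the table above has the form "median (first quartile to third quartile)". The sketch below shows one way such a summary could be produced from per-individual proportions. The quartile interpolation method is an assumption (the source does not say which was used), and the function names and data are hypothetical.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) for a list of ancestry proportions."""
    # "inclusive" treats the data as the full group rather than a sample;
    # this choice is an assumption and slightly affects Q1 and Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


def format_summary(values):
    """Format the summary in the table's 'median (Q1-Q3)' style."""
    med, q1, q3 = median_iqr(values)
    return f"{med:.2f} ({q1:.2f}-{q3:.2f})"


# Hypothetical per-individual proportions for one ancestry component.
san = [0.10, 0.20, 0.30, 0.40, 0.50]
print(format_summary(san))
```

For groups as small as 20 individuals, different quartile conventions can shift the reported bounds noticeably, so the method should be stated alongside any such table.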