Rami Nassir, Roman Kosoy, Chao Tian, Phoebe A White, Lesley M Butler, Gabriel Silva, Rick Kittles, Marta E Alarcon-Riquelme, Peter K Gregersen, John W Belmont, Francisco M De La Vega, Michael F Seldin.
Abstract
BACKGROUND: Case-control genetic studies of complex human diseases can be confounded by population stratification. This issue can be addressed using panels of ancestry informative markers (AIMs) that can provide substantial population substructure information. Previously, we described a panel of 128 SNP AIMs that were designed as a tool for ascertaining the origins of subjects from Europe, Sub-Saharan Africa, the Americas, and East Asia.
Year: 2009 PMID: 19630973 PMCID: PMC2728728 DOI: 10.1186/1471-2156-10-39
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Paired Fst values using 93 AIMs and random sets of 3500 SNPs (see note a)
| - | 0.040 | 0.012 | 0.260 | 0.311 | 0.310 | 0.246 | 0.223 | 0.191 | 0.461 | 0.470 | 0.176 | 0.142 | 0.221 | 0.074 | 0.147 | 0.186 |
| 0.029 | - | 0.040 | 0.198 | 0.249 | 0.247 | 0.187 | 0.268 | 0.224 | 0.503 | 0.506 | 0.118 | 0.083 | 0.153 | 0.041 | 0.152 | 0.162 |
| 0.014 | 0.043 | - | 0.234 | 0.285 | 0.289 | 0.219 | 0.248 | 0.217 | 0.440 | 0.452 | 0.149 | 0.110 | 0.199 | 0.062 | 0.124 | 0.153 |
| 0.108 | 0.087 | 0.106 | - | 0.014 | 0.017 | 0.014 | 0.459 | 0.450 | 0.487 | 0.504 | 0.027 | 0.061 | 0.035 | 0.091 | 0.250 | 0.234 |
| 0.112 | 0.089 | 0.111 | 0.011 | - | 0.007 | 0.033 | 0.492 | 0.493 | 0.520 | 0.539 | 0.059 | 0.096 | 0.069 | 0.129 | 0.308 | 0.294 |
| 0.109 | 0.087 | 0.108 | 0.012 | 0.002 | - | 0.030 | 0.492 | 0.492 | 0.527 | 0.467 | 0.069 | 0.102 | 0.066 | 0.129 | 0.317 | 0.302 |
| 0.111 | 0.091 | 0.108 | 0.010 | 0.019 | 0.020 | - | 0.459 | 0.443 | 0.467 | 0.484 | 0.022 | 0.051 | 0.051 | 0.084 | 0.240 | 0.236 |
| 0.109 | 0.104 | 0.119 | 0.133 | 0.131 | 0.128 | 0.140 | - | 0.030 | 0.598 | 0.602 | 0.398 | 0.380 | 0.432 | 0.308 | 0.408 | 0.415 |
| 0.125 | 0.120 | 0.137 | 0.146 | 0.143 | 0.141 | 0.152 | 0.035 | - | 0.672 | 0.661 | 0.373 | 0.352 | 0.420 | 0.268 | 0.407 | 0.415 |
| 0.217 | 0.214 | 0.221 | 0.173 | 0.184 | 0.182 | 0.160 | 0.260 | 0.276 | - | 0.031 | 0.471 | 0.503 | 0.511 | 0.496 | 0.481 | 0.497 |
| 0.191 | 0.186 | 0.192 | 0.146 | 0.159 | 0.156 | 0.133 | 0.232 | 0.247 | 0.048 | - | 0.483 | 0.497 | 0.522 | 0.504 | 0.477 | 0.495 |
| 0.093 | 0.074 | 0.091 | 0.018 | 0.021 | 0.021 | 0.016 | 0.125 | 0.136 | 0.168 | 0.138 | - | 0.013 | 0.014 | 0.040 | 0.176 | 0.169 |
| 0.082 | 0.065 | 0.083 | 0.027 | 0.030 | 0.029 | 0.027 | 0.122 | 0.136 | 0.183 | 0.152 | 0.008 | - | 0.041 | 0.017 | 0.170 | 0.172 |
| 0.116 | 0.100 | 0.116 | 0.047 | 0.046 | 0.044 | 0.049 | 0.145 | 0.159 | 0.201 | 0.172 | 0.035 | 0.040 | - | 0.065 | 0.233 | 0.226 |
| 0.035 | 0.028 | 0.042 | 0.035 | 0.035 | 0.034 | 0.036 | 0.095 | 0.109 | 0.182 | 0.152 | 0.021 | 0.017 | 0.049 | - | 0.141 | 0.158 |
| 0.139 | 0.149 | 0.140 | 0.153 | 0.157 | 0.155 | 0.156 | 0.208 | 0.232 | 0.259 | 0.232 | 0.146 | 0.146 | 0.169 | 0.130 | - | 0.088 |
| 0.170 | 0.177 | 0.175 | 0.169 | 0.173 | 0.170 | 0.172 | 0.228 | 0.258 | 0.271 | 0.245 | 0.164 | 0.168 | 0.186 | 0.155 | 0.105 | - |
a. The paired Fst values were determined using the Weir and Cockerham algorithm [9]. Above the diagonal of identity, the Fst values were determined using the 93 SNP AIMs. Below the diagonal, each value is the mean determined from three nonoverlapping sets of 3500 SNPs.
b. Population group abbreviations include Chinese from Beijing (CHB, HapMap data), Yakut (YAK), Filipino (FIL), Ashkenazi American (AJA), Swedish (SWED), Maya (MAYA), Palestinian (PAL), Colombian (COL), Mbuti Pygmy (PYG), Yoruba (YRI, HapMap data), Balochi (BAL), Burusho (BUR), Kalash (KAL), Uygur (UYG), Melanesian (MEL) and Papuan (PAP).
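As a rough illustration of how a pairwise Fst value of the kind tabulated above can be computed from allele frequencies, the sketch below uses Hudson's estimator in its ratio-of-averages form. This is a simpler stand-in and not the Weir and Cockerham algorithm used for the table. The function name `hudson_fst`, its arguments, and the assumption that the number of sampled chromosomes is the same at every SNP are illustrative only.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Pairwise Fst by Hudson's estimator (ratio of averages).

    NOTE: illustrative stand-in, not the Weir and Cockerham estimator.
    freqs1, freqs2: per-SNP frequencies of the same allele in each population.
    n1, n2: number of sampled chromosomes per population (assumed constant
            across SNPs; both must be greater than 1).
    """
    if len(freqs1) != len(freqs2):
        raise ValueError("frequency lists must cover the same SNPs")
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two chromosomes per population")

    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, corrected for sampling variance.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1.0 - p1) / (n1 - 1)
            - p2 * (1.0 - p2) / (n2 - 1)
        )
        # Probability that two alleles drawn from different populations differ.
        denominator += p1 * (1.0 - p2) + p2 * (1.0 - p1)

    if denominator == 0.0:
        raise ValueError("no SNP is polymorphic across the two populations")
    return numerator / denominator
```

Summing the numerator and denominator over all SNPs before dividing, rather than averaging per-SNP ratios, keeps low-frequency markers from dominating the estimate. Small negative values can occur when the two populations have nearly identical frequencies.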
Figure 1. Probability estimations for the number of cluster groups (K) using STRUCTURE. The ordinate shows the Ln probability corresponding to each number of clusters (K). STRUCTURE analyses were performed using the F model (admixture) as described in Methods using the 93 SNP AIM set. The Ln probability closest to zero corresponds to the most likely number of clusters or population groups that explain the population structure.
Figure 2. Analysis of population genetic structure using 93 SNP AIMs. Each horizontal line represents an individual subject. Each self-identified population group is shown along the ordinate. Analyses were performed using STRUCTURE without any prior population assignment (see Methods). The number of cluster groups is shown for each panel. The color code corresponds to individual cluster groups that were named according to the continental group with the largest membership in that group.
Figure 3. Correlations between population structure results. Results of STRUCTURE analyses using the 93 SNP AIM set and 3500 random SNPs are shown. The fraction of membership for each individual analyzed is indicated for each of six clusters (panels A-F) for the 93 SNP AIM set (ordinate) and a 3500 SNP set (abscissa). The population clusters are named according to the continental group with the largest membership in that group. The subjects included each of the HGDP, HapMap and Maya (Kachiquel) individuals (see Methods). Results for a single 3500 random SNP set are shown; however, each of three independent 3500 random SNP sets showed very similar results. For panels D and F, the South Asian ethnic groups (open circles) and the Adygei ethnic group (grey circles) are shown with different symbols to highlight the differences between the 93 SNP AIM set and 3500 random SNPs.
Ascertainment of Continental Ancestry Using 93 SNP AIM Panel
| Ashkenazi (40) | 97.5% | 0.0% | 0.0% | 0.0% | 0.0% | 95.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| Palestinian (26) | 73.1% | 0.0% | 0.0% | 0.0% | 0.0% | 80.8% | 0.0% | 0.0% | 0.0% | 0.0% |
| Bedouin (47) | 78.7% | 0.0% | 0.0% | 0.0% | 0.0% | 80.9% | 0.0% | 0.0% | 0.0% | 0.0% |
| Russian (13) | 92.3% | 0.0% | 0.0% | 0.0% | 0.0% | 84.6% | 0.0% | 0.0% | 0.0% | 0.0% |
| CEU (48) | 93.8% | 0.0% | 0.0% | 0.0% | 0.0% | 93.8% | 0.0% | 0.0% | 0.0% | 0.0% |
| EURA (399) | 94.2% | 0.0% | 0.0% | 0.0% | 0.0% | 88.2% | 0.0% | 0.0% | 0.0% | 0.0% |
| OEUR (70)c | 90.0% | 0.0% | 0.0% | 0.0% | 0.0% | 78.6% | 0.0% | 0.0% | 0.0% | 0.0% |
| CHB (43) | 0.0% | 97.7% | 0.0% | 0.0% | 0.0% | 0.0% | 88.4% | 0.0% | 0.0% | 0.0% |
| JPT (43) | 0.0% | 93.0% | 0.0% | 0.0% | 0.0% | 0.0% | 88.4% | 0.0% | 0.0% | 0.0% |
| Chinese American (44) | 0.0% | 84.1% | 0.0% | 0.0% | 0.0% | 0.0% | 84.1% | 0.0% | 0.0% | 0.0% |
| Yakut (15) | 0.0% | 20.0% | 0.0% | 0.0% | 0.0% | 0.0% | 66.7% | 0.0% | 0.0% | 0.0% |
| Mongolian (9) | 0.0% | 88.9% | 0.0% | 0.0% | 0.0% | 0.0% | 77.8% | 0.0% | 0.0% | 0.0% |
| Filipino American (42) | 0.0% | 73.8% | 0.0% | 0.0% | 0.0% | 0.0% | 88.1% | 0.0% | 0.0% | 0.0% |
| OEAS (54) | 0.0% | 64.8% | 0.0% | 0.0% | 0.0% | 0.0% | 85.2% | 0.0% | 0.0% | 0.0% |
| YRI (56) | 0.0% | 0.0% | 100.0% | 0.0% | 0.0% | 0.0% | 0.0% | 87.5% | 0.0% | 0.0% |
| Nilo-Saharan (23) | 0.0% | 0.0% | 95.7% | 0.0% | 0.0% | 0.0% | 0.0% | 91.3% | 0.0% | 0.0% |
| Mbuti (15) | 0.0% | 0.0% | 100.0% | 0.0% | 0.0% | 0.0% | 0.0% | 80.0% | 0.0% | 0.0% |
| Biaka (32) | 0.0% | 0.0% | 100.0% | 0.0% | 0.0% | 0.0% | 0.0% | 100.0% | 0.0% | 0.0% |
| Mandenka (24) | 0.0% | 0.0% | 100.0% | 0.0% | 0.0% | 0.0% | 0.0% | 100.0% | 0.0% | 0.0% |
| OSSAFR (71) | 0.0% | 0.0% | 98.6% | 0.0% | 0.0% | 0.0% | 0.0% | 94.4% | 0.0% | 0.0% |
| Indian (64) | 1.6% | 0.0% | 0.0% | 0.0% | 0.0% | 9.4% | 0.0% | 0.0% | 0.0% | 0.0% |
| Burusho (7) | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 14.3% | 0.0% | 0.0% | 0.0% | 0.0% |
| Balochi (15) | 13.3% | 0.0% | 0.0% | 0.0% | 0.0% | 20.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| Kalash (18) | 27.8% | 0.0% | 0.0% | 0.0% | 0.0% | 50.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| Maya-Kachiquel (50)c | 0.0% | 0.0% | 0.0% | 0.0% | 98.0% | 0.0% | 0.0% | 0.0% | 0.0% | 98.0% |
| Maya HGDP (13) | 0.0% | 0.0% | 0.0% | 0.0% | 46.2% | 0.0% | 0.0% | 0.0% | 0.0% | 46.2% |
| Quechuan (26) | 0.0% | 0.0% | 0.0% | 0.0% | 76.9% | 0.0% | 0.0% | 0.0% | 0.0% | 88.5% |
| OAMI (47) | 0.0% | 0.0% | 0.0% | 0.0% | 91.5% | 0.0% | 0.0% | 0.0% | 0.0% | 66.0% |
| Mozabite (30) | 3.3% | 0.0% | 0.0% | 0.0% | 0.0% | 3.3% | 0.0% | 0.0% | 0.0% | 0.0% |
| Mexican (66) | 1.5% | 0.0% | 0.0% | 0.0% | 4.5% | 1.5% | 0.0% | 0.0% | 0.0% | 13.6% |
| African American (100) | 1.0% | 0.0% | 23.0% | 0.0% | 0.0% | 1.0% | 0.0% | 11.0% | 0.0% | 0.0% |
| Uygur (10) | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| Puerto Rican (28) | 3.6% | 0.0% | 0.0% | 0.0% | 0.0% | 7.1% | 0.0% | 0.0% | 0.0% | 0.0% |
a. For STRUCTURE, the criterion was >0.85 membership in a particular cluster using K = 6. For PCA, the criterion was a value within the mean ± 2 SD for each of the first four PCs, where the mean was determined based on the self-identified ethnic group.
b. The continental group is shown in bold and selected individual ethnic groups presented below each continental heading. For presentation purposes many of the individual ethnic groups were placed together: other European (OEUR), other East Asian (OEAS), other sub-Saharan African (OSSAFR) and other Amerindian (OAMI).
c. The Maya-Kachiquel were Maya from the Kachiquel language group as previously described [7] and are from a collection distinct from the HGDP Maya group.
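The two assignment criteria in note a can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the dictionary and list input layouts, and the use of the sample standard deviation (the note does not say which SD was used) are all assumptions.

```python
from statistics import mean, stdev


def structure_assignment(memberships, threshold=0.85):
    """Return the cluster whose membership fraction exceeds the threshold.

    memberships: dict mapping a cluster label to a membership fraction,
    for example the K = 6 output for one subject. Returns None when no
    cluster exceeds the threshold.
    """
    label, fraction = max(memberships.items(), key=lambda item: item[1])
    return label if fraction > threshold else None


def pca_bounds(reference_pcs, n_pcs=4, n_sd=2.0):
    """Compute (low, high) limits per PC from a reference group.

    reference_pcs: list of PC score lists, one per reference subject.
    Uses the sample SD, which is an assumption of this sketch.
    """
    bounds = []
    for k in range(n_pcs):
        scores = [subject[k] for subject in reference_pcs]
        centre, spread = mean(scores), stdev(scores)
        bounds.append((centre - n_sd * spread, centre + n_sd * spread))
    return bounds


def within_bounds(subject_pcs, bounds):
    """True when every checked PC score lies inside its limits."""
    return all(low <= score <= high
               for score, (low, high) in zip(subject_pcs, bounds))
```

A subject would count toward a continental group's percentage under the PCA criterion only if `within_bounds` holds for all four PCs, which is why a single outlying PC is enough to exclude that subject.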
Figure 4. Eigenvalue distribution for principal components. The eigenvalues for each PC are shown. Comparing the eigenvalues of the PCs shows the relative amount of variation that is explained by each PC. The plateau in eigenvalues generally corresponds to variation that cannot be attributed to discernible groupings of subjects.
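The comparison of eigenvalues described in this caption amounts to dividing each eigenvalue by the total. A minimal sketch, assuming the full list of eigenvalues is available (the function name and the example values are hypothetical, not data from the study):

```python
def variance_explained(eigenvalues):
    """Return the fraction of total variation attributed to each PC.

    eigenvalues: all eigenvalues from the PCA, largest first. Passing only
    the leading few would overstate each fraction.
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [value / total for value in eigenvalues]


# Hypothetical eigenvalues, for illustration only.
fractions = variance_explained([6.0, 3.0, 1.0])
```

A run of nearly equal fractions at the tail of the list corresponds to the plateau mentioned in the caption.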
Figure 5. Principal component analysis of diverse population groups. The analysis used the same data set indicated in Figure 2. The population groups are shown by the color-coded symbols. The sub-Saharan African groups are designated S-S African in this figure. Panel A shows the PC1 and PC2 results from the different ethnic groups excluding the admixed populations. Panel C shows the PC1 and PC2 results for the African American and Mexican American subjects that were run together with the individual subjects shown in A. Panels B and D show the results for the same subject groups for PC3 and PC4.
Figure 6. Principal component analysis of sub-Saharan African, European and East Asian populations. The analysis was performed using only the individual continental groups. The populations included each of the sub-Saharan African (A), European (B), and East Asian (C) ethnic groups. The color code highlights specific ethnic groups.