Mark D Shriver, Rui Mei, Esteban J Parra, Vibhor Sonpar, Indrani Halder, Sarah A Tishkoff, Theodore G Schurr, Sergey I Zhadanov, Ludmila P Osipova, Tom D Brutsaert, Jonathan Friedlaender, Lynn B Jorde, W Scott Watkins, Michael J Bamshad, Gerardo Gutierrez, Halina Loi, Hajime Matsuzaki, Rick A Kittles, George Argyropoulos, Jose R Fernandez, Joshua M Akey, Keith W Jones.
Abstract
Understanding the distribution of human genetic variation is an important foundation for research into the genetics of common diseases. Some of the alleles that modify common disease risk are themselves likely to be common and, thus, amenable to identification using gene-association methods. A problem with this approach is that the large sample sizes required for sufficient statistical power to detect alleles with moderate effect make gene-association studies susceptible to false-positive findings as the result of population stratification. Such type I errors can be eliminated by using either family-based association tests or methods that sufficiently adjust for population stratification. These methods require the availability of genetic markers that can detect and, thus, control for sources of genetic stratification among populations. In an effort to investigate population stratification and identify appropriate marker panels, we have analysed 11,555 single nucleotide polymorphisms in 203 individuals from 12 diverse human populations. Individuals in each population cluster to the exclusion of individuals from other populations using two clustering methods. Higher-order branching and clustering of the populations are consistent with the geographic origins of populations and with previously published genetic analyses. These data provide a valuable resource for the definition of marker panels to detect and control for population stratification in population-based gene identification studies. Using three US resident populations (European-American, African-American and Puerto Rican), we demonstrate how such studies can proceed, quantifying proportional ancestry levels and detecting significant admixture structure in each of these populations.
Year: 2005 PMID: 16004724 PMCID: PMC3525270 DOI: 10.1186/1479-7364-2-2-81
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Populations and summary statistics for autosomal single nucleotide polymorphism (SNP) loci
| Population | Location | Sample size | Heterozygosity (a) | Monomorphic SNP loci | % HWE deviations (b) |
|---|---|---|---|---|---|
| Mbuti | Ituri forest | 20 | 0.280 | 0.126 | 0.050 |
| Mende | Sierra Leone | 22 | 0.323 | 0.074 | 0.058 |
| Burunge | Tanzania | 20 | 0.341 | 0.049 | 0.062 |
| Spanish | Valencia | 20 | 0.346 | 0.057 | 0.063 |
| Indian | India | 22 | 0.356 | 0.042 | 0.050 |
| Upper caste | India | 11 | 0.357 | 0.070 | 0.047 |
| Lower caste | India | 11 | 0.352 | 0.077 | 0.040 |
| Nasioi | Melanesia | 19 | 0.280 | 0.181 | 0.031 |
| Altaian | Siberia | 20 | 0.350 | 0.048 | 0.046 |
| East Asian | USA | 20 | 0.327 | 0.096 | 0.045 |
| Chinese | USA | 10 | 0.327 | 0.135 | 0.023 |
| Japanese | USA | 10 | 0.324 | 0.146 | 0.022 |
| Nahua | Mexico | 20 | 0.295 | 0.156 | 0.069 |
| Quechua | Peru | 20 | 0.297 | 0.127 | 0.062 |
| Total sample | | 203 | 0.377 | 0.000 | 0.536 |
a Average unbiased heterozygosity.
b Proportion of deviations from Hardy-Weinberg equilibrium (HWE) using α = 0.05 with standard χ² test.
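The two footnoted statistics can be illustrated for a single biallelic SNP. The sketch below is not taken from the paper: it assumes the commonly used small-sample correction 2n/(2n − 1) for "unbiased" heterozygosity and a one-degree-of-freedom χ² goodness-of-fit test against Hardy-Weinberg proportions. The function names and the genotype counts in the usage note are hypothetical.

```python
# Sketch only: the estimator and test below are standard textbook forms,
# assumed here because the record does not spell out the exact formulas.

CHI2_CRITICAL_1DF_ALPHA_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def unbiased_heterozygosity(n_aa, n_ab, n_bb):
    """Expected heterozygosity with the 2n/(2n-1) small-sample correction."""
    copies = 2 * (n_aa + n_ab + n_bb)      # number of sampled gene copies
    p = (2 * n_aa + n_ab) / copies         # frequency of allele A
    q = 1.0 - p                            # frequency of allele B
    return copies / (copies - 1) * (1.0 - p * p - q * q)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing genotype counts with HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expectation (monomorphic loci) contribute nothing.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def deviates_from_hwe(n_aa, n_ab, n_bb):
    """True if the locus is flagged as an HWE deviation at alpha = 0.05."""
    return hwe_chi_square(n_aa, n_ab, n_bb) > CHI2_CRITICAL_1DF_ALPHA_05
```

For a hypothetical sample of 20 individuals with genotype counts (5, 10, 5), the counts match Hardy-Weinberg proportions exactly and the locus is not flagged. Counts of (10, 0, 10) have the same allele frequencies but no heterozygotes, and are flagged.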
Figure 1. Distribution of locus-specific F.
Figure 2. Neighbour-joining tree of the 203 individuals included in this study, using an allele-sharing distance matrix. The genotype of the ancestral state (ROOT) is taken from those markers showing one common homozygous genotype for two chimpanzees and two gorillas. Individual population affiliations are indicated by the following abbreviations: MBti (Mbuti), Brng (Burunge), Sp (Spanish), Indl (Indian lower caste), Indu (Indian upper caste), Bgvl (Nasioi), Qech (Quechua), Nah (Nahua), Alt (Altaian), Ch (Chinese), Jp (Japanese).
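An allele-sharing distance matrix of the kind used for the tree can be sketched as follows. This is an illustration under assumptions, not the authors' code. Genotypes are coded as counts of one allele (0, 1 or 2) and missing calls as `None`. The per-locus distance is taken as one minus half the number of alleles shared identical by state, averaged over loci typed in both individuals. The exact definition used in the paper is not given in this record.

```python
# Sketch only: genotype coding (0/1/2, None for missing) and the averaging
# rule are assumptions made for illustration.


def allele_sharing_distance(g1, g2):
    """Mean per-locus allele-sharing distance between two individuals."""
    total = 0.0
    used = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip loci missing in either individual
        # For biallelic SNPs, shared alleles = 2 - |a - b|, so the
        # per-locus distance 1 - shared/2 reduces to |a - b| / 2.
        total += abs(a - b) / 2
        used += 1
    if used == 0:
        raise ValueError("no loci genotyped in both individuals")
    return total / used


def distance_matrix(genotypes):
    """Symmetric matrix of pairwise allele-sharing distances."""
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = allele_sharing_distance(genotypes[i], genotypes[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

The resulting matrix is the kind of input a neighbour-joining or principal coordinates routine would take.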
Figure 3. Principal components (PCs) plot of the 203 individuals based on the allele-sharing distance matrix. Individuals are the basis of analysis and have been labelled with symbols as indicated in the figure legend. The first three of the four significant PC axes (using the broken-stick method) are shown on this plot. (a) Space showing all individuals. (b) Enlarging the segment of the plot with the European and Asian populations.
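The broken-stick criterion mentioned in the caption can be sketched briefly. Under the usual formulation, the k-th of p components is retained while its share of total variance exceeds the expected length of the k-th longest piece of a stick broken at random into p pieces, b_k = (1/p) Σ_{i=k}^{p} 1/i. The code assumes non-negative eigenvalues and stops at the first component that fails the test. The function names are hypothetical, and the record does not say how the authors implemented the rule.

```python
# Sketch only: a common formulation of the broken-stick rule, assumed here.


def broken_stick(p):
    """Expected variance proportions for p components under the null model."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def n_significant_components(eigenvalues):
    """Count leading components whose variance share beats the broken stick."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    count = 0
    for value, expected in zip(ordered, broken_stick(len(ordered))):
        if value / total > expected:
            count += 1
        else:
            break  # stop at the first component that does not exceed its expectation
    return count
```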
Figure 4. Bivariate plots for the six possible combinations of the four significant principal coordinates. Symbols used to indicate populations are consistent across figures (a) to (f) (see individual keys) and the components presented are indicated on the X and Y axes: (a) 1st and 2nd, (b) 1st and 3rd, (c) 1st and 4th, (d) 2nd and 3rd, (e) 2nd and 4th and (f) 3rd and 4th.
Figure 5. Triangle plot of three populations, illustrating maximum likelihood estimates of individual ancestry. Puerto Ricans (n = 20) are shown as filled circles, African-Americans (n = 42) as grey triangles and European-Americans (n = 41) as open circles. Parental populations in this analysis are the average of the Nahua and Quechua as the indigenous American; Mende as the West African; and Spanish as the European.
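A maximum likelihood estimate of three-way individual ancestry can be illustrated with a simple grid search. This is a minimal sketch under stated assumptions, not the estimator used in the paper. Each genotype (0, 1 or 2 copies of a reference allele) is modelled as a binomial draw whose allele frequency is the ancestry-weighted average of the three parental frequencies, loci are treated as independent, and the ancestry simplex is searched on a coarse grid. All names, the grid resolution and the example frequencies are hypothetical.

```python
import math

# Sketch only: binomial genotype model with a grid search over the
# ancestry simplex; requires Python 3.8+ for math.comb.


def log_likelihood(genotypes, parental_freqs, m):
    """Log-likelihood of one individual's genotypes given ancestry m."""
    ll = 0.0
    for g, freqs in zip(genotypes, parental_freqs):
        # Expected allele frequency in an individual with ancestry m.
        p = sum(weight * f for weight, f in zip(m, freqs))
        p = min(max(p, 1e-6), 1 - 1e-6)  # keep the logarithms finite
        ll += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return ll


def ml_ancestry(genotypes, parental_freqs, steps=20):
    """Grid-search estimate of (m1, m2, m3), with m1 + m2 + m3 = 1."""
    best, best_ll = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            m = (i / steps, j / steps, (steps - i - j) / steps)
            ll = log_likelihood(genotypes, parental_freqs, m)
            if ll > best_ll:
                best, best_ll = m, ll
    return best
```

With three hypothetical fully diagnostic markers, one per parental population, an individual homozygous for the first population's allele and lacking the other two is assigned ancestry (1.0, 0.0, 0.0). In practice, a continuous optimiser and many partially informative markers would replace the grid and the toy frequencies.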
Correlation coefficients for comparisons between principal components (PC) estimates from the even and odd chromosome marker sets
| Population | First PC | Second PC | Third PC | Fourth PC |
|---|---|---|---|---|
| African-American | 0.125 (NS) | 0.219 (NS) | | |
| European-American | 0.021 (NS) | 0.134 (NS) | 0.078 (NS) | |
| Puerto Rican | 0.417 (NS) | 0.412 (NS) | | |
| Mbuti | 0.098 (NS) | 0.004 (NS) | 0.014 (NS) | 0.41 (NS) |
| Mende | 0.170 (NS) | 0.236 (NS) | 0.353 (NS) | 0.302 (NS) |
| Burunge | 0.197 (NS) | 0.114 (NS) | 0.051 (NS) | 0.265 (NS) |
| Spanish | 0.203 (NS) | 0.065 (NS) | 0.014 (NS) | 0.157 (NS) |
| Indian, all | 0.183 (NS) | 0.483 (NS) | | |
| Indian, lower caste | 0.145 (NS) | 0.565 (NS) | 0.236 (NS) | 0.491 (NS) |
| Indian, upper caste | 0.500 (NS) | 0.342 (NS) | 0.191 (NS) | |
| Altaian | 0.136 (NS) | 0.108 (NS) | 0.448 (NS) | |
| East Asian | 0.158 (NS) | 0.403 (NS) | 0.047 (NS) | 0.043 (NS) |
| Chinese | 0.515 (NS) | 0.127 (NS) | 0.188 (NS) | 0.376 (NS) |
| Japanese | 0.503 (NS) | 0.539 (NS) | 0.049 (NS) | 0.24 (NS) |
| Quechua | 0.160 (NS) | 0.380 (NS) | 0.384 (NS) | 0.399 (NS) |
| Nahua | 0.047 (NS) | 0.215 (NS) | 0.097 (NS) | 0.165 (NS) |
| Nasioi | 0.015 (NS) | 0.283 (NS) | | |
Shown are Spearman's correlation coefficients, with p values in parentheses. Significant correlations among even and odd chromosomal estimates are shown in bold.
Correlation coefficients for comparisons between biogeographical ancestry estimates from the even and odd chromosome marker sets
| Population (n)/ancestral C component | West African | European | Indigenous American |
|---|---|---|---|
| African American ( | 0.951 ( | 0.904 ( | 0.635 ( |
| European American ( | 0.766 ( | 0.750 ( | 0.395 ( |
| Puerto Rican ( | 0.881 ( | 0.924 ( | 0.810 ( |
Shown are Spearman's correlation coefficients, with p values in parentheses.
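The even-versus-odd chromosome comparisons in the two tables rest on Spearman's rank correlation, which can be sketched in plain Python as the Pearson correlation of average ranks. This is an illustration only. The function names are hypothetical, no p value is computed, and any input values should be understood as made-up examples rather than data from the study.

```python
# Sketch only: Spearman's rho as Pearson's r on tie-averaged ranks.


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation between paired estimates."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` would hold one estimate per individual from the even-chromosome markers and `y` the matching estimates from the odd-chromosome markers.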