John S K Kauwe, Sarah Bertelsen, Laura Jean Bierut, Gerald Dunn, Anthony L Hinrichs, Carol H Jin, Brian K Suarez.
Abstract
Accurately resolving population structure in a sample is important for both linkage and association studies. In this study we investigated the power of single-nucleotide polymorphisms (SNPs) to detect population structure in a sample of 286 unrelated individuals. We varied the number of SNPs to determine how many are required to approach the degree of resolution obtained with the Collaborative Study on the Genetics of Alcoholism (COGA) short tandem repeat polymorphisms (STRPs). In addition, we selected SNPs with varying minor allele frequencies (MAFs) to determine whether low- or high-frequency SNPs are more efficient in resolving population structure. We conclude that a set of at least 100 evenly spaced SNPs with MAFs of 40-50% is required to resolve population structure in this dataset. If SNPs with lower MAFs are used, then more than 250 SNPs may be required to obtain reliable results.
Year: 2005 PMID: 16451699 PMCID: PMC1866696 DOI: 10.1186/1471-2156-6-S1-S84
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
A description of MAFs and rates of missing data in the 1,000 SNP sets
| Marker set | Mean MAF | SD | Range | Missing rate |
| Illumina Low Frequency | 0.26 | 0.05 | [0.08–0.33] | 0.35% |
| Illumina High Frequency | 0.48 | 0.01 | [0.47–0.5] | 0.19% |
| Affymetrix Low Frequency | 0.04 | 0.02 | [0–0.08] | 5.44% |
| Affymetrix High Frequency | 0.48 | 0.01 | [0.46–0.5] | 5.34% |
Probability of cluster membership based on the 328 COGA STRPs
| Self-reported race | N | Average probability of membership in European American cluster | Average probability of membership in African American cluster |
| European American | 257 | 0.98 | 0.02 |
| African American | 29 | 0.08 | 0.92 |
Figure 1. Structure results of sets of SNPs compared to STRPs. For each individual, similarity is one minus the absolute difference between the cluster-membership probability generated by a given set of SNPs and the one generated by the STRPs; these values are averaged across the individuals genotyped in both sets.
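The similarity measure described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `similarity`, the dictionary inputs keyed by individual ID, and the example probabilities are all assumptions made here for demonstration.

```python
def similarity(p_snp, p_strp):
    """Mean of (1 - |p_snp - p_strp|) over individuals present in both sets.

    p_snp and p_strp are hypothetical dicts mapping an individual ID to the
    probability of membership in one chosen cluster, as estimated from a SNP
    set and from the STRPs respectively.
    """
    # Only individuals genotyped in both marker sets contribute.
    shared = [ind for ind in p_snp if ind in p_strp]
    if not shared:
        raise ValueError("no individuals genotyped in both sets")
    return sum(1.0 - abs(p_snp[ind] - p_strp[ind]) for ind in shared) / len(shared)


# Made-up probabilities for illustration only; "c" is skipped because it
# has no STRP-based estimate.
snp_probs = {"a": 0.9, "b": 0.2, "c": 0.5}
strp_probs = {"a": 1.0, "b": 0.0}
print(round(similarity(snp_probs, strp_probs), 2))
```

A value of 1 means the SNP set assigns every shared individual exactly the same membership probability as the STRPs; lower values indicate larger average disagreement.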
Figure 2. Information content of each SNP. Information content of each SNP is measured as the squared difference between the MAF in self-reported African American and European American individuals.
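The information content defined in the Figure 2 caption is simply the squared difference between two group frequencies. The sketch below assumes the frequency of the same allele is used in both groups; the function names, SNP labels, and frequencies are hypothetical and are not taken from the COGA data.

```python
def information_content(freq_aa, freq_ea):
    """Squared difference between the allele frequency in the self-reported
    African American group and in the European American group."""
    return (freq_aa - freq_ea) ** 2


def rank_by_information_content(freqs):
    """Sort SNPs from most to least informative.

    freqs is a hypothetical dict: SNP label -> (freq_aa, freq_ea).
    Returns a list of (SNP label, information content) pairs.
    """
    scored = [(snp, information_content(aa, ea)) for snp, (aa, ea) in freqs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up frequencies for illustration only.
example = {
    "snp_a": (0.90, 0.10),  # large between-group difference
    "snp_b": (0.50, 0.45),  # nearly identical frequencies
    "snp_c": (0.30, 0.60),
}
for snp, ic in rank_by_information_content(example):
    print(snp, round(ic, 4))
```

Under this definition a SNP is informative when its allele frequency differs strongly between the two groups, regardless of how common the allele is in the pooled sample.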
The 20 most informative SNPs, their information content (IC), and MAFs in the COGA GAW14 data
| SNP ID | IC | MAF |
| rs2341823 | 0.62 | 0.16 |
| rs719776 | 0.56 | 0.22 |
| rs3843777 | 0.56 | 0.19 |
| rs718387 | 0.55 | 0.15 |
| rs723632 | 0.55 | 0.18 |
| rs721684 | 0.54 | 0.13 |
| rs2078588 | 0.54 | 0.15 |
| rs1478785 | 0.53 | 0.33 |
| rs1369290 | 0.52 | 0.16 |
| rs1872861 | 0.52 | 0.13 |
| rs719191 | 0.51 | 0.27 |
| rs714857 | 0.51 | 0.14 |
| rs2021781 | 0.50 | 0.19 |
| rs526593 | 0.49 | 0.16 |
| rs1438405 | 0.48 | 0.08 |
| rs1371231 | 0.47 | 0.30 |
| rs1352405 | 0.47 | 0.12 |
| rs726391 | 0.47 | 0.39 |
| rs725472 | 0.46 | 0.27 |
| rs2351254 | 0.46 | 0.12 |