Feng Zhang, Yuping Wang, Hong-Wen Deng.
Abstract
Population stratification can cause spurious associations in population-based association studies, and several statistical methods have been proposed to reduce its impact. We simulated a set of stratified populations based on real haplotype data from the HapMap ENCODE project and compared the relative power, type I error rates, accuracy and positive prediction value of four prevailing population-based association study methods under various population stratification levels: traditional case-control tests, structured association (SA), genomic control (GC) and principal components analysis (PCA). Additionally, we evaluated the effects of sample size and of the frequency of the disease susceptible allele on the performance of the four analytical methods in the presence of population stratification. We found that the performance of PCA was very stable under various scenarios. Our comparison results suggest that SA and PCA have comparable performance if sufficient ancestral informative markers (AIMs) are used in the SA analysis. GC appeared to be strongly conservative in significantly stratified populations, so it may be better to apply GC in populations with a low stratification level. Our study intends to provide a practical guideline for researchers to select proper study methods and make appropriate inferences from the results of population-based association studies.
Year: 2008 PMID: 18852890 PMCID: PMC2562035 DOI: 10.1371/journal.pone.0003392
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Parameter configurations in the simulation studies.
| Stratification levels | Sample sizes | FDSA | Numbers of AIMs |
| 0.30−0.30 | 400 | 0.10±0.02 | **40** |
| 0.35−0.25 | 800 | **0.20±0.02** | 80 |
| 0.40−0.20 | **1200** | 0.30±0.02 | 120 |
| **0.50−0.10** | 2000 | 0.40±0.02 | 200 |
Note: Stratification levels denote the proportions of YRI individuals in cases and controls, respectively.
Sample sizes denote the total numbers of samples, comprising equal numbers of cases and controls.
FDSA denotes the frequency of the disease susceptible allele.
The basic parameter configuration is highlighted in bold. Each possible parameter setting can be obtained by replacing one entry of the basic parameter configuration with a different entry of the corresponding parameter.
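The one-at-a-time design described in the note can be sketched in Python as follows. This is only an illustration, not the authors' simulation code: the names `PARAMETERS`, `BASIC` and `enumerate_settings` are hypothetical, and the basic configuration is taken from the fixed values quoted in the figure captions (stratification level 0.5−0.1, sample size 1200, FDSA 0.20±0.02, 40 AIMs).

```python
# Parameter values as listed in the table and figure captions.
# FDSA entries are the centre values of the +/- 0.02 ranges.
PARAMETERS = {
    "stratification_level": ["0.30-0.30", "0.35-0.25", "0.40-0.20", "0.50-0.10"],
    "sample_size": [400, 800, 1200, 2000],
    "fdsa": [0.10, 0.20, 0.30, 0.40],
    "n_aims": [40, 80, 120, 200],
}

# Basic configuration (assumed from the fixed values in the figure captions).
BASIC = {
    "stratification_level": "0.50-0.10",
    "sample_size": 1200,
    "fdsa": 0.20,
    "n_aims": 40,
}


def enumerate_settings(parameters, basic):
    """Return the basic setting plus every setting that differs from it
    in exactly one parameter."""
    settings = [dict(basic)]
    for name, values in parameters.items():
        for value in values:
            if value != basic[name]:
                variant = dict(basic)  # copy, then replace a single entry
                variant[name] = value
                settings.append(variant)
    return settings


if __name__ == "__main__":
    for setting in enumerate_settings(PARAMETERS, BASIC):
        print(setting)
```

With four parameters of four values each, this yields the basic setting plus three alternatives per parameter, rather than the full grid of all combinations.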
Figure 1. Performance of the four analytical methods in stratified populations with stratification levels varying from 0.3−0.3 to 0.5−0.1 (sample size = 1200, frequency of disease susceptible allele = 0.20±0.02 and number of AIMs = 40).
Figure 2. Performance of the four analytical methods in stratified populations with sample sizes varying from 400 to 2000 (stratification level = 0.5−0.1, frequency of disease susceptible allele = 0.20±0.02 and number of AIMs = 40).
Figure 3. Performance of the four analytical methods in stratified populations with frequencies of disease susceptible allele varying from 0.10±0.02 to 0.40±0.02 (stratification level = 0.5−0.1, sample size = 1200 and number of AIMs = 40).
Figure 4. Performance of SA with numbers of AIMs varying from 40 to 200 (stratification level = 0.5−0.1, sample size = 1200 and frequency of disease susceptible allele = 0.20±0.02).
Average correction factor λ estimated by GC in populations with various stratification levels.
| Stratification levels | λ (power) | λ (type I error rates) |
| 0.30−0.30 | 1.10 | 1.04 |
| 0.35−0.25 | 2.98 | 2.85 |
| 0.40−0.20 | 7.56 | 7.51 |
| 0.50−0.10 | 11.93 | 11.91 |
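Note: Stratification levels denote the proportions of YRI individuals in cases and controls, respectively.
The λ values in the power column were calculated from the power comparison results.
The λ values in the type I error rates column were calculated from the type I error rate comparison results.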
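To illustrate how a correction factor of this kind is commonly obtained and applied, here is a minimal Python sketch of the standard median-based genomic control estimator. The source does not state which estimator or software the authors used, so the function names, the floor at 1 and the example statistics are all assumptions made for illustration.

```python
import statistics

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def genomic_control_lambda(null_stats):
    """Estimate lambda as the median of the test statistics at presumed
    null markers divided by the chi-square(1 df) median, floored at 1."""
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    return max(1.0, lam)


def gc_correct(stat, lam):
    """Deflate a raw 1-df association statistic by lambda."""
    return stat / lam


if __name__ == "__main__":
    # Hypothetical chi-square statistics at null markers (not from the paper).
    null_stats = [0.2, 0.9, 1.8, 3.1, 5.4]
    lam = genomic_control_lambda(null_stats)
    print(f"lambda = {lam:.2f}")
    print(f"corrected statistic for raw 12.0 = {gc_correct(12.0, lam):.2f}")
```

Under this scheme, a λ near 12, as in the most stratified setting in the table, means a raw statistic has to be roughly 46 (about 3.84 × 11.93) before the corrected value reaches the nominal 0.05 threshold for one degree of freedom. That is consistent with GC appearing strongly conservative in heavily stratified populations.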