Naveen K Kadri, Bernt Guldbrandtsen, Peter Sørensen, Goutam Sahana.
Abstract
Population structure is known to cause false-positive detection in association studies. We compared the power, precision, and type-I error rates of various association models in analyses of a simulated dataset with structure at the population (admixture from two populations; P) and family (K) levels. We also compared type-I error rates among models in analyses of publicly available human and dog datasets. The models corrected for none, one, or both structure levels. Correction for K was performed with linear mixed models incorporating familial relationships estimated from pedigrees or genetic markers. Linear models that ignored K were also tested. Correction for P was performed using principal component or structured association analysis. In analyses of simulated and real data, linear mixed models that corrected for K were able to control type-I error, regardless of whether they also corrected for P. In contrast, correction for P alone in linear models was insufficient. The power and precision of linear mixed models with and without correction for P were similar. Furthermore, power, precision, and type-I error rate were comparable in linear mixed models incorporating pedigree and genomic relationships. In summary, in association studies using samples with both P and K, ancestries estimated using principal components or structured assignment were not sufficient to correct type-I errors. In such cases, type-I errors may be controlled by use of linear mixed models with relationships derived from either pedigrees or genetic markers.
Year: 2014 PMID: 24662750 PMCID: PMC3963841 DOI: 10.1371/journal.pone.0088926
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Schematic representation of the simulation.
Figure 2. Absolute differences in allele frequencies of single-nucleotide polymorphism (SNP) markers between populations (100 replicates).
Figure 3. Differences in mean phenotypes of two populations after separation for 30 generations (100 replicates).
Figure 4. Distribution of individuals' ancestries after admixture (100 replicates).
Figure 5. Quantile-quantile plot of −log10 p-values for association tests using different models in the simulated dataset.
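As an illustration of how a quantile-quantile plot of −log10 p-values is typically built, the following minimal sketch pairs each observed value with the value expected under the null hypothesis of uniformly distributed p-values. The function name `qq_points` and the plotting position (i − 0.5)/n are assumptions chosen for illustration; the record does not state which convention the authors used.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs, sorted ascending.

    Assumption: the i-th smallest of n null p-values is approximated by
    (i - 0.5) / n, one common plotting position among several.
    """
    n = len(p_values)
    # Observed -log10 p-values, smallest first
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected -log10 p-values under a uniform null, smallest first
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical p-values, for illustration only
for exp, obs in qq_points([0.5, 0.01, 0.2, 0.9]):
    print(f"expected {exp:.3f}  observed {obs:.3f}")
```

Points lying above the diagonal (observed greater than expected) indicate more small p-values than the null predicts, which is the pattern associated with inflated type-I error.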
Average number of significant single-nucleotide polymorphisms (SNPs; 100 replicates) in five chromosomes (10,000 SNPs) without simulated quantitative trait loci.
| Model | Correction for | Significance level | | | |
| | | 0.05 | 0.005 | 0.0005 | 0.000005 |
| LMMped | | 514.73* (7.83) | 53* (1.7) | 5.5 (0.4) | 0.02 (0.014) |
| LMMstr | | 511.88 (7.9) | 51.97 (1.6) | 5.64* (0.4) | 0.02 (0.014) |
| LMMpca | | 520.89** (7.6) | 53.92** (1.6) | 5.88** (0.4) | 0.02 (0.014) |
| LMMgmat | | 472.15 (5.7) | 44.92 (1.2) | 4.45 (0.3) | 0.01 (0.01) |
| LM | - | 1014.51** (18.9) | 190.2** (6.8) | 36.93** (2.0) | 1.43** (0.17) |
| LMstr | | 998.22** (17.8) | 184.33** (6.4) | 34.78** (1.8) | 1.34** (0.16) |
| LMpca | | 968.01** (18.7) | 175.27** (6.5) | 32.01** (1.7) | 1.17** (0.16) |
| Expected false-positive associations | | 500 | 50 | 5 | 0.05 |
Standard errors are given in parentheses. The average number of significant SNPs (S_obs) in 100 replicates was compared with the expected number (S_exp) at each significance level using one-sided t-tests (H0: S_obs = S_exp; H1: S_obs > S_exp; *p<0.05, **p<0.01). A significance level of 0.000005 corresponds to a nominal significance level of 0.05 after Bonferroni correction for 10,000 tests.
LMMped = Linear Mixed Model Including Pedigree-Based Relationship, LMMstr = Linear Mixed Model with STRUCTURE, LMMpca = Principal Component Analysis in a Linear Mixed Model, LMMgmat = Linear Mixed Model Including Genomic Relationship, LM = Linear Model, LMstr = Linear Model with STRUCTURE, LMpca = Principal Component Analysis in a Linear Model, P = admixture, K = Familial relationships.
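The arithmetic behind the table can be sketched as follows. The expected count is the significance level times the 10,000 null SNPs, the Bonferroni threshold is 0.05 divided by 10,000, and the comparison with the expected count is a one-sample, one-sided t-statistic. This is a rough check under stated assumptions, not the authors' code: it assumes the values in parentheses are standard errors of the mean over 100 replicates (99 degrees of freedom), and the critical values are approximate figures from standard t tables. All function names are hypothetical.

```python
N_SNPS = 10_000  # SNPs on the five chromosomes without simulated QTLs

# Approximate one-sided critical values of Student's t with 99 degrees of
# freedom (assumed: 100 replicates), taken from standard tables.
T_CRIT_05 = 1.660
T_CRIT_01 = 2.365


def expected_false_positives(alpha, n_tests=N_SNPS):
    """Expected number of null SNPs declared significant at level alpha."""
    return alpha * n_tests


def bonferroni_threshold(family_alpha, n_tests=N_SNPS):
    """Per-test level that gives a family-wise level of family_alpha."""
    return family_alpha / n_tests


def t_statistic(mean_obs, std_err, expected):
    """One-sample t-statistic for H0: S_obs = S_exp."""
    return (mean_obs - expected) / std_err


def stars(t):
    """Significance stars for the one-sided test H1: S_obs > S_exp."""
    if t > T_CRIT_01:
        return "**"
    if t > T_CRIT_05:
        return "*"
    return ""


# (mean, standard error) at the 0.05 level, copied from the table
rows = {
    "LMMped": (514.73, 7.83),
    "LMMstr": (511.88, 7.9),
    "LMMpca": (520.89, 7.6),
    "LMMgmat": (472.15, 5.7),
    "LM": (1014.51, 18.9),
}

expected = expected_false_positives(0.05)
print("Bonferroni threshold:", bonferroni_threshold(0.05))
for model, (mean, se) in rows.items():
    t = t_statistic(mean, se, expected)
    print(f"{model:8s} t = {t:6.2f} {stars(t)}")
```

Under these assumptions, the stars produced for the 0.05 column agree with those printed in the table for the five models listed.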
Figure 6. The power [% of quantitative trait loci (QTLs) detected] and precision (absolute distance between simulated and detected QTL; gray bars) of the models in QTL localization.
Absolute error (cM) in quantitative trait loci localization.
| Model | Small effect | Large effect | All |
| LMMped | 0.62 | 0.38 | 0.40 |
| LMMstr | 0.66 | 0.39 | 0.41 |
| LMMpca | 0.66 | 0.38 | 0.40 |
| LMMgmat | 0.66 | 0.37 | 0.38 |
Precision is given as the absolute genetic distance between simulated and detected quantitative trait loci (±1 cM).
LMMped = Linear Mixed Model Including Pedigree-Based Relationship, LMMpca = Principal Component Analysis in a Linear Mixed Model, LMMstr = Linear Mixed Model with STRUCTURE, LMMgmat = Linear Mixed Model Including Genomic Relationship.
Figure 7. Quantile-quantile plot of −log10 p-values for association tests of the human GOLDN dataset using different models.
Figure 8. Quantile-quantile plot of −log10 p-values for association tests of the dog hip dysplasia dataset using different models.