Jan Graffelman, S Nelson, S M Gogarten, B S Weir.
Abstract
This paper addresses exact-test-based statistical inference for Hardy-Weinberg equilibrium in the presence of missing genotype data. Missing genotypes are often discarded when markers are tested for Hardy-Weinberg equilibrium, which can bias the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ² statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy-Weinberg disequilibrium. Depending on the imputation method used, 6-13% of the test results changed qualitatively at the 5% level.
Keywords: Hardy−Weinberg equilibrium; exact test; imputation; missing data
Year: 2015 PMID: 26377959 PMCID: PMC4632056 DOI: 10.1534/g3.115.022111
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Left panel: ternary plot for 677 single-nucleotide polymorphisms (SNPs) with >5% missing genotypes. A total of 229 SNPs (34%) are significant in a χ² test. Right panel: 677 randomly selected SNPs without missing genotypes. A total of 56 SNPs (8%) are significant in a χ² test. Significant markers are red and nonsignificant markers are green (α = 0.05).
Figure 2. Plots of genotype calls for four single-nucleotide polymorphisms (A: rs818284; B: rs13022866; C: rs3766263; D: rs2714888) with >5% missing genotypes in the GENEVA project on prematurity.
Hardy-Weinberg equilibrium statistics for 4 SNPs with more than 5% missing values
| Panel | RS | AA | AB | BB | NMV | f (discard) | f (SI) | f (MI) | Exact p (discard) | p SI (χ²) | p SI (exact) | p MI (χ²) | p MI (exact) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | rs818284 | 1593 | 138 | 67 | 141 | 0.451 | 0.458 | 0.451 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| B | rs13022866 | 788 | 781 | 237 | 133 | 0.046 | 0.012 | 0.015 | 0.046 | 0.596 | 0.571 | 0.525 | 0.526 |
| C | rs3766263 | 533 | 865 | 277 | 264 | −0.058 | 0.014 | 0.012 | 0.020 | 0.549 | 0.539 | 0.607 | 0.601 |
| D | rs2714888 | 1092 | 499 | 69 | 279 | 0.031 | 0.061 | 0.056 | 0.192 | 0.007 | 0.007 | 0.014 | 0.015 |
Columns, in order: RS number, genotype counts (AA, AB, BB), NMV, inbreeding coefficient under discarding, inbreeding coefficients obtained by single imputation (SI) and multiple imputation (MI), two-sided exact p-value under discarding, two-sided p-values using SI (χ²-based and exact), and two-sided p-values using MI (χ²-based and exact). NMV, number of missing values; SNP, single-nucleotide polymorphism.
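The "discarding" columns of the table can be recomputed from the genotype counts alone. The following is a minimal sketch under standard textbook definitions, not the authors' implementation: the inbreeding coefficient is taken as one minus the ratio of observed to expected heterozygotes, the χ² statistic (without continuity correction) equals n·f², and the exact two-sided p-value is assumed to sum the probabilities of all heterozygote counts that are no more likely than the observed one, given the allele counts. All function names are hypothetical.

```python
import math

def inbreeding_coefficient(n_aa, n_ab, n_bb):
    """f = 1 - observed / expected heterozygotes, using non-missing genotypes only."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    expected_het = 2 * p * (1 - p) * n
    return 1 - n_ab / expected_het

def chi_square_statistic(n_aa, n_ab, n_bb):
    """Chi-square statistic without continuity correction, via chi2 = n * f^2."""
    n = n_aa + n_ab + n_bb
    return n * inbreeding_coefficient(n_aa, n_ab, n_bb) ** 2

def exact_two_sided_p(n_aa, n_ab, n_bb):
    """Two-sided exact p-value conditional on the allele counts.

    Assumption: sums the probabilities of all heterozygote counts whose
    probability does not exceed that of the observed sample.
    """
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab                    # count of allele A
    n_b = 2 * n_bb + n_ab                    # count of allele B
    const = (math.lgamma(n + 1) + math.lgamma(n_a + 1)
             + math.lgamma(n_b + 1) - math.lgamma(2 * n + 1))

    def log_prob(het):
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (const + het * math.log(2)
                - math.lgamma(hom_a + 1) - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1))

    # Heterozygote counts must have the same parity as the allele counts.
    het_values = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = [math.exp(log_prob(h)) for h in het_values]
    p_obs = math.exp(log_prob(n_ab))
    tail = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
    return min(1.0, tail)

# Genotype counts (AA, AB, BB) for the four SNPs in the table.
table_counts = {
    "rs818284": (1593, 138, 67),
    "rs13022866": (788, 781, 237),
    "rs3766263": (533, 865, 277),
    "rs2714888": (1092, 499, 69),
}

for rs, counts in table_counts.items():
    f = inbreeding_coefficient(*counts)
    chi2 = chi_square_statistic(*counts)
    p = exact_two_sided_p(*counts)
    print(f"{rs}: f = {f:.3f}, chi2 = {chi2:.2f}, exact p = {p:.3f}")
```

Rounded to three decimals, the inbreeding coefficients computed this way agree with the discarding column of the table (0.451, 0.046, -0.058, 0.031). The imputation-based columns cannot be reproduced from these counts, since they depend on the imputed genotypes.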
Figure 3. Relationships between inbreeding coefficients. (A) Estimates obtained by discarding against estimates obtained by multiple imputation using two flanking markers (MI). (B) Estimates obtained by discarding against estimates obtained by single imputation (SI). (C) SI estimates against MI estimates. Plotting symbols and colors indicate the significance of the markers in two Hardy−Weinberg equilibrium tests (α = 0.05). Red diamonds: both tests significant. Green circles: both tests nonsignificant. Upward blue triangles: significant in the test on the x-axis, nonsignificant in the test on the y-axis. Downward orange triangles: nonsignificant in the test on the x-axis, significant in the test on the y-axis.
Figure 4. Relationships between exact p-values of tests for Hardy−Weinberg equilibrium (HWE) for heterozygote deficiency, heterozygote excess, or either (two-sided), obtained by discarding missing genotypes and by imputing them using single imputation and multiple imputation with information from two flanking markers. Plotting symbols and colors indicate the significance of the markers in two HWE tests (α = 0.05). Red diamonds: both tests significant. Green circles: both tests nonsignificant. Upward blue triangles: significant in the test on the x-axis, nonsignificant in the test on the y-axis. Downward orange triangles: nonsignificant in the test on the x-axis, significant in the test on the y-axis.
Figure 5. Q-Q plots of p-values for tests for Hardy−Weinberg equilibrium obtained by (A) discarding missing genotypes, (B) single imputation of missing values, and (C) multiple imputation. (D) Reference Q-Q plot of the p-values for a dataset of 677 simulated SNPs with the same sample size and allele frequency distribution as the observed data.