Chong Wu, Ellen W Demerath, James S Pankow, Jan Bressler, Myriam Fornage, Megan L Grove, Wei Chen, Weihua Guan.
Abstract
DNA methylation is a widely studied epigenetic mechanism, and alterations in methylation patterns may be involved in the development of common diseases. Unlike inherited changes in genetic sequence, site-specific methylation varies by tissue, developmental stage, and disease status, and may be affected by aging and by exposure to environmental factors such as diet or smoking. These non-genetic factors are typically included in epigenome-wide association studies (EWAS) because they may confound the association between methylation and disease. However, missing values in these variables can reduce the sample size and decrease the statistical power of EWAS. We propose a site selection and multiple imputation (MI) method to impute missing covariate values and to perform association tests in EWAS, and we compare this method to an alternative projection-based method. Through simulations, we show that the MI-based method is slightly conservative but provides consistent estimates of effect size. We also illustrate these methods with data from the Atherosclerosis Risk in Communities (ARIC) study by carrying out an EWAS between methylation levels and smoking status, in which missing cell type compositions and white blood cell counts are imputed.
Keywords: DNA methylation; Illumina 450K; epigenome-wide association; missing data; phenotype imputation
Year: 2016 PMID: 26890800 PMCID: PMC4846117 DOI: 10.1080/15592294.2016.1145328
Source DB: PubMed Journal: Epigenetics ISSN: 1559-2294 Impact factor: 4.528
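The abstract describes running association tests on multiply imputed covariates. As a minimal sketch of how per-imputation results are commonly combined, the block below applies Rubin's rules to a set of regression estimates and standard errors. The record does not state that the authors pooled their results this way, so treat it as an assumption; the function name `pool_rubin` and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates with Rubin's rules (illustrative sketch).

    estimates  : coefficient estimate from each imputed data set
    std_errors : matching standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching lengths")

    pooled_estimate = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the estimates (m - 1 denominator).
    between = variance(estimates)
    # Total variance adds a (1 + 1/m) correction for the finite number of imputations.
    total = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, math.sqrt(total)


# Hypothetical numbers for three imputed data sets (not taken from the study).
example_estimate, example_se = pool_rubin([0.010, 0.012, 0.008], [0.002, 0.002, 0.002])
```

The pooled standard error is larger than any single-imputation standard error because it also carries the between-imputation spread, which is one reason MI-based intervals can be somewhat conservative.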
Figure 1. Power under different effect sizes for simulation model 1. Comparison of the power obtained with the projection-based method, MI-Norm, MI-PMM, full data (assuming that no values are missing), and complete-case analysis (excluding subjects with missing values). (A) Missing rate = 0.3. (B) Missing rate = 0.9.
Comparison of methods for simulation model 1 (missing rate 90%; 30 imputations for the MI methods).
| Effect size | Projection-based Est | Projection-based Cov | MI-Norm Est | MI-Norm Cov | MI-PMM Est | MI-PMM Cov | Complete Est | Complete Cov |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.0000 | 0.953 | 0.0002 | 0.963 | 0.0007 | 0.960 | 0.0000 | 0.951 |
| 0.005 | 0.0042 | 0.940 | 0.0051 | 0.966 | 0.0056 | 0.962 | 0.0050 | 0.951 |
| 0.01 | 0.0085 | 0.912 | 0.0100 | 0.965 | 0.0104 | 0.965 | 0.0100 | 0.951 |
| 0.012 | 0.0102 | 0.897 | 0.0119 | 0.962 | 0.0124 | 0.962 | 0.0120 | 0.951 |
| 0.014 | 0.0119 | 0.875 | 0.0139 | 0.960 | 0.0144 | 0.959 | 0.0140 | 0.951 |
| 0.016 | 0.0136 | 0.851 | 0.0158 | 0.959 | 0.0163 | 0.959 | 0.0160 | 0.951 |
| 0.018 | 0.0153 | 0.826 | 0.0177 | 0.961 | 0.0182 | 0.961 | 0.0180 | 0.951 |
| 0.02 | 0.0170 | 0.799 | 0.0197 | 0.960 | 0.0202 | 0.962 | 0.0200 | 0.951 |
| 0.022 | 0.0187 | 0.776 | 0.0217 | 0.960 | 0.0221 | 0.962 | 0.0220 | 0.951 |
Est = regression coefficient estimate, adjusted for the confounders.
Cov = coverage rate of the 95% confidence interval.
The effect size is the same for all CpG sites evaluated under the simulation model.
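To make the Cov column concrete, the following sketch computes a coverage rate: the fraction of simulation replicates whose 95% confidence interval contains the true effect size. It assumes a normal-approximation interval (estimate ± 1.96 standard errors), which this record does not confirm; the function name `coverage_rate` and the example values are hypothetical.

```python
def coverage_rate(estimates, std_errors, true_value, z=1.959964):
    """Fraction of normal-approximation CIs that contain the true value.

    Assumes each interval is estimate +/- z * standard error (an assumption
    made for illustration, not a detail taken from the study).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need non-empty inputs of equal length")

    hits = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if lower <= true_value <= upper:
            hits += 1
    return hits / len(estimates)


# Four hypothetical replicates with a true effect of 0.02; the third interval
# (0.030 +/- 0.00784) lies entirely above 0.02, so it misses.
example_coverage = coverage_rate([0.019, 0.021, 0.030, 0.020], [0.004] * 4, 0.02)
```

A value near 0.95 indicates well-calibrated intervals, a value above 0.95 indicates conservative intervals, and a value well below 0.95 indicates intervals that are too narrow or centered on a biased estimate.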
Figure 2. Boxplots of estimated effect sizes for simulation model 1. Comparison of the effect sizes estimated with the projection-based method, MI-Norm, MI-PMM, full data (assuming that no values are missing), and complete-case analysis (excluding subjects with missing values). (A) Missing rate = 0.3. (B) Missing rate = 0.9.
Figure 3. EWAS of smoking status: MI-PMM vs. the projection-based method. The figure is based on 1,640 participants after imputing the missing cell type composition using MI-PMM or the projection-based method. (A) Comparison of –log10(P-value) from MI-PMM and from the projection-based method (P-values truncated). (B) Comparison of coefficient estimates from MI-PMM and from the projection-based method.
Figure 4. EWAS of smoking status with imputed WBC. The analysis is based on MI-PMM. The data excluding observations with missing WBC contain 1,640 participants, while imputing missing WBC yields 1,932 participants. (A) Comparison of coefficient estimates with and without imputing missing WBC values. (B) Comparison of –log10(P-value) with and without imputing missing WBC values (P-values truncated at 10^-30).
Figure 5. Simulation models 1-3. X: covariate of interest; Y: DNA methylation level; Z: covariate(s) that contain missing values.
Comparison of methods for simulation model 1 (missing rate 30%; 10 imputations for the MI methods).
| Effect size | Projection-based Est | Projection-based Cov | MI-Norm Est | MI-Norm Cov | MI-PMM Est | MI-PMM Cov | Complete Est | Complete Cov |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.0000 | 0.951 | −0.0001 | 0.964 | 0.0001 | 0.963 | 0.0000 | 0.951 |
| 0.005 | 0.0048 | 0.951 | 0.0049 | 0.965 | 0.0050 | 0.964 | 0.0050 | 0.951 |
| 0.01 | 0.0095 | 0.946 | 0.0099 | 0.964 | 0.0100 | 0.965 | 0.0100 | 0.951 |
| 0.012 | 0.0114 | 0.943 | 0.0119 | 0.963 | 0.0120 | 0.965 | 0.0120 | 0.951 |
| 0.014 | 0.0134 | 0.941 | 0.0139 | 0.963 | 0.0140 | 0.966 | 0.0140 | 0.951 |
| 0.016 | 0.0153 | 0.939 | 0.0159 | 0.963 | 0.0160 | 0.966 | 0.0160 | 0.951 |
| 0.018 | 0.0172 | 0.937 | 0.0179 | 0.963 | 0.0180 | 0.966 | 0.0180 | 0.951 |
| 0.02 | 0.0191 | 0.935 | 0.0199 | 0.963 | 0.0199 | 0.964 | 0.0200 | 0.951 |
| 0.022 | 0.0210 | 0.932 | 0.0219 | 0.963 | 0.0220 | 0.964 | 0.0220 | 0.951 |
Est = regression coefficient estimate, adjusted for the confounders.
Cov = coverage rate of the 95% confidence interval.
The effect size is the same for all CpG sites evaluated under the simulation model.
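One way to read the Est columns in the two tables is as relative bias, (estimate − true effect) / true effect. The sketch below applies that formula to the rows with a true effect size of 0.02, using values copied from the tables; the helper name `relative_bias` is hypothetical.

```python
def relative_bias(estimate, true_value):
    """Relative bias of an estimate: (estimate - truth) / truth."""
    if true_value == 0:
        raise ValueError("relative bias is undefined for a true value of zero")
    return (estimate - true_value) / true_value


TRUE_EFFECT = 0.02

# Est values for the 0.02 rows, keyed by missing rate and then by method.
table_estimates = {
    "90% missing": {"Projection-based": 0.0170, "MI-Norm": 0.0197, "MI-PMM": 0.0202},
    "30% missing": {"Projection-based": 0.0191, "MI-Norm": 0.0199, "MI-PMM": 0.0199},
}

for missing_rate, by_method in table_estimates.items():
    for method, estimate in by_method.items():
        bias = relative_bias(estimate, TRUE_EFFECT)
        print(f"{missing_rate:12s} {method:17s} relative bias = {bias:+.1%}")
```

On these rows the projection-based estimate is attenuated by about 15% at a 90% missing rate and by about 4.5% at a 30% missing rate, while both MI variants stay within about 1.5% of the true value.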