Wenan Chen, Guimin Gao, Srilaxmi Nerella, Christina M Hultman, Patrik K E Magnusson, Patrick F Sullivan, Karolina A Aberg, Edwin J C G van den Oord.
Abstract
BACKGROUND: In methylome-wide association studies (MWAS), cases and controls may differ in many ways (e.g. lifestyle, diet, and medication use) that can affect the methylome and produce false-positive findings. An effective way to control for these confounders is to first capture the major sources of variation in the methylation data and then regress out these components in the association analyses. This approach is, however, computationally very challenging because of the extremely large number of methylation sites in the human genome.
RESULT: We introduce MethylPCA, which is specifically designed to control for potential confounders in studies where the number of methylation sites is extremely large. MethylPCA offers a complete and flexible data analysis, including 1) an adaptive method that reduces the data before principal component analysis (PCA) by empirically combining methylation data of neighboring sites, 2) an efficient algorithm that performs PCA on the ultra-high-dimensional data matrix, and 3) association tests. To accomplish this, MethylPCA allows parallel execution of tasks, uses C++ for CPU- and I/O-intensive calculations, and stores intermediate results to avoid computing the same statistics multiple times or keeping results in memory. Through simulations and an analysis of a real whole-methylome MBD-seq study of 1,500 subjects, we show that MethylPCA effectively controls for potential confounders.
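The data-reduction and PCA steps can be illustrated with a minimal NumPy sketch. It is not MethylPCA's implementation (which uses C++ and its own adaptive reduction rule); the function names, the `min_corr` threshold, and the merge rule are hypothetical. The reduction step averages runs of adjacent sites whose methylation profiles are highly correlated. The PCA step uses a standard approach for matrices with far more sites than samples: accumulate the samples-by-samples Gram matrix one block of sites at a time, so memory grows with the square of the number of samples rather than with the number of sites, then eigendecompose that small matrix.

```python
import numpy as np


def merge_correlated_neighbors(methylation, min_corr=0.8):
    """Hypothetical data-reduction rule (not MethylPCA's own).

    methylation: sites x samples array, rows in genomic order.
    Adjacent sites are grouped while each site's correlation with the
    previous site is >= min_corr; each group is replaced by its mean.
    """
    groups = [[0]]
    for i in range(1, methylation.shape[0]):
        r = np.corrcoef(methylation[i - 1], methylation[i])[0, 1]
        if r >= min_corr:
            groups[-1].append(i)
        else:
            groups.append([i])
    return np.vstack([methylation[g].mean(axis=0) for g in groups])


def blockwise_gram_pca(blocks, n_components):
    """PCA of samples when sites vastly outnumber samples.

    blocks: iterable of (sites x samples) arrays, e.g. read from disk one
    chunk at a time. Only the samples x samples Gram matrix is kept in memory.
    Returns the top eigenvalues and unit-length sample score vectors.
    """
    gram = None
    for block in blocks:
        # Center each site across samples, then add this block's contribution.
        centered = block - block.mean(axis=1, keepdims=True)
        contribution = centered.T @ centered
        gram = contribution if gram is None else gram + contribution
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]
```

Because each site is centered within its own block, summing the per-block Gram matrices gives exactly the Gram matrix of the full centered data, so the eigenvectors match a PCA on the whole matrix (up to sign) and can serve as per-sample covariates in the association tests.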
Year: 2013 PMID: 23452721 PMCID: PMC3599654 DOI: 10.1186/1471-2105-14-74
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of the genomic control inflation factor for association tests with and without the top PCs as covariates
| without PCs† | 2.962 | 3.199 | 2.979 | 2.987 | 2.873 | 2.949 |
| with PCs‡ | 1.006 | 1.008 | 1.009 | 1.004 | 1.009 | 0.996 |
*The number of continuous confounding factors (suffix c) and the number of dichotomous confounding factors (suffix d) in the simulated data set. †Association test without the top PCs as covariates. ‡Association test with the top PCs as covariates.
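The inflation factors in the table can be illustrated with a short sketch. The genomic control lambda is the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The per-site test here is an ordinary least-squares regression of methylation on case/control status with optional covariates such as top PC scores; this model and the function names are assumptions for illustration, not necessarily the tests MethylPCA runs.

```python
import numpy as np
from scipy import stats


def genomic_control_lambda(pvalues):
    """Median observed 1-df chi-square statistic over its null median (~0.455)."""
    chi2_obs = stats.chi2.isf(np.asarray(pvalues, dtype=float), df=1)
    return float(np.median(chi2_obs) / stats.chi2.ppf(0.5, df=1))


def site_pvalues(methylation, status, covariates=None):
    """Two-sided OLS p-value for case/control status at every site.

    methylation: sites x samples; status: 0/1 per sample;
    covariates: optional samples x k array (e.g. top PC scores).
    """
    n = methylation.shape[1]
    columns = [np.ones(n), np.asarray(status, dtype=float)]
    if covariates is not None:
        columns.extend(np.asarray(covariates, dtype=float).T)
    design = np.column_stack(columns)
    coef, _, _, _ = np.linalg.lstsq(design, methylation.T, rcond=None)
    resid = methylation.T - design @ coef
    dof = n - design.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof
    # Standard error of the status coefficient (column 1 of the design).
    se = np.sqrt(sigma2 * np.linalg.inv(design.T @ design)[1, 1])
    t_stat = coef[1] / se
    return 2 * stats.t.sf(np.abs(t_stat), dof)
```

In a toy simulation where one confounder correlated with case status shifts every site, the unadjusted lambda should land far above 1, and passing the top PC score as `covariates` should bring it back near 1, the same qualitative pattern as the two table rows.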
Figure 1. Scree test on a simulated data set, showing the top 10 eigenvalues.
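A scree plot graphs the leading eigenvalues in decreasing order, and the components before the "elbow" are kept. The sketch below computes the values one would plot. The automatic elbow rule (largest ratio between consecutive eigenvalues) is a hypothetical stand-in for visual inspection and is not taken from the paper.

```python
import numpy as np


def scree_values(methylation, top=10):
    """Top eigenvalues of the sample covariance (samples as observations)
    and the share of total variance each one explains.

    methylation: sites x samples array.
    """
    centered = methylation - methylation.mean(axis=1, keepdims=True)
    # Squared singular values of the centered samples x sites matrix.
    sv = np.linalg.svd(centered.T, compute_uv=False)
    eig = sv ** 2 / (centered.shape[1] - 1)
    return eig[:top], eig[:top] / eig.sum()


def elbow_by_largest_gap(eigenvalues):
    """Hypothetical rule: keep the components before the largest ratio
    between consecutive eigenvalues."""
    ratios = eigenvalues[:-1] / eigenvalues[1:]
    return int(np.argmax(ratios)) + 1
```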
Figure 2. QQ plot and lambda values of the association tests before and after PCA on a simulated data set. (a) QQ plot and lambda without PCs in the association test. (b) QQ plot and lambda with PCs included in the association test.
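A QQ plot compares sorted observed -log10 p-values with those expected under a uniform null. A minimal sketch, assuming the common (i - 0.5)/m plotting positions for the expected quantiles (other conventions, such as i/(m + 1), are also used):

```python
import numpy as np


def qq_points(pvalues):
    """Expected and observed -log10 p-values for a QQ plot, largest first."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    m = p.size
    expected = -np.log10((np.arange(1, m + 1) - 0.5) / m)
    observed = -np.log10(p)
    return expected, observed
```

Plotting `observed` against `expected` together with the y = x line shows inflation as points lifting off the diagonal across the whole range, not only in the tail.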
Figure 3. Scree test from the PCA on the MBD-seq data set.
Figure 4. QQ plot and lambda of the methylome-wide association tests before and after PCA on the schizophrenia data set. (a) QQ plot and lambda without PCs in the association test. (b) QQ plot and lambda with PCs included in the association test.
Figure 5. Manhattan plot of the MWAS results on the schizophrenia data set. The red line marks the q < 0.01 threshold and the blue line the q < 0.1 threshold.
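The q-value thresholds in a Manhattan plot can be related to p-values with a sketch of Storey's q-value. With the null proportion `pi0` fixed at 1, as assumed here, it reduces to Benjamini-Hochberg adjusted p-values; this entry does not say how the paper handled `pi0`. The helper `p_threshold` is hypothetical: it returns the largest p-value that passes a q-value cutoff, i.e. the height -log10(p) at which such a horizontal line would sit.

```python
import numpy as np


def qvalues(pvalues, pi0=1.0):
    """q-values via q_(i) = min over j >= i of pi0 * m * p_(j) / j."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = pi0 * m * p[order] / np.arange(1, m + 1)
    # Running minimum from the largest p-value downward keeps q monotone.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q


def p_threshold(pvalues, q, level):
    """Largest p-value whose q-value is below `level` (None if none pass)."""
    passing = np.asarray(pvalues, dtype=float)[q < level]
    return float(passing.max()) if passing.size else None
```

For example, with p-values `[0.01, 0.04, 0.03, 0.5]` the q-values are 0.04, 0.0533, 0.0533 and 0.5, so a q < 0.1 line would sit at p = 0.04.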