Xuan Li, Weiliang Qiu, Jarrett Morrow, Dawn L DeMeo, Scott T Weiss, Yuejiao Fu, Xiaogang Wang.
Abstract
Variable DNA methylation has been associated with cancers and complex diseases. Researchers have identified many DNA methylation markers that have different mean methylation levels between diseased subjects and normal subjects. Recently, researchers found that DNA methylation markers with different variabilities between subject groups could also have biological meaning. In this article, we aimed to help researchers choose the right test of equal variance in DNA methylation data analysis. We performed systematic simulation studies and a real data analysis to compare the performances of 7 equal-variance tests, including 2 tests recently proposed in the DNA methylation analysis literature. Our results showed that the Brown-Forsythe test and trimmed-mean-based Levene's test had good performance in testing for equality of variance in our simulation studies and real data analyses. Our results also showed that outlier profiles could be biologically very important.
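As a rough illustration of what such a comparison involves, the sketch below applies five of the classical equal-variance tests to one simulated marker using SciPy. It is not the authors' code: the simulated values, the trimming proportion of 0.25, and the function name `equal_variance_pvalues` are assumptions made for this example, and the two tests recently proposed in the methylation literature are not included.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical methylation-like values for one CpG site:
# equal means, unequal variances (values chosen only for illustration).
non_diseased = rng.normal(loc=0.5, scale=0.05, size=50)
diseased = rng.normal(loc=0.5, scale=0.15, size=50)


def equal_variance_pvalues(x, y, trim=0.25):
    """Return p-values of several two-group equal-variance tests.

    `trim` is the proportion cut from each end for the trimmed-mean
    Levene test; 0.25 is an assumed value, not taken from the article.
    """
    # Two-sided F test on the ratio of sample variances (assumes normality).
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)
    cdf = stats.f.cdf(f_stat, len(x) - 1, len(y) - 1)
    p_f = 2 * min(cdf, 1 - cdf)
    return {
        "F": p_f,
        "Bartlett": stats.bartlett(x, y).pvalue,
        # Levene's test centered at the group mean.
        "Levene": stats.levene(x, y, center="mean").pvalue,
        # Levene's test centered at the trimmed mean.
        "L.trim": stats.levene(x, y, center="trimmed", proportiontocut=trim).pvalue,
        # Brown-Forsythe: Levene's test centered at the group median.
        "BF": stats.levene(x, y, center="median").pvalue,
    }


pvals = equal_variance_pvalues(non_diseased, diseased)
for name, p in pvals.items():
    print(f"{name:8s} p = {p:.3g}")
```

The median-centered and trimmed-mean-centered variants differ from the classical Levene test only in the `center` argument, which is why they are usually described as more robust to skewness and outliers.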
Year: 2015 PMID: 26683022 PMCID: PMC4684215 DOI: 10.1371/journal.pone.0145295
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The distribution settings for the scenarios in Simulation Study I.
| Mean & variance | Chi-squared: Non-D | Chi-squared: D | t: Non-D | t: D | Normal: Non-D | Normal: D |
|---|---|---|---|---|---|---|
eqM: equal-mean; eqV: equal-variance; diffM: different-mean; diffV: different-variance; D: diseased; Non-D: non-diseased; N(a,b): normal distribution with mean a and variance b; $t_c$: t-distribution with c degrees of freedom; $t_{d,e}$: non-central t-distribution with d degrees of freedom and non-centrality parameter e; $\chi^2_f$: chi-squared distribution with f degrees of freedom; $\chi^2_{g,h}$: non-central chi-squared distribution with g degrees of freedom and non-centrality parameter h.
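The distribution families named in this legend can be sampled with NumPy and SciPy, as in the sketch below. All parameter values are hypothetical and are not the settings used in Simulation Study I. The example also shows one easy mistake to avoid: N(a,b) is parameterized by its variance b, whereas `Generator.normal` expects the standard deviation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200_000  # large n so that sample variances sit close to theory

# Hypothetical parameter values, for illustration only.
a, b = 0.0, 4.0   # N(a, b): mean a, VARIANCE b
c = 10            # central t: degrees of freedom
d, e = 10, 1.0    # non-central t: degrees of freedom, non-centrality
f = 3             # central chi-squared: degrees of freedom
g, h = 3, 2.0     # non-central chi-squared: degrees of freedom, non-centrality

samples = {
    # `scale` is the standard deviation, so pass sqrt(b), not b.
    "normal": rng.normal(loc=a, scale=np.sqrt(b), size=n),
    "t": rng.standard_t(df=c, size=n),
    "nct": stats.nct.rvs(d, e, size=n, random_state=rng),
    "chi2": rng.chisquare(df=f, size=n),
    "ncchi2": rng.noncentral_chisquare(df=g, nonc=h, size=n),
}

# Theoretical variances, used to sanity-check the draws.
theory_var = {
    "normal": b,
    "t": c / (c - 2),
    "nct": stats.nct.var(d, e),
    "chi2": 2 * f,
    "ncchi2": 2 * (g + 2 * h),
}

for name, x in samples.items():
    print(f"{name:7s} sample var = {np.var(x, ddof=1):.3f}, theory = {theory_var[name]:.3f}")
```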
Fig 1. Plots of n versus m, where n is the number of scenarios in which an equal-variance test rejected the null hypothesis $H_0$ that the type I error rate is ≤ 0.05, and m is the median of the ranks of powers.
For ranks with ties, average ranks were used. The upper-left, upper-right, and bottom-left panels show n and m obtained from the scenarios with sample sizes of 20, 50, and 200 subjects per group, respectively. The bottom-right panel shows n and m obtained from all 48 scenarios.
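The quantity m can be sketched as follows: rank the tests by power within each scenario, give tied tests their average rank, then take each test's median rank across scenarios. The power values and test labels below are made up, and ranking so that the highest power receives rank 1 is an assumption, since the caption does not state the direction.

```python
import numpy as np
from scipy.stats import rankdata

# Generic labels for seven tests; the powers below are hypothetical.
tests = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"]

# Rows are scenarios, columns follow `tests`.
power = np.array([
    [0.90, 0.90, 0.80, 0.70, 0.70, 0.60, 0.50],
    [0.85, 0.80, 0.80, 0.75, 0.70, 0.65, 0.40],
    [0.95, 0.90, 0.85, 0.80, 0.80, 0.80, 0.60],
])

# Rank within each scenario; negate so that the highest power gets rank 1.
# method="average" assigns tied tests the mean of the ranks they span.
ranks = rankdata(-power, method="average", axis=1)

# m: median rank of each test across scenarios.
median_rank = np.median(ranks, axis=0)

for name, m in zip(tests, median_rank):
    print(f"{name}: median rank = {m}")
```

In the first scenario the two tests tied at 0.90 each receive rank 1.5 rather than ranks 1 and 2, which is the average-rank convention the caption describes.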
Number of significant CpG sites (FDR-adjusted p-value < 0.05) in testing for equality of variance based on GSE37020, and the numbers and proportions of significant CpG sites validated via GSE20080 (unadjusted p-value < 0.05).
Total number of CpG sites is 22859.
| test | GSE37020: nCpG (p.adj<0.05) | GSE20080: nCpG.validated (pval<0.05) | Proportion* |
|---|---|---|---|
| F | 2318 | 1154 | 49.8% |
| Bartlett | 2315 | 1164 | 50.3% |
| Levene | 235 | 183 | 77.9% |
| L.trim | 15 | 9 | 60.0% |
| BF | 7 | 3 | 42.9% |
| PO.AD | 130 | 91 | 70.0% |
| PO.SQ | 0 | 0 | - |
*: proportion = nCpG.validated (pval<0.05) / nCpG (p.adj<0.05).
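The discovery-and-validation bookkeeping behind this table can be sketched as below. The source does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment here is an assumption, and the function names and p-values are hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, cap at 1.
    adj_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adj = np.empty(m)
    adj[order] = adj_sorted
    return adj


def validation_summary(p_discovery, p_validation, alpha=0.05):
    """Count sites significant after adjustment and those that also validate."""
    significant = bh_adjust(p_discovery) < alpha
    validated = significant & (np.asarray(p_validation, dtype=float) < alpha)
    n_sig, n_val = int(significant.sum()), int(validated.sum())
    proportion = n_val / n_sig if n_sig else float("nan")
    return n_sig, n_val, proportion


# Hypothetical per-site p-values for five CpG sites.
p_disc = [0.001, 0.008, 0.039, 0.041, 0.60]   # discovery set, unadjusted
p_val = [0.02, 0.30, 0.01, 0.04, 0.50]        # validation set, unadjusted

n_sig, n_val, proportion = validation_summary(p_disc, p_val)
print(n_sig, n_val, proportion)
```

The same ratio reproduces the tabulated values; for the F test, 1154 / 2318 ≈ 49.8%.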
Number/proportion of significant CpG sites that contain outliers detected in GSE37020, and the number/proportion of these CpG sites that also contain outliers detected via GSE20080.
| test | GSE37020: nOutlier/pOutlier* | Validation (GSE20080): nOutlier/pOutlier** |
|---|---|---|
| F | 1503/64.8% | 495/32.9% |
| Bartlett | 1501/64.8% | 497/33.1% |
| Levene | 70/29.8% | 34/48.6% |
| L.trim | 2/13.3% | 0/0% |
| BF | 2/28.6% | 0/0% |
| PO.AD | 64/49.2% | 31/48.4% |
| PO.SQ | 0/- | 0/- |
*: Number/proportion of significant CpG sites containing outliers detected in GSE37020;
**: Number/proportion of significant CpG sites that contain outliers detected in both GSE37020 and GSE20080.
Fig 2. Paired parallel boxplots of DNA methylation level versus case-control status for the 4 unique top-ranked CpG sites obtained by the 7 equal-variance tests based on GSE37020.
The red dots indicate subjects. Panels 2a1, 2b1, and 2c1 show cg26363196 (F), cg00027083 (Levene), and cg06675478 (BF), respectively, based on GSE37020; panels 2a2, 2b2, and 2c2 show the same three CpG sites, respectively, based on GSE20080.