Jochen Kruppa, Miriam Sieg, Gesa Richter, Anne Pohrt.
Abstract
BACKGROUND: In DNA methylation analyses like epigenome-wide association studies, effects in differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially methylated CpG sites. As biological effect measures, differences of M-values are more or less meaningless. Beta-values are of more interest since they can be interpreted directly as differences in percentage of DNA methylation at a given CpG site, but they have poor statistical properties. Different frameworks are proposed for reporting estimands in DNA methylation analysis, relying on Beta-values, M-values, or both.
Keywords: DNA methylation; Epigenome-wide association study (EWAS); Estimands; Multiple testing; Reproducible research
Year: 2021 PMID: 33926513 PMCID: PMC8086103 DOI: 10.1186/s13148-021-01083-9
Source DB: PubMed Journal: Clin Epigenetics ISSN: 1868-7075 Impact factor: 6.551
Table of used terms, their statistical meaning, and description
| Term | Description and usage |
|---|---|
| Beta-values | Describe the frequency of methylation at a given CpG site. Numeric values between 0 and 1. Biologically interpretable. |
| Beta | Single Beta-value |
| M-values | Standardized Beta-values. The standardization must be read as a “logit” transformation. Numeric values from $-\infty$ to $+\infty$. |
| M | Single M-value |
| Outcome | Dependent variable |
| | Difference in Beta-values |
| | Difference in M-values |
| | Coefficients of the regression model; |
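The relation between the two scales in the table above can be made concrete with a short Python sketch. It assumes the base-2 logit, M = log2(Beta / (1 - Beta)), which is the common convention and reproduces the M-values listed in the example table of this record; the function names `beta_to_m` and `m_to_beta` are illustrative and do not come from the paper or from any R package.

```python
import math


def beta_to_m(beta: float) -> float:
    """Base-2 logit transform (assumed convention): M = log2(beta / (1 - beta))."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse transform: beta = 1 / (1 + 2**(-m))."""
    return 1.0 / (1.0 + 2.0 ** (-m))


print(round(beta_to_m(0.601), 2))  # 0.59
print(round(m_to_beta(0.0), 2))    # 0.5
```

A Beta-value of 0.5 corresponds to an M-value of 0, and Beta-values approaching 0 or 1 are stretched toward large negative or positive M-values.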
Fig. 1 Simulation of the effect estimation under no or two confounder effects. The y-axis shows the percentage deviation of the estimated effect from the predefined effect, and the x-axis shows the raw mean difference of the Beta-values between treatment groups. The first subplot shows the 0% confounder effect. The other two subplots show the confounder effects of 10% and 20%. Simulated data with two treatment levels. The deviation is not symmetrical because the confounder effects were always simulated in the same direction. 5000 simulations with each
Fig. 5 Mustache plot of the theoretical relation of differences in M-values to differences in Beta-values. On the left side, the difference in M-values is mapped to all possible corresponding differences in Beta-values. A single difference in M-values, for example, can be mapped to a difference in Beta-values anywhere from 0.0009 to 0.6996
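The one-to-many mapping described in this caption can be sketched in Python. The sketch assumes the base-2 logit for M-values; the helper names and the choice of a difference in M-values of 5 are illustrative assumptions, not values taken from the paper. For a fixed difference in M-values, the resulting difference in Beta-values depends on the starting Beta-value and is largest when the two Beta-values lie symmetrically around 0.5.

```python
import math


def beta_to_m(beta: float) -> float:
    """Base-2 logit (assumed convention)."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse of the base-2 logit."""
    return 1.0 / (1.0 + 2.0 ** (-m))


def delta_beta(beta_start: float, delta_m: float) -> float:
    """Difference in Beta-values produced by shifting the M-value by delta_m."""
    return m_to_beta(beta_to_m(beta_start) + delta_m) - beta_start


def max_delta_beta(delta_m: float) -> float:
    """Largest possible Beta difference, reached at M-values -delta_m/2 and +delta_m/2."""
    r = 2.0 ** (delta_m / 2.0)
    return (r - 1.0) / (r + 1.0)


# Same difference in M-values, very different differences in Beta-values
for start in (0.001, 0.05, 0.15, 0.5, 0.9):
    print(start, round(delta_beta(start, 5.0), 4))

print(round(max_delta_beta(5.0), 4))  # 0.6996
```

Under these assumptions, a difference in M-values of 5 yields a maximum difference in Beta-values of about 0.6996, which agrees with the upper bound quoted in the caption. The lower end of the range depends on how close to 0 or 1 the starting Beta-value is allowed to be.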
Fig. 2 Simulation of the effect estimation with the intercept method under the influence of two confounder effects. The y-axis shows the percentage deviation of the estimated effect from the predefined effect, and the x-axis shows the raw mean difference of the Beta-values between treatment groups when the confounder effects of 10% and 20% are ignored. Simulated data with two treatment levels (5000 simulations with each)
Overview and guidance on common and selected R packages used in DNA methylation pipelines as a starting point for making decisions based on the desired estimate. See Heiss et al. [4] for information on the differences between Illumina microarrays and bisulfite sequencing. See Table 3 for information on M-values and Beta-values
| R function (Package) | Estimates come from | Used input |
|---|---|---|
| BioMethyl | ||
| minfi | ||
| ChAMP | ||
| RnBeads | ||
| metilene | | Beta-values (BS-seq) |
| ComBat (sva) | | |
| melon (wateRmelon) | | Beta-values |
| BMIQ (wateRmelon) | | Beta-values |
| SWAN (missMethyl) | | Beta-values |
| CellDMC (EpiDISH) | lm (stats) | Beta-values |
| champ.DMP (ChAMP) | lmFit (limma) | |
| dmpFinder (minfi) | lmFit (limma) | |
| calDEG (BioMethyl) | t-test | |
| varFit (missMethyl) | lmFit (limma) | |
| DMLtest (DSS) | | Count values (BS-seq) |
| bumphunter (bumphunter) | lmFit (limma) | |
| champ.DMR (ChAMP) | bumphunter (bumphunter) | |
| dmrcate (DMRcate) | lmFit (limma) | |
| gometh (missMethyl) | | |
| BSmooth (bsseq) | | Beta-values (BS-seq) |
Wang [59], Aryee [32], Tian [31], Müller [29], Jühling [36], Johnson [52], Pidsley [44], Phipson [60], Zheng [35], Smyth [34], Park [16], Irizarry [61], Peters [62], Hansen [63]
BS-seq: Supports (processed) bisulfite sequencing data. Packages might need “transformed count data”
See Assenov [30] for bisulfite sequencing and McEwen [33] for Illumina microarray data
Example table for the transformation of Beta-values to M-values and the respective differences. The Beta-value difference between the Placebo group and the Treatment group is constant at 10%. Due to the transformation, the M-values differ, and the differences in M-values cannot be mapped to the differences in Beta-values
| Placebo: Beta-value | Placebo: M-value | Treatment: Beta-value | Treatment: M-value | Difference in Beta-values | Difference in M-values | Regression formula (M-value on group) |
|---|---|---|---|---|---|---|
| 0.001 | -9.96 | 0.101 | -3.15 | 0.10 | 6.81 | -9.96 + 6.81 |
| 0.101 | -3.15 | 0.201 | -1.99 | 0.10 | 1.16 | -3.15 + 1.16 |
| 0.201 | -1.99 | 0.301 | -1.22 | 0.10 | 0.77 | -1.99 + 0.77 |
| 0.301 | -1.22 | 0.401 | -0.58 | 0.10 | 0.64 | -1.22 + 0.64 |
| 0.401 | -0.58 | 0.501 | 0.01 | 0.10 | 0.59 | -0.58 + 0.59 |
| 0.501 | 0.01 | 0.601 | 0.59 | 0.10 | 0.58 | 0.01 + 0.58 |
| 0.601 | 0.59 | 0.701 | 1.23 | 0.10 | 0.64 | 0.59 + 0.64 |
| 0.701 | 1.23 | 0.801 | 2.01 | 0.10 | 0.78 | 1.23 + 0.78 |
| 0.801 | 2.01 | 0.901 | 3.19 | 0.10 | 1.18 | 2.01 + 1.18 |
| 0.901 | 3.19 | 0.999 | 9.96 | 0.10 | 6.77 | 3.19 + 6.77 |
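The pattern in the table above can be recomputed with a few lines of Python. This is a minimal sketch assuming the base-2 logit for M-values; the Beta-value pairs are copied from the table, and the names `pairs`, `beta_to_m`, and `table_rows` are illustrative. Because the table appears to take differences of already rounded M-values, the recomputed differences may deviate from the printed ones by 0.01.

```python
import math


def beta_to_m(beta: float) -> float:
    """Base-2 logit (assumed convention): M = log2(beta / (1 - beta))."""
    return math.log2(beta / (1.0 - beta))


# (placebo Beta-value, treatment Beta-value) pairs as listed in the table
pairs = [
    (0.001, 0.101), (0.101, 0.201), (0.201, 0.301), (0.301, 0.401),
    (0.401, 0.501), (0.501, 0.601), (0.601, 0.701), (0.701, 0.801),
    (0.801, 0.901), (0.901, 0.999),
]


def table_rows(beta_pairs):
    """Return (M placebo, M treatment, Beta difference, M difference) per pair."""
    rows = []
    for b0, b1 in beta_pairs:
        m0, m1 = beta_to_m(b0), beta_to_m(b1)
        rows.append((m0, m1, b1 - b0, m1 - m0))
    return rows


for m0, m1, d_beta, d_m in table_rows(pairs):
    print(f"{m0:6.2f} {m1:6.2f} {d_beta:5.2f} {d_m:5.2f}")
```

The Beta-value difference stays at roughly 0.10 in every row, while the M-value difference ranges from below 0.6 near the center of the scale to almost 7 near its ends.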
Summary table of the ArrayExpress data
| Min | 1st | Median | Mean | SD | 3rd | Max | |
|---|---|---|---|---|---|---|---|
| E-GEOD-55763 | 0.721 | 3.505 | 2.598 | 8.500 | |||
| E-GEOD-68379 | 0.350 | 3.479 | 2.846 | 15.974 |
Fig. 3 Histogram of the Beta-values of the study population of the ArrayExpress data set E-GEOD-68379. This study in particular shows a high number of methylation sites close to 0 and 1, which could be both of interest and a problem in modeling
Fig. 4 3D surface density plot of the distribution of differences in M-values to differences in Beta-values from E-GEOD-55763 (left) and E-GEOD-68379 (right). The difference in M-values is mapped to the corresponding differences in Beta-values observed in the data set by comparing two groups of five observations each with random group assignment in 5000 simulations. For differences in M-values larger than 7, we ran 10000 simulations. The small group size of five was chosen for demonstration purposes and is by no means a sufficient group size
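The resampling idea in this caption can be illustrated with a small Python sketch. It is not the authors' procedure: the Beta-values are drawn from a synthetic bimodal distribution as a stand-in for the ArrayExpress data, group differences are taken as differences of group means, M-values use the base-2 logit, and the name `simulate_differences` and all parameter defaults are assumptions made for illustration.

```python
import math
import random
import statistics


def beta_to_m(beta: float) -> float:
    """Base-2 logit (assumed convention)."""
    return math.log2(beta / (1.0 - beta))


def simulate_differences(betas, n_sim=5000, group_size=5, seed=1):
    """Randomly split 2 * group_size Beta-values into two groups per run and
    return (difference in mean M-value, difference in mean Beta-value) pairs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sim):
        sample = rng.sample(betas, 2 * group_size)
        g1, g2 = sample[:group_size], sample[group_size:]
        d_beta = statistics.fmean(g2) - statistics.fmean(g1)
        d_m = (statistics.fmean([beta_to_m(b) for b in g2])
               - statistics.fmean([beta_to_m(b) for b in g1]))
        results.append((d_m, d_beta))
    return results


# Synthetic stand-in data: many values near 0 and 1, clipped away from the bounds
data_rng = random.Random(0)
betas = [min(max(data_rng.betavariate(0.5, 0.5), 1e-3), 1 - 1e-3)
         for _ in range(1000)]

pairs = simulate_differences(betas)
print(len(pairs), pairs[0])
```

Plotting the resulting pairs as a two-dimensional density would give a rough analogue of the surface shown in the figure, with the caveat that the shape depends entirely on the assumed input distribution.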