Akhilesh Kaushal, Hongmei Zhang, Wilfried J J Karmaus, Meredith Ray, Mylin A Torres, Alicia K Smith, Shu-Li Wang.
Abstract
BACKGROUND: Whole blood is frequently used in genome-wide association studies of DNA methylation patterns in relation to environmental exposures or clinical outcomes. These associations can be confounded by cellular heterogeneity. Algorithms have been developed to measure or adjust for this heterogeneity, and some have been compared in the literature. However, with new methods now available, it is unknown whether their findings are consistent and, if they are not, which method(s) perform better.
Keywords: Cell-type composition; CpG sites; Genome-scale DNA methylation; Surrogate variables
Year: 2017 PMID: 28410574 PMCID: PMC5391562 DOI: 10.1186/s12859-017-1611-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Number of significant CpG sites with and without cell type correction and overlap with the SVA method (data on prenatal arsenic exposure and DNA methylation)
| Method | Identified CpGs # | Overlap with SVA (%) | P-value ## |
|---|---|---|---|
| Houseman et al. | 10 | 1.20 | <0.0001 |
| minfi | 57 | 4.62 | <0.0001 |
| SVA | 498 | – | – |
| RefFreeEWAS | 133 | 6.01 | <0.0001 |
| RefFreeCellMix | 2932 | 0.60 | 1.0 |
| ReFACTor | 58,871 | 13.03 | 1.0 |
| EWASher a | 0 | 0.0 | – |
| RUV | 356 | 0.20 | 1.0 |
| Unadjusted b | 3 | 0.60 | <0.0001 |
#: The selection of CpG sites is based on FDR-adjusted p-values (FDR is controlled at 0.05)
##: The P-value is based on Fisher's exact test for overlap with results from SVA. The null hypothesis is that there is no overlap with the CpGs identified by SVA
a: The FaST-LMM-EWASher method
b: Unadjusted: cell type compositions were not included in the analyses
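The footnotes above state that CpG sites were selected on FDR-adjusted p-values with the FDR controlled at 0.05, but they do not name the adjustment procedure. A minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, could look like this; the function name and the toy p-values are illustrative and are not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of tests rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up. The source only says
    "FDR-adjusted p-values" and does not name the procedure.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Toy p-values for ten hypothetical CpG sites (not data from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(toy_p)
```

With these toy values only the first two sites pass, because the third-smallest p-value (0.039) already exceeds its threshold of 3 × 0.05 / 10 = 0.015 and no later rank recovers.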
Fig. 1 Venn diagram of the overlap among CpG sites associated with prenatal arsenic exposure at an FDR level of 0.05, after incorporating cell type compositions estimated by different methods. Results from Houseman et al., minfi, RefFreeEWAS, and SVA, as well as from the analysis without adjustment for cell types, are displayed (results from the other methods are in the text). “UN”: results from the analysis without adjustment for cell type compositions
Number of significant CpG sites with and without cell-correction methods and overlap of CpG sites with those from the SVA method (example data from FasT-LMM-EWASher package)
| Method | Identified CpGs # | Overlap with SVA (%) | P-value ## | J-index c |
|---|---|---|---|---|
| Houseman et al. | 1835 | 54.71 | <0.0001 | 0.40 |
| minfi | 3589 | 84.59 | <0.0001 | 0.40 |
| SVA | 1888 | – | – | – |
| RefFreeEWAS | 788 | 30.51 | <0.0001 | 0.30 |
| RefFreeCellMix | 1006 | 18.38 | <0.0001 | 0.10 |
| ReFACTor | 4224 | 87.45 | <0.0001 | 0.40 |
| EWASher a | 3 | 0.16 | <0.0001 | 0 |
| RUV | 6008 | 99.95 | <0.0001 | 0.30 |
| Unadjusted b | 3768 | 82.89 | <0.0001 | 0.40 |
#: The selection of CpG sites is based on FDR-adjusted p-values (FDR is controlled at 0.05)
##: The P-value is based on Fisher's exact test for overlap. The null hypothesis is that there is no overlap with the CpGs identified by SVA
a: The FaST-LMM-EWASher method
b: Unadjusted: cell type compositions were not incorporated into the analyses
c: J-index is the Jaccard index
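The overlap columns above rest on three quantities: the percentage overlap with SVA, a Fisher's exact test p-value, and the Jaccard index. The sketch below shows one way to compute them, under stated assumptions. The overlap percentage is taken relative to the size of the SVA set, which appears consistent with the reported figures (for example, 3/1888 ≈ 0.16% for EWASher) but is an inference, not something the source states. The source also does not specify how the Fisher's test was set up, so the one-sided hypergeometric tail used here is an assumption. All function names and the toy sets are hypothetical.

```python
from math import comb


def jaccard_index(set_a, set_b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_percent(method_set, reference_set):
    """Percent of the reference set (e.g. SVA) also found by the method.

    Assumption: the denominator is the size of the reference set.
    """
    ref = set(reference_set)
    return 100.0 * len(ref & set(method_set)) / len(ref) if ref else 0.0


def overlap_p_value(n_total, n_a, n_b, n_both):
    """One-sided Fisher's exact p-value P(overlap >= n_both).

    Assumption: sets of sizes n_a and n_b are drawn from n_total sites,
    and the test is the upper hypergeometric tail.
    """
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_b)


# Toy example with hypothetical CpG identifiers (not study data).
sva = {"cg01", "cg02", "cg03", "cg04"}
other = {"cg03", "cg04", "cg05"}
j = jaccard_index(other, sva)        # 2 shared out of 5 distinct sites
pct = overlap_percent(other, sva)    # 2 of the 4 SVA sites
p = overlap_p_value(n_total=10, n_a=5, n_b=5, n_both=5)
```

A two-sided test or a different background set of CpG sites would give different p-values, so this setup should be checked against the paper's methods before reuse.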
Summary of sensitivity and specificity of Unadjusted, FaST-LMM-EWASher, RefFreeEWAS, SVA, ReFACTor and RefFreeCellMix for 100 simulated data sets across three settings
| Method | Sensitivity (median, 95% interval) | | Specificity (median, 95% interval) | |
|---|---|---|---|---|
| | Scenario 1 (ρ = 0) | Scenario 2 (ρ = 0.7) | Scenario 1 (ρ = 0) | Scenario 2 (ρ = 0.7) |
| Number of important CpGs = 50 | | | | |
| Unadjusted | 0.960 (0.470, 1.000) | 1.000 (1.000, 1.000) | 1.000 (0.987, 1.000) | 0.000 (0.000, 0.000) |
| EWASher a | 0.000 (0.000, 0.000) | 0.000 (0.000, 0.000) | 1.000 (0.999, 1.000) | 1.000 (0.999, 1.000) |
| RefEWAS b | 1.000 (0.960, 1.000) | 0.000 (0.000, 0.494) | 0.997 (0.994, 0.999) | 0.579 (0.055, 1.000) |
| CellMix c | 1.000 (0.980, 1.000) | 1.000 (1.000, 1.000) | 0.997 (0.993, 0.999) | 0.546 (0.199, 0.923) |
| ReFACTor | 1.000 (0.960, 1.000) | 1.000 (1.000, 1.000) | 0.996 (0.825, 1.000) | 0.000 (0.000, 0.000) |
| SVA d | 1.000 (0.980, 1.000) | 1.000 (0.960, 1.000) | 0.998 (0.996, 1.000) | 0.998 (0.996, 1.000) |
| Number of important CpGs = 100 | | | | |
| Unadjusted | 0.980 (0.664, 1.000) | 1.000 (1.000, 1.000) | 0.999 (0.976, 1.000) | 0.000 (0.000, 0.000) |
| EWASher a | 0.000 (0.000, 0.000) | 0.000 (0.000, 0.000) | 1.000 (0.999, 1.000) | 1.000 (0.999, 1.000) |
| RefEWAS b | 1.000 (0.965, 1.000) | 0.000 (0.000, 0.403) | 0.995 (0.991, 0.998) | 0.520 (0.014, 1.000) |
| CellMix c | 1.000 (0.975, 1.000) | 1.000 (1.000, 1.000) | 0.988 (0.968, 0.996) | 0.211 (0.047, 0.525) |
| ReFACTor | 1.000 (0.965, 1.000) | 1.000 (1.000, 1.000) | 0.994 (0.808, 0.998) | 0.000 (0.000, 0.000) |
| SVA d | 1.000 (0.990, 1.000) | 0.990 (0.965, 1.000) | 0.996 (0.993, 0.999) | 0.996 (0.993, 0.999) |
| Number of important CpGs = 150 | | | | |
| Unadjusted | 0.993 (0.723, 1.000) | 1.000 (1.000, 1.000) | 0.999 (0.965, 1.000) | 0.000 (0.000, 0.000) |
| EWASher a | 0.000 (0.000, 0.000) | 0.000 (0.000, 0.000) | 1.000 (0.999, 1.000) | 1.000 (0.999, 1.000) |
| RefEWAS b | 0.993 (0.973, 1.000) | 0.000 (0.000, 0.294) | 0.992 (0.986, 0.997) | 0.496 (0.013, 1.000) |
| CellMix c | 1.000 (0.983, 1.000) | 1.000 (1.000, 1.000) | 0.975 (0.929, 0.993) | 0.098 (0.022, 0.293) |
| ReFACTor | 1.000 (0.980, 1.000) | 1.000 (1.000, 1.000) | 0.989 (0.794, 0.997) | 0.000 (0.000, 0.000) |
| SVA d | 1.000 (0.993, 1.000) | 0.993 (0.970, 1.000) | 0.992 (0.988, 0.996) | 0.993 (0.988, 0.996) |
ρ = correlation between primary covariate and latent variables
ρ = 0 corresponds to data simulated from Scenario 1, while ρ = 0.7 corresponds to data simulated from Scenario 2
a: FaST-LMM-EWASher
b: RefFreeEWAS
c: RefFreeCellMix
d: Surrogate variable analysis
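The table above reports the median and a 95% interval of sensitivity and specificity over the simulated data sets. A minimal sketch of that kind of summary follows. It assumes that sensitivity and specificity are computed per data set from the selected CpGs against the truly associated ("important") CpGs, and that the 95% interval is the empirical 2.5th to 97.5th percentile range; the source does not state how the interval was obtained. All names and toy values are hypothetical.

```python
import statistics


def sensitivity_specificity(selected, truly_associated, n_sites):
    """Return (sensitivity, specificity) for one simulated data set."""
    sel, truth = set(selected), set(truly_associated)
    tp = len(sel & truth)            # important CpGs that were selected
    fn = len(truth - sel)            # important CpGs that were missed
    fp = len(sel - truth)            # unimportant CpGs that were selected
    tn = n_sites - tp - fn - fp      # unimportant CpGs correctly left out
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def summarize(values):
    """Median with an empirical 2.5%-97.5% interval (assumed definition)."""
    # n=40 yields cut points every 2.5%; the first and last bound the 95% range.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return statistics.median(values), cuts[0], cuts[-1]


# Toy data set: 20 sites, sites 1-4 truly associated, 4 sites selected.
sens, spec = sensitivity_specificity({1, 2, 3, 10}, {1, 2, 3, 4}, n_sites=20)

# Toy per-replicate values standing in for 101 simulated data sets.
median, lower, upper = summarize(list(range(101)))
```

In an actual simulation study, `summarize` would be applied to the list of per-replicate sensitivities (or specificities) for each method and scenario.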
Fig. 2 Plots of sensitivity vs. 1 − specificity and estimated ROC curves: a) SVA, b) RefFreeEWAS
| Method | Link |
|---|---|
| minfi | http://bioconductor.org/packages/release/bioc/html/minfi.html |
| Houseman | http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-86 |
| RefFreeEWAS | https://cran.r-project.org/web/packages/RefFreeEWAS/index.html |
| SVA | |
| EWASher | |
| RefFreeCellMix | |
| ReFACTor | |
| RUV | |