Jonathan A Heiss, Allan C Just.
Abstract
BACKGROUND: DNA methylation microarrays are popular for epigenome-wide association studies (EWAS), but spurious values complicate downstream analysis and threaten replication. A single previous study demonstrated that conventional detection p value cut-offs for filtering out undetected probes are insufficient: many probes targeting the Y-chromosome yielded apparent methylation calls in samples from females. We present an alternative approach that calculates more accurate detection p values from non-specific background fluorescence. We compare the proposed filtering approach with conventional ones in three ways: by assessing the detection of Y-chromosome probes among males and females in 2755 samples from 17 studies on the 450K microarray, by assessing the masking of large outliers between technical replicates, and by assessing the downstream impact in an EWAS reanalysis.
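The general idea of a detection p value can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the background is modelled as a normal distribution whose centre and spread are taken from the median and scaled median absolute deviation of a set of background intensities, and a probe counts as detected when its upper-tail p value falls below the cut-off. The function names `detection_p` and `is_detected` and the toy intensities are hypothetical. The background sample could come from negative control probes or from non-specific fluorescence; how those values are obtained is outside this sketch.

```python
from statistics import NormalDist, median


def detection_p(intensity, background):
    """Upper-tail p value of one probe intensity under a normal background model.

    Assumption: background noise is roughly normal; its centre is estimated by
    the median and its spread by the MAD scaled to match a standard deviation.
    """
    centre = median(background)
    deviations = [abs(b - centre) for b in background]
    spread = 1.4826 * median(deviations)  # must be > 0 for NormalDist.cdf
    return 1.0 - NormalDist(centre, spread).cdf(intensity)


def is_detected(intensity, background, cutoff=0.01):
    """A probe is called detected when its detection p value is below the cut-off."""
    return detection_p(intensity, background) < cutoff


# Hypothetical background intensities (arbitrary fluorescence units)
background = [90, 100, 110, 95, 105, 100, 120, 80]

print(is_detected(5000, background))  # far above background
print(is_detected(100, background))   # indistinguishable from background
```

Swapping the `background` sample changes the estimated noise distribution and therefore which probes pass a given cut-off, which is the comparison the abstract describes.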
Keywords: DNA methylation; Data cleaning; EWAS; Illumina 450K; Microarray analysis; Outlier detection
Year: 2019 PMID: 30678737 PMCID: PMC6346546 DOI: 10.1186/s13148-019-0615-3
Source DB: PubMed Journal: Clin Epigenetics ISSN: 1868-7075 Impact factor: 6.551
Overview of 450K Gene Expression Omnibus datasets used in the current study
| GEO Accession | Tissue | Male (n) | Female (n) |
|---|---|---|---|
| GSE60655 | Vastus lateralis muscle | 16 | 20 |
| GSE61496 | Whole blood | 154 | 141 |
| GSE63106 | Cartilage from knees and hip joints | 24 | 38 |
| GSE65163 | Nasal epithelial cells | 35 | 36 |
| GSE69502 | Fetal tissues: muscle, kidney, spinal cord, brain, chorionic villi | 89 | 81 |
| GSE74432 | Whole blood | 57 | 62 |
| GSE75196 | Placenta | 11 | 13 |
| GSE75248 | Placenta | 155 | 162 |
| GSE85042 | Cord blood | 32 | 39 |
| GSE85566 | Airway epithelial cells | 36 | 78 |
| GSE86961 | Papillary thyroid tumor tissue, non-neoplastic adjacent tissue | 21 | 60 |
| GSE87571 | Whole blood | 341 | 389 |
| GSE89251 | CD4+ T cells | 38 | 98 |
| GSE90871 | Developing dorsolateral prefrontal cortex | 13 | 11 |
| GSE97362 | Whole blood | 145 | 83 |
| GSE99863 | Whole blood | 126 | 117 |
| GSE102177 | Whole blood | 20 | 14 |
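
As a consistency check, the per-dataset counts in the table can be summed and compared with the totals quoted elsewhere in this record (1313 male and 1442 female samples, 2755 in total, from 17 studies). The dictionary below simply copies the table; the variable names are illustrative.

```python
# (male, female) sample counts per GEO accession, copied from the table above
counts = {
    "GSE60655": (16, 20),
    "GSE61496": (154, 141),
    "GSE63106": (24, 38),
    "GSE65163": (35, 36),
    "GSE69502": (89, 81),
    "GSE74432": (57, 62),
    "GSE75196": (11, 13),
    "GSE75248": (155, 162),
    "GSE85042": (32, 39),
    "GSE85566": (36, 78),
    "GSE86961": (21, 60),
    "GSE87571": (341, 389),
    "GSE89251": (38, 98),
    "GSE90871": (13, 11),
    "GSE97362": (145, 83),
    "GSE99863": (126, 117),
    "GSE102177": (20, 14),
}

males = sum(m for m, _ in counts.values())
females = sum(f for _, f in counts.values())

print(len(counts), males, females, males + females)
# prints: 17 1313 1442 2755
```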
Fig. 1 Choosing the right cut-off. Median number (with 2.5th and 97.5th percentiles) of detected Y-chromosome probes among 1313 male and 1442 female samples for a range of detection p value cut-offs. Negative control probes were used to estimate the background noise distribution in the left panel, whereas non-specific fluorescence was used in the right panel. The cut-off 0.01 (corresponding to 2 in the left panel due to the transformed x-axis) is highlighted.
Fig. 2 Detecting what is not supposed to be there. Call rates of Y-chromosome probes among the 1442 female samples for three approaches to classifying detected probes. Probes are ordered on the x-axis by increasing call rate (order not identical between curves). Only six probes had a call rate < 2% when using the conventional cut-off of 0.01 with the background distribution estimated from negative control probes (NEG/0.01). For more stringent criteria (NSP/0.1 and NEG/1e−40), there is an almost clear-cut separation between undetected and detected probes, the latter including some that cross-hybridize to autosomal CpG sites.
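
The call rate plotted in Fig. 2 can be read as the fraction of samples in which a probe is classified as detected. A minimal sketch of that calculation follows, assuming a probe is detected when its detection p value is strictly below the cut-off. The function `call_rates`, the probe names, and the p values are hypothetical toy data, not values from the study.

```python
def call_rates(pvals_by_probe, cutoff):
    """Fraction of samples per probe whose detection p value is below the cut-off."""
    return {
        probe: sum(p < cutoff for p in pvals) / len(pvals)
        for probe, pvals in pvals_by_probe.items()
    }


# Hypothetical detection p values of three Y-chromosome probes in four female samples
pvals_by_probe = {
    "probe_a": [0.5, 0.2, 0.8, 0.3],       # never detected
    "probe_b": [1e-5, 0.4, 1e-6, 0.6],     # detected in half of the samples
    "probe_c": [1e-8, 1e-9, 1e-7, 1e-3],   # always detected, e.g. cross-hybridizing
}

rates = call_rates(pvals_by_probe, cutoff=0.01)

# Order probes by increasing call rate, as on the x-axis of the figure
for probe, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(probe, rate)
```

In female samples a well-behaved Y-chromosome probe should have a call rate near zero, so probes with high call rates point either to a too-permissive cut-off or to cross-hybridization.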
Fig. 3 Previously obscured associations. Results from a reanalysis of a previously published epigenome-wide association study. Top nine uncovered associations between chronological age and DNA methylation levels (red) in peripheral blood reaching significance (relative to a Bonferroni threshold of 1.06e−07) after dropping observations (black) passing the more permissive NEG/0.01 cut-off but failing the more stringent NSP/0.01 cut-off. Annotations represent raw p values and sample sizes.
Fig. 4 Associations losing significance. Results from a reanalysis of a previously published epigenome-wide association study. Top nine associations between chronological age and DNA methylation levels (red + black) in peripheral blood losing significance (relative to a Bonferroni threshold of 1.06e−07) after dropping observations (black) passing the more permissive NEG/0.01 cut-off but failing the more stringent NSP/0.01 cut-off. Annotations represent raw p values and sample sizes.