Yan Zhou, Jiadi Zhu, Mingtao Zhao, Baoxue Zhang, Chunfu Jiang, Xiyan Yang.
Abstract
DNA methylation is an essential epigenetic modification involved in regulating the expression of mammalian genomes. A variety of experimental approaches for generating genome-wide or whole-genome DNA methylation data have emerged in recent years. Methylated DNA immunoprecipitation followed by sequencing (MeDIP-seq) is one of the major tools used in whole-genome epigenetic studies. However, analyzing these data with adequate accuracy, sensitivity, and speed remains an important challenge. Existing methods, such as BATMAN and MEDIPS, analyze MeDIP-seq data by dividing the whole genome into equal-length windows and assuming that every CpG in the same window has the same methylation level. More precise work is necessary to estimate the methylation level of each CpG site in the whole genome. In this paper, we propose Statistical Inferences with MeDIP-seq Data (SIMD), a method to infer the methylation level of each CpG site. In addition, we analyze a real dataset for DNA methylation. The results show that our method displays improved precision in detecting differentially methylated CpG sites compared to the existing method. To meet the demands of the application, we have developed an R package called "SIMD", which is freely available at https://github.com/FocusPaka/SIMD.
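To make the window-based assumption described in the abstract concrete, the following is a minimal sketch of it in miniature: every CpG site inherits one value from the equal-length window that contains it. The coordinates, window size, and per-window scores are invented for illustration, and the function name `window_level_per_cpg` is hypothetical. This is not the BATMAN, MEDIPS, or SIMD implementation.

```python
def window_level_per_cpg(cpg_positions, window_scores, window_size):
    """Assign each CpG the score of the equal-length window that contains it."""
    levels = {}
    for pos in cpg_positions:
        # Windows are [0, w), [w, 2w), ... so integer division gives the index.
        window_index = pos // window_size
        levels[pos] = window_scores[window_index]
    return levels


# Hypothetical toy data: five CpG positions and one score per 100 bp window.
cpg_positions = [12, 57, 88, 130, 171]
window_scores = [0.8, 0.3]

levels = window_level_per_cpg(cpg_positions, window_scores, window_size=100)
for pos, level in levels.items():
    print(pos, level)
```

In this toy case the three CpGs in the first window all receive the same value, and so do the two in the second window, which is the loss of per-site resolution that the abstract says SIMD is meant to address.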
Year: 2018 PMID: 30086146 PMCID: PMC6080771 DOI: 10.1371/journal.pone.0201586
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Short reads mapped to reference regions.
For example, there are six short reads that cover five CpG sites.
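The situation in the figure caption (six short reads covering five CpG sites) can be sketched as a simple overlap count. The figure's actual coordinates are not given in this text, so the read intervals and CpG positions below are hypothetical, as is the helper name `reads_covering_sites`. The counts are only the raw per-site coverage, not the methylation levels that SIMD infers.

```python
def reads_covering_sites(reads, cpg_sites):
    """For each CpG site, count the reads whose half-open [start, end) interval contains it."""
    return {
        site: sum(start <= site < end for start, end in reads)
        for site in cpg_sites
    }


# Hypothetical coordinates: six reads of 50 bp and five CpG sites.
reads = [(0, 50), (20, 70), (40, 90), (60, 110), (80, 130), (100, 150)]
cpg_sites = [10, 45, 65, 95, 120]

coverage = reads_covering_sites(reads, cpg_sites)
for site, count in coverage.items():
    print(f"CpG at {site}: covered by {count} read(s)")
```

Because one read can span several CpG sites, neighboring sites share reads, so a per-site estimate has to work from overlapping evidence rather than from independent counts.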
Number of false positives for the two methods at each p-value cutoff (two H1-ESCs).
| p-value cutoff | 1e-3 | 1e-4 | 1e-5 | 1e-6 | 1e-7 | 1e-8 |
|---|---|---|---|---|---|---|
| False positive numbers | | | | | | |
| SIMD | 3621 | 1054 | 295 | 115 | 45 | 33 |
| Raw | 13336 | 4915 | 2089 | 1047 | 607 | 415 |
Number of differentially methylated sites for the two methods at each p-value cutoff (chr1 of H1 vs HuFNSC02).
| p-value cutoff | 1e-3 | 1e-4 | 1e-5 | 1e-6 | 1e-7 | 1e-8 |
|---|---|---|---|---|---|---|
| Differentially methylated | | | | | | |
| SIMD | 7830 | 2518 | 838 | 395 | 198 | 135 |
| Raw | 31653 | 12796 | 5997 | 3304 | 2110 | 1474 |
| FDRs of two methods | | | | | | |
| SIMD | 0.46245 | 0.41858 | 0.35202 | 0.29113 | 0.22727 | 0.24444 |
| Raw | 0.42131 | 0.38410 | 0.34834 | 0.31688 | 0.28767 | 0.28154 |
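The FDR rows in the table above appear to equal the false positive count from the two H1-ESC comparison divided by the number of differentially methylated sites at the same p-value cutoff. That relationship is an inference from the numbers, not something stated in this text, so the sketch below is only an arithmetic check. All counts and reported FDRs are copied from the two tables; the function name `fdr_estimates` is hypothetical.

```python
cutoffs = ["1e-3", "1e-4", "1e-5", "1e-6", "1e-7", "1e-8"]

# Counts copied from the false-positive table (two H1-ESCs).
false_positives = {
    "SIMD": [3621, 1054, 295, 115, 45, 33],
    "Raw": [13336, 4915, 2089, 1047, 607, 415],
}
# Counts copied from the differentially-methylated table (chr1 of H1 vs HuFNSC02).
differential = {
    "SIMD": [7830, 2518, 838, 395, 198, 135],
    "Raw": [31653, 12796, 5997, 3304, 2110, 1474],
}
# FDR values as reported in the table.
reported_fdr = {
    "SIMD": [0.46245, 0.41858, 0.35202, 0.29113, 0.22727, 0.24444],
    "Raw": [0.42131, 0.38410, 0.34834, 0.31688, 0.28767, 0.28154],
}


def fdr_estimates(fp_counts, dm_counts):
    """Assumed relationship: FDR = false positives / differentially methylated sites."""
    return [fp / dm for fp, dm in zip(fp_counts, dm_counts)]


computed_fdr = {
    method: fdr_estimates(false_positives[method], differential[method])
    for method in false_positives
}

for method, values in computed_fdr.items():
    for cutoff, value, reported in zip(cutoffs, values, reported_fdr[method]):
        print(f"{method} p<{cutoff}: computed {value:.6f}, reported {reported}")
```

Under this assumption every computed ratio agrees with the reported value to within 1e-5, and the small differences are consistent with the reported values having been truncated rather than rounded to five decimal places.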
The number of differentially methylated sites derived from two methods at each q-value cutoff (chr1 of H1 vs HuFNSC02).
| q-value cutoff | 5e-2 | 1e-2 | 1e-3 | 1e-4 | 1e-5 | 1e-6 |
|---|---|---|---|---|---|---|
| Differentially methylated | | | | | | |
| SIMD | 1259 | 542 | 199 | 105 | 67 | 33 |
| Raw | 27412 | 11106 | 4070 | 2100 | 1363 | 885 |
| FDRs of two methods | | | | | | |
| SIMD | 0.36536 | 0.31549 | 0.22110 | 0.22857 | 0.17910 | 0.2121 |
| Raw | 0.41853 | 0.38015 | 0.32776 | 0.29142 | 0.27953 | 0.2655 |