Hong Tran, Hongxiao Zhu, Xiaowei Wu, Gunjune Kim, Christopher R Clarke, Hailey Larose, David C Haak, Shawn D Askew, Jacob N Barney, James H Westwood, Liqing Zhang.
Abstract
Deoxyribonucleic acid (DNA) methylation is an epigenetic alteration crucial for regulating stress responses. Whole genome bisulfite sequencing makes it possible to identify large-scale DNA methylation at single-nucleotide resolution. An essential task following the generation of bisulfite sequencing data is to detect differentially methylated cytosines (DMCs) among treatments. Most statistical methods for DMC detection do not consider the dependency of methylation patterns across the genome, which can inflate the type I error. Furthermore, small sample sizes and weak methylation effects among different phenotype categories make it difficult for these statistical methods to accurately detect DMCs. To address these issues, the wavelet-based functional mixed model (WFMM) was introduced to detect DMCs. To further examine the performance of WFMM in detecting weak differential methylation events, we used both simulated and empirical data and compared WFMM performance to that of a popular DMC detection tool, methylKit. Analyses of simulated data that replicated the effects of the herbicide glyphosate on DNA methylation in Arabidopsis thaliana show that WFMM achieves higher sensitivity and specificity in detecting DMCs than methylKit, especially when the methylation differences among phenotype groups are small. Moreover, the performance of WFMM is robust with respect to small sample sizes, making it particularly attractive considering the current high costs of bisulfite sequencing. Analyses of empirical Arabidopsis thaliana data under varying glyphosate dosages and of monozygotic (MZ) twins who have different pain sensitivities (both datasets have weak methylation effects of <1%) show that WFMM can identify more relevant DMCs related to the phenotype of interest than methylKit.
Differentially methylated regions (DMRs) are genomic regions with different DNA methylation status across biological samples. DMRs and DMCs are essentially the same concept; the only difference is how methylation information across the genome is summarized. If methylation levels are determined by grouping neighboring cytosine sites, the results are DMRs; if methylation levels are calculated from single cytosines, they are DMCs.
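The distinction between single-cytosine and region-level summaries can be illustrated with a minimal Python sketch. It is not taken from the paper: the read counts are made up, and pooling cytosines into fixed-width windows is only one assumed way of grouping neighboring sites.

```python
from collections import defaultdict


def site_levels(counts):
    """Per-cytosine methylation level: methylated reads / total reads."""
    return {pos: m / (m + u) for pos, (m, u) in counts.items() if m + u > 0}


def region_levels(counts, window):
    """Region-level methylation: pool the read counts of all cytosines that
    fall into the same fixed-width window (an assumed grouping rule)."""
    pooled = defaultdict(lambda: [0, 0])
    for pos, (m, u) in counts.items():
        start = (pos // window) * window  # left edge of the window
        pooled[start][0] += m
        pooled[start][1] += u
    return {s: m / (m + u) for s, (m, u) in pooled.items() if m + u > 0}


# Hypothetical data: position -> (methylated reads, unmethylated reads)
counts = {101: (8, 2), 105: (3, 7), 230: (5, 5)}

per_site = site_levels(counts)                   # basis for DMC testing
per_region = region_levels(counts, window=100)   # basis for DMR testing
```

Testing for differences on `per_site` values corresponds to DMC detection, while testing on `per_region` values corresponds to DMR detection.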
Keywords: differentially methylated regions; wavelet-based functional mixed model; weak methylation effect
Year: 2018; PMID: 29419727; PMCID: PMC5852571; DOI: 10.3390/genes9020075
Source DB: PubMed; Journal: Genes (Basel); ISSN: 2073-4425; Impact factor: 4.096
Figure 1. Correlation of methylation levels of neighboring cytosine regions in the monozygotic (MZ) twin dataset and of neighboring cytosines in the Arabidopsis thaliana dataset. Details of the calculation are in Section 1 of the Supplementary Materials.
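The exact calculation behind this figure is given only in the Supplementary Materials. As a rough illustration of the idea, the sketch below computes the Pearson correlation between each site's methylation level and that of its neighbor a fixed number of positions away. The function names and the example values are hypothetical, and the sketch assumes the levels are ordered by genomic position and are not all identical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def neighbor_correlation(levels, lag=1):
    """Correlate each site's level with the level `lag` sites downstream."""
    return pearson(levels[:-lag], levels[lag:])


# Hypothetical methylation levels ordered by genomic position
levels = [0.10, 0.12, 0.15, 0.60, 0.62, 0.58, 0.20, 0.18]
r = neighbor_correlation(levels, lag=1)
```

A value of `r` well above zero indicates that neighboring sites carry similar methylation levels, which is the kind of dependency that per-site tests ignore.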
Figure 2. Receiver operating characteristic (ROC) curve comparison between the wavelet-based functional mixed model (WFMM) (blue curve) and methylKit (red curve) when the differentially methylated cutoff is 0.04 in correlated cytosines (a) and uncorrelated cytosines (b), and when the differentially methylated cutoff is 0.08 in correlated cytosines (c) and uncorrelated cytosines (d). The gray line represents points where sensitivity equals specificity.
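For readers unfamiliar with how such curves are built, the following sketch shows one common way to obtain ROC points from simulated data in which the true DMC status is known. It is an illustration only, not the authors' evaluation code: `scores` stands for any per-cytosine evidence measure where larger means stronger evidence of differential methylation, and the example values are invented.

```python
def roc_points(scores, truth):
    """Return (1 - specificity, sensitivity) pairs, one per score threshold.

    Assumes `truth` contains at least one true DMC (1) and one non-DMC (0).
    """
    pos = sum(truth)
    neg = len(truth) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # A site is called a DMC when its score reaches the threshold.
        tp = sum(1 for s, t in zip(scores, truth) if s >= thr and t)
        fp = sum(1 for s, t in zip(scores, truth) if s >= thr and not t)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores and ground truth (1 = simulated DMC, 0 = not a DMC)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
truth = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, truth)
area = auc(points)
```

Comparing the `area` obtained from two methods on the same simulated data gives a single-number summary of the kind of comparison the curves show.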
Figure 3. ROC curves of WFMM (blue curve) and methylKit (red curve) as the differentially methylated cutoff increases from 0.1 to 0.25 (diff = 0.1, 0.12, 0.15, 0.2, 0.25).
Figure 4. Effect of different sample sizes on WFMM with δ = 0.01 and methylKit with adjusted settings (q value = 1.00; difference = 4) using correlated simulated data when the differentially methylated cutoff is 0.04.
Number of significant differentially methylated cytosines (DMCs) and of genes recognized by the Database for Annotation, Visualization and Integrated Discovery (DAVID), obtained by applying the wavelet-based functional mixed model (WFMM) with δ = 0.01, methylKit with default settings (difference = 25; q value = 0.01), and methylKit with adjusted settings (difference = 4; q value = 1.00) to a real A. thaliana dataset.
| Chromosome | DMCs: WFMM (δ = 0.01) | DMCs: methylKit default | DMCs: methylKit adjusted | Significant genes: WFMM (δ = 0.01) | Significant genes: methylKit default | Significant genes: methylKit adjusted |
|---|---|---|---|---|---|---|
| Chr1 | 133,512 | 12,048 | 294,153 | 4041 | 3098 | 7760 |
| Chr2 | 87,488 | 7627 | 244,683 | 2417 | 1887 | 5129 |
| Chr3 | 113,229 | 9863 | 274,382 | 3180 | 2459 | 6254 |
| Chr4 | 91,327 | 7708 | 227,539 | 2563 | 1943 | 4815 |
| Chr5 | 123,027 | 10,776 | 290,090 | 3622 | 2779 | 6989 |
| ChrC * | 9081 | 19 | 7306 | 0 | 0 | 0 |
| ChrM * | 0 | 0 | 66 | 0 | 0 | 0 |
| Total | 557,664 | 48,041 | 1,338,219 | 15,823 | 12,166 | 30,947 |
* ChrC stands for chloroplast; ChrM designates mitochondria.
Figure 5. Percentages of overlapping differentially methylated cytosines (DMCs) from methylKit with adjusted settings (difference = 4; q value = 1.00) and WFMM with δ = 0.01 in correlated simulated data when the differentially methylated cutoff is 0.04 (a) and in the real data (b).
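An overlap of this kind can be computed by treating each method's calls as a set of genomic positions. The sketch below is a hedged illustration: the call sets are invented, and expressing the overlap as a percentage of each method's own calls is an assumption, since the caption does not state the denominator used in the figure.

```python
def overlap_percentages(calls_a, calls_b):
    """Return the number of shared calls and the share of each set they make up."""
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    pct_a = 100.0 * len(shared) / len(a) if a else 0.0
    pct_b = 100.0 * len(shared) / len(b) if b else 0.0
    return len(shared), pct_a, pct_b


# Hypothetical DMC calls as (chromosome, position) pairs
wfmm_calls = {("Chr1", 101), ("Chr1", 105), ("Chr2", 40), ("Chr3", 7)}
methylkit_calls = {("Chr1", 105), ("Chr2", 40), ("Chr5", 900),
                   ("Chr5", 912), ("Chr4", 3)}

shared, pct_wfmm, pct_methylkit = overlap_percentages(wfmm_calls, methylkit_calls)
```

Because the two methods can report very different numbers of DMCs, the two percentages generally differ, so it helps to report both.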
Figure 6. Gene ontology of molecular function for significant differentially methylated TAIR genes detected by WFMM with δ = 0.01 (a) and methylKit with default settings (difference = 25; q value = 0.01) (b).
Number of intersecting genes between the 484 genes related to glyphosate herbicide stress identified by Das et al. [20] and the significant genes identified by WFMM and methylKit.
| Methods | Number of Significant Genes | Number of Shared Genes in All Significant Genes | Number of Shared Genes in Top 3000 Most Significant Genes |
|---|---|---|---|
| WFMM δ = 0.01 | 15,823 | 238 | 51 |
| methylKit default | 12,166 | 181 | 39 |
| methylKit adjusted | 30,947 | 466 | 44 |
Figure 7. Gene clusters based on the gene ontology of molecular function for the top 3000 most significant genes from WFMM with δ = 0.01 (a), methylKit with default settings (difference = 25; q value = 0.01) (b), and methylKit with adjusted settings (difference = 4; q value = 1.00) (c).
Number of significant DMCs and of genes recognized by DAVID, obtained by applying WFMM with δ = 3.44 × 10⁻⁵ and methylKit with adjusted settings (difference = 4.34 × 10⁻⁵; q value = 1.00) to 25 monozygotic (MZ) twin pairs with different pain sensitivity temperatures.
| Methods | Number of Significant DMRs | Number of Significant Genes Using DAVID |
|---|---|---|
| WFMM δ = 3.44 × 10⁻⁵ | 769 | 236 |
| methylKit adjusted | 2023 | 892 |
Figure 8. Gene clusters based on the gene ontology of molecular function for significant genes detected by WFMM with δ = 3.44 × 10⁻⁵ (a) and methylKit (difference = 4.34 × 10⁻⁵; q value = 1.00) (b).