Owen J L Rackham, Sarah R Langley, Thomas Oates, Eleni Vradi, Nathan Harmston, Prashant K Srivastava, Jacques Behmoaras, Petros Dellaportas, Leonardo Bottolo, Enrico Petretto.
Abstract
DNA methylation is a key epigenetic modification involved in gene regulation whose contribution to disease susceptibility remains to be fully understood. Here, we present a novel Bayesian smoothing approach (called ABBA) to detect differentially methylated regions (DMRs) from whole-genome bisulfite sequencing (WGBS). We also show how this approach can be leveraged to identify disease-associated changes in DNA methylation, suggesting mechanisms through which these alterations might affect disease. From a data modeling perspective, ABBA has the distinctive feature of automatically adapting to different correlation structures in CpG methylation levels across the genome while taking into account the distance between CpG sites as a covariate. Our simulation study shows that ABBA has greater power to detect DMRs than existing methods, providing an accurate identification of DMRs in the large majority of simulated cases. To empirically demonstrate the method's efficacy in generating biological hypotheses, we performed WGBS of primary macrophages derived from an experimental rat system of glomerulonephritis and used ABBA to identify >1000 disease-associated DMRs. Investigation of these DMRs revealed differential DNA methylation localized to a 600 bp region in the promoter of the Ifitm3 gene. This was confirmed by ChIP-seq and RNA-seq analyses, showing differential transcription factor binding at the Ifitm3 promoter by JunD (an established determinant of glomerulonephritis), and a consistent change in Ifitm3 expression. Our ABBA analysis allowed us to propose a new role for Ifitm3 in the pathogenesis of glomerulonephritis via a mechanism involving promoter hypermethylation that is associated with Ifitm3 repression in the rat strain susceptible to glomerulonephritis.
Keywords: Bayesian statistics; DNA methylation; WGBS; glomerulonephritis
Year: 2017 PMID: 28213474 PMCID: PMC5378105 DOI: 10.1534/genetics.116.195008
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Figure 1. ABBA model. ABBA estimates the unobserved methylation profiles, i.e., the average DNA methylation levels across replicates, of two groups from WGBS data (blue diamonds and red stars). (A) A random effect accounts for the variability of experimental replicates. At each CpG, the methylation probability difference is the difference between the methylation profiles of the two groups (blue and red dots). (B) The methylation profiles of each group are smoothed by a latent Gaussian field that probabilistically connects them (dotted lines). In particular, “Smoothing scenario 1” shows that if there is a large spacing (distance) between two consecutive CpGs (CpG:A and CpG:B), the methylation profile at CpG:B does not depend on the previous one at CpG:A (blue dotted line). The opposite happens in “Smoothing scenario 2”, where the methylation profile at CpG:D is largely influenced by the previous one at CpG:C (red dotted line) despite some high levels of methylation (red stars), which are treated by ABBA as outliers. The degree of smoothing, i.e., the correlation between DNA methylation profiles, is controlled automatically by the marginal variance of the latent Gaussian field (blue and red vertical bars): the correlation is higher (lower) when the variance is small (large). In turn, the variance decreases as the distance between neighboring CpGs decreases (Smoothing scenario 2), while it increases as the distance increases (Smoothing scenario 1).
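The distance-dependent smoothing described in this caption can be illustrated with a much simpler stand-in model. The sketch below is not the ABBA model (it has no binomial likelihood and no replicate random effect); it assumes a Gaussian random-walk prior whose increment variance grows linearly with the distance between consecutive CpGs, plus Gaussian observation noise. The function name `smooth_profile` and the parameters `noise_var` and `var_per_bp` are hypothetical and chosen only for illustration.

```python
import numpy as np

def smooth_profile(positions, meth, noise_var=0.05, var_per_bp=1e-4):
    """Posterior mean of a toy latent Gaussian random walk (not ABBA itself).

    Assumption: the variance of the step between consecutive CpGs equals
    var_per_bp * distance, so close CpGs are tightly coupled and distant
    CpGs are nearly independent.
    """
    positions = np.asarray(positions, dtype=float)
    meth = np.asarray(meth, dtype=float)
    n = meth.size
    gaps = np.diff(positions)              # distance between neighboring CpGs
    coupling = 1.0 / (var_per_bp * gaps)   # small gap -> strong coupling

    # Prior precision matrix of the random walk (tridiagonal).
    prior_prec = np.zeros((n, n))
    for i, w in enumerate(coupling):
        prior_prec[i, i] += w
        prior_prec[i + 1, i + 1] += w
        prior_prec[i, i + 1] -= w
        prior_prec[i + 1, i] -= w

    # Gaussian posterior mean: (Q + I/noise_var)^-1 (y / noise_var).
    post_prec = prior_prec + np.eye(n) / noise_var
    return np.linalg.solve(post_prec, meth / noise_var)

# Three CpGs 10 bp apart (the middle one looks like an outlier), then one 5 kb away.
smoothed = smooth_profile([0, 10, 20, 5000], [0.2, 0.9, 0.2, 0.9])
```

In this toy setting the high value at the second CpG is pulled toward its close neighbors, as in "Smoothing scenario 2", while the distant CpG keeps a value close to its own observation, as in "Smoothing scenario 1".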
Figure 3. ABBA analysis of WGBS in rat macrophages. (A) CpG-based annotation of 1004 DMRs between WKY and LEW macrophages, showing significantly higher proportions of CpGI and CpGS than would be expected by chance (P-value < 0.009 for CpGI and P-value < 0.001 for CpGS, obtained from 1000 randomly sampled datasets of 1004 CpG-matched regions). (B) Proportions of DMRs in different genomic features of overlapping genes. Feature annotation was retrieved from the UCSC genome browser (RN4). (C) KEGG pathway enrichment for the genes overlapping with DMRs. Only significant pathways are reported (FDR < 1%). (D) Enrichment for TFBS within the DMRs when compared to CG-matched regions of the genome (FDR < 0.05). (E) RNA-seq analysis in WKY and LEW macrophages shows lack of Ifitm3 expression in WKY rats. (F) Percentage methylation at each CpG in WKY (crosses) and LEW (plus signs), and smoothed average methylation profiles by ABBA. The pink box highlights the significant DMR identified by ABBA (FDR < 5%). (G) ChIP-seq analysis for JunD in LEW.LCrgn2 (LEW*) and WKY macrophages identified a single region with differential binding of JunD (P-value < 0.05, Sign Diff row, black box). Units on the y-axis refer to relative ChIP-seq coverage with respect to the control. This region overlapped with two (out of four) JunD binding site motifs identified within the gene promoter (±500 bp around the TSS). ABBA DMR, differentially methylated region identified by ABBA. TSS, transcription start site. * P-value < 0.05, *** P-value < 0.001.
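Panel (A) reports P-values obtained from 1000 randomly sampled datasets of matched regions. A minimal sketch of such an empirical P-value follows. The caption does not say which estimator was used, so the add-one correction here is an assumption, and the function name `empirical_p_value` and the example numbers are hypothetical.

```python
import numpy as np

def empirical_p_value(observed, null_values):
    """One-sided empirical P-value for an observed proportion.

    Assumption: uses the add-one estimator (k + 1) / (N + 1), where k is
    the number of random datasets at least as extreme as the observation.
    """
    null_values = np.asarray(null_values, dtype=float)
    n_extreme = np.count_nonzero(null_values >= observed)
    return (n_extreme + 1) / (null_values.size + 1)

# Hypothetical example: proportion of regions overlapping an annotation
# in 1000 random matched datasets, versus a made-up observed proportion.
rng = np.random.default_rng(0)
null_props = rng.normal(loc=0.10, scale=0.01, size=1000)
p = empirical_p_value(0.16, null_props)
```

With 1000 random datasets, the smallest value this estimator can return is 1/1001, which is the resolution limit of any statement of the form "P-value < 0.001" under this scheme.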
Figure 2. Benchmarking results. (A) ROC curves for selected combinations of parameters: (i) s = 0.1, Δmeth = 30%, r = 1, average read depth per CpG of 10×, δ = 0; (ii) s = 0.3, Δmeth = 30%, r = 3, average read depth per CpG of 10×, δ = 0; (iii) s = 0.2, Δmeth = 70%, r = 2, average read depth per CpG of 30×, δ = 0; (iv) s = 0.1, Δmeth = 70%, r = 1, average read depth per CpG of 30×, δ = 5%; (v) s = 0.2, Δmeth = 30%, r = 2, average read depth per CpG of 10×, δ = 10%; (vi) s = 0.3, Δmeth = 70%, r = 3, average read depth per CpG of 30×, δ = 10%. For each of these combinations of parameters, the corresponding best method based on its pAUC is indicated in the benchmark grid below. In (i) and (iv), ROC curves are reported only for the methods that can analyze WGBS data generated from one biological sample. (B) Global snapshot of the methods' performance across 216 simulated datasets. A given combination of parameters is indicated by a square in the benchmark grid, and, for each square, we calculated the pAUC for each method and determined which method had the overall best pAUC (i.e., pAUCmethod_1 > pAUCmethod_2). Colors in the benchmark grid indicate which method had the best performance. When the pAUCs of two methods are similar (±1%), we report the colors of both methods (e.g., black and red colors in the same square indicate similar performance of ABBA and DSS). The six selected combinations of parameters for which the ROC curves are reported in (A) are indicated within the benchmark grid: (i–vi). All ROC curves are reported in Figure S5, Figure S6, and Figure S7. (C) For the three best-performing methods (ABBA, DSS, and BSmooth), we report the percentage of simulated scenarios in which each method performed best based on the pAUC comparison. “Tie” indicates the proportion of simulated scenarios in which the pAUCs of any two methods were similar (i.e., pAUCs ±1%), so that it was not possible to single out one best-performing approach.
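The pAUC comparison and the tie rule in panels (B) and (C) can be sketched as follows. This is an illustration, not the authors' benchmarking code. The false-positive-rate cutoff `max_fpr` is not given in the caption and is an assumption, as is reading "±1%" as a relative tolerance. The function names and the example pAUC values are hypothetical.

```python
import numpy as np

def partial_auc(fpr, tpr, max_fpr=0.1):
    """Area under the ROC curve for FPR in [0, max_fpr] (trapezoid rule).

    Assumes fpr is strictly increasing and starts at 0.
    """
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    tpr_at_cut = np.interp(max_fpr, fpr, tpr)   # TPR interpolated at the cutoff
    keep = fpr < max_fpr
    x = np.append(fpr[keep], max_fpr)
    y = np.append(tpr[keep], tpr_at_cut)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def best_methods(pauc_by_method, rel_tol=0.01):
    """Return all methods whose pAUC is within rel_tol of the best one.

    More than one name in the result corresponds to a "tie".
    """
    best = max(pauc_by_method.values())
    return sorted(m for m, v in pauc_by_method.items()
                  if v >= best * (1.0 - rel_tol))

# Made-up pAUC values for one simulated scenario.
winners = best_methods({"ABBA": 0.080, "DSS": 0.0795, "BSmooth": 0.050})
# winners == ["ABBA", "DSS"], i.e. a tie between ABBA and DSS
```

Applying `best_methods` to each of the simulated scenarios and counting how often each method appears alone, or together with another, would give percentages of the kind summarized in panel (C).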