| Literature DB >> 31870409 |
Zijie Zhang1,2,3, Qi Zhan4,5, Mark Eckert6, Allen Zhu1,2,3, Agnieszka Chryplewicz6, Dario F De Jesus7, Decheng Ren8, Rohit N Kulkarni7, Ernst Lengyel6, Chuan He9,10,11, Mengjie Chen12,13.
Abstract
Epitranscriptome profiling using MeRIP-seq is a powerful technique for in vivo functional studies of reversible RNA modifications. We develop RADAR, a comprehensive analytical tool for detecting differentially methylated loci in MeRIP-seq data. RADAR enables accurate identification of altered methylation sites by accommodating variability of pre-immunoprecipitation expression level and post-immunoprecipitation count using different strategies. In addition, it is compatible with complex study design when covariates need to be incorporated in the analysis. Through simulation and real dataset analyses, we show that RADAR leads to more accurate and reproducible differential methylation analysis results than alternatives, which is available at https://github.com/scottzijiezhang/RADAR.Entities:
Keywords: Differential methylation; MeRIP-seq; N6-adenosine methylation (m6A)
Mesh:
Year: 2019 PMID: 31870409 PMCID: PMC6927177 DOI: 10.1186/s13059-019-1915-9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1Unique features of m6A-seq (MeRIP-seq) data. RADAR divides concatenated exons of a gene into consecutive bins and models the immunoprecipitation (IP)-enriched read counts in such bins. a depicts a pair of read counts in the INPUT and the IP library in the ith bin as c and t. In the RADAR workflow, the gene-level read count of the input library substitutes the bin-level read count c as the representation of the pre-IP RNA levels of the ith bin. b compares the relative variation of gene-level and bin-level (local) read counts of different bin sizes in four m6A-seq datasets, suggesting that unwanted variation can be reduced using gene-level counts as the estimates of pre-IP RNA levels. Panel c compares the cross-sample mean and variance of regular RNA-seq (pre-IP counts) and m6A-seq (post-IP read counts adjusted for pre-IP RNA level variation) data in four m6A-seq datasets. The fitted curvature of m6A-seq can differ from that of RNA-seq, indicating that m6A-seq may have a different mean-variance relationship from RNA-seq. Biological and experimental confounding factors are often encountered in patient samples. d shows the first two principal components (PCs) of m6A enrichment in each dataset, where the samples are colored by covariates that need to be accounted for. m6A enrichment was represented by IP sample read counts adjusted for pre-IP (INPUT) RNA-level variation. e shows the first two PCs after regressing out known covariates—age in the ovarian cancer dataset and batch in the T2D dataset. After regressing out the covariate, samples are separated by disease conditions on the PCA plot
Fig. 2Benchmarking RADAR on two simulation models. We benchmarked RADAR and other alternative methods on simulated data. Using two simulation models—a random effect (RADAR) model and a quad-negative-binomial (QNB) model, we simulated dataset of eight replicates of varying true effect sizes (0.5, 0.75, and 1) with and without covariates. We tested different methods on simulated dataset and compared the results at an FDR cutoff of 0.1 with simulated true sites. We show the sensitivity (fraction of true sites detected by the method at an FDR cutoff of 0.1) and false discovery rate (fraction of detected differential sites that are not true sites) of each method applied on data simulated by the random effect model without covariates (a) and with covariates (b) and the quad-negative-binomial model without covariates (c) and with covariates (d), respectively. The FDR cutoff used to select DM sites is labeled by a dashed line
Fig. 3Sensitivity of benchmarked methods on real m6A-seq data. We benchmarked RADAR and other alternative methods on four m6A-seq data with different characteristics. Each panel shows the histogram of p-values obtained from DM tests using RADAR, MeTDiff, QNB, Fisher’s exact test and exomePeak on each dataset, respectively
Fig. 4Benchmarking false-positive signals using permutation analysis on real m6A-seq data. To assess empirical FDR of the test, we permuted the phenotype labels of samples so that the new labels were not associated with true ones. Each panel shows the histograms of p values obtained from DM tests on 15 permuted copies (blue) and those from the tests on the original dataset (red)
Fig. 5Pathways enriched in differential methylated genes identified in ovarian cancer and T2D datasets. We performed KEGG pathway enrichment analysis using ClusterProfiler [37] on DMGs identified in the ovarian cancer dataset by RADAR (a) and MeTDiff (b), respectively. The enrichment maps represent identified pathways as a network with edges weighted by the ratio of overlapping gene sets
Fig. 6Experimental validation of RADAR-detected DM sites using the SELECT method. We applied antibody independent method SELECT on T2D samples (N = 4). Shown are SELECT results of six putative DM sites for validation. SELECT measures the relative abundance of non-methylated RNA molecules of target locus as represented by the elongation and ligation “read through” of oligo probes. Thus, SELECT results—“relative read through”—are inversely correlated with m6A level
Fig. 7The influence of sample size on the statistical power of differential methylation analysis. Sensitivity vs. empirical FDR for each method on simulated data with different numbers of replicates (2 to 8) at 10% FDR. Each data point represents the results on one of ten simulated copies. Sample sizes are labeled by colors