Yu Zhang, C Jake Harris, Qikun Liu, Wanlu Liu, Israel Ausin, Yanping Long, Lidan Xiao, Li Feng, Xu Chen, Yubin Xie, Xinyuan Chen, Lingyu Zhan, Suhua Feng, Jingyi Jessica Li, Haifeng Wang, Jixian Zhai, Steven E Jacobsen.
Abstract
Genome-wide characterization by next-generation sequencing has greatly improved our understanding of the landscape of epigenetic modifications. Since 2008, whole-genome bisulfite sequencing (WGBS) has become the gold standard for DNA methylation analysis, and a tremendous amount of WGBS data has been generated by the research community. However, the systematic comparison of DNA methylation profiles to identify regulatory mechanisms has yet to be fully explored. Here we reprocessed the raw data of over 500 publicly available Arabidopsis WGBS libraries from various mutant backgrounds, tissue types, and stress treatments, and filtered them based on sequencing depth and efficiency of bisulfite conversion. This enabled us to identify high-confidence differentially methylated regions (hcDMRs) by comparing each test library to over 50 high-quality wild-type controls. We developed statistical and quantitative measurements to analyze the overlap of DMRs and to cluster libraries based on their effect on DNA methylation. In addition to confirming existing relationships, we revealed unanticipated connections between well-known genes. For instance, MET1 and CMT3 were found to be required for the maintenance of asymmetric CHH methylation at nonoverlapping regions of CMT2-targeted heterochromatin. Our comparative methylome approach has established a framework for extracting biological insights via large-scale comparison of methylomes and can also be adopted for other genomics datasets.
Keywords: Arabidopsis; DNA methylation; computational biology; epigenetics
Year: 2018 PMID: 29339507 PMCID: PMC5798360 DOI: 10.1073/pnas.1716300115
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Quality check and data processing of Arabidopsis WGBS libraries in GEO. (A) Summary of the pipeline for processing Arabidopsis WGBS libraries. (B) Percentage of unconverted C in reads that mapped to the nuclear and chloroplast genomes of each library. Libraries in the shaded area were discarded from further analysis because of their low bisulfite conversion rate. (C) Distribution of total data size and average genome coverage of sequencing reads for each library. Libraries in the shaded area were discarded from further analysis because of their low genome coverage.
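The two quality filters in panels B and C can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Library` record, the `passes_qc` and `filter_libraries` helpers, and both cutoff values are hypothetical, because the caption does not state the thresholds that were used. The sketch uses the chloroplast reads for the conversion check on the common assumption that the chloroplast genome is unmethylated, so unconverted C there approximates the non-conversion rate.

```python
from dataclasses import dataclass


@dataclass
class Library:
    """Hypothetical per-library QC summary (field names are assumptions)."""
    name: str
    unconverted_c_pct: float  # % unconverted C among chloroplast-mapped reads
    coverage: float           # average genome coverage of the library


# Illustrative cutoffs only; the source does not give the actual values.
MAX_UNCONVERTED_PCT = 1.0
MIN_COVERAGE = 5.0


def passes_qc(lib, max_unconverted=MAX_UNCONVERTED_PCT, min_coverage=MIN_COVERAGE):
    """Return True if the library clears both the conversion and coverage filters."""
    return lib.unconverted_c_pct <= max_unconverted and lib.coverage >= min_coverage


def filter_libraries(libs):
    """Keep only libraries that pass both filters, preserving input order."""
    return [lib for lib in libs if passes_qc(lib)]
```

Keeping the cutoffs as keyword arguments makes it easy to check how many libraries would be lost under stricter or looser thresholds.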
Fig. 2. Validation of hcDMRs and comparison with another DMR-calling strategy. (A) Boxplot of methylation levels at hypo-CHH hcDMRs (n = 311) defined by morc6 (lib_GSM1375965). Lines connect levels of methylation at individual hcDMRs in the genotypes indicated (WT = lib_GSM1375966). (B) Boxplot of methylation levels at hypo-CHH DMRs (n = 1,391) defined by comparing morc6 (lib_GSM1375965) with its matched wild type (WT = lib_GSM1375966). Note that lines at a near-horizontal or descending incline from “morc6” to “MORC6 in morc6” indicate a lack of complementation in the MORC6-FLAG T1 line. (C) Venn diagrams comparing the number of DMRs identified in three independent nrpe1 samples, called using one WT from the same experiment (Left) or a group of at least 33 WTs (hcDMRs; Right). The numbers outside the Venn diagrams show the percentage of overlapping DMRs (intersect/total number of DMRs) in each sample. Middle illustrates the overlap between the intersect sets of DMRs identified by these two methods.
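The percentage reported outside each Venn diagram in panel C (intersect/total number of DMRs) can be computed as in the sketch below. It assumes each DMR is represented by a hashable identifier such as a `(chromosome, bin_start)` tuple, so that identical DMRs compare equal; that representation and the `overlap_percentages` name are assumptions made for illustration.

```python
def overlap_percentages(dmr_sets):
    """Percentage of each sample's DMRs that fall in the shared intersect.

    dmr_sets: dict mapping a sample name to a set of DMR identifiers,
    e.g. (chromosome, bin_start) tuples. Needs at least one sample.
    """
    # DMRs present in every sample (the centre of the Venn diagram).
    shared = set.intersection(*dmr_sets.values())
    # intersect / total number of DMRs, per sample, as a percentage.
    return {
        name: (100.0 * len(shared) / len(dmrs) if dmrs else 0.0)
        for name, dmrs in dmr_sets.items()
    }
```

A higher percentage for every replicate means the replicates agree on a larger share of their calls, which is the comparison the two Venn diagrams make between the single-WT and multi-WT strategies.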
Fig. 3. Clustering of test libraries with overlapping DMRs by the S-MOD and Q-MOD methods. (A) Summary of S-MOD calculation for pairwise relatedness between libraries based on overlapping hcDMRs. (B) Clustering of S-MOD scores between pairs of 260 test libraries at hypo-CHH hcDMRs. Submatrices with yellow borders indicate libraries that have significant overlap of hcDMRs with any other libraries, whereas green borders indicate subgroups of high relatedness. (C) Summary of Q-MOD calculation for pairwise relatedness between libraries based on overlapping hcDMRs. (D) Q-MOD clustering at hypo-CHH hcDMRs after filtering out libraries by S-MOD score >100 maximum cutoff. With Q-MOD, finer-scale groupings, such as between weak vs. strong RdDM mutants, can be observed.
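The caption does not spell out how S-MOD or Q-MOD scores are calculated, so the sketch below is not either method. It shows one generic way to ask whether two libraries share more DMRs than expected by chance: a hypergeometric upper-tail test. The function name and the framing in terms of a fixed number of candidate bins are assumptions made for illustration.

```python
from math import comb


def overlap_pvalue(n_bins, n_a, n_b, n_shared):
    """Hypergeometric upper tail: P(overlap >= n_shared) under random placement.

    n_bins:   total candidate bins in the genome (assumed fixed-size bins)
    n_a:      number of bins called as DMRs in library A
    n_b:      number of bins called as DMRs in library B
    n_shared: number of bins called in both libraries
    """
    # Count the ways to choose B's bins so that exactly i of them fall in A,
    # summed over every i at least as large as the observed overlap.
    favourable = sum(
        comb(n_a, i) * comb(n_bins - n_a, n_b - i)
        for i in range(n_shared, min(n_a, n_b) + 1)
    )
    return favourable / comb(n_bins, n_b)
```

A small p-value suggests the overlap is unlikely under independence; turning such values into a pairwise score for clustering would require further choices that the caption does not describe.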
Fig. 4. MET1 and CMT3 are independently required for the maintenance of asymmetric CHH methylation at CMT2 target sites. (A) Q-MOD clustering of cmt2 hypo-CHH block. met1 and cmt3 related mutants show independent overlaps with cmt2 hypo-CHH hcDMRs. (B) Scatterplots for methylation levels in WT (lib_GSM980986) vs. met1 (lib_GSM981031) or cmt3 (lib_GSM981003) at the met1 or cmt3 defined hcDMR subset of cmt2 (lib_GSM981002) sites. Methylation levels at cmt3 subset sites (@ cmt3 sites, n = 2,290) are largely unaffected in met1, and methylation at met1 subset sites (@ met1 sites, n = 4,867) is largely unaffected in cmt3 (Upper). Lower is control, showing CHH methylation level loss at the met1 and cmt3 sites in their respective mutant backgrounds. (C) Levels of WT methylation in CG, CHG, and CHH at cmt3 and met1 subset sites. Significant differences indicated by P value (Wilcoxon rank sum test); ***P < 2.2e−16 and *P = 0.000196. (D) Average cytosine context density at cmt3 and met1 subset sites. Density is given in percentage (5% indicates five sites per 100 bp of sequence). ***P < 2.2e−16 (Wilcoxon rank sum test). (E) Browser shot showing cmt3 and met1 (cmt2 subset) hypo-CHH hcDMRs at distinct adjacent TEs. The TE requiring met1 for CHH methylation maintenance also shows loss of H3K9me2 in the met1 background. (F) H3K9me2 levels in WT vs. met1 over met1 subset hypo-CHH hcDMRs (Left) and over cmt3 subset hypo-CHH hcDMRs (Right). ***P < 2.2e−16 (Wilcoxon rank sum test).
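The density unit in panel D (5% means five sites per 100 bp) can be illustrated with a short sketch that counts CG, CHG, and CHH sites in a sequence, where H is A, C, or T. It is an assumption-laden illustration: the `context_density` name is hypothetical, and only cytosines on the given strand are counted, whereas the authors may have counted both strands.

```python
def context_density(seq):
    """Return CG, CHG and CHH site density as a percentage of sequence length.

    Only cytosines on the supplied strand are counted (an assumption).
    A cytosine too close to the 3' end to classify is skipped.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    h_bases = "ACT"  # IUPAC code H: any base except G
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i, base in enumerate(seq):
        if base != "C":
            continue
        downstream = seq[i + 1:i + 3]  # up to two bases after the cytosine
        if downstream[:1] == "G":
            counts["CG"] += 1
        elif len(downstream) == 2 and downstream[0] in h_bases:
            if downstream[1] == "G":
                counts["CHG"] += 1
            elif downstream[1] in h_bases:
                counts["CHH"] += 1
    # Sites per 100 bp, so 5.0 corresponds to the caption's "5%".
    return {ctx: 100.0 * n / len(seq) for ctx, n in counts.items()}
```

For a 100-bp region containing five CHH sites on the counted strand, the function returns a CHH density of 5.0, matching the convention stated in the caption.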