Methylscaper: an R/shiny app for joint visualization of DNA methylation and nucleosome occupancy in single-molecule and single-cell data.

Parker Knight, Marie-Pierre L Gauthier, Carolina E Pardo, Russell P Darst, Kevin Kapadia, Hadley Browder, Eliza Morton, Alberto Riva, Michael P Kladde, Rhonda Bacher.

Abstract

SUMMARY: Differential DNA methylation and chromatin accessibility are associated with disease development, particularly cancer. Methods that allow profiling of these epigenetic mechanisms in the same reaction and at the single-molecule or single-cell level continue to emerge. However, a challenge lies in jointly visualizing and analyzing the heterogeneous nature of the data and extracting regulatory insight. Here, we present methylscaper, a visualization framework for simultaneous analysis of DNA methylation and chromatin accessibility landscapes. Methylscaper implements a weighted principal component analysis that orders DNA molecules, each providing a record of the chromatin state of one epiallele, and reveals patterns of nucleosome positioning, transcription factor occupancy, and DNA methylation. We demonstrate methylscaper's utility on a long-read, single-molecule methyltransferase accessibility protocol for individual templates (MAPit-BGS) dataset and a single-cell nucleosome, methylation, and transcription sequencing (scNMT-seq) dataset. In comparison to other procedures, methylscaper is able to readily identify chromatin features that are biologically relevant to transcriptional status while scaling to larger datasets.
AVAILABILITY AND IMPLEMENTATION: Methylscaper is implemented in R (version > 4.1) and available on Bioconductor: https://bioconductor.org/packages/methylscaper/, GitHub: https://github.com/rhondabacher/methylscaper/, and Web: https://methylscaper.com.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2021. Published by Oxford University Press.

Year:  2021        PMID: 34125875      PMCID: PMC8665741          DOI: 10.1093/bioinformatics/btab438

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Abnormal epigenetic changes are a key hallmark of cancer. Alterations in DNA methylation, including the co-occurrence of both hyper- and hypo-methylation of different regions of the genome, have been detected in nearly all cancer types (Kushwaha ; Orjuela ; Pérez ). In addition, both cancer- and tissue-specific differences exist in nucleosome positioning and occupancy, as well as transcription factor binding activity, which together determine chromatin accessibility (Corces ). However, profiling endogenous methylation and accessibility states separately ignores their complementary nature in regulating gene expression and, by definition, queries different sets of molecules (Portela ). To address this, assays such as MAPit-BGS (Pondugula and Kladde, 2008) and NOMe-seq (Kelly ) have been developed to simultaneously capture nucleosome occupancy and methylation states at single-molecule resolution. In both cases, chromatin accessibility is first probed by the methyltransferase M.CviPI (Xu, 1998), which methylates unprotected GC sites. Next, accessibility at GC sites and CG endogenous methylation are profiled by bisulfite (Darst ) or bisulfite-free enzymatic conversion (Schutsky ). After sequencing, the methylation signals of all cytosines are translated bioinformatically. Long-read sequencing is particularly advantageous to phase the co-occurrence of epigenetic features, e.g. multiple nucleosomes. Recently, an extension of NOMe-seq, nanoNOMe, made use of long-read nanopore sequencing and resolved long-range patterns along with individual DNA molecules (Lee ). Methods for simultaneously profiling accessibility and methylation have also been extended to single cells via the scNOMe-seq (Pott, 2017) and scNMT-seq (Clark ) techniques. For MAPit-BGS and nanoNOMe, the long reads derive from contiguous single DNA molecules, while single-cell methods use short reads that are reconstructed into contiguous DNA molecules from individual cells. 
Both types of methods allow for discerning the heterogeneous nature of cellular DNA methylation and chromatin structure. Bioinformatic software programs, such as Bismark (Krueger and Andrews, 2011), are first used to align the data; however, many analytical pipelines and downstream visualization tools fail to highlight the epigenetic variation in a useful way. Previously developed methods utilize the output from Bismark but are limited to a relatively small number of reads or provide summary plots rather than site-level data (Huang ; Wong ). Two other such visualization tools are the NOMePlot (Requena ) and MethylViewer (Pardo ) applications, which were designed to simultaneously visualize CG methylation/GC accessibility patterns. Despite their integrated pipelines, the commonly used ‘lollipop’ plots are not intuitive in highlighting the joint occupancy and methylation states along a continuous DNA strand, especially when considering hundreds or thousands of molecules. The previously developed MethylTracker (Darst ) plots visually intuitive methylation/accessibility patterns by connecting consecutively methylated or unmethylated sites with contrasting colors; however, it is computationally inefficient and unable to effectively organize hundreds or thousands of reads. Here, we describe methylscaper, a bioinformatic and statistical software package that generates visualizations of DNA methylation and chromatin accessibility patterns. For single-molecule joint profiling data, or those using targeted sequencing approaches, methylscaper begins by processing raw sequencing reads. For single-cell data, or those using genome-wide approaches, output from Bismark or similar alignment programs is used as the initial input. Ordering the molecules is a key step for visualization, and our pipeline implements a two-stage weighted principal component analysis (PCA) framework that is feature- and site-specific.
Weighting allows the user to emphasize specific genomic regions or features of interest. Compared to alternative procedures, our ordering is also efficient for large-scale datasets. Methylscaper is an interactive visualization platform available as an R/Shiny application, and its functions may also be used directly via the R package. We evaluate methylscaper on an epigenetic DNA resilencing MAPit-BGS dataset and demonstrate its superior ability to elucidate epigenetic patterns and identify regions of cell-to-cell nucleosome sliding. We further demonstrate methylscaper on a single-cell dataset generated using scNMT-seq and identify a site of nucleosome positioning.

2 Materials and methods

Methylscaper first processes the data, followed by visualization and statistical analysis of methylated and accessible chromatin regions. For targeted sequencing datasets, the initial preprocessing steps include pairwise alignment of each sequence, quality control and filtering of poorly aligned sequences, and finally, conversion of the aligned sequences to plots of methylation and occupancy states (Fig. 1A). For genome-wide datasets, methylscaper begins with processed output from alignment programs such as Bismark. Additional details on the bioinformatic processing are available in Supplementary Materials. Regions of methylation or accessibility are identified by connecting consecutive sites having the same methylation state (Fig. 1B). A patch of endogenous methylation is plotted in red if ≥2 consecutive HCG sites show methylation (H = A, T or C, where a sequenced C [or G on the complementary strand] denotes methylation). Similarly, consecutive GCH methylation indicates accessibility, plotted in yellow. By contrast, consecutively unmethylated GCH or HCG [a sequenced T (or A on the complementary strand) in the sequence denotes an unmethylated status] are colored black. Patches of either color interrupted by a single GCH or HCG site of the opposite methylation state are emphasized as gray borders.
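The site and patch rules described above can be sketched in a few lines. This is a minimal illustration in plain Python, not methylscaper's own R code: the function names (`classify_sites`, `call_states`, `patches`, `colour`) are hypothetical, the read is assumed to be already aligned to the reference without gaps, and treating a single-site run as a gray border is a simplifying assumption about how interrupted patches are drawn.

```python
from itertools import groupby

H = set("ACT")  # H = A, T or C, as defined in the text


def classify_sites(ref):
    """Return (HCG positions, GCH positions) of cytosines on the top strand."""
    hcg, gch = [], []
    for i in range(1, len(ref) - 1):
        if ref[i] != "C":
            continue
        prev_base, next_base = ref[i - 1], ref[i + 1]
        if prev_base in H and next_base == "G":
            hcg.append(i)  # endogenous methylation context
        elif prev_base == "G" and next_base in H:
            gch.append(i)  # accessibility (M.CviPI) context
        # GCG fits both contexts and is skipped as ambiguous
    return hcg, gch


def call_states(read, sites):
    """Sequenced C = methylated (True), sequenced T = unmethylated (False)."""
    calls = {"C": True, "T": False}
    return [calls.get(read[i]) for i in sites]  # None for any other base


def patches(states):
    """Group consecutive sites with the same state: (first, last, state)."""
    runs, idx = [], 0
    for state, group in groupby(states):
        n = len(list(group))
        runs.append((idx, idx + n - 1, state))
        idx += n
    return runs


def colour(run, methylated_colour):
    """Assumed rule: runs of >=2 sites get a colour, single sites are gray."""
    first, last, state = run
    if first == last:
        return "gray"
    return methylated_colour if state else "black"


# Toy reference and one aligned, converted read (both invented for illustration)
ref = "TGCATACGTACGAGCTT"
read = "TGCATATGTATGAGCTT"
hcg_sites, gch_sites = classify_sites(ref)
hcg_runs = patches(call_states(read, hcg_sites))
gch_runs = patches(call_states(read, gch_sites))
hcg_colours = [colour(r, "red") for r in hcg_runs]
gch_colours = [colour(r, "yellow") for r in gch_runs]
```

In this toy read both GCH sites retain a C (accessible, yellow) while both HCG sites read as T (unmethylated, black). Only the top strand is handled here; the complementary-strand case (G/A calls) would need the mirrored rules.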
Fig. 1.

An overview of methylscaper. (A) Flowchart of the bioinformatic preprocessing pipeline. (B) methylscaper plots of the MAPit-BGS data, generated with two different orderings. The data in the left plot is not ordered; the data on the right was ordered with methylscaper’s weighted principal component algorithm. A pink oval was added to indicate the ∼150-bp +1 nucleosome downstream of the transcription start site; a green rectangle was added to indicate a sequence-specific DNA-binding factor; and a bent arrow was added to indicate the TSS.

Ordering the single cells or molecules displays population heterogeneity and allows identification of patterns of endogenous methylation as well as transcription factor and nucleosome occupancy. To do so, methylscaper constructs a matrix containing both endogenous (HCG) and introduced (GCH) methylation states for the set of molecules. A numerical key is used to represent patches of methylation. Weighted PCA is performed on the entire matrix, with molecules assigned a weight based on the number of methylation patches between two fixed base pairs chosen by the user. This allows the weighting to focus on either type of methylation and to emphasize specific genomic regions (Supplementary Fig. S1). The first weighted principal component is used to determine the global order; as shown in Supplementary Figure S2, the first component is highly correlated with methylation and accessibility. Following the determination of the global ordering, users can perform an optional second-stage refinement step in which a contiguous subset of the molecules is reordered using the PCA procedure to increase the resolution of patterns (Supplementary Fig. S3). Additional experiment-wide statistics are also calculated from the molecules and are comparable across datasets or treatments (Supplementary Fig. S4).
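The two-stage ordering can be illustrated with a small, self-contained sketch. It is written in plain Python rather than R and is not methylscaper's implementation. Three things here are assumptions made for illustration: the numerical key (1 = methylated, 0 = unmethylated), the weighting rule (one plus the number of methylated sites in a user-chosen column window), and the use of power iteration to obtain the first component. All function names are hypothetical.

```python
import math


def molecule_weights(matrix, start, end):
    """Assumed weighting: 1 + methylated sites within columns start..end."""
    return [1.0 + sum(1 for x in row[start:end + 1] if x > 0) for row in matrix]


def first_weighted_pc_scores(matrix, weights, n_iter=200):
    """Scores of each molecule on the first weighted principal component."""
    n_cols = len(matrix[0])
    total = sum(weights)
    # Weighted column means and centring
    mean = [sum(w * row[j] for w, row in zip(weights, matrix)) / total
            for j in range(n_cols)]
    centered = [[x - m for x, m in zip(row, mean)] for row in matrix]
    # Weighted covariance matrix (sites x sites)
    cov = [[sum(w * r[j] * r[k] for w, r in zip(weights, centered)) / total
            for k in range(n_cols)] for j in range(n_cols)]
    # Power iteration from a fixed, non-uniform start vector
    v = [1.0 / (j + 1) for j in range(n_cols)]
    for _ in range(n_iter):
        nxt = [sum(cov[j][k] * v[k] for k in range(n_cols)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance at all: nothing to order by
            break
        v = [x / norm for x in nxt]
    return [sum(x * y for x, y in zip(row, v)) for row in centered]


def order_molecules(matrix, start, end):
    """Global order: sort molecules by their first-component score."""
    scores = first_weighted_pc_scores(matrix, molecule_weights(matrix, start, end))
    return sorted(range(len(matrix)), key=lambda i: scores[i])


def refine(matrix, order, first, last, start, end):
    """Second stage: re-order only the contiguous block order[first..last]."""
    block = order[first:last + 1]
    sub = order_molecules([matrix[i] for i in block], start, end)
    return order[:first] + [block[i] for i in sub] + order[last + 1:]


# Toy data: two interleaved molecule types over four sites
matrix = [[1, 1, 0, 0], [0, 0, 1, 1]] * 3
order = order_molecules(matrix, 0, 1)
refined = refine(matrix, order, 0, 2, 0, 1)
```

On this toy matrix the ordering places the three molecules of each type next to each other. The sign of a principal component is arbitrary, so which group comes first carries no meaning.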

3 Results

We applied methylscaper to a dataset with 149 single-molecule reads generated using MAPit-BGS. This dataset is from an epigenetic study of methylation resilencing in the EPM2AIP1 promoter region following withdrawal of the DNA methyltransferase inhibitor 5-aza-2′-deoxycytidine in the RKO cell line. A comparison of the visualization without any ordering versus our weighted PCA is shown in Fig. 1B. Without any ordering of the molecules (left panel), no biological conclusions can be drawn. Using methylscaper (right panel), it becomes evident that endogenous HCG methylation (red-gray) inversely correlates with GCH accessibility (yellow-gray). The quality of ordering also allows visualization of the +1 nucleosome sliding and occupying different positions in each cell: the ∼150 bp footprints (black areas) move in register with expansion/shortening of the accessible nucleosome-free region at the transcription start site (TSS). In molecules 25–35, two-phased nucleosomes are observed, punctuated by an accessible linker. Finally, protection of two GCH sites upstream of the TSS and within the nucleosome-free region detects binding of a sequence-specific transcription factor.
We also compared visualization with methylscaper to existing tools. In previous manuscripts using MAPit-BGS, hierarchical clustering alone was used to order the molecules. However, we have found that this method fails as the complexity of patterns and the number of molecules increase, and it often breaks the molecules into distinct blocks that have locally optimal orderings but are out of order with respect to a global structure (Supplementary Fig. S5). When patterns in the data are heterogeneous and many molecules are available, this leads to unorganized and potentially uninformative visualizations. Line plots, also commonly used to visualize methylation and accessibility status [e.g. as implemented in the aaRon R package (Statham ) or the NOMePlot software (Requena )], present either the status of a single molecule at a time or a moving population average of statuses across all molecules (Supplementary Fig. S6). This type of plot is insufficient when visualizing a large number of molecules, as using population averages often leads to a loss of critical information when methylation status or nucleosome occupancy is highly variable in heterogeneous cell populations. Commonly used lollipop plots also become unclear when a large number of molecules are available (Supplementary Fig. S6).
Next, we applied methylscaper to a single-cell dataset generated using the scNMT-seq protocol, which jointly profiles methylation and chromatin accessibility states in single cells (Clark ). As in Clark et al., we observe high levels of open chromatin near the TSS for Eef1g, though we also find evidence of a +1 nucleosome approximately 250 bp downstream of the TSS (Supplementary Fig. S7).
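The earlier point that population averages can discard information is easy to see in a toy example. The sketch below is plain Python on invented data, assuming 1 = methylated and 0 = unmethylated. It builds two populations with very different molecule-level patterns that nevertheless produce the same per-site average profile.

```python
def site_averages(matrix):
    """Per-site fraction of methylated molecules (the population average)."""
    n = len(matrix)
    return [sum(column) / n for column in zip(*matrix)]


# Population A: two distinct epialleles, each methylated over one half
pop_a = [[1, 1, 0, 0]] * 5 + [[0, 0, 1, 1]] * 5
# Population B: alternating-site patterns with no contiguous patches
pop_b = [[1, 0, 1, 0]] * 5 + [[0, 1, 0, 1]] * 5

avg_a = site_averages(pop_a)
avg_b = site_averages(pop_b)
```

Both averages are 0.5 at every site, so a line plot of the population mean cannot tell the two populations apart, whereas a molecule-level plot would.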

Data Availability

The MAPit-BGS reads and reference sequence are available with this article as Supplementary data and in the methylscaper R package on GitHub (https://github.com/rhondabacher/methylscaper/). The scNMT-seq dataset was downloaded from the Gene Expression Omnibus (accession GSE109262).

Funding

This work was supported by the University of Florida Health Cancer Center; the National Institutes of Health [R01 CA155390 to M.P.K.] and the Defense Threat Reduction Agency [HDTRA1-16-1-0048].

Conflict of Interest: none declared.
References

1.  Cloning, characterization and expression of the gene coding for a cytosine-5-DNA methyltransferase recognizing GpC.

Authors:  M Xu; M P Kladde; J L Van Etten; R T Simpson
Journal:  Nucleic Acids Res       Date:  1998-09-01       Impact factor: 16.971

2.  Single-molecule analysis of chromatin: changing the view of genomes one molecule at a time.

Authors:  Santhi Pondugula; Michael P Kladde
Journal:  J Cell Biochem       Date:  2008-10-01       Impact factor: 4.429

3.  Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules.

Authors:  Theresa K Kelly; Yaping Liu; Fides D Lay; Gangning Liang; Benjamin P Berman; Peter A Jones
Journal:  Genome Res       Date:  2012-09-07       Impact factor: 9.043

4.  Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications.

Authors:  Felix Krueger; Simon R Andrews
Journal:  Bioinformatics       Date:  2011-04-14       Impact factor: 6.937

5.  MethylViewer: computational analysis and editing for bisulfite sequencing and methyltransferase accessibility protocol for individual templates (MAPit) projects.

Authors:  Carolina E Pardo; Ian M Carr; Christopher J Hoffman; Russell P Darst; Alexander F Markham; David T Bonthron; Michael P Kladde
Journal:  Nucleic Acids Res       Date:  2010-10-19       Impact factor: 16.971

6.  Genome-wide nucleosome occupancy and DNA methylation profiling of four human cell lines.

Authors:  Aaron L Statham; Phillippa C Taberlay; Theresa K Kelly; Peter A Jones; Susan J Clark
Journal:  Genom Data       Date:  2014-12-08

7.  MethPat: a tool for the analysis and visualisation of complex methylation patterns obtained by massively parallel sequencing.

Authors:  Nicholas C Wong; Bernard J Pope; Ida L Candiloro; Darren Korbie; Matt Trau; Stephen Q Wong; Thomas Mikeska; Xinmin Zhang; Mark Pitman; Stefanie Eggers; Stephen R Doyle; Alexander Dobrovic
Journal:  BMC Bioinformatics       Date:  2016-02-24       Impact factor: 3.169

8.  ViewBS: a powerful toolkit for visualization of high-throughput bisulfite sequencing data.

Authors:  Xiaosan Huang; Shaoling Zhang; Kongqing Li; Jyothi Thimmapuram; Shaojun Xie; Jonathan Wren
Journal:  Bioinformatics       Date:  2018-02-15       Impact factor: 6.937

9.  Distinct chromatin signatures of DNA hypomethylation in aging and cancer.

Authors:  Raúl F Pérez; Juan Ramón Tejedor; Gustavo F Bayón; Agustín F Fernández; Mario F Fraga
Journal:  Aging Cell       Date:  2018-03-05       Impact factor: 9.304

10.  NOMePlot: analysis of DNA methylation and nucleosome occupancy at the single molecule.

Authors:  Francisco Requena; Helena G Asenjo; Guillermo Barturen; Jordi Martorell-Marugán; Pedro Carmona-Sáez; David Landeira
Journal:  Sci Rep       Date:  2019-05-31       Impact factor: 4.379
