Literature DB >> 35328026

enChIP-Seq Analyzer: A Software Program to Analyze and Interpret enChIP-Seq Data for the Detection of Physical Interactions between Genomic Regions.

Ashita Sarudate¹, Toshitsugu Fujita², Takahiro Nakayama¹, Hodaka Fujii².

Abstract

Accumulating evidence suggests that the physical interactions between genomic regions play critical roles in the regulation of genome functions, such as transcription and epigenetic regulation. Various methods to detect the physical interactions between genomic regions have been developed. We recently developed a method to search for genomic regions interacting with a locus of interest in a non-biased manner that combines pull-down of the locus using engineered DNA-binding molecule-mediated chromatin immunoprecipitation (enChIP) and next-generation sequencing (NGS) analysis (enChIP-Seq). The clustered regularly interspaced short palindromic repeats (CRISPR) system, consisting of a nuclease-dead form of Cas9 (dCas9) and a guide RNA (gRNA), or transcription activator-like (TAL) proteins, can be used for enChIP. In enChIP-Seq, it is necessary to compare multiple datasets of enChIP-Seq data to unambiguously detect specific interactions. However, it is not always easy to analyze enChIP-Seq datasets to subtract non-specific interactions or identify common interactions. To facilitate such analysis, we developed the enChIP-Seq analyzer software. It enables easy extraction of common signals as well as subtraction of non-specific signals observed in negative control samples, thereby streamlining extraction of specific enChIP-Seq signals. enChIP-Seq analyzer will help users analyze enChIP-Seq data and identify physical interactions between genomic regions.

Entities: Chemical

Keywords: 3-D genomics; CRISPR; enChIP-Seq; enChIP-Seq analyzer; intergenomic interactions

Mesh：

Substances：
DNA

Year: 2022 PMID： 35328026 PMCID： PMC8949577 DOI： 10.3390/genes13030472

Source DB: PubMed Journal: Genes (Basel) ISSN： 2073-4425 Impact factor: 4.096

1. Introduction

The dynamics of the three-dimensional structure of the genome play a critical role in the regulation of genome functions such as transcription, epigenetic regulation, genomic imprinting, and X-chromosome inactivation [1,2]. Methods to detect physical interactions between genomic regions have been developed in the last two decades. Chromosome conformation capture (3C) was the first biochemical method widely used to detect physical interactions between genomic regions [3]. Various derivatives of 3C have been developed [4]. 3C and its derivatives depend on cross-linking, fragmentation of chromatin by enzymatic digestion or other methods, and ligation of proximal DNA ends. The ligation step can be a source of artifactual “interactions”, leading to occasional discrepancies relative to results derived from other methods, such as imaging analysis [5,6]. Therefore, several “ligation-free” biochemical methods were developed to detect physical interactions between genomic regions. Genome architecture mapping (GAM) is a method to measure the statistical proximities of genomic regions by sequencing the DNA extracted from ultra-thin cryosectioned nuclear slices [7]. Split-pool recognition of interactions by tag extension (SPRITE) is a method that involves performing repeated rounds of splitting and barcoding of individual chromatin complexes followed by identification of the interacting genomic regions by matching the barcodes [8]. Chromatin interaction analysis via droplet-based and barcode-linked sequencing (ChIA-Drop) tracks amplicons arising from gel-bead-in-emulsion (GEM) droplets of each chromatin complex by barcode sequencing [9]. We recently developed a method to search for genomic regions interacting with a locus of interest in a non-biased manner that combines pull-down of the locus using engineered DNA-binding molecule-mediated chromatin immunoprecipitation (enChIP) and next-generation sequencing (NGS) analysis (enChIP-Seq). The CRISPR system, consisting of a nuclease-dead form of Cas9 (dCas9) and a guide RNA (gRNA), or transcription activator-like (TAL) proteins, can be used for enChIP. In enChIP-Seq, it is necessary to compare multiple datasets of enChIP-Seq data to unambiguously detect specific interactions. However, it is not always easy to analyze enChIP-Seq datasets to subtract non-specific interactions or identify common interactions. To facilitate such analysis, we developed the enChIP-Seq analyzer software. It enables easy extraction of common signals as well as subtraction of non-specific signals observed in negative control samples, thereby streamlining extraction of specific enChIP-Seq signals. enChIP-Seq analyzer will help users analyze enChIP-Seq data and identify physical interactions between genomic regions.

2. Materials and Methods

2.1. Implementation

enChIP-Seq analyzer is provided as free software at GitHub (https://github.com/TKY-SE/enChIP-Seq-Analyzer, accessed on 1 December 2021). All the program was executed on a computer with Intel® Core™ i5-7500 CPU @ 3.40 GHz 3.41 GHz, and 8 GB of RAM. Preparation of data sets, mode of analysis, and system handling for enChIP-Seq analyzer are shown in a step-by-step manner below.

2.2. Procedures for enChIP-Seq

The procedures to use enChIP-Seq to detect genomic regions interacting with a locus of interest depend on how the locus is tagged with an engineered DNA-binding molecule [10,11]. In “in cell” enChIP-Seq, an engineered DNA-binding molecule, such as a CRISPR complex, is expressed in the cells to be analyzed. After cross-linking with formaldehyde or other cross-linkers, if necessary, chromatin is fragmented using sonication or enzymatic digestion. Subsequently, the tagged locus is isolated using affinity purification, and the interacting genomic regions are identified by NGS [10]. In in vitro enChIP-Seq, chromatin is cross-linked, if necessary, fragmented, and then incubated with the CRISPR complex, consisting of a recombinant dCas9 protein and a synthetic or in vitro transcribed gRNA, for in vitro locus tagging. Subsequently, the tagged locus is isolated using affinity purification, and the interacting genomic regions are identified by NGS [11]. The CRISPR complex also binds to off-target sites in addition to the target genomic regions [12,13,14,15]. Therefore, to identify genomic regions truly interacting with the target site, it is necessary to analyze negative control samples, such as cells expressing only dCas9 or dCas9 plus irrelevant gRNAs. In addition, multiple gRNAs for the target locus should be analyzed to extract common peaks from the NGS analysis, which can then be identified as true positives with higher confidence. Consequently, it is necessary to compare multiple negative controls and multiple samples with on-target gRNAs. It is tedious to compare these samples manually, and this observation prompted us to develop a software program to automate comparison of multiple NGS peaks.

2.3. Filtering of the NGS Peaks to Identify Genomic Regions Interacting with a Target Locus

enChIP-Seq analyzer is a software program that enables easy consolidation of enChIP-Seq data, including extraction of common peaks observed for different gRNAs bound to the target genomic region and subtraction of peaks in negative control samples (Figure 1).

Figure 1

An example of using enChIP-Seq to identify interacting genomic regions. enChIP-Seq was performed in the presence of different gRNAs or in the absence of a gRNA (a negative control). Peaks detected by enChIP without gRNA were subtracted from those for enChIP-Seq with gRNAs to eliminate off-target sites (Comparison sets 1 and 2). Subsequently, the resultant data could be compared to extract bona fide interactions.

2.4. Preparation of the Data Set

To extract enChIP-specific NGS peaks, Model-based Analysis of ChIP-Seq (MACS) [16] is used to compare NGS data from enChIP with that for input DNA. MACS data for enChIP-specific NGS peaks is exported as a tab file (Figure 2), which is used by enChIP-Seq analyzer.

Figure 2

An example of a tab file used in enChIP-Seq analyzer. (A) Lines starting with # are comments and are not processed by the software. (B) Lines separated by tags are used for calculations in enChIP-Seq analyzer.

2.5. Mode of Analysis

enChIP-Seq analyzer extracts common peak information or eliminates negative peak information, as shown in Figure 3.

Figure 3

Mode of analysis. enChIP-Seq analyzer extracts overlapped regions (Positive (AND)) as common peak information or eliminates union range regions (Negative (OR)) as negative peak information.

Extraction of common peak information. When two or more tab files are analyzed, overlapped peak regions (red double-headed arrow) are extracted as common peak information (region). Elimination of negative peak information. When two or more tab files are analyzed, union range regions (green double-headed arrow) are eliminated as negative peak information (region).

2.6. Handling of the System

enChIP-Seq analyzer, a Java-based software program, is designed for use with Microsoft Windows 10. The software can be downloaded from GitHub (https://github.com/TKY-SE/enChIP-Seq-Analyzer, accessed on 1 December 2021). The browser for the software is shown in Figure 4A. The detailed procedures are outlined below.

Figure 4

The main screen of enChIP-Seq analyzer. (A–C) Examples of the screens of enChIP-Seq analyzer. Handling processes are shown in the main text.

2.6.1. Extraction of Peak Information Commonly Included in Two or More Tab Files

(1-1) Add tab files that include enChIP-specific NGS peaks in the field “I”. To this end, click on “add” to open a selection window and select a tab file from your computer. Repeat this step to add additional tab files. (1-2) (If necessary) To remove an added file, select a file(s) and click on “remove”. (1-3) (Mandatory) Select an added file as a base file, which is then highlighted. (1-4) To execute the program, click on “view”. (1-5) The result is shown in “V”, which is based on information from the base file. The number of common peaks is shown in “VI” as “count”. An example is shown in Figure 4B.

2.6.2. Elimination of Negative Peak Information

(2-1) Perform steps (1-1) through (1-3). For this analysis, use of only one tab file is also acceptable. (2-2) Add a tab file(s) that includes NGS peaks with negative peak information (e.g., NGS peaks from enChIP experiments performed without a gRNA) in the field “II”. To this end, click on “add” to open a selection window and select a tab file(s) from your computer. Repeat this step to add two or more tab files. (2-3) (If necessary) To remove an added file, select a file(s) and click on “remove” in “II”. (2-4) (Mandatory) Select an added file in “I” as a base file, which is then highlighted. (2-5) To execute the program, click on “Pros.-Neg. (IV)”. (2-6) The result is shown in “V”, which is based on the information from the base file. The number of peaks remaining after elimination of negative peak information is shown in “VI” as “count”. An example is shown in Figure 4C. (2-7) (If necessary) If tag number and/or fold enrichment are necessary criteria for filtering, set tag number and/or fold enrichment criteria in “III”. The following symbols can be used for filtering: ≥, =, and ≤.

2.6.3. Data Export

Data shown in “V” can be exported as a csv file and easily utilized by other software/web tools. To this end, click on “Export Table (VII)”. Select a folder to save the file from the opened window. The information within a tab file can be viewed directly in the software program. To this end, add one tab file that includes enChIP-specific NGS peaks in the field “I” (or “II”). Click on “view” in the field “I” (or “II”) to view the information within the file. The information is shown in “V”.

3. Results and Discussion

We previously identified genomic regions associated with the 5′HS5 locus on a genome-wide scale in K562 cells under undifferentiated or sodium butyrate (NaB)-mediated differentiated conditions [10]. In that study, we performed enChIP using gRNAs #6 and #17, which targeted the 5′HS5 locus and used the purified DNA for NGS analysis. As a negative control, we also performed enChIP in the absence of a gRNA and used the purified DNA for NGS analysis. Next, we compared the NGS peaks from each enChIP experiment to those for input DNA and used MACS analysis to extract the list of enriched peaks (“enChIP #6 peaks”, “enChIP #17 peaks”, and “Off-target sites”) (Data sets in Figure 5A). Using enChIP-Seq analyzer, we compared the lists of extracted peaks in a step-by-step manner as follows (Figure 5A).

Figure 5

An example of results from enChIP-Seq analyzer. (A) Step-by step procedures to extract bona fide genomic regions interacting with a target genomic region. (B) An example of the results from enChIP-Seq analyzer.

Step 1: We compared the data from “enChIP #6 peaks” and “Off-target sites” to eliminate off-target binding sites. The resultant information was named “enChIP #6-specific sites”. We compared the data from “enChIP #17 peaks” and “Off-target sites” to eliminate off-target binding sites. The resultant information was named “enChIP #17-specific sites”. To identify peaks with confidence, we adopted two criteria for choosing peaks based on NGS information from the target 5′HS5 locus: (i) Tag number ≥5% of that of the target 5′HS5 locus (which can be considered as an interacting ratio of ≥5%) and (ii) fold enrichment relative to input genomic DNA ≥10. In this regard, 19 and 228 peaks for “enChIP #6-specific sites” and “enChIP #17-specific sites”, respectively, passed the two criteria. Step 2: We compared the data from “enChIP #6-specific sites” and “enChIP #17-specific sites” to extract “enChIP #6/#17-common sites”. We extracted six peaks, which can be considered bona fide physically interacting genomic regions (Figure 5B). These results were consistent with those from our previous study [10] (please note that in Ref. [10], the symbols > should have been ≥).

4. Conclusions

In this study, we developed enChIP-Seq analyzer to compare multiple enChIP-Seq datasets to unambiguously detect specific interactions. enChIP-Seq analyzer is simple and easy to use. We believe that enChIP-Seq analyzer will help users to analyze enChIP-Seq data and detect physical interactions between genomic regions. In addition, the software can be used to compare ChIP-Seq datasets to extract common peaks. Project name: enChIP-Seq analyzer Software homepage (GitHub): https://github.com/TKY-SE/enChIP-Seq-Analyzer, accessed on 1 December 2021 Programming language: Java Other requirements: Windows machine License: None Any restrictions to use by non-academics: None. Notes: We have a plan to adapt the software for Linux and Mac. As soon as they are available, we will upload the code in GitHub

16 in total

1. Higher-Order Inter-chromosomal Hubs Shape 3D Genome Organization in the Nucleus.

Authors: Sofia A Quinodoz; Noah Ollikainen; Barbara Tabak; Ali Palla; Jan Marten Schmidt; Elizabeth Detmar; Mason M Lai; Alexander A Shishkin; Prashant Bhat; Yodai Takei; Vickie Trinh; Erik Aznauryan; Pamela Russell; Christine Cheng; Marko Jovanovic; Amy Chow; Long Cai; Patrick McDonel; Manuel Garber; Mitchell Guttman
Journal: Cell Date: 2018-06-07 Impact factor: 41.582

2. Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells.

Authors: Xuebing Wu; David A Scott; Andrea J Kriz; Anthony C Chiu; Patrick D Hsu; Daniel B Dadon; Albert W Cheng; Alexandro E Trevino; Silvana Konermann; Sidi Chen; Rudolf Jaenisch; Feng Zhang; Phillip A Sharp
Journal: Nat Biotechnol Date: 2014-04-20 Impact factor: 54.908

3. Capturing chromosome conformation.

Authors: Job Dekker; Karsten Rippe; Martijn Dekker; Nancy Kleckner
Journal: Science Date: 2002-02-15 Impact factor: 47.728

4. Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease.

Authors: Cem Kuscu; Sevki Arslan; Ritambhara Singh; Jeremy Thorpe; Mazhar Adli
Journal: Nat Biotechnol Date: 2014-05-18 Impact factor: 54.908

5. Multiplex chromatin interactions with single-molecule precision.

Authors: Meizhen Zheng; Simon Zhongyuan Tian; Daniel Capurso; Minji Kim; Rahul Maurya; Byoungkoo Lee; Emaly Piecuch; Liang Gong; Jacqueline Jufen Zhu; Zhihui Li; Chee Hong Wong; Chew Yee Ngan; Ping Wang; Xiaoan Ruan; Chia-Lin Wei; Yijun Ruan
Journal: Nature Date: 2019-02-18 Impact factor: 49.962

6. Extensive Heterogeneity and Intrinsic Variation in Spatial Genome Organization.

Authors: Elizabeth H Finn; Gianluca Pegoraro; Hugo B Brandão; Anne-Laure Valton; Marlies E Oomen; Job Dekker; Leonid Mirny; Tom Misteli
Journal: Cell Date: 2019-02-21 Impact factor: 41.582

Review 7. The Self-Organizing Genome: Principles of Genome Architecture and Function.

Authors: Tom Misteli
Journal: Cell Date: 2020-09-24 Impact factor: 41.582

8. Protospacer adjacent motif (PAM)-distal sequences engage CRISPR Cas9 DNA target cleavage.

Authors: Regina Cencic; Hisashi Miura; Abba Malina; Francis Robert; Sylvain Ethier; T Martin Schmeing; Josée Dostie; Jerry Pelletier
Journal: PLoS One Date: 2014-10-02 Impact factor: 3.240

9. Complex multi-enhancer contacts captured by genome architecture mapping.

Authors: Robert A Beagrie; Antonio Scialdone; Markus Schueler; Dorothee C A Kraemer; Mita Chotalia; Sheila Q Xie; Mariano Barbieri; Inês de Santiago; Liron-Mark Lavitas; Miguel R Branco; James Fraser; Josée Dostie; Laurence Game; Niall Dillon; Paul A W Edwards; Mario Nicodemi; Ana Pombo
Journal: Nature Date: 2017-03-08 Impact factor: 49.962

10. Model-based analysis of ChIP-Seq (MACS).

Authors: Yong Zhang; Tao Liu; Clifford A Meyer; Jérôme Eeckhoute; David S Johnson; Bradley E Bernstein; Chad Nusbaum; Richard M Myers; Myles Brown; Wei Li; X Shirley Liu
Journal: Genome Biol Date: 2008-09-17 Impact factor: 13.583