David C Klein, Sarah J Hainer.
Abstract
Recent advancements in next-generation sequencing technologies and accompanying reductions in cost have led to an explosion of techniques to examine DNA accessibility and protein localization on chromatin genome-wide. Generally, accessible regions of chromatin are permissive for factor binding and are therefore hotspots for regulation of gene expression; conversely, genomic regions that are highly occupied by histone proteins are not permissive for factor binding and are less likely to be active regulatory regions. Identifying regions of differential accessibility can help uncover putative gene regulatory regions, such as enhancers, promoters, and insulators. In addition, DNA-binding proteins, such as transcription factors that preferentially bind certain DNA sequences and histone proteins that form the core of the nucleosome, play essential roles in all DNA-templated processes. Determining the genomic localization of chromatin-bound proteins is therefore essential for defining their functional roles, the sequence motifs important for factor binding, and the regulatory networks controlling gene expression. In this review, we discuss techniques for determining DNA accessibility and nucleosome positioning (DNase-seq, FAIRE-seq, MNase-seq, and ATAC-seq) and techniques for detecting and functionally characterizing chromatin-bound proteins (ChIP-seq, DamID, and CUT&RUN). These methods have been optimized to varying degrees of resolution, specificity, and ease of use. Here, we outline some advantages and disadvantages of these techniques and their general protocols, and briefly discuss their development. Together, these complementary approaches have provided an unparalleled view of chromatin architecture and functional gene regulation.
Keywords: ATAC; CUT&RUN; ChIP; Chromatin; DNase; MNase; genomics; nucleosome occupancy; transcription factors
Year: 2019 PMID: 31776829 PMCID: PMC7125251 DOI: 10.1007/s10577-019-09619-9
Source DB: PubMed Journal: Chromosome Res ISSN: 0967-3849 Impact factor: 5.239
Fig. 1 Methods for mapping genome accessibility. A DNase-seq identifies open regions of chromatin. It relies upon preferential DNase I digestion of chromatin regions that are unprotected by bound proteins; these accessible regions are known as DNase I hypersensitive sites (DHSs). B FAIRE-seq depends on formaldehyde crosslinking of chromatin-interacting proteins to DNA. Chromatin is then sheared, and regions that are unbound by proteins (e.g., histones) remain in the aqueous layer of a phenol-chloroform extraction, while crosslinked DNA remains in the organic layer. C MNase-seq profiles nucleosome occupancy and positioning. After formaldehyde crosslinking, added MNase digests DNA that is unprotected by bound proteins, so increased accessibility can be inferred from decreased representation in the sequencing library. D ATAC-seq relies on the hyperactive Tn5 transposase to insert sequencing adapters at accessible regions of the genome. Following transposition, genomic DNA can be isolated and amplified by PCR, then subjected to deep sequencing. Figure created with Biorender.com
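Panel C infers accessibility from reduced representation of DNA in the sequencing library. A minimal sketch of that inference, assuming protected fragments have already been aligned and reduced to half-open (start, end) coordinates within a single region, is to count per-base fragment coverage and report low-coverage stretches. The function names, the coordinate convention, and the `max_depth` threshold are hypothetical choices for illustration, not the review's pipeline.

```python
from typing import List, Tuple


def occupancy_profile(fragments: List[Tuple[int, int]], region_length: int) -> List[int]:
    """Per-base count of sequenced (MNase-protected) fragments covering a region.

    Fragments are half-open (start, end) coordinates relative to the region start;
    parts falling outside the region are clipped.
    """
    diff = [0] * (region_length + 1)  # difference array: +1 at a start, -1 at an end
    for start, end in fragments:
        start, end = max(start, 0), min(end, region_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    profile, running = [], 0
    for i in range(region_length):
        running += diff[i]
        profile.append(running)
    return profile


def depleted_intervals(profile: List[int], max_depth: int = 0) -> List[Tuple[int, int]]:
    """Half-open intervals where coverage is at or below max_depth (candidate accessible DNA)."""
    intervals, start = [], None
    for i, depth in enumerate(profile):
        if depth <= max_depth and start is None:
            start = i
        elif depth > max_depth and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(profile)))
    return intervals


if __name__ == "__main__":
    # Two ~147 bp nucleosome-protected fragments in a hypothetical 400 bp region.
    profile = occupancy_profile([(0, 147), (200, 347)], region_length=400)
    print(depleted_intervals(profile))  # [(147, 200), (347, 400)]
```

In practice, coverage would usually be normalized for sequencing depth before any threshold is applied; the fixed `max_depth` here only illustrates the direction of the inference.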
Fig. 2 Methods for profiling protein localization on chromatin. A DamID exploits the E. coli DNA adenine methyltransferase (Dam) by fusing it to a factor of interest and transfecting a plasmid encoding the fusion into cells. The fusion protein methylates adenines in GATC sequences located near factor binding sites. Genomic DNA can then be isolated and digested with DpnI, which cleaves only methylated GATC (GmATC). A portion of the digested DNA is then digested with DpnII, which cleaves unmethylated GATC, to identify potential methylated sites outside Dam's range. Side-by-side libraries are built and subjected to deep sequencing. B ChIP-seq is an antibody-based technology that begins with crosslinking of factors to DNA, followed by chromatin shearing and antibody pulldown of the factor of interest on either magnetic or agarose beads. Crosslinks are then reversed, and DNA is isolated for deep sequencing. C CUT&RUN uses a recombinant Protein A-MNase (pA-MNase) fusion that binds to a primary antibody recognizing the factor of interest and specifically cleaves DNA at factor binding sites, creating small fragments that can be isolated from nuclei and used as a template for library construction and deep sequencing. CUT&RUN offers near-base pair resolution and can be carried out under native (i.e., non-crosslinking) conditions due to its high sequencing signal-to-noise ratio. Figure created with Biorender.com
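The complementary enzyme specificities in panel A (DpnI cuts methylated GATC, DpnII cuts unmethylated GATC) can be illustrated on a toy sequence. This sketch assumes a deliberately simple, hypothetical reach model in which Dam methylates every GATC within a fixed distance of a bound fusion protein; the real Dam range of action is not a sharp cutoff, and the function names and the `reach_bp` parameter are illustrative only.

```python
import re
from typing import Dict, List, Set


def gatc_sites(sequence: str) -> List[int]:
    """0-based start positions of GATC motifs (GATC is palindromic, so one strand suffices)."""
    return [m.start() for m in re.finditer("GATC", sequence.upper())]


def methylated_within_reach(sites: List[int], bound_positions: List[int], reach_bp: int) -> Set[int]:
    """Hypothetical model: Dam methylates every GATC within reach_bp of a bound fusion protein."""
    return {s for s in sites if any(abs(s - b) <= reach_bp for b in bound_positions)}


def predicted_cuts(sequence: str, methylated: Set[int]) -> Dict[str, List[int]]:
    """Partition GATC sites by which enzyme would cut them.

    DpnI cleaves only adenine-methylated GATC (GmATC); DpnII cleaves only unmethylated GATC.
    """
    sites = gatc_sites(sequence)
    return {
        "DpnI": [s for s in sites if s in methylated],
        "DpnII": [s for s in sites if s not in methylated],
    }


if __name__ == "__main__":
    seq = "AAGATCTTTTGATCAAAAGATC"  # toy sequence with GATC at positions 2, 10 and 18
    sites = gatc_sites(seq)
    meth = methylated_within_reach(sites, bound_positions=[0, 20], reach_bp=5)
    print(predicted_cuts(seq, meth))  # DpnI cuts 2 and 18; DpnII cuts 10
```

The sketch also makes one listed limitation concrete: a binding event with no GATC nearby produces no DpnI-cleavable site at all.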
Fig. 3 A general bioinformatic pipeline for analyzing genome-wide accessibility or profiling datasets. Although analyses vary by technique in order to minimize technique-specific biases, we present a general pipeline for analyzing NGS-generated datasets. Following quality control (Andrews 2010), all sequencing experiments involve mapping reads to the genome of interest, generating files containing the sequence, alignment information, and quality information, known as .sam files (or, when compressed, .bam files; Langmead et al. 2009; Langmead and Salzberg 2012; Li and Durbin 2009). These aligned files are filtered and used in downstream analyses; for studying nucleosome and factor occupancy and positioning, fragments are divided into size classes that separate inaccessible regions according to the factors blocking their availability (e.g., shorter fragments protected by transcription factors versus nucleosome-sized fragments) (Li, Handsaker et al. 2009; Schep et al. 2015). From the size-divided accessibility .bam files and the quality-filtered localization .bam files, peaks can be called against local background and/or by comparison with an input file (Heinz et al. 2010; Meers et al. 2019; Zhang et al. 2008). From factor peaks, motifs can be called to determine which factors most likely bind these locations. Genomic data are typically viewed as either heatmaps or metaplots (Heinz et al. 2010; Ramírez et al. 2016). Figure created with Biorender.com
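Two of the steps in Fig. 3, dividing fragments into size classes and calling enriched regions against local background, can be sketched in a few lines. This is a simplified illustration, not MACS, HOMER, or SEACR: the size thresholds (≤ 120 bp subnucleosomal, 130–200 bp nucleosomal), the window size, and the significance cutoff are assumptions chosen for the example, and the background test only borrows the general idea of comparing each bin with the larger of a genome-wide and a local Poisson rate.

```python
import math
from typing import Dict, List

# Illustrative thresholds (assumptions, not values from the review).
SUB_MAX_BP = 120
NUC_MIN_BP, NUC_MAX_BP = 130, 200


def size_class(length_bp: int) -> str:
    """Assign a fragment length to an illustrative size class."""
    if length_bp <= SUB_MAX_BP:
        return "subnucleosomal"  # short fragments, e.g. protected by a bound factor
    if NUC_MIN_BP <= length_bp <= NUC_MAX_BP:
        return "nucleosomal"     # roughly one nucleosome-protected fragment
    return "other"               # gap between classes and longer fragments


def split_by_size(lengths_bp: List[int]) -> Dict[str, List[int]]:
    """Group fragment lengths by size class, mimicking size-divided .bam files."""
    classes: Dict[str, List[int]] = {"subnucleosomal": [], "nucleosomal": [], "other": []}
    for length in lengths_bp:
        classes[size_class(length)].append(length)
    return classes


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly (adequate for small rates)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_enriched_bins(counts: List[int], window: int = 5, alpha: float = 1e-3) -> List[int]:
    """Return indices of bins whose read count is improbably high versus background.

    The expected rate is the larger of the genome-wide mean and the mean of the
    neighbouring bins within `window`, so locally high background raises the bar.
    """
    if not counts:
        return []
    genome_rate = sum(counts) / len(counts)
    enriched = []
    for i, k in enumerate(counts):
        lo, hi = max(0, i - window), min(len(counts), i + window + 1)
        neighbours = counts[lo:i] + counts[i + 1:hi]
        local_rate = sum(neighbours) / len(neighbours) if neighbours else 0.0
        lam = max(genome_rate, local_rate, 1e-9)  # avoid a zero rate
        if poisson_sf(k, lam) < alpha:
            enriched.append(i)
    return enriched
```

Taking the maximum of the two rates is a conservative choice: a bin must stand out both from the genome as a whole and from its immediate neighbourhood before it is reported.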
Considerations when choosing a genome accessibility or profiling technique. Although many of the techniques described in this review have been optimized for single-cell input, typical cellular input tends to be much higher. A few advantages and disadvantages for each technique are listed, as well as references to papers that have been highly influential in the method's development and refinement.
| Technique | Typical cell input | Minimal cell input | Approximate sequencing depth for a mammalian genome | Genomic target | Advantages | Disadvantages | References |
|---|---|---|---|---|---|---|---|
| DNase-seq | ≥ 1 M cells | 1 cell | 20–50 M reads | Open chromatin | DHSs are the gold standard for identification of regulatory regions | High cell input typically required | (Cooper et al. |
| FAIRE-seq | ≥ 100,000 cells | 100,000 cells | 20–50 M reads | Nucleosome occupancy | Fast and easy protocol | Low signal-to-noise ratio; highly dependent on correct crosslinking efficiency | (Giresi et al. |
| MNase-seq | ≥ 1 M cells | 1 cell | 40–60 M reads | Nucleosome and TF occupancy and positioning | TF and nucleosome binding information | Indirect detection of active regulatory regions; high cell input typically required | (Lai et al. |
| ATAC-seq | ≥ 50,000 cells | 1 cell | 40–60 M reads | Open chromatin | Fast protocol; native conditions | Requires high sequencing coverage to accurately map factors; high prevalence of mitochondrial read contaminants | (Buenrostro et al., |
| ChIP-seq | ≥ 500,000 cells | 100–10,000 cells | 20–40 M reads | Protein localization | Most common profiling technique; numerous protocols and comparative datasets available | Mapping resolution limited by chromatin shearing efficiency; limited by quality of antibody | (Albert et al. |
| DamID | ≥ 10,000 cells | 1 cell | 10–40 M reads | TF localization; 3D genome contacts | Not antibody-dependent | Dependent on GATC presence; does not profile endogenous protein; low base-pair resolution because of the extensive Dam range of action | (Kind et al. |
| CUT&RUN | ≥ 100,000 cells | 1 cell | 10 M reads | Protein localization | High signal-to-noise ratio; low cellular input necessary; native conditions | Limited by quality of antibody | (Hainer et al. |
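The sequencing requirements in the table are given as read counts rather than fold coverage. Mean coverage is the number of sequenced bases divided by genome size (the Lander-Waterman expression C = NL/G), so the two are easy to convert. The sketch below does this; the 50 bp read length and the roughly 3.1 Gb human genome size used in the example are assumptions, since the table specifies neither.

```python
def fold_coverage(n_reads: float, read_length_bp: float, genome_size_bp: float) -> float:
    """Mean fold coverage C = N * L / G (each sequenced read, or each mate of a pair, counts once)."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length_bp / genome_size_bp


def reads_for_coverage(target_coverage: float, read_length_bp: float, genome_size_bp: float) -> float:
    """Invert C = N * L / G to get the number of reads N needed for a target mean coverage."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return target_coverage * genome_size_bp / read_length_bp


if __name__ == "__main__":
    # Assumed values: 50 bp reads and a ~3.1 Gb human genome.
    for reads in (10e6, 20e6, 40e6, 60e6):
        print(f"{reads / 1e6:.0f} M reads -> {fold_coverage(reads, 50, 3.1e9):.2f}x mean coverage")
```

Under these assumptions, even the deepest recommendation in the table (60 M reads) corresponds to slightly less than 1x mean coverage of the whole genome.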