Lijing Yao1, Benjamin P Berman2, Peggy J Farnham1.
Abstract
Enhancers are short regulatory sequences bound by sequence-specific transcription factors and play a major role in the spatiotemporal specificity of gene expression patterns in development and disease. While it is now possible to identify enhancer regions genomewide in both cultured cells and primary tissues using epigenomic approaches, it has been more challenging to develop methods to understand the function of individual enhancers because enhancers are located far from the gene(s) that they regulate. However, it is essential to identify target genes of enhancers not only so that we can understand the role of enhancers in disease but also because this information will assist in the development of future therapeutic options. After reviewing models of enhancer function, we discuss recent methods for identifying target genes of enhancers. First, we describe chromatin structure-based approaches for directly mapping interactions between enhancers and promoters. Second, we describe the use of correlation-based approaches to link enhancer state with the activity of nearby promoters and/or gene expression. Third, we describe how to test the function of specific enhancers experimentally by perturbing enhancer-target relationships using high-throughput reporter assays and genome editing. Finally, we conclude by discussing as yet unanswered questions concerning how enhancers function, how target genes can be identified, and how to distinguish direct from indirect changes in gene expression mediated by individual enhancers.
Keywords: Chromatin interactions; computational tools; enhancers; gene expression; genome editing
Year: 2015 PMID: 26446758 PMCID: PMC4666684 DOI: 10.3109/10409238.2015.1087961
Source DB: PubMed Journal: Crit Rev Biochem Mol Biol ISSN: 1040-9238 Impact factor: 8.250
Figure 1. Enhancer-mediated gene regulation. (A) Shown are two models for gene regulation by enhancers. The left panel illustrates the “scanning or tracking” model in which a transcription factor (TF)-containing protein complex binds at an enhancer and moves along the genome, searching for a target promoter (the nearest promoters are labeled in brown and distal promoters are labeled in red). The right panel illustrates the “looping” model in which an enhancer directly interacts with a target promoter by forming a DNA loop mediated by protein–protein contacts. (B) Shown is an illustration of the distinctive chromatin signatures at active versus inactive enhancers and promoters. Active enhancers provide nucleosome-free regions for the binding of clusters of TFs and are flanked by nucleosomes marked by H3K4me1 (cyan dots) and H3K27ac (green dots); active promoters have flanking nucleosomes marked by H3K4me3 (blue dots). CpG sites throughout the human genome have high levels of DNA methylation (red dots) except at active enhancers and promoters. (C) DNA methylation (WGBS), ENCODE ChIP-seq data (labeled according to the antibody used in each experiment), and the location of DHSs and TF binding data for HCT116 cells from the University of California, Santa Cruz genome browser are shown for an enhancer and a promoter region. (See the color version of this figure at www.informahealthcare.com/bmg).
Figure 2. 3C-based technologies used to identify enhancer–promoter loops. All 3C-based technologies begin with formaldehyde treatment, leading to crosslinking of DNA fragments in close proximity. The 3C, 4C, and 5C methods begin with restriction enzyme (RE) digestion of the chromatin into small pieces (digestion sites represented by black bars). Crosslinked fragments are ligated to form unique hybrid DNA molecules, and then the DNA is purified. In 3C, a predicted ligation product can be analyzed by PCR using a pair of primers; this is termed a one-to-one approach. In 4C, the 3C ligation library is digested with a second RE to reduce the DNA to smaller sizes (second digestion sites are labeled as green ovals), and then the fragments are ligated to form a circle. Inverse PCR is utilized to generate a genomewide interaction profile for a single locus (analyzed by high-throughput sequencing); this is termed a one-to-all approach. 5C detects ligation products from a 3C library using ligation-mediated amplification (LMA) followed by high-throughput sequencing; this is termed a many-to-many approach. Starting from 3C fragmentation products, Hi-C includes a unique step in which sticky ends resulting from the RE digestion are filled in with biotinylated nucleotides (shown as red dots). This facilitates a streptavidin-based enrichment of the ligation products for sequencing. The difference between TCC and Hi-C is that TCC adds an initial protein biotinylation and tethering step, such that the fragmentation and ligation are performed on a solid substrate; TCC and Hi-C are termed all-to-all approaches. Specific subsets of TCC and Hi-C products can be selected prior to sequencing using oligonucleotides or arrays in CHi-C and Capture-C, allowing an all-to-all analysis of selected genomic regions. DNase Hi-C uses the conventional Hi-C protocol but replaces the RE fragmentation step with DNase I digestion and thus is an all-to-all approach.
ChIA-PET, which is quite different from the other 3C-based methods, begins with sonication of the chromatin, which is followed by a conventional chromatin immunoprecipitation step. Then, A (purple) and B (orange) linkers are added to two groups of materials that are mixed together for the ligation step, the ligation products are digested with MmeI, and the DNA is sequenced. The frequency of random ligations between the two different linkers (AB) is used to estimate the frequency of nonspecific ligation. ChIA-PET is termed an all-to-all approach for interactions involving a specific protein. (see the color version of this figure at www.informahealthcare.com/bmg).
Summary of available software for chromatin interaction data.
| Software | Data | Program | Input | Analysis | Software website | PMID |
|---|---|---|---|---|---|---|
| Basic4Cseq | 4C | R/Bioconductor package | BAM | 1. Create RE fragment library. 2. Filter 4C-seq data and map reads to fragments. 3. Visualization. 4. Quality control | | 25078398 |
| fourSig | 4C | Perl and R | SAM | 1. Filter 4C-seq data and map reads to fragments. 2. Determination of significant enrichment by permutations. 3. Visualization | | 24561615 |
| r3Cseq | 4C | R package | BAM | 1. Filter 4C-seq data and map reads to fragments. 2. Data normalization. 3. Identify significant interactions. 4. Visualization | | 23671339 |
| 3DG | 5C | Web-based | Primer sets and contact matrix | 1. Design 5C primer sets. 2. Visualization | | 19789528 |
| HiTC | 5C, Hi-C | R/Bioconductor package | Interaction count matrix in CSV | 1. Quality control. 2. Visualization. 3. Interaction map transformation and normalization | | 22923296 |
| HiFive | 5C, Hi-C | Python | BAM | 1. Filter 4C-seq data and map reads to fragments. 2. Data normalization. 3. Identify interaction enrichment and boundary index. 4. 3D modeling | | |
| ChIA-PET tool | ChIA-PET | Java software | FASTQ | 1. Linker filtering. 2. Read mapping. 3. Identify interactions and call ChIP-seq peaks | | 20181287 |
| Chiasig | ChIA-PET | Perl and R | Contacts in BEDPE format | 1. Identify significant interactions using a model based on the noncentral hypergeometric distribution | | 25114054 |
| Mango | ChIA-PET | R package | FASTQ | 1. Linker filtering. 2. PET mapping to reference genome. 3. Identify interactions and call ChIP-seq peaks | | NA |
| HiC-inspector | Hi-C, TCC | Perl and R | FASTQ | 1. Alignment. 2. Filter reads that fall within the DNA fragment size window around restriction enzyme sites. 3. Count interactions. 4. Visualization heatmap | | NA |
| HiC-Pro | Hi-C, TCC | C++, Python, R, and Bash | FASTQ | 1. Alignment. 2. Count interactions. 3. Visualization | | NA |
| ICE | Hi-C, TCC | Python | FASTQ | 1. Mapping against reference genome. 2. Use iterative correction to generate corrected Hi-C interaction map | | 22941365 |
| HIPPIE | Hi-C, TCC | Perl and Bash | FASTQ | 1. Mapping against reference genome. 2. Quality control. 3. Identify Hi-C peaks. 4. Integrate epigenomic data to predict enhancer–gene linkages | | 25480377 |
| HiCUP | Hi-C, TCC | Perl and R | FASTQ | 1. Mapping against reference genome. 2. Filter experimental artifacts. 3. Quality control. 4. Generate BAM/SAM file for postanalysis by other software | | NA |
| Homer | Hi-C, TCC | Perl | SAM, BED and so on | 1. Normalization of interaction matrices. 2. Identify significant interactions. 3. Subnuclear compartment analysis. 4. Structure interaction matrix analysis. 5. Visualization | | 20513432 |
| HiCat | Hi-C, TCC | C and R | BAM | 1. Interaction analysis. 2. Integrate multiple epigenetic information for interaction annotation. 3. Comparison analysis for Hi-C data | | 25132176 |
| HiCNorm | Hi-C, TCC | R package | Raw Hi-C cis contact map and the local genomic | Normalization of contacts | | 23023982 |
| HiCorrector | Hi-C, TCC | C | Raw contact matrix | Normalization of contacts | | 25391400 |
| hicpipe | Hi-C, TCC | Perl and R | Raw Hi-C contacts file | Estimate Hi-C biases and normalize interactions | | 22001755 |
| MDM | ChIA-PET | R package | Contact file | Identify the true interactions | | 24835279 |
| HiCseq | Hi-C, TCC | R package | Contact matrix | Detect cis-interactions | | 25161224 |
| AutoChrom 3D | Hi-C | Web-based | Contact file | 1. Analyze chromatin interactions using a sequencing-bias-relaxed structure parameter to normalize chromatin interactions. 2. 3D modeling | | 23965308 |
| InfMod3DGen | Hi-C, TCC | MATLAB | Contact file | 3D modeling | | 25690896 |
| ChromSDE | Hi-C, TCC | MATLAB | Output files from Hi-C pipeline of Tanay's Group | 3D modeling | | 24195706 |
| PASTIS | Hi-C, TCC | Python | Contact file | 3D modeling | | 24931992 |
| LACHESIS | Hi-C, TCC | C, Perl, and R | FASTQ | | | 24185095 |
| FisHiCal | Hi-C, TCC | R package | Normalized Hi-C interactions and FISH data | Integrate Hi-C interaction data and FISH to reconstruct nuclear 3D structure | | 25061071 |
| NuChart | Hi-C, TCC | R package | Contact file | 1. Integrate Hi-C and other genomic features to annotate and analyze a list of input genes. 2. Visualization | | 24069388 |
| CytoHiC | Hi-C, TCC | Cytoscape plugin | Normalized Hi-C interactions | Interaction network analysis | | 23508968 |
| Juicebox | 5C, Hi-C, ChIA-PET | Java software | | Visualization of published Hi-C data | | 25497547 |
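Several of the tools listed above normalize a raw contact matrix before interactions are called. As a rough illustration of the iterative-correction idea (matrix balancing), the sketch below repeatedly divides each entry by the relative coverage of its row and column until every row has the same total. It is a minimal stand-alone sketch, not the code of any listed package; the function name `iterative_correction`, its parameters, and the toy matrix are assumptions made for illustration.

```python
def iterative_correction(matrix, max_iter=200, tol=1e-10):
    """Balance a symmetric contact matrix so that all covered rows have equal sums.

    Returns the corrected matrix and the per-bin bias factors, such that
    raw[i][j] is approximately corrected[i][j] * bias[i] * bias[j].
    """
    corrected = [[float(value) for value in row] for row in matrix]
    n = len(corrected)
    bias = [1.0] * n
    for _ in range(max_iter):
        row_sums = [sum(row) for row in corrected]
        covered = [s for s in row_sums if s > 0]
        if not covered:
            break
        mean_sum = sum(covered) / len(covered)
        # Relative coverage of each bin; empty bins are left untouched.
        factors = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                corrected[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return corrected, bias


# Hypothetical 3-bin contact matrix (symmetric raw counts).
raw = [[10, 4, 2],
       [4, 20, 6],
       [2, 6, 5]]
corrected, bias = iterative_correction(raw)
row_sums = [sum(row) for row in corrected]
```

After correction, `row_sums` should be nearly identical across bins, and `bias` holds the estimated per-bin visibility that was divided out.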
Figure 3. Chromatin 3D structures. Shown is a two-dimensional heatmap of Hi-C interaction frequencies in IMR90 cells from a 5 Mb region of Chr2 generated using the website http://www.3dgenome.org; the color key represents the interaction counts between two loci. Highlighted in gray is a repressed compartment and highlighted in orange is an active compartment. Also shown are ChIP-seq data for CTCF and histone modifications, as well as a wavelet-smoothed Repli-seq track representing DNA replication timing; all datasets were taken from the University of California, Santa Cruz genome browser. For each compartment, a model of chromatin interactions (which are more frequent within a TAD than between TADs) facilitated by CTCF, cohesin, and Mediator is shown. Long-distance constitutive interactions require a pair of CTCF sites with convergently orientated motifs as anchors; any combination of CTCF, cohesin, and Mediator can facilitate medium-distance interactions. Many other CTCF-binding sites (green bars) are not involved in chromatin interactions and occur within loops. (see the color version of this figure at www.informahealthcare.com/bmg).
Figure 4. Computational methods to link enhancers to putative target genes. (A) The eQTL method uses the association between genotypes of a SNP within an enhancer and gene expression levels across multiple individuals to predict target genes. (B) A correlation between dynamic enhancer activity and gene expression across multiple cell lines or tissues can be used to predict enhancer–gene linkages. Levels of H3K4me1, H3K27ac, and DHS show positive correlations with enhancer activities; each color represents data from an individual cell line or tissue. (C) Similar to panel B, DNA methylation data can also be used to predict target genes; in this case, one expects a negative correlation between DNA methylation at enhancers and gene expression. (D) Integrating multiple layers of information can help to predict a target gene; e.g. the gene with a higher score for phylogenetic proximity and GO terms similar to those of the TF (indicating that the TF and target gene are in the same pathway) is predicted to be the putative target gene. (see the color version of this figure at www.informahealthcare.com/bmg).
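The correlation-based strategy in panels B and C can be sketched in a few lines: rank-correlate an enhancer's signal with the expression of each nearby gene across samples and keep the genes that pass a cutoff. The example below is a minimal sketch using only the standard library; the gene names, signal values, and the `min_abs_rho` cutoff are hypothetical, and published methods add distance windows, permutation tests, and multiple-testing correction that are omitted here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def link_enhancer(enhancer_signal, expression_by_gene, min_abs_rho=0.8):
    """Return (gene, rho) pairs whose |rho| passes the cutoff, strongest first."""
    links = []
    for gene, expression in expression_by_gene.items():
        rho = spearman(enhancer_signal, expression)
        if abs(rho) >= min_abs_rho:
            links.append((gene, rho))
    return sorted(links, key=lambda pair: -abs(pair[1]))


# Hypothetical H3K27ac signal at one enhancer across five cell lines.
h3k27ac = [1.0, 2.0, 3.5, 5.0, 8.0]
expression = {
    "GENE_A": [10, 20, 30, 45, 90],  # rises with the enhancer signal
    "GENE_B": [50, 12, 40, 8, 30],   # no consistent trend
}
links = link_enhancer(h3k27ac, expression)

# Panel C: methylation at the same enhancer is expected to anticorrelate.
methylation = [0.9, 0.8, 0.5, 0.3, 0.1]
rho_methylation = spearman(methylation, expression["GENE_A"])
```

With these toy values only `GENE_A` passes the cutoff, and its correlation with enhancer methylation is negative, matching the expectation stated for panel C.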
Available databases for eQTL.
| Data source | Cell type | Website | PMID |
|---|---|---|---|
| Genevar (GENe Expression VARiation) | Adipose, LCL, skin, fibroblast, and T cell | | 20702402 |
| GTEx Portal | Blood, esophagus mucosa, esophagus muscularis, heart, lung, muscle skeletal, nerve tibial, skin, stomach, thyroid, adipose, and artery | | 25954001 |
| MuTHER | Adipose, LCL, skin | | 21304890 |
| UKBEC | Brain | | 25174004 |
Summary of computational methods.
| Input data | Model | Distance range to enhancer | Statistics | Tissue type | List of enhancer–gene linkages | PMID |
|---|---|---|---|---|---|---|
| H3K27ac, H3K4me1, RNA-seq | Gene expression versus H3K27ac and H3K4me1 | 5 kb–125 kb around TSS | Machine learning using logistic regression classifier | 9 human cell lines | NA | 21441907 |
| H3K4me1, RNAPII | RNAPII versus H3K4me1 | NA | Spearman correlation | 19 mouse tissues | Supplementary Table 7 in PMID:22763441 | 22763441 |
| H3K4me1 and RNA-seq | Gene expression versus H3K4me1 (PreSTIGE) | Within 100 kb of TSS and CTCF boundary | Shannon entropy | 12 human cell lines | | 24196873 |
| H3K4me1 and RNA-seq | Gene expression versus H3K4me1 (PreSTIGE) | Within 100 kb of TSS and CTCF boundary | Shannon entropy | 13 mouse tissues | | 24905156 |
| DNase I | DHS at promoters versus enhancer | Within 500 kb around TSS | Spearman correlation | 79 human cell lines | penchrom/jan2011/dhs_gene_connectivity/ | 22955617 |
| DNase I and RNA-seq | Gene expression versus DHS | Within 100 kb of TSS | Pearson correlation | 720 human cell lines | | 23482648 |
| DNA methylation and RNA-seq | Gene expression versus DNA methylation | Within 1 Mb around TSS | Machine learning using SVM-MAP | 58 human cell lines | NA | 23497655 |
| DNA methylation and RNA-seq | Gene expression versus DNA methylation (ELMER) | Upstream 10 genes and downstream 10 genes | Mann–Whitney | ∼2000 human primary tumor samples from TCGA | Supplementary Table 4 in PMID:25994056 | 25994056 |
| CAGE | Gene expression versus RNA level | Within 500 kb around TSS | Pearson correlation | ∼400 human cell lines | | 24670763 |
| H3K4me1, H3K27ac, H3Kme3, RNA-seq | Multilayer genetic and epigenetic information | Within 2 Mb of TSS | Machine learning using random forest classifier | 12 human cell lines | | 24821768 |
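Two rows above list Shannon entropy as the statistic. One common way to turn entropy into a cell-type specificity score is sketched here: normalize a signal across samples to relative frequencies, compute the entropy H of that profile, and score each sample as Q = H - log2(p), so that a low Q marks a signal concentrated in that sample. This is a generic illustration; the table does not state the exact scoring used by the cited tools, so treating this Q score as their method is an assumption, and the function names are hypothetical.

```python
import math


def shannon_entropy(signal):
    """Entropy (bits) of a non-negative signal after normalizing it to sum to 1."""
    total = sum(signal)
    probabilities = [x / total for x in signal]
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def specificity_scores(signal):
    """Per-sample score Q = H - log2(p); lower means more sample-specific.

    Samples with zero signal get an infinite score (never specific).
    """
    total = sum(signal)
    entropy = shannon_entropy(signal)
    scores = []
    for x in signal:
        p = x / total
        scores.append(entropy - math.log2(p) if p > 0 else math.inf)
    return scores


# Hypothetical H3K4me1 signal at one enhancer across four cell lines.
uniform = specificity_scores([1, 1, 1, 1])    # equally active everywhere
specific = specificity_scores([8, 0, 0, 0])   # active in one cell line only
```

The uniform profile yields the same, relatively high score in every sample, whereas the specific profile yields the minimum score in the single active sample; the same score applied to gene expression would let enhancer and gene specificity be compared cell line by cell line.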
Figure 5. Experimental strategies to study enhancer activity. (A) In transient transfection assays, the enhancer (orange) is placed upstream of a reporter gene (green) driven by a heterologous promoter (brown) in a plasmid backbone, and then the plasmid is transiently transfected into cells. The activity of the enhancer is monitored by the level of reporter RNA or protein. (B) In a transgenic assay, a plasmid containing the enhancer and reporter gene is microinjected into a mouse egg and then integrated into the mouse genome. Enhancer activity is monitored in the embryo using LacZ staining. (C) High-throughput enhancer assays can be used to test enhancer activity. In STARR-seq, potential regulatory elements are inserted between an ORF and a polyA tail and plasmids are transfected into cells; elements that can be detected in the RNA-seq data are functional enhancers. In the massively parallel reporter assay (MPRA), sequence synthesis technology is used to link each potential regulatory element to a unique tag sequence. Then, an ORF is inserted between the element and tag sequence to form plasmids that are transfected into cells. After performing RNA-seq, the enrichment ratio between tag counts in the starting library and in the RNA-seq data is used to identify functional enhancers. In the enhancer-FACS-seq (eFS) method, a pool of potential regulatory elements is cloned upstream of the GFP reporter gene. The plasmids are injected into fly embryos and GFP fly lines are created, which are crossed with a fly line that expresses CD2 under control of the tissue-specific enhancer Twi. Embryos from the cross are dissociated and fluorescence-activated cell sorting (FACS) is used to select two groups of cells: CD2+GFP+ and CD2+GFP−/+ (input). Through sequence enrichment analysis between the two groups, the elements that are functional enhancers can be identified. (see the color version of this figure at www.informahealthcare.com/bmg).
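The MPRA readout described in panel C reduces to a per-tag enrichment ratio between RNA-seq counts and counts in the starting plasmid library. The sketch below shows one simple way to compute it, as a depth-normalized log2 ratio with a pseudocount. The tag names, counts, pseudocount, and activity cutoff are all hypothetical, and real analyses typically combine several tags per element and model replicate variance.

```python
import math


def log2_enrichment(dna_counts, rna_counts, pseudocount=1.0):
    """Per-tag log2 ratio of RNA abundance to plasmid (DNA) abundance.

    Each count is divided by its library's total so that differences in
    sequencing depth cancel; the pseudocount avoids division by zero.
    """
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())
    scores = {}
    for tag, dna in dna_counts.items():
        rna = rna_counts.get(tag, 0)
        rna_fraction = (rna + pseudocount) / rna_total
        dna_fraction = (dna + pseudocount) / dna_total
        scores[tag] = math.log2(rna_fraction / dna_fraction)
    return scores


# Hypothetical tag counts: equal representation in the plasmid library,
# but tagA is over-represented among reporter transcripts.
plasmid_library = {"tagA": 100, "tagB": 100, "tagC": 100}
rna_seq = {"tagA": 400, "tagB": 100, "tagC": 100}

scores = log2_enrichment(plasmid_library, rna_seq)
active = [tag for tag, score in scores.items() if score > 0.5]
```

Here only the element linked to `tagA` is called active; the 0.5 cutoff is arbitrary and would normally be replaced by a comparison against negative-control sequences.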
Figure 6. Experimental strategies to identify target genes. (A) DNA-targeting tools can consist of tandem zinc finger DNA-binding domains, each of which binds to three nucleotides of DNA. Top: fusion of the nonsequence-specific nuclease FokI to zinc finger arrays creates genomic scissors called zinc finger nucleases (ZFNs); dimerization of two ZFNs targeting a specific sequence from opposite sides is required for DNA cleavage. Bottom: effector domains can also be fused to zinc finger arrays; the zinc finger-effector proteins do not require heterodimerization to function. (B) DNA-targeting tools can consist of tandem TALE DNA-binding domains, each of which binds to one nucleotide of DNA. Top: fusion of the nonsequence-specific nuclease FokI to the DNA-binding array creates TALENs. Bottom: effector domains can be fused to TALE domains. Similar to ZFNs, two TALENs are necessary to perform a site-specific DNA cleavage, but only one TALE-effector is needed to modify the genome. (C) The CRISPR/Cas9 system utilizes guide RNAs (gRNAs) to bring a Cas9 nuclease to a complementary DNA target to perform site-specific genomic editing. Effector domains can also be fused to a nuclease-deficient Cas9 (dCas9). (D) Genomic-editing tools can be used to create a single DNA cleavage event that disrupts a TF motif. (E) Two sets of heterodimeric ZFNs or TALENs or one pair of guide RNAs can be used to create two DSBs flanking the target enhancer region. The enhancer will be deleted, and the gap will be repaired by nonhomologous end joining (NHEJ). (F) Enhancer activity can be repressed using chromatin-editing tools if an effector domain, such as a DNA methyltransferase (DNMT) that can methylate an enhancer or a histone demethylase (LSD1) that can remove methylation from H3K4me1, is fused to the zinc finger or TALE arrays or to a nuclease-deficient Cas9 (dCas9). (see the color version of this figure at www.informahealthcare.com/bmg).