Olivia Corradin, Peter C Scacheri.
Abstract
Gene enhancer elements are noncoding segments of DNA that play a central role in regulating transcriptional programs that control development, cell identity, and evolutionary processes. Recent studies have shown that noncoding single nucleotide polymorphisms (SNPs) that have been associated with risk for numerous common diseases through genome-wide association studies frequently lie in cell-type-specific enhancer elements. These enhancer variants probably influence transcriptional output, thereby offering a mechanistic basis to explain their association with risk for many common diseases. This review focuses on the identification and interpretation of disease-susceptibility variants that influence enhancer function. We discuss strategies for prioritizing the study of functional enhancer SNPs over those likely to be benign, review experimental and computational approaches to identifying the gene targets of enhancer variants, and highlight efforts to quantify the impact of enhancer variants on target transcript levels and cellular phenotypes. These studies are beginning to provide insights into the mechanistic basis of many common diseases, as well as into how we might translate this knowledge for improved disease diagnosis, prevention and treatments. Finally, we highlight five major challenges often associated with interpreting enhancer variants, and discuss recent technical advances that may help to surmount these challenges.
Year: 2014 PMID: 25473424 PMCID: PMC4254432 DOI: 10.1186/s13073-014-0085-3
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Figure 1. Model of enhancer function. Transcriptional enhancer elements are noncoding stretches of DNA that regulate gene expression levels, most often in cis. Active enhancer elements are located in open chromatin sensitive to DNase I digestion and flanked by histones marked with H3K4me1 and H3K27ac. Enhancers are often bound by a number of transcription factors (TF), such as p300 (blue). Mediator and cohesin are part of a complex (orange, green and purple) that mediates physical contacts between enhancers and their target promoters.
Figure 2. Enrichment of genome-wide association study variants in putative enhancer elements. (a) Number of disease-associated variants (identified in the National Human Genome Research Institute's genome-wide association study (GWAS) catalog) that lie in protein-coding regions (red), promoters (blue), noncoding intragenic regions (light purple) and noncoding intergenic regions (dark purple). (b) Examples of four different common diseases, showing the number of associated single nucleotide polymorphisms (SNPs) that lie in putative enhancers, promoters and exons [6-8]. Putative enhancer elements were defined by chromatin features in each of the four indicated cell types.
Computational approaches to predicting gene targets of enhancer elements
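| Method | Input data | Target gene expression | Distance constraint | Number of predictions | FDR | Species | Available resources |
|---|---|---|---|---|---|---|---|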
| Nearest gene | None | Not considered | Nearest gene | NA | ~40% to 73% | Any | NA |
| Nearest expressed gene | Gene expression | Considered | Nearest expressed gene | NA | ~53% to 77% | Any | NA |
| Ernst | H3K4me1, H3K4me2, H3K27ac, RNA-seq | Considered | Distance based (125 kb) | NA | Not determined | Human | No |
| Thurman | DNase I hypersensitivity | Not considered | Distance based (500 kb) | NA | Not determined | Human | No |
| Sheffield | DNase I hypersensitivity and RNA-seq | Considered | 100 kb | NA | Not determined | Human | Predicted interactions: [ |
| Shen | H3K4me1, H3K27ac, RNA Pol II | Not considered | Topological domain based | 5,000 to 8,000 | Not determined | Mouse | No |
| Andersson | CAGE | Considered | 500 kb | NA | Not determined | Human | No |
| PreSTIGE [ | H3K4me1 | Considered | Distance (100 kb) and CTCF based | 3,000 to 5,000 | ~13% to 23% | Human | Predicted interactions: |
| PreSTIGE mouse [ | H3K4me1 | Considered | Distance based (100 kb) | 3,000 to 5,000 | Not determined | Mouse | Predicted interactions: |
| IM-PET [ | H3K4me1, H3K27ac, H3K4me3 and RNA-seq* | Considered | Distance (2 Mb) | 7,000 to 10,000 | ~1% | Human | Method application: |
*Input data utilized in publication, other input options exist. CAGE, cap analysis of gene expression; CTCF, CCCTC-binding factor (zinc finger protein demonstrated to function as an insulator protein); FDR, false discovery rate; Mb, megabases; NA, not applicable; RNA-seq, RNA sequencing.
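The two simplest rows of the table, "Nearest gene" and "Nearest expressed gene", together with the distance windows used by several of the other methods, can be illustrated with a minimal sketch. This is not the implementation of any published method. The `Gene` record, the `expressed` flag (standing in for an expression threshold applied to RNA-seq data), and the toy coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str        # hypothetical gene label
    chrom: str
    tss: int         # transcription start site, in base pairs
    expressed: bool  # assumed to come from an expression threshold


def nearest_gene(chrom, position, genes, max_distance=None, require_expressed=False):
    """Return the gene whose TSS is closest to `position`, or None.

    max_distance: optional window in bp (for example 100_000 for 100 kb).
    require_expressed: if True, skip genes not expressed in the cell type.
    """
    candidates = [
        g for g in genes
        if g.chrom == chrom and (g.expressed or not require_expressed)
    ]
    if max_distance is not None:
        candidates = [g for g in candidates if abs(g.tss - position) <= max_distance]
    return min(candidates, key=lambda g: abs(g.tss - position), default=None)


# Toy annotation, invented for illustration only.
genes = [
    Gene("GENE_A", "chr1", 100_000, expressed=False),
    Gene("GENE_B", "chr1", 180_000, expressed=True),
    Gene("GENE_C", "chr2", 110_000, expressed=True),
]

enhancer_midpoint = 120_000
by_distance = nearest_gene("chr1", enhancer_midpoint, genes)
by_expression = nearest_gene("chr1", enhancer_midpoint, genes, require_expressed=True)
print(by_distance.name, by_expression.name)
```

In this toy case the two rules disagree: the closest gene is silent, so requiring expression moves the prediction to a more distant gene. Adding a distance window can also leave an enhancer with no predicted target at all.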
Functional enhancer studies of GWAS risk loci
| Trait | Study | GWAS SNP | Putative functional SNP | Target gene | Target identification method | Cell type or tissue | Functional evidence |
|---|---|---|---|---|---|---|---|
| Blond hair color | Guenther | rs12821256 | rs12821256 | | Phenotype in mouse model | Developing hair follicles, HaCaT keratinocyte cell line | Allele-specific luciferase activity, allele-specific ChIP, effect of SNP on mouse phenotype |
| Breast cancer | Cowper-Sal·lari | rs4784227 | rs4784227 | | 3C | MCF7 | Binding motif disruption, allele-specific ChIP, allele-specific 3C, allele-specific expression, eQTL |
| Colorectal cancer | Pomerantz | rs6983267 | rs6983267 | | 3C | Colo205 and LS174T | Allele-specific luciferase activity, allele-specific ChIP |
| Colorectal cancer | Wright | rs6983267 | rs6983267 | | 3C | DLD1 and HCT116 | Allele-specific ChIP, allele-specific expression |
| Colorectal cancer | Tuupanen | rs67491583 | rs67491583 | | Nearest gene | HeLa | Binding motif disruption, allele-specific ChIP, allele-specific luciferase activity |
| Prostate cancer | Wasserman | rs6983267 | rs6983267 | | Nearest gene | Prostate tissue (mouse) | Allele-specific activity, LacZ enhancer assay (mouse) |
| Coronary artery disease | Harismendy | rs10757278 | rs10811656/rs10757278 | | 3C and FISH ( | HUVEC | Binding motif disruption, allele-specific ChIP |
| Coronary heart disease | Miller | rs12190287 | rs12190287 | | Nearest gene, eQTL gene | HCASMC | Binding motif disruption, allele-specific luciferase activity, EMSA, allele-specific ChIP and allele-specific expression |
| Fetal hemoglobin level | Bauer | rs1427407, rs7606173 | rs1427407, rs7606173 | | 3C | Primary human erythroblasts | Binding motif disruption, allele-specific ChIP, allele-specific expression, LacZ enhancer assay (mouse), deletion by TALEN |
| Multiple sclerosis | Alcina | rs658115 | rs10877013 | | eQTL | LCLs and monocytes | Allele-specific luciferase activity, eQTL |
| Obesity | Smemo | rs9930506 | NA | | 4C-seq, 3C, ChIA-PET, Hi-C | Whole mouse embryo and adult mouse brain | eQTL mapping |
| Prostate cancer | Hazelett | rs5945619 | rs4907792 | | Nearest gene, eQTL gene | LNCaP | Allele-specific luciferase activity, eQTL |
| Prostate cancer | Hazelett | rs10486567 | rs10486567 | | Nearest gene | LNCaP | Allele-specific luciferase activity, binding motif disruption |
| QT interval | Kapoor | rs12143842 | rs7539120 | | eQTL gene, genetic association | Cardiac tissues | Allele-specific luciferase activity, eQTL, enhancer assay (zebrafish embryos) |
| Restless leg syndrome | Spieler | rs12469063 | rs13469063 | | PreSTIGE prediction method, ChIA-PET, Hi-C | Telencephalon | Allele-specific expression of reporter gene in zebrafish, allele-specific LacZ (mouse), EMSA, binding motif disruption, effect of decreased gene expression on phenotype |
| Systemic lupus erythematosus | Wang | rs2230926 | rs148314165, rs200820567 | | 3C | LCLs | EMSA, allele-specific luciferase activity, allele-specific 3C |
| Type 2 diabetes | Gaulton | rs7903146 | rs7903146 | | Nearest gene | Pancreatic islet cells | Allele-specific luciferase activity, allele-specific FAIRE |
3C, chromosome conformation capture; 4C-seq, circular chromosome conformation capture followed by sequencing; ChIA-PET, chromatin interaction analysis by paired-end tag sequencing; ChIP, chromatin immunoprecipitation; EMSA, electrophoretic mobility shift assay; eQTL, expression quantitative trait loci; FAIRE, formaldehyde-assisted isolation of regulatory elements; FISH, fluorescence in situ hybridization; LCLs, lymphoblastoid cell lines; NA, not applicable; SNP, single nucleotide polymorphism.
Figure 3. Interpreting enhancer variants. Various strategies for interpreting enhancer variants. (Top) Single- or high-throughput reporter assays can be used to test whether a putative enhancer is functional. (Middle) Gene targets of enhancers can be identified through experimental approaches such as fluorescence in situ hybridization and chromosome conformation capture assays, or through computational methods. (Bottom) The impact of a single nucleotide polymorphism (SNP) on enhancer function can be evaluated through CRISPR/Cas9-based DNA editing approaches, followed by measures of enhancer activity or target gene expression. The effect of a risk SNP on transcriptional activity and chromatin architecture can be evaluated through reporter assays and chromosome-conformation-capture-based experiments. Effects of the risk SNP on allele-specific expression and transcription factor binding can also be studied through quantitative ChIP and expression studies. Expression quantitative trait loci (eQTL) analysis can be performed to determine the effect of risk SNPs on gene expression levels.
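As a rough illustration of the eQTL idea mentioned in the caption, the effect of a risk SNP on expression can be summarized as the slope of a regression of transcript level on risk-allele dosage (0, 1 or 2 copies). The sketch below is a plain least-squares fit on invented numbers. It is an assumption-laden simplification: real eQTL analyses also normalize expression, adjust for covariates, and test significance, none of which is shown here.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on risk-allele dosage (0, 1 or 2)."""
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    d_mean, e_mean = mean(dosages), mean(expression)
    sxx = sum((d - d_mean) ** 2 for d in dosages)
    if sxx == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    sxy = sum((d - d_mean) * (e - e_mean) for d, e in zip(dosages, expression))
    return sxy / sxx


# Toy data, invented for illustration: six individuals, one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [10.0, 10.0, 12.0, 12.0, 14.0, 14.0]

slope = eqtl_slope(dosages, expression)
print(f"expression change per risk allele: {slope:.2f}")
```

A positive slope means that each additional copy of the risk allele is associated with higher expression of the gene in this sample, and a negative slope means lower expression.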
Figure 4. Future challenges for the functional evaluation of enhancer variants. The challenges described in the conclusion section are depicted in this hypothetical enhancer locus. Chromatin immunoprecipitation combined with massively parallel DNA sequencing (ChIP-seq) tracks from ENCODE [77] and linkage disequilibrium (LD) plots from HapMap [78,79] are displayed via the UCSC genome browser. Number 1 highlights the challenge of utilizing the proper cell type to assess enhancer activity. Enhancers at this locus are only active in one of the three cell lines depicted. Challenge number 2 is the discrepancy between predicted and validated enhancer function. Shown is a putative enhancer defined by chromatin state that requires experimental validation of its enhancer activity. Challenge number 3 illustrates the large number of single nucleotide polymorphisms (SNPs) in LD that lie in putative enhancer elements, any of which could be functional. Number 4 is the challenge of determining the gene impacted by the enhancer variant. Here, the target of the enhancers at this locus could be IL22RA2, IFNGR1, or a gene distal to this locus. Number 5 is the complexity of enhancer gene regulation. Here, multiple enhancers each with several associated variants are distributed across the locus. One or a combination of several of the enhancer variants could influence target gene expression. chr, chromosome; GWAS, genome-wide association study; kb, kilobases.
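Challenge 3 in the caption, many SNPs in LD with the lead SNP falling inside putative enhancers, can be made concrete with a small filtering sketch. The r² cutoff of 0.8, the SNP labels, the coordinates, and the half-open interval convention are all assumptions chosen for illustration. They are not taken from the locus shown in the figure.

```python
def candidate_enhancer_snps(snps, enhancers, r2_threshold=0.8):
    """List SNPs in LD with the lead SNP that fall inside a putative enhancer.

    snps: iterable of (snp_id, chrom, position, r2_with_lead_snp)
    enhancers: iterable of (chrom, start, end), half-open [start, end)
    """
    hits = []
    for snp_id, chrom, pos, r2 in snps:
        if r2 < r2_threshold:
            continue  # not in strong LD with the lead SNP
        if any(c == chrom and start <= pos < end for c, start, end in enhancers):
            hits.append(snp_id)
    return hits


# Toy data, invented for illustration only.
snps = [
    ("snp_lead", "chr6", 1_000, 1.00),
    ("snp_a", "chr6", 1_500, 0.95),
    ("snp_b", "chr6", 5_200, 0.90),  # strong LD, but outside any enhancer
    ("snp_c", "chr6", 2_100, 0.40),  # inside an enhancer, but weak LD
]
enhancers = [("chr6", 900, 1_600), ("chr6", 2_000, 2_500)]

candidates = candidate_enhancer_snps(snps, enhancers)
print(candidates)
```

Every SNP returned by such a filter remains only a candidate. As the caption notes, any of them could be functional, so the list narrows the search without identifying the causal variant.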