Arpit Mishra, R David Hawkins.
Abstract
Genome compaction is a universal feature of cells and has emerged as a global regulator of gene expression. Compaction is maintained by a multitude of architectural proteins, long non-coding RNAs (lncRNAs), and regulatory DNA. Each component comprises interlinked regulatory circuits that organize the genome in three-dimensional (3D) space to manage gene expression. In this review, we update the current state of 3D genome catalogues and focus on how recent technological advances in 3D genomics are leading to an enhanced understanding of disease mechanisms. We highlight the use of genome-wide chromatin conformation capture (Hi-C) coupled with oligonucleotide capture technology (capture Hi-C) to map interactions between gene promoters and distal regulatory elements such as enhancers that are enriched for disease variants from genome-wide association studies (GWASs). We discuss how aberrations in architectural units are associated with various pathological outcomes, and explore how recent advances in genome and epigenome editing show great promise for a systematic understanding of complex genetic disorders. Our growing understanding of 3D genome architecture, coupled with the ability to engineer changes in it, may create novel therapeutic opportunities.
Year: 2017 PMID: 28964259 PMCID: PMC5623062 DOI: 10.1186/s13073-017-0477-2
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1. Hierarchical chromatin organization. Top tier: higher-order compartments A and B, where A is an active compartment and B is an inactive or densely packed compartment (beige-colored top-most triangles). Moving downward, topologically associated domains (TADs) are organized into increasingly higher-resolution structures. Second tier: representative metaTAD structure (gray-colored triangle), where many TADs together form one metaTAD. Inter-TAD interactions, while more sparse, can be detected. Third tier: TADs (light pink triangle) consist of numerous intra-TAD regulatory loops (small red triangles in TADs). These regulatory loops are major governing factors for differential transcriptional output. In tiers 1–3, triangles represent higher-frequency contacts of the three-dimensional (3D) genome shown in two dimensions (2D). Tier four illustrates how a TAD may look in 3D, comprising intra-TAD regulatory loops. Representative examples of regulatory loops are also shown: one enhancer to multiple promoter interactions, promoter–promoter interactions, and multiple enhancers to one promoter interactions. TAD boundaries are marked by the CTCF–cohesin complex (green pentagon). Intra-TAD elements likely consist of different transcription factors (light green circles) and long non-coding RNA (dark gray circles).
Commonly used terminologies
| Terminology | Definition |
|---|---|
| Euchromatin | Chromatin that contains loosely packed nucleosomes. Usually represents transcriptionally active sites in the genome, including regulatory elements |
| Heterochromatin | Chromatin that is densely packed with nucleosomes. Usually represents transcriptionally silent sites in the genome |
| DNase I hypersensitive sites (DHSs) | Nucleosome-free regions of chromatin that are mostly found at enhancers and promoters. Largely indicative of transcription factor binding |
| Enhancer elements | Enhancers are sequences of DNA that enhance gene expression by being bound by transcription factors and looping to interact with gene promoters. These elements are located on the same chromosome ( |
| Super-enhancer | Group of multiple enhancers located within 12 kb of each other, which are bound by an array of transcription factors and marked by acetylation |
| Temp enhancer | A novel class of |
| Human-gained enhancer | Putative novel enhancer-like elements gained in the human lineage, discovered from brain Hi-C data |
| Purifying selection | Negative selection in which deleterious alleles are selectively removed through evolution |
| Gene desert | Large genomic regions that are devoid of genes, but may harbor many disease-causing variants and distal regulatory elements |
| Promoter interacting regions (PIRs) | PIRs are broadly defined as distal regulatory elements interacting with promoters via looping interactions |
| Frequently interacting regions (FIREs) | FIREs are regional groups of putative enhancer-like elements that interact with each other and many promoters |
| Population average ensemble structure | During Hi-C experiments in bulk, cells are present in multiple growth stages; thus, they exhibit multiple 3D architectural landscapes. In bulk Hi-C, different architectural landscapes are captured and this is called population average ensemble structure |
| Haplotype phasing | Deciphering haplotype block structures for polymorphic sites using genotype data. This is traditionally done computationally to determine if variants are on the same allele. Hi-C provides an experimental means of determining if variants reside on the same allele |
| Combinatorial indexing | Method that tags DNA within intact nuclei with successive (combinatorial) rounds of nucleic acid barcodes, adapting genomics applications such as transcriptomics, Hi-C, and chromatin accessibility to single-cell studies without the need to physically isolate single cells |
3D three-dimensional, DHSs DNase I hypersensitive sites, Hi-C genome-wide chromatin conformation capture, FIREs frequently interacting regions, kb kilobases, PIRs promoter interacting regions, Temp temporarily phenotypic
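The super-enhancer entry above defines a cluster by distance: multiple enhancers located within 12 kb of each other. A minimal sketch of that distance rule alone follows, assuming enhancers are given as hypothetical `(chromosome, start, end)` tuples. The function names, the `min_members` threshold, and the example coordinates are illustrative assumptions, and the sketch ignores the transcription factor binding and acetylation signal that the definition also mentions.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, start, end), coordinates in bp.
Interval = Tuple[str, int, int]


def stitch_enhancers(enhancers: List[Interval], max_gap: int = 12_000) -> List[List[Interval]]:
    """Group enhancers on the same chromosome that lie within max_gap bp of the
    end of the current group."""
    groups: List[List[Interval]] = []
    for enh in sorted(enhancers):
        chrom, start, _ = enh
        if groups:
            last_chrom = groups[-1][-1][0]
            last_end = max(e[2] for e in groups[-1])
            if chrom == last_chrom and start - last_end <= max_gap:
                groups[-1].append(enh)
                continue
        groups.append([enh])
    return groups


def candidate_super_enhancers(enhancers: List[Interval],
                              max_gap: int = 12_000,
                              min_members: int = 2) -> List[List[Interval]]:
    """Keep only stitched groups with at least min_members enhancers
    (min_members is an assumption; the table only says 'multiple')."""
    return [g for g in stitch_enhancers(enhancers, max_gap) if len(g) >= min_members]


# Made-up coordinates for illustration only.
example = [
    ("chr1", 1_000, 2_000),
    ("chr1", 10_000, 11_000),
    ("chr1", 50_000, 51_000),
    ("chr2", 5_000, 6_000),
]
clusters = candidate_super_enhancers(example)
```

In this toy input only the first two chr1 enhancers are close enough to be stitched, so a single candidate cluster is returned.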
List of genome architectural methods
| Technique | Most applicable scenario and/or advantages | Limitations | Relevant example(s) | Suitable computational pipeline(s) |
|---|---|---|---|---|
| DNA-centric view of genome architectural methods | ||||
| Chromosome conformation capture (3C) [ | Interrogating looping interactions between a single gene locus and a single regulatory locus (one locus to one locus) | Not suitable for high-throughput identification of novel looping interactions | Association of causal variants from GWASs in 16p13 to | Not required |
| Circular chromosome conformation capture on chip (4C) [ | Exploring all possible interactions with a single clinically relevant locus (one locus to all loci) | Limited throughput | Association of regulatory SNP with target genes [ | FourCseq [ |
| Circular chromosome conformation capture combined with sequencing (4C-seq) [ | Exploring all possible interactions with a single clinically relevant locus (one locus to all loci) | GC content or length of interacting fragment may introduce PCR bias | Chromosomal rearrangement detection [ | FourCseq [ |
| Chromosome conformation capture carbon copy (5C) [ | Studying interactions between many chromosomal loci with many interacting regions across the genome (many loci to all loci) | Complicated primer/probe design can introduce amplification bias. Occasionally misses weak long-range contacts | Determined interaction profiles at pilot promoter regions in ENCODE project [ | HiFive [ |
| Genome-wide chromatin conformation capture (Hi-C) [ | Circumstances where extensive chromatin reorganization occurs (i.e., stem cell differentiation), in which it is important to understand interactions between all parts of the genome (all loci to all loci). The most extensively used genome architectural method | Insensitive method for probing local intra-TAD interactions (<40 kb) unless performed at very high resolution | Genome-wide TAD distributions [ | Methods are primarily divided into: (1) quality control and mapping; (2) domain calling; (3) visualization; and (4) 3D modeling. These methods have been extensively reviewed earlier [ |
| Tethered conformation capture (TCC) [ | Proximity ligation step performed on solid substrate, thus reducing random intermolecular ligation (all loci to all loci) | Although more specific, proximity ligation occurs outside the cell, and thus some native cell context may be lost. Biotinylation step may require optimization | Originally applied to B-cell line, but intended for direct clinical/diagnostic applications | Accompanied by a novel method for TCC data analysis [ |
| Genome-wide chromatin conformation capture with DNase I digestion (DNase Hi-C) or targeted DNase Hi-C [ | Improves on Hi-C restriction-enzyme-mediated resolution limits, and thus is most suitable for higher-resolution architectural studies (all loci to all loci) | DNase I treatment may digest bait-targeted region; thus, tiling probes designed across the region are needed for targeted DNase Hi-C | Targeted DNase Hi-C was used to investigate chromatin architecture at many lncRNA loci | Analysis pipelines are similar to Hi-C methods |
| Genome architectural mapping (GAM) [ | First ligation-free method for investigating | Time and specialization required to individually section and dissect out nuclei. Cell asynchrony and heterogeneity affect overall outcome | Uses thin tissue sample slices, which can be applied to frozen clinical tissue samples | GAMTools: specialized automated pipeline [ |
| DNA-centric view of genome architectural methods with target enrichment | ||||
| Chromosome conformation capture (3C) coupled with oligonucleotide capture technology (capture-C) [ | Delineating interaction profiles for many chromosomal loci in a single experiment without introduction of PCR bias and without missing weak long-range interactions (many loci to all loci) | Initial capture-C data suffered from insufficient depth and captured some non-specific interactions. NG-capture-C overcomes these limitations and provides higher sensitivity and resolution | Initially found complex patterns of HIF response by defining chromatin architecture at multiple HIF-bound enhancer and promoter sites [ | Capture-C analyzer and capture-C oligo design tools available on github [ |
| Targeted locus amplification (TLA) [ | Little requirement of prior sequence knowledge. Most suitable for studying chromosomal rearrangements, single nucleotide variants (SNVs), transgene integration sites, and haplotyping at large genomic intervals. Entire restriction fragments are sequenced, unlike 4C-seq where only ends of fragments are analyzed (many loci to all loci) | Potential for applying to purified genomic DNA or formalin-fixed paraffin embedded material, but current protocol is limited to cells only | Used for haplotyping at | TLA analysis pipeline details in [ |
| Targeted chromatin capture (T2C) [ | Provides affordable diagnostic tools with restriction enzyme resolution to understand domain and compartments at clinically relevant site. Can be applied to many regions of the genome simultaneously (many loci to all loci) | Output limited to preselected regions. Does not perform well at repeat regions | Was used to validate architecturally well-characterized mouse β-globin and human | No specialized pipeline. Uses mainly well-known tools such as BWA, Samtools, and BEDtools |
| Hi-C coupled with RNA bait capture probes (CHi-C) [ | Provides high-resolution | Difference in hybridization of RNA probes may introduce enrichment bias. RNA probe location is restricted due to restriction enzyme sites and requires tiling of probes, which increases cost | Identification of three cancer-associated gene deserts in | CHiCAGO tools [ |
| Promoter capture-Hi-C (p-CHi-C) [ | Similar to CHi-C, but RNA enrichment baits target all promoters (many loci to all loci) | Similar limitations to CHi-C | A detailed catalogue of 22,000 promoter interactions where autoimmune- and hematological-disorder-related SNPs are significantly enriched [ | CHiCAGO tools [ |
| Promoter-anchored chromatin interaction (HiCap) [ | Similar approach to CHi-C but uses a 4-bp cutter restriction enzyme for improved resolution (many loci to all loci) | Similar limitations to CHi-C | Promoter-anchored interactions for 15,905 promoters in mouse embryonic stem cells (mESCs) | CHiCAGO tools [ |
| DNA-centric view of single-cell genome architectural methods | ||||
| Single-cell genome-wide chromatin conformation capture (single-cell Hi-C) [ | Can delineate cellular heterogeneity at architectural level. Overcomes limitation of population ensemble average structure from bulk Hi-C (all loci to all loci at single-cell level) | Can be technically more challenging than bulk Hi-C. Data from multiple, individual cells are likely needed for a useful interpretation | Single-cell Hi-C has been used to understand architectural heterogeneity for Th1 cells, cell cycle transition and during oocyte to zygotic transition [ | Single cell Hi-C Pipeline (scell_hicpipe) [ |
| Single-cell combinatorial indexing Hi-C (sciHi-C) [ | Probes cellular heterogeneity by using combinatorial indexing, thus eliminating the requirement for single-cell separation using fluorescence-activated cell sorting. Provides rapid scaling to large numbers of cells. Technically feasible to use for clinically important tissue samples (all loci to all loci at single-cell level) | Comparatively new method; may require optimization compared to bulk Hi-C | sciHi-C data for more than 10,000 single cells were reported. Yet to be explored clinically, but has potential for application to important diseases such as cancer where cellular heterogeneity plays a crucial role | Single-cell combinatorial indexing Hi-C pipeline on github [ |
| Protein-centric view of genome architectural methods | ||||
| Chromatin interaction analysis-end tag sequencing (ChIA-PET) [ | To understand the protein-specific chromatin interactome. Important in identifying chromatin architectural roles for proteins (many loci to all loci) | Requires known/target protein of interest, similar to chromatin immunoprecipitation followed by sequencing (ChIP-seq). Protein may not bind directly to DNA but bind in complex | Used for studying chromatin architecture mediated by estrogen receptor α binding [ | ChIA-PET2 data analysis pipeline [ |
| Hi-C chromatin immunoprecipitation (HiChIP) [ | Protein-centric view of genome architecture similar to ChIA-PET but more sensitive and requires fewer cells (many loci to all loci) | As above | Identified genome-wide cohesin-mediated looping interactions [ | Uses Hi-C Pro for data processing; Fit-Hi-C, Mango, and Juicer for contact interaction calls; and MACS2 for peak calls [ |
| Proximity ligation assisted chromatin immunoprecipitation (PLAC-seq) [ | Protein-centric view of genome architecture similar to ChIA-PET, but more sensitive and requires fewer cells (many loci to all loci) | As above | Generated improved maps of promoter–enhancer interactions in mESCs using H3K4me3 mark. Can be used in place of CHi-C methods and does not require probe design/acquisition | PLAC-seq data analysis pipeline [ |
3C chromosome conformation capture, 3D three-dimensional, 4C circular chromosome conformation capture on chip, 4C-seq circular chromosome conformation capture combined with sequencing, 5C chromosome conformation capture carbon copy, bp base pairs, capture-C chromosome conformation capture coupled with oligonucleotide capture technology, ChIA-PET chromatin interaction analysis-end tag sequencing, CHi-C Hi-C coupled with RNA bait capture probes, ChIP-seq chromatin immunoprecipitation followed by sequencing, DNase Hi-C genome-wide chromatin conformation capture with DNase I digestion, GAM genome architectural mapping, GWAS genome-wide association study, Hi-C genome-wide chromatin conformation capture, HiCap promoter-anchored chromatin interaction, HiChIP Hi-C chromatin immunoprecipitation, kb kilobases, mESC mouse embryonic stem cell, NG-capture-C next-generation capture-C, p-CHi-C promoter capture-Hi-C, PLAC-seq proximity ligation assisted chromatin immunoprecipitation, sciHi-C single-cell combinatorial indexing Hi-C, SNP single nucleotide polymorphism, SNV single nucleotide variant, T2C targeted chromatin capture, TAD topologically associated domain, TCC tethered conformation capture, TLA targeted locus amplification
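Promoter-centred methods in the table above (CHi-C, p-CHi-C, HiCap) yield promoter interacting regions (PIRs), which can then be intersected with GWAS variants to propose candidate target genes. The following is a minimal sketch of that intersection step, assuming interactions and variants are already available as simple records. The record layouts, the function name `link_variants_to_genes`, and all gene and variant identifiers are hypothetical and are not the output format of CHiCAGO or any other pipeline named in the table.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Interaction(NamedTuple):
    gene: str       # gene whose promoter served as the capture bait (hypothetical field)
    chrom: str
    pir_start: int  # promoter interacting region, 0-based half-open interval
    pir_end: int


class Variant(NamedTuple):
    rsid: str
    chrom: str
    pos: int


def link_variants_to_genes(interactions: List[Interaction],
                           variants: List[Variant]) -> Dict[str, List[str]]:
    """Map each variant to the genes whose promoter contacts a PIR containing it."""
    links: Dict[str, List[str]] = defaultdict(list)
    for v in variants:
        for it in interactions:
            overlaps = v.chrom == it.chrom and it.pir_start <= v.pos < it.pir_end
            if overlaps and it.gene not in links[v.rsid]:
                links[v.rsid].append(it.gene)
    # Drop variants that fell in no PIR (created as empty lists by the lookup above).
    return {rsid: genes for rsid, genes in links.items() if genes}


# Made-up records for illustration only.
interactions = [
    Interaction("GENE_A", "chr1", 100_000, 105_000),
    Interaction("GENE_B", "chr1", 103_000, 108_000),
    Interaction("GENE_C", "chr2", 100_000, 105_000),
]
variants = [
    Variant("snp1", "chr1", 104_000),
    Variant("snp2", "chr1", 200_000),
    Variant("snp3", "chr2", 100_000),
]
links = link_variants_to_genes(interactions, variants)
```

The nested loop is quadratic and is kept only for readability; for genome-scale data an interval tree or a dedicated tool such as BEDtools (listed in the table for T2C) would be the more practical choice.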
Architectural changes and disease
| Architectural component | Disease phenotype or mutation effect | Underlying cause or architectural change | References |
|---|---|---|---|
| CTCF | Silencing of tumor suppressor | Hypermethylation of CTCF-binding site near | [ |
| CTCF | Illegitimate enhancer access of | Hypermethylation of CTCF-binding site due to IDH mutation and disruption of TAD boundary | [ |
| CTCF | Human limb malformation | Altered TAD structure surrounding | [ |
| CTCF-cohesin | Activation of proto-oncogenes in T-cell acute lymphoblastic leukemia | Microdeletion of insulated boundary and aberrant access of enhancer to oncogene | [ |
| Cohesin loading factor NIPBL in 50% of cases | Cornelia de Lange syndrome | | [ |
| MED12 | X-linked mental retardation (Opitz-Kaveggia syndrome) | Recurrent mutation R961W in | [ |
| Lamin A | Hutchinson–Gilford Progeria syndrome | Point mutation in lamin A, loss of H3K27me3, which in turn leads to global loss of spatial chromatin structure at the nuclear lamina | [ |
| Long non-coding RNA (lncRNA) | Colorectal cancer | This lncRNA is transcribed from an 8q24 gene desert and interacts with CTCF to form looping structures at the | [ |
| lncRNA | Brachydactyly type E | Translocation-mediated disruption of | [ |
lncRNA long non-coding RNA, ncRNA non-coding RNA
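Several rows in the table above share one mechanism: loss of an insulating boundary lets an enhancer reach a gene it could not previously contact. A deliberately simplified sketch of that logic follows, assuming TADs on one chromosome are fully described by a sorted list of boundary coordinates and that regulatory contact is possible only within a domain. The function names and all coordinates are hypothetical, and real boundaries are neither points nor perfectly insulating.

```python
from bisect import bisect_right
from typing import List


def tad_index(position: int, boundaries: List[int]) -> int:
    """Index of the domain containing position, given sorted boundary coordinates."""
    return bisect_right(boundaries, position)


def same_tad(a: int, b: int, boundaries: List[int]) -> bool:
    """True if no boundary separates positions a and b on the same chromosome."""
    return tad_index(a, boundaries) == tad_index(b, boundaries)


# Made-up coordinates (bp) for illustration only.
boundaries = [1_000_000, 2_000_000, 3_000_000]
enhancer = 1_900_000
oncogene = 2_100_000

# Intact genome: the boundary at 2 Mb separates enhancer and gene.
contact_before = same_tad(enhancer, oncogene, boundaries)

# Model a microdeletion (or loss of CTCF binding) by removing that boundary.
disrupted = [b for b in boundaries if b != 2_000_000]
contact_after = same_tad(enhancer, oncogene, disrupted)
```

In this toy model the enhancer and gene sit in different domains before the deletion and in the same merged domain afterwards, mirroring the aberrant enhancer access described for the CTCF and CTCF-cohesin rows.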