Mingxin Gan, Wenran Li, Rui Jiang.
Abstract
Chromatin contacts between regulatory elements are crucial for interpreting transcriptional regulation and understanding disease mechanisms. However, existing computational methods mainly focus on predicting interactions between enhancers and promoters, leaving enhancer-enhancer (E-E) interactions largely unexplored. In this work, we develop a novel deep learning approach, named Enhancer-enhancer contacts prediction (EnContact), to predict E-E contacts using genomic sequences as input. We statistically demonstrate the predictive ability of EnContact using training and testing sets derived from HiChIP data of seven cell lines. We also show that our model significantly outperforms baseline methods. In addition, our model identifies finer-scale E-E interactions from region-based chromatin contacts, where each region contains several enhancers. We further identify a class of hub enhancers using the predicted E-E interactions and find that hub enhancers tend to be active across cell lines. We conclude that EnContact is capable of predicting E-E interactions using features automatically learned from genomic sequences.
Keywords: Attention-based RNN; Deep learning; Enhancer-enhancer contacts; HiChIP data; Hub enhancers
Year: 2019 PMID: 31565573 PMCID: PMC6746221 DOI: 10.7717/peerj.7657
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Number of enhancer-enhancer interactions collected from HiChIP.
| Cell type | Acronym | 1v1 | mvm |
|---|---|---|---|
| GM12878 | GM | 143,810 | 144,380 |
| K562 | K562 | 158,058 | 142,131 |
| Human coronary artery smooth muscle | HCASMC | 110,078 | 114,758 |
| CD4+ T cell leukemia | MyLa | 124,858 | 123,163 |
| Naïve CD4+ T cells | Naive | 128,450 | 146,308 |
| T helper 17 cells | Th17 | 145,706 | 178,600 |
| T regulatory cells | Treg | 112,392 | 125,625 |
| Total | – | 923,352 | 974,965 |
Figure 1. The EnContact method.
(A) One-hot encoded sequence matrix. (B) Schematic illustration of the deep neural network architecture in EnContact. See “Methods” for details. (C) Region-based interactions in HiChIP data are divided into two sets: one set consists of interacting regions with only one enhancer in each region (1v1); the other set contains interacting regions where each region contains more than one enhancer (mvm).
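As a minimal sketch of the one-hot encoding named in panel (A), the snippet below turns a DNA string into a length-by-4 matrix. The column order A, C, G, T and the all-zero row for ambiguous bases such as N are assumptions made for illustration; the caption does not specify them, and `one_hot_encode` is a hypothetical helper name.

```python
# Illustrative sketch only. The column order (A, C, G, T) and the handling of
# ambiguous bases are assumptions, not details taken from the paper.
BASES = "ACGT"

def one_hot_encode(sequence):
    """Return a len(sequence) x 4 matrix as a list of rows.

    Each row has a single 1 in the column of its base. Bases outside
    A/C/G/T (for example N) are encoded as an all-zero row.
    """
    matrix = []
    for base in sequence.upper():
        row = [0] * len(BASES)
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        matrix.append(row)
    return matrix

encoded = one_hot_encode("ACGTN")
```

Here `encoded` has five rows, one per base, and the last row (for N) contains only zeros.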
Figure 2. Comparison with baseline methods.
(A) Three types of background enhancer-enhancer pairs. Random contacts are generated based on the distance distribution, f(d), of positive interactions, where d is the distance between two enhancers. Random enhancers are sampled with one end of a positive interaction fixed. Random pairs are randomly selected from all possible combinations of any two enhancers. (B–D) Performance of EnContact and other baseline methods across seven cell lines using random contacts (B), random enhancers (C), and random pairs (D) as background.
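The three background types in panel (A) can be sketched roughly as follows. This is one possible reading of the caption, not the authors' procedure: enhancers are reduced to integer positions on a single chromosome, f(d) is approximated by drawing from the observed positive distances, and every function name and parameter here is hypothetical.

```python
import bisect
import random

def _ordered(a, b):
    """Represent an unordered pair as a sorted tuple."""
    return (min(a, b), max(a, b))

def sample_random_pair(enhancers, positives, rng, max_tries=1000):
    """Random pairs: any two distinct enhancers that are not a positive pair."""
    for _ in range(max_tries):
        a, b = rng.sample(enhancers, 2)
        if _ordered(a, b) not in positives:
            return _ordered(a, b)
    return None

def sample_random_enhancer(positive, enhancers, positives, rng, max_tries=1000):
    """Random enhancers: keep one end of a positive pair, resample the other."""
    fixed = positive[0]
    for _ in range(max_tries):
        other = rng.choice(enhancers)
        if other != fixed and _ordered(fixed, other) not in positives:
            return _ordered(fixed, other)
    return None

def sample_random_contact(enhancers, positives, distances, rng, max_tries=1000):
    """Random contacts: draw d from the empirical positive distances (a stand-in
    for f(d)) and pair an anchor with the enhancer nearest to anchor + d.

    `enhancers` must be sorted in ascending order.
    """
    for _ in range(max_tries):
        anchor = rng.choice(enhancers)
        target = anchor + rng.choice(distances)
        i = bisect.bisect_left(enhancers, target)
        neighbours = enhancers[max(i - 1, 0): i + 1]
        other = min(neighbours, key=lambda e: abs(e - target))
        if other != anchor and _ordered(anchor, other) not in positives:
            return _ordered(anchor, other)
    return None

# Toy data: enhancer midpoints (sorted) and two positive interactions.
enhancers = [100, 500, 900, 1500, 2600, 4000]
positives = {(100, 900), (500, 1500)}
distances = [b - a for a, b in positives]

rng = random.Random(0)
bg_contact = sample_random_contact(enhancers, positives, distances, rng)
bg_enhancer = sample_random_enhancer((100, 900), enhancers, positives, rng)
bg_pair = sample_random_pair(enhancers, positives, rng)
```

Each sampler rejects draws that coincide with a known positive, so the three backgrounds differ only in how the candidate pair is proposed.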
Model performance of EnContact across seven cell lines.
| Cell line | Random contacts AUROC | Random contacts AUPRC | Random enhancers AUROC | Random enhancers AUPRC | Random pairs AUROC | Random pairs AUPRC |
|---|---|---|---|---|---|---|
| GM | 0.806 | 0.773 | 0.831 | 0.830 | 0.853 | 0.863 |
| K562 | 0.837 | 0.850 | 0.852 | 0.826 | 0.874 | 0.887 |
| HCASMC | 0.821 | 0.804 | 0.811 | 0.815 | 0.846 | 0.857 |
| MyLa | 0.803 | 0.792 | 0.827 | 0.800 | 0.838 | 0.851 |
| Naive | 0.858 | 0.848 | 0.867 | 0.875 | 0.865 | 0.885 |
| Th17 | 0.842 | 0.832 | 0.861 | 0.868 | 0.872 | 0.891 |
| Treg | 0.835 | 0.828 | 0.851 | 0.854 | 0.845 | 0.864 |
| Mean | 0.829 | 0.818 | 0.843 | 0.838 | 0.856 | 0.871 |
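For readers unfamiliar with the two metrics in this table, the sketch below computes them on toy labels and scores. It is an illustration under stated assumptions, not the evaluation code used for the table: AUROC is computed as the fraction of positive-negative pairs ranked correctly, and AUPRC is approximated by average precision, which is only one of several common definitions. The function names and the toy data are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of the positives.

    Assumes no tied scores; used here as an approximation to AUPRC.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits

# Toy example: 1 = interacting pair, 0 = background pair.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auroc = auroc(labels, scores)
toy_ap = average_precision(labels, scores)
```

On this toy input, 8 of the 9 positive-negative pairs are ordered correctly, so `toy_auroc` is 8/9.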
Key TFs captured by convolution kernels of EnContact.
| Cell type | Key TFs |
|---|---|
| GM | PKNOX1, PKNOX2, FOS, BHLHE40, JUND, USF1, EBF1 |
| K562 | EGR1, MGA, BHLHE40, MNT, CREB1, FOXA1, NFYB, USF1 |
| HCASMC | ETS1, IRF2, HSF2, MZF1, HOXB3, JUND, CREB1, FOXO3, POU6F1, NFE2, ZNF263, BATF3, ATF7, NKX6-2 |
| MyLa | ETS1, FLI1, RORA, CREB1, ELK4, NFATC1, ETV3, ZBTB33, REST, ELK3, MLX, NR3C1, SP2, ETV6 |
| Naive | IRF4, PLAG1, IRF2, HSF2, MZF1, HOXB3, MEF2D, JUND, CREB1, EMX1, FOXO3, NFYB, MEF2A, POU6F1 |
| Th17 | IRF4, IRF2, FOXP3, FOXP1, TP53, RXRB, SMAD3, NR2C2, POU6F1, CREB1, TFE3, IRF7, PRDM1 |
| Treg | ETS1, NFATC2, MAX, RORA, ELF4, NR2C2, RUNX1, ELF1, RELA, ZBTB33 |
Figure 3. Fine-scale enhancer-enhancer interactions predicted by EnContact.
(A) Region-based interactions in HiChIP data are divided into two sets. Interacting regions with more than one enhancer located on each end may result in several candidate enhancer-enhancer interactions. (B) Concept of enhancer co-opening based on DNase I hypersensitivity signal. (C–I) Co-opening percentages of positive predictions and negative predictions across seven cell lines. (J–P) Comparison of validated enhancer-enhancer interactions within E-E pairs predicted by EnContact, E-E pairs derived from candidate interactions, and random E-E pairs. (Q–W) Comparison of three E-E groups (i.e., EnContact E-E group, Candidate E-E group, and Random E-E group) in terms of the number of E-E pairs whose two enhancers regulate the same promoter.
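Panel (A) can be illustrated with a short sketch that labels a region-level contact as 1v1 or mvm and lists the candidate enhancer-enhancer pairs an mvm contact gives rise to. The function names are hypothetical, and returning `None` for contacts that fit neither set (for example one enhancer on one end and several on the other) is an assumption, since the captions do not say how such contacts are treated.

```python
from itertools import product

def classify_contact(left_enhancers, right_enhancers):
    """Label a region-level contact by the number of enhancers on each end."""
    if len(left_enhancers) == 1 and len(right_enhancers) == 1:
        return "1v1"
    if len(left_enhancers) > 1 and len(right_enhancers) > 1:
        return "mvm"
    return None  # assumption: mixed or empty ends fall outside both sets

def candidate_pairs(left_enhancers, right_enhancers):
    """All enhancer-enhancer pairs implied by one region-level contact."""
    return list(product(left_enhancers, right_enhancers))

# Toy contact: two enhancers in the left region, three in the right region.
left = ["e1", "e2"]
right = ["e3", "e4", "e5"]
kind = classify_contact(left, right)
candidates = candidate_pairs(left, right)
```

With two enhancers on one end and three on the other, a single region-level contact yields six candidate pairs, which is why a sequence-based model is needed to rank them.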
Figure 4. Characterization of hub enhancers.
(A–G) Distribution of enhancer interaction degrees in different cell lines. (H–K) Comparison of hub enhancers derived from E-E pairs predicted by EnContact, hub enhancers derived from candidate E-E pairs, and non-hub enhancers in terms of four histone marks: H3K4me3 (H), H3K27ac (I), H3K4me2 (J), and H3K9ac (K). (L) Comparison of the number of ChIP-seq experiments where hub enhancers are active. (M) Comparison of the number of TFs included in the experiments where hub enhancers are active. The y-axis represents the number of experiments where a promoter is active.
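The interaction degree in panels (A–G) is the number of E-E pairs an enhancer takes part in. A minimal sketch of computing degrees and selecting high-degree enhancers follows. The `min_degree` cutoff is a hypothetical parameter: the captions do not state the threshold used to define hub enhancers, so the value below is arbitrary.

```python
from collections import Counter

def interaction_degrees(pairs):
    """Count how many E-E pairs each enhancer participates in.

    Assumes each unordered pair appears at most once in `pairs`.
    """
    degrees = Counter()
    for a, b in pairs:
        degrees[a] += 1
        degrees[b] += 1
    return degrees

def hub_enhancers(pairs, min_degree):
    """Enhancers whose degree is at least `min_degree` (hypothetical cutoff)."""
    degrees = interaction_degrees(pairs)
    return sorted(e for e, d in degrees.items() if d >= min_degree)

# Toy predicted interactions.
predicted = [("e1", "e2"), ("e1", "e3"), ("e1", "e4"), ("e2", "e3")]
degrees = interaction_degrees(predicted)
hubs = hub_enhancers(predicted, min_degree=3)
```

In this toy network only `e1` reaches a degree of 3, so it is the only enhancer selected as a hub.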