Yang Yang1, Ruochi Zhang2, Shashank Singh3, Jian Ma1.
Abstract
MOTIVATION: A large number of distal enhancers and proximal promoters form enhancer-promoter interactions to regulate target genes in the human genome. Although recent high-throughput genome-wide mapping approaches have allowed us to more comprehensively recognize potential enhancer-promoter interactions, it is still largely unknown whether sequence-based features alone are sufficient to predict such interactions.
Year: 2017 PMID: 28881991 PMCID: PMC5870728 DOI: 10.1093/bioinformatics/btx257
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Method overview of PEP
Fig. 2. Evaluation of PEP-Motif, PEP-Word and PEP-Integrate (K = 6 for K-mers) on E/P data from six cell lines in comparison with TargetFinder (E/P/W) in terms of AUROC, AUPR, Precision, Recall, F1 and MCC
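For readers unfamiliar with the threshold-based metrics named in the Fig. 2 caption, the following is a minimal sketch of how Precision, Recall, F1 and MCC are conventionally computed from binary labels. It is not the authors' evaluation code; the function name `binary_metrics` and the toy labels are assumptions made for illustration. AUROC and AUPR are omitted because they require ranked prediction scores rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute Precision, Recall, F1 and MCC from 0/1 labels (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Toy example: 3 interacting pairs (1) and 5 non-interacting pairs (0).
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    preds = [1, 1, 0, 1, 0, 0, 0, 0]
    print(binary_metrics(truth, preds))
```

MCC is useful alongside F1 when classes are imbalanced, because it also accounts for true negatives.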
Fig. 3. Estimated feature importance of motifs in PEP-Motif that have top 5% importance in at least one cell line. The feature importance is scaled between 0 (low importance) and 1 (high importance). Of the 503 motif representatives (427 single motifs and 76 motif clusters) found by PEP-Motif, 139 in enhancers and 48 in promoters have top 5% feature importance in at least one cell line. Here we display the top 100 of the 139 predictive motif representatives in enhancers and all 48 predictive motif representatives in promoters. Each motif is represented by the name of its corresponding TF. If a TF has multiple associated motifs, alternative motifs are marked according to their identities in the database [e.g. EHF(S) denotes a single site motif of EHF (Kulakovskiy )]. If a motif represents a motif cluster, the names of all the member motifs are shown in combination. We performed hierarchical clustering on both motifs (rows of the feature importance matrix) and cell types (columns) to group the motif features. A cell is highlighted with a white border if the corresponding motif has top 5% feature importance in the respective cell type
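The selection described in the Fig. 3 caption (scale importance to the range 0 to 1, then flag features in the top 5% per cell line) can be sketched as follows. This is a hedged illustration only: the helper names, the min-max scaling choice, the rank-based cutoff, and the toy scores are all assumptions, not the procedure or data used by PEP-Motif.

```python
import math


def min_max_scale(values):
    """Rescale values linearly to [0, 1] (assumed scaling; constant input maps to 0)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def top_fraction_indices(values, fraction=0.05):
    """Return the indices of the highest-scoring `fraction` of values (at least one)."""
    k = max(1, math.ceil(fraction * len(values)))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(ranked[:k])


def count_cell_lines_in_top(importance_by_cell_line, fraction=0.05):
    """Count, per feature, the number of cell lines where it is in the top fraction."""
    counts = {}
    for scores in importance_by_cell_line.values():
        names = list(scores)
        scaled = min_max_scale([scores[n] for n in names])
        for i in top_fraction_indices(scaled, fraction):
            counts[names[i]] = counts.get(names[i], 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical importance scores; cell-line and motif labels are placeholders.
    toy = {
        "cell_line_A": {"m1": 0.9, "m2": 0.1, "m3": 0.5, "m4": 0.2},
        "cell_line_B": {"m1": 0.8, "m2": 0.7, "m3": 0.1, "m4": 0.0},
    }
    # A 50% cutoff is used here only because the toy data has four features.
    print(count_cell_lines_in_top(toy, fraction=0.5))
```

Filtering the resulting counts for a value of at least 1 corresponds to the "top 5% in at least one cell line" criterion in the caption, while an exact count matches the row labels used in the table of novel predictive TFs.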
Important TF motifs discovered by PEP-Motif (but not by TargetFinder) to be of top 5% feature importance in at least two cell lines (the upper part) and those of top 10% feature importance in at least three cell lines (the lower part)
| Cell lines | Potential novel predictive TF of top 5% importance (in PEP-Motif but not in TargetFinder) |
|---|---|
| 2 (E) | ( CEBPE, EGR3, ENOA, ( HAND1, HBP1, HOXA1, MCR, ( ( BRAC, TEAD3, ZNF713 |
| 3 (E) | EHF, ZSC16 |
| 2 (P) | AP2D, ( |
| 3 (P) | ( |
Note: Each TF is represented by one or multiple motifs. If the corresponding motif is associated with a motif cluster, all members of the cluster are displayed and the motif reaching the specified importance level is in italic. ‘E’ represents enhancer regions and ‘P’ represents promoter regions. The row name represents the exact number of cell lines where the motif reaches the specified importance level, e.g. ‘2 (E)’ denotes that the feature in the enhancer region has top 5% importance in exactly two cell lines.