Saidi Wang, Haiyan Hu, Xiaoman Li.
Abstract
Pairs of interacting transcription factors (TFs) have previously been shown to bind to enhancers and promoters and contribute to their physical interactions. However, to date, we have limited knowledge about such TF pairs. To fill this void, we systematically studied the co-occurrence of TF-binding motifs in interacting enhancer-promoter (EP) pairs in seven human cell lines. We discovered 423 motif pairs that significantly co-occur in enhancers and promoters of interacting EP pairs. We demonstrated that these motif pairs are biologically meaningful and significantly enriched with motif pairs of known interacting TF pairs. We also showed that the identified motif pairs facilitated the discovery of the interacting EP pairs. The developed pipeline, EPmotifPair, together with the predicted motifs and motif pairs, is available at https://doi.org/10.6084/m9.figshare.14192000. Our study provides a comprehensive list of motif pairs that may contribute to EP physical interactions, which facilitates generating meaningful hypotheses for experimental validation.
Keywords: enhancers; enhancer–promoter interactions; motif pairs; promoters; transcription factors
Year: 2022 PMID: 35130376 PMCID: PMC9069648 DOI: 10.1515/jib-2021-0038
Source DB: PubMed Journal: J Integr Bioinform ISSN: 1613-4516
Figure 1: (A) The procedure to obtain positive and negative EP pairs. (B) The pipeline to study motif pairs in positive EP pairs.
The predicted motif pairs in seven cell lines.
| Cell line (sequencing depth, billion) | #Enhancers | #Promoters | #EP pairs | #Predicted motifs (% shared) | #Predicted motif pairs (% shared, % random shared, p-value) |
|---|---|---|---|---|---|
| GM12878 (15.1) | 2731 | 2171 | 3688 | 51 (76.47%) | 233 (66.52%, 0.86%, 1.23E-14) |
| HMEC (1.1) | 1761 | 1713 | 2157 | 33 (87.88%) | 88 (59.09%, 2.27%, 0) |
| HUVEC (0.9) | 751 | 650 | 835 | 8 (100.0%) | 5 (60.0%, 0, 0) |
| IMR90 (1.7) | 2344 | 2137 | 3226 | 53 (71.7%) | 116 (59.48%, 7.76%, 0) |
| K562 (1.3) | 2096 | 1942 | 2972 | 48 (83.33%) | 144 (56.25%, 6.25%, 3.33E-16) |
| KBM7 (1.2) | 6278 | 5970 | 7862 | 78 (53.85%) | 264 (42.8%, 8.33%, 1.25E-14) |
| NHEK (1.3) | 1160 | 1018 | 1313 | 18 (88.89%) | 28 (89.29%, 7.14%, 4.44E-16) |
The sequencing depth, in billions, is given in parentheses after each cell line name in the first column. The percentage in the second-to-last column is the percentage of motifs in a cell line that were also identified in other cell lines. The four numbers in the last column are, respectively, the number of predicted motif pairs, the percentage of predicted motif pairs in a cell line that were also identified in other cell lines, the percentage of random motif pairs in a cell line that were also identified in other cell lines, and the p-value of the number of predicted motif pairs in a cell line that were identified in other cell lines.
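The "identified in other cell lines" percentages described in this note can be illustrated with a minimal sketch. It is not the EPmotifPair implementation: the function name `shared_fraction`, the toy motif names, and the representation of a motif pair as an (enhancer motif, promoter motif) tuple are all assumptions made for illustration.

```python
def shared_fraction(per_cell_line, cell_line):
    """Fraction of items found in `cell_line` that also occur in at least one other cell line.

    `per_cell_line` maps a cell line name to a set of hashable items
    (motifs, or motif pairs stored as tuples).
    """
    own = per_cell_line[cell_line]
    if not own:
        return 0.0
    # Pool everything predicted in the remaining cell lines.
    others = set().union(
        *(items for name, items in per_cell_line.items() if name != cell_line)
    )
    return len(own & others) / len(own)


# Hypothetical toy data; each pair is (enhancer motif, promoter motif).
toy_pairs = {
    "lineA": {("m1", "m2"), ("m3", "m4"), ("m5", "m6")},
    "lineB": {("m1", "m2"), ("m7", "m8")},
    "lineC": {("m3", "m4")},
}

for name in toy_pairs:
    print(name, f"{100 * shared_fraction(toy_pairs, name):.2f}%")
```

The same function applies to single motifs if each set holds motif identifiers instead of tuples. Deciding when two predicted motifs count as the same motif would need a motif-similarity criterion, which this sketch leaves out.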
Figure 2: The predicted motif pairs are enriched with known interacting TF pairs.
EP motif pair comparison with EN-CODEC.
| Cell line | Method | % Predicted motif pairs shared with EN-CODEC | % TF pairs in EN-CODEC identified |
|---|---|---|---|
| GM12878 | Based on all TFs | 64/75 = 85.33% | 87/1379 = 6.31% |
| GM12878 | Based on unique TFs | 51/66 = 77.27% | 50/1379 = 3.63% |
| K562 | Based on all TFs | 25/25 = 100.00% | 490/4390 = 11.16% |
| K562 | Based on unique TFs | 22/22 = 100.00% | 237/4390 = 5.40% |
‘Based on all TFs’ is the result based on all TFs whose motifs are similar to each predicted motif (STAMP E-value < 1E-05). ‘Based on unique TFs’ is the result based on the single TF whose motif is most similar to each predicted motif (STAMP E-value < 1E-05).
The accuracy of motif pairs in distinguishing positive EP pairs from three types of negative EP pairs based on lasso.
| Cell line | 1st type | 2nd type | 3rd type | #Selected motif pairs | % Selected motif pairs shared |
|---|---|---|---|---|---|
| GM12878 | (0.91, 0.92, 0.92) | (0.86, 0.76, 0.80) | (0.69, 0.87, 0.77) | (78, 96, 70) | 43/70 = 61.43% |
| HMEC | (0.90, 0.90, 0.90) | (0.85, 0.72, 0.78) | (0.52, 0.99, 0.68) | (66, 58, 36) | 26/36 = 72.22% |
| HUVEC | (0.83, 0.88, 0.85) | (0.67, 0.79, 0.70) | (0.51, 1.00, 0.67) | (5, 5, 5) | 5/5 = 100.00% |
| IMR90 | (0.91, 0.92, 0.92) | (0.91, 0.79, 0.84) | (0.50, 0.99, 0.67) | (56, 86, 43) | 18/43 = 41.86% |
| K562 | (0.91, 0.90, 0.91) | (0.87, 0.75, 0.81) | (0.50, 0.97, 0.66) | (71, 102, 53) | 25/53 = 47.17% |
| KBM7 | (0.91, 0.89, 0.90) | (0.90, 0.72, 0.80) | (0.59, 0.90, 0.71) | (107, 108, 53) | 16/17 = 94.12% |
| NHEK | (0.89, 0.90, 0.89) | (0.65, 0.78, 0.70) | (0.51, 0.99, 0.67) | (23, 24, 17) | 16/40 = 40.00% |
| Average | (0.89, 0.90, 0.90) | (0.82, 0.76, 0.78) | (0.55, 0.96, 0.69) | (58, 68, 40) | 55.46% |
The three numbers in each of the 2nd to 4th columns are the precision, recall, and F1 score. The second-to-last column gives the number of motif pairs selected by lasso when distinguishing positives from each of the three types of negatives, in order. The last column shows the percentage of the motif pairs selected by lasso with the third type of negatives that were selected in multiple cell lines.
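For readers checking the table, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it for three rows of the third type of negatives, assuming the standard definition. The helper name `f1_score` is ours, and because the tabulated precision and recall are already rounded to two decimals, a recomputed value may differ from a reported one by about 0.01.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the 3rd type of negatives, copied from the table.
third_type = {
    "GM12878": (0.69, 0.87, 0.77),
    "HMEC": (0.52, 0.99, 0.68),
    "KBM7": (0.59, 0.90, 0.71),
}

for cell_line, (precision, recall, reported) in third_type.items():
    recomputed = f1_score(precision, recall)
    print(f"{cell_line}: recomputed F1 = {recomputed:.2f}, reported F1 = {reported:.2f}")
```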
Figure 3: (A) The 423 motif pairs discovered in a network. (B) The TF pair GATA1-ZNF423 in the network. (C) The TF pair EBF1-ZNF143 in the network.