Gaofeng Pan, Jijun Tang, Fei Guo.
Abstract
Transcription factors (TFs), which bind to specific DNA sequences or motifs, are fundamental to the regulation of transcription. A gene is typically regulated by a combination of TFs in close proximity, so the analysis of co-associated TFs (co-TFs) is an important problem in understanding the mechanism of transcriptional regulation. ChIP-seq mapping of TF binding now provides a large amount of experimental data for analyzing co-TFs. Several studies show that if two TFs are co-associated, the relative distance between the TFs exhibits a peak-like distribution. To analyze co-TFs, we develop a novel method for evaluating the association between TFs. We design an adjacency score based on ordered differences, which reflects co-TF binding affinities for motif analysis. For all candidate motifs, we calculate the corresponding adjacency scores and then list the motifs in descending order of score. From these lists, we can identify co-TFs for the candidate motifs. On ChIP-seq datasets, our method obtains the best AUC results on five datasets: 0.9432 for NMYC, 0.9109 for KLF4, 0.9006 for ZFX, 0.8892 for ESRRB, and 0.8920 for E2F1. Our method is highly stable on large-sample datasets, and its AUC results on all datasets are above 0.8.
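The ranking and AUC evaluation described in the abstract can be illustrated with a minimal sketch. The adjacency score itself is not defined in this record, so the scores below are placeholder numbers; the motif names, the `known_cofactors` set, and the `rank_auc` helper are all hypothetical and only show how a descending-order motif list is turned into an AUC value.

```python
from typing import Sequence


def rank_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive item has the higher score; ties count as half a win."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Placeholder scores standing in for adjacency scores (assumed values).
motif_scores = {
    "motif_A": 0.91,
    "motif_B": 0.40,
    "motif_C": 0.75,
    "motif_D": 0.10,
    "motif_E": 0.55,
}
# Hypothetical set of motifs treated as true co-factors for evaluation.
known_cofactors = {"motif_A", "motif_C", "motif_D"}

# List motifs in descending order of score, as the abstract describes.
ranked = sorted(motif_scores, key=motif_scores.get, reverse=True)

auc = rank_auc(
    [motif_scores[m] for m in ranked],
    [m in known_cofactors for m in ranked],
)
print(ranked)
print(round(auc, 4))
```

An AUC of 1.0 would mean every true co-factor motif ranks above every other motif, while 0.5 corresponds to a ranking no better than random.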
Year: 2017 PMID: 28240320 PMCID: PMC5327392 DOI: 10.1038/srep43597
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The sequence-specific binding information of motif V$MYOD_01 in the TRANSFAC database.
(a) Position Weight Matrix of V$MYOD_01. (b) Sequence Logo of V$MYOD_01.
Figure 2. Extracting sequences and creating matrix M on the NANOG dataset.
Figure 3. Comparison of sequence-specific binding scores for two motifs on c-Myc.
(a) Sequence-specific binding scores in each bin for motif V$E2F_03. (b) Sequence-specific binding scores in each bin for motif V$OCT1_07.
Figure 4. Comparison of f1 scores for two motifs on c-Myc.
Figure 5. Comparison of f2 scores for two motifs on c-Myc.
Figure 6. Comparison of our method, CENTDIST, CEAS, and CORE_TF on seven large-sample ChIP-seq datasets in ES cells.
AUC results of our method, CENTDIST, CEAS, and CORE_TF on three ChIP-seq datasets with more than 20000 peak points.
| Our method | CENTDIST | CORE_TF promBG | CORE_TF randBG | CEAS | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 400 | 1000 | 200 | 400 | 1000 | 200 | 400 | 1000 | |||
| ESRRB | 0.7869 | 0.6373 | 0.6627 | 0.6065 | 0.5359 | 0.5451 | 0.6183 | 0.6203 | 0.6072 | 0.6111 | |
| E2F1 | 0.8761 | 0.8202 | 0.7966 | 0.7758 | 0.8076 | 0.7862 | 0.7303 | 0.5789 | 0.5625 | 0.5746 | |
| TCFCP | 0.8437 | 0.6889 | 0.6719 | 0.5386 | 0.6627 | 0.6484 | 0.6641 | 0.6333 | 0.6144 | 0.6105 | |
(a) CORE_TF promBG, under the promoter background, uses enriched regions of size 200, 400 and 1000.
(b) CORE_TF randBG, under the random genome background, uses enriched regions of size 200, 400 and 1000.
(c) CEAS uses enriched regions of size 200, 400 and 1000.
AUC results of our method, CENTDIST, CEAS, and CORE_TF on four ChIP-seq datasets with more than 5000 and fewer than 20000 peak points.
| Our method | CENTDIST | CORE_TF promBG | CORE_TF randBG | CEAS | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 400 | 1000 | 200 | 400 | 1000 | 200 | 400 | 1000 | |||
| NMYC | 0.8889 | 0.8052 | 0.7915 | 0.7627 | 0.7922 | 0.7719 | 0.7418 | 0.7255 | 0.6137 | 0.6039 | |
| NANOG | 0.9148 | 0.9320 | 0.9399 | 0.9020 | 0.9255 | 0.9046 | 0.8327 | 0.8386 | 0.8510 | 0.7268 | |
| KLF4 | 0.8550 | 0.7075 | 0.7058 | 0.6908 | 0.7058 | 0.6950 | 0.6813 | 0.6708 | 0.6883 | 0.6021 | |
| ZFX | 0.8758 | 0.8353 | 0.8248 | 0.7732 | 0.8288 | 0.8013 | 0.7190 | 0.6327 | 0.5137 | 0.5137 | |
(a) CORE_TF promBG, under the promoter background, uses enriched regions of size 200, 400 and 1000.
(b) CORE_TF randBG, under the random genome background, uses enriched regions of size 200, 400 and 1000.
(c) CEAS uses enriched regions of size 200, 400 and 1000.
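Summary of AUC values of our method, CENTDIST, CEAS, and CORE_TF on seven ChIP-seq datasets, grouped by number of peak points. For each size range, the first row gives the mean AUC and the second row the standard deviation.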
| CORE_TF promBG | CORE_TF randBG | CEAS | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Our method | CENTDIST | 200 | 400 | 1000 | 200 | 400 | 1000 | 200 | 400 | 1000 | ||
| [20000, ∞) | 0.8567 | 0.7155 | 0.7104 | 0.6403 | 0.6687 | 0.6599 | 0.6709 | 0.6108 | 0.5947 | 0.5987 | ||
| 0.0624 | 0.0943 | 0.0748 | 0.1222 | 0.1360 | 0.1210 | 0.0563 | 0.0284 | 0.0281 | 0.0209 | |||
| [5000, 20000] | 0.8974 | 0.8200 | 0.8155 | 0.7822 | 0.8131 | 0.7932 | 0.7437 | 0.7169 | 0.6667 | 0.6116 | ||
| 0.0503 | 0.0925 | 0.0969 | 0.0879 | 0.0910 | 0.0867 | 0.0644 | 0.0896 | 0.1422 | 0.0876 | |||
| [5000, ∞) | 0.8800 | 0.7752 | 0.7705 | 0.7214 | 0.7512 | 0.7361 | 0.7125 | 0.6714 | 0.6358 | 0.6061 | ||
| 0.0551 | 0.1018 | 0.0986 | 0.1208 | 0.1275 | 0.1171 | 0.0681 | 0.0866 | 0.1089 | 0.0635 | |||
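The paired rows in the summary table are consistent with a mean and a sample standard deviation taken over the per-dataset AUC values. The sketch below checks this for the second numeric column of the per-dataset tables, whose values reproduce the second numeric column of the summary rows. The column is referred to by position only, because the flattened headers do not line up unambiguously with the data cells; the `summarize` helper and the variable names are illustrative assumptions.

```python
from statistics import mean, stdev

# Second numeric column of the per-dataset tables, copied verbatim.
large = {"ESRRB": 0.6373, "E2F1": 0.8202, "TCFCP": 0.6889}       # > 20000 peak points
medium = {"NMYC": 0.8052, "NANOG": 0.9320, "KLF4": 0.7075, "ZFX": 0.8353}  # 5000 to 20000


def summarize(values):
    """Return (mean, sample standard deviation), rounded to 4 decimals."""
    vals = list(values)
    return round(mean(vals), 4), round(stdev(vals), 4)


large_summary = summarize(large.values())
medium_summary = summarize(medium.values())
all_summary = summarize(list(large.values()) + list(medium.values()))

print("[20000, inf):  ", large_summary)
print("[5000, 20000]: ", medium_summary)
print("[5000, inf):   ", all_summary)
```

Note that `statistics.stdev` uses the sample (n - 1) denominator; the population form `statistics.pstdev` would give smaller values than those listed in the table.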
Figure 7. Stability evaluation of our method, CENTDIST, CEAS, and CORE_TF on seven ChIP-seq datasets.