Chun-Xiao Sun, Yu Yang, Hua Wang, Wen-Hu Wang.
Abstract
Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To discover TFBSs effectively and efficiently in the thousand or more DNA sequences of a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and then filter the cluster subsets. Then, we apply Affinity Propagation (AP) clustering to the candidate cluster subsets to find the potential motifs. Experimental results on simulated data show that AP-ChIP predicts TFBSs with high accuracy in a reasonable time. The validity of AP-ChIP is also tested on a real ChIP-Seq data set.
Keywords: ChIP-Seq; motif discovery; planted motif search; transcription factor binding sites
Year: 2019 PMID: 33267515 PMCID: PMC7515331 DOI: 10.3390/e21080802
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
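The clustering step named in the abstract can be illustrated on a toy input. The sketch below is a plain implementation of Affinity Propagation message passing, not the AP-ChIP pipeline: it omits the two probabilistic thresholds, and the similarity measure (negative Hamming distance), the median preference, the damping factor, and the example k-mers are all assumptions chosen for illustration.

```python
from statistics import median

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def affinity_propagation(S, damping=0.5, iters=200):
    """Affinity Propagation on a similarity matrix S; S[k][k] holds the preferences."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            top = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != top)
            for k in range(n):
                new = S[i][k] - (second if k == top else vals[top])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k) for i' not in {i, k})
        # a(k, k) = sum of positive r(i', k) for i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # the self-responsibility is not clipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    labels = []
    for i in range(n):
        if i in exemplars or not exemplars:
            labels.append(i)  # exemplars (or every point, if none emerged) label themselves
        else:
            labels.append(max(exemplars, key=lambda e: S[i][e]))
    return exemplars, labels

# Hypothetical toy data: two 8-mers, each followed by three single-substitution variants.
kmers = [
    "ACGTACGT", "TCGTACGT", "ACGAACGT", "ACGTACTT",
    "TTGGCCAA", "GTGGCCAA", "TTGACCAA", "TTGGCCTA",
]
n = len(kmers)
S = [[-float(hamming(a, b)) for b in kmers] for a in kmers]
pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
for k in range(n):
    S[k][k] = pref  # assumed preference: median off-diagonal similarity

exemplars, labels = affinity_propagation(S)
for kmer, label in zip(kmers, labels):
    print(kmer, "-> exemplar", kmers[label])
```

Each exemplar can be read as a candidate motif for its cluster. A lower preference yields fewer clusters, so in practice the preference would need tuning to the data.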
Values of and under different values of k for the (18, 5) problem instance.
| k | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| | | | 0.0012 | 0.0054 | 0.0193 | 0.0569 |
| | 0.2303 | 0.4741 | 0.7414 | 0.9242 | 0.9915 | 1 |
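The four surviving entries of the first data row (0.0012, 0.0054, 0.0193, 0.0569) agree with the probability that two random 18-mers differ in at most k positions for k = 7, 8, 9, 10, assuming independent, uniformly distributed bases. The sketch below computes that binomial tail. The function name `prob_within` and the background model are assumptions for illustration; the row label is lost in this record, so the authors' exact definition cannot be confirmed here.

```python
from math import comb

def prob_within(l: int, k: int) -> float:
    """P(Hamming distance <= k) for two random l-mers with i.i.d. uniform bases."""
    # Each position mismatches with probability 3/4, so the distance is
    # Binomial(l, 3/4); the sum is kept in integers until the final division.
    return sum(comb(l, i) * 3**i for i in range(k + 1)) / 4**l

for k in range(5, 11):
    print(k, f"{prob_within(18, k):.2e}")
```

With k = 2d, the same tail gives 0.0028 for the (12, 2) instance and 0.1146 for (11, 3), which match the unlabeled second column of the comparison tables. This suggests, as an inference from the numbers only, that the column reports the chance that two background l-mers look as similar as two motif instances.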
Comparisons on problem instances with , , and .
| Instance | | MEME | VINE | Projection | AP-ChIP |
|---|---|---|---|---|---|
| (12, 2) | 0.0028 | 0.68 (4 s) | 1.00 (8 s) | 0.86 (10 s) | 0.98 (18 s) |
| (15, 3) | 0.0042 | 0.73 (7 s) | 1.00 (9 s) | 0.82 (1.3 m) | 1.00 (23 s) |
| (15, 4) | 0.0566 | 0.87 (8 s) | 0.96 (5.6 m) | 0.89 (4.2 m) | 0.97 (36 s) |
| (14, 4) | 0.1117 | 0.84 (10 s) | 0.95 (8.3 m) | 0.80 (27.4 m) | 0.96 (47 s) |
| (25, 8) | 0.1494 | 0.91 (12 s) | 0.93 (9.8 m) | 0.78 (32.6 m) | 0.94 (1.1 m) |
| (21, 7) | 0.2564 | 0.87 (28 s) | 0.92 (11.2 m) | 0.76 (48.7 m) | 0.91 (58 s) |
Comparisons on problem instances with , , .
| Instance | | MEME | VINE | Projection | AP-ChIP |
|---|---|---|---|---|---|
| (12, 3) | 0.0540 | 0.94 (8 s) | 0.96 (2.4 m) | 0.91 (3.1 m) | 0.97 (21 s) |
| (11, 3) | 0.1146 | 0.86 (10 s) | 0.95 (5.2 m) | 0.84 (4.3 m) | 0.96 (33 s) |
| (13, 4) | 0.2060 | 0.83 (10 s) | 0.93 (8.1 m) | 0.78 (36.4 m) | 0.95 (36 s) |
| (15, 5) | 0.3135 | 0.78 (11 s) | 0.84 (9.6 m) | 0.74 (46.7 m) | 0.93 (34 s) |
| (17, 6) | 0.4261 | 0.70 (13 s) | 0.83 (18.6 m) | 0.72 (53.6 m) | 0.92 (38 s) |
| (19, 7) | 0.5346 | 0.68 (17 s) | 0.75 (24.5 m) | 0.70 (1.2 h) | 0.90 (40 s) |
The results on problem instances with , .
| Instance | Time | Predicted Motif | Published Motif |
|---|---|---|---|
| (9, 2) | 43 s | TTATCCCTC | TTATCCCTC |
| (12, 3) | 34 s | TTTCCCGTCTGC | CTTTCCCGTCTG |
| (15, 4) | 42 s | GGTTGRAGCTTAGGG | GGTTGGAGCTTAGGG |
| (18, 5) | 38 s | CTTTGCCATATCCATAGG | TTTGCCATATCCATAGGC |
| (21, 6) | 36 s | CAGGTAAACCATATTAAATTA | AGGTAAACCATATTAAATTAC |
R denotes A or G (IUPAC code).
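Three of the five predicted motifs in this table equal the published motif offset by one position, and one contains the degenerate code R. The sketch below shows one way to compare a predicted motif with a published one under both effects: it tries every ungapped offset and counts IUPAC-compatible positions. The helper names `matches` and `best_shift` are hypothetical, and this is not the evaluation procedure used in the paper.

```python
# IUPAC nucleotide codes mapped to the bases they cover (subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "N": "ACGT",
}

def matches(code: str, base: str) -> bool:
    """True if a (possibly degenerate) code covers a concrete base."""
    return base in IUPAC[code]

def best_shift(predicted: str, published: str):
    """Return (shift, matched, overlap) for the ungapped offset with most matches.

    predicted[i] is aligned with published[i + shift]; ties are broken in
    favour of the smallest absolute shift.
    """
    results = []
    for s in range(-(len(predicted) - 1), len(published)):
        pairs = [
            (predicted[i], published[i + s])
            for i in range(len(predicted))
            if 0 <= i + s < len(published)
        ]
        matched = sum(matches(c, b) for c, b in pairs)
        results.append((matched, -abs(s), s, len(pairs)))
    matched, _, s, overlap = max(results)
    return s, matched, overlap

rows = [
    ("TTATCCCTC", "TTATCCCTC"),
    ("TTTCCCGTCTGC", "CTTTCCCGTCTG"),
    ("GGTTGRAGCTTAGGG", "GGTTGGAGCTTAGGG"),
    ("CTTTGCCATATCCATAGG", "TTTGCCATATCCATAGGC"),
    ("CAGGTAAACCATATTAAATTA", "AGGTAAACCATATTAAATTAC"),
]
for predicted, published in rows:
    print(predicted, best_shift(predicted, published))
```

For every row, all overlapping positions agree at the best offset, so the differences between predicted and published motifs are a one-base shift or the R code.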
Comparison of problem instances with , , and .
| Instance | | MEME-ChIP | ChIP-Munk | FMotif | AP-ChIP |
|---|---|---|---|---|---|
| (9, 2) | 0.049 | 0.96 (12 s) | 0.96 (1.8 m) | 1.00 (47 s) | 1.00 (43 s) |
| (11, 3) | 0.114 | 0.94 (24 s) | 0.92 (2.0 m) | 0.99 (7.9 m) | 0.98 (46 s) |
| (13, 4) | 0.205 | 0.90 (38 s) | 0.83 (2.4 m) | 0.98 (1.45 h) | 0.93 (58 s) |
| (15, 5) | 0.319 | 0.85 (42 s) | 0.80 (8.2 m) | – | 0.92 (1.1 m) |
| (17, 6) | 0.426 | 0.80 (45 s) | 0.78 (9.6 m) | – | 0.89 (1.3 m) |
| (19, 7) | 0.534 | 0.78 (48 s) | 0.76 (10.7 m) | – | 0.87 (1.6 m) |
Figure 1. Prediction accuracy for different values of .
Results on the mESC data set.
| Data Set (Seq #) | Time | Predicted Motif | Published Motif |
|---|---|---|---|
| c-Myc (3422) | 125 s | | |
| CTCF (39609) | 19 s | | |
| Esrrb (21647) | 10 s | | |
| Klf4 (10875) | 138 s | | |
| Nanog (10343) | 12 s | | |
| n-Myc (7182) | 36 s | | |
| Oct4 (3761) | 48 s | | |
| Smad1 (1126) | 12 s | | |
| Sox2 (4525) | 15 s | | |
| STAT3 (2546) | 27 s | | |
| Tcfcp2l1 (26910) | 11 s | | |
| Zfx (10338) | 68 s | | |
Results on the ENCODE dataset.
| Data Set (Seq #) | AP-ChIP Predicted Motif | MEME-ChIP Predicted Motif | Published Motif |
|---|---|---|---|
| Nfyb (10096) | | | |
| Hnf4 (11045) | | | |
| Elf1 (8611) | | | |
| Ets (5525) | | | |
| Egr1 (15400) | | | |
| Yy1 (2077) | | | |
| Six5 (4664) | | | |
| Srf (4903) | | | |
| Tal1 (25507) | | | |