Abstract
Genome-wide profiling of DNA-binding proteins using ChIP-Seq has emerged as an alternative to ChIP-chip methods. ChIP-Seq offers many advantages over ChIP-chip arrays, including less noise, higher resolution, and greater coverage. Several algorithms have been developed to exploit these advantages and find enriched regions in ChIP-Seq data. However, the complexity of the various patterns of ChIP-Seq signals still calls for new algorithms. Most current algorithms rely on heuristics to detect regions accurately, and despite the many formulations available, it remains difficult to determine accurately the individual peaks corresponding to each binding event. We developed Constrained Multi-level Thresholding (CMT), an algorithm for detecting enriched regions in ChIP-Seq data. CMT employs a constraint-based module that can target regions within a specific range. We show that CMT has higher accuracy in detecting enriched regions (peaks) by objectively assessing its performance relative to previously proposed peak finders. This is shown by testing three algorithms on the well-known FoxA1 dataset, on four transcription factors (with a total of six antibodies) for Drosophila melanogaster, and on the H3K4ac antibody dataset.
Year: 2014 PMID: 24736605 PMCID: PMC3988018 DOI: 10.1371/journal.pone.0093873
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. A detected region from the FoxA1 dataset for chromosome 1.
The x-axis corresponds to the genome position in bp and the y-axis corresponds to the number of reads.
Binding motifs corresponding to each dataset.
| FoxA1 | CAD | GT | HB | KR |
| TGCATG | TTTATTG, TTTATGA | TTACGTAA | TTTTTT | GANGGGT, AANGGGT |
Figure 2. Venn diagrams corresponding to all datasets.
Each Venn diagram shows the number of regions detected by CMT, MACS and T-PIC in each dataset, along with the number of regions detected by each pair of methods and by all three methods.
Percentage of common peaks detected by each method included in the comparison and related to each protein of interest.
| | CMT | T-PIC | MACS |
| | | | |
| CMT | 100 | 79.8 | 50.8 |
| T-PIC | 96.7 | 100 | 59.8 |
| MACS | 95.1 | 92.3 | 100 |
| | | | |
| CMT | 100 | 41.9 | 24.4 |
| T-PIC | 72.7 | 100 | 47.3 |
| MACS | 79.0 | 88.4 | 100 |
| | | | |
| CMT | 100 | 66.1 | 16.9 |
| T-PIC | 70.1 | 100 | 18.0 |
| MACS | 68.9 | 69.0 | 100 |
| | | | |
| CMT | 100 | 82.9 | 93.1 |
| T-PIC | 74.0 | 100 | 97.8 |
| MACS | 66.1 | 77.7 | 100 |
| | | | |
| CMT | 100 | 85.3 | 64.2 |
| T-PIC | 73.4 | 100 | 55.6 |
| MACS | 66.7 | 67.1 | 100 |
| | | | |
| CMT | 100 | 54.0 | 28.2 |
| T-PIC | 73.6 | 100 | 44.7 |
| MACS | 76.4 | 88.7 | 100 |
| | | | |
| CMT | 100 | 54.5 | 34.4 |
| T-PIC | 74.2 | 100 | 54.8 |
| MACS | 76.7 | 89.6 | 100 |
| | | | |
| CMT | 100 | 16.1 | N/A |
| T-PIC | 16.7 | 100 | N/A |
| MACS | N/A | N/A | N/A |
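A pairwise percentage of this kind can be computed from two peak lists by interval overlap. The sketch below assumes peaks are `(chrom, start, end)` tuples with half-open coordinates and that a peak counts as "common" when it overlaps at least one peak from the other method by one base or more; the paper's exact matching rule is not stated here, so both the criterion and the name `percent_shared` are assumptions.

```python
def percent_shared(peaks_a, peaks_b):
    """Percentage of peaks in A that overlap at least one peak in B.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    """
    # Group B's intervals by chromosome so only same-chromosome peaks are compared.
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    shared = 0
    for chrom, start, end in peaks_a:
        candidates = by_chrom.get(chrom, [])
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            shared += 1
    return 100.0 * shared / len(peaks_a) if peaks_a else 0.0

# Made-up peak lists for illustration only.
method_a = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
method_b = [("chr1", 250, 400), ("chr2", 500, 600)]
print(percent_shared(method_a, method_b))  # share of A's peaks also found by B
print(percent_shared(method_b, method_a))  # share of B's peaks also found by A
```

The measure is not symmetric, because each direction divides by a different peak count. This is consistent with the table, where the entry for a pair of methods differs depending on which method is the row.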
Peak number, length and score comparison.
| Dataset | Method of Comparison | CMT | T-PIC | MACS |
| FoxA1 | Mean length of peaks | 277 | 303 | 373 |
| | Enrichment ratio | 2.39 | 2.42 | 1.83 |
| CAD | Mean length of peaks | 476 | 818 | 507 |
| | Enrichment ratio | 0.92 | 0.88 | 0.93 |
| GT | Mean length of peaks | 303 | 866 | 194 |
| | Enrichment ratio | 4.21 | 1.98 | 3.02 |
| HB1 | Mean length of peaks | 365 | 920 | 429 |
| | Enrichment ratio | 2.03 | 1.57 | 1.80 |
| HB2 | Mean length of peaks | 343 | 891 | 228 |
| | Enrichment ratio | 2.11 | 1.56 | 1.99 |
| KR1 | Mean length of peaks | 517 | 728 | 492 |
| | Enrichment ratio | 1.91 | 1.83 | 1.95 |
| KR2 | Mean length of peaks | 513 | 737 | 500 |
| | Enrichment ratio | 1.94 | 1.75 | 2.10 |
Comparison between CMT, MACS and T-PIC based on the number and mean length of detected peaks and enrichment score.
Length and enrichment score comparison.
| | CMT | T-PIC | MACS |
| Mean length of peaks | 220 | 421 | 337 |
| Enrichment ratio | 2.74 | 2.92 | 1.67 |
Comparison between CMT, MACS and T-PIC based on the average length of detected peaks and the enrichment score on the FoxA1 dataset.
Conceptual comparison of recently proposed methods for finding peaks in ChIP-Seq data.
| Method | Peak selection criteria | Peak ranking | Parameters |
| GLITR | | Peak height and fold enrichment | Target FDR, number of nearest neighbors for clustering |
| MACS | Local region Poisson | | |
| PeakSeq | Local region binomial | | Target FDR |
| Quest v2.3 | Height threshold, background ratio | | KDE bandwidth, peaks height, sub-peak valley depth, ratio to background |
| SICER v1.02 | | | Window length, gap size, FDR (with control) or |
| SiSSRs v1.4 | | | FDR, |
| T-PIC | Local height threshold | | average fragment length, significance |
| Qeseq | Local enrichment significance | | no parameter |
| CMT | Height threshold and volume difference | fold enrichment | average fragment length, minimum and maximum region size, cut-off, minimum supported reads |
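The CMT row lists a cut-off together with minimum and maximum region sizes as parameters. As a rough illustration of how a size constraint can be combined with a height threshold, the sketch below applies a single fixed cut-off to a per-base coverage vector and keeps only the runs whose length falls inside the allowed range. It is not the CMT algorithm itself, which selects its thresholds by multi-level thresholding; the function and parameter names are hypothetical.

```python
def call_regions(coverage, cutoff, min_size, max_size):
    """Return half-open (start, end) runs where coverage >= cutoff,
    keeping only runs whose length lies in [min_size, max_size]."""
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= cutoff and start is None:
            start = pos                      # a run above the cut-off begins
        elif depth < cutoff and start is not None:
            regions.append((start, pos))     # the run ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(coverage)))  # run reaches the end of the vector

    # Size constraint: drop runs that are too short or too long.
    return [(s, e) for s, e in regions if min_size <= e - s <= max_size]

# Made-up read-depth profile for illustration only.
coverage = [0, 1, 5, 6, 7, 1, 0, 9, 0, 4, 4, 4, 4, 4, 4, 0]
regions = call_regions(coverage, cutoff=4, min_size=3, max_size=5)
print(regions)
```

In this toy profile the single-base spike is rejected as too short and the six-base plateau as too long, so only the three-base run survives.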
Figure 3. Schematic diagram of the pipeline for finding significant peaks.
Figure 4. An example of finding the threshold using the CMT algorithm.
Area under curve (AUC) comparison between CMT, MACS and T-PIC, based on the number of false positive (FP) and true positive (TP) detected peaks.
| | CMT | T-PIC | MACS |
| AUC | 0.856 | 0.794 | 0.712 |
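An AUC value of this kind is commonly obtained by integrating the ROC curve with the trapezoidal rule. The following is a minimal sketch under that assumption; the paper's exact procedure is not given here, and the ROC points in the example are invented for illustration, not taken from the study.

```python
def roc_auc(points):
    """Area under a ROC curve given (false_positive_rate, true_positive_rate) points."""
    pts = sorted(points)  # order by false positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Trapezoid between two neighbouring points on the curve.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical ROC points, including the (0, 0) and (1, 1) end points.
example_points = [(0.0, 0.0), (0.2, 0.6), (0.5, 0.8), (1.0, 1.0)]
auc = roc_auc(example_points)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to the diagonal (random ranking) and 1.0 to a perfect ranking.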
True positive and true negative peak comparison.
| | CMT | T-PIC | MACS |
| TP | 14 | 13 | 12 |
| TN | 0 | 0 | 0 |
The comparison of CMT, MACS and T-PIC is based on the number of true positive (TP) and true negative (TN) detected peaks.
Figure 5. One of the true positive regions located in chromosome 3 of the FoxA1 dataset.
The red lines show the actual location of the previously verified true positive region. The x-axis corresponds to the genome position in bp and the y-axis corresponds to the number of reads. The peak is detected by CMT but not by T-PIC or MACS.
Comparison of CMT, MACS and T-PIC, based on the percentage of detected regions that are associated with different genomic features.
| Method | Number of Regions | Genes: Regions | Genes: % | Exons: Regions | Exons: % | Introns: Regions | Introns: % | Promoters: Regions | Promoters: % | Inter-genetic Regions: Regions | Inter-genetic Regions: % |
| MACS | 14,026 | 12,249 | 87.3 | 967 | 6.9 | 12,438 | 88.7 | 676 | 4.8 | 7,338 | 52.3 |
| T-PIC | 21,662 | 19,041 | 87.9 | 1,721 | 7.9 | 18,731 | 86.5 | 934 | 4.3 | 10,989 | 50.7 |
| CMT | 26,253 | 23,311 | 88.8 | 2,231 | 8.5 | 22,143 | 84.3 | 1,226 | 4.7 | 13,053 | 49.7 |
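Each percentage in the table is the feature's region count divided by the method's total number of regions. The short check below recomputes the Genes column from the counts listed above; the variable and function names are illustrative only.

```python
# (total regions, regions associated with genes), copied from the table.
gene_counts = {
    "MACS": (14026, 12249),
    "T-PIC": (21662, 19041),
    "CMT": (26253, 23311),
}

def percent_of_regions(total, associated):
    """Percentage of detected regions associated with a feature, to one decimal."""
    return round(100.0 * associated / total, 1)

gene_percentages = {
    method: percent_of_regions(total, associated)
    for method, (total, associated) in gene_counts.items()
}
print(gene_percentages)
```

The percentages in a row add up to more than 100, which suggests that the feature categories are not mutually exclusive and that a region can be counted under several of them.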
Comparison of CMT, MACS and T-PIC, based on the percentage of regions detected by one method and not by the others.
| Method | Genes | Exons | Introns | Promoters | Inter-genetic Regions |
| MACS | 70.5% | 7.5% | 71.4% | 3.8% | 57.4% |
| T-PIC | 67.7% | 9.8% | 68.4% | 2.8% | 57.5% |
| CMT | 89.1% | 10.2% | 68.5% | 4.3% | 47.2% |
Figure 6. Comparison between CMT, MACS and T-PIC based on the FDR rate and number of peaks.
Figure 7. ROC curve corresponding to CMT, T-PIC and MACS.