Zhaohui S Qin, Jianjun Yu, Jincheng Shen, Christopher A Maher, Ming Hu, Shanker Kalyana-Sundaram, Jindan Yu, Arul M Chinnaiyan.
Abstract
BACKGROUND: Protein-DNA interaction constitutes a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism has been a daunting task due to the difficulty in characterizing protein-bound DNA on a large scale. A powerful technique has recently emerged that couples chromatin immunoprecipitation (ChIP) with next-generation sequencing (ChIP-Seq). This technique provides a direct survey of the cistrome of transcription factors and other chromatin-associated proteins. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed to analyze the massive amount of data generated by this method.
Year: 2010 PMID: 20598134 PMCID: PMC2912305 DOI: 10.1186/1471-2105-11-369
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of reproducibility from two-sample Kolmogorov-Smirnov tests performed on the STAT1 ChIP-Seq data*.
| Stimulated | |||||
|---|---|---|---|---|---|
| Lanes | 1:2 | 5:6 | 7:8 | ||
| Significant | 0 | 0 | 0 | ||
| Lanes | 2:5 | 3:4 | 7:8 | ||
| Significant | 1 | 1 | 0 | ||
| Lanes | 1:3 | 1:4 | 2:3 | 2:4 | 5:7 |
| Significant | 38 | 40 | 42 | 40 | 36 |
*Numbers displayed in the table are the counts of chromosome/strand combinations that show a significant discrepancy in the two-sample Kolmogorov-Smirnov test.
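The counting described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each lane is a hypothetical dictionary mapping a (chromosome, strand) pair to a list of read positions, uses the standard asymptotic approximation for the Kolmogorov-Smirnov p-value, and takes a significance threshold `alpha` of 0.05 with no multiple-testing correction. None of these choices is stated in the source.

```python
import math


def ks_two_sample(xs, ys):
    """Return (D, approximate p-value) for the two-sample KS test."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the
    # two empirical distribution functions.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    root = math.sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.3:
        # The series converges slowly here and the p-value is ~1 anyway.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, p))


def count_significant(lane_a, lane_b, alpha=0.05):
    """Count chromosome/strand combinations whose KS p-value is below alpha.

    lane_a and lane_b are hypothetical dicts: (chrom, strand) -> positions.
    """
    shared = lane_a.keys() & lane_b.keys()
    return sum(
        1 for key in shared
        if ks_two_sample(lane_a[key], lane_b[key])[1] < alpha
    )
```

Under these assumptions, calling `count_significant` on two lanes yields one table entry: a value near 0 suggests the lanes are reproducible, while a large value suggests they differ on most chromosome/strand combinations.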
Figure 1. Comparison of motif enrichment in peaks identified by ChIP-chip and ChIP-Seq. Chi-square test statistics from 2 × 2 contingency tables are shown for all 153 families of vertebrate TF binding motif patterns found in the MatBase library 7.0 database of Genomatix (Genomatix GmbH, Munich, Germany). Motif scans were performed using MatInspector in Genomatix with the default settings. A. STAT1 ChIP-chip result (covering about 10% of the entire genome, the majority (88%) on chromosomes 20, 21, 22, X and Y). B. STAT1 ChIP-Seq result (subset of 2,023 peaks out of 24,394, located on chromosomes 20, 21, 22, X and Y). C. Correlation between motif enrichment and rank of significance in peaks identified from STAT1 ChIP-Seq and ChIP-chip experiments. All peaks were ordered according to their significance and then divided into five segments of equal size. Motif enrichment in each of these five segments, measured by Chi-square test statistics, is shown from left to right.
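The enrichment statistic in this caption can be illustrated with a short sketch. It assumes the 2 × 2 table cross-classifies sequences by source (under peaks versus control) and by motif presence, and that no continuity correction is applied. The cell layout, the function names, and the example counts are hypothetical and not taken from the paper.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2 x 2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
        a = peak sequences with the motif,    b = peak sequences without
        c = control sequences with the motif, d = control sequences without
    No continuity correction is applied.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


def split_into_segments(ranked_peaks, k=5):
    """Split a significance-ordered list into k near-equal segments (panel C)."""
    n = len(ranked_peaks)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [ranked_peaks[bounds[i]:bounds[i + 1]] for i in range(k)]


# Made-up counts: 60 of 100 peak sequences and 30 of 100 control
# sequences contain the motif.
stat = chi_square_2x2(60, 40, 30, 70)
print(f"chi-square statistic: {stat:.2f}")
```

For panel C, one would call `split_into_segments` on the ranked peak list and compute the statistic separately within each segment.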
Summary of peaks identified by various peak-calling algorithms.
| | Peak Finder^a | MACS | HPeak^b | FindPeaks | HPeak | ChIPseeqer | SISSRs | CisGenome |
|---|---|---|---|---|---|---|---|---|
| NRSF | ||||||||
| chip: 1.7M | ||||||||
| mock: 2.3M | ||||||||
| Number of peaks | 1,935 | 4,679 | 4,404 | 3,445 | 4,085 | 2,361 | 5,243 | 2,545 |
| Covered space (kb) | 908 | 1,902 | 1,112 | 4,936 | 1,512 | 682 | 276 | 775 |
| Avg peak width (bp) | 469 | 406 | 253 | 1,433 | 370 | 289 | 53 | 304 |
| STAT1 | ||||||||
| stimulated: 15.3M | ||||||||
| unstimulated: 13.0M | ||||||||
| Number of peaks | - | 22,402 | 24,490 | 41,127 | 43,443 | 11,662 | 9,561 | 38,878 |
| Covered space (kb) | - | 16,940 | 6,562 | 46,781 | 15,354 | 3,025 | 455 | 10,012 |
| Avg peak width (bp) | - | 756 | 269 | 1,137 | 353 | 259 | 48 | 258 |
| H3K4me3 | ||||||||
| 16.8M | ||||||||
| Number of peaks | 28,960 | 27,568 | - | 33,890 | 41,217 | 31,773 | 137,286 | 46,261 |
| Covered space (kb) | 30,610 | 36,675 | - | 83,348 | 30,435 | 18,789 | 6,464 | 26,500 |
| Avg peak width (bp) | 1,057 | 1,330 | - | 2,459 | 738 | 591 | 47 | 573 |
| H3K27me3 | ||||||||
| 9.0M | ||||||||
| Number of peaks | 335 | 1,342 | - | 8,348 | 4,858 | 417 | 2,458 | 437 |
| Covered space (kb) | 83 | 607 | - | 19,234 | 894 | 115 | 138 | 191 |
| Avg peak width (bp) | 248 | 452 | - | 2,304 | 184 | 276 | 56 | 436 |
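As a reading aid, the three rows for each dataset appear to be related by simple arithmetic: average peak width is approximately covered space divided by number of peaks. The sketch below checks this on the NRSF rows. It assumes 1 kb = 1,000 bp and allows for rounding of the reported covered space. The relationship is inferred from the numbers, not stated in the source.

```python
def average_peak_width_bp(covered_kb, n_peaks):
    """Average peak width in bp, assuming 1 kb = 1,000 bp."""
    return covered_kb * 1000 / n_peaks


# NRSF rows from the table: (covered space in kb, number of peaks, reported width)
nrsf = {
    "Peak Finder": (908, 1935, 469),
    "MACS": (1902, 4679, 406),
    "FindPeaks": (4936, 3445, 1433),
    "SISSRs": (276, 5243, 53),
}

for name, (covered_kb, n_peaks, reported) in nrsf.items():
    computed = average_peak_width_bp(covered_kb, n_peaks)
    print(f"{name}: computed {computed:.1f} bp, reported {reported} bp")
```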
Figure 2. Performance comparison between HPeak (using data from both treated and untreated samples or using data from treated sample only) and other ChIP-Seq analysis algorithms. A. NRSF ChIP-Seq data: Chi-square test statistics of motif enrichment comparing original sequences under peaks and a set of random control sequences. B. STAT1 ChIP-Seq data: Chi-square test statistics of motif enrichment comparing original sequences under peaks and a set of random control sequences.
Summary of overlaps among peaks identified by different peak-calling algorithms in H3K4me3 and H3K27me3 ChIP-Seq datasets*.
| | Peak Finder | MACS | HPeak | FindPeaks | ChIPseeqer | SISSRs | CisGenome |
|---|---|---|---|---|---|---|---|
| H3K4me3 | | | | | | | |
| Peak Finder (28,960) | | 84.3 | 99.8 | 100.0 | 98.2 | 98.0 | 99.6 |
| MACS (27,568) | | | 90.6 | 95.4 | 84.8 | 87.7 | 84.8 |
| HPeak (41,217) | | | | 99.4 | 100 | 98.9 | 97.1 |
| FindPeaks (33,886) | | | | | 100 | 99.8 | 100 |
| ChIPseeqer (31,773) | | | | | | 99.0 | 100 |
| SISSRs (137,286) | | | | | | | 94.6 |
| CisGenome (46,261) | | | | | | | |
| H3K27me3 | | | | | | | |
| Peak Finder (335) | | 44.8 | 99.7 | 100 | 77.3 | 90.4 | 70.9 |
| MACS (1,341) | | | 42.5 | 42.4 | 23.5 | 24.4 | 32.9 |
| HPeak (4,858) | | | | 72.1 | 77.7 | 98.2 | 82.0 |
| FindPeaks (8,346) | | | | | 77.7 | 91.6 | 87.1 |
| ChIPseeqer (417) | | | | | | 58.5 | 75.8 |
| SISSRs (2,455) | | | | | | | 25.4 |
| CisGenome (437) | | | | | | | |
* We compare two sets of peaks (generated by two different peak-calling algorithms) to assess how much they overlap. Each number displayed is the percentage of peaks in one set that overlap at least one peak in the other set. For each pair of peak sets, two percentages can be calculated by switching the order of the two sets. The higher percentage for each pair of peak sets is shown.
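The overlap measure described in this footnote can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: peaks are represented as hypothetical `(chrom, start, end)` tuples with half-open coordinates, and any shared base counts as an overlap, since the source does not state a minimum overlap length.

```python
from collections import defaultdict


def percent_overlapping(set_a, set_b):
    """Percentage of peaks in set_a that overlap at least one peak in set_b."""
    if not set_a:
        return 0.0
    # Group set_b by chromosome so each peak is compared only with
    # peaks on the same chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in set_b:
        by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in set_a:
        candidates = by_chrom.get(chrom, ())
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            hits += 1
    return 100.0 * hits / len(set_a)


def reported_overlap(set_a, set_b):
    """Higher of the two directional percentages, as shown in the table."""
    return max(percent_overlapping(set_a, set_b),
               percent_overlapping(set_b, set_a))


# Made-up peaks for illustration only.
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks_b = [("chr1", 150, 250), ("chr1", 1000, 1100)]
print(reported_overlap(peaks_a, peaks_b))
```

The linear scan per peak keeps the sketch short. For peak sets of the sizes in the table, sorting each chromosome's peaks and using binary search or an interval tree would scale better.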
Figure 3. Workflow of HPeak analysis of ChIP-Seq data.