Jose M Muiño, Kerstin Kaufmann, Roeland Chj van Ham, Gerco C Angenent, Pawel Krajewski.
Abstract
BACKGROUND: In vivo detection of protein-bound genomic regions can be achieved by combining chromatin immunoprecipitation with next-generation sequencing (ChIP-seq). The large amount of sequence data produced by this method must be analyzed in a statistically sound and computationally efficient manner. The generation of high copy numbers of DNA fragments as an artifact of the PCR step in ChIP-seq is an important source of bias in this methodology.
Year: 2011 PMID: 21554688 PMCID: PMC3114017 DOI: 10.1186/1746-4811-7-11
Source DB: PubMed Journal: Plant Methods ISSN: 1746-4811 Impact factor: 4.993
Summary of read statistics for the ChIP-seq libraries analysed
| Library name* | No. of sequenced reads | No. of mapped reads | No. of non-duplicated mapped reads | Percentage of duplicated mapped reads | SRA ID |
|---|---|---|---|---|---|
| Sc | 4,065,558 | 1,640,977 | 1,047,009 | 37% | SRX004992 |
| S1 | 3,112,455 | 992,908 | 525,779 | 47% | SRX004990 |
| S2-S5 | NA | 1,192,908 | 525,779 | 56% | NA |
| S6 | 614,236 | 124,619 | 56,619 | 55% | E-MTAB-587 |
| S7 | 1,474,956 | 310,888 | 79,996 | 75% | E-MTAB-587 |
| S8 | 4,105,326 | 1,558,098 | 78,434 | 95% | E-MTAB-587 |
| Ac | 20,983,004 | 11,703,244 | 5,323,373 | 54% | SRX018394; SRX018395 |
| A1 | 15,941,703 | 13,293,909 | 9,708,068 | 27% | SRX018392; SRX018393 |
* For library description see text
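The last two numeric columns follow from simple bookkeeping: the duplicated share is (mapped − non-duplicated) / mapped. The Python sketch below illustrates it under an assumption that this excerpt does not state: a read is treated as a PCR duplicate when it shares chromosome, strand and 5' position with another read, a common convention for single-end data. `Read`, `five_prime`, `count_non_duplicated` and `percent_duplicated` are hypothetical names, not part of CSAR.

```python
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int   # 0-based leftmost mapped base
    end: int     # 0-based exclusive end
    strand: str  # "+" or "-"


def five_prime(read: Read) -> int:
    # The 5' end is the leftmost base on "+" and the rightmost base on "-".
    return read.start if read.strand == "+" else read.end - 1


def count_non_duplicated(reads: Iterable[Read]) -> int:
    # Assumed duplicate rule: same chromosome, strand and 5' position
    # are counted as copies of a single fragment.
    return len({(r.chrom, r.strand, five_prime(r)) for r in reads})


def percent_duplicated(mapped: int, non_duplicated: int) -> float:
    # Share of mapped reads removed as duplicates, in percent.
    return 100.0 * (mapped - non_duplicated) / mapped


# Library S1 from the table: about 47% of mapped reads are duplicates.
print(round(percent_duplicated(992_908, 525_779)))
```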
Figure 1. CSAR analysis workflow. (A) Typical analysis workflow using CSAR. (B) Mapped reads (continuous line) are virtually extended (dashed line) in the direction of their own strand. The number of extended reads overlapping each nucleotide position is counted for each strand independently, and the minimum of the two strand counts is taken as the "number of hits". (C) Consequently, regions with duplicated reads mapping to only one strand will not be considered significant. (D) CSAR output can be visualized in a typical genome browser.
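The per-position score in panel B can be sketched in a few lines of Python. This is an illustration, not CSAR's own code (CSAR is an R/Bioconductor package). The sketch assumes that reads are given by their 5' coordinate per strand, that every read is extended to a fixed, user-chosen length `ext_len`, and that coordinates are 0-based; `extended_coverage` and `number_of_hits` are hypothetical names. The comparison of ChIP and control scores that follows in the workflow is not shown.

```python
from itertools import accumulate
from typing import List, Sequence


def extended_coverage(five_primes: Sequence[int], strand: str,
                      chrom_len: int, ext_len: int) -> List[int]:
    """Per-base count of reads on one strand after extension to ext_len bp."""
    diff = [0] * (chrom_len + 1)  # difference array: +1 at start, -1 after end
    for p in five_primes:
        if strand == "+":
            lo, hi = p, p + ext_len          # "+" reads extend rightwards
        else:
            lo, hi = p - ext_len + 1, p + 1  # "-" reads extend leftwards
        lo, hi = max(lo, 0), min(hi, chrom_len)
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    return list(accumulate(diff[:chrom_len]))


def number_of_hits(plus: Sequence[int], minus: Sequence[int],
                   chrom_len: int, ext_len: int) -> List[int]:
    """Minimum of the two strand-specific coverages at each position."""
    cov_plus = extended_coverage(plus, "+", chrom_len, ext_len)
    cov_minus = extended_coverage(minus, "-", chrom_len, ext_len)
    return [min(a, b) for a, b in zip(cov_plus, cov_minus)]


# Panel C: three duplicated reads on one strand only give zero hits everywhere.
print(max(number_of_hits([5, 5, 5], [], chrom_len=20, ext_len=5)))
```

Taking the minimum rather than the sum is what makes one-strand pile-ups, which is the typical shape of PCR duplicates, score zero.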
Figure 2. ChIP-seq method comparison. (A) Proportion of peaks with a CArG-box (CCW₆GG or CCW₇G, where W = A or T) within 50 bp, among the significant regions detected by each method in the comparison of S1 to Sc. (B) Proportion of peaks, among those detected by each method in the comparison of A1 to Ac, with at least one differentially expressed target gene. Only peaks near a gene (within 3 kb upstream or 1 kb downstream) represented in the microarray experiments were considered. The list of genes whose expression is affected by AP1 was downloaded from [19]; we used the list denoted "Agilent and_or Operon_BH-0h". With default options, QuEST identified only 66 significant peaks; therefore, we used the option "Relaxed peak calling parameters" for Figure 2B. For comparison purposes, all scores reported by the different methods were transformed into rank scores, with zero as the rank of the most significant peak.
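Two steps in this caption, the CArG-box test of panel A and the rank-score transformation, can be sketched in Python. Several details are assumptions not given here. Each peak is summarized by a single summit coordinate. "Within 50 bp" is read as "the motif overlaps summit ± 50 bp". Both strands are scanned: CCW₆GG is its own reverse complement, but CCW₇G is not, so its reverse complement CW₇GG is added to the pattern. For ranking, higher scores are assumed to be more significant unless `higher_is_better=False`, as for p-values, and ties keep input order. `has_carg_near` and `rank_scores` are hypothetical helpers.

```python
import re
from typing import List, Sequence

# Lookahead so overlapping motif occurrences are all reported.
# CC[AT]{6}GG is palindromic; C[AT]{7}GG is the reverse complement of CC[AT]{7}G.
CARG = re.compile(r"(?=(CC[AT]{6}GG|CC[AT]{7}G|C[AT]{7}GG))")


def has_carg_near(seq: str, summit: int, window: int = 50) -> bool:
    """True if any CArG-box match overlaps [summit - window, summit + window]."""
    for m in CARG.finditer(seq.upper()):
        start, end = m.start(1), m.end(1)  # end is exclusive
        if start <= summit + window and end > summit - window:
            return True
    return False


def rank_scores(scores: Sequence[float], higher_is_better: bool = True) -> List[int]:
    """Rank 0 is the most significant peak; the sort is stable, so ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


seq = "A" * 100 + "CCAAATTTGG" + "A" * 100
print(has_carg_near(seq, summit=105), has_carg_near(seq, summit=0))
print(rank_scores([5.0, 9.0, 1.0]))
```

Converting every method's native score to ranks lets peaks be compared by position in each list, regardless of whether a tool reports p-values, FDRs or enrichment scores.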
Number of significant regions detected
| Method | | S1 vs Sc | S2-S5 vs Sc* | S6 vs Sc | S7 vs Sc | S8 vs Sc |
|---|---|---|---|---|---|---|
| CSAR | Total | 3,235 | 3,306 (5) | 57 | 150 | 126 |
| | Common | 3,235 | 3,226 (2.6) | 52 | 130 | 104 |
| | False Positives | - | 2% | 9% | 13% | 17% |
| QuEST | Total | 985 | 989 (11) | 5,663 | 4,724 | 5,709 |
| | Common | 985 | 971 (4.2) | 440 | 433 | 422 |
| | False Positives | - | 2% | 92% | 91% | 92% |
| CisGenome | Total | 2,030 | 14,632 (30) | 9 | 91 | 169 |
| | Common | 2,030 | 1,633 (4) | 1 | 24 | 23 |
| | False Positives | - | 89% | 89% | 74% | 86% |
| PICS | Total | 2,846 | 1,952 (24.7) | 1,256 | 1,575 | 153 |
| | Common | 2,846 | 1,253 (5.9) | 382 | 435 | 51 |
| | False Positives | - | 36% | 70% | 72% | 67% |
| MACS | Total | 2,728 | 2,728 (0) | 2,821 | 3,687 | 3,624 |
| | Common | 2,728 | 2,728 (0) | 631 | 761 | 716 |
| | False Positives | - | 0% | 78% | 79% | 80% |
*Results for the in silico-modified libraries (S2-S5) are summarized by their average and standard deviation (in parentheses).
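The "False Positives" rows appear to be the share of detected regions that are not "Common", that is, (Total − Common) / Total. This is inferred from the numbers; the definition is not stated in this excerpt. The short Python check below, with a hypothetical helper `false_positive_pct`, reproduces the S6 vs Sc column from its Total and Common rows.

```python
def false_positive_pct(total: int, common: int) -> float:
    # Percentage of detected regions not shared with the reference set.
    return 100.0 * (total - common) / total


# (Total, Common) for S6 vs Sc, taken from the table above.
s6_vs_sc = {
    "CSAR": (57, 52),
    "QuEST": (5_663, 440),
    "CisGenome": (9, 1),
    "PICS": (1_256, 382),
    "MACS": (2_821, 631),
}
for method, (total, common) in s6_vs_sc.items():
    print(f"{method}: {false_positive_pct(total, common):.0f}%")
```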