Gregor D Gilfillan, Timothy Hughes, Ying Sheng, Hanne S Hjorthaug, Tobias Straub, Kristina Gervin, Jennifer R Harris, Dag E Undlien, Robert Lyle.
Abstract
BACKGROUND: Chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) offers high resolution, genome-wide analysis of DNA-protein interactions. However, current standard methods require abundant starting material in the range of 1-20 million cells per immunoprecipitation, and remain a bottleneck to the acquisition of biologically relevant epigenetic data. Using a ChIP-seq protocol optimised for low cell numbers (down to 100,000 cells/IP), we examined the performance of the ChIP-seq technique on a series of decreasing cell numbers.
Year: 2012 PMID: 23171294 PMCID: PMC3533509 DOI: 10.1186/1471-2164-13-645
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Genomic mapping of sequence reads. The proportion of reads that were unmapped, those mapping to single genomic positions, and those mapping to multiple locations (repeats) are illustrated. The latter two categories are broken down into reads present as a unique copy and reads present in two or more identical copies (duplicates). The total number of reads generated for each experimental condition is given at the right.
Figure 2. H3K4me3 peaks are found at promoters, where peak heights parallel gene expression levels. (a) 330 kb section of the gene-dense major histocompatibility complex (MHC) visualised in the Integrative Genomics Viewer [25]. Tracks display read depth for benchmark (gray) and new (black) ChIP methods at decreasing input cell numbers. Maximum read depth over the displayed area is indicated on the right of each track. Only uniquely mapping, non-duplicate reads are displayed. (b) 8 kb region showing H3K4me3 signal over the promoter of the RPL30 gene. (c) Sequence coverage over transcription start sites (TSS). Coverage is displayed as a function of gene expression, with genes divided into quartiles based on expression level.
Peak calling, sensitivity (detection of peaks called in the benchmark) and specificity (off-target peaks not present in the benchmark)
| Total number reads | 13 402 262 | 13 798 839 | 14 136 243 | 16 693 519 | 15 601 563 | 56 904 707 |
| Number unique, non-duplicate reads | 6 011 891 | 7 0330 709 | 5 794 519 | 3 463 886 | 2 423 126 | 661 591 |
| Number of peaks called (fraction relative to benchmark) | 16 545 (1) | 16 244 (0.98) | 17 054 (1.03) | 15 636 (0.95) | 14 771 (0.89) | 12 296 (0.74) |
| Sensitivity relative to benchmark | 1 | 0.93 | 0.96 | 0.89 | 0.85 | 0.69 |
| Specificity relative to benchmark | 1 | 0.94 | 0.93 | 0.95 | 0.97 | 0.98 |
Peaks were called using MACS, allowing no ambiguously mapping or duplicate reads.
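The sensitivity and specificity rows in the table above can be illustrated with a minimal sketch. The record does not state the exact formulas, so this assumes sensitivity is the fraction of benchmark peaks overlapped by at least one sample peak, and specificity is the fraction of sample peaks that overlap a benchmark peak (that is, are not off-target). Peaks are represented as hypothetical `(chrom, start, end)` tuples with half-open coordinates; the function names are illustrative and not taken from MACS.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(query, reference):
    """Fraction of query peaks that overlap at least one reference peak."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)


def sensitivity(benchmark, sample):
    # Assumed definition: share of benchmark peaks recovered in the sample.
    return fraction_overlapping(benchmark, sample)


def specificity(benchmark, sample):
    # Assumed definition: share of sample peaks that are also in the benchmark.
    return fraction_overlapping(sample, benchmark)


# Toy data, not from the paper.
benchmark = [("chr1", 100, 200), ("chr1", 500, 600),
             ("chr2", 100, 200), ("chr2", 900, 1000)]
sample = [("chr1", 150, 250), ("chr2", 50, 120), ("chr3", 10, 20)]

print(sensitivity(benchmark, sample))
print(specificity(benchmark, sample))
```

The quadratic scan is adequate for a toy example; for tens of thousands of peaks, sorted intervals or an interval tree would be the usual choice.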
Figure 3. Saturation, sensitivity and correlation of peak calling with decreasing cell number. (a) Saturation of peak calling as reads are randomly discarded. Peaks were called using only unique non-duplicated reads. (b) Overlap of called peaks in the different datasets with benchmark dataset peaks. Inset diagram defines examples of full or partial peak overlap, with the upper bar in each case representing the benchmark. Colours as in panel a. (c) Coverage of benchmark peaks by peaks in other datasets. Colours as in panel a. Inset shows examples of coverage, with upper bar in each case representing the benchmark. (d) Correlation of peak heights between benchmark and new sample datasets. Spearman correlation coefficients (ρ) are given. Only peaks overlapping a benchmark peak were included in this analysis. The number of reads in a given peak was normalised to the total number of reads (uniquely mapping non-duplicated) in the sample.
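The peak-height comparison described for panel d can be sketched as follows. This is an illustrative implementation only, assuming peak heights are given as per-peak read counts for matched peaks; the helper names (`normalise`, `average_ranks`, `spearman`) are hypothetical and the authors' actual tooling is not stated in this record. Spearman's ρ is computed as the Pearson correlation of average ranks, which handles ties.

```python
from math import sqrt


def normalise(counts, total_reads):
    """Scale per-peak read counts by the sample's total usable reads."""
    return [c / total_reads for c in counts]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy read counts for four matched peaks (not data from the paper).
benchmark_heights = normalise([10, 20, 30, 40], total_reads=1_000_000)
sample_heights = normalise([1, 3, 2, 4], total_reads=500_000)
print(spearman(benchmark_heights, sample_heights))
```

Because ranks are unchanged by a positive scaling factor, the normalisation does not alter ρ itself; it matters for comparing the plotted heights across samples of different sequencing depth.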
Figure 4. Reproducibility of H3K4me3 and H3K27me3 ChIP-seq with the new method. (a) 1 Mb region of chromosome 2 containing the transcriptionally active STAT1 / 4 and inactive MYO1B loci, visualised in the IGV genome browser. (b) Heatmap display in IGV genome browser showing triplicate ChIP signals over an 8 Mb region on chromosome 12. H3K27me3 and H3K4me3 signals are shown for 50 and 10 kb window sizes respectively. (c-f) Genome-wide pairwise correlations of read depth in 50 kb (H3K27me3) and 10 kb (H3K4me3) bins for selected replicate samples. Pearson correlation coefficients are given for each comparison. Read depth per bin was normalised to the total number of uniquely mapping reads per sample (reads per bin per million uniquely mapped reads).
Genome-wide pairwise correlation coefficients of replicate ChIP experiments
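| Sample | GM H3K27me3 R1 | GM H3K27me3 R2 | GM H3K4me3 R1 | GM H3K4me3 R2 | H3K27me3 100k R1 | H3K27me3 100k R2 | H3K27me3 100k R3 | H3K27me3 20k R1 | H3K27me3 20k R2 | H3K27me3 20k R3 | H3K4me3 100k R1 | H3K4me3 100k R2 | H3K4me3 100k R3 | H3K4me3 20k R1 | H3K4me3 20k R2 | H3K4me3 20k R3 |
| GM H3K27me3 R1 | 1.00 | 0.86 | 0.07 | −0.02 | 0.70 | 0.70 | 0.70 | 0.70 | 0.69 | 0.69 | −0.05 | −0.05 | −0.05 | 0.01 | 0.02 | 0.06 |
| GM H3K27me3 R2 | 0.86 | 1.00 | 0.17 | 0.16 | 0.74 | 0.70 | 0.72 | 0.72 | 0.73 | 0.70 | 0.04 | 0.05 | 0.05 | 0.09 | 0.10 | 0.12 |
| GM H3K4me3 R1 | 0.07 | 0.17 | 1.00 | 0.95 | 0.01 | 0.02 | 0.01 | 0.02 | 0.02 | 0.03 | 0.68 | 0.67 | 0.69 | 0.57 | 0.63 | 0.51 |
| GM H3K4me3 R2 | −0.02 | 0.16 | 0.95 | 1.00 | 0.00 | −0.01 | −0.01 | 0.00 | 0.01 | 0.00 | 0.72 | 0.71 | 0.74 | 0.59 | 0.64 | 0.49 |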
| H3K27me3 100k R1 | 0.70 | 0.74 | 0.01 | 0.00 | 1.00 | 0.95 | 0.97 | 0.96 | 0.96 | 0.94 | −0.09 | −0.09 | −0.09 | −0.06 | −0.06 | −0.03 |
| H3K27me3 100k R2 | 0.70 | 0.70 | 0.02 | −0.01 | 0.95 | 1.00 | 0.94 | 0.94 | 0.93 | 0.92 | −0.09 | −0.09 | −0.09 | −0.05 | −0.05 | −0.01 |
| H3K27me3 100k R3 | 0.70 | 0.72 | 0.01 | −0.01 | 0.97 | 0.94 | 1.00 | 0.95 | 0.95 | 0.93 | −0.09 | −0.09 | −0.09 | −0.05 | −0.05 | −0.02 |
| H3K27me3 20k R1 | 0.70 | 0.72 | 0.02 | 0.00 | 0.96 | 0.94 | 0.95 | 1.00 | 0.95 | 0.93 | −0.09 | −0.09 | −0.09 | −0.05 | −0.05 | −0.02 |
| H3K27me3 20k R2 | 0.69 | 0.73 | 0.02 | 0.01 | 0.96 | 0.93 | 0.95 | 0.95 | 1.00 | 0.93 | −0.08 | −0.08 | −0.08 | −0.04 | −0.04 | −0.01 |
| H3K27me3 20k R3 | 0.69 | 0.70 | 0.03 | 0.00 | 0.94 | 0.92 | 0.93 | 0.93 | 0.93 | 1.00 | −0.08 | −0.09 | −0.09 | −0.04 | −0.04 | 0.00 |
| H3K4me3 100k R1 | −0.05 | 0.04 | 0.68 | 0.72 | −0.09 | −0.09 | −0.09 | −0.09 | −0.08 | −0.08 | 1.00 | 0.86 | 0.88 | 0.70 | 0.76 | 0.59 |
| H3K4me3 100k R2 | −0.05 | 0.05 | 0.67 | 0.71 | −0.09 | −0.09 | −0.09 | −0.09 | −0.08 | −0.09 | 0.86 | 1.00 | 0.87 | 0.69 | 0.76 | 0.59 |
| H3K4me3 100k R3 | −0.05 | 0.05 | 0.69 | 0.74 | −0.09 | −0.09 | −0.09 | −0.09 | −0.08 | −0.09 | 0.88 | 0.87 | 1.00 | 0.70 | 0.77 | 0.58 |
| H3K4me3 20k R1 | 0.01 | 0.09 | 0.57 | 0.59 | −0.06 | −0.05 | −0.05 | −0.05 | −0.04 | −0.04 | 0.70 | 0.69 | 0.70 | 1.00 | 0.66 | 0.56 |
| H3K4me3 20k R2 | 0.02 | 0.10 | 0.63 | 0.64 | −0.06 | −0.05 | −0.05 | −0.05 | −0.04 | −0.04 | 0.76 | 0.76 | 0.77 | 0.66 | 1.00 | 0.60 |
| H3K4me3 20k R3 | 0.06 | 0.12 | 0.51 | 0.49 | −0.03 | −0.01 | −0.02 | −0.02 | −0.01 | 0.00 | 0.59 | 0.59 | 0.58 | 0.56 | 0.60 | 1.00 |
Pearson's correlation coefficients for all pairwise sample comparisons were calculated for read depth across the genome divided into 50 kb (H3K27me3) or 10 kb (H3K4me3) non-overlapping bins. Replicate datasets derived from 100,000 cells / IP (100k) and those from 20,000 cells / IP (20k) are denoted by suffixes R1-R3. For comparison, four ENCODE datasets (two replicates each of H3K27me3 and H3K4me3) from the lymphoblastoid cell line GM12878 have been included (GM; replicates denoted by R1 and R2).
ChIP-seq from primary cells isolated from human monozygotic twins
| Cell no. / IP | 5 x 10^5 | 5 x 10^5 | 5 x 10^5 | 5 x 10^5 | 5 x 10^5 | 5 x 10^5 |
| No. reads | 35 244 517 | 44 255 574 | 40 644 508 | 45 738 891 | 38 332 819 | 29 484 478 |
| No. unique, nonduplicate reads | 3 978 339 | 6 524 094 | 3 316 312 | 3 340 926 | 3 130 804 | 1 167 824 |
| No. Peaks called | 14 828 | 12 622 | 15 598 | 15 731 | 15 825 | 13 719 |
| No. overlapping peaks per twin pair (%) | 12 457 (82%) | | 14 833 (94%) | | 13 091 (83%) | |
| Cell no. / IP | 4.3 x 10^5 | 3.7 x 10^5 | 4.7 x 10^5 | 5 x 10^5 | 3.8 x 10^5 | 4.2 x 10^5 |
| No. reads | 45 245 361 | 35 681 254 | 38 455 042 | 35 257 788 | 34 505 357 | 41 729 689 |
| No. unique, nonduplicate reads | 18 996 309 | 4 778 486 | 12 403 566 | 6 654 203 | 5 840 308 | 4 438 574 |
| No. Peaks called | 17 704 | 18 720 | 20 728 | 18 145 | 18 743 | 20 899 |
| No. overlapping peaks per twin pair (%) | 14 578 (78%) | | 18 828 (76%) | | 15 312 (73%) | |
Peaks were called using MACS, allowing no ambiguously mapping or duplicate reads. Peaks with p-values > 1 x 10^-10 were excluded from analysis. Numbers of overlapping peaks were counted, and expressed as a percentage of the highest peak count for the twin pair.
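The filtering and percentage calculation described in this note can be written out as a short sketch. The peak record layout (a dictionary with a `pvalue` key) and the function names are assumptions made for illustration; they do not reflect the MACS output format. The two worked calls use peak counts taken from the first block of the table above.

```python
P_VALUE_CUTOFF = 1e-10  # peaks with p-values above this are excluded


def filter_peaks(peaks, cutoff=P_VALUE_CUTOFF):
    """Keep only peaks whose p-value does not exceed the cutoff."""
    return [p for p in peaks if p["pvalue"] <= cutoff]


def overlap_percentage(n_overlapping, n_peaks_twin_a, n_peaks_twin_b):
    """Overlapping peaks as a percentage of the larger peak count in the pair."""
    return 100.0 * n_overlapping / max(n_peaks_twin_a, n_peaks_twin_b)


# Hypothetical peak records to show the filter.
peaks = [{"name": "p1", "pvalue": 1e-12},
         {"name": "p2", "pvalue": 1e-8},
         {"name": "p3", "pvalue": 1e-10}]
print([p["name"] for p in filter_peaks(peaks)])

# Counts from the table: second and third twin pairs of the first block.
print(round(overlap_percentage(14833, 15598, 15731)))
print(round(overlap_percentage(13091, 15825, 13719)))
```

Dividing by the larger of the two peak counts gives a conservative figure, since the overlap can never exceed the smaller count.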
Variable parameters applied to chromatin from different starting cell numbers
| Cells per IP | 2 x 10^7 | 2 x 10^7 | 2.5 x 10^6 | 5 x 10^5 | 1 x 10^5 | 2 x 10^4 |
| MNase digestion volume (cells / ml) | 1500 μl (2.7 x 10^7 / ml) | 1500 μl (2.7 x 10^7 / ml) | 500 μl (1 x 10^7 / ml) | 100 μl (1 x 10^7 / ml) | 20 μl (1 x 10^7 / ml) | 20 μl (2 x 10^6 / ml) |
| IP volume | 750 μl (in 1.5 ml tube) | 1500 μl (in 2 ml tube) | 500 μl (in 1.5 ml tube) | 100 μl (in 0.2 ml PCR tube) | 100 μl (in 0.2 ml PCR tube) | 100 μl (in 0.2 ml PCR tube) |
| Protein A/G bead volume for preclearing / IP | 50 μl | 50 μl | 50 μl | 10 μl | 10 μl | 10 μl |
| Antibody amount / IP | 5 μg | 5 μg | 5 μg | 1 μg | 1 μg | 1 μg |
| Wash buffer volumes | 1 ml | 1 ml | 1 ml | 150 μl | 150 μl | 150 μl |
“Benchmark” refers to the protocol now published by Zhao and colleagues [21] and “new” to the method presented here.