Vasudha Sharma, Sharmistha Majumdar.
Abstract
BACKGROUND: ChIP (chromatin immunoprecipitation)-exo has emerged as an important and versatile improvement over conventional ChIP-seq: it reduces noise, maps transcription factor (TF) binding locations precisely, up to single base-pair resolution, and enables binding mode prediction. The availability of numerous peak-callers for analyzing ChIP-exo reads has motivated the need to assess their performance and report which tools perform reasonably well for the task.
Keywords: ChIP-exo; Peak-caller
Year: 2020 PMID: 32085702 PMCID: PMC7035708 DOI: 10.1186/s12859-020-3403-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Peak-callers used for comparison in this study along with their key features and output formats
| Tool | Key feature | Output |
|---|---|---|
| MACS, 2008 | 1. Uses bimodal distribution of reads to model fragment length 2. Uses dynamic Poisson distribution to compare test and control samples | 1. Peak position 2. p-value (based on pileup height at peak summit) and q-value (against random Poisson distribution with local lambda) |
| GEM, 2012 | 1. Uses a generative probabilistic model to assign positions to the reads after each iteration 2. Reciprocally links binding event discovery and motif discovery 3. Resolves closely spaced binding events | 1. Binding events file (including location, IP strength, fold enrichment) 2. Motif files 3. K-mer set memory motifs 4. HTML output 5. Read distribution file 6. The spatial distribution between primary and secondary motifs |
| Peakzilla, 2013 | 1. Estimates all parameters from the data itself 2. Uses bimodal distribution of reads to calculate fragment length and predict binding sites 3. Resolves closely spaced binding events | 1. Peak file with exact position, summit, score (based on read distribution in peaks that fits bimodal tag distribution and chi-square test), FDR, fold enrichment 2. Negative peaks in the presence of control |
| Genetrack, 2008 | 1. Rapid data smoothing using Gaussian smoothing 2. Peak detection by selecting the highest peak in a local maximum with an exclusion zone of up to a few hundred bp 3. Combines strand information in a composite value 4. Requires manual pairing of border peaks | 1. GFF file with chromosome, peak exclusion zone, tag sum, strand information and standard deviation of reads in the peak exclusion zone |
| MACE, 2014 | 1. Normalizes and corrects sequencing data for any biases 2. Consolidates signal to noise ratio by reducing noise 3. Detects border peaks using the Chebyshev inequality and pairs them using the Gale-Shapley stable matching algorithm | 1. BED file containing border pairs of the binding event, the method for detecting each border pair and corresponding p-value (composite p-value of two borders in a pair) |
| Exoprofiler, 2015 | 1. Useful to detect different types of footprints 2. The peaks are scanned against the motif database to find the highest scoring peaks 3. High scoring peaks are then used to calculate 5′ ChIP-exo coverage of reads relative to the TFBS center to find the protein-DNA crosslink boundaries | 1. Heat map of 5′ ChIP-exo coverage 2. Footprint profile of 5′ coverage of all reads 3. Footprint profile of the 5′ coverage of reads on both strands matching the scanned motif (output of motif permutation) |
| ChExMix, 2018 | 1. Probabilistic mixture model for characterizing different modes of DNA-protein interactions 2. Expectation Maximization (EM) algorithm for estimating binding subtype probability for each binding event | 1. Event subtype file (reports total read count, signal fraction, binding coordinate, fold enrichment, event subtype, binding sequence, log2 p-value (log likelihood score of subtype specificity for a motif hit)) 2. Motif file 3. Peak-peak distance histogram 4. Peak-motif distance histogram |
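
The Genetrack row in the table describes Gaussian smoothing followed by picking the highest local maximum and masking an exclusion zone around it. The sketch below illustrates that general idea only; it is not Genetrack's actual code. The function names, the per-base count input, and the default `sigma` and `exclusion` values are all assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma, width_in_sigmas=3):
    """Return a normalized, truncated Gaussian kernel (hypothetical helper)."""
    half = int(width_in_sigmas * sigma)
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(counts, sigma):
    """Spread each per-base read count over its neighbours with a Gaussian."""
    kernel = gaussian_kernel(sigma)
    half = len(kernel) // 2
    n = len(counts)
    out = [0.0] * n
    for pos, count in enumerate(counts):
        if count == 0:
            continue
        for k, weight in enumerate(kernel):
            j = pos + k - half
            if 0 <= j < n:
                out[j] += count * weight
    return out


def call_peaks(counts, sigma=5, exclusion=20):
    """Greedy peak picking: take the highest local maximum, mask +/- exclusion
    bases around it, and repeat with the next highest unmasked maximum."""
    smoothed = smooth(counts, sigma)
    n = len(smoothed)
    order = sorted(range(n), key=lambda i: smoothed[i], reverse=True)
    blocked = [False] * n
    peaks = []
    for i in order:
        if smoothed[i] <= 0 or blocked[i]:
            continue
        left = smoothed[i - 1] if i > 0 else 0.0
        right = smoothed[i + 1] if i < n - 1 else 0.0
        if smoothed[i] < left or smoothed[i] < right:
            continue  # on a slope, not a local maximum
        peaks.append(i)
        for j in range(max(0, i - exclusion), min(n, i + exclusion + 1)):
            blocked[j] = True
    return sorted(peaks)


# Toy per-base 5' read-start counts on one strand (made-up numbers).
counts = [0] * 200
counts[50], counts[52], counts[150] = 10, 4, 6
peaks = call_peaks(counts)
```

In this toy input the two nearby pileups at positions 50 and 52 fall inside one exclusion zone and are reported as a single peak, which is the behaviour the exclusion zone is meant to produce. Strand handling and border pairing are deliberately left out.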
Fig. 1 Quality metrics of ChIP-exo datasets as reported by ChIPexoQual. a ARC vs. URC plots for IMR90, K562 and U2OS datasets. The color represents the number of read islands (enriched regions) or bins; as the number of read islands increases, the color shifts from blue to yellow. b Region composite plots and Forward Strand Ratio (FSR) plots for IMR90, K562 and U2OS datasets representing the strand compositions of read islands (enriched regions). Left panel: Region composite plots, in which green represents the proportion of read islands that have reads only on the reverse strand, blue represents the proportion with reads on the forward strand and red represents read islands with reads on both strands. Right panel: FSR plots in which quantiles are marked with green (0.25), red (0.5) and purple (0.75). c β1 and β2, estimates of library complexity for IMR90, K562 and U2OS datasets. The box-and-whisker plot gives the median value of β1 and β2 for all three cell types
Fig. 2 Total number of peaks/binding events reported by the peak-callers GEM, Genetrack, MACE, MACS, and Peakzilla (a) before deduplication of reads (b) after deduplication of reads
Fig. 3 Unique regions identified by each peak-caller and GBS motif occupancy in the respective unique regions. a Number of unique peaks identified by GEM, Genetrack, MACE, MACS and Peakzilla in three cell types, IMR90, K562, and U2OS. b Percentage of unique peaks containing the MA0113.2 motif, as found by FIMO
Fig. 4 a Number of GBS motifs (MA0113.2) found by FIMO in the peak output of GEM, Genetrack, MACE, MACS, and Peakzilla. b GBS (MA0113.2) and GATA (MA0140.2) occupancy in IMR90, U2OS, and K562 datasets before and after deduplication, as reported by FIMO
Fig. 5 Fraction of motifs present in the top peaks of all the peak-callers, with peaks arranged in descending order of significance score. a Fraction of GBS motif in top peaks reported in the IMR90 dataset. b Fraction of GATA motif in top peaks reported in the K562 dataset. c Fraction of GBS motif in top peaks reported in the U2OS dataset
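
The metric in Fig. 5 (fraction of top-ranked peaks that contain the motif) can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' pipeline: each peak is assumed to be a `(score, has_motif)` pair, where `has_motif` would come from a motif scan such as FIMO, and the function name and cutoffs are hypothetical.

```python
def motif_fraction_in_top_peaks(peaks, cutoffs):
    """For each cutoff N, return the fraction of the N highest-scoring peaks
    that contain the motif. `peaks` is a list of (score, has_motif) pairs."""
    ranked = sorted(peaks, key=lambda p: p[0], reverse=True)
    fractions = {}
    for n in cutoffs:
        top = ranked[:n]
        if not top:
            fractions[n] = 0.0
            continue
        fractions[n] = sum(1 for _, has_motif in top if has_motif) / len(top)
    return fractions


# Made-up example: five peaks with significance scores and motif flags.
example_peaks = [(3.2, True), (9.0, True), (1.1, False), (7.5, True), (6.0, False)]
fractions = motif_fraction_in_top_peaks(example_peaks, cutoffs=[2, 4, 5])
```

A curve that stays high as N grows suggests that a caller's significance score ranks motif-containing peaks first; a flat or low curve suggests the score is less informative about motif presence.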
MEME output for MACE, MACS, Genetrack and Peakzilla and motifs reported directly by ChExMix and GEM
Fig. 6 Binding subtypes discovered by ChExMix. Motifs and read distributions from (a) IMR90 dataset (b) K562 dataset (c) U2OS dataset (first subtype)