Xi Chen, Xu Shi, Andrew F Neuwald, Leena Hilakivi-Clarke, Robert Clarke, Jianhua Xuan.
Abstract
BACKGROUND: ChIP-seq combines chromatin immunoprecipitation assays with sequencing and identifies genome-wide binding sites for DNA-binding proteins. While many binding sites produce strong ChIP-seq 'peak' signals and are well captured, other regions are bound weakly by proteins and show relatively low ChIP-seq signal enrichment. These weak binding sites, especially those at promoters and enhancers, are functionally important because they also regulate the expression of nearby genes. Yet accurately identifying weak binding sites in ChIP-seq data remains a challenge, because they are difficult to distinguish from amplified background DNA.
Year: 2021 PMID: 33858322 PMCID: PMC8051094 DOI: 10.1186/s12859-021-04108-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 ChIP-seq peak detection using a Gaussian mixture model. ChIP-BIT2 (a) converted read counts to read intensities and then (b) used a mixture of Gaussian distributions to differentiate (strong and weak) binding events from background signals.
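The mixture-model idea in Fig. 1 can be illustrated with a minimal sketch: a three-component one-dimensional Gaussian mixture, assumed here to represent background, weak, and strong binding, fitted by expectation-maximization (EM), followed by per-window posterior probabilities. This is not ChIP-BIT2's actual model, whose details the caption does not give. The helper names `fit_gmm_1d` and `posterior`, the choice of three components, and the assumption that read counts have already been converted to continuous intensities are all hypothetical.

```python
import numpy as np


def _weighted_densities(x, w, mu, var):
    """Return an (n, k) array of w_j * N(x_i | mu_j, var_j)."""
    diff = x[:, None] - mu
    return w * np.exp(-diff ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def fit_gmm_1d(x, k=3, n_iter=500, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture by EM; components sorted by mean."""
    x = np.asarray(x, dtype=float)
    mu = np.linspace(x.min(), x.max(), k)    # spread initial means over the data range
    var = np.full(k, x.var())                # shared initial variance
    w = np.full(k, 1.0 / k)                  # equal initial mixing weights
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each window intensity.
        dens = _weighted_densities(x, w, mu, var)
        total = dens.sum(axis=1, keepdims=True) + 1e-300
        resp = dens / total
        ll = np.log(total).sum()
        # M-step: re-estimate weights, means, and variances (small variance floor).
        nk = resp.sum(axis=0) + 1e-12
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        if ll - prev_ll < tol:               # stop once the log-likelihood plateaus
            break
        prev_ll = ll
    order = np.argsort(mu)                   # hypothetical labels: background < weak < strong
    return w[order], mu[order], var[order]


def posterior(x, w, mu, var):
    """Posterior probability of each component for each intensity value."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dens = _weighted_densities(x, w, mu, var)
    return dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
```

For example, fitting simulated intensities drawn around 0, 3, and 6 and then calling `posterior([0.0, 3.0, 6.0], w, mu, var)` assigns each value mostly to its own component. Sorting components by mean makes the lowest-mean component "background" by construction; that is a labeling convention of this sketch, not something the caption states.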
Fig. 2 ChIP-BIT2 pipeline. ChIP-BIT2 extracted read locations from the sample and input ChIP-seq profiles in SAM format. Depending on the running mode, it can detect peaks across the whole genome or only within annotated regulatory regions such as promoters or enhancers. To detect peaks of different sizes, ChIP-BIT2 partitioned genomic segments into smaller windows and calculated the read intensity in each window for learning distribution parameters and estimating binding-occurrence probabilities. Windows with a posterior probability above 0.9 were output in BED format as the final peaks.
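The windowing and thresholding steps in the Fig. 2 caption can be sketched as follows. The sketch assumes read positions have already been extracted from the SAM files as 0-based coordinates and that per-window posterior probabilities come from some model, such as a Gaussian mixture. `partition`, `count_reads`, and `to_bed` are hypothetical helpers and the window size is an arbitrary example; only the 0.9 cutoff and the BED output come from the caption.

```python
from bisect import bisect_left


def partition(start, end, window):
    """Split the half-open segment [start, end) into consecutive windows."""
    return [(s, min(s + window, end)) for s in range(start, end, window)]


def count_reads(read_positions, windows):
    """Number of reads whose 0-based position falls in each half-open window."""
    pos = sorted(read_positions)
    # Two binary searches per window give the count of positions in [s, e).
    return [bisect_left(pos, e) - bisect_left(pos, s) for s, e in windows]


def to_bed(chrom, windows, posteriors, threshold=0.9):
    """BED3 lines (0-based, half-open) for windows whose posterior exceeds the threshold."""
    return [f"{chrom}\t{s}\t{e}" for (s, e), p in zip(windows, posteriors) if p > threshold]
```

For instance, `partition(1000, 1500, 100)` yields five 100-bp windows, and `to_bed("chr1", windows, [0.95, 0.2, 0.5, 0.91, 0.9])` keeps only the first and fourth windows, since a posterior of exactly 0.9 is not above the cutoff. Adjacent passing windows are emitted separately here; whether they should be merged into one peak is not stated in the caption.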
F1-scores and run-times of competing peak callers for detecting H3K4me3 and H3K36me3 benchmark regions in ENCODE ChIP-seq datasets
| Peak caller | K562 H3K4me3 | K562 H3K36me3 | GM12878 H3K4me3 | GM12878 H3K36me3 |
|---|---|---|---|---|
| F1-score (supervised) | | | | |
| ChIP-BIT2 | 0.93 | 0.90 | 0.95 | 0.90 |
| MACS2 | 0.89 | 0.78 | 0.93 | 0.83 |
| CNN-peaks | 0.91 | 0.85 | 0.90 | 0.88 |
| F1-score (unsupervised) | | | | |
| ChIP-BIT2 | 0.88 | 0.82 | 0.91 | 0.82 |
| MACS2 | 0.82 | 0.77 | 0.84 | 0.79 |
| Run-time (unsupervised) | | | | |
| ChIP-BIT2 | 14m1s | 9m21s | 15m7s | 9m9s |
| MACS2 | 3m42s | 2m39s | 5m35s | 2m32s |
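One common way to compute a region-level F1-score against benchmark regions is sketched below: precision is the fraction of called peaks that overlap any benchmark region, recall is the fraction of benchmark regions overlapped by any called peak, and F1 is their harmonic mean. The paper's exact matching criterion is not given here, so this definition, the `(chrom, start, end)` tuple layout, and the helper names are assumptions. A small parser converts the table's run-time strings to seconds for comparison.

```python
import re


def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def region_f1(predicted, benchmark):
    """Region-level F1: harmonic mean of overlap-based precision and recall."""
    if not predicted or not benchmark:
        return 0.0
    hits = sum(any(overlaps(p, b) for b in benchmark) for p in predicted)
    found = sum(any(overlaps(b, p) for p in predicted) for b in benchmark)
    precision = hits / len(predicted)
    recall = found / len(benchmark)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def parse_runtime(text):
    """Convert a run-time string such as '14m1s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    if match is None or not text:
        raise ValueError(f"unrecognized run-time: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds or 0)
```

For example, `parse_runtime("14m1s") / parse_runtime("3m42s")` is about 3.8, i.e., ChIP-BIT2 took roughly 3.8 times as long as MACS2 on K562 H3K4me3 in the unsupervised setting. The pairwise overlap loops are quadratic, which is fine for illustration; genome-scale evaluation would normally use sorted intervals or an interval tree.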
Fig. 3 Running-time comparison between ChIP-BIT2 and ChIP-BIT.
Fig. 4 Peak detection summary of ChIP-BIT2 for 50 DNA-associated proteins. (a) Using ChIP-seq data from breast cancer MCF-7 cells for 39 transcription factors (TFs) and 11 histone modifications (HMs) from the ENCODE data portal, ChIP-BIT2 detected peaks across the whole genome. (b) We calculated the proportions of peaks detected at promoters, at enhancers, or at other regions (peaks from the whole genome minus peaks in promoters or enhancers), and (c) calculated the log2 ratio of the number of peaks at enhancers to the number at promoters.
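The quantities in panels (b) and (c) reduce to simple arithmetic on peak counts. The sketch below assumes promoter and enhancer peak sets are disjoint, so "other" is just the remainder; `summarize_peaks` is a hypothetical helper and the counts in the example are made up.

```python
import math


def summarize_peaks(n_genome, n_promoter, n_enhancer):
    """Peak proportions by region class and the log2(enhancer / promoter) ratio.

    Assumes promoter and enhancer peaks are disjoint subsets of the genome-wide peaks.
    """
    if n_promoter <= 0 or n_enhancer <= 0:
        raise ValueError("log2 ratio needs non-zero promoter and enhancer counts")
    n_other = n_genome - n_promoter - n_enhancer
    if n_other < 0:
        raise ValueError("promoter + enhancer peaks exceed genome-wide peaks")
    proportions = {
        "promoter": n_promoter / n_genome,
        "enhancer": n_enhancer / n_genome,
        "other": n_other / n_genome,
    }
    return proportions, math.log2(n_enhancer / n_promoter)
```

With hypothetical counts of 10,000 peaks genome-wide, 2,000 at promoters, and 4,000 at enhancers, the proportions are 0.2, 0.4, and 0.4, and the log2 ratio is 1.0 (twice as many enhancer peaks as promoter peaks). A positive ratio means a protein binds enhancers more often than promoters; a negative one means the reverse.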
Fig. 5 Venn diagrams of binding events detected by ChIP-BIT2 and ENCODE at MCF-7 active promoters or enhancers: (a) TF binding sites (TFBSs) at 489 promoters; (b) HMs at 489 promoters; (c) TFBSs at 1050 enhancers; (d) HMs at 1050 enhancers.
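Counting the regions of a two-set Venn diagram like those in Fig. 5 amounts to set operations on binding events. In this sketch a binding event is assumed to be a (protein, region index) pair in which some peak overlaps an annotated promoter or enhancer; that definition, the helper names, and the protein labels `TF1`/`TF2` are hypothetical.

```python
def binding_events(regions, peaks_by_protein):
    """Set of (protein, region_index) pairs where some peak overlaps the region.

    regions: list of half-open (chrom, start, end) intervals, e.g. promoters.
    peaks_by_protein: dict mapping a protein name to its list of peak intervals.
    """
    events = set()
    for protein, peaks in peaks_by_protein.items():
        for i, (chrom, start, end) in enumerate(regions):
            if any(pc == chrom and ps < end and start < pe for pc, ps, pe in peaks):
                events.add((protein, i))
    return events


def venn_counts(events_a, events_b):
    """Sizes of the three regions of a two-set Venn diagram."""
    return {
        "only_a": len(events_a - events_b),
        "both": len(events_a & events_b),
        "only_b": len(events_b - events_a),
    }
```

Because events are keyed by protein and region together, the same promoter can contribute several events (one per protein bound there), which matches counting binding events rather than distinct regions.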