W-ChIPeaks: a comprehensive web application tool for processing ChIP-chip and ChIP-seq data.

Xun Lan, Russell Bonneville, Jeff Apostolos, Wangcheng Wu, Victor X Jin.

Abstract

ChIP-based technology is becoming the leading technology to globally profile thousands of transcription factors and elucidate the transcriptional regulation mechanisms in living cells. It has evolved rapidly in recent years, from hybridization with spotted or tiling microarrays (ChIP-chip), to paired-end tag sequencing (ChIP-PET), to current massively parallel sequencing (ChIP-seq). Although many tools are available for identifying binding sites (peaks) in ChIP-chip and ChIP-seq data, few are offered as easily accessible online web tools that process both data types for the ChIP-based user community. We have therefore developed a comprehensive web application tool for processing ChIP-chip and ChIP-seq data. Our web tool, W-ChIPeaks, employs a probe-based (or bin-based) enrichment threshold to define peaks and applies statistical methods to control the false discovery rate of identified peaks. The web tool includes two web interfaces, PELT for ChIP-chip and BELT for ChIP-seq, both of which were tested on previously published experimental data. The novel features of our tool include comprehensive output for identified peaks in GFF, BED, bedGraph and .wig formats, annotation of the genes to which these peaks are related, and graphical interpretation and visualization of the results via a user-friendly web interface. AVAILABILITY: http://motif.bmi.ohio-state.edu/W-ChIPeaks/.

Year: 2010 PMID: 21138948 PMCID: PMC3031039 DOI: 10.1093/bioinformatics/btq669

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

ChIP-based technology is becoming the leading technology to globally profile thousands of transcription factors and elucidate the transcriptional regulation mechanisms in living cells (Farnham, 2009). It has evolved rapidly in recent years, from hybridization with spotted or tiling microarrays (ChIP-chip) (Kim et al., 2005), to paired-end tag sequencing (ChIP-PET) (Loh et al., 2006), to current massively parallel sequencing (ChIP-seq) (Johnson et al., 2007). Although the microarray-based chromatin immunoprecipitation (ChIP) method, ChIP-chip, is gradually being replaced by emerging sequencing-based methods such as ChIP-seq, both methods are currently used in many laboratories as major tools to survey transcription factor binding patterns and to study various histone modifications in an unbiased manner. Currently available tools for ChIP-chip data are exemplified and comprehensively compared using Spike-In data (Johnson et al., 2008). ChIP-seq technology and related computational tools are reviewed in Park (2009). CisGenome provides an integrated software system for analyzing data from both technologies (Ji et al.). While we appreciate the accuracy and efficiency of these tools, few of them are available as easily accessible online web tools that process both ChIP-chip and ChIP-seq data for the ChIP-based user community. We have therefore developed a comprehensive web application tool for processing ChIP-chip and ChIP-seq data. Our web tool, W-ChIPeaks, employs a probe-based (or bin-based) enrichment threshold to define peaks and applies statistical methods to control the false discovery rate of identified peaks. The web tool includes two web interfaces: PELT (probe-based enrichment threshold level) for ChIP-chip and BELT (bin-based enrichment threshold level) for ChIP-seq, both of which were tested on previously published experimental data.

2 METHODS

2.1 Overview

The utility and layout of W-ChIPeaks are shown in Figure 1. W-ChIPeaks provides a web-based interface with three main features: identification of peaks, with output in GFF, BED, bedGraph and .wig formats; annotation of the genes to which these peaks are related; and a graphical interpretation and visualization of the results via a user-friendly web interface. A link to the results is e-mailed to the address given in the contact information. For two or three ChIP-chip datasets, a plot comparing the overlap between datasets at different threshold levels is also provided. Using the W-ChIPeaks web service is simple and does not require any knowledge of the underlying software.

Fig. 1. The utility and layout of W-ChIPeaks.

2.2 Input

For ChIP-chip, there are three required inputs from the user: GFF files from NimbleGen or Agilent arrays (up to eight sets), the selection of array type and genome, and e-mail contact information. For ChIP-seq, there are a few options and one required input from the user: aligned reads in Eland or extended Eland format for Illumina GAII, bowtie alignment output, BED, GFF, SAM or BAM format, together with e-mail contact information.

2.3 Algorithms and statistical methods

2.3.1 PELT

We employed a probe-based enrichment threshold to define peaks and a permutation-based statistical method to control the false discovery rate of identified peaks. Suppose a sample has N probes (i = 1, …, N) on each array; after normalization of each array, probe i on array j has an intensity I_ij. A peak P among B peaks is first defined by a percentile level d of the probe intensity distribution: the mean intensity of the peak, which must consist of at least three probes in a row, has to exceed the value at percentile level d (for example, the top 1, 2 or 5% of all probes on the array). We applied a permutation-based approach to estimate the false discovery rate. We permuted each array and counted the number of peaks at percentile level d, repeated the permutation 1000 times, and averaged the number of peaks over these 1000 permutations. The number of peaks without permutation at level d is taken as TP(d), and the average number of peaks after permutation at the same level d is taken as FP(d). The false discovery rate at level d is then FDR(d) = FP(d)/TP(d).
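The permutation estimate described above can be illustrated with a minimal Python sketch. It is not the W-ChIPeaks implementation: it assumes a single normalized array whose probes are listed in genomic order, and it reads "at least three probes in a row" as a sliding three-probe window whose mean exceeds the threshold, with overlapping windows merged into one peak. The function names (`percentile_threshold`, `call_peaks`, `estimate_fdr`) are hypothetical.

```python
import random


def percentile_threshold(intensities, top_fraction):
    """Intensity at the cutoff that leaves roughly the top `top_fraction`
    of probes at or above it (e.g. 0.05 for the top 5%)."""
    ordered = sorted(intensities)
    n_top = min(len(ordered), max(1, round(len(ordered) * top_fraction)))
    return ordered[len(ordered) - n_top]


def call_peaks(intensities, threshold, window=3):
    """Return (first_probe, last_probe) index pairs for peaks.

    Assumption of this sketch: a window of `window` consecutive probes
    qualifies when its mean intensity is greater than `threshold`, and
    overlapping qualifying windows are merged into a single peak.
    """
    peaks = []
    start = end = None
    for i in range(len(intensities) - window + 1):
        if sum(intensities[i:i + window]) / window > threshold:
            if start is not None and i > end:
                # No overlap with the peak being built: close it first.
                peaks.append((start, end))
                start = None
            if start is None:
                start = i
            end = i + window - 1
    if start is not None:
        peaks.append((start, end))
    return peaks


def estimate_fdr(intensities, top_fraction=0.05, n_permutations=1000, seed=0):
    """Estimate FDR(d) = FP(d) / TP(d) by shuffling probe order."""
    threshold = percentile_threshold(intensities, top_fraction)
    tp = len(call_peaks(intensities, threshold))
    rng = random.Random(seed)
    shuffled = list(intensities)
    total = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # destroys the genomic ordering of probes
        total += len(call_peaks(shuffled, threshold))
    fp = total / n_permutations
    fdr = fp / tp if tp else None  # undefined when no peaks are called
    return {"threshold": threshold, "tp": tp, "fp": fp, "fdr": fdr}


if __name__ == "__main__":
    rng = random.Random(42)
    probes = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    for start in (300, 900, 1500):  # three synthetic enriched stretches
        for i in range(start, start + 8):
            probes[i] += 4.0
    print(estimate_fdr(probes, top_fraction=0.02, n_permutations=200))
```

Shuffling leaves the percentile threshold unchanged because the set of intensities is the same; only the spatial clustering of high probes is destroyed, which is what the averaged permuted peak count FP(d) measures.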

2.3.2 BELT

We employed a bin-based enrichment threshold to define peaks and a Monte Carlo simulation method to control the false discovery rate of identified peaks. The BELT algorithm includes four steps: (i) define a series of bin sizes, varying from 100 bp to 500 bp, by evenly dividing the genome, and count the density of reads in each bin; (ii) calculate an average length of ChIP fragments by considering the direction of the reads, and decode the binding site position by shifting the reads (Zhang et al., 2008); (iii) determine significant enrichment threshold levels by a percentile rank statistic method; and (iv) estimate false discovery rates by using Monte Carlo simulation to model the background based on the signal-to-noise ratio of the ChIP-seq data (Supplementary Methods and Supplementary Figure S1).

Scoring called peaks and estimation of FDR: The score of a peak called by BELT is empirically defined in Supplementary Methods formula (3) and is used to rank the peaks. An FDR is estimated using Supplementary Methods formulas (5) and (6).

Comparison with other ChIP-seq programs: The performance of BELT was compared with that of four publicly available ChIP-seq programs (MACS, QuEST, PeakSeq and SISSRs) on four published datasets: CTCF, FOXA1, ER and NRSF. The numbers of overlapping peaks between BELT and the other programs showed that all of the overlap rates are over 74% (Supplementary Figure S2A). A plot of the relative distance from the predicted binding motif to the real motif showed that our program has similar or higher accuracy than the other programs (Supplementary Figure S2B, Supplementary Table S1).
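As a rough illustration of the four BELT steps, the Python sketch below bins shifted reads on one chromosome, picks a count threshold by percentile rank, and estimates an FDR by simulation. Several parts are assumptions made for the example rather than details from the paper: the fragment shift is supplied by the caller instead of being estimated from read directions, the background is simulated as uniformly placed reads (the actual BELT background model is defined in the Supplementary Methods), and all function and parameter names are hypothetical.

```python
import random
from collections import Counter


def bin_counts(reads, bin_size, shift, n_bins):
    """Count reads per bin after shifting each read toward the fragment centre.

    `reads` is an iterable of (position, strand) pairs; "+" reads move
    downstream by `shift` bp and "-" reads move upstream.
    """
    counts = Counter()
    for pos, strand in reads:
        centre = pos + shift if strand == "+" else pos - shift
        b = centre // bin_size
        if 0 <= b < n_bins:
            counts[b] += 1
    return counts


def percentile_rank_threshold(counts, n_bins, top_fraction):
    """Read count at the cutoff leaving roughly the top `top_fraction` of bins."""
    ordered = sorted(counts.get(b, 0) for b in range(n_bins))
    n_top = min(n_bins, max(1, round(n_bins * top_fraction)))
    return max(1, ordered[n_bins - n_top])  # never call empty bins enriched


def estimate_bin_fdr(reads, chrom_length, bin_size=100, shift=0,
                     top_fraction=0.01, n_simulations=100, seed=0):
    """FDR = mean number of enriched bins under a simulated background,
    divided by the number of enriched bins observed in the data."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = bin_counts(reads, bin_size, shift, n_bins)
    threshold = percentile_rank_threshold(counts, n_bins, top_fraction)
    enriched = sorted(b for b, c in counts.items() if c >= threshold)

    # Assumed background for this sketch: the same number of reads
    # scattered uniformly over the bins.
    n_reads = sum(counts.values())
    rng = random.Random(seed)
    total = 0
    for _ in range(n_simulations):
        sim = Counter(rng.randrange(n_bins) for _ in range(n_reads))
        total += sum(1 for c in sim.values() if c >= threshold)
    fp = total / n_simulations
    fdr = fp / len(enriched) if enriched else None
    return {"threshold": threshold, "enriched_bins": enriched,
            "fp": fp, "fdr": fdr}


if __name__ == "__main__":
    rng = random.Random(7)
    background = [(rng.randrange(100_000), rng.choice("+-")) for _ in range(2000)]
    site = [(50_000 + rng.randrange(-60, 60), rng.choice("+-")) for _ in range(150)]
    print(estimate_bin_fdr(background + site, chrom_length=100_000,
                           bin_size=200, shift=75, top_fraction=0.005))
```

Running the same procedure for several bin sizes between 100 bp and 500 bp would mirror step (i); the sketch uses a single bin size to stay short.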

2.4 Implementation

W-ChIPeaks was implemented with PHP, Perl, Java and C++.

2.5 Output

W-ChIPeaks produces comprehensive output for identified peaks in several formats (GFF, BED, bedGraph and .wig files), the annotated genes to which these peaks are related, and a graphical interpretation and visualization of the results. For two or three ChIP-chip datasets, a plot comparing the overlap between datasets at different threshold levels is also provided.
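To show how the same peak looks in three of these formats, here is a small Python sketch. It follows the general format conventions (BED and bedGraph use 0-based, half-open coordinates; GFF uses 1-based, inclusive coordinates and nine tab-separated columns) and is not taken from W-ChIPeaks itself. The `Peak` record, the function names, and the GFF source and feature labels ("example", "peak") are hypothetical.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    name: str
    score: float


def to_bed(peak):
    """BED5 line; the BED score column is an integer clipped to 0..1000."""
    bed_score = int(min(1000, max(0, round(peak.score))))
    return f"{peak.chrom}\t{peak.start}\t{peak.end}\t{peak.name}\t{bed_score}"


def to_bedgraph(peak):
    """bedGraph line: chrom, start, end, value."""
    return f"{peak.chrom}\t{peak.start}\t{peak.end}\t{peak.score:g}"


def to_gff(peak, source="example", feature="peak"):
    """GFF line; the start is converted to 1-based, the end stays the same."""
    fields = [peak.chrom, source, feature, str(peak.start + 1), str(peak.end),
              f"{peak.score:g}", ".", ".", f"ID={peak.name}"]
    return "\t".join(fields)


if __name__ == "__main__":
    p = Peak("chr1", 99, 200, "peak1", 37.0)
    for line in (to_bed(p), to_bedgraph(p), to_gff(p)):
        print(line)
```

The coordinate shift in `to_gff` is the detail most often missed when converting between the formats: the same interval is written as 99 to 200 in BED and as 100 to 200 in GFF.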

2.6 Sample test

W-ChIPeaks was tested with different published datasets from ChIP-chip and ChIP-seq experiments. The ChIP-chip data come from the NimbleGen or Agilent array platforms. The datasets include E2F1 (Jin et al., 2006), N-MYC (Cotterman et al., 2008), ZNF263 (Frietze et al.), PolII, H3K4me3 in the K562 cell line (ENCODE consortium), H3K9me2 and H3Ac (Bapat et al., 2010); results are available online at: http://motif.bmi.ohio-state.edu/W-ChIPeaks/examples.shtml.

REFERENCES

1. A high-resolution map of active promoters in the human genome.

Authors: Tae Hoon Kim; Leah O Barrera; Ming Zheng; Chunxu Qu; Michael A Singer; Todd A Richmond; Yingnian Wu; Roland D Green; Bing Ren
Journal: Nature Date: 2005-06-29 Impact factor: 49.962

2. A computational genomics approach to identify cis-regulatory modules from chromatin immunoprecipitation microarray data--a case study using E2F1.

Authors: Victor X Jin; Alina Rabinovich; Sharon L Squazzo; Roland Green; Peggy J Farnham
Journal: Genome Res Date: 2006-10-19 Impact factor: 9.043

3. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells.

Authors: Yuin-Han Loh; Qiang Wu; Joon-Lin Chew; Vinsensius B Vega; Weiwei Zhang; Xi Chen; Guillaume Bourque; Joshy George; Bernard Leong; Jun Liu; Kee-Yew Wong; Ken W Sung; Charlie W H Lee; Xiao-Dong Zhao; Kuo-Ping Chiu; Leonard Lipovich; Vladimir A Kuznetsov; Paul Robson; Lawrence W Stanton; Chia-Lin Wei; Yijun Ruan; Bing Lim; Huck-Hui Ng
Journal: Nat Genet Date: 2006-03-05 Impact factor: 38.330

4. Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets.

Authors: David S Johnson; Wei Li; D Benjamin Gordon; Arindam Bhattacharjee; Bo Curry; Jayati Ghosh; Leonardo Brizuela; Jason S Carroll; Myles Brown; Paul Flicek; Christoph M Koch; Ian Dunham; Mark Bieda; Xiaoqin Xu; Peggy J Farnham; Philipp Kapranov; David A Nix; Thomas R Gingeras; Xinmin Zhang; Heather Holster; Nan Jiang; Roland D Green; Jun S Song; Scott A McCuine; Elizabeth Anton; Loan Nguyen; Nathan D Trinklein; Zhen Ye; Keith Ching; David Hawkins; Bing Ren; Peter C Scacheri; Joel Rozowsky; Alexander Karpikov; Ghia Euskirchen; Sherman Weissman; Mark Gerstein; Michael Snyder; Annie Yang; Zarmik Moqtaderi; Heather Hirsch; Hennady P Shulha; Yutao Fu; Zhiping Weng; Kevin Struhl; Richard M Myers; Jason D Lieb; X Shirley Liu
Journal: Genome Res Date: 2008-02-07 Impact factor: 9.043

5. Insights from genomic profiling of transcription factors.

Authors: Peggy J Farnham
Journal: Nat Rev Genet Date: 2009-08-11 Impact factor: 53.242

6. Multivalent epigenetic marks confer microenvironment-responsive epigenetic plasticity to ovarian cancer cells.

Authors: Sharmila A Bapat; Victor Jin; Nicholas Berry; Curt Balch; Neeti Sharma; Nawneet Kurrey; Shu Zhang; Fang Fang; Xun Lan; Meng Li; Brian Kennedy; Robert M Bigsby; Tim H M Huang; Kenneth P Nephew
Journal: Epigenetics Date: 2010-11-01 Impact factor: 4.528

7. N-Myc regulates a widespread euchromatic program in the human genome partially independent of its role as a classical transcription factor.

Authors: Rebecca Cotterman; Victor X Jin; Sheryl R Krig; Jessica M Lemen; Alice Wey; Peggy J Farnham; Paul S Knoepfler
Journal: Cancer Res Date: 2008-12-01 Impact factor: 12.701

8. ChIP-seq: advantages and challenges of a maturing technology.

Authors: Peter J Park
Journal: Nat Rev Genet Date: 2009-09-08 Impact factor: 53.242

9. Genome-wide mapping of in vivo protein-DNA interactions.

Authors: David S Johnson; Ali Mortazavi; Richard M Myers; Barbara Wold
Journal: Science Date: 2007-05-31 Impact factor: 47.728

10. Model-based analysis of ChIP-Seq (MACS).

Authors: Yong Zhang; Tao Liu; Clifford A Meyer; Jérôme Eeckhoute; David S Johnson; Bradley E Bernstein; Chad Nusbaum; Richard M Myers; Myles Brown; Wei Li; X Shirley Liu
Journal: Genome Biol Date: 2008-09-17 Impact factor: 13.583
