Hamid Bolouri, Walter L Ruzzo.
Abstract
We analyzed 198 datasets of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and developed a methodology for identification of high-confidence enhancer and promoter regions from transcription factor ChIP-seq data alone. We identify 32,467 genomic regions marked with ChIP-seq binding peaks in 15 or more experiments as high-confidence cis-regulatory regions. Although the selected regions mark only ~0.67% of the genome, 70.5% of our predicted binding regions fall within independently identified, strongly expression-correlated and histone-marked enhancer regions, which cover ~8% of the genome (Ernst et al., Nature 2011, 473, 43-49). Even more remarkably, 85.6% of our selected regions overlap transcription factor (TF) binding regions identified in evolutionarily conserved DNase1 hypersensitivity cluster regions, which cover 0.75% of the genome (Boyle et al., Genome Research 2011, 21, 456-464). P-values for these overlaps are effectively zero (Z-scores of 328 and 715, respectively). Furthermore, 62% of our selected regions overlap the intersection of the evolutionarily conserved DNase1 hypersensitivity-identified TF-binding regions of Boyle et al. (2011) with the histone-marked enhancers found to be strongly associated with transcriptional activity by Ernst et al. (2011). Two hundred thirty of our candidate cis-regulatory regions overlap cancer-associated variants reported in the Catalogue of Somatic Mutations in Cancer (http://www.sanger.ac.uk/genetics/CGP/cosmic/). We also identify 1,252 potential proximal promoters for the 7,561 disjoint lincRNA regions currently in the Human lincRNA Catalog (www.broadinstitute.org/genome_bio/human_lincrnas/). Our investigation used approximately half of all currently available ENCODE ChIP-seq datasets, suggesting further gains are likely from analysis of all datasets currently available.
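The selection criterion described above (regions with binding peaks in 15 or more experiments) can be illustrated with a small sweep-line sketch. This is not the authors' pipeline: the abstract does not specify how peaks were merged or how region boundaries were drawn, so the function name `high_confidence_regions`, the tuple layout of `peaks`, and the half-open coordinate convention are all assumptions made for illustration.

```python
from collections import defaultdict


def high_confidence_regions(peaks, min_experiments=15):
    """Return (chrom, start, end) spans covered by peaks from at least
    `min_experiments` distinct experiments.

    `peaks` is an iterable of (experiment_id, chrom, start, end) tuples
    with half-open coordinates. This layout is a hypothetical input
    format, not one taken from the paper.
    """
    # Group peaks by (experiment, chromosome) so that overlapping peaks
    # from the same experiment are counted only once per position.
    per_experiment = defaultdict(list)
    for exp, chrom, start, end in peaks:
        per_experiment[(exp, chrom)].append((start, end))

    # deltas[chrom][pos] = net change in the number of experiments
    # with a peak covering positions from `pos` onward.
    deltas = defaultdict(lambda: defaultdict(int))
    for (_exp, chrom), intervals in per_experiment.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching peak: extend the merged interval.
                cur_end = max(cur_end, end)
            else:
                deltas[chrom][cur_start] += 1
                deltas[chrom][cur_end] -= 1
                cur_start, cur_end = start, end
        deltas[chrom][cur_start] += 1
        deltas[chrom][cur_end] -= 1

    # Sweep each chromosome left to right, tracking experiment depth.
    regions = []
    for chrom in sorted(deltas):
        depth = 0
        region_start = None
        for pos in sorted(deltas[chrom]):
            depth += deltas[chrom][pos]
            if depth >= min_experiments and region_start is None:
                region_start = pos
            elif depth < min_experiments and region_start is not None:
                regions.append((chrom, region_start, pos))
                region_start = None
    return regions


# Toy input with three hypothetical experiments (A, B, C).
toy_peaks = [
    ("A", "chr1", 100, 200),
    ("A", "chr1", 190, 220),  # overlaps the previous peak from A
    ("B", "chr1", 150, 250),
    ("C", "chr1", 180, 300),
    ("A", "chr2", 10, 20),
]
regions_2 = high_confidence_regions(toy_peaks, min_experiments=2)
regions_3 = high_confidence_regions(toy_peaks, min_experiments=3)
```

On real data the threshold would be 15, as in the abstract; the toy thresholds of 2 and 3 only keep the example small. Summing the lengths of the returned spans and dividing by the genome length gives the kind of coverage fraction quoted above (~0.67%).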
Year: 2012 PMID: 22897152 DOI: 10.1089/cmb.2012.0100
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479