A novel Bayesian change-point algorithm for genome-wide analysis of diverse ChIPseq data types.

Haipeng Xing, Willey Liao, Yifan Mo, Michael Q Zhang.

Abstract

ChIPseq is a widely used technique for investigating protein-DNA interactions. Read density profiles are generated by next-generation sequencing of protein-bound DNA and alignment of the short reads to a reference genome. Enriched regions are revealed as peaks, which often differ dramatically in shape depending on the target protein(1). For example, transcription factors often bind in a site- and sequence-specific manner and tend to produce punctate peaks, while histone modifications are more pervasive and are characterized by broad, diffuse islands of enrichment(2). Reliably identifying these regions was the focus of our work. Algorithms for analyzing ChIPseq data have employed various methodologies, from heuristics(3-5) to more rigorous statistical models, e.g. Hidden Markov Models (HMMs)(6-8). We sought a solution that minimized the need for difficult-to-define, ad hoc parameters, which often compromise resolution and make a tool less intuitive to use. With respect to HMM-based methods, we aimed to curtail the parameter estimation procedures and the simple, finite state classifications that are often used. Additionally, conventional ChIPseq data analysis involves categorizing the expected read density profiles as either punctate or diffuse and then applying the appropriate tool. We further aimed to replace these two distinct models with a single, more versatile model that can address the entire spectrum of data types. To meet these objectives, we first constructed a statistical framework that naturally models ChIPseq data structures using a cutting-edge advance in HMMs(9), which uses only explicit formulas, an innovation crucial to its performance advantages. More sophisticated than heuristic models, our HMM accommodates infinite hidden states through a Bayesian model. We applied it to identify reasonable change points in read density, which in turn define segments of enrichment.
Our analysis revealed that our Bayesian Change Point (BCP) algorithm had reduced computational complexity, as evidenced by a shorter run time and a smaller memory footprint. The BCP algorithm was successfully applied to both punctate peak and diffuse island identification with robust accuracy and limited user-defined parameters, illustrating both its versatility and its ease of use. Consequently, we believe it can be implemented readily across broad ranges of data types and end users in a manner that is easily compared and contrasted, making it a valuable tool for ChIPseq data analysis that can aid collaboration and corroboration between research groups. Here, we demonstrate the application of BCP to existing transcription factor(10,11) and epigenetic data(12) to illustrate its usefulness.
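
To make the idea of a change point in read density concrete, the following is a minimal sketch and not the BCP algorithm itself. It assumes binned read counts are Poisson with a Gamma(a, b) prior on each segment's rate, and it looks for a single change point only, whereas the abstract describes a model with infinitely many hidden states. The names `log_marginal` and `change_point_posterior`, the prior values, and the toy counts are all hypothetical. Because the Gamma prior is conjugate to the Poisson likelihood, every quantity has a closed form, which loosely mirrors the reliance on explicit formulas mentioned above.

```python
import math


def log_marginal(total, n, a=1.0, b=1.0):
    """Log marginal likelihood of n Poisson counts that sum to `total`,
    with the rate integrated out under a Gamma(a, b) prior (b is a rate).
    The product of 1/x_i! is omitted: it is identical for every
    candidate split and cancels when the posterior is normalized."""
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + total) - (a + total) * math.log(b + n))


def change_point_posterior(counts, a=1.0, b=1.0):
    """Posterior over a single change point k (1 <= k < n), meaning the
    rate changes between bin k and bin k+1, under a uniform prior on k."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two bins")
    total = sum(counts)
    left = 0
    log_post = []
    for k in range(1, n):
        left += counts[k - 1]
        log_post.append(log_marginal(left, k, a, b)
                        + log_marginal(total - left, n - k, a, b))
    # Normalize in log space to avoid overflow.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy read counts per bin: background for six bins, then enrichment.
counts = [2, 1, 3, 2, 1, 2, 15, 18, 14, 17, 16, 15]
post = change_point_posterior(counts)
best = max(range(len(post)), key=post.__getitem__) + 1
print("most probable change point after bin", best)
```

On these toy counts the posterior concentrates on the boundary between the sixth and seventh bins. A genome-wide tool would need multiple change points and a background model, which this sketch does not attempt.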

Year:  2012        PMID: 23271069      PMCID: PMC3565849          DOI: 10.3791/4273

Source DB:  PubMed          Journal:  J Vis Exp        ISSN: 1940-087X            Impact factor:   1.355


  14 in total

1.  The UCSC Table Browser data retrieval tool.

Authors:  Donna Karolchik; Angela S Hinrichs; Terrence S Furey; Krishna M Roskin; Charles W Sugnet; David Haussler; W James Kent
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

2.  Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci.

Authors:  Michael L Stitzel; Praveen Sethupathy; Daniel S Pearson; Peter S Chines; Lingyun Song; Michael R Erdos; Ryan Welch; Stephen C J Parker; Alan P Boyle; Laura J Scott; Elliott H Margulies; Michael Boehnke; Terrence S Furey; Gregory E Crawford; Francis S Collins
Journal:  Cell Metab       Date:  2010-11-03       Impact factor: 27.287

3.  High-resolution profiling of histone methylations in the human genome.

Authors:  Artem Barski; Suresh Cuddapah; Kairong Cui; Tae-Young Roh; Dustin E Schones; Zhibin Wang; Gang Wei; Iouri Chepelev; Keji Zhao
Journal:  Cell       Date:  2007-05-18       Impact factor: 41.582

4.  The NIH Roadmap Epigenomics Mapping Consortium.

Authors:  Bradley E Bernstein; John A Stamatoyannopoulos; Joseph F Costello; Bing Ren; Aleksandar Milosavljevic; Alexander Meissner; Manolis Kellis; Marco A Marra; Arthur L Beaudet; Joseph R Ecker; Peggy J Farnham; Martin Hirst; Eric S Lander; Tarjei S Mikkelsen; James A Thomson
Journal:  Nat Biotechnol       Date:  2010-10       Impact factor: 54.908

5.  HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data.

Authors:  Zhaohui S Qin; Jianjun Yu; Jincheng Shen; Christopher A Maher; Ming Hu; Shanker Kalyana-Sundaram; Jindan Yu; Arul M Chinnaiyan
Journal:  BMC Bioinformatics       Date:  2010-07-02       Impact factor: 3.169

6.  ChIP-seq: advantages and challenges of a maturing technology. [Review]

Authors:  Peter J Park
Journal:  Nat Rev Genet       Date:  2009-09-08       Impact factor: 53.242

7.  Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing.

Authors:  Gordon Robertson; Martin Hirst; Matthew Bainbridge; Misha Bilenky; Yongjun Zhao; Thomas Zeng; Ghia Euskirchen; Bridget Bernier; Richard Varhol; Allen Delaney; Nina Thiessen; Obi L Griffith; Ann He; Marco Marra; Michael Snyder; Steven Jones
Journal:  Nat Methods       Date:  2007-06-11       Impact factor: 28.547

8.  Identifying dispersed epigenomic domains from ChIP-Seq data.

Authors:  Qiang Song; Andrew D Smith
Journal:  Bioinformatics       Date:  2011-02-16       Impact factor: 6.937

9.  BayesPeak: Bayesian analysis of ChIP-seq data.

Authors:  Christiana Spyrou; Rory Stark; Andy G Lynch; Simon Tavaré
Journal:  BMC Bioinformatics       Date:  2009-09-21       Impact factor: 3.169

10.  JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles.

Authors:  Elodie Portales-Casamar; Supat Thongjuea; Andrew T Kwon; David Arenillas; Xiaobei Zhao; Eivind Valen; Dimas Yusuf; Boris Lenhard; Wyeth W Wasserman; Albin Sandelin
Journal:  Nucleic Acids Res       Date:  2009-11-11       Impact factor: 16.971

  3 in total

1.  Application of quantitative trait locus mapping and transcriptomics to studies of the senescence-accelerated phenotype in rats.

Authors:  Elena E Korbolina; Nikita I Ershov; Leonid O Bryzgalov; Natalia G Kolosova
Journal:  BMC Genomics       Date:  2014-12-19       Impact factor: 3.969

2.  A model-based approach to identify binding sites in CLIP-Seq data.

Authors:  Tao Wang; Beibei Chen; MinSoo Kim; Yang Xie; Guanghua Xiao
Journal:  PLoS One       Date:  2014-04-08       Impact factor: 3.240

3.  Differential analysis of chromatin accessibility and histone modifications for predicting mouse developmental enhancers.

Authors:  Shaliu Fu; Qin Wang; Jill E Moore; Michael J Purcaro; Henry E Pratt; Kaili Fan; Cuihua Gu; Cizhong Jiang; Ruixin Zhu; Anshul Kundaje; Aiping Lu; Zhiping Weng
Journal:  Nucleic Acids Res       Date:  2018-11-30       Impact factor: 16.971
