Zheng Xu (1), Guosheng Zhang (2), Fulai Jin (3), Mengjie Chen (4), Terrence S Furey (5), Patrick F Sullivan (6), Zhaohui Qin (7), Ming Hu (8), Yun Li (1)
1. Department of Biostatistics, Department of Genetics, Department of Computer Science.
2. Department of Computer Science, Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC 27599, USA.
3. Department of Genetics, School of Medicine, Case Western Reserve University, Cleveland, Ohio 44016.
4. Department of Biostatistics, Department of Genetics.
5. Department of Genetics.
6. Department of Genetics, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
7. Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA.
8. Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA.
Abstract
MOTIVATION: Advances in chromosome conformation capture and next-generation sequencing technologies are enabling genome-wide investigation of dynamic chromatin interactions. For example, Hi-C experiments generate genome-wide contact frequencies between pairs of loci by sequencing DNA segments ligated from loci in close spatial proximity. One essential task in such studies is peak calling, that is, detecting non-random interactions between loci from the two-dimensional contact frequency matrix. Accomplishing this task has important implications, including identifying long-range interactions that help interpret a sizable fraction of the results from genome-wide association studies. The task, distinguishing biologically meaningful chromatin interactions from massive numbers of random interactions, poses great challenges both statistically and computationally. Model-based methods to address this challenge are still lacking. In particular, no statistical model exists that takes the underlying dependency structure into consideration.
RESULTS: In this paper, we propose a hidden Markov random field (HMRF) based Bayesian method to rigorously model interaction probabilities in the two-dimensional space based on the contact frequency matrix. By borrowing information from neighboring loci pairs, our method demonstrates superior reproducibility and statistical power in both simulation studies and real data analysis.
AVAILABILITY AND IMPLEMENTATION: The source code can be downloaded at: http://www.unc.edu/~yunmli/HMRFBayesHiC
CONTACT: ming.hu@nyumc.org or yunli@med.unc.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
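To make the idea of "borrowing information from neighboring loci pairs" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-state hidden field (background vs. peak) with an Ising-style neighbor bonus, Poisson counts with fixed and known rates, and iterated conditional modes (ICM) in place of full Bayesian inference. All names (`icm_peak_labels`, `toy_counts`, `beta`, the two rates) are hypothetical and chosen for illustration only.

```python
import math


def poisson_loglik(count, rate):
    # Poisson log-likelihood without the log(count!) term,
    # which is identical for both hidden states and cancels out.
    return count * math.log(rate) - rate


def icm_peak_labels(counts, background_rate, peak_rate, beta=1.0, sweeps=10):
    """Label each cell 0 (background) or 1 (peak) by ICM on an Ising-style field.

    Assumption: emission rates are fixed and known; a real model would
    estimate them and account for genomic distance and other biases.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    rates = (background_rate, peak_rate)
    # Start from the likelihood alone, i.e. every cell judged independently.
    labels = [
        [1 if poisson_loglik(c, peak_rate) > poisson_loglik(c, background_rate) else 0
         for c in row]
        for row in counts
    ]
    for _ in range(sweeps):
        changed = False
        for i in range(n_rows):
            for j in range(n_cols):
                # Four nearest neighbors that fall inside the matrix.
                neighbors = [
                    labels[a][b]
                    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < n_rows and 0 <= b < n_cols
                ]
                scores = []
                for state in (0, 1):
                    agreeing = sum(1 for v in neighbors if v == state)
                    scores.append(
                        poisson_loglik(counts[i][j], rates[state]) + beta * agreeing
                    )
                new_label = 1 if scores[1] > scores[0] else 0
                if new_label != labels[i][j]:
                    labels[i][j] = new_label
                    changed = True
        if not changed:
            break  # no label moved during this sweep: a local optimum
    return labels


# Toy window of a contact matrix (made-up numbers): a 3x3 block of high
# counts whose center is weak (4), plus one isolated moderate count (5).
toy_counts = [
    [2, 1, 3, 2, 1, 2],
    [1, 9, 10, 8, 2, 1],
    [2, 11, 4, 9, 1, 3],
    [3, 8, 10, 9, 2, 2],
    [1, 2, 3, 1, 2, 2],
    [2, 1, 2, 2, 1, 5],
]

independent = icm_peak_labels(toy_counts, 2.0, 8.0, beta=0.0)  # no neighbor term
smoothed = icm_peak_labels(toy_counts, 2.0, 8.0, beta=1.0)     # with neighbor term
```

In this toy example, the weak center of the block is labeled background when cells are judged independently (`beta=0.0`) but is recovered as a peak once its four peak neighbors contribute, while the isolated count of 5 in the corner is called a peak independently and suppressed under the neighbor term. The choice of `beta` controls how strongly neighbors influence each call; it is a free parameter of this sketch.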