Jianling Zhong1, Todd Wasson1, Alexander J Hartemink2. 1. Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA. 2. Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA.
Abstract
MOTIVATION: Transcriptional regulation is directly enacted by the interactions between DNA and many proteins, including transcription factors (TFs), nucleosomes and polymerases. A critical step in deciphering transcriptional regulation is to infer, and eventually predict, the precise locations of these interactions, along with their strength and frequency. While recent datasets yield great insight into these interactions, individual data sources often provide only partial information regarding one aspect of the complete interaction landscape. For example, chromatin immunoprecipitation (ChIP) reveals the binding positions of a protein, but only for one protein at a time. In contrast, nucleases like MNase and DNase can be used to reveal binding positions for many different proteins at once, but cannot easily determine the identities of those proteins. Currently, few statistical frameworks jointly model these different data sources to reveal an accurate, holistic view of the in vivo protein-DNA interaction landscape. RESULTS: Here, we develop a novel statistical framework that integrates different sources of experimental information within a thermodynamic model of competitive binding to jointly learn a holistic view of the in vivo protein-DNA interaction landscape. We show that our framework learns an interaction landscape with increased accuracy, explaining multiple sets of data in accordance with thermodynamic principles of competitive DNA binding. The resulting model of genomic occupancy provides a precise mechanistic vantage point from which to explore the role of protein-DNA interactions in transcriptional regulation. AVAILABILITY AND IMPLEMENTATION: The C source code for compete and Python source code for MCMC-based inference are available at http://www.cs.duke.edu/∼amink. CONTACT: amink@cs.duke.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Transcriptional regulation is directly enacted by the interactions between DNA and many proteins, including transcription factors (TFs), nucleosomes and polymerases. A critical step in deciphering transcriptional regulation is to infer, and eventually predict, the precise locations of these interactions, along with their strength and frequency. While recent datasets yield great insight into these interactions, individual data sources often provide only partial information regarding one aspect of the complete interaction landscape. For example, chromatin immunoprecipitation (ChIP) reveals the binding positions of a protein, but only for one protein at a time. In contrast, nucleases like MNase and DNase can be used to reveal binding positions for many different proteins at once, but cannot easily determine the identities of those proteins. Currently, few statistical frameworks jointly model these different data sources to reveal an accurate, holistic view of the in vivo protein-DNA interaction landscape. RESULTS: Here, we develop a novel statistical framework that integrates different sources of experimental information within a thermodynamic model of competitive binding to jointly learn a holistic view of the in vivo protein-DNA interaction landscape. We show that our framework learns an interaction landscape with increased accuracy, explaining multiple sets of data in accordance with thermodynamic principles of competitive DNA binding. The resulting model of genomic occupancy provides a precise mechanistic vantage point from which to explore the role of protein-DNA interactions in transcriptional regulation. AVAILABILITY AND IMPLEMENTATION: The C source code for compete and Python source code for MCMC-based inference are available at http://www.cs.duke.edu/∼amink. CONTACT: amink@cs.duke.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: B Ren; F Robert; J J Wyrick; O Aparicio; E G Jennings; I Simon; J Zeitlinger; J Schreiber; N Hannett; E Kanin; T L Volkert; C J Wilson; S P Bell; R A Young Journal: Science Date: 2000-12-22 Impact factor: 47.728
Authors: Cong Zhu; Kelsey J R P Byers; Rachel Patton McCord; Zhenwei Shi; Michael F Berger; Daniel E Newburger; Katrina Saulrieta; Zachary Smith; Mita V Shah; Mathangi Radhakrishnan; Anthony A Philippakis; Yanhui Hu; Federico De Masi; Marcin Pacek; Andreas Rolfs; Tal Murthy; Joshua Labaer; Martha L Bulyk Journal: Genome Res Date: 2009-01-21 Impact factor: 9.043
Authors: Christopher T Harbison; D Benjamin Gordon; Tong Ihn Lee; Nicola J Rinaldi; Kenzie D Macisaac; Timothy W Danford; Nancy M Hannett; Jean-Bosco Tagne; David B Reynolds; Jane Yoo; Ezra G Jennings; Julia Zeitlinger; Dmitry K Pokholok; Manolis Kellis; P Alex Rolfe; Ken T Takusagawa; Eric S Lander; David K Gifford; Ernest Fraenkel; Richard A Young Journal: Nature Date: 2004-09-02 Impact factor: 49.962
Authors: Jay R Hesselberth; Xiaoyu Chen; Zhihong Zhang; Peter J Sabo; Richard Sandstrom; Alex P Reynolds; Robert E Thurman; Shane Neph; Michael S Kuehn; William S Noble; Stanley Fields; John A Stamatoyannopoulos Journal: Nat Methods Date: 2009-03-22 Impact factor: 28.547
Authors: Kenzie D MacIsaac; Ting Wang; D Benjamin Gordon; David K Gifford; Gary D Stormo; Ernest Fraenkel Journal: BMC Bioinformatics Date: 2006-03-07 Impact factor: 3.169
Authors: Kaixuan Luo; Jianling Zhong; Alexias Safi; Linda K Hong; Alok K Tewari; Lingyun Song; Timothy E Reddy; Li Ma; Gregory E Crawford; Alexander J Hartemink Journal: Genome Res Date: 2022-05-24 Impact factor: 9.438
Authors: Jianling Zhong; Kaixuan Luo; Peter S Winter; Gregory E Crawford; Edwin S Iversen; Alexander J Hartemink Journal: Genome Res Date: 2016-01-15 Impact factor: 9.043