Bo Jiang (1), Jun S Liu, Martha L Bulyk.
(1) Department of Statistics, Harvard University, Cambridge, MA 02138, USA. bojiang83@gmail.com
Abstract
MOTIVATION: Sequence-specific transcription factors (TFs) regulate the expression of their target genes through interactions with specific DNA-binding sites in the genome. Data on TF-DNA binding specificities are essential for understanding how regulatory specificity is achieved.

RESULTS: Numerous studies have used universal protein-binding microarray (PBM) technology to determine the in vitro binding specificities of hundreds of TFs for all possible 8 bp sequences (8mers). We have developed a Bayesian analysis of variance (ANOVA) model that decomposes these 8mer data into background noise, TF familywise effects and effects due to the particular TF. Adjusting for background noise improves PBM data quality and concordance with in vivo TF binding data. Moreover, our model provides simultaneous identification of TF subclasses and their shared sequence preferences, and also of 8mers bound preferentially by individual members of TF subclasses. Such results may aid in deciphering cis-regulatory codes and determinants of protein-DNA binding specificity.

AVAILABILITY AND IMPLEMENTATION: Source code, compiled code and R and Python scripts are available from http://thebrain.bwh.harvard.edu/hierarchicalANOVA.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
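ILLUSTRATION: The three-way decomposition described under RESULTS (background, family effect, TF-specific effect) can be pictured with a deliberately simplified, non-Bayesian sketch. This is not the authors' model. It assumes family membership is already known, whereas the abstract states that the model identifies TF subclasses itself, and it uses plain averages in place of posterior inference. The function name `decompose`, the TF and family names, and the toy scores are all hypothetical.

```python
from statistics import mean


def decompose(scores, families):
    """Split each TF's 8mer scores into background + family + TF-specific parts.

    scores:   {tf_name: [score for each 8mer]}, same 8mer order for every TF
    families: {family_name: [tf_name, ...]}, each TF in exactly one family
    (Assumption: family labels are given; the published model infers them.)
    """
    tfs = list(scores)
    n_kmers = len(scores[tfs[0]])

    # Background: per-8mer mean across all TFs, a crude proxy for shared noise.
    background = [mean(scores[t][k] for t in tfs) for k in range(n_kmers)]

    family_effect = {}
    tf_effect = {}
    for fam, members in families.items():
        # Family effect: mean background-adjusted score within the family.
        family_effect[fam] = [
            mean(scores[t][k] - background[k] for t in members)
            for k in range(n_kmers)
        ]
        # TF-specific effect: what remains after removing both shared terms.
        for t in members:
            tf_effect[t] = [
                scores[t][k] - background[k] - family_effect[fam][k]
                for k in range(n_kmers)
            ]
    return background, family_effect, tf_effect


# Toy data: 4 hypothetical TFs in 2 hypothetical families, scored on 3 8mers.
scores = {
    "tfA1": [5.0, 1.0, 1.0],
    "tfA2": [3.0, 1.0, 1.0],
    "tfB1": [1.0, 4.0, 1.0],
    "tfB2": [1.0, 2.0, 1.0],
}
families = {"famA": ["tfA1", "tfA2"], "famB": ["tfB1", "tfB2"]}
background, family_effect, tf_effect = decompose(scores, families)
```

In this toy input, the first 8mer scores highly for both members of `famA`, so most of its signal lands in the family term, and the difference between `tfA1` and `tfA2` is what remains in the TF-specific term. By construction, the three parts sum back to the original score for every TF and 8mer.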