Shuxiang Ruan¹, S Joshua Swamidass², Gary D Stormo¹
¹ Department of Genetics. ² Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA.
Abstract
MOTIVATION: Characterizing the binding specificities of transcription factors (TFs) is crucial to the study of gene expression regulation. Recently developed high-throughput experimental methods, including protein binding microarrays (PBM) and high-throughput SELEX (HT-SELEX), have enabled rapid measurement of the specificities of hundreds of TFs. However, few studies have developed efficient algorithms for estimating binding motifs from HT-SELEX data. In addition, the simple method of constructing a position weight matrix (PWM) by comparing the frequency of the preferred sequence with the frequencies of its single-nucleotide variants risks generating motifs with higher information content than the true binding specificity.
RESULTS: We developed an algorithm called BEESEM that is based on a comprehensive biophysical model of protein-DNA interactions and is trained using the expectation-maximization (EM) method. BEESEM can select the optimal motif length and calculate confidence intervals for the estimated parameters. By comparing BEESEM with published motifs estimated from the same HT-SELEX data, we demonstrate that BEESEM provides significant improvements. We also evaluate several motif discovery algorithms on independent PBM and ChIP-seq data. BEESEM provides significantly better fits to in vitro data, but on in vivo data its performance is similar to that of some other methods when judged by the area under the receiver operating characteristic curve (AUROC). This highlights the limitations of the purely rank-based AUROC criterion. Assessing models with quantitative binding data, however, demonstrates that BEESEM improves on prior models.
AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://stormo.wustl.edu/resources.html.
CONTACT: stormo@wustl.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
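The abstract does not spell out BEESEM's model, so the following is only a generic sketch of the kind of biophysical model it refers to, not the authors' code. It assumes additive per-position binding energies in kT units (lower means tighter binding), a chemical potential `mu` standing in for protein concentration, scanning of both strands, and at most one protein bound per DNA read, so the bound probability is Z / (1 + Z) with Z = Σ exp(mu − E_site). The names `energy_matrix`, `mu`, `site_energy`, and `occupancy` are hypothetical. The EM training step that fits such parameters to HT-SELEX reads is not shown.

```python
import math

BASES = "ACGT"

def site_energy(site, energy_matrix):
    """Additive binding energy (kT units) of one site; energy_matrix[i][j] is the
    energy of base BASES[j] at position i. Lower energy means tighter binding."""
    return sum(energy_matrix[i][BASES.index(b)] for i, b in enumerate(site))

def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def occupancy(seq, energy_matrix, mu):
    """Probability that a read is bound, assuming at most one protein per read.
    Sums Boltzmann weights exp(mu - E) over every site on both strands."""
    width = len(energy_matrix)
    z = 0.0
    for strand in (seq, revcomp(seq)):
        for start in range(len(strand) - width + 1):
            z += math.exp(mu - site_energy(strand[start:start + width], energy_matrix))
    return z / (1.0 + z)

# Hypothetical 3-bp energy matrix: "GAT" costs 0 kT, each mismatch costs 2 kT.
toy_matrix = [
    [2, 2, 0, 2],  # position 0 prefers G
    [0, 2, 2, 2],  # position 1 prefers A
    [2, 2, 2, 0],  # position 2 prefers T
]

print(occupancy("GAT", toy_matrix, mu=0.0))  # preferred site
print(occupancy("GCT", toy_matrix, mu=0.0))  # single-nucleotide variant
```

With this toy matrix the preferred site is bound with probability just over 0.5, while the single mismatch lowers occupancy to roughly 0.12. Unlike a PWM score alone, the occupancy saturates as affinity or `mu` increases, which is one reason a biophysical model can be fit to quantitative enrichment data.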
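Because AUROC depends only on how scores rank positives against negatives, any two models whose predictions are related by a monotone transformation receive exactly the same AUROC, even when one tracks quantitative binding far better. The toy sketch below illustrates this with made-up numbers. Using Pearson correlation as the quantitative criterion is an illustrative assumption, not necessarily the measure the authors use, and `measured`, `model_a`, and `model_b` are hypothetical.

```python
import math

def auroc(scores, labels):
    """AUROC via the Mann-Whitney statistic: the probability that a randomly chosen
    positive outscores a randomly chosen negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical quantitative binding measurements for five probes (made-up numbers).
measured = [0.9, 0.5, 0.3, 0.1, 0.05]
# Binary bound/unbound labels obtained by thresholding, as a rank-based benchmark would use.
labels = [m > 0.4 for m in measured]

# Two hypothetical models: model_b is a monotone transform of model_a,
# so both rank the probes identically.
model_a = [0.85, 0.55, 0.30, 0.12, 0.04]
model_b = [s ** 4 for s in model_a]

print("AUROC   a:", auroc(model_a, labels), " b:", auroc(model_b, labels))
print("Pearson a:", round(pearson(model_a, measured), 3),
      " b:", round(pearson(model_b, measured), 3))
```

Both models reach the same AUROC, so the rank-based criterion cannot separate them. Only a comparison against the quantitative measurements shows that `model_a` fits the binding data better.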