Zhen Gao, Jianhua Ruan. Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, USA.
Abstract
MOTIVATION: The study of transcriptional regulation remains difficult yet fundamental in molecular biology research. While the development of both in vivo and in vitro profiling techniques has significantly enhanced our knowledge of transcription factor (TF)-DNA interactions, computational models of TF-DNA interactions are relatively simple and may not reveal sufficient biological insight. In particular, supervised-learning-based models of TF-DNA interactions attempt to map sequence-level features (k-mers) to binding events but usually ignore the location of the k-mers, which can cause data fragmentation and consequently inferior model performance.

RESULTS: Here, we propose a novel algorithm based on the multiple-instance learning (MIL) paradigm. MIL breaks each DNA sequence into multiple overlapping subsequences and models each subsequence separately, thereby implicitly taking binding site locations into consideration, which results in both higher accuracy and better interpretability of the models. Results on both in vivo and in vitro TF-DNA interaction data show that our approach significantly outperforms conventional single-instance-learning-based algorithms. Importantly, models learned from in vitro data using our approach can predict in vivo binding with very good accuracy. In addition, the location information obtained by our method provides additional insight into motif-finding results from ChIP-Seq data. Finally, our approach can be easily combined with other state-of-the-art TF-DNA interaction modeling methods.

AVAILABILITY AND IMPLEMENTATION: http://www.cs.utsa.edu/~jruan/MIL/.

CONTACT: jianhua.ruan@utsa.edu.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
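The idea of breaking a sequence into overlapping subsequences and modeling each one separately can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window and step sizes, the k-mer weight table, and the choice to score a sequence by its best-scoring subsequence (a common MIL convention) are all assumptions made here for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def make_instances(seq: str, window: int, step: int) -> List[Tuple[int, str]]:
    """Split a sequence (the 'bag') into overlapping subsequences ('instances').

    Returns (start, subsequence) pairs. The final window is always included
    so that the 3' end of the sequence is covered.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(seq) <= window:
        return [(0, seq)]
    last = len(seq) - window
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)
    return [(s, seq[s:s + window]) for s in starts]


def kmer_counts(sub: str, k: int) -> Counter:
    """Count every overlapping k-mer in one subsequence."""
    return Counter(sub[i:i + k] for i in range(len(sub) - k + 1))


def score_bag(seq: str, window: int, step: int, k: int,
              weights: Dict[str, float]) -> Tuple[float, int]:
    """Score each instance with a linear k-mer model and keep the best one.

    Assumption (not taken from the paper): the bag score is the maximum
    instance score, and the start of that instance is reported as the
    putative binding location. Ties keep the earliest instance.
    """
    best_score, best_start = float("-inf"), -1
    for start, sub in make_instances(seq, window, step):
        counts = kmer_counts(sub, k)
        score = sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Hypothetical example: a toy weight table that rewards a single 6-mer.
toy_weights = {"CACGTG": 1.0}
toy_seq = "TTTTTTTT" + "CACGTG" + "TTTTTTTT"
bag_score, bag_start = score_bag(toy_seq, window=10, step=4, k=6,
                                 weights=toy_weights)
```

Because each subsequence is scored on its own, the index of the best-scoring window gives a rough location for the binding site, which a model built on whole-sequence k-mer counts would not expose.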