Zizhen Yao, Kyle L Macquarrie, Abraham P Fong, Stephen J Tapscott, Walter L Ruzzo, Robert C Gentleman.
Affiliations: Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, Washington, 98105, USA; Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Pediatrics, School of Medicine, and Department of Neurology, School of Medicine, University of Washington, Seattle, Washington, 98105, USA; Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Seattle, Washington, 98105, USA; Bioinformatics and Computational Biology, Genentech, South San Francisco, CA 94080, USA.
Abstract
MOTIVATION: High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). Traditional motif discovery tools commonly predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance.
RESULTS: We describe a simple algorithm for discovering differential motifs between two sequence datasets that effectively eliminates systematic biases and scales to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. On the remaining, more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single-base-pair differences in the DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examination of high-throughput DNase accessibility data.
AVAILABILITY: The motifRG package is publicly available via the Bioconductor repository.
CONTACT: yzizhen@fhcrc.org
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.