Charles E Grant (1), James Johnson (2), Timothy L Bailey (2), William Stafford Noble (3)
1. Department of Genome Sciences, University of Washington, Seattle, WA, USA.
2. Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.
3. Department of Genome Sciences, University of Washington, Seattle, WA, USA; Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
Abstract
Precise regulatory control of genes, particularly in eukaryotes, frequently requires the joint action of multiple sequence-specific transcription factors. A cis-regulatory module (CRM) is a genomic locus that is responsible for gene regulation and that contains multiple transcription factor binding sites in close proximity. Given a collection of known transcription factor binding motifs, many bioinformatics methods have been proposed over the past 15 years for identifying within a genomic sequence candidate CRMs consisting of clusters of those motifs.
RESULTS: The MCAST algorithm uses a hidden Markov model with a P-value-based scoring scheme to identify candidate CRMs. Here, we introduce a new version of MCAST that offers improved graphical output, a dynamic background model, statistical confidence estimates based on false discovery rate estimation and, most significantly, the ability to predict CRMs while taking into account epigenomic data such as DNase I sensitivity or histone modification data. We demonstrate the validity of MCAST's statistical confidence estimates and the utility of epigenomic priors in identifying CRMs.
AVAILABILITY AND IMPLEMENTATION: MCAST is part of the MEME Suite software toolkit. A web server and source code are available at http://meme-suite.org and http://alternate.meme-suite.org.
CONTACT: t.bailey@imb.uq.edu.au or william-noble@uw.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
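The abstract mentions statistical confidence estimates based on false discovery rate estimation. As a generic illustration of what an FDR-based confidence value is, the sketch below converts a list of P-values into Benjamini-Hochberg q-values. This is an assumption made for illustration only: the abstract does not say how MCAST estimates its false discovery rate, and the function name `bh_qvalues` and the example P-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg q-values, in the same order as the input.

    Generic FDR illustration only; not taken from MCAST.
    """
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so q-values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical P-values for four candidate CRMs.
candidate_pvalues = [0.001, 0.02, 0.03, 0.5]
qvals = bh_qvalues(candidate_pvalues)
for p, q in zip(candidate_pvalues, qvals):
    print(f"P-value {p:.3g} -> q-value {q:.3g}")
```

A q-value is read as the smallest false discovery rate at which a given candidate would still be reported, so a threshold such as q < 0.05 bounds the expected fraction of false positives among the accepted candidates.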