MOTIVATION: Direct binding by a transcription factor (TF) to the proximal promoter of a gene is strong evidence that the TF regulates the gene. Assaying the genome-wide binding of every TF in every cell type and condition is currently impractical. Histone modifications correlate with tissue/cell/condition-specific ('tissue-specific') TF binding, so histone ChIP-seq data can be combined with traditional position weight matrix (PWM) methods to make tissue-specific predictions of TF-promoter interactions.
RESULTS: We use supervised learning to train a naïve Bayes predictor of TF-promoter binding. The predictor's features are the histone modification levels and a PWM-based score for the promoter. Training and testing use sets of promoters labeled using TF ChIP-seq data, and we use cross-validation on 23 such datasets to measure accuracy. A PWM+histone naïve Bayes predictor using a single histone modification (H3K4me3) is substantially more accurate than a PWM score or a conservation-based score (phylogenetic motif model). The naïve Bayes predictor is more accurate (on average) at all sensitivity levels, and makes only half as many false positive predictions at sensitivity levels from 10% to 80%. On average, it correctly predicts 80% of bound promoters at a false positive rate of 20%. Accuracy does not diminish when we test the predictor in a different cell type (and species) from the one used for training. Accuracy is barely diminished even when we train the predictor without using TF ChIP-seq data.
AVAILABILITY: Our tissue-specific predictor of promoters bound by a TF is called Dr Gene and is available at http://bioinformatics.org.au/drgene.
CONTACT: t.bailey@imb.uq.edu.au
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
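The two-feature classifier described in the abstract can be illustrated with a minimal sketch. This is not the Dr Gene implementation: the abstract does not say how each feature's distribution is modeled, so the per-class Gaussian likelihoods below are an assumption, and the promoter data are synthetic. All function names (`fit_gaussian_nb`, `posterior_bound`, `make_synthetic_promoters`) and all numeric parameters are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import mean, pvariance


def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature Gaussian (mean, variance) for each label."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        # One (mean, variance) pair per feature column; small floor avoids zero variance.
        params = [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
        model[label] = (len(rows) / len(X), params)
    return model


def log_gaussian(v, mu, var):
    """Log density of a normal distribution at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)


def posterior_bound(model, x):
    """P(bound | features), treating the features as independent given the class."""
    log_joint = {}
    for label, (prior, params) in model.items():
        log_joint[label] = math.log(prior) + sum(
            log_gaussian(v, mu, var) for v, (mu, var) in zip(x, params)
        )
    # Normalize in log space for numerical stability.
    top = max(log_joint.values())
    total = sum(math.exp(lj - top) for lj in log_joint.values())
    return math.exp(log_joint[1] - top) / total


def make_synthetic_promoters(n, seed):
    """Synthetic (PWM score, H3K4me3 level) pairs; label 1 = bound, 0 = unbound.

    The means, spreads, and 30% bound fraction are invented for this sketch.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        bound = 1 if rng.random() < 0.3 else 0
        pwm_score = rng.gauss(6.0 if bound else 4.0, 1.5)
        h3k4me3_level = rng.gauss(3.0 if bound else 1.0, 1.0)
        X.append((pwm_score, h3k4me3_level))
        y.append(bound)
    return X, y


# Train on one synthetic set and score a separate one.
X_train, y_train = make_synthetic_promoters(2000, seed=1)
X_test, y_test = make_synthetic_promoters(1000, seed=2)
model = fit_gaussian_nb(X_train, y_train)
scores = [posterior_bound(model, x) for x in X_test]

# Sensitivity and false positive rate at a posterior threshold of 0.5.
predicted = [1 if s >= 0.5 else 0 for s in scores]
tp = sum(1 for p, t in zip(predicted, y_test) if p == 1 and t == 1)
fp = sum(1 for p, t in zip(predicted, y_test) if p == 1 and t == 0)
sensitivity = tp / sum(y_test)
false_positive_rate = fp / (len(y_test) - sum(y_test))
accuracy = sum(1 for p, t in zip(predicted, y_test) if p == t) / len(y_test)
print(f"sensitivity={sensitivity:.2f} fpr={false_positive_rate:.2f}")
```

Sweeping the threshold instead of fixing it at 0.5 would trace out the sensitivity versus false positive rate trade-off that the abstract reports; the numbers produced here reflect only the synthetic data and say nothing about the published results.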