Abstract
Discovering which regulatory proteins, especially transcription factors (TFs), are active under given experimental conditions and identifying their binding motifs is essential for understanding the regulatory circuits that control cellular programs. The experimental methods used for this purpose are laborious, and computational methods have proven highly effective in identifying TF-binding motifs (TFBMs). In this article, we propose a novel computational method, MotifExpress, for discovering active TFBMs. Unlike existing methods, which either use only DNA sequence information or integrate sequence information with a single-sample measurement of gene expression, MotifExpress integrates DNA sequence information with gene expression measured in multiple samples. By selecting TFBMs that are significantly associated with gene expression, MotifExpress identifies TFBMs that are active under specific experimental conditions and thus provides clues for the construction of regulatory networks. Compared with existing methods, MotifExpress substantially reduces the number of spurious results. Statistically, MotifExpress uses a penalized multivariate regression approach with a composite absolute penalty, which is highly stable and can effectively find the globally optimal set of active motifs. We demonstrate the excellent performance of MotifExpress by applying it to synthetic data and to real Saccharomyces cerevisiae datasets. MotifExpress is available at http://www.stat.illinois.edu/~pingma/MotifExpress.htm.
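The abstract names a penalized multivariate regression with a composite absolute penalty (CAP). The sketch below shows one simple member of the CAP family, a group lasso (L2 norm within each motif's coefficient row, L1 sum across motifs), fitted by proximal gradient descent. It is an illustration under stated assumptions, not the MotifExpress implementation: the within-group norm, the solver, the toy data, and the names `fit_group_lasso` and `group_soft_threshold` are all hypothetical. The key idea it shows is that each motif's coefficients across all m samples are penalized as one group, so a motif is selected or dropped jointly across samples.

```python
import numpy as np

def group_soft_threshold(row, thresh):
    """Proximal map of thresh * ||row||_2: shrink the whole row toward zero."""
    norm = np.linalg.norm(row)
    if norm <= thresh:
        return np.zeros_like(row)
    return (1.0 - thresh / norm) * row

def fit_group_lasso(X, Y, lam, n_iter=500):
    """Minimize 0.5 * ||Y - X B||_F^2 + lam * sum_j ||B[j, :]||_2 by proximal gradient.

    X: (n_genes, p_motifs) motif scores (hypothetical input format).
    Y: (n_genes, m_samples) expression values.
    Row j of B holds motif j's coefficients in all m samples, so the penalty
    keeps or drops a motif jointly across samples.
    """
    p, m = X.shape[1], Y.shape[1]
    B = np.zeros((p, m))
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        B = B - step * (X.T @ (X @ B - Y))  # gradient step on the squared loss
        for j in range(p):
            B[j] = group_soft_threshold(B[j], step * lam)  # group-wise shrinkage
    return B

# Hypothetical toy data: 200 genes, 10 candidate motifs, 5 samples;
# only motifs 0 and 3 actually drive expression.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
B_true = np.zeros((10, 5))
B_true[0] = [1.5, -1.0, 2.0, 0.5, 1.0]
B_true[3] = [-2.0, 1.0, 1.0, -1.5, 0.5]
Y = X @ B_true + 0.1 * rng.standard_normal((200, 5))

B_hat = fit_group_lasso(X, Y, lam=30.0)
selected = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 0)
print(selected)
```

With a penalty this large relative to the noise, the inactive motif rows are shrunk exactly to zero and only the rows for motifs 0 and 3 survive. In practice λ would be chosen by a model-selection criterion rather than fixed by hand.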
Year: 2009 PMID: 19578060 PMCID: PMC2760818 DOI: 10.1093/nar/gkp554
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. MotifExpress system diagram.
Mean (Ave.) and standard deviation (SD) of the Matthews correlation coefficient (MCC) for MotifExpress, with regularization parameters selected by AICc minimization or by BIC minimization, as the standard deviation σ of the random error and the number of samples m are varied in the simulation study.
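The caption mentions choosing the regularization parameter by minimizing AICc or BIC. A minimal sketch, assuming the standard Gaussian-likelihood forms AICc = n·log(RSS/n) + 2k + 2k(k+1)/(n − k − 1) and BIC = n·log(RSS/n) + k·log(n): how MotifExpress counts the effective degrees of freedom k and the number of observations n for a multi-sample fit is not stated here, so both are treated as inputs. The function names `gaussian_aicc_bic` and `pick_lambda` and the `(lam, rss, k)` tuple layout are hypothetical.

```python
import math

def gaussian_aicc_bic(rss, n, k):
    """Return (AICc, BIC) for a Gaussian model with residual sum of squares rss,
    n observations and k effective parameters (AICc requires n - k - 1 > 0)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n - k - 1 > 0")
    base = n * math.log(rss / n)
    aicc = base + 2 * k + 2 * k * (k + 1) / (n - k - 1)
    bic = base + k * math.log(n)
    return aicc, bic

def pick_lambda(fits, n, criterion="bic"):
    """fits: list of (lam, rss, k) tuples from a regularization path.
    Return the lam whose fit minimizes the chosen criterion."""
    idx = 0 if criterion == "aicc" else 1
    return min(fits, key=lambda f: gaussian_aicc_bic(f[1], n, f[2])[idx])[0]

# Hypothetical path: small lam overfits (many parameters), large lam underfits.
fits = [(0.1, 50.0, 20), (1.0, 60.0, 5), (10.0, 200.0, 1)]
print(pick_lambda(fits, n=100, criterion="bic"))
print(pick_lambda(fits, n=100, criterion="aicc"))
```

BIC's log(n) penalty per parameter is heavier than AICc's for moderate n, so BIC tends to pick sparser motif sets; the two criteria can disagree on other paths.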
Figure 2. Summary plot of MCC for MotifRegressor (MR) and for MotifExpress with regularization parameters selected by AICc minimization or by BIC minimization, as the random error's standard deviation σ and the number of samples m are varied in the simulation study. MotifRegressor performance in the same simulation was computed by pooling results from independent runs.
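The simulation scores each method by MCC, read here as the Matthews correlation coefficient computed over the set of candidate motifs (an assumption about how the confusion counts are formed). A minimal sketch: a motif counts as a true positive if it is both selected and truly active. The helper names and the integer motif IDs are hypothetical.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def mcc_from_sets(predicted, true, universe):
    """Score a selected motif set against the truly active set within all candidates."""
    tp = len(predicted & true)
    fp = len(predicted - true)
    fn = len(true - predicted)
    tn = len(universe - predicted - true)
    return mcc(tp, fp, tn, fn)

# Hypothetical example: 10 candidate motifs, motifs 0 and 3 truly active,
# and the method also selects one spurious motif (7).
score = mcc_from_sets({0, 3, 7}, {0, 3}, set(range(10)))
print(round(score, 4))
```

Because MCC uses all four cells of the confusion table, each spurious motif lowers the score even when every active motif is found, which matches the caption's focus on reducing false discoveries.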
GCN4 motif discovery on the constitutively activated Gcn2 mutant dataset by MotifExpress (all samples analyzed simultaneously) and by MotifRegressor (each sample analyzed individually).
Figure 3. HSF1-binding motif discovered by MotifExpress analyzing all samples in GSE7665 simultaneously, compared with the motifs found by MotifRegressor analyzing each sample individually and with the motif reported in the current literature. The head-to-head inverted NGAAN motif is prominent in the MotifExpress results.
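To make the "head-to-head inverted NGAAN" pattern concrete, the sketch below scans a sequence for one NGAAN unit followed immediately by its reverse complement NTTCN, i.e. the 10-mer nGAAnnTTCn. Reading "head-to-head" this way, and scanning for exact consensus matches rather than a position weight matrix, are assumptions for illustration; the pattern name and `find_inverted_ngaan` are hypothetical. A lookahead is used so that overlapping sites are all reported.

```python
import re

# Assumed reading of "head-to-head": NGAAN followed directly by NTTCN (nGAAnnTTCn).
HEAD_TO_HEAD_NGAAN = re.compile(r"(?=([ACGT]GAA[ACGT]{2}TTC[ACGT]))")

def find_inverted_ngaan(seq):
    """Return (start, site) for every, possibly overlapping, nGAAnnTTCn match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in HEAD_TO_HEAD_NGAAN.finditer(seq)]

print(find_inverted_ngaan("ccagaacgttctaa"))
```

Because nGAAnnTTCn is its own reverse complement, scanning one strand also finds every site on the opposite strand.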