Simon J van Heeringen, Gert Jan C Veenstra.
Abstract
SUMMARY: Accurate prediction of transcription factor binding motifs that are enriched in a collection of sequences remains a computational challenge. Here we report on GimmeMotifs, a pipeline that incorporates an ensemble of computational tools to predict motifs de novo from ChIP-sequencing (ChIP-seq) data. Similar redundant motifs are compared using the weighted information content (WIC) similarity score and clustered using an iterative procedure. A comprehensive output report is generated with several different evaluation metrics to compare and evaluate the results. Benchmarks show that the method performs well on human and mouse ChIP-seq datasets. GimmeMotifs consists of a suite of command-line scripts that can be easily implemented in a ChIP-seq analysis pipeline.
AVAILABILITY: GimmeMotifs is implemented in Python and runs on Linux. The source code is freely available for download at http://www.ncmls.eu/bioinfo/gimmemotifs/.
CONTACT: s.vanheeringen@ncmls.ru.nl
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2010 PMID: 21081511 PMCID: PMC3018809 DOI: 10.1093/bioinformatics/btq636
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. An example of the GimmeMotifs output for p63 (Kouwenhoven ). Shown are the sequence logo of the predicted motif (Schneider and Stephens, 1990), the best matching motif in the JASPAR database (Sandelin ), the ROC curve, the positional preference plot and several statistics to evaluate the motif performance. See the Supplementary Material for a complete example.
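
The ROC curve mentioned in the caption is commonly summarized by its area under the curve (AUC). The following is a minimal sketch of that statistic, not the GimmeMotifs implementation: it assumes each sequence has already been assigned a single motif score, and the function name `roc_auc` and the example scores are hypothetical. The AUC is computed as the fraction of (peak, background) sequence pairs in which the peak sequence scores higher, with ties counted as one half.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    pos_scores: motif scores for bound (e.g. ChIP-seq peak) sequences.
    neg_scores: motif scores for background sequences.
    Returns the probability that a randomly chosen positive sequence
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical motif scores, for illustration only.
peak_scores = [8.1, 6.4, 7.9, 3.2]
background_scores = [2.0, 3.5, 1.1, 4.0]

auc = roc_auc(peak_scores, background_scores)
print(f"ROC AUC: {auc:.3f}")
```

An AUC near 1.0 means the motif separates peak sequences from background well, while a value near 0.5 means it performs no better than chance. The quadratic pairwise loop is chosen for clarity; a rank-based computation would be preferable for large sequence sets.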