Alvaro Sebastian, Bruno Contreras-Moreira. Laboratory of Computational Biology, Department of Genetics and Plant Production, Estación Experimental de Aula Dei/CSIC, Av. Montañana 1005, Zaragoza (http://www.eead.csic.es/compbio) and Fundación ARAID, Paseo María Agustín 36, Zaragoza, Spain.
Abstract
MOTIVATION: Traditional and high-throughput techniques for determining transcription factor (TF) binding specificities are generating large volumes of data of uneven quality, which are scattered across individual databases. RESULTS: FootprintDB integrates some of the most comprehensive freely available libraries of curated DNA binding sites and systematically annotates the binding interfaces of the corresponding TFs. The first release contains 2422 unique TF sequences, 10 112 DNA binding sites and 3662 DNA motifs. A survey of the included data sources, organisms and TF families was performed together with the proprietary database TRANSFAC, finding that footprintDB has similar coverage of multicellular organisms, while also containing bacterial regulatory data. A search engine has been designed that drives the prediction of DNA motifs for input TFs, or conversely of TF sequences that might recognize input regulatory sequences, by comparison with database entries. Such predictions can also be extended to a single proteome chosen by the user, and results are ranked in terms of interface similarity. Benchmark experiments with bacterial, plant and human data were performed to measure the predictive power of footprintDB searches, which were able to correctly recover 10, 55 and 90% of the tested sequences, respectively. Correctly predicted TFs had a higher interface similarity than the average, confirming the diagnostic value of interface similarity. AVAILABILITY AND IMPLEMENTATION: Web site implemented in PHP, Perl, MySQL and Apache. Freely available from http://floresta.eead.csic.es/footprintdb.