Narayan Jayaram1, Daniel Usvyat1, Andrew C R Martin2. 1. Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK. 2. Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK. andrew@bioinf.org.uk.
Abstract
BACKGROUND: Binding of transcription factors to transcription factor binding sites (TFBSs) is key to the mediation of transcriptional regulation. Information on experimentally validated functional TFBSs is limited and consequently there is a need for accurate prediction of TFBSs for gene annotation and in applications such as evaluating the effects of single nucleotide variations in causing disease. TFBSs are generally recognized by scanning a position weight matrix (PWM) against DNA using one of a number of available computer programs. Thus we set out to evaluate the best tools that can be used locally (and are therefore suitable for large-scale analyses) for creating PWMs from high-throughput ChIP-Seq data and for scanning them against DNA. RESULTS: We evaluated a set of de novo motif discovery tools that could be downloaded and installed locally using ENCODE-ChIP-Seq data and showed that rGADEM was the best-performing tool. TFBS prediction tools used to scan PWMs against DNA fall into two classes - those that predict individual TFBSs and those that identify clusters. Our evaluation showed that FIMO and MCAST performed best respectively. CONCLUSIONS: Selection of the best-performing tools for generating PWMs from ChIP-Seq data and for scanning PWMs against DNA has the potential to improve prediction of precise transcription factor binding sites within regions identified by ChIP-Seq experiments for gene finding, understanding regulation and in evaluating the effects of single nucleotide variations in causing disease.
BACKGROUND: Binding of transcription factors to transcription factor binding sites (TFBSs) is key to the mediation of transcriptional regulation. Information on experimentally validated functional TFBSs is limited and consequently there is a need for accurate prediction of TFBSs for gene annotation and in applications such as evaluating the effects of single nucleotide variations in causing disease. TFBSs are generally recognized by scanning a position weight matrix (PWM) against DNA using one of a number of available computer programs. Thus we set out to evaluate the best tools that can be used locally (and are therefore suitable for large-scale analyses) for creating PWMs from high-throughput ChIP-Seq data and for scanning them against DNA. RESULTS: We evaluated a set of de novo motif discovery tools that could be downloaded and installed locally using ENCODE-ChIP-Seq data and showed that rGADEM was the best-performing tool. TFBS prediction tools used to scan PWMs against DNA fall into two classes - those that predict individual TFBSs and those that identify clusters. Our evaluation showed that FIMO and MCAST performed best respectively. CONCLUSIONS: Selection of the best-performing tools for generating PWMs from ChIP-Seq data and for scanning PWMs against DNA has the potential to improve prediction of precise transcription factor binding sites within regions identified by ChIP-Seq experiments for gene finding, understanding regulation and in evaluating the effects of single nucleotide variations in causing disease.
Authors: Siwen Hu-Lieskovan; Srabani Bhaumik; Kavita Dhodapkar; Jean-Charles J B Grivel; Sumati Gupta; Brent A Hanks; Sylvia Janetzki; Thomas O Kleen; Yoshinobu Koguchi; Amanda W Lund; Cristina Maccalli; Yolanda D Mahnke; Ruslan D Novosiadly; Senthamil R Selvan; Tasha Sims; Yingdong Zhao; Holden T Maecker Journal: J Immunother Cancer Date: 2020-12 Impact factor: 13.751
Authors: Marinka Zitnik; Francis Nguyen; Bo Wang; Jure Leskovec; Anna Goldenberg; Michael M Hoffman Journal: Inf Fusion Date: 2018-09-21 Impact factor: 12.975
Authors: Benjamin Lebeau; Kaiqiong Zhao; Maika Jangal; Tiejun Zhao; Maria Guerra; Celia M T Greenwood; Michael Witcher Journal: Nucleic Acids Res Date: 2022-08-10 Impact factor: 19.160
Authors: Jarkko Toivonen; Teemu Kivioja; Arttu Jolma; Yimeng Yin; Jussi Taipale; Esko Ukkonen Journal: Nucleic Acids Res Date: 2018-05-04 Impact factor: 16.971
Authors: Abdullah M Khamis; Olaa Motwalli; Romina Oliva; Boris R Jankovic; Yulia A Medvedeva; Haitham Ashoor; Magbubah Essack; Xin Gao; Vladimir B Bajic Journal: Nucleic Acids Res Date: 2018-07-06 Impact factor: 16.971