Jichen Yang1, Stephen A Ramsey2. 1. Department of Biomedical Sciences and. 2. Department of Biomedical Sciences and School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA.
Abstract
MOTIVATION: The position-weight matrix (PWM) is a useful representation of a transcription factor binding site (TFBS) sequence pattern because the PWM can be estimated from a small number of representative TFBS sequences. However, because the PWM probability model assumes independence between individual nucleotide positions, the PWMs for some TFs poorly discriminate binding sites from non-binding-sites that have similar sequence content. Since the local three-dimensional DNA structure ('shape') is a determinant of TF binding specificity and since DNA shape has a significant sequence-dependence, we combined DNA shape-derived features into a TF-generalized regulatory score and tested whether the score could improve PWM-based discrimination of TFBS from non-binding-sites. RESULTS: We compared a traditional PWM model to a model that combines the PWM with a DNA shape feature-based regulatory potential score, for accuracy in detecting binding sites for 75 vertebrate transcription factors. The PWM+shape model was more accurate than the PWM-only model, for 45% of TFs tested, with no significant loss of accuracy for the remaining TFs. AVAILABILITY AND IMPLEMENTATION: The shape-based model is available as an open-source R package at that is archived on the GitHub software repository at https://github.com/ramseylab/regshape/. CONTACT: stephen.ramsey@oregonstate.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: The position-weight matrix (PWM) is a useful representation of a transcription factor binding site (TFBS) sequence pattern because the PWM can be estimated from a small number of representative TFBS sequences. However, because the PWM probability model assumes independence between individual nucleotide positions, the PWMs for some TFs poorly discriminate binding sites from non-binding-sites that have similar sequence content. Since the local three-dimensional DNA structure ('shape') is a determinant of TF binding specificity and since DNA shape has a significant sequence-dependence, we combined DNA shape-derived features into a TF-generalized regulatory score and tested whether the score could improve PWM-based discrimination of TFBS from non-binding-sites. RESULTS: We compared a traditional PWM model to a model that combines the PWM with a DNA shape feature-based regulatory potential score, for accuracy in detecting binding sites for 75 vertebrate transcription factors. The PWM+shape model was more accurate than the PWM-only model, for 45% of TFs tested, with no significant loss of accuracy for the remaining TFs. AVAILABILITY AND IMPLEMENTATION: The shape-based model is available as an open-source R package at that is archived on the GitHub software repository at https://github.com/ramseylab/regshape/. CONTACT: stephen.ramsey@oregonstate.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Michael F Berger; Anthony A Philippakis; Aaron M Qureshi; Fangxue S He; Preston W Estep; Martha L Bulyk Journal: Nat Biotechnol Date: 2006-09-24 Impact factor: 54.908
Authors: Joel Rozowsky; Ghia Euskirchen; Raymond K Auerbach; Zhengdong D Zhang; Theodore Gibson; Robert Bjornson; Nicholas Carriero; Michael Snyder; Mark B Gerstein Journal: Nat Biotechnol Date: 2009-01-04 Impact factor: 54.908
Authors: Jinsen Li; Jared M Sagendorf; Tsu-Pei Chiu; Marco Pasi; Alberto Perez; Remo Rohs Journal: Nucleic Acids Res Date: 2017-12-15 Impact factor: 16.971