
On counting position weight matrix matches in a sequence, with application to discriminative motif finding.

Saurabh Sinha.

Abstract

MOTIVATION AND RESULTS: The position weight matrix (PWM) is a popular method to model transcription factor binding sites. A fundamental problem in cis-regulatory analysis is to "count" the occurrences of a PWM in a DNA sequence. We propose a novel probabilistic score to solve this problem of counting PWM occurrences. The proposed score has two important properties: (1) It gives appropriate weights to both strong and weak occurrences of the PWM, without using thresholds. (2) For any given PWM, this score can be computed while allowing for occurrences of other, a priori known PWMs, in a statistically sound framework. Additionally, the score is efficiently differentiable with respect to the PWM parameters, which has important consequences for designing search algorithms. The second problem we address is to find, ab initio, PWMs that have high counts in one set of sequences, and low counts in another. We develop a novel algorithm to solve this "discriminative motif-finding problem", using the proposed score for counting a PWM in the sequences. The algorithm is a local search technique that exploits derivative information on an objective function to enhance speed and performance. It is extensively tested on synthetic data, and shown to perform better than other discriminative as well as non-discriminative PWM finding algorithms. It is then applied to cis-regulatory modules involved in development of the fruitfly embryo, to elicit known and novel motifs. We finally use the algorithm on genes predictive of social behavior in the honey bee, and find interesting motifs.

AVAILABILITY: The program is available upon request from the author.
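The idea of counting PWM occurrences without a threshold can be illustrated with a minimal sketch. This is not the score proposed in the paper, whose exact form the abstract does not give. It is a simpler stand-in that assumes each window is independently either a motif site (with a hypothetical prior probability `prior`) or background, and sums the posterior site probabilities over all windows. The function names, the uniform background, and the difference-of-means objective are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def soft_count(seq, pwm, prior=0.01, background=None):
    """Threshold-free soft count: sum over windows of P(site | window).

    pwm is a list of dicts mapping each base to a nonzero probability.
    prior is an assumed per-window probability of being a motif site.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(pwm)
    total = 0.0
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Log-likelihood ratio of the window under the PWM versus background.
        llr = sum(math.log(pwm[i][base] / background[base])
                  for i, base in enumerate(window))
        # Posterior probability that this window is a site (two-component mixture).
        odds = prior / (1.0 - prior) * math.exp(llr)
        total += odds / (1.0 + odds)
    return total

def discriminative_objective(positives, negatives, pwm):
    """Hypothetical objective: mean soft count in positives minus negatives."""
    pos = sum(soft_count(s, pwm) for s in positives) / len(positives)
    neg = sum(soft_count(s, pwm) for s in negatives) / len(negatives)
    return pos - neg

# Toy PWM favouring the 3-mer "ACG"; probabilities are illustrative only.
pwm = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]
with_site = soft_count("TTACGTT", pwm)
without_site = soft_count("TTTTTTT", pwm)
```

Strong and weak matches both contribute here, each in proportion to its posterior probability. Because the count is a smooth function of the PWM entries, it can in principle be differentiated, which is the property the abstract highlights for local search.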


Year:  2006        PMID: 16873507     DOI: 10.1093/bioinformatics/btl227

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  36 in total

1.  Learning cellular sorting pathways using protein interactions and sequence motifs.

Authors:  Tien-Ho Lin; Ziv Bar-Joseph; Robert F Murphy
Journal:  J Comput Biol       Date:  2011-10-14       Impact factor: 1.479

2.  Discriminative motif optimization based on perceptron training.

Authors:  Ronak Y Patel; Gary D Stormo
Journal:  Bioinformatics       Date:  2013-12-24       Impact factor: 6.937

3.  Discriminative motif analysis of high-throughput dataset.

Authors:  Zizhen Yao; Kyle L Macquarrie; Abraham P Fong; Stephen J Tapscott; Walter L Ruzzo; Robert C Gentleman
Journal:  Bioinformatics       Date:  2013-10-25       Impact factor: 6.937

4.  An ensemble model of competitive multi-factor binding of the genome.

Authors:  Todd Wasson; Alexander J Hartemink
Journal:  Genome Res       Date:  2009-08-31       Impact factor: 9.043

5.  An effective model for natural selection in promoters.

Authors:  Michael M Hoffman; Ewan Birney
Journal:  Genome Res       Date:  2010-03-01       Impact factor: 9.043

6.  DECOD: fast and accurate discriminative DNA motif finding.

Authors:  Peter Huggins; Shan Zhong; Idit Shiff; Rachel Beckerman; Oleg Laptenko; Carol Prives; Marcel H Schulz; Itamar Simon; Ziv Bar-Joseph
Journal:  Bioinformatics       Date:  2011-07-12       Impact factor: 6.937

7.  Discriminative motif finding for predicting protein subcellular localization.

Authors:  Tien-ho Lin; Robert F Murphy; Ziv Bar-Joseph
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2011 Mar-Apr       Impact factor: 3.710

8.  Deep learning for inferring gene relationships from single-cell expression data.

Authors:  Ye Yuan; Ziv Bar-Joseph
Journal:  Proc Natl Acad Sci U S A       Date:  2019-12-10       Impact factor: 11.205

9.  RNAcontext: a new method for learning the sequence and structure binding preferences of RNA-binding proteins.

Authors:  Hilal Kazan; Debashish Ray; Esther T Chan; Timothy R Hughes; Quaid Morris
Journal:  PLoS Comput Biol       Date:  2010-07-01       Impact factor: 4.475

10.  A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data.

Authors:  Xin He; Chieh-Chun Chen; Feng Hong; Fang Fang; Saurabh Sinha; Huck-Hui Ng; Sheng Zhong
Journal:  PLoS One       Date:  2009-12-01       Impact factor: 3.240
