Fast probabilistic analysis of sequence function using scoring matrices.

T D Wu, C G Nevill-Manning, D L Brutlag.

Abstract

MOTIVATION: We present techniques for increasing the speed of sequence analysis using scoring matrices. Our techniques are based on calculating, for a given scoring matrix, the quantile function, which assigns a probability, or p, value to each segmental score. Our techniques also permit the user to specify a p threshold to indicate the desired trade-off between sensitivity and speed for a particular sequence analysis. The resulting increase in speed should allow scoring matrices to be used more widely in large-scale sequencing and annotation projects.
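As a rough illustration of how a p value can be attached to each segment score, the sketch below builds the exact score distribution of a scoring matrix by dynamic programming under an independence assumption, then reads off the tail probability and the score threshold for a chosen p threshold. The function names, the integer-valued toy matrix, and the background frequencies are assumptions made for this example; they are not taken from the EMATRIX package.

```python
from collections import defaultdict


def score_distribution(matrix, background):
    """Exact distribution of segment scores, assuming independent residues.

    matrix: list of dicts, one per column, mapping residue -> integer score.
    background: dict mapping residue -> probability (should sum to 1).
    Returns a dict mapping total score -> probability.
    """
    dist = {0: 1.0}
    for column in matrix:
        nxt = defaultdict(float)
        # Convolve the distribution so far with this column's score distribution.
        for partial, prob in dist.items():
            for residue, freq in background.items():
                nxt[partial + column[residue]] += prob * freq
        dist = dict(nxt)
    return dist


def p_value(dist, score):
    """Probability that a random segment scores at least `score`."""
    return sum(prob for s, prob in dist.items() if s >= score)


def score_threshold(dist, p_threshold):
    """Lowest score whose p value is <= p_threshold, or None if none qualifies."""
    tail = 0.0
    best = None
    for s in sorted(dist, reverse=True):
        tail += dist[s]
        if tail > p_threshold:
            break
        best = s
    return best


# Toy two-letter alphabet and two-column matrix (illustrative values only).
toy_matrix = [{"A": 2, "B": -1}, {"A": 1, "B": 0}]
toy_background = {"A": 0.5, "B": 0.5}
toy_dist = score_distribution(toy_matrix, toy_background)
```

With real-valued matrices the scores would first need to be rounded to a fixed grid so that the dictionary stays small; that step is omitted here.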
RESULTS: We develop three techniques for increasing the speed of sequence analysis: probability filtering, lookahead scoring, and permuted lookahead scoring. In probability filtering, we compute the score threshold that corresponds to the user-specified p threshold. We use the score threshold to limit the number of segments that are retained in the search process. In lookahead scoring, we test intermediate scores to determine whether they will possibly exceed the score threshold. In permuted lookahead scoring, we score each segment in a particular order designed to maximize the likelihood of early termination. Our two lookahead scoring techniques reduce substantially the number of residues that must be examined. The fraction of residues examined ranges from 62% to 6%, depending on the p threshold chosen by the user. These techniques permit sequence analysis with scoring matrices at speeds that are several times faster than existing programs. On a database of 12 177 alignment blocks, our techniques permit sequence analysis at a speed of 225 residues/s for a p threshold of 10^-6, and 541 residues/s for a p threshold of 10^-20. In order to compute the quantile function, we may use either an independence assumption or a Markov assumption. We measure the effect of first- and second-order Markov assumptions and find that they tend to raise the p value of segments, when compared with the independence assumption, by average ratios of 1.30 and 1.69, respectively. We also compare our technique with the empirical 99.5th percentile scores compiled in the BLOCKSPLUS database, and find that they correspond on average to a p value of 1.5 x 10^-5.
AVAILABILITY: The techniques described above are implemented in a software package called EMATRIX. This package is available from the authors for free academic use or for licensed commercial use. The EMATRIX set of programs is also available on the Internet at http://motif.stanford.edu/ematrix.
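The lookahead idea described in RESULTS can be sketched as follows: while summing column scores, stop as soon as the partial score plus the best score still attainable from the remaining columns falls below the score threshold. This is a minimal sketch, not the EMATRIX implementation. All names are hypothetical, and the column ordering used for the permuted variant (largest gap between maximum and expected column score first) is an assumed heuristic, since the abstract does not state the ordering rule.

```python
def max_remaining_scores(matrix, order):
    """best[i] = highest score attainable from columns order[i:]; best[-1] = 0."""
    best = [0] * (len(order) + 1)
    for i in range(len(order) - 1, -1, -1):
        best[i] = best[i + 1] + max(matrix[order[i]].values())
    return best


def lookahead_score(segment, matrix, threshold, order=None):
    """Score `segment`, terminating early once `threshold` is unreachable.

    Returns (score, residues_examined); score is None on early termination.
    """
    if order is None:
        order = list(range(len(matrix)))  # plain left-to-right lookahead
    best = max_remaining_scores(matrix, order)
    total = 0
    for i, col in enumerate(order):
        total += matrix[col][segment[col]]
        # Even a perfect match on the remaining columns cannot reach the threshold.
        if total + best[i + 1] < threshold:
            return None, i + 1
    return total, len(order)


def permuted_order(matrix, background):
    """Assumed heuristic: visit columns with the largest (max - expected) score first."""
    def gap(col):
        expected = sum(background[r] * matrix[col][r] for r in background)
        return max(matrix[col].values()) - expected
    return sorted(range(len(matrix)), key=gap, reverse=True)


# Toy three-column matrix over a two-letter alphabet (illustrative values only).
toy_matrix = [{"A": 1, "B": 0}, {"A": 5, "B": -5}, {"A": 2, "B": 1}]
toy_background = {"A": 0.5, "B": 0.5}
toy_order = permuted_order(toy_matrix, toy_background)

plain_result = lookahead_score("ABA", toy_matrix, threshold=7)
permuted_result = lookahead_score("ABA", toy_matrix, threshold=7, order=toy_order)
match_result = lookahead_score("AAA", toy_matrix, threshold=7)
```

In this toy case the permuted order examines the most discriminating column first, so the mismatching segment is rejected after one residue instead of two, while a segment that reaches the threshold is still scored in full.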

Year: 2000 PMID: 10869016 DOI: 10.1093/bioinformatics/16.3.233

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

19 in total

1. The EMOTIF database.

Authors: J Y Huang; D L Brutlag
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

2. DIAN: a novel algorithm for genome ontological classification.

Authors: Y Pouliot; J Gao; Q J Su; G G Liu; X B Ling
Journal: Genome Res Date: 2001-10 Impact factor: 9.043

3. 3MATRIX and 3MOTIF: a protein structure visualization system for conserved sequence motifs.

Authors: Steven P Bennett; Lin Lu; Douglas L Brutlag
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

4. SMOTIF: efficient structured pattern and profile motif search.

Authors: Yongqiang Zhang; Mohammed J Zaki
Journal: Algorithms Mol Biol Date: 2006-11-21 Impact factor: 1.405

5. LigProf: a simple tool for in silico prediction of ligand-binding sites.

Authors: Grzegorz Koczyk; Lucjan S Wyrwicz; Leszek Rychlewski
Journal: J Mol Model Date: 2007-01-03 Impact factor: 1.810

6. A probabilistic method for small RNA flowgram matching.

Authors: Vladimir Vacic; Hailing Jin; Jian-Kang Zhu; Stefano Lonardi
Journal: Pac Symp Biocomput Date: 2008

7. Identification and characterization of a pSLA2 plasmid locus required for linear DNA replication and circular plasmid stable inheritance in Streptomyces lividans.

Authors: Zhongjun Qin; Meijuan Shen; Stanley N Cohen
Journal: J Bacteriol Date: 2003-11 Impact factor: 3.490

8. MOODS: fast search for position weight matrix matches in DNA sequences.

Authors: Janne Korhonen; Petri Martinmäki; Cinzia Pizzi; Pasi Rastas; Esko Ukkonen
Journal: Bioinformatics Date: 2009-09-22 Impact factor: 6.937

9. The distribution of GYR- and YLP-like motifs in Drosophila suggests a general role in cuticle assembly and other protein-protein interactions.

Authors: R Scott Cornman
Journal: PLoS One Date: 2010-09-02 Impact factor: 3.240

10. Significant speedup of database searches with HMMs by search space reduction with PSSM family models.

Authors: Michael Beckstette; Robert Homann; Robert Giegerich; Stefan Kurtz
Journal: Bioinformatics Date: 2009-10-14 Impact factor: 6.937