
Variations on probabilistic suffix trees: statistical modeling and prediction of protein families.

G Bejerano, G Yona.

Abstract

MOTIVATION: We present a method for modeling protein families by means of probabilistic suffix trees (PSTs). The method is based on identifying significant patterns in a set of related protein sequences. The patterns can be of arbitrary length, and the input sequences need not be aligned, nor is delineation of domain boundaries required. The method is automatic and can be applied, with surprising success, without assuming any preliminary biological information. Basic biological considerations, such as amino acid background probabilities and amino acid substitution probabilities, can be incorporated to improve performance.
RESULTS: The PST can serve as a predictive tool for protein sequence classification and for detecting conserved patterns (possibly functionally or structurally important) within protein sequences. The method was tested on the Pfam database of protein families with more than satisfactory performance. Exhaustive evaluations show that the PST model detects many more related sequences than pairwise methods such as Gapped-BLAST and is almost as sensitive as a hidden Markov model trained from a multiple alignment of the input sequences, while being much faster.
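
The idea of predicting each residue from a variable-length preceding context, learned from unaligned sequences, can be illustrated with a small sketch. This is not the authors' PST construction, which the abstract does not specify. It is a simplified variable-order Markov model under stated assumptions: the function names, the `max_depth`, `min_count`, and `pseudo` parameters, the uniform background distribution, and the toy sequences are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_counts(sequences, max_depth=3):
    """Count next-symbol occurrences after every context of length 0..max_depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for k in range(max_depth + 1):
                if i - k < 0:
                    break
                counts[seq[i - k:i]][symbol] += 1
    return counts


def next_symbol_prob(counts, context, symbol, min_count=2, pseudo=0.5):
    """Estimate P(symbol | context) from the longest sufficiently observed suffix.

    Falls back to shorter suffixes, ending at the empty context.
    Pseudo-counts keep unseen symbols from getting zero probability.
    """
    for start in range(len(context) + 1):
        suffix = context[start:]
        table = counts.get(suffix, {})
        total = sum(table.values())
        if total >= min_count or suffix == "":
            return (table.get(symbol, 0) + pseudo) / (total + pseudo * len(ALPHABET))


def log_odds_score(counts, seq, max_depth=3):
    """Average per-residue log-odds of seq against a uniform background (an assumption)."""
    background = 1.0 / len(ALPHABET)
    total = 0.0
    for i, symbol in enumerate(seq):
        context = seq[max(0, i - max_depth):i]
        total += math.log(next_symbol_prob(counts, context, symbol) / background)
    return total / len(seq)


# Toy, made-up "family" of unaligned sequences sharing a pattern.
family = ["MKVLAAGIW", "MKVLSAGIW", "MKVLAAGLW"]
model = train_counts(family)
related = log_odds_score(model, "MKVLAAGIW")
unrelated = log_odds_score(model, "WYTPQDNCH")
print(related > unrelated)
```

In this sketch a sequence is scored by how well the learned contexts predict each residue, so a query sharing the family's patterns scores above one that does not. A real application would replace the uniform background with amino acid background probabilities, as the abstract suggests, and would use a principled rule for deciding which contexts to keep.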

Year:  2001        PMID: 11222260     DOI: 10.1093/bioinformatics/17.1.23

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  9 in total

1.  Stochastic computing with biomolecular automata.

Authors:  Rivka Adar; Yaakov Benenson; Gregory Linshiz; Amit Rosner; Naftali Tishby; Ehud Shapiro
Journal:  Proc Natl Acad Sci U S A       Date:  2004-06-23       Impact factor: 11.205

2.  Evolutionary insights from suffix array-based genome sequence analysis.

Authors:  Anindya Poddar; Nagasuma Chandra; Madhavi Ganapathiraju; K Sekar; Judith Klein-Seetharaman; Raj Reddy; N Balakrishnan
Journal:  J Biosci       Date:  2007-08       Impact factor: 1.826

3.  Basing population genetic inferences and models of molecular evolution upon desired stationary distributions of DNA or protein sequences.

Authors:  Sang Chul Choi; Benjamin D Redelings; Jeffrey L Thorne
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2008-12-27       Impact factor: 6.237

4.  Supervised protein family classification and new family construction.

Authors:  Gangman Yi; Michael R Thon; Sing-Hoi Sze
Journal:  J Comput Biol       Date:  2012-08       Impact factor: 1.479

5.  Long-range order in canary song.

Authors:  Jeffrey E Markowitz; Elizabeth Ivie; Laura Kligler; Timothy J Gardner
Journal:  PLoS Comput Biol       Date:  2013-05-02       Impact factor: 4.475

6.  VOMBAT: prediction of transcription factor binding sites using variable order Bayesian trees.

Authors:  Jan Grau; Irad Ben-Gal; Stefan Posch; Ivo Grosse
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

7.  A lexical approach for identifying behavioural action sequences.

Authors:  Gautam Reddy; Laura Desban; Hidenori Tanaka; Julian Roussel; Olivier Mirat; Claire Wyart
Journal:  PLoS Comput Biol       Date:  2022-01-10       Impact factor: 4.475

8.  TransportTP: a two-phase classification approach for membrane transporter prediction and characterization.

Authors:  Haiquan Li; Vagner A Benedito; Michael K Udvardi; Patrick Xuechun Zhao
Journal:  BMC Bioinformatics       Date:  2009-12-14       Impact factor: 3.169

9.  Local similarity search to find gene indicators in mitochondrial genomes.

Authors:  Ruby L V Moritz; Matthias Bernt; Martin Middendorf
Journal:  Biology (Basel)       Date:  2014-03-11