Protein family classification using sparse markov transducers.

Eleazar Eskin, William Stafford Noble, Yoram Singer.

Abstract

We present a method for classifying proteins into families based on short subsequences of amino acids using a new probabilistic model called sparse Markov transducers (SMTs). We classify a protein by estimating probability distributions over subsequences of amino acids from the protein. Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suffix trees by allowing for wild-cards in the conditioning sequences. Since substitutions of amino acids are common in protein families, incorporating wild-cards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. As protein databases become larger, data-driven learning algorithms for probabilistic models such as SMTs will require vast amounts of memory. We therefore describe and use efficient data structures to improve the memory usage of SMTs. We evaluate SMTs by building protein family classifiers using the Pfam and SCOP databases and compare our results to previously published results and state-of-the-art protein homology detection methods. SMTs outperform previous probabilistic suffix tree methods and under certain conditions perform comparably to state-of-the-art protein homology detection methods.
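
The idea of conditioning on contexts that contain wild-cards can be illustrated with a small Python sketch. This is a toy, not the SMT algorithm from the paper: the abstract does not give the authors' mixture weighting or memory-efficient data structures, so the sketch simply averages smoothed count estimates uniformly over every wild-card variant of a fixed-length context. The class name `WildcardContextModel`, the `order` and `pseudocount` parameters, and the uniform averaging are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

WILDCARD = "*"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def context_patterns(context):
    """Yield every variant of `context` with any subset of positions wild-carded."""
    for mask in product([False, True], repeat=len(context)):
        yield "".join(WILDCARD if hide else ch for ch, hide in zip(context, mask))


class WildcardContextModel:
    """Toy next-residue model (hypothetical, not the paper's SMT)."""

    def __init__(self, order=2, alphabet=AMINO_ACIDS, pseudocount=0.5):
        self.order = order
        self.alphabet = alphabet
        self.pseudocount = pseudocount
        # counts[pattern][symbol] = times `symbol` followed a context matching `pattern`
        self.counts = defaultdict(lambda: defaultdict(float))

    def fit(self, sequences):
        for seq in sequences:
            for i in range(self.order, len(seq)):
                context, symbol = seq[i - self.order:i], seq[i]
                for pattern in context_patterns(context):
                    self.counts[pattern][symbol] += 1.0
        return self

    def prob(self, context, symbol):
        """Uniform average of smoothed estimates over all wild-card variants."""
        estimates = []
        for pattern in context_patterns(context):
            table = self.counts.get(pattern, {})
            total = sum(table.values())
            smoothed = (table.get(symbol, 0.0) + self.pseudocount) / (
                total + self.pseudocount * len(self.alphabet)
            )
            estimates.append(smoothed)
        return sum(estimates) / len(estimates)

    def log_likelihood(self, seq):
        return sum(
            math.log(self.prob(seq[i - self.order:i], seq[i]))
            for i in range(self.order, len(seq))
        )


def classify(seq, family_models):
    """Return the family name whose model gives `seq` the highest log-likelihood."""
    return max(family_models, key=lambda name: family_models[name].log_likelihood(seq))


# Made-up sequences, for illustration only.
models = {
    "family_A": WildcardContextModel().fit(["ACDACDACD"]),
    "family_B": WildcardContextModel().fit(["WYKWYKWYK"]),
}
predicted = classify("GCDACD", models)
```

In this toy setup the query begins with the context `GC`, which never occurs in the `family_A` training sequence. The wild-card pattern `*C` still matches the observed `AC` contexts, so the `family_A` model assigns the following `D` more than the uniform probability an exact-match model would fall back to. This is the kind of tolerance to substitutions the abstract attributes to wild-cards.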

Substances: Proteins

Year: 2003 PMID: 12804091 DOI: 10.1089/106652703321825964

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479

Cited

2 in total

1. Supervised protein family classification and new family construction.

Authors: Gangman Yi; Michael R Thon; Sing-Hoi Sze
Journal: J Comput Biol Date: 2012-08 Impact factor: 1.479

2. Solving the master equation for Indels.

Authors: Ian H Holmes
Journal: BMC Bioinformatics Date: 2017-05-12 Impact factor: 3.169
