Supervised protein family classification and new family construction.

Gangman Yi, Michael R Thon, Sing-Hoi Sze.

Abstract

The goal of protein family classification is to group proteins into families so that proteins within the same family have a common function or are related by ancestry. While supervised classification algorithms are available for this purpose, most of these approaches focus on assigning unclassified proteins to known families and do not allow for progressive construction of new families from proteins that cannot be assigned. Although unsupervised clustering algorithms are also available, they do not make use of information from known families. By computing similarities between proteins based on pairwise sequence comparisons, we develop supervised classification algorithms that achieve improved accuracy over previous approaches while allowing for construction of new families. We show that our algorithm has a higher accuracy rate and a lower misclassification rate than algorithms based on multiple sequence alignments and hidden Markov models, and that it performs well even on families with very few proteins and on families with low sequence similarity. A software program implementing the algorithm (SClassify) is available online (http://faculty.cse.tamu.edu/shsze/sclassify).
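
The general idea of assigning proteins to known families while progressively opening new families for proteins that fit none of them can be illustrated with a minimal sketch. This is not the SClassify algorithm, whose details the abstract does not give. It assumes a toy k-mer Jaccard score in place of real pairwise sequence comparison, a single fixed cutoff, and best-match assignment; the names `similarity`, `classify`, and `threshold` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Toy stand-in for a pairwise sequence comparison: Jaccard index of k-mers."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def classify(
    families: Dict[str, List[str]],
    queries: Dict[str, str],
    threshold: float = 0.3,  # assumed cutoff, not taken from the paper
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Assign each query to its most similar family, or start a new family."""
    # Copy so that the caller's known families are not modified.
    families = {name: list(members) for name, members in families.items()}
    assignments: Dict[str, str] = {}
    new_count = 0
    for query_id, seq in queries.items():
        best_name, best_score = None, 0.0
        for name, members in families.items():
            # Score a family by its single most similar member.
            score = max((similarity(seq, m) for m in members), default=0.0)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None or best_score < threshold:
            # No known family is close enough: open a new one.
            new_count += 1
            best_name = f"new_{new_count}"
            families[best_name] = []
        # Later queries can join families extended or created earlier.
        families[best_name].append(seq)
        assignments[query_id] = best_name
    return assignments, families
```

Calling `classify` with a dictionary of known families and a dictionary of unclassified sequences returns the family label chosen for each query together with the updated family memberships. Because new families are added as the queries are processed, the result of this sketch depends on query order.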

Substances: Proteins

Year: 2012 PMID: 22876787 PMCID: PMC3415071 DOI: 10.1089/cmb.2011.0044

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479
