
Supervised protein family classification and new family construction.

Gangman Yi, Michael R Thon, Sing-Hoi Sze.

Abstract

The goal of protein family classification is to group proteins into families so that proteins within the same family have a common function or are related by ancestry. While supervised classification algorithms are available for this purpose, most of these approaches focus on assigning unclassified proteins to known families but do not allow for progressive construction of new families from proteins that cannot be assigned. Although unsupervised clustering algorithms are also available, they do not make use of information from known families. By computing similarities between proteins based on pairwise sequence comparisons, we develop supervised classification algorithms that achieve improved accuracy over previous approaches while allowing for the construction of new families. We show that our algorithm has a higher accuracy rate and a lower misclassification rate than algorithms based on multiple sequence alignments and hidden Markov models, and that it performs well even on families with very few proteins and on families with low sequence similarity. A software program implementing the algorithm (SClassify) is available online (http://faculty.cse.tamu.edu/shsze/sclassify).
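The general idea described in the abstract, assigning a protein to the most similar known family and opening a new family when no assignment is possible, can be illustrated with a minimal sketch. This is not the SClassify algorithm: the k-mer Jaccard score is an assumed stand-in for a real pairwise sequence comparison, and the names `similarity`, `classify`, and `threshold` are hypothetical.

```python
from typing import Dict, List


def kmer_set(seq: str, k: int = 3) -> set:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Toy pairwise similarity: Jaccard index of the two k-mer sets.

    Assumption: this stands in for a score derived from a real
    pairwise sequence comparison.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def classify(queries: Dict[str, str],
             families: Dict[str, List[str]],
             threshold: float = 0.3) -> Dict[str, str]:
    """Assign each query to its most similar family, or open a new one.

    `families` maps a family name to its member sequences. It is
    extended in place, so later queries can join families that were
    created from earlier unassigned queries.
    """
    assignment = {}
    new_count = 0
    for name, seq in queries.items():
        best_family, best_score = None, 0.0
        for fam, members in families.items():
            # Score a family by its single most similar member.
            score = max((similarity(seq, m) for m in members), default=0.0)
            if score > best_score:
                best_family, best_score = fam, score
        if best_family is None or best_score < threshold:
            # No family is similar enough: start a new family.
            new_count += 1
            best_family = f"new_{new_count}"
            families[best_family] = []
        families[best_family].append(seq)
        assignment[name] = best_family
    return assignment
```

In this sketch the threshold value and the rule of scoring a family by its best-matching member are arbitrary illustrative choices; the published method may differ in both respects.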

Year: 2012; PMID: 22876787; PMCID: PMC3415071; DOI: 10.1089/cmb.2011.0044

Source DB: PubMed; Journal: J Comput Biol; ISSN: 1066-5277; Impact factor: 1.479



