Mensur Dlakić. Department of Microbiology, Montana State University, Bozeman, MT 59717-3520, USA. mdlakic@montana.edu
Abstract
MOTIVATION: Recently developed profile-profile methods rival structural comparisons in their ability to detect homology between distantly related proteins. Despite this progress, many genuine relationships between protein families cannot be recognized because comparisons of their profiles yield statistically insignificant scores. RESULTS: Using known evolutionary relationships among protein superfamilies in the SCOP database, support vector machines were trained on four sets of discriminatory features derived from the output of HHsearch. In validation, automatic classification of all profile-profile matches was superior to fixed threshold-based annotation in both sensitivity and specificity. The effectiveness of this approach was demonstrated by annotating several domains of unknown function from the Pfam database. AVAILABILITY: Programs and scripts implementing the methods described in this manuscript are freely available from http://hhsvm.dlakiclab.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
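The comparison in RESULTS rests on sensitivity and specificity, so a small illustration of how a fixed score threshold trades one for the other may help. The sketch below is not the author's code: the scores, labels, and cutoffs are hypothetical toy values, and the helper names `threshold_classifier` and `sensitivity_specificity` are invented for illustration. It only shows the baseline that a trained classifier is meant to improve on.

```python
def threshold_classifier(scores, cutoff):
    """Call a match a homolog when its score reaches the fixed cutoff."""
    return [score >= cutoff for score in scores]


def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for boolean labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # true homologs called
    fn = sum(1 for y, p in pairs if y and not p)      # true homologs missed
    tn = sum(1 for y, p in pairs if not y and not p)  # non-homologs rejected
    fp = sum(1 for y, p in pairs if not y and p)      # non-homologs called
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical toy data: match scores and whether each pair is truly related.
scores = [98.0, 85.0, 40.0, 35.0, 20.0, 10.0]
labels = [True, True, True, False, True, False]

# A strict cutoff misses low-scoring true relationships.
strict = sensitivity_specificity(labels, threshold_classifier(scores, 50.0))

# A lenient cutoff recovers more of them but admits a false positive.
lenient = sensitivity_specificity(labels, threshold_classifier(scores, 30.0))

print("strict cutoff:", strict)
print("lenient cutoff:", lenient)
```

With a single score, moving the cutoff can only exchange sensitivity for specificity. A classifier trained on several features per match, as described above, is not bound to one such cutoff.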