MOTIVATION: Function inference from structure is facilitated by the use of patterns of residues (3D motifs), normally identified by expert knowledge, that correlate with function. As an alternative to often limited expert knowledge, we use machine-learning techniques to identify patterns of 3-10 residues that maximize function prediction. This approach allows us to test the assumption that residues that provide function are the most informative for predicting function.

RESULTS: We apply our method, GASPS, to the haloacid dehalogenase, enolase, amidohydrolase and crotonase superfamilies and to the serine proteases. The motifs found by GASPS are as good at function prediction as 3D motifs based on expert knowledge. The GASPS motifs with the greatest ability to predict protein function consist mainly of known functional residues. However, several residues with no known functional role are equally predictive. For four groups, we show that the predictive power of our 3D motifs is comparable with or better than approaches that use the entire fold (Combinatorial-Extension) or sequence profiles (PSI-BLAST).

AVAILABILITY: Source code is freely available for academic use by contacting the authors.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Shivas R Amin; Serkan Erdin; R Matthew Ward; Rhonald C Lua; Olivier Lichtarge Journal: Proc Natl Acad Sci U S A Date: 2013-10-21 Impact factor: 11.205
Authors: Drew H Bryant; Mark Moll; Brian Y Chen; Viacheslav Y Fofanov; Lydia E Kavraki Journal: BMC Bioinformatics Date: 2010-05-11 Impact factor: 3.169
Authors: Torgeir R Hvidsten; Astrid Laegreid; Andriy Kryshtafovych; Gunnar Andersson; Krzysztof Fidelis; Jan Komorowski Journal: PLoS One Date: 2009-07-15 Impact factor: 3.240
Authors: Oliver C Redfern; Benoît H Dessailly; Timothy J Dallman; Ian Sillitoe; Christine A Orengo Journal: PLoS Comput Biol Date: 2009-08-28 Impact factor: 4.475