Zheng Rong Yang (1), Kuo-Chen Chou. (1) Department of Computer Science, Exeter University, Exeter EX4 4PT, UK. Z.R.Yang@exeter.ac.uk
Abstract
MOTIVATION: One of the most important issues in computational proteomics is producing a prediction model for the classification or annotation of the biological function of novel protein sequences. To improve prediction accuracy, much attention has been paid to the performance of the algorithms used, but little to the fundamental issue of amino acid encoding, which arises because most existing pattern recognition algorithms cannot recognize amino acids in protein sequences directly. Importantly, the most commonly used amino acid encoding method has a flaw that leads to large computational cost and recognition bias. RESULTS: By replacing the kernel functions of support vector machines (SVMs) with amino acid similarity measurement matrices, we have modified SVMs, a new type of pattern recognition algorithm, for analysing protein sequences, particularly for proteolytic cleavage site prediction. We refer to the modified SVMs as bio-support vector machines. When applied to the prediction of HIV protease cleavage sites, the new method showed a remarkable advantage in reducing model complexity and enhancing model robustness.
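The idea of using amino acid similarity in place of a numeric kernel can be illustrated with a minimal sketch. It is not the authors' implementation: the residue grouping, the scores (2 for identical residues, 1 for residues in the same group, 0 otherwise), the function names, and the example peptides are all assumptions made for illustration, standing in for a real amino acid similarity matrix.

```python
# Toy residue grouping: an illustrative assumption, NOT a published
# similarity matrix. It covers the 20 standard amino acids.
GROUPS = ["AVLIMFWC", "STNQYGP", "KRH", "DE"]
GROUP_OF = {aa: i for i, members in enumerate(GROUPS) for aa in members}


def residue_similarity(a: str, b: str) -> int:
    """Hypothetical score: 2 if identical, 1 if same group, else 0."""
    if a == b:
        return 2
    if GROUP_OF[a] == GROUP_OF[b]:
        return 1
    return 0


def peptide_similarity(x: str, y: str) -> int:
    """Sum position-wise residue similarities of two equal-length peptides."""
    if len(x) != len(y):
        raise ValueError("peptides must have the same length")
    return sum(residue_similarity(a, b) for a, b in zip(x, y))


def gram_matrix(peptides: list[str]) -> list[list[int]]:
    """Pairwise similarity matrix, computed directly on the sequences."""
    return [[peptide_similarity(p, q) for q in peptides] for p in peptides]


# Example octapeptides, chosen only for illustration.
peptides = ["SQNYPIVQ", "ARVLAEAM", "SQNYPIVE"]
K = gram_matrix(peptides)
```

Because the similarity is computed on the sequences themselves, no numeric encoding of the residues is needed. A matrix like `K` could then be handed to an SVM implementation that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`). Note that a real substitution matrix is not guaranteed to give a positive semidefinite Gram matrix, so this step may need extra care in practice.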