Thorsteinn Rögnvaldsson (1), Liwen You (1), Daniel Garwicz (1).
(1) CAISR, School of Information Science, Computer and Electrical Engineering, Halmstad University, Halmstad, Sweden and Division of Clinical Chemistry and Pharmacology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
Abstract
MOTIVATION: Understanding the substrate specificity of human immunodeficiency virus (HIV)-1 protease is important when designing effective HIV-1 protease inhibitors. Furthermore, characterizing and predicting the cleavage profile of HIV-1 protease is essential to generate and test hypotheses of how HIV-1 affects proteins of the human host. Currently available tools for predicting cleavage by HIV-1 protease can be improved.
RESULTS: The linear support vector machine with orthogonal encoding is shown to be the best predictor for HIV-1 protease cleavage. It is considerably better than current publicly available predictor services. It is also found that schemes using physicochemical properties do not improve over the standard orthogonal encoding scheme. Some issues with the currently available data are discussed.
AVAILABILITY AND IMPLEMENTATION: The datasets used, which are the most important part, are available at the UCI Machine Learning Repository. The tools used are all standard and easily available.
CONTACT: thorsteinn.rognvaldsson@hh.se.
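The orthogonal encoding named in the abstract can be illustrated with a minimal sketch. It assumes the usual reading of the term: each residue becomes a 20-element indicator block, so a peptide of length n becomes a 20n-element 0/1 vector. The names `orthogonal_encode` and `linear_score`, the alphabet ordering and the example peptide are illustrative assumptions, not taken from the paper, and no trained weights are implied.

```python
# Sketch of orthogonal (one-hot) encoding for peptides.
# Assumption: the 20 standard amino acids in alphabetical one-letter order;
# the paper may order them differently.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def orthogonal_encode(peptide):
    """Return a flat 0/1 list with one 20-wide indicator block per residue."""
    width = len(AMINO_ACIDS)
    vector = [0] * (len(peptide) * width)
    for pos, aa in enumerate(peptide.upper()):
        if aa not in INDEX:
            raise ValueError(f"unknown residue {aa!r} at position {pos}")
        vector[pos * width + INDEX[aa]] = 1
    return vector


def linear_score(vector, weights, bias=0.0):
    """Decision value of a linear classifier: w . x + b.

    Hypothetical helper: a trained linear SVM would supply `weights` and
    `bias`; the sign of the result would give the predicted class.
    """
    return sum(w * x for w, x in zip(weights, vector)) + bias


# Example octamer, chosen only for illustration.
features = orthogonal_encode("SQNYPIVQ")
```

Because exactly one entry per position is non-zero, a linear model on these features reduces to adding one learned weight per (position, residue) pair, which makes the fitted weights directly interpretable as a position-specific preference table.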
Authors: Justen Manasa; Vici Varghese; Sergei L Kosakovsky Pond; Soo-Yon Rhee; Philip L Tzou; W Jeffrey Fessel; Karen S Jang; Elizabeth White; Thorsteinn Rögnvaldsson; David A Katzenstein; Robert W Shafer. Journal: Sci Rep. Date: 2017-09-14. Impact factor: 4.379.