Nan Xiao¹, Dong-Sheng Cao¹, Min-Feng Zhu¹, Qing-Song Xu¹
¹ School of Mathematics and Statistics and School of Pharmaceutical Sciences, Central South University, Changsha 410083, People's Republic of China.
Abstract
Amino acid sequence-derived structural and physicochemical descriptors are widely used to study the structural, functional, expression and interaction profiles of proteins and peptides. We developed protr, a comprehensive R package for generating various numerical representations of proteins and peptides from their amino acid sequences. The package calculates eight descriptor groups comprising 22 types of commonly used descriptors, which together include about 22 700 descriptor values. It allows users to select amino acid properties from the AAindex database, or to supply self-defined properties, to construct customized descriptors. For proteochemometric modeling, it calculates six types of scale-based descriptors derived by various dimensionality reduction methods. The protr package also computes similarity scores within a list of proteins, based either on protein sequence alignment or on Gene Ontology semantic similarity measures, and calculates profile-based protein features from the position-specific scoring matrix (PSSM). We also developed ProtrWeb, a user-friendly web server for calculating the descriptors implemented in the protr package.
AVAILABILITY AND IMPLEMENTATION: The protr package is freely available from CRAN at http://cran.r-project.org/package=protr, and ProtrWeb is freely available at http://protrweb.scbdd.com/.
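To make "numerical representation of a sequence" concrete, the following is a minimal Python sketch of two of the simplest composition descriptors: amino acid composition (20 values) and dipeptide composition (400 values). It is an illustration under stated assumptions, not protr's implementation or API (protr itself is an R package). The function names are hypothetical, the alphabetical column order is an assumption, and the sketch assumes sequences contain only the 20 standard amino acids.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); alphabetical order is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _validate(seq: str, min_len: int) -> str:
    """Uppercase seq and reject short sequences or non-standard residues."""
    seq = seq.upper()
    if len(seq) < min_len or any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError(
            f"sequence must have length >= {min_len} and contain only standard amino acids"
        )
    return seq


def amino_acid_composition(seq: str) -> dict[str, float]:
    """Hypothetical helper: fraction of each standard amino acid in seq (20 values)."""
    seq = _validate(seq, 1)
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> dict[str, float]:
    """Hypothetical helper: fraction of each of the 400 ordered dipeptides
    among the len(seq) - 1 overlapping adjacent pairs."""
    seq = _validate(seq, 2)
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = len(seq) - 1
    return {dp: c / total for dp, c in counts.items()}
```

For example, `amino_acid_composition("ACDA")` gives 0.5 for `A` and 0.25 each for `C` and `D`, while `dipeptide_composition("ACDA")` assigns 1/3 to each of `AC`, `CD` and `DA` and 0 to the other 397 dipeptides. Normalizing by the number of pairs rather than by sequence length is one common convention; other tools may scale differently.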