Using pseudo amino acid composition to predict protease families by incorporating a series of protein biological features.

Lele Hu, Lulu Zheng, Zhiwen Wang, Bing Li, Lei Liu.

Abstract

Proteases are essential to most biological processes, yet they themselves remain intact during those processes. In this study, a computational approach was developed for predicting the families of proteases from their sequences. Following the concept of pseudo amino acid composition, each protein sample was represented by a series of its biological features in order to capture the essential patterns in protease sequences. A total of 132 biological features were used, derived from various biochemical and physicochemical properties of the constituent amino acids. The importance of each feature to the prediction was ranked by the Maximum Relevance Minimum Redundancy (mRMR) algorithm, and Incremental Feature Selection was then applied to select an optimal feature set, which was used to construct a predictor based on the nearest neighbor algorithm. As a demonstration, the overall success rate in identifying proteases among their seven families was 92.74% by the jackknife test. Further analysis of the optimal feature set revealed that secondary structure and amino acid composition play the key roles in the classification, which is consistent with some previous findings. These promising results suggest that the predictor presented in this paper may become a useful tool for studying proteases.
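The pipeline described in the abstract (rank features by Maximum Relevance Minimum Redundancy, grow the feature set incrementally, and score each candidate set with a nearest neighbor classifier under the jackknife test) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes features are already discretized into small integer levels, uses squared Euclidean distance for the nearest neighbor step (the abstract does not state the distance measure), and runs on invented toy data with three features instead of 132. All function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr_rank(columns, labels):
    """Greedy mRMR ordering: relevance to the labels minus the mean
    redundancy with the features already chosen."""
    relevance = [mutual_information(col, labels) for col in columns]
    remaining, ranked = list(range(len(columns))), []
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(mutual_information(columns[j], columns[k])
                             for k in ranked) / len(ranked)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

def jackknife_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    (squared Euclidean distance is an assumption of this sketch)."""
    correct = 0
    for i, query in enumerate(rows):
        nearest = min((j for j in range(len(rows)) if j != i),
                      key=lambda j: sum((query[f] - rows[j][f]) ** 2
                                        for f in features))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

def incremental_feature_selection(rows, labels, ranked):
    """Add ranked features one at a time; keep the smallest prefix
    that reaches the best jackknife accuracy."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        acc = jackknife_accuracy(rows, labels, ranked[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc

# Invented toy data: feature 1 duplicates feature 0, feature 2 is complementary.
rows = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 1),
        (1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1)]
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]

columns = list(zip(*rows))            # one tuple of values per feature
ranked = mrmr_rank(columns, labels)
selected, best_acc = incremental_feature_selection(rows, labels, ranked)
print("mRMR order:", ranked)
print("selected features:", selected, "jackknife accuracy:", best_acc)
```

On this toy data the redundant copy of feature 0 is ranked last and is left out of the selected set, which is the behavior the mRMR and Incremental Feature Selection steps are meant to produce. A real application would replace the toy rows with the discretized feature vectors of the protein samples.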

Year: 2011    PMID: 21271978    DOI: 10.2174/092986611795222795

Source DB: PubMed    Journal: Protein Pept Lett    ISSN: 0929-8665    Impact factor: 1.890


  4 in total

1.  Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection.

Authors:  Samad Jahandideh; Vinodh Srinivasasainagendra; Degui Zhi
Journal:  J Theor Biol       Date:  2012-08-03       Impact factor: 2.691

2.  Identification of colorectal cancer related genes with mRMR and shortest path in protein-protein interaction network.

Authors:  Bi-Qing Li; Tao Huang; Lei Liu; Yu-Dong Cai; Kuo-Chen Chou
Journal:  PLoS One       Date:  2012-04-04       Impact factor: 3.240

3.  iNR-PhysChem: a sequence-based predictor for identifying nuclear receptors and their subfamilies via physical-chemical property matrix.

Authors:  Xuan Xiao; Pu Wang; Kuo-Chen Chou
Journal:  PLoS One       Date:  2012-02-21       Impact factor: 3.240

4.  A multi-label predictor for identifying the subcellular locations of singleplex and multiplex eukaryotic proteins.

Authors:  Xiao Wang; Guo-Zheng Li
Journal:  PLoS One       Date:  2012-05-22       Impact factor: 3.240

