Huda A Maghawry, Mostafa G M Mostafa, Tarek F Gharib.
Abstract
Predicting protein function is one of the challenging problems in bioinformatics. Function is the main key used to classify different proteins. It can be inferred experimentally, at very low throughput, or computationally, at very high throughput. Computational methods are either sequence based or structure based, and structure-based methods produce more accurate protein function prediction. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity through six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods (naïve Bayes, k-nearest neighbors, and random forest) to predict the enzyme superfamily of a given protein. The proposed representation outperforms a recently introduced representation method that is based only on distance patterns. It achieved prediction accuracy of up to 98%, an improvement of about 10% on average.
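The general idea of classifying a protein from a structure-derived feature vector can be illustrated with a minimal sketch. This is not the authors' representation, which the abstract does not specify. It assumes a simple histogram of pairwise residue distances as the feature, roughly in the spirit of the distance-pattern baseline, and a plain k-nearest-neighbors vote. The function names, bin settings, toy coordinates, and labels `superfamily_A` and `superfamily_B` are all hypothetical.

```python
import math
from collections import Counter


def distance_histogram(coords, bin_width=4.0, n_bins=5):
    """Normalized histogram of pairwise residue distances.

    Hypothetical feature for illustration only; bin_width and n_bins
    are arbitrary assumptions, not values from the paper.
    """
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            # Distances beyond the last bin edge fall into the last bin.
            idx = min(int(d // bin_width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy residue coordinates (made up): two compact and two extended shapes.
compact_1 = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
compact_2 = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)]
extended_1 = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (15, 0, 0)]
extended_2 = [(0, 0, 0), (6, 0, 0), (12, 0, 0), (18, 0, 0)]

train = [
    (distance_histogram(compact_1), "superfamily_A"),
    (distance_histogram(compact_2), "superfamily_A"),
    (distance_histogram(extended_1), "superfamily_B"),
    (distance_histogram(extended_2), "superfamily_B"),
]

# An unseen extended shape is classified by its distance histogram.
query = [(0, 0, 0), (5.5, 0, 0), (11, 0, 0), (16.5, 0, 0)]
predicted = knn_predict(train, distance_histogram(query), k=3)
print(predicted)
```

A distance histogram discards the spatial arrangement of residues, which is one plausible reason a representation based on three-dimensional patterns could carry more discriminative information than distances alone.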
Keywords: algorithms; distance geometry; protein families; protein structure; structural and functional genomics
Year: 2014 PMID: 25343279 DOI: 10.1089/cmb.2014.0137
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479