Automatic classification of protein structures using physicochemical parameters.

Abhilash Mohan, M Divya Rao, Shruthi Sunderrajan, Gautam Pennathur.

Abstract

Protein classification is the first step toward functional annotation, and the SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the gap between the number of three-dimensional (3D) protein structures generated and the number classified into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting the function of novel proteins from sequence information alone has proven to be a major challenge. The present study uses physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family using sequence-derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure, was used as a benchmark against which to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was also studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a CA above 90%, while spectrophores achieved a CA of around 85%. Feature selection improved CA for both the physicochemical-parameter-based and the spectrophore-based machine learning algorithms. Combining both sets of attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with a CA ranging from 90-96%. These results suggest that the method is useful for classifying proteins from their amino acid sequences.
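The per-family feature selection described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not list the physicochemical parameters used, so the three descriptors below (mean Kyte-Doolittle hydropathy, charged fraction, aromatic fraction), the median split used to discretize each feature, the toy sequences, and the family names `fam_A` and `fam_B` are all hypothetical. It is not the authors' implementation.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale. Choosing this scale is an assumption;
# the abstract does not say which physicochemical parameters were used.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def features(seq):
    """Compute a few illustrative sequence-derived descriptors."""
    n = len(seq)
    return {
        "gravy": sum(KD[a] for a in seq) / n,            # mean hydropathy
        "frac_charged": sum(a in "DEKR" for a in seq) / n,
        "frac_aromatic": sum(a in "FWY" for a in seq) / n,
    }


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of a numeric feature after a binary median split."""
    threshold = sorted(values)[len(values) // 2]
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (left, right) if part)
    return entropy(labels) - conditional


def rank_features(seqs, families, target):
    """Rank features by information gain for one family versus the rest."""
    rows = [features(s) for s in seqs]
    one_vs_rest = [fam == target for fam in families]
    gains = {name: information_gain([r[name] for r in rows], one_vs_rest)
             for name in rows[0]}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: made-up sequences and hypothetical family labels.
seqs = ["LLIVAVLLIA", "IVLLAAVIVL", "DEKRDEKRDE", "KKEEDDRRKE"]
families = ["fam_A", "fam_A", "fam_B", "fam_B"]

ranking = rank_features(seqs, families, target="fam_A")
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

Features with the highest gain for a given family would be kept as input to that family's classifier. In practice the ranking would be computed on training data only, so that the selection step does not leak information from the test set.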

Substances: Proteins

Year: 2014 PMID: 25205495 DOI: 10.1007/s12539-013-0199-0

Source DB: PubMed Journal: Interdiscip Sci ISSN: 1867-1462 Impact factor: 2.233

Cited

2 in total

1. A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data.

Authors: Rabia Aziz; C K Verma; Namita Srivastava
Journal: Genom Data Date: 2016-02-23

2. Data-Driven Cervical Cancer Prediction Model with Outlier Detection and Over-Sampling Methods.

Authors: Muhammad Fazal Ijaz; Muhammad Attique; Youngdoo Son
Journal: Sensors (Basel) Date: 2020-05-15 Impact factor: 3.576