Automatic classification of protein structures using physicochemical parameters.

Abhilash Mohan, M Divya Rao, Shruthi Sunderrajan, Gautam Pennathur.

Abstract

Protein classification is the first step to functional annotation; the SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion between the number of three-dimensional (3D) protein structures generated and the number classified into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting the function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence-derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure, was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a CA above 90%, while spectrophores achieved a CA of around 85%. Feature selection improved CA for both the physicochemical-parameter-based and the spectrophore-based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with a CA ranging from 90% to 96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
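The abstract states that features were selected based on information gain but gives no implementation details. The following is a minimal sketch of information-gain feature ranking and of the classification accuracy (CA) metric, not the authors' code. The feature names (`hydropathy`, `net_charge`), the toy values, the class labels and the single-threshold split are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting a numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_gain(values, labels):
    """Largest gain over midpoints between consecutive distinct feature values."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return 0.0
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(information_gain(values, labels, t) for t in cuts)


def rank_features(rows, labels, names):
    """Return (feature name, gain) pairs sorted from most to least informative."""
    columns = list(zip(*rows))
    gains = {name: best_gain(col, labels) for name, col in zip(names, columns)}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


def classification_accuracy(predicted, actual):
    """Fraction of predictions that match the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: two made-up features for four proteins in two families.
names = ["hydropathy", "net_charge"]
rows = [(0.1, 1.0), (0.2, -1.0), (0.8, 1.0), (0.9, -1.0)]
labels = ["A", "A", "B", "B"]

ranking = rank_features(rows, labels, names)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy set the first feature separates the two classes and the second does not, so a selection step would keep only the first. A per-family selection, as described in the abstract, could be approximated by relabelling the data as one family versus the rest before ranking; that reading is also an assumption.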

Year:  2014        PMID: 25205495     DOI: 10.1007/s12539-013-0199-0

Source DB:  PubMed          Journal:  Interdiscip Sci        ISSN: 1867-1462            Impact factor:   2.233


