
A two-stage approach towards protein secondary structure classification.

Kushal Kanti Ghosh, Soulib Ghosh, Sagnik Sen, Ram Sarkar, Ujjwal Maulik.

Abstract

Protein secondary structure (PSS) describes the local folded structures that form within a polypeptide due to interactions among atoms of the backbone. Globular proteins are generally divided into four classes: all-α, all-β, α + β, and α/β. Because nearly 90% of proteins fall into these four classes, they are the ones most commonly considered for the computational classification of proteins. PSS classification supports several tasks, including protein fold recognition, tertiary structure prediction, prediction of DNA-binding sites, and reduction of the conformational search space. In this paper, we propose a machine learning-based model that classifies the secondary structure of proteins into the four classes all-α, all-β, α + β, and α/β, using both sequence-based and structure-based features. First, mutual information (MI), a filter-based feature selection method, is used to remove redundant features; the selected features are then used to train three different classifiers: random forest, K-nearest neighbor (KNN), and multi-layer perceptron (MLP). Next, several standard classifier combination approaches are applied to integrate the decisions made by these classifiers, and the weighted product rule is found to perform best among them. The overall accuracies obtained by the proposed model on four standard datasets, namely 640, 1189, 25pdb, and fc699, are 86.89%, 92.93%, 91.38%, and 94.87%, respectively. The proposed model outperforms some state-of-the-art methods considered here for comparison. The high classification accuracy achieved on all four datasets is attributed to a comprehensive feature set, pruned of redundant features through feature selection, which is then passed to an ensemble consisting of three different classifiers.
Assigning different weights to the outputs of the different classifiers thus proved useful in designing a model that predicts the secondary structure of proteins from their sequence-based and structure-based features.
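The abstract names mutual information (MI) as the filter-based selector but gives no further details. Below is a minimal sketch of one common MI filter, assuming continuous features are discretized by equal-width binning, each feature is scored by its MI with the class label, and the top k are kept. The bin count, k, the toy data, and the helper names `discretize`, `mutual_information`, and `select_top_k` are hypothetical and not taken from the paper; the abstract says MI is used to remove redundant features, and scoring feature-label relevance, as done here, is only one reading of that.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def discretize(values: Sequence[float], n_bins: int = 5) -> List[int]:
    """Equal-width binning so a continuous feature can be scored with discrete MI (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: one bin, so its MI with any label is 0
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value lands in the last bin instead of an extra one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(feature: Sequence[int], labels: Sequence[str]) -> float:
    """Empirical I(X;Y) in nats for a discrete feature X and class label Y."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(columns: Sequence[Sequence[float]], labels: Sequence[str],
                 k: int, n_bins: int = 5) -> Tuple[List[int], List[float]]:
    """Filter method: score each feature column independently, keep the k highest-MI indices."""
    scores = [mutual_information(discretize(col, n_bins), labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


if __name__ == "__main__":
    labels = ["all-alpha", "all-alpha", "all-beta", "all-beta"]
    columns = [
        [0.9, 0.8, 0.1, 0.2],  # hypothetical feature that tracks the class
        [0.5, 0.1, 0.5, 0.1],  # hypothetical feature independent of the class
    ]
    kept, scores = select_top_k(columns, labels, k=1, n_bins=2)
    print(kept, [round(s, 3) for s in scores])  # prints [0] [0.693, 0.0]
```

Because each feature is scored on its own, a filter like this is cheap and independent of the downstream classifiers; the trade-off is that it does not by itself detect two features that carry the same information about the class.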
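The abstract reports that the weighted product rule was the best of the classifier combination schemes tried. Below is a minimal sketch of that rule as it is usually defined: each classifier i outputs a probability p_i(c) for every class c, the fused score of class c is the product of p_i(c) raised to the weight w_i, and the class with the largest fused score is predicted. The weights, the example probabilities, and the names `weighted_product_rule`, `predict`, and `CLASSES` are hypothetical; the abstract does not state how the paper chose its weights.

```python
import math
from typing import List, Sequence

# The four structural classes named in the abstract (ASCII spellings).
CLASSES = ["all-alpha", "all-beta", "alpha+beta", "alpha/beta"]


def weighted_product_rule(probas: Sequence[Sequence[float]],
                          weights: Sequence[float],
                          eps: float = 1e-12) -> List[float]:
    """Fuse per-classifier probabilities as prod_i p_i(c) ** w_i, renormalized to sum to 1."""
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    n_classes = len(probas[0])
    log_scores = [0.0] * n_classes
    for p, w in zip(probas, weights):
        for c in range(n_classes):
            # Accumulate in log space; eps keeps a zero probability from hitting log(0).
            log_scores[c] += w * math.log(max(p[c], eps))
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]  # subtract the max for numerical stability
    total = sum(unnorm)
    return [u / total for u in unnorm]


def predict(probas: Sequence[Sequence[float]], weights: Sequence[float]) -> str:
    """Return the class label with the largest fused score."""
    fused = weighted_product_rule(probas, weights)
    return CLASSES[max(range(len(fused)), key=fused.__getitem__)]


if __name__ == "__main__":
    # Hypothetical class-probability outputs for one protein, in CLASSES order.
    rf = [0.6, 0.1, 0.2, 0.1]
    knn = [0.3, 0.4, 0.2, 0.1]
    mlp = [0.5, 0.2, 0.2, 0.1]
    weights = [0.9, 0.8, 0.85]  # hypothetical, e.g. per-classifier validation accuracies
    print(predict([rf, knn, mlp], weights))  # prints all-alpha
```

Because the scores are multiplied, a class that any single classifier rates near zero is heavily penalized, which is the main behavioral difference from a weighted sum rule; a larger w_i lets a more trusted classifier pull the product further toward its own ranking.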

Keywords:  Classifier combination; Feature selection; Protein; Protein sequence; Secondary structure

Year:  2020        PMID: 32472446     DOI: 10.1007/s11517-020-02194-w

Source DB:  PubMed          Journal:  Med Biol Eng Comput        ISSN: 0140-0118            Impact factor:   2.602

