
A two-stage approach towards protein secondary structure classification.

Kushal Kanti Ghosh¹, Soulib Ghosh², Sagnik Sen², Ram Sarkar², Ujjwal Maulik².

Abstract

Protein secondary structure (PSS) describes the local folded structures that form within a polypeptide through interactions among backbone atoms. Globular proteins are generally divided into four classes: all-α, all-β, α + β, and α/β. Because nearly 90% of proteins fall into these four classes, they are the ones most often considered in computational classification of proteins. Classification of PSS is important for tasks such as protein fold recognition, tertiary structure prediction, prediction of DNA-binding sites, and reduction of the conformation search space. In this paper, we propose a machine learning-based model for secondary structure classification of proteins into the four classes all-α, all-β, α + β, and α/β, using both sequence-based and structure-based features. First, mutual information (MI), a filter-based feature selection method, is used to remove redundant features. The selected features are then used to train three different classifiers: random forest, K-nearest neighbor (KNN), and multi-layer perceptron (MLP). Next, several standard classifier combination approaches are applied to integrate the decisions made by these classifiers, and the weighted product rule is found to perform best among them. The overall accuracies obtained using the proposed model on the four standard datasets, namely 640, 1189, 25pdb, and fc699, are 86.89%, 92.93%, 91.38%, and 94.87%, respectively. The proposed model outperforms some state-of-the-art methods considered here for comparison. The high classification accuracy achieved by the proposed model on the four datasets is attributed to a comprehensive feature set (obtained by eliminating redundant features through feature selection), which is then passed through an ensemble of three different classifiers. Assigning different weights to the outputs of the different classifiers thus proved useful in designing a model for predicting the secondary structure of proteins from their sequence-based and structure-based features.
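The abstract names the weighted product rule as the best-performing combination scheme but does not give its formula or the weights used. The following is a minimal sketch of the standard form of that rule, in which each class score is the product of the classifiers' class probabilities, each raised to that classifier's weight. The function names, the probability vectors, and the weights are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

# The four structural classes named in the abstract.
CLASSES = ["all-alpha", "all-beta", "alpha+beta", "alpha/beta"]


def weighted_product_rule(probas, weights, eps=1e-12):
    """Combine per-classifier class probabilities with a weighted product.

    probas:  list of probability vectors, one per classifier
    weights: one non-negative weight per classifier (assumed values)
    Returns normalized combined scores, one per class.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    n_classes = len(probas[0])
    # Work in log space: sum_k w_k * log p_k(c) is the log of prod_k p_k(c)^w_k.
    # eps guards against log(0) when a classifier assigns zero probability.
    log_scores = [
        sum(w * math.log(max(p[c], eps)) for p, w in zip(probas, weights))
        for c in range(n_classes)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    scores = [math.exp(s - top) for s in log_scores]
    total = sum(scores)
    return [s / total for s in scores]


def predict(probas, weights):
    """Return the class label with the highest combined score."""
    scores = weighted_product_rule(probas, weights)
    best = max(range(len(scores)), key=scores.__getitem__)
    return CLASSES[best]


# Made-up outputs of three classifiers (e.g. random forest, KNN, MLP)
# for a single protein; these numbers are illustrative only.
rf = [0.50, 0.30, 0.10, 0.10]
knn = [0.20, 0.60, 0.10, 0.10]
mlp = [0.45, 0.35, 0.10, 0.10]

weighted = predict([rf, knn, mlp], weights=[0.4, 0.2, 0.4])
unweighted = predict([rf, knn, mlp], weights=[1.0, 1.0, 1.0])
print(weighted, unweighted)
```

With these made-up numbers, down-weighting the second classifier changes the combined decision relative to the plain product rule, which is the kind of effect that per-classifier weights are meant to capture. In practice the weights would be set from validation performance; how the paper sets them is not stated in the abstract.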

Entities: Gene

Keywords: Classifier combination; Feature selection; Protein; Protein sequence; Secondary structure

Substances:
Peptides
Proteins

Year: 2020 PMID: 32472446 DOI: 10.1007/s11517-020-02194-w

Source DB: PubMed Journal: Med Biol Eng Comput ISSN: 0140-0118 Impact factor: 2.602

Cited

1 in total

1. A novel fusion based on the evolutionary features for protein fold recognition using support vector machines.

Authors: Mohammad Saleh Refahi; A Mir; Jalal A Nasiri
Journal: Sci Rep Date: 2020-09-01 Impact factor: 4.379