A two-step ensemble learning for predicting protein hot spot residues from whole protein sequence.

SiJie Yao, ChunHou Zheng, Bing Wang, Peng Chen.

Abstract

Protein hot spot residues are functional sites in protein-protein interactions. Biological experimental methods have traditionally been used to identify hot spot residues, but they are laborious and time-consuming. Thus, a variety of computational methods have been widely used in recent years. Despite their success in hot spot identification, most of these methods are impractical in reality because they can recognize hot spot residues only from known protein-protein interface residues. Therefore, identifying hot spots from the whole protein sequence is a meaningful and interesting problem. However, doing so introduces an extreme imbalance between positive and negative samples: hot spot residues account for only about 1-2% of whole protein sequences. To address this issue, this paper proposes a two-step ensemble model for identifying hot spot residues from an extremely unbalanced data set. The model is composed of 134 classifiers built from base KNN and SVM classifiers. Compared with previous methods, our model yields good performance, with an F1 score of 0.593 on the BID test set. Furthermore, to validate its robustness, the model was tested on three other independent test sets and also achieved good predictions. More importantly, the performance of our model on the unbalanced data set is comparable to that of other methods tested on balanced hot spot data sets.
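The abstract does not say how the two steps or the 134 KNN and SVM classifiers are organised. As a rough illustration of the general idea it relies on, training several base classifiers on balanced subsamples so the 1-2% minority class is not swamped, here is a minimal Python sketch. It is a toy under stated assumptions, not the authors' method: it uses only KNN base learners (no SVM), a single majority-vote step, random undersampling of negatives, and hypothetical 2-D data; every function name is illustrative. It also shows why F1 rather than accuracy is the headline metric: a predictor that labels every residue as non-hot-spot scores about 98% accuracy on such data but has an F1 score of 0.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def train_balanced_ensemble(xs, ys, n_members=5, seed=0):
    """Hypothetical helper: build n_members balanced training subsets, each holding all
    positives plus an equal-size random sample of negatives (undersampling)."""
    rng = random.Random(seed)
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    members = []
    for _ in range(n_members):
        neg_sample = rng.sample(neg, min(len(pos), len(neg)))
        members.append((pos + neg_sample, [1] * len(pos) + [0] * len(neg_sample)))
    return members


def ensemble_predict(members, x, k=3):
    """Majority vote of the KNN base classifiers (ties count as negative)."""
    votes = sum(knn_predict(mx, my, x, k) for mx, my in members)
    return 1 if 2 * votes > len(members) else 0


def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R) for the positive (hot spot) class; 0.0 if no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_toy_data(n, pos_rate, rng):
    """Hypothetical 2-D toy data: positives near (3, 3), negatives near (0, 0)."""
    xs, ys = [], []
    for _ in range(n):
        label = 1 if rng.random() < pos_rate else 0
        centre = 3.0 if label else 0.0
        xs.append((rng.gauss(centre, 0.3), rng.gauss(centre, 0.3)))
        ys.append(label)
    return xs, ys


rng = random.Random(42)
train_x, train_y = make_toy_data(1000, 0.02, rng)  # roughly 2% positives
test_x, test_y = make_toy_data(500, 0.02, rng)

members = train_balanced_ensemble(train_x, train_y, n_members=5)
ensemble_pred = [ensemble_predict(members, x) for x in test_x]
all_negative = [0] * len(test_y)

accuracy_all_negative = sum(p == t for p, t in zip(all_negative, test_y)) / len(test_y)
print(f"all-negative baseline: accuracy={accuracy_all_negative:.3f}, "
      f"F1={f1_score(test_y, all_negative):.3f}")
print(f"balanced KNN ensemble: F1={f1_score(test_y, ensemble_pred):.3f}")
```

Each member sees a balanced view of the data, while drawing a different negative sample per member means less of the negative data is discarded across the ensemble as a whole. Real per-residue features, SVM members and the paper's two-step design would replace the toy pieces here.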

Entities: Chemical

Keywords: Ensemble learning; F1 score; Protein hot spot residues; Unbalanced data set

Substances: Proteins

Year: 2022 PMID: 35098379 DOI: 10.1007/s00726-022-03129-5

Source DB: PubMed Journal: Amino Acids ISSN: 0939-4451 Impact factor: 3.520
