A two-step ensemble learning for predicting protein hot spot residues from whole protein sequence.

SiJie Yao, ChunHou Zheng, Bing Wang, Peng Chen.

Abstract

Protein hot spot residues are functional sites in protein-protein interactions. Hot spot residues have traditionally been identified by biological experiments, which are laborious and time-consuming, so a variety of computational methods have been widely used in recent years. Despite their success in hot spot identification, most computational methods are of limited practical use because they can recognize hot spot residues only among known protein-protein interface residues. Identifying hot spots from the whole protein sequence is therefore a meaningful and interesting problem. However, it introduces an extreme imbalance between positive and negative samples: hot spot residues account for only about 1-2% of whole protein sequences. To address this issue, this paper proposes a two-step ensemble model for identifying hot spot residues from an extremely unbalanced data set. The model is composed of 134 classifiers built from base KNN and SVM classifiers. Compared with previous methods, our model performs well, with an F1 score of 0.593 on the BID test set. Furthermore, to validate its robustness, the model was tested on three other independent test sets, where it also achieved good predictions. More importantly, the performance of our model on an unbalanced data set is comparable to that of other methods tested on balanced hot spot data sets.
© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature.
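The abstract does not say how the two steps of the ensemble work, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes one common scheme for extremely unbalanced data: split the majority (non-hot-spot) class into chunks about the size of the positive class, pair each chunk with all positives to form balanced training subsets, train a base classifier (for example KNN or SVM) on each subset, and combine the predictions by majority vote. The base-classifier training is omitted. The sketch also shows why F1 on the hot spot class, rather than accuracy, is the informative metric when positives make up only about 1-2% of residues. The helper names `balanced_subsets`, `majority_vote`, `f1_score` and `accuracy` are hypothetical.

```python
from typing import List, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R) for the positive (hot spot) class, label 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of residues labelled correctly (misleading under heavy imbalance)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_subsets(pos_idx: List[int], neg_idx: List[int]) -> List[List[int]]:
    """ASSUMED scheme: cut the negatives into consecutive chunks of len(pos_idx)
    and pair every chunk with all positives (the last chunk may be smaller)."""
    k = len(pos_idx)
    chunks = [neg_idx[i:i + k] for i in range(0, len(neg_idx), k)]
    return [pos_idx + chunk for chunk in chunks]


def majority_vote(predictions: List[List[int]]) -> List[int]:
    """Combine binary predictions (one list per base classifier) by strict
    majority; ties are resolved as negative (0)."""
    n_models = len(predictions)
    return [1 if 2 * sum(col) > n_models else 0 for col in zip(*predictions)]


if __name__ == "__main__":
    # Toy data: 15 hot spots among 1000 residues (1.5% positives).
    y_true = [1] * 15 + [0] * 985
    all_negative = [0] * 1000
    print(accuracy(y_true, all_negative), f1_score(y_true, all_negative))
    print(len(balanced_subsets(list(range(15)), list(range(15, 1000)))))
```

On this toy data, a predictor that labels every residue as a non-hot-spot reaches an accuracy of 0.985 but an F1 score of 0.0, and splitting the 985 negatives yields 66 balanced subsets of at most 30 residues each. This is why a metric such as F1, which ignores the large number of true negatives, is the more meaningful figure for hot spot prediction from whole sequences.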

Keywords:  Ensemble learning; F1 score; Protein hot spot residues; Unbalanced data set

Year:  2022        PMID: 35098379     DOI: 10.1007/s00726-022-03129-5

Source DB:  PubMed          Journal:  Amino Acids        ISSN: 0939-4451            Impact factor:   3.520
