Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis.

Xue Wang, Yaqun Zhang, Bin Yu, Adil Salhi, Ruixin Chen, Lin Wang, Zengfeng Liu.

Abstract

Predicting protein-protein interaction sites (PPI sites) can provide important clues for understanding biological activity. Using machine learning to predict PPI sites can reduce the need for expensive and time-consuming biological experiments. Here we propose PPISP-XGBoost, a novel PPI site prediction method based on eXtreme gradient boosting (XGBoost). First, feature information is extracted from each protein through the pseudo-position specific scoring matrix (PsePSSM), pseudo-amino acid composition (PseAAC), hydropathy index and solvent accessible surface area (ASA), computed under a sliding window. Next, these raw features are preprocessed to obtain better representations for prediction. In particular, the synthetic minority oversampling technique (SMOTE) is used to address class imbalance, and kernel principal component analysis (KPCA) is applied to remove redundant features. Finally, the resulting features are fed to the XGBoost classifier to identify PPI sites. Using PPISP-XGBoost, the prediction accuracy on the training dataset Dset186 reaches 85.4%, and the accuracy on the independent validation datasets Dtestset72, PDBtestset164, Dset_448 and Dset_355 reaches 85.3%, 83.9%, 85.8% and 85.4%, respectively, all of which are higher than the accuracy of existing PPI site prediction methods. These results demonstrate that the PPISP-XGBoost method can further enhance the prediction of PPI sites.
Copyright © 2021 Elsevier Ltd. All rights reserved.
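
Two of the preprocessing steps named in the abstract, sliding-window feature extraction and SMOTE oversampling, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The Kyte-Doolittle scale, the window size, the zero padding, the toy sequence, the labels and the function names `window_hydropathy` and `smote` are all assumptions made for illustration; the abstract does not specify them. The PsePSSM, PseAAC, ASA, KPCA and XGBoost steps are not shown.

```python
import random

# Kyte-Doolittle hydropathy scale. Assumption: the abstract does not say
# which hydropathy index the authors used.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=9, pad=0.0):
    """Return one feature vector per residue: the hydropathy values of the
    residues in a window centred on it, padded at the sequence ends."""
    half = window // 2
    values = [KD[aa] for aa in seq]
    padded = [pad] * half + values + [pad] * half
    return [padded[i:i + window] for i in range(len(seq))]


def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: each synthetic sample lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy input: a 10-residue sequence with made-up site labels
# (1 = interaction site, 0 = non-site).
features = window_hydropathy("MKTAYIAKQR", window=5)
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
minority = [f for f, y in zip(features, labels) if y == 1]
extra = smote(minority, n_new=4, k=2)
```

Interaction sites are typically far rarer than non-sites, which is why the minority class is the one oversampled here. Because each synthetic vector is an interpolation between two real minority samples, it stays within the range already covered by that class.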

Keywords:  Feature extraction; KPCA; Protein-protein interaction sites; SMOTE; XGBoost

Year: 2021  PMID: 34119922  DOI: 10.1016/j.compbiomed.2021.104516

Source DB: PubMed  Journal: Comput Biol Med  ISSN: 0010-4825  Impact factor: 4.589


  4 in total

1.  DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier.

Authors:  Yan Zhang; Zhiwen Jiang; Cheng Chen; Qinqin Wei; Haiming Gu; Bin Yu
Journal:  Interdiscip Sci       Date:  2021-11-03       Impact factor: 2.233

2.  Kernel principal component analysis and differential non-linear feature extraction of pesticide residues on fruit surface based on surface-enhanced Raman spectroscopy.

Authors:  Guolong Shi; Xinyi Shen; Huan Ren; Yuan Rao; Shizhuang Weng; Xianghu Tang
Journal:  Front Plant Sci       Date:  2022-07-19       Impact factor: 6.627

3.  Deep Learning-Based Approach for Heat Transfer Efficiency Prediction with Deep Feature Extraction.

Authors:  Yuanhao Shi; Mengwei Li; Jie Wen; Yanru Yang; Jianchao Zeng
Journal:  ACS Omega       Date:  2022-08-24

4.  DBP-iDWT: Improving DNA-Binding Proteins Prediction Using Multi-Perspective Evolutionary Profile and Discrete Wavelet Transform.

Authors:  Farman Ali; Omar Barukab; Ajay B Gadicha; Shruti Patil; Omar Alghushairy; Akram Y Sarhan
Journal:  Comput Intell Neurosci       Date:  2022-09-28