Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier.

Cheng Chen¹, Qingmei Zhang¹, Bin Yu², Zhaomin Yu¹, Patrick J Lawrence³, Qin Ma³, Yan Zhang⁴.

Abstract

Protein-protein interactions (PPIs) are involved in most cellular activities at the proteomic level, making the study of PPIs necessary for comprehending any biological process. Machine learning approaches have been explored, leading to more accurate and generalizable PPI predictions. In this paper, we propose a predictive framework called StackPPI. First, we use pseudo amino acid composition, Moreau-Broto, Moran and Geary autocorrelation descriptors, amino acid composition position-specific scoring matrix, bi-gram position-specific scoring matrix, and composition, transition and distribution to encode biologically relevant features. Second, we employ XGBoost to reduce feature noise and perform dimensionality reduction through gradient boosting and average gain. Third, the resulting optimized features are analyzed by StackPPI, a PPI predictor we have developed from a stacked ensemble classifier consisting of random forest, extremely randomized trees and logistic regression algorithms. Five-fold cross-validation shows that StackPPI can successfully predict PPIs with an accuracy (ACC) of 89.27%, a Matthews correlation coefficient (MCC) of 0.7859 and an area under the ROC curve (AUC) of 0.9561 on Helicobacter pylori, and with an ACC of 94.64%, an MCC of 0.8934 and an AUC of 0.9810 on Saccharomyces cerevisiae. We find that StackPPI improves protein interaction prediction accuracy on independent test sets compared with state-of-the-art models. We also highlight StackPPI's ability to infer biologically significant PPI networks. StackPPI's accurate prediction of functional pathways makes it a logical choice for studying the mechanisms underlying PPIs, especially in drug design applications. The datasets and source code used to create StackPPI are available here: https://github.com/QUST-AIBBDRC/StackPPI/.
Copyright © 2020 Elsevier Ltd. All rights reserved.
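The autocorrelation descriptors named in the abstract summarize how a per-residue physicochemical property co-varies with itself at a sequence separation (lag) d. The sketch below applies the standard Moreau-Broto, Moran and Geary definitions to a single property, the Kyte-Doolittle hydropathy scale standardized over the 20 amino acids. The choice of property, the lag range and the function names (`standardized_scale`, `autocorrelations`) are illustrative assumptions, not StackPPI's actual settings.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale. ASSUMPTION: a single property chosen for
# illustration; StackPPI may use different or multiple properties.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardized_scale(scale):
    """Standardize a property scale over the 20 amino acids (zero mean, unit SD)."""
    values = list(scale.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def autocorrelations(seq, scale, max_lag):
    """Return (moreau_broto, moran, geary) lists for lags d = 1..max_lag."""
    p = [scale[aa] for aa in seq]
    n = len(p)
    if not 1 <= max_lag < n:
        raise ValueError("need 1 <= max_lag < len(seq)")
    p_bar = mean(p)
    ss = sum((x - p_bar) ** 2 for x in p)
    var_moran = ss / n          # Moran denominator: (1/N) * sum (P_j - mean)^2
    var_geary = ss / (n - 1)    # Geary denominator: 1/(N-1) * sum (P_j - mean)^2
    mb, moran, geary = [], [], []
    for d in range(1, max_lag + 1):
        pairs = list(zip(p[: n - d], p[d:]))  # (P_j, P_{j+d}) for j = 1..N-d
        # Moreau-Broto: mean product of property values d residues apart
        mb.append(sum(a * b for a, b in pairs) / (n - d))
        # Moran: lag-d covariance divided by the sequence variance
        cov = sum((a - p_bar) * (b - p_bar) for a, b in pairs) / (n - d)
        moran.append(cov / var_moran if var_moran else 0.0)
        # Geary: half the mean squared difference divided by the sample variance
        sq = sum((a - b) ** 2 for a, b in pairs) / (2 * (n - d))
        geary.append(sq / var_geary if var_geary else 0.0)
    return mb, moran, geary


z = standardized_scale(KYTE_DOOLITTLE)
mb, moran, geary = autocorrelations("MKTAYIAKQRQISFVKSHFSRQ", z, max_lag=3)
print(mb, moran, geary)
```

For a protein pair, per-protein vectors like these are typically concatenated before feature selection; how StackPPI combines the two proteins should be checked against the linked source code.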
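The stacked classifier and its five-fold evaluation can be sketched with scikit-learn's `StackingClassifier`. Here random forest and extremely randomized trees act as base learners whose out-of-fold class probabilities feed a logistic-regression meta-learner. This assignment of roles, the estimator settings, the helper names and the synthetic data from `make_classification` are assumptions, so the printed scores say nothing about StackPPI's reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate


def build_stacked_ensemble(random_state=0):
    """Hypothetical helper: RF + ET base learners, logistic-regression meta-learner."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=random_state)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",  # meta-learner sees base-learner probabilities
        cv=5,                          # out-of-fold predictions train the meta-learner
    )


def five_fold_scores(model, X, y, random_state=0):
    """Mean ACC, MCC and AUC over stratified five-fold cross-validation."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    scoring = {
        "acc": "accuracy",
        "mcc": make_scorer(matthews_corrcoef),
        "auc": "roc_auc",
    }
    res = cross_validate(model, X, y, cv=cv, scoring=scoring)
    return {name: float(np.mean(res[f"test_{name}"])) for name in scoring}


# Synthetic stand-in for the selected PPI feature matrix (assumption).
X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           random_state=0)
scores = five_fold_scores(build_stacked_ensemble(), X, y)
print(scores)
```

Training the meta-learner on out-of-fold predictions (the inner `cv=5`) keeps it from learning on base-learner outputs for samples those base learners were fitted on.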

Keywords:  Multi-information fusion; Protein-protein interactions; Stacked ensemble classifier; XGBoost

Year:  2020        PMID: 32768046     DOI: 10.1016/j.compbiomed.2020.103899

Source DB:  PubMed          Journal:  Comput Biol Med        ISSN: 0010-4825            Impact factor:   4.589


Related articles: 19 in total (the first 10 are listed below)

1.  Machine learning: its challenges and opportunities in plant system biology. (Review)

Authors:  Mohsen Hesami; Milad Alizadeh; Andrew Maxwell Phineas Jones; Davoud Torkamaneh
Journal:  Appl Microbiol Biotechnol       Date:  2022-05-16       Impact factor: 4.813

2.  DWPPI: A Deep Learning Approach for Predicting Protein-Protein Interactions in Plants Based on Multi-Source Information With a Large-Scale Biological Network.

Authors:  Jie Pan; Zhu-Hong You; Li-Ping Li; Wen-Zhun Huang; Jian-Xin Guo; Chang-Qing Yu; Li-Ping Wang; Zheng-Yang Zhao
Journal:  Front Bioeng Biotechnol       Date:  2022-03-21

3.  DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier.

Authors:  Yan Zhang; Zhiwen Jiang; Cheng Chen; Qinqin Wei; Haiming Gu; Bin Yu
Journal:  Interdiscip Sci       Date:  2021-11-03       Impact factor: 2.233

4.  Accurate Prediction of Anti-hypertensive Peptides Based on Convolutional Neural Network and Gated Recurrent unit.

Authors:  Hongyan Shi; Shengli Zhang
Journal:  Interdiscip Sci       Date:  2022-04-27       Impact factor: 3.492

5.  Protein-protein interaction and non-interaction predictions using gene sequence natural vector.

Authors:  Nan Zhao; Maji Zhuo; Kun Tian; Xinqi Gong
Journal:  Commun Biol       Date:  2022-07-02

6.  nhKcr: a new bioinformatics tool for predicting crotonylation sites on human nonhistone proteins based on deep learning.

Authors:  Yong-Zi Chen; Zhuo-Zhi Wang; Yanan Wang; Guoguang Ying; Zhen Chen; Jiangning Song
Journal:  Brief Bioinform       Date:  2021-11-05       Impact factor: 11.622

7.  Proteomic Approaches to Defining Remission and the Risk of Relapse in Rheumatoid Arthritis.

Authors:  Liam J O'Neil; Pingzhao Hu; Qian Liu; Md Mohaiminul Islam; Victor Spicer; Juergen Rech; Axel Hueber; Vidyanand Anaparti; Irene Smolik; Hani S El-Gabalawy; Georg Schett; John A Wilkins
Journal:  Front Immunol       Date:  2021-11-18       Impact factor: 7.561

8.  BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information.

Authors:  Lu Zhang; Xinyi Qin; Min Liu; Guangzhong Liu; Yuxiao Ren
Journal:  Comput Math Methods Med       Date:  2021-08-25       Impact factor: 2.238

9.  Regional Population Forecast and Analysis Based on Machine Learning Strategy.

Authors:  Chian-Yue Wang; Shin-Jye Lee
Journal:  Entropy (Basel)       Date:  2021-05-24       Impact factor: 2.524

10.  Development of a Web-Based Ensemble Machine Learning Application to Select the Optimal Size of Posterior Chamber Phakic Intraocular Lens.

Authors:  Eun Min Kang; Ik Hee Ryu; Geunyoung Lee; Jin Kuk Kim; In Sik Lee; Ga Hee Jeon; Hojin Song; Kazutaka Kamiya; Tae Keun Yoo
Journal:  Transl Vis Sci Technol       Date:  2021-05-03       Impact factor: 3.283
