Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions.

Yongqing Zhang, Danling Zhang, Gang Mi, Daichuan Ma, Gongbing Li, Yanzhi Guo, Menglong Li, Min Zhu.

Abstract

Among proteins, the number of interacting pairs is usually much smaller than the number of non-interacting ones, so an imbalanced data problem arises in protein-protein interaction (PPI) prediction. In this article, we introduce two ensemble methods to address this problem. These ensemble methods combine a cluster-based under-sampling technique with fusion classifiers. We then evaluate the ensemble methods on a dataset from the Database of Interacting Proteins (DIP) using 10-fold cross-validation. All the prediction models achieve an area under the receiver operating characteristic curve (AUC) of about 95%. Our results show that the ensemble classifiers are quite effective in predicting PPIs; we also draw some useful conclusions about the performance of ensemble methods for PPIs in imbalanced data. The prediction software and all datasets employed in this work can be obtained for free at http://cic.scu.edu.cn/bioinformatics/Ensemble_PPIs/index.html. Copyright © 2011 Elsevier Ltd. All rights reserved.
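
As a rough illustration of the two ideas named in the abstract, cluster-based under-sampling and AUC evaluation, the following Python sketch may help. It is not the authors' implementation: the use of plain k-means, the rule of keeping the real sample nearest to each centroid, the toy 2-D features, and all function names (`cluster_undersample`, `assign`, `auc`) are assumptions made for this example. The fusion classifiers and the DIP dataset are not reproduced here.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two feature tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    # Group each point with its nearest centroid.
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters

def cluster_undersample(majority, n_keep, iters=20, seed=0):
    # Assumed scheme: k-means with k = n_keep on the majority class,
    # then keep the real sample closest to each non-empty cluster centre.
    rng = random.Random(seed)
    centroids = rng.sample(majority, n_keep)
    for _ in range(iters):
        clusters = assign(majority, centroids)
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    clusters = assign(majority, centroids)
    return [
        min(members, key=lambda p: sq_dist(p, c))
        for members, c in zip(clusters, centroids)
        if members
    ]

def auc(scores, labels):
    # AUC as the probability that a positive outscores a negative (ties count 0.5).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): 200 non-interacting pairs vs. 20 interacting pairs.
rng = random.Random(1)
non_interacting = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
interacting = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(20)]
balanced_negatives = cluster_undersample(non_interacting, len(interacting))
```

After this step the negative set has at most as many samples as the positive set, so a classifier trained on `interacting` plus `balanced_negatives` sees roughly balanced classes. Repeating the step with different seeds would give several balanced training sets, which is one common way to build an ensemble.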

Year:  2012        PMID: 22286086     DOI: 10.1016/j.compbiolchem.2011.12.003

Source DB:  PubMed          Journal:  Comput Biol Chem        ISSN: 1476-9271            Impact factor:   2.877


  7 in total

1.  Integrating new data balancing technique with committee networks for imbalanced data: GRSOM approach.

Authors:  Danaipong Chetchotsak; Sirorat Pattanapairoj; Banchar Arnonkijpanich
Journal:  Cogn Neurodyn       Date:  2015-07-31       Impact factor: 5.082

2.  Improving the chances of successful protein structure determination with a random forest classifier.

Authors:  Samad Jahandideh; Lukasz Jaroszewski; Adam Godzik
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2014-02-15

3.  Imbalanced multi-label learning for identifying antimicrobial peptides and their functional types.

Authors:  Weizhong Lin; Dong Xu
Journal:  Bioinformatics       Date:  2016-08-26       Impact factor: 6.937

4.  How to balance the bioinformatics data: pseudo-negative sampling.

Authors:  Yongqing Zhang; Shaojie Qiao; Rongzhao Lu; Nan Han; Dingxiang Liu; Jiliu Zhou
Journal:  BMC Bioinformatics       Date:  2019-12-24       Impact factor: 3.169

5.  A hybrid CNN-LSTM model for pre-miRNA classification.

Authors:  Abdulkadir Tasdelen; Baha Sen
Journal:  Sci Rep       Date:  2021-07-08       Impact factor: 4.379

6.  Computational prediction of the human-microbial oral interactome.

Authors:  Edgar D Coelho; Joel P Arrais; Sérgio Matos; Carlos Pereira; Nuno Rosa; Maria José Correia; Marlene Barros; José Luís Oliveira
Journal:  BMC Syst Biol       Date:  2014-02-27

7.  PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction.

Authors:  Jianzhuang Yao; Hong Guo; Xiaohan Yang
Journal:  Int J Genomics       Date:  2015-10-11       Impact factor: 2.326
