Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures.

Guang-Hui Liu, Hong-Bin Shen, Dong-Jun Yu.

Abstract

Accurately predicting protein-protein interaction sites (PPIs) is an active research topic because such predictions have proven useful for understanding disease mechanisms and designing drugs. Machine-learning-based computational approaches have been widely applied to PPIs prediction. However, traditional machine learning algorithms typically assume that the classes are balanced, so applying them directly often leads to poor performance because of the severe class imbalance inherent in the PPIs prediction problem. In this study, we propose a method that improves PPIs prediction by relieving the class imbalance with a data-cleaning procedure and reducing false positives with a post-filtering procedure. First, a machine-learning-based data-cleaning procedure removes marginal targets from the majority class; these samples may hinder training a model with a clear classification boundary, and removing them relieves the class imbalance in the original training dataset. Then, a prediction model is trained on the cleaned dataset. Finally, a post-filtering procedure is applied to reduce potential false positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which is highly competitive with existing state-of-the-art sequence-based PPIs predictors and should complement existing PPIs prediction methods.
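
The clean-then-filter idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how marginal samples are scored or how the post-filter decides, so everything concrete here is an assumption made for illustration: the nearest-centroid margin used as the cleaning scorer, the `keep_fraction` and `window` parameters, the rule that an isolated predicted site is discarded, and the toy data. The intermediate step of training a prediction model on the cleaned data is omitted.

```python
from math import dist


def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def clean_majority(majority, minority, keep_fraction=0.8):
    """Drop the majority samples that look most like the minority class.

    Stand-in scorer (an assumption, not the paper's procedure):
    margin = distance to the minority centroid minus distance to the
    majority centroid. A small margin means the sample lies near the
    class boundary, i.e. it is "marginal".
    """
    c_maj, c_min = centroid(majority), centroid(minority)

    def margin(x):
        return dist(x, c_min) - dist(x, c_maj)

    ranked = sorted(majority, key=margin, reverse=True)  # clearest first
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def post_filter(predicted, window=1):
    """Remove isolated positive predictions along a protein sequence.

    Assumed rule: a predicted interaction residue (1) is kept only if at
    least one other residue within `window` positions is also predicted 1.
    """
    out = []
    for i, label in enumerate(predicted):
        lo, hi = max(0, i - window), min(len(predicted), i + window + 1)
        neighbours = sum(predicted[lo:hi]) - label
        out.append(1 if label == 1 and neighbours > 0 else 0)
    return out


# Toy data: five non-interaction (majority) and two interaction (minority) residues.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.9, 0.9], [0.1, 0.0]]
minority = [[1.0, 1.0], [1.1, 0.9]]
cleaned = clean_majority(majority, minority, keep_fraction=0.8)

# Per-residue predictions for one hypothetical sequence.
filtered = post_filter([0, 1, 0, 0, 1, 1, 0, 1])
```

In this toy run the majority sample `[0.9, 0.9]`, which sits next to the minority cluster, is the one removed by cleaning, and the two isolated predicted sites are cleared by the post-filter while the adjacent pair is kept.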

Keywords:  Data cleaning; Imbalanced learning; Post-filtering; Protein–protein interaction sites; Random forests

Year:  2015        PMID: 26563228     DOI: 10.1007/s00232-015-9856-z

Source DB:  PubMed          Journal:  J Membr Biol        ISSN: 0022-2631            Impact factor:   1.843


  6 in total

1.  Protein-protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique.

Authors:  Xiaoying Wang; Bin Yu; Anjun Ma; Cheng Chen; Bingqiang Liu; Qin Ma
Journal:  Bioinformatics       Date:  2019-07-15       Impact factor: 6.937

2.  Machine-learning techniques for the prediction of protein-protein interactions.

Authors:  Debasree Sarkar; Sudipto Saha
Journal:  J Biosci       Date:  2019-09       Impact factor: 1.826

3.  MultiP-Apo: A Multilabel Predictor for Identifying Subcellular Locations of Apoptosis Proteins.

Authors:  Xiao Wang; Hui Li; Rong Wang; Qiuwen Zhang; Weiwei Zhang; Yong Gan
Journal:  Comput Intell Neurosci       Date:  2017-07-04

4.  Imbalance learning for the prediction of N6-Methylation sites in mRNAs.

Authors:  Zhixun Zhao; Hui Peng; Chaowang Lan; Yi Zheng; Liang Fang; Jinyan Li
Journal:  BMC Genomics       Date:  2018-08-01       Impact factor: 3.969

Review 5.  Machine Learning and Integrative Analysis of Biomedical Big Data.

Authors:  Bilal Mirza; Wei Wang; Jie Wang; Howard Choi; Neo Christopher Chung; Peipei Ping
Journal:  Genes (Basel)       Date:  2019-01-28       Impact factor: 4.096

6.  SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences.

Authors:  Jian Zhang; Lukasz Kurgan
Journal:  Bioinformatics       Date:  2019-07-15       Impact factor: 6.937
