Issues in performance evaluation for host-pathogen protein interaction prediction.

Wajid Arshad Abbasi, Fayyaz Ul Amir Afsar Minhas.

Abstract

The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein-protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine-learning-based methods for predicting host-pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between the performance it reports and the performance reported by an alternative evaluation scheme called leave-one-pathogen-protein-out (LOPO) cross-validation. LOPO is more effective in modeling the real-world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics, such as the areas under the precision-recall or receiver operating characteristic curves, are not intuitive to biologists, and we propose simpler and more directly interpretable metrics for this purpose.
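
The LOPO idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `lopo_splits`, the `(host_id, pathogen_id, label)` tuple layout, and the toy data are all assumptions made for illustration. Each fold holds out every example involving one pathogen protein, so that protein never appears in the training set.

```python
from collections import defaultdict


def lopo_splits(pairs):
    """Yield (held_out_pathogen, train_idx, test_idx) for each LOPO fold.

    `pairs` is a list of (host_id, pathogen_id, label) tuples; this layout
    is a hypothetical one chosen for the sketch, not taken from the paper.
    """
    # Group example indices by the pathogen protein they involve.
    by_pathogen = defaultdict(list)
    for i, (_host, pathogen, _label) in enumerate(pairs):
        by_pathogen[pathogen].append(i)

    # One fold per pathogen protein: all of its examples form the test set.
    for pathogen, test_idx in by_pathogen.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(pairs)) if i not in held_out]
        yield pathogen, train_idx, test_idx


# Toy, made-up examples: label 1 = interacting, 0 = non-interacting.
pairs = [
    ("H1", "P1", 1),
    ("H2", "P1", 0),
    ("H1", "P2", 1),
    ("H3", "P2", 0),
    ("H2", "P3", 1),
]

for pathogen, train_idx, test_idx in lopo_splits(pairs):
    # The held-out pathogen protein must be absent from the training fold.
    assert pathogen not in {pairs[i][1] for i in train_idx}
    print(pathogen, train_idx, test_idx)
```

By contrast, a random K-fold split over the same examples could place one pair involving a given pathogen protein in the training fold and another in the test fold, which is the scenario the abstract argues does not reflect prediction for proteins with no known interactions.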

Keywords:  Performance evaluation; cross-validation; host–pathogen interactions; machine learning; protein–protein interactions

Year:  2016        PMID: 26932275     DOI: 10.1142/S0219720016500116

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  3 in total

1.  Learned protein embeddings for machine learning.

Authors:  Kevin K Yang; Zachary Wu; Claire N Bedbrook; Frances H Arnold
Journal:  Bioinformatics       Date:  2018-08-01       Impact factor: 6.937

2.  Predicting protein-binding regions in RNA using nucleotide profiles and compositions.

Authors:  Daesik Choi; Byungkyu Park; Hanju Chae; Wook Lee; Kyungsook Han
Journal:  BMC Syst Biol       Date:  2017-03-14

3.  Learning protein binding affinity using privileged information.

Authors:  Wajid Arshad Abbasi; Amina Asif; Asa Ben-Hur; Fayyaz Ul Amir Afsar Minhas
Journal:  BMC Bioinformatics       Date:  2018-11-15       Impact factor: 3.169
