Effectively Identifying Compound-Protein Interactions by Learning from Positive and Unlabeled Examples.

Zhanzhan Cheng, Shuigeng Zhou, Yang Wang, Hui Liu, Jihong Guan, Yi-Ping Phoebe Chen.   

Abstract

Prediction of compound-protein interactions (CPIs) aims to find new compound-protein pairs in which a protein is targeted by at least one compound, and it is a crucial step in new drug design. A number of machine-learning-based methods for predicting new CPIs have been reported in the literature. However, because no publicly available set of validated negative CPIs exists yet, most of these approaches randomly select unknown (not validated) interactions as negative examples when training classifiers. This is questionable, since some unknown pairs may be true but undiscovered interactions, and it inevitably degrades CPI prediction performance. In this paper, we instead treat the unknown CPIs as unlabeled examples and propose a new method called PUCPI (PU learning for Compound-Protein Interaction identification), which employs a biased SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of learning methods that learn from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, and then take the tensor product of the corresponding compound and protein features as the feature vector of the pair. A biased SVM is then trained on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers (random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM, and k-nearest neighbor (kNN)) as well as three types of existing CPI prediction models.
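The data preparation described in the abstract (sampling unlabeled pairs and building tensor-product features) can be sketched as follows. This is a minimal illustration, assuming compounds are encoded as binary substructure fingerprints and proteins as binary domain-presence vectors, and assuming unlabeled pairs are drawn uniformly from index pairs outside the positive set; the abstract does not specify the fingerprint scheme, the domain vocabulary, or the sampling distribution, and the function names `pair_features` and `sample_unlabeled_pairs` are hypothetical. The tensor product of a length-m compound vector and a length-n protein vector is their flattened outer product, a vector of length m·n whose entry (i, j) is 1 exactly when substructure i and domain j co-occur in the pair.

```python
import numpy as np


def pair_features(compound_fp: np.ndarray, protein_domains: np.ndarray) -> np.ndarray:
    """Hypothetical helper: tensor (Kronecker) product of a compound vector and a
    protein vector. For 1-D inputs this equals np.outer(c, p).ravel(), so entry
    i * len(p) + j is c[i] * p[j]."""
    return np.kron(compound_fp, protein_domains)


def sample_unlabeled_pairs(n_compounds: int, n_proteins: int, positives,
                           k: int, rng: np.random.Generator):
    """Hypothetical helper: draw k distinct (compound, protein) index pairs
    uniformly at random from pairs that are NOT known CPIs.
    Assumes every pair in `positives` is a valid index pair."""
    positives = set(positives)
    if k > n_compounds * n_proteins - len(positives):
        raise ValueError("not enough non-positive pairs to sample from")
    chosen = set()
    while len(chosen) < k:
        pair = (int(rng.integers(n_compounds)), int(rng.integers(n_proteins)))
        if pair not in positives:  # known CPIs are never used as unlabeled examples
            chosen.add(pair)
    return sorted(chosen)


# Example: a 3-bit substructure fingerprint and a 2-domain protein profile.
c = np.array([1, 0, 1])
p = np.array([0, 1])
print(pair_features(c, p).tolist())  # → [0, 1, 0, 0, 0, 1]
```

Real substructure and domain vocabularies can be large, so the product vectors are long and sparse; a sparse matrix representation would usually be preferred over dense NumPy arrays at scale.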
More information can be found at http://admis.fudan.edu.cn/projects/pucpi.html.
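A biased SVM treats every unlabeled pair as a provisional negative but gives the two classes different misclassification costs, C+ for positives and C- for unlabeled examples, usually with C+ > C-, so that errors on the reliable positive class are penalized more heavily than errors on the noisy unlabeled class. The sketch below minimizes the linear primal objective ½‖w‖² + C+ Σ_P max(0, 1 − y(w·x + b)) + C- Σ_U max(0, 1 − y(w·x + b)), with y = +1 for positives and y = −1 for unlabeled pairs, by plain full-batch subgradient descent in NumPy. The solver, the step size, and the defaults `c_pos=10.0` and `c_unl=1.0` are illustrative assumptions, not the settings used by PUCPI; in practice the two costs would be tuned, for example by cross-validation.

```python
import numpy as np


def train_biased_linear_svm(X, is_positive, c_pos=10.0, c_unl=1.0, lr=0.01, epochs=500):
    """Hypothetical sketch of a linear biased SVM trained by subgradient descent.
    Positives get label +1 and cost c_pos; unlabeled examples get label -1 and cost c_unl."""
    X = np.asarray(X, dtype=float)
    pos = np.asarray(is_positive, dtype=bool)
    y = np.where(pos, 1.0, -1.0)
    cost = np.where(pos, c_pos, c_unl)  # per-example misclassification cost
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # only examples inside the margin contribute hinge subgradients
        coef = cost * y * active  # weighted multipliers for the active examples
        grad_w = w - X.T @ coef  # regularizer gradient plus weighted hinge subgradient
        grad_b = -coef.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Return +1 (predicted CPI) or -1 for each row of X."""
    return np.where(np.asarray(X, dtype=float) @ w + b > 0, 1, -1)
```

The hand-written loop only illustrates the asymmetric costs; a library solver that supports per-class weights (for example scikit-learn's `SVC` with `class_weight`, which scales `C` per class) would normally replace it.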

Year:  2016        PMID: 28113437     DOI: 10.1109/TCBB.2016.2570211

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  3 in total

1.  DeepAffinity: interpretable deep learning of compound-protein affinity through unified recurrent and convolutional neural networks.

Authors:  Mostafa Karimi; Di Wu; Zhangyang Wang; Yang Shen
Journal:  Bioinformatics       Date:  2019-09-15       Impact factor: 6.937

2.  SSGraphCPI: A Novel Model for Predicting Compound-Protein Interactions Based on Deep Learning.

Authors:  Xun Wang; Jiali Liu; Chaogang Zhang; Shudong Wang
Journal:  Int J Mol Sci       Date:  2022-03-29       Impact factor: 5.923

3.  CGINet: graph convolutional network-based model for identifying chemical-gene interaction in an integrated multi-relational graph.

Authors:  Wei Wang; Xi Yang; Chengkun Wu; Canqun Yang
Journal:  BMC Bioinformatics       Date:  2020-11-26       Impact factor: 3.169
