Karim Abbasi (1), Parvin Razzaghi (2), Antti Poso (3), Massoud Amanlou (4), Jahan B Ghasemi (5), Ali Masoudi-Nejad (1)
1. Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran.
2. Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan 4513766731, Iran.
3. School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio 80100, Finland.
4. Department of Medicinal Chemistry, Drug Design and Development Research Center, Tehran University of Medical Sciences, Tehran 1416753955, Iran.
5. Chemistry Department, Faculty of Sciences, University of Tehran, Tehran 1417614418, Iran.
Abstract
MOTIVATION: Accurate prediction of the binding affinity of new compound-protein pairs is an essential part of drug discovery. Most standard computational methods assume that the compounds or proteins in the test data were observed during the training phase. In real-world situations, however, the test and training data are sampled from different domains with different distributions. To cope with this challenge, we propose a deep learning-based approach that consists of three steps. In the first step, a training encoder network learns a novel representation of compounds and proteins. To this end, we combine convolutional layers and long short-term memory (LSTM) layers so that the occurrence patterns of local substructures along a protein sequence and a compound sequence are learned. In addition, to encode the interaction strength between protein and compound substructures, we propose a two-sided attention mechanism. In the second step, to deal with the different distributions of the training and test domains, a feature encoder network is learned for the test domain using an adversarial domain adaptation approach. In the third step, the learned test encoder network is applied to new compound-protein pairs to predict their binding affinity.
RESULTS: To evaluate the proposed approach, we applied it to the KIBA, Davis and BindingDB datasets. The results show that the proposed method learns a more reliable model for the test domain in more challenging situations.
AVAILABILITY AND IMPLEMENTATION: https://github.com/LBBSoft/DeepCDA.
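To make the idea of a two-sided attention over compound and protein substructures more concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions and not the formulation used in DeepCDA: the abstract does not specify the scoring function or the pooling, so the dot-product scores, the max-pooling over partners, and the function name `two_sided_attention` are all hypothetical. The substructure feature vectors stand in for what the convolutional and LSTM layers might produce.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Weighted sum of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def two_sided_attention(compound_feats, protein_feats):
    """Hypothetical two-sided attention (an assumption, not DeepCDA's definition).

    scores[i][j] is the interaction strength between compound substructure i
    and protein substructure j, here taken to be a plain dot product.
    """
    scores = [[dot(c, p) for p in protein_feats] for c in compound_feats]

    # Compound side: weight each compound substructure by its strongest
    # interaction with any protein substructure.
    compound_weights = softmax([max(row) for row in scores])

    # Protein side: weight each protein substructure by its strongest
    # interaction with any compound substructure.
    n_compound = len(compound_feats)
    protein_weights = softmax(
        [max(scores[i][j] for i in range(n_compound)) for j in range(len(protein_feats))]
    )

    compound_vec = weighted_sum(compound_weights, compound_feats)
    protein_vec = weighted_sum(protein_weights, protein_feats)

    # Concatenate both attended vectors into one joint pair representation.
    return compound_vec + protein_vec, compound_weights, protein_weights


# Toy inputs: 2 compound substructures and 3 protein substructures, 2-d features.
compound = [[1.0, 0.0], [0.0, 1.0]]
protein = [[2.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
joint, c_w, p_w = two_sided_attention(compound, protein)
```

In this toy example the first compound substructure and the first protein substructure interact most strongly, so they receive the largest attention weights on their respective sides. In a trained model the joint vector would feed a regression head that predicts affinity.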