Mostafa Karimi (1,2), Di Wu (1), Zhangyang Wang (3), Yang Shen (1,2). 1. Department of Electrical and Computer Engineering, College Station, TX, USA. 2. TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, College Station, TX, USA. 3. Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA.
Abstract
MOTIVATION: Drug discovery demands rapid quantification of compound-protein interaction (CPI). However, there is a lack of methods that can predict compound-protein affinity from sequences alone with high applicability, accuracy and interpretability.
RESULTS: We present a seamless integration of domain knowledge and learning-based approaches. Under novel representations of structurally annotated protein sequences, we propose a semi-supervised deep learning model that unifies recurrent and convolutional neural networks (RNN and CNN) to exploit both unlabeled and labeled data, jointly encoding molecular representations and predicting affinities. Our representations and models outperform conventional options, achieving relative error in IC50 within 5-fold for test cases and within 20-fold for protein classes not included in training. Performance for new protein classes with few labeled data is further improved by transfer learning. Furthermore, separate and joint attention mechanisms are developed and embedded in our model to add to its interpretability, as illustrated in case studies on predicting and explaining selective drug-target interactions. Lastly, alternative representations using protein sequences or compound graphs, and a unified RNN/GCNN-CNN model using graph CNN (GCNN), are also explored to reveal algorithmic challenges ahead.
AVAILABILITY AND IMPLEMENTATION: Data and source code are available at https://github.com/Shen-Lab/DeepAffinity.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
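The 5-fold and 20-fold bounds on relative IC50 error quoted in the results are easier to compare on a logarithmic scale. The sketch below is an illustration only, not code from the DeepAffinity repository. It assumes that a "k-fold" error means the ratio of predicted to measured IC50 lies between 1/k and k, which is the same as an absolute error of at most log10(k) in log10(IC50). The function names and the example IC50 values are hypothetical.

```python
import math


def fold_to_log10_error(fold):
    """Convert a k-fold relative error bound into log10(IC50) units."""
    return math.log10(fold)


def log10_error_to_fold(err):
    """Convert an absolute error in log10(IC50) units into a fold error."""
    return 10.0 ** abs(err)


def within_fold(pred_ic50, true_ic50, fold):
    """Return True if the prediction is within `fold`-fold of the measurement.

    Assumption: both IC50 values are positive and share the same unit
    (for example, molar concentration).
    """
    log_ratio = math.log10(pred_ic50 / true_ic50)
    return abs(log_ratio) <= math.log10(fold)


if __name__ == "__main__":
    # Hypothetical values: measured IC50 of 50 nM, two candidate predictions.
    true_ic50 = 50e-9
    for pred in (200e-9, 600e-9):
        print(
            f"pred={pred:.1e} M",
            "5-fold:", within_fold(pred, true_ic50, 5),
            "20-fold:", within_fold(pred, true_ic50, 20),
        )
    print("5-fold in log10 units:", round(fold_to_log10_error(5), 3))
    print("20-fold in log10 units:", round(fold_to_log10_error(20), 3))
```

Under this reading, a 5-fold bound corresponds to roughly 0.70 log10 units and a 20-fold bound to roughly 1.30 log10 units, so the second figure is a noticeably looser tolerance for unseen protein classes.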