DeepPurpose: a deep learning library for drug-target interaction prediction.

Kexin Huang¹, Tianfan Fu², Lucas M Glass³, Marinka Zitnik¹, Cao Xiao³, Jimeng Sun⁴.

Abstract

SUMMARY: Accurate prediction of drug-target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models have shown promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use DL library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets.
AVAILABILITY AND IMPLEMENTATION: https://github.com/kexinhuang12345/DeepPurpose. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2021 PMID: 33275143 PMCID: PMC8016467 DOI: 10.1093/bioinformatics/btaa1005

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Drug–target interactions (DTI) characterize the binding of compounds to protein targets (Santos et al.). Accurate identification of molecular drug targets is fundamental for drug discovery and development (Rutkowska et al.; Zitnik et al.) and is especially important for finding effective and safe treatments for new pathogens, including SARS-CoV-2 (Velavan and Meyer, 2020). Deep learning (DL) has advanced traditional computational modeling of compounds by offering an increased expressive power in identifying, processing and extrapolating complex patterns in molecular data (Lee et al.; Öztürk et al.). There are many DL models designed for DTI prediction (Lee et al., 2019; Nguyen et al.; Öztürk et al.). However, to generate predictions, deploy DL models in practice, and test and evaluate model performance, one needs considerable programming skills and extensive biochemical knowledge. Prevailing tools are designed for experienced interdisciplinary researchers. They are challenging to use for both computer scientists entering the biomedical field and domain bioinformaticians with limited experience in training and deploying DL models. Furthermore, each open-sourced tool has a different programming interface and is coded differently, which prevents easy integration of outputs from various methods for model ensembles (Yang et al.). Here, we introduce DeepPurpose, a DL library for encoding and downstream prediction of proteins and compounds. DeepPurpose allows rapid prototyping via a programming framework that implements over 50 DL models, seven protein encoders and eight compound encoders. Empirically, we find that models implemented in DeepPurpose achieve state-of-the-art prediction performance on DTI benchmark datasets.

2 DeepPurpose library

DL models for DTI prediction can be formulated as encoder-decoder architectures (Cho et al.). The DeepPurpose library implements a unifying encoder-decoder framework, which makes the library uniquely flexible. By merely specifying an encoder’s name, the user can automatically connect an encoder of interest with the relevant decoder. DeepPurpose then trains the corresponding encoder-decoder model in an end-to-end manner. Finally, the user accesses the trained model either programmatically or via a visual interface and uses the model for DTI prediction.
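The idea of selecting encoders by name and wiring them to a decoder can be pictured with a toy registry. The sketch below is not DeepPurpose code: the registries, the encoder names `atom_counts`, `residue_counts` and `length`, and the linear decoder are hypothetical stand-ins chosen only to show how name-based composition might work.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
Encoder = Callable[[str], Vector]


def count_encoder(alphabet: str) -> Encoder:
    """Build a toy encoder that counts how often each symbol occurs."""
    def encode(sequence: str) -> Vector:
        return [float(sequence.count(symbol)) for symbol in alphabet]
    return encode


def length_encoder(sequence: str) -> Vector:
    """Toy encoder that maps a string to its length."""
    return [float(len(sequence))]


# Hypothetical registries; a real library would map names to neural encoders.
COMPOUND_ENCODERS: Dict[str, Encoder] = {
    "atom_counts": count_encoder("CNO"),   # naive character counts in SMILES
    "length": length_encoder,
}
PROTEIN_ENCODERS: Dict[str, Encoder] = {
    "residue_counts": count_encoder("AGLK"),
    "length": length_encoder,
}


def build_model(compound_encoder: str, protein_encoder: str,
                weights: Optional[Vector] = None) -> Callable[[str, str], float]:
    """Look up both encoders by name and attach a linear stand-in decoder."""
    encode_compound = COMPOUND_ENCODERS[compound_encoder]
    encode_protein = PROTEIN_ENCODERS[protein_encoder]

    def predict(smiles: str, protein: str) -> float:
        # Concatenate the two embeddings, then apply the decoder.
        embedding = encode_compound(smiles) + encode_protein(protein)
        w = weights or [1.0] * len(embedding)
        return sum(wi * xi for wi, xi in zip(w, embedding))

    return predict


model = build_model("atom_counts", "residue_counts")
score = model("CCO", "GALK")
```

Swapping either name changes the encoder without touching the decoder, which is the property the encoder-decoder framing is meant to provide.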

2.1 Module for encoding proteins and compounds

DeepPurpose takes the compound’s simplified molecular-input line-entry system (SMILES) string and protein amino acid sequence pair as input. These inputs are then fed into molecular encoders, each of which specifies a deep transformation function that maps compounds and proteins to vector representations. In particular, for compounds, DeepPurpose provides eight encoders using different modalities of compounds: Multi-Layer Perceptrons (MLP) on Morgan, PubChem, Daylight and RDKit 2D fingerprints; Convolutional Neural Network (CNN) on SMILES strings; Recurrent Neural Network (RNN) on top of CNN; transformer encoders on substructure fingerprints; message passing graph neural network on molecular graph. For proteins, DeepPurpose provides seven encoders for the input amino acid sequence: MLP on Amino Acid Composition (AAC), Pseudo AAC, Conjoint Triad, Quasi-Sequence descriptors; CNN on amino acid sequences; RNN on top of CNN; transformer encoder on substructure fingerprints. Note that alternative input features may not work for a specific encoder architecture. The detailed encoder specifications and references are described in Supplementary Material.
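To make the notion of a fixed-length protein descriptor concrete, the following sketch computes amino acid composition in its simplest form: the fraction of each of the 20 standard residues in a sequence. It is an illustration under that assumption only; the AAC features that DeepPurpose feeds to its MLP encoder may be computed differently (for example with a different scaling or additional terms), and the function name is hypothetical.

```python
from collections import Counter
from typing import List

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in the sequence.

    Non-standard symbols are ignored; the result always has 20 entries,
    ordered as in AMINO_ACIDS.
    """
    sequence = sequence.upper()
    counts = Counter(residue for residue in sequence if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


features = amino_acid_composition("AACD")
```

Because the output length does not depend on the sequence length, a vector like this can be passed directly to a fully connected network.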

2.2 Module for DTI prediction

DeepPurpose feeds the learned protein and compound embeddings into an MLP decoder to generate predictions. Output scores include both continuous binding scores, such as the median inhibitory concentration (IC50), as well as binary outputs indicating whether a protein binds to a compound. The library detects whether the task is regression or classification and switches to the correct loss function and evaluation metrics. In the case of regression, we use the Mean Square Error (MSE) as the loss function and MSE, Concordance Index and Pearson Correlation as performance metrics. In the classification case, we use Binary Cross Entropy as the loss function and Area Under the Receiver Operating Characteristic curve (AUROC), Area Under the Precision-Recall Curve (AUPRC) and F-1 score as performance metrics. At inference, given new proteins and new compounds, DeepPurpose returns prediction scores representing predicted probabilities of binding between compounds and proteins.
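The switch between regression and classification, and the regression metrics named above, can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the detection rule (labels containing only 0 and 1 are treated as binary) is an assumption, and all function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def select_loss_and_metrics(labels: Sequence[float]) -> Dict[str, object]:
    """Pick loss and metric names; assumes labels in {0, 1} mean classification."""
    if set(labels) <= {0, 1}:
        return {"loss": "binary_cross_entropy",
                "metrics": ["AUROC", "AUPRC", "F1"]}
    return {"loss": "mean_squared_error",
            "metrics": ["MSE", "concordance_index", "pearson_correlation"]}


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_correlation(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_t * var_p)


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of pairs with different true values that are ranked correctly.

    Pairs with tied predictions count as half correct.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true values are not comparable
            comparable += 1
            agreement = (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j])
            if agreement > 0:
                concordant += 1.0
            elif agreement == 0:
                concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def binary_cross_entropy(y_true: Sequence[int], p_pred: Sequence[float],
                         eps: float = 1e-12) -> float:
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)
```

The concordance index here uses an O(n²) pairwise loop, which is adequate for illustration but slow on large test sets.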

2.3 Modules for other downstream prediction tasks

First, DeepPurpose includes repurposing and virtual_screening functions. Using only a few lines of code that specify a library of compounds to be screened and an optional training dataset, DeepPurpose trains five DL models, aggregates the prediction results and generates a descriptive ranked list in which compound candidates with the highest predicted binding scores are placed at the top. If the user does not specify a training dataset, DeepPurpose uses a pre-trained deep model for prediction. This list can then be examined to identify promising compound candidates for further experiments. Second, DeepPurpose also supports user-friendly programming frameworks for other modeling tasks, including drug and protein property prediction, drug–drug interaction prediction and protein–protein interaction prediction (see Supplementary Material). Third, DeepPurpose provides an interface to many types of data, including a large public binding affinity dataset (Liu et al.), bioassay data (Kim et al.) and a drug repurposing library (Corsello et al.).
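The aggregate-then-rank step of repurposing can be sketched as follows. The text does not state how predictions from the models are combined, so averaging is an assumption here; the function name `rank_compounds` and the two toy scoring functions are hypothetical placeholders for trained DTI models.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# A model maps (SMILES, protein sequence) to a predicted binding score.
Model = Callable[[str, str], float]


def rank_compounds(compounds: Dict[str, str], target: str,
                   models: Sequence[Model]) -> List[Tuple[str, float]]:
    """Average each compound's score over all models and sort descending."""
    scored = []
    for name, smiles in compounds.items():
        scores = [model(smiles, target) for model in models]
        scored.append((name, mean(scores)))  # assumed aggregation: plain mean
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy stand-ins for trained models; they carry no chemical meaning.
toy_models: List[Model] = [
    lambda smiles, target: float(len(smiles)),
    lambda smiles, target: float(smiles.count("O")),
]

library = {"ethanol": "CCO", "methane": "C", "acetic acid": "CC(=O)O"}
ranking = rank_compounds(library, "MKT", toy_models)
```

With real models, the top of `ranking` would hold the candidates with the highest averaged predicted binding scores, matching the ranked list described above.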

2.4 Programming framework and implementation details

The functionality of DeepPurpose is modularized into six key steps, each of which can be invoked with a single line of code: (i) Load the dataset from a local file or load a DeepPurpose benchmark dataset. (ii) Specify the names of compound and protein encoders. (iii) Split the dataset into training, validation and testing sets using the data_process function, which implements a variety of data-split strategies. (iv) Create a configuration file and specify model parameters. If needed, DeepPurpose can automatically search for optimal values of hyper-parameters. (v) Initialize a model using the configuration file. Alternatively, the user can load a pre-trained model or a previously saved model. (vi) Finally, train the model using the train function and monitor the progress of training and performance metrics. DeepPurpose is OS-agnostic and uses the Jupyter Notebook interface. It can be run in the cloud or locally. All datasets, models, documentation, installation instructions and tutorials are provided at https://github.com/kexinhuang12345/DeepPurpose.
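Step (iii) mentions a variety of data-split strategies without listing them. As one plausible example, the sketch below implements a "cold drug" split, in which no compound appears in more than one of the training, validation and testing sets. The function name, the 0.7/0.1/0.2 fractions and the seed are assumptions made for illustration and are not taken from the data_process function itself.

```python
import random
from typing import List, Sequence, Tuple

# One record is (SMILES, protein sequence, label).
Record = Tuple[str, str, float]


def cold_drug_split(records: Sequence[Record],
                    frac: Tuple[float, float, float] = (0.7, 0.1, 0.2),
                    seed: int = 1
                    ) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split by compound so that each SMILES lands in exactly one subset."""
    drugs = sorted({record[0] for record in records})  # sort for determinism
    random.Random(seed).shuffle(drugs)

    n_train = round(frac[0] * len(drugs))
    n_val = round(frac[1] * len(drugs))
    train_drugs = set(drugs[:n_train])
    val_drugs = set(drugs[n_train:n_train + n_val])

    train = [r for r in records if r[0] in train_drugs]
    val = [r for r in records if r[0] in val_drugs]
    # Every remaining compound goes to the test set.
    test = [r for r in records
            if r[0] not in train_drugs and r[0] not in val_drugs]
    return train, val, test


# Toy data: 10 distinct compounds, each paired with 2 proteins.
records = [("C" * (i + 1), protein, float(i))
           for i in range(10) for protein in ("MKT", "GAV")]
train, val, test = cold_drug_split(records)
```

A split of this kind evaluates how well a model generalizes to compounds it has never seen, which a purely random split over pairs would not test.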

3 Using DeepPurpose for DTI prediction

To demonstrate the use of DeepPurpose, we compare DeepPurpose with KronRLS (Pahikkala et al.), a popular DTI method, and GraphDTA (Nguyen et al.) and DeepDTA (Öztürk et al.), state-of-the-art DL methods. We find that many DeepPurpose models achieve comparable prediction performance on two benchmark datasets, DAVIS (Davis et al.) and KIBA (He et al.) (Fig. 1D). A complete script to generate the results is provided in Supplementary Material.

Fig. 1.

Overview of the DeepPurpose library. (A) DeepPurpose takes as input the SMILES of a compound and a protein’s amino acid sequence and then generates embeddings for them. (B) The learned embeddings are then concatenated and fed into a decoder to predict DTI binding affinity. (C) DeepPurpose provides a simple but flexible programming framework that implements over 50 state-of-the-art DL models for DTI prediction. (D) DeepPurpose models achieve comparable performance with three other DTI prediction algorithms on two benchmark datasets. (E) Finally, DeepPurpose has many functionalities, including monitoring the training process, debugging and generating ranked lists for repurposing and screening. Further, DeepPurpose supports other downstream prediction tasks (e.g. drug–drug interaction prediction, compound property prediction).

4 DeepPurpose with interactive web interface

In addition to rapid model prototyping, DeepPurpose also provides utility functions to load a pre-trained model and make predictions for new drug and target inputs. This functionality allows domain scientists to examine predictions quickly, modify the inputs based on predictions, and iterate on the process until finding a drug or target with desired properties. We leverage Gradio (Abid et al.) to create a web interface programmatically. We use a user-trained DeepPurpose model in the backend and create a custom web interface in fewer than ten lines of code. This web interface takes the SMILES and amino acid sequence as the input and returns prediction scores with less than 1-second latency. We provide examples in the Supplementary Material.

Financial Support: none declared.

M.Z. and K.H. are supported, in part, by NSF grant nos. IIS-2030459 and IIS-2033384, and by the Harvard Data Science Initiative. T.F. and J.S. were in part supported by the NSF SCH-2014438, IIS-1418511, CCF-1533768, IIS-1838042, the NIH award NIH R01 1R01NS107291-01 and R56HL138415.

Conflict of Interest: none declared.

References (14 in total, 10 listed)

1. Comprehensive analysis of kinase inhibitor selectivity.

Authors: Mindy I Davis; Jeremy P Hunt; Sanna Herrgard; Pietro Ciceri; Lisa M Wodicka; Gabriel Pallares; Michael Hocker; Daniel K Treiber; Patrick P Zarrinkar
Journal: Nat Biotechnol Date: 2011-10-30 Impact factor: 54.908

2. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities.

Authors: Marinka Zitnik; Francis Nguyen; Bo Wang; Jure Leskovec; Anna Goldenberg; Michael M Hoffman
Journal: Inf Fusion Date: 2018-09-21 Impact factor: 12.975

3. A Modular Probe Strategy for Drug Localization, Target Identification and Target Occupancy Measurement on Single Cell Level.

Authors: Anna Rutkowska; Douglas W Thomson; Johanna Vappiani; Thilo Werner; Katrin M Mueller; Lars Dittus; Jana Krause; Marcel Muelbaier; Giovanna Bergamini; Marcus Bantscheff
Journal: ACS Chem Biol Date: 2016-07-20 Impact factor: 5.100

4. The Drug Repurposing Hub: a next-generation drug library and information resource.

Authors: Steven M Corsello; Joshua A Bittker; Zihan Liu; Joshua Gould; Patrick McCarren; Jodi E Hirschman; Stephen E Johnston; Anita Vrcic; Bang Wong; Mariya Khan; Jacob Asiedu; Rajiv Narayan; Christopher C Mader; Aravind Subramanian; Todd R Golub
Journal: Nat Med Date: 2017-04-07 Impact factor: 53.440

5. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities.

Authors: Tiqing Liu; Yuhmei Lin; Xin Wen; Robert N Jorissen; Michael K Gilson
Journal: Nucleic Acids Res Date: 2006-12-01 Impact factor: 16.971

6. SimBoost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines.

Authors: Tong He; Marten Heidemeyer; Fuqiang Ban; Artem Cherkasov; Martin Ester
Journal: J Cheminform Date: 2017-04-18 Impact factor: 5.514

7. DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences.

Authors: Ingoo Lee; Jongsoo Keum; Hojung Nam
Journal: PLoS Comput Biol Date: 2019-06-14 Impact factor: 4.475

8. Analyzing Learned Molecular Representations for Property Prediction.

Authors: Kevin Yang; Kyle Swanson; Wengong Jin; Connor Coley; Philipp Eiden; Hua Gao; Angel Guzman-Perez; Timothy Hopper; Brian Kelley; Miriam Mathea; Andrew Palmer; Volker Settels; Tommi Jaakkola; Klavs Jensen; Regina Barzilay
Journal: J Chem Inf Model Date: 2019-08-13 Impact factor: 4.956

9. The COVID-19 epidemic.

Authors: Thirumalaisamy P Velavan; Christian G Meyer
Journal: Trop Med Int Health Date: 2020-02-16 Impact factor: 2.622

10. Toward more realistic drug-target interaction predictions.

Authors: Tapio Pahikkala; Antti Airola; Sami Pietilä; Sushil Shakyawar; Agnieszka Szwajda; Jing Tang; Tero Aittokallio
Journal: Brief Bioinform Date: 2014-04-09 Impact factor: 11.622
