Maha Thafar1,2, Arwa Bin Raies1, Somayah Albaradei1,3, Magbubah Essack1, Vladimir B Bajic1.
Abstract
Drug development is generally arduous and costly, and success rates are low. Thus, the identification of drug-target interactions (DTIs) has become a crucial step in the early stages of drug discovery. Consequently, computational approaches capable of identifying potential DTIs with a minimal error rate are increasingly being pursued. These approaches aim to narrow down the search space for novel DTIs and shed light on the context in which drugs function. Most methods developed to date use binary classification to predict whether an interaction between a drug and its target exists. However, it is more informative, but also more challenging, to predict the strength of the binding between a drug and its target. If the binding is not sufficiently strong, the DTI may not be useful. Therefore, methods developed to predict drug-target binding affinities (DTBA) are of great value. In this study, we provide a comprehensive overview of the existing methods that predict DTBA. We focus on methods developed using artificial intelligence (AI), machine learning (ML), and deep learning (DL) approaches, as well as related benchmark datasets and databases. Furthermore, we provide guidance and recommendations that cover the gaps and directions of upcoming work in this research area. To the best of our knowledge, this is the first comprehensive comparison analysis of tools focused on DTBA with reference to AI/ML/DL.
Keywords: artificial intelligence; bioinformatics; deep learning; drug repurposing; drug-target binding affinity; drug-target interaction; information integration; machine learning
Year: 2019 PMID: 31824921 PMCID: PMC6879652 DOI: 10.3389/fchem.2019.00782
Source DB: PubMed Journal: Front Chem ISSN: 2296-2646 Impact factor: 5.221
Figure 1. An overview of the different types of computational methods developed to predict drug-target interactions (DTIs) and drug-target binding affinity (DTBA) categories.
Figure 2. A hypothetical example of a binding curve for ligand 1 and ligand 2. The x-axis shows the concentration of the ligand, and the y-axis shows the percentage of available binding sites (Θ) in a protein that are occupied by the ligand.
Figure 3. Relationship between inhibitor concentration and enzyme activity.
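The curves described in the captions of Figures 2 and 3 can be illustrated with a minimal Python sketch. It assumes a simple one-site binding model, in which occupancy is Θ = [L] / (Kd + [L]), and a hyperbolic dose-response curve with a Hill slope of 1. The function names, the dissociation constant `kd`, the half-maximal inhibitory concentration `ic50`, and all numeric values are illustrative assumptions, not values taken from the figures.

```python
def fractional_occupancy(ligand_conc: float, kd: float) -> float:
    """Fraction of binding sites occupied (theta), assuming one-site binding."""
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("ligand_conc must be >= 0 and kd must be > 0")
    return ligand_conc / (kd + ligand_conc)


def remaining_activity(inhibitor_conc: float, ic50: float) -> float:
    """Fraction of enzyme activity left, assuming a Hill slope of 1."""
    if inhibitor_conc < 0 or ic50 <= 0:
        raise ValueError("inhibitor_conc must be >= 0 and ic50 must be > 0")
    return 1.0 / (1.0 + inhibitor_conc / ic50)


# Two hypothetical ligands: ligand 1 (kd=1.0) binds more tightly than
# ligand 2 (kd=10.0), so it occupies more sites at the same concentration.
for conc in (0.1, 1.0, 10.0, 100.0):
    theta_1 = fractional_occupancy(conc, kd=1.0)
    theta_2 = fractional_occupancy(conc, kd=10.0)
    print(f"[L]={conc:>6}: ligand 1 = {theta_1:.1%}, ligand 2 = {theta_2:.1%}")
```

Under these assumptions, half of the sites are occupied when the ligand concentration equals `kd`, and half of the activity remains when the inhibitor concentration equals `ic50`, which is why a smaller value of either constant corresponds to stronger binding.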
Binding affinity benchmark dataset statistics.
| Dataset | Drugs | Proteins | Interactions |
|---|---|---|---|
| Davis | 68 | 442 | 30,056 |
| Metz | 1,421 | 156 | 35,259 |
| KIBA | 2,116 | 229 | 118,254 |
| ToxCast | 7,675 | 335 | 530,605 |
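One way to read these statistics is to ask how much of each drug-protein matrix is actually measured. The sketch below assumes the three numeric columns are, in order, the number of drugs, the number of proteins, and the number of recorded interactions; the Davis row is consistent with this reading, since 68 × 442 = 30,056. The name `matrix_density` is introduced here for illustration only.

```python
# (drugs, proteins, interactions), copied from the table above.
datasets = {
    "Davis": (68, 442, 30_056),
    "Metz": (1_421, 156, 35_259),
    "KIBA": (2_116, 229, 118_254),
    "ToxCast": (7_675, 335, 530_605),
}


def matrix_density(n_drugs: int, n_proteins: int, n_interactions: int) -> float:
    """Share of all possible drug-protein pairs that have a recorded value."""
    return n_interactions / (n_drugs * n_proteins)


densities = {name: matrix_density(*stats) for name, stats in datasets.items()}
for name, density in densities.items():
    print(f"{name}: {density:.1%}")
# Davis: 100.0%
# Metz: 15.9%
# KIBA: 24.4%
# ToxCast: 20.6%
```

Under this reading, Davis is a complete matrix, whereas the other three datasets cover roughly one sixth to one quarter of the possible pairs.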
Figure 4. Flowchart of the general framework of deep learning (DL) models used for drug-target binding affinity (DTBA) prediction.
RMSE calculated using multiple settings for all baseline methods.
| Davis | 0.608 | 0.5109 | 0.43225 | 0.5119 | |||
| 0.84048 | N/A | N/A | 0.80644 | N/A | |||
| 0.65964 | N/A | N/A | 0.57840 | N/A | |||
| Metz | 0.562 | 0.1660 | N/A | 0.59926 | N/A | ||
| 0.78429 | N/A | N/A | 0.74292 | N/A | |||
| 0.89889 | N/A | N/A | 0.81893 | N/A | |||
| KIBA | 0.620 | 0.4405 | 0.43214 | 0.42308 | |||
| 0.70243 | N/A | N/A | 0.62029 | N/A | |||
| 0.68111 | N/A | N/A | 0.62345 | N/A | |||
| ToxCast | N/A | N/A | N/A | 0.40779 | N/A | ||
| N/A | N/A | N/A | 0.4485 | N/A | |||
| N/A | N/A | N/A | 0.49439 | N/A | |||
The star symbols denote results that are not self-reported: a single star indicates that PADME reported the other methods' results, double stars indicate that DeepDTA reported the other methods' results, and triple stars indicate that SimBoost reported the other methods' results. Missing data are indicated with N/A. The best values for each setting are indicated in bold font.
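For reference, the RMSE reported in this table is the square root of the mean squared error (MSE) between measured and predicted affinities, so a method that reports MSE can be put on the same scale by taking the square root. The following is a minimal sketch; the affinity values are made up for illustration and do not come from any of the benchmark datasets.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical measured and predicted binding affinities.
measured = [5.0, 6.0, 7.0]
predicted = [5.5, 6.0, 6.5]
print(f"RMSE = {rmse(measured, predicted):.3f}")
```

Lower values are better, and a value of 0 would mean that every prediction matches its measurement exactly.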
CI across multiple datasets of all baseline methods.
| Davis | 0.8830 | 0.8840 | 0.8780 | 0.90388 | 0.8860 | ||
| N/A | N/A | 0.71630 | 0.72001 | N/A | |||
| N/A | N/A | 0.85503 | 0.84483 | N/A | |||
| Metz | 0.7930 | N/A | 0.80756 | 0.79400 | N/A | ||
| 0.7360 0.70916 | N/A | N/A | 0.74104 | N/A | |||
| 0.6660 | N/A | N/A | 0.69830 | N/A | |||
| KIBA | 0.782 | 0.8470 | 0.8630 | 0.85745 | 0.86370 | ||
| 0.6890 | N/A | N/A | 0.75450 | N/A | |||
| 0.7122 | N/A | N/A | 0.76790 | N/A | |||
| ToxCast | N/A | N/A | N/A | 0.79655 | N/A | ||
| N/A | N/A | N/A | 0.72057 | N/A | |||
| N/A | N/A | N/A | 0.68481 | N/A | |||
The star symbols denote results that are not self-reported: a single star indicates that PADME reported the other methods' results, and double stars indicate that DeepDTA reported the other methods' results. Missing data are indicated with N/A. The best values for each setting are indicated in bold font.
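The concordance index (CI) measures how often a model ranks two drug-target pairs in the same order as their measured affinities. The sketch below uses the common pairwise definition, in which pairs with equal measured values are skipped and tied predictions count as half a concordant pair. It is an illustrative O(n²) implementation with made-up values, not the code used by any of the compared methods.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # pairs with equal measured affinity carry no ranking information
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # a tied prediction counts as half
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical example: one of the three pairs is ranked in the wrong order.
print(concordance_index([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
```

A CI of 1.0 corresponds to a perfect ranking, 0.5 to a ranking no better than chance, and higher values are better, in contrast to RMSE.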
Baseline methods features.
| Feature | KronRLS | SimBoost | DeepDTA | WideDTA | PADME |
|---|---|---|---|---|---|
| Datasets | Davis, Metz | Davis, Metz, KIBA | Davis, KIBA | Davis, KIBA | Davis, Metz, KIBA, ToxCast |
| ML/DL | AI/ML | AI/ML | DL | DL | DL |
| Similarity- or feature-based method | Similarity-based | Similarity- and feature-based | Feature-based | Feature-based | Feature-based |
| Drug representation (or features) | PubChem Sim Chemical kernels | PubChem Sim + statistical and network features | SMILES | SMILES + LMCS | SMILES / ECFP |
| Protein representation (or features) | SW sim score, Normalized SW sim score | SW sim score | aaseq | aaseq + PDM | PSC |
| NN type for feature learning | | | CNN | two 1D-CNN | GCNN |
| NN type for prediction | | | 3 FC layers | FC layer | Feedforward NN |
| Regressor or activation function | KronRLS model | Gradient boosting model | ReLU | ReLU | ReLU |
| Validation setting | S1, S2, S3 | S1 | S1 | S1 | S1, S2, S3 |
| Cross-validation | Repeated 10-fold CV, Nested CV, LDO-CV, LTO-CV | 10 times 5-fold CV, LDO-CV, LTO-CV | 5-fold CV | 6-fold CV | 5-fold CV, LDO-CV, LTO-CV |
| Performance metrics | CI, MSE | CI, RMSE | CI, MSE, PCC | CI, MSE, PCC | CI, RMSE, R² |
| Classification/Regression | Both | Both | Regression | Regression | Both |
| Year | 2014 | 2017 | 2018 | 2019 | 2018 |
ML, machine learning; DL, deep learning; Sim, similarity; SW, Smith-Waterman; aaseq, amino-acid sequence; SPS, structural property sequence; PSC, protein sequence composition; PDM, protein domain and motif; ECFP, extended-connectivity fingerprint; LMCS, ligand maximum common substructure; KronRLS, Kronecker regularized least squares; NN, neural network; CNN, convolutional neural network; GCNN, graph convolutional neural network; RNN, recurrent neural network; FC, fully connected; ReLU, rectified linear unit; CV, cross-validation; LDO, leave-one-drug-out; LTO, leave-one-target-out; MSE, mean square error; RMSE, root mean square error; CI, concordance index; PCC, Pearson correlation coefficient; R², coefficient of determination.
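The leave-one-drug-out (LDO) and leave-one-target-out (LTO) schemes named above can be sketched as follows. In each fold, every pair that involves one drug (or one target) is held out for testing, so the model is evaluated on an entity it has not seen during training. The helper `leave_one_group_out` and the toy triples are hypothetical and are not taken from any of the compared tools; the S1, S2, and S3 settings are not defined in this record, so the sketch does not attempt to reproduce them.

```python
def leave_one_group_out(pairs, key_index):
    """Yield (held_out, train, test) folds, one per distinct drug or target.

    key_index=0 groups by drug (LDO); key_index=1 groups by target (LTO).
    """
    groups = sorted({pair[key_index] for pair in pairs})
    for held_out in groups:
        train = [p for p in pairs if p[key_index] != held_out]
        test = [p for p in pairs if p[key_index] == held_out]
        yield held_out, train, test


# Hypothetical (drug, target, affinity) triples.
pairs = [
    ("d1", "t1", 5.0),
    ("d1", "t2", 6.2),
    ("d2", "t1", 7.1),
    ("d2", "t2", 5.5),
    ("d3", "t1", 8.0),
]

ldo_folds = list(leave_one_group_out(pairs, key_index=0))  # one fold per drug
lto_folds = list(leave_one_group_out(pairs, key_index=1))  # one fold per target

for drug, train, test in ldo_folds:
    print(f"held-out drug {drug}: {len(train)} training pairs, {len(test)} test pairs")
```

Grouping by entity rather than by individual pair is what distinguishes these schemes from ordinary k-fold CV, where the same drug or target can appear on both sides of the split.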