Maha A Thafar, Mona Alshahrani, Somayah Albaradei, Takashi Gojobori, Magbubah Essack, Xin Gao.
Abstract
Drug-target interaction (DTI) prediction plays a crucial role in drug repositioning and virtual drug screening. Most DTI prediction methods cast the problem either as a binary classification task that predicts whether an interaction exists, or as a regression task that predicts a continuous value indicating a drug's ability to bind to a specific target. The regression-based methods provide insight beyond the binary relationship. However, most of these methods require three-dimensional (3D) structural information of the targets, which is still not generally available. Despite this bottleneck, only a few methods address the drug-target binding affinity (DTBA) problem with a non-structure-based approach that avoids the 3D structure limitations. Here we propose Affinity2Vec, a novel regression-based method that formulates the entire task as a graph-based problem. To develop this method, we constructed a weighted heterogeneous graph that integrates data from several sources, including drug-drug similarity, target-target similarity, and drug-target binding affinities. Affinity2Vec further combines several computational techniques from feature representation learning, graph mining, and machine learning to generate or extract features, build the model, and predict the binding affinity between the drug and the target with no 3D structural data. We conducted extensive experiments to evaluate and demonstrate the robustness and efficiency of the proposed method on benchmark datasets used in state-of-the-art non-structure-based drug-target binding affinity studies. Affinity2Vec showed superior and competitive results compared to the state-of-the-art methods based on several evaluation metrics, including mean squared error, rm2, concordance index, and area under the precision-recall curve.
Year: 2022 PMID: 35306525 PMCID: PMC8934358 DOI: 10.1038/s41598-022-08787-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Statistics for the drug-target binding affinity benchmark datasets.
| Datasets | No. of drugs | No. of proteins | Known DTBA | Density (%) | Refs |
|---|---|---|---|---|---|
| Davis | 68 | 442 | 30,056 | 100 | |
| KIBA | 2,116 | 229 | 118,254 | 24.4 | |
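The Density column can be checked from the other columns of the table. The short sketch below assumes that density is the number of known affinities divided by the number of possible drug-protein pairs; the function name `density_percent` is ours, not the authors'.

```python
def density_percent(n_drugs, n_proteins, n_known):
    """Share of the drug x protein matrix with a known affinity, in percent."""
    return 100.0 * n_known / (n_drugs * n_proteins)

# Counts copied from the table above: (drugs, proteins, known DTBA values).
datasets = {
    "Davis": (68, 442, 30_056),
    "KIBA": (2116, 229, 118_254),
}

for name, (n_drugs, n_proteins, n_known) in datasets.items():
    print(f"{name}: {density_percent(n_drugs, n_proteins, n_known):.1f}%")
```

Under that assumption the computed values round to 100.0 for Davis and 24.4 for KIBA, matching the table.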
Figure 1. The pipeline of Affinity2Vec, which consists of four main steps and three models for feature extraction (a, b, and c).
Figure 2. The seq2seq fingerprint architecture used to generate drug embeddings. The 2D structure image of the drug’s SMILES was generated with the smi2img tool (http://hulab.rxnfinder.org/smi2img/).
The seq2seq model’s parameter optimization. Bold font indicates the selected values.
| Parameters | Tested values |
|---|---|
| Feature Vector length | { |
| Learning rate | {0.1, |
| Number of layers | { |
| Batch size | {5, |
| Dropout value | {0, |
| Variational autoencoder | (True, |
Figure 3. The process used to generate three lists of non-overlapping biological words that served as the corpus for ProtVec training.
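One way to picture the word-generation step in Figure 3 is the following minimal sketch. It assumes the biological words are fixed-length k-mers with k = 3, so that the three lists correspond to the three possible starting offsets; the function name and the toy sequence are illustrative and not taken from the paper.

```python
def non_overlapping_kmers(sequence, k=3):
    """Return k lists of non-overlapping k-mers, one list per starting offset."""
    lists = []
    for offset in range(k):
        # Step by k so that consecutive words within one list never overlap.
        words = [sequence[i:i + k]
                 for i in range(offset, len(sequence) - k + 1, k)]
        lists.append(words)
    return lists

# Toy amino-acid sequence (hypothetical, for illustration only).
for words in non_overlapping_kmers("MKTAYIAKQR"):
    print(words)
# ['MKT', 'AYI', 'AKQ']
# ['KTA', 'YIA', 'KQR']
# ['TAY', 'IAK']
```

Each list can then be treated as one "sentence" of a training corpus for a word-embedding model.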
Best results obtained among all variants of Affinity2Vec for the Davis and KIBA datasets in terms of MSE and CI.
| Model name | Features type | Number of features | Davis MSE | Davis CI | KIBA MSE | KIBA CI |
|---|---|---|---|---|---|---|
| Affinity2Vec Pscore | (a) Meta-path scores of G1 | 12 | 0.251 | 0.886 | 0.247 | 0.83 |
| | (b) Meta-path scores of G2 | 12 | 0.35 | 0.85 | | |
| | (c) Meta-path scores of G1 & G2 | 24 | 0.253 | 0.885 | | |
| Affinity2Vec Embed | Dr SMILES embeddings FV + Pr aaseq embeddings FV | D (228) K (356) | 0.339 | 0.857 | 0.295 | 0.80 |
| Affinity2Vec Hybrid | (a) Meta-path scores of G1 + Dr SMILES embeddings FV + Pr aaseq embeddings FV | D (240) K (368) | 0.194 | 0.854 | | |
| | (b) Meta-path scores of G2 + Dr SMILES embeddings FV + Pr aaseq embeddings FV | D (240) K (368) | 0.325 | 0.861 | 0.124 | |
| | (c) Meta-path scores of G1 & G2 + Dr SMILES embeddings FV + Pr aaseq embeddings FV | D (252) K (380) | 0.124 | 0.905 | | |
Bold font with underline indicates the best results, and bold font alone shows the second-best results.
Dr, drugs; Pr, proteins; aaseq, amino-acid sequence; D, Davis dataset; K, KIBA dataset. G1 consists of the DTBA training part, DDsim1 (2D chemical structure similarity), and TTsim1 (target sequence alignment similarity using normalized Smith-Waterman scores). G2 consists of the DTBA training part, DDsim2 (cosine similarity of drug SMILES embeddings), and TTsim2 (cosine similarity of target sequence embeddings).
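To make the G2 construction and the idea of a meta-path score more concrete, here is a rough sketch on random toy data. It assumes that a drug-drug-target (D-D-T) path is scored as the product of its edge weights and that scores are aggregated by sum and by max; the exact set of paths and the aggregation behind the 12 scores are not stated in this record, so every function name and the scoring rule below are assumptions.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of an (n, d) array."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

def ddt_path_scores(dd_sim, dt_aff):
    """Sum and max scores over D-D-T paths for every (drug, target) pair.

    A path d -> d' -> t is scored as dd_sim[d, d'] * dt_aff[d', t] (assumed rule).
    """
    products = dd_sim[:, :, None] * dt_aff[None, :, :]  # shape (D, D, T)
    return products.sum(axis=1), products.max(axis=1)

# Toy data: 4 drugs with 8-dimensional embeddings, 3 targets.
rng = np.random.default_rng(0)
drug_emb = rng.normal(size=(4, 8))
dt_aff = rng.uniform(size=(4, 3))      # stand-in for training affinities

dd_sim2 = cosine_similarity_matrix(drug_emb)   # analogue of DDsim2
np.fill_diagonal(dd_sim2, 0.0)                 # ignore trivial self-paths
sum_scores, max_scores = ddt_path_scores(dd_sim2, dt_aff)
print(sum_scores.shape, max_scores.shape)      # (4, 3) (4, 3)
```

Scores of this kind can be concatenated with the embedding feature vectors, which is consistent with the feature counts in the table: 12 + 228 = 240 and 24 + 228 = 252 for Davis, 12 + 356 = 368 and 24 + 356 = 380 for KIBA.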
Figure 4. The average performance on the test set of each evaluation measurement (MSE, CI, rm2, and AUPR) across Davis and KIBA datasets for each method.
Comparing Affinity2Vec with five baseline methods in terms of CI, MSE, rm2, and AUPR scores for the Davis dataset on the test data.
| Method | CI | MSE | rm2 | AUPR | References |
|---|---|---|---|---|---|
| | 0.871 | 0.379 | 0.407 | 0.661 | |
| | 0.836 | 0.282 | 0.644 | 0.709 | |
| | 0.878 | 0.261 | 0.714 | | |
| | 0.88 | 0.271 | 0.653 | 0.691 | |
| | 0.649 | | | | |
Bold font with underline indicates the best results, bold font alone shows the second-best results, and italic font shows the third-best results.
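Two of the metrics in these comparisons, MSE and CI, can be sketched in a few lines. The snippet below uses the common pairwise definition of the concordance index, in which tied predictions count as 0.5 and pairs with equal true values are skipped; whether the authors used exactly this variant is an assumption. The example values are the five Davis rows of the top-5 table in this record, used purely for illustration and not as a reproduction of any reported score.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs that the predictions rank in the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            agreement = (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j])
            if agreement > 0:
                concordant += 1.0
            elif agreement == 0:
                concordant += 0.5
    return concordant / comparable if comparable else float("nan")

# Davis rows from the top-5 table (actual vs. predicted affinity values).
actual = [5.00, 5.00, 5.00, 5.00, 5.5686]
predicted = [4.6658, 4.6973, 4.7683, 4.7820, 4.7956]
print(mean_squared_error(actual, predicted))
print(concordance_index(actual, predicted))
```

The double loop is quadratic in the number of pairs, which is fine for a sketch but would need a vectorized or sorted implementation for datasets the size of KIBA.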
Comparing Affinity2Vec with five baseline methods in terms of CI, MSE, rm2, and AUPR scores for the KIBA dataset on the test data.
| Method | CI | MSE | rm2 | AUPR | References |
|---|---|---|---|---|---|
| | 0.782 | 0.411 | 0.342 | 0.635 | |
| | 0.836 | 0.222 | 0.629 | 0.76 | |
| | 0.863 | 0.194 | 0.673 | 0.788 | |
| | 0.866 | 0.224 | 0.675 | 0.753 | |
| | | | | | |
Bold font with underline indicates the best results, bold font alone shows the second-best results, and italic font shows the third-best results.
Predicted and actual affinity values of the top 5 drug-target pairs for Davis and KIBA datasets.
| Dataset | Drug ID (PubChem CID / ChEMBL ID) | Protein ID | Protein name (primary gene name) | Predicted Aff value | Actual Aff value |
|---|---|---|---|---|---|
| Davis | 3025986 | P51451 | Tyrosine-protein kinase Blk (BLK) | 4.6658 | 5.00 |
| | 3025986 | Q9H2G2 | STE20-like serine/threonine-protein kinase (SLK) | 4.6973 | 5.00 |
| | 10138260 | P42680 | Tyrosine-protein kinase Tec (TEC) | 4.7683 | 5.00 |
| | 44150621 | Q9UBE8 | Serine/threonine-protein kinase (NLK) | 4.7820 | 5.00 |
| | 17755052 | O14976 | Cyclin-G-associated kinase (GAK) | 4.7956 | 5.5686 |
| KIBA | CHEMBL373751 | O14920 | Inhibitor of nuclear factor kappa-B kinase subunit beta (IKBKB) | 4.2933 | 6.09999 |
| | CHEMBL281470 | Q05655 | Protein kinase C delta type (PRKCD) | 9.0828 | 9.2397 |
| | CHEMBL7929 | Q05655 | Protein kinase C delta type (PRKCD) | 9.2187 | 9.4010 |
| | CHEMBL2163772 | P35968 | Vascular endothelial growth factor receptor 2 (KDR) | 9.2408 | 9.10003 |
| | CHEMBL8163 | P05771 | Protein kinase C beta type (PRKCB) | 9.3374 | 9.4010 |
The lower the affinity (Aff) value, the stronger the binding between drug and protein. The strongest binding affinity value in the Davis dataset is 5.00, while the strongest binding affinity value in the KIBA dataset is 6.1 after removing the zero values.
Figure 5. Binding affinities predicted by the Affinity2Vec best model vs. the actual binding affinity values for drug-target pairs in the Davis and KIBA datasets.
Performance results of Affinity2Vec, MDeePred, and the MoleculeNet benchmarking methods (under setting-2) based on the PDBBind refined dataset.
| Method | RMSE | CI | Average AUPR | |
|---|---|---|---|---|
| MDeePred | ||||
| MoleculeNet | GridF-RF | 1.844 | 0.723 | |
| GridF-DNN | 1.901 | 0.67 | 0.643 | |
| ECFP-RF | 1.791 | 0.657 | 0.638 | |
| ECFP-DNN | 2.292 | 0.608 | 0.545 | |
| Affinity2Vec | Embed model FV | |||
| Pscore model FV | 0.663 | 0.673 | ||
| Hybrid model FV | 0.723 | |||
Bold font with underline indicates the best results, bold font alone shows the second-best results, and italic font shows the third-best results.