Haiping Zhang, Linbu Liao, Konda Mani Saravanan, Peng Yin, Yanjie Wei.
Abstract
Proteins interact with small molecules to modulate several important cellular functions. Many acute diseases have been cured by small molecules binding in the active site of a protein, through either inhibition or activation. Several docking programs are currently available to estimate the binding position and binding orientation of a protein-ligand complex, and many scoring functions have been developed to estimate the binding strength and predict effective protein-ligand binding. However, the accuracy of current scoring functions is limited in several respects: the solvent effect, entropy effect, and multibody effect are largely ignored in traditional machine learning methods. In this paper, we propose a new deep neural network-based model named DeepBindRG to predict the binding affinity of a protein-ligand complex. The model learns all these effects, the binding mode, and specificity implicitly from protein-ligand interface contact information in a large protein-ligand dataset. During the initial data processing step, the critical interface information is preserved to make sure the input is suitable for the proposed deep learning model. When validated on three independent datasets, DeepBindRG achieves a root mean squared error (RMSE) in pKa (-logKd or -logKi) of about 1.6-1.8 and an R value of around 0.5-0.6, which is better than AutoDock Vina, whose RMSE is about 2.2-2.4 and whose R value is 0.42-0.57. We also explore the detailed reasons for the performance of DeepBindRG, especially for several cases in which Vina failed. Furthermore, DeepBindRG performed better on four challenging datasets from the DUD.E database that have no experimental protein-ligand complexes. The better performance of DeepBindRG relative to AutoDock Vina in predicting protein-ligand binding affinity indicates that deep learning approaches can greatly help the drug discovery process.
We also compare the performance of DeepBindRG with Pafnucy, a 4D-based deep learning method. The advantages and limitations of both methods provide clues for improving deep learning-based protein-ligand prediction models in the future.
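The affinity scale used in the abstract is the negative base-10 logarithm of the dissociation or inhibition constant (-logKd or -logKi). A minimal sketch of that conversion follows, assuming the constant is expressed in mol/L; the function name and the example values are illustrative and do not come from the paper.

```python
import math


def pk_from_molar(k_molar: float) -> float:
    """Return -log10(K) for a Kd or Ki value given in mol/L (assumed unit)."""
    if k_molar <= 0:
        raise ValueError("the constant must be positive")
    return -math.log10(k_molar)


# Hypothetical binders: a 1 nM ligand and a 10 uM ligand.
strong = pk_from_molar(1e-9)
weak = pk_from_molar(1e-5)
print(strong, weak)
```

On this scale a larger value means tighter binding, so an RMSE of about 1.6-1.8 corresponds to an error of well over one order of magnitude in Kd or Ki.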
Keywords: Deep neural network; Drug design; Native-like protein–ligand complex; Protein–ligand binding affinity; ResNet
Year: 2019 PMID: 31380152 PMCID: PMC6661145 DOI: 10.7717/peerj.7362
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. The workflow of model training and testing.
Figure 2. The architecture of our ResNet model.
The performance of the ResNet regression model DeepBindRG, AutoDock Vina, and Pafnucy.
| Data set | R | MAE | MSE | RMSE | MAPE | sMAPE | Size |
|---|---|---|---|---|---|---|---|
| DeepBindRG performance | | | | | | | |
| Training set | 0.6779 | 1.1153 | 1.9896 | 1.4105 | 21.5282 | 8.8678 | 13,500 |
| Validation set | 0.5829 | 1.2067 | 2.267 | 1.5057 | 22.7713 | 9.6429 | 1,000 |
| Testing set | 0.5993 | 1.2049 | 2.241 | 1.497 | 22.4016 | 9.5895 | 925 |
| CASF-2013 | 0.6394 | 1.4829 | 3.3015 | 1.817 | 28.8105 | 11.9433 | 195 |
| CSAR_HiQ_NRC_set | 0.6585 | 1.3607 | 2.9719 | 1.7239 | 63.0363 | 11.1805 | 343 |
| Astex_diverse_set | 0.4657 | 1.3355 | 2.6274 | 1.6209 | 20.7896 | 9.9863 | 74 |
| AutoDock Vina performance | | | | | | | |
| CASF-2013 | 0.5725 | 1.9462 | 5.7647 | 2.401 | 38.1536 | 14.2026 | 195 |
| CSAR_HiQ_NRC_set | 0.5707 | 1.7268 | 5.237 | 2.2884 | 52.8847 | 13.89 | 343 |
| Astex_diverse_set | 0.422 | 1.7068 | 4.8518 | 2.2027 | 27.0829 | 11.7127 | 74 |
| Pafnucy performance | | | | | | | |
| CASF-2013 | 0.5855 | 1.5131 | 3.4192 | 1.8491 | 30.979 | 11.784 | 195 |
| CSAR_HiQ_NRC_set | 0.7167 | 1.2419 | 2.4787 | 1.5744 | 54.5188 | 9.973 | 343 |
| Astex_diverse_set | 0.5146 | 1.1732 | 2.1473 | 1.4654 | 19.6549 | 8.4168 | 74 |
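The columns of the table above are standard regression metrics, and the RMSE column is the square root of the MSE column (for example, sqrt(3.3015) ≈ 1.817 for DeepBindRG on CASF-2013). A minimal sketch of how such metrics can be computed from paired experimental and predicted affinities is given below. The function name and the toy values are hypothetical, and the sMAPE form used here (sum of absolute errors divided by the sum of |true| + |predicted|) is an assumption, since several definitions of sMAPE are in use and the record does not state which one was applied.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R, MAE, MSE, RMSE, MAPE and sMAPE for paired affinity values."""
    n = len(y_true)
    if n < 2 or n != len(y_pred):
        raise ValueError("need two equally long sequences with at least 2 values")

    abs_err = [abs(p - t) for t, p in zip(y_true, y_pred)]
    mae = sum(abs_err) / n
    mse = sum(e * e for e in abs_err) / n
    rmse = math.sqrt(mse)
    # MAPE: mean absolute error relative to the experimental value, in percent.
    mape = 100.0 * sum(e / abs(t) for e, t in zip(abs_err, y_true)) / n
    # sMAPE (assumed form): total absolute error over total |true| + |predicted|.
    smape = 100.0 * sum(abs_err) / sum(abs(t) + abs(p) for t, p in zip(y_true, y_pred))

    # Pearson correlation coefficient R.
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)

    return {"R": r, "MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "sMAPE": smape}


# Toy values for illustration only (not taken from any data set in the paper).
experimental = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 8.5]
metrics = regression_metrics(experimental, predicted)
print(metrics)
```

R measures how well the ranking and trend are reproduced, while MAE, MSE, and RMSE measure the size of the errors, which is why a method can have a similar R but a much larger RMSE.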
Figure 3. Predictions for three extra validation sets (A, CASF-2013; B, Astex_diverse_set; C, CSAR_HiQ_NRC_set).
Selected cases in the CASF-2013 data set for which DeepBindRG performed significantly better than the Vina score.
| PDBID | Experimental affinity | Vina score | DeltaG_vina | DeepBindRG predicted affinity | DeltaG_DeepbindRG |
|---|---|---|---|---|---|
| 2yki | 9.46 | 16.1137 | 6.6537 | 8.1597 | 1.3003 |
| 4dew | 7 | 0.6853 | 6.3147 | 5.8712 | 1.1288 |
| 3acw | 4.76 | 10.2398 | 5.4798 | 6.2444 | 1.4844 |
| 3n86 | 5.64 | 10.9262 | 5.2862 | 6.1666 | 0.5266 |
| 1gpk | 5.37 | 10.1323 | 4.7623 | 6.3859 | 1.0159 |
| 3e93 | 8.85 | 13.3378 | 4.4878 | 7.3438 | 1.5062 |
| 3g2n | 4.09 | 8.5575 | 4.4675 | 4.9792 | 0.8892 |
| 3su2 | 7.35 | 11.6916 | 4.3416 | 6.9883 | 0.3617 |
| 1nvq | 8.25 | 12.5577 | 4.3077 | 6.6401 | 1.6099 |
| 3coy | 6.02 | 10.2338 | 4.2138 | 5.9635 | 0.0565 |
| Vina MAPE | 79.9297 | | | | |
| Vina sMAPE | 15.3444 | | | | |
| Vina correlation | 0.4362 | | | | |
| DeepBindRG MAPE | 29.3784 | | | | |
| DeepBindRG sMAPE | 7.5111 | | | | |
| DeepBindRG correlation | 0.8519 | | | | |
Note:
We define "significantly better" as DeltaG_vina > 4 while DeltaG_DeepbindRG < 2. The average errors and correlation coefficients are given in the bottom rows of the table.
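Judging from the numbers in the table, the two DeltaG columns appear to be the absolute differences between each prediction and the experimental affinity (for 2yki, |16.1137 - 9.46| = 6.6537). Under that reading, the selection rule in the note can be sketched as follows. The three real rows are copied from the table; the fourth row, the function name, and the parameter names are hypothetical.

```python
def is_significantly_better(experimental, vina, deepbindrg,
                            vina_min_error=4.0, deepbindrg_max_error=2.0):
    """Apply the rule from the note: Vina error > 4 while DeepBindRG error < 2."""
    delta_vina = abs(vina - experimental)
    delta_deepbindrg = abs(deepbindrg - experimental)
    return delta_vina > vina_min_error and delta_deepbindrg < deepbindrg_max_error


# (PDB ID, experimental affinity, Vina score, DeepBindRG predicted affinity)
rows = [
    ("2yki", 9.46, 16.1137, 8.1597),  # from the table
    ("4dew", 7.00, 0.6853, 5.8712),   # from the table
    ("3coy", 6.02, 10.2338, 5.9635),  # from the table
    ("xxxx", 6.00, 7.0000, 6.5000),   # hypothetical case where Vina is also close
]

selected = [pdb_id for pdb_id, exp, vina, deep in rows
            if is_significantly_better(exp, vina, deep)]
print(selected)
```

Because the rule uses absolute errors, it selects both overestimated cases such as 2yki and underestimated cases such as 4dew.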
Figure 4. Examples of ligand–protein interactions in the CASF-2013 data set that are correctly identified by DeepBindRG but not by the Vina score (DeltaG_vina > 4, while DeltaG_DeepbindRG < 2).
Among them, the affinity of 4dew is underestimated by the Vina score, while the other nine cases are overestimated. The Vina score seems to overestimate pi–pi interactions (A, 1gpk; B, 1nvq; C, 2yki; D, 3acw; F, 3e93), hydrophobic interactions (I, 3su2), and hydrogen bond interactions (E, 3coy; G, 3g2n; H, 3n86), and to underestimate polar/electrostatic interactions or interactions mediated by water or ions (J, 4dew).
The performance of DeepBindRG, AutoDock Vina, and Pafnucy on the datasets from the DUD.E database.
| Method | R | MAE | MSE | RMSE | Size |
|---|---|---|---|---|---|
| kith dataset | | | | | |
| DeepBindRG_X | 0.4742 | 1.823 | 4.2923 | 2.0718 | 57 |
| DeepBindRG_Y | 0.3156 | 1.3382 | 2.6312 | 1.6221 | 1,127 |
| DeepBindRG_Z | 0.5588 | 2.123 | 5.336 | 2.31 | 57 |
| Vina score | 0.6664 | 3.8567 | 15.8536 | 3.9817 | 57 |
| Pafnucy | 0.4673 | 3.2789 | 11.6922 | 3.4194 | 57 |
| Jak2 dataset | | | | | |
| DeepBindRG_X | −0.028 | 1.1715 | 2.2772 | 1.509 | 107 |
| DeepBindRG_Y | 0.0189 | 1.4913 | 3.2848 | 1.8124 | 2,078 |
| DeepBindRG_Z | −0.0195 | 0.9314 | 1.525 | 1.2349 | 107 |
| Vina score | 0.1037 | 2.1678 | 6.0232 | 2.4542 | 107 |
| Pafnucy | −0.1186 | 1.0141 | 1.5354 | 1.2391 | 107 |
| Egfr dataset | | | | | |
| DeepBindRG_X | −0.0705 | 1.124 | 2.1048 | 1.4508 | 542 |
| DeepBindRG_Y | −0.0241 | 1.3153 | 2.8598 | 1.6911 | 10,614 |
| DeepBindRG_Z | −0.0314 | 1.043 | 1.7365 | 1.3177 | 542 |
| Vina score | 0.0146 | 2.2055 | 6.5095 | 2.5514 | 542 |
| Pafnucy | 0.1701 | 1.1253 | 1.8209 | 1.3494 | 542 |
| Cdk2 dataset | | | | | |
| DeepBindRG_X | 0.2205 | 1.0317 | 1.61 | 1.2689 | 474 |
| DeepBindRG_Y | 0.1947 | 1.3589 | 2.5988 | 1.6121 | 9,027 |
| DeepBindRG_Z | 0.2797 | 0.7854 | 0.9238 | 0.9612 | 474 |
| Vina score | 0.0554 | 1.5393 | 3.2222 | 1.795 | 474 |
| Pafnucy | 0.1230 | 0.7346 | 0.8277 | 0.9098 | 474 |
Notes:
DeepBindRG_X: the top conformation predicted by AutoDock Vina was used for the final prediction.
DeepBindRG_Y: all conformations predicted by AutoDock Vina were used as ligand–protein complexes.
DeepBindRG_Z: among all generated conformations, the top value predicted by DeepBindRG was selected as the final prediction.
Pafnucy: among all generated conformations, the top value predicted by Pafnucy was selected as the final prediction.
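The three DeepBindRG variants in the notes differ only in how docked conformations are turned into predictions. A minimal sketch of the three strategies for a single ligand follows. The `Pose` class, its field names, and the example values are hypothetical, and two points are assumptions: that the top Vina conformation is the one with the lowest (most negative) docking score, and that the "top predicted value" means the highest predicted affinity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pose:
    vina_score: float    # hypothetical field: docking score, lower assumed better
    predicted_pk: float  # hypothetical field: model-predicted affinity for this pose


def predict_top_vina_pose(poses: List[Pose]) -> float:
    """X strategy: score only the conformation that Vina ranks first."""
    return min(poses, key=lambda pose: pose.vina_score).predicted_pk


def predict_all_poses(poses: List[Pose]) -> List[float]:
    """Y strategy: keep one prediction per docked conformation."""
    return [pose.predicted_pk for pose in poses]


def predict_best_model_pose(poses: List[Pose]) -> float:
    """Z strategy: take the highest model prediction over all conformations."""
    return max(pose.predicted_pk for pose in poses)


# Hypothetical docking output for one ligand.
poses = [Pose(-9.1, 6.2), Pose(-8.4, 7.0), Pose(-7.9, 5.5)]
x_value = predict_top_vina_pose(poses)
y_values = predict_all_poses(poses)
z_value = predict_best_model_pose(poses)
print(x_value, y_values, z_value)
```

This reading is consistent with the Size column of the table, where the Y rows contain roughly twenty times as many entries as the X and Z rows, since every conformation counts as a separate complex.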