Dmitry S Karlov, Sergey Sosnin, Maxim V Fedorov, Petr Popov.
Abstract
In this work, we present graph-convolutional neural networks for predicting the binding constants of protein-ligand complexes. We derived the model using multitask learning, where the target variables are the dissociation constant (Kd), the inhibition constant (Ki), and the half-maximal inhibitory concentration (IC50). Rigorously trained on the PDBbind dataset, the model achieves a Pearson correlation coefficient of 0.87 and an RMSE of 1.05 in pK units, outperforming the recently developed 3D convolutional neural network model Kdeep.
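The two metrics quoted in the abstract can be made concrete with a short sketch. It is illustrative only: it assumes the common convention pK = -log10(K), with K in mol/L, and the function names and toy values are hypothetical rather than taken from the paper.

```python
import math

def to_pk(constant_molar):
    """Convert a binding constant in mol/L (Kd, Ki, or IC50) to pK units.

    Assumes the convention pK = -log10(K); not stated explicitly in the source.
    """
    return -math.log10(constant_molar)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def rmse(xs, ys):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Toy values (hypothetical): measured constants in mol/L and predicted pK.
measured_pk = [to_pk(k) for k in (1e-9, 1e-7, 1e-6, 1e-4)]
predicted_pk = [8.5, 7.2, 6.4, 4.9]

print(f"R = {pearson_r(measured_pk, predicted_pk):.2f}")
print(f"RMSE = {rmse(measured_pk, predicted_pk):.2f} pK units")
```

Because both metrics are computed on the pK scale, an RMSE of about 1 corresponds to roughly one order of magnitude in the underlying constant.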
Year: 2020 PMID: 32201802 PMCID: PMC7081425 DOI: 10.1021/acsomega.9b04162
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Figure 1. Results of t-SNE mapping of ligand-protein interactions represented by SILIRID[46] fingerprints: (left) blue marks the complexes that make up the initial PDBbind refined set, while red marks the additional data; (right) the color scheme is based on protein function.
Results of graphDelta Evaluation on the CSAR Data Compared to Kdeep and RF-Score
| dataset | epochs | multi-task | graphDelta R | graphDelta RMSE | Kdeep R | Kdeep RMSE | RF-Score R | RF-Score RMSE |
|---|---|---|---|---|---|---|---|---|
| CASF2016 | 500 | true | 0.82 | 1.22 | 0.82 | 1.27 | 0.80 | 1.39 |
| | 500 | false | 0.84 | 1.17 | | | | |
| | 1000 | true | 0.86 | 1.11 | | | | |
| | 1000 | false | 0.84 | 1.16 | | | | |
| | 2000 | true | 0.84 | 1.17 | | | | |
| | 2000 | false | | | | | | |
| CSAR NRC HiQ set1 | 500 | true | 0.74 | 1.67 | 0.72 | 2.08 | | 1.99 |
| | 500 | false | 0.64 | 1.81 | | | | |
| | 1000 | true | 0.71 | 1.70 | | | | |
| | 1000 | false | 0.71 | 1.66 | | | | |
| | 2000 | true | 0.74 | | | | | |
| | 2000 | false | 0.74 | | | | | |
| CSAR NRC HiQ set2 | 500 | true | 0.60 | 1.86 | 0.65 | 1.91 | | 1.66 |
| | 500 | false | 0.59 | 1.72 | | | | |
| | 1000 | true | 0.56 | 1.92 | | | | |
| | 1000 | false | 0.71 | | | | | |
| | 2000 | true | 0.64 | 1.73 | | | | |
| | 2000 | false | 0.71 | 1.53 | | | | |
| CSAR12 | 500 | true | 0.52 | 1.16 | 0.37 | 1.59 | 0.46 | 1.00 |
| | 500 | false | 0.41 | 1.37 | | | | |
| | 1000 | true | | | | | | |
| | 1000 | false | 0.54 | 1.11 | | | | |
| | 2000 | true | 0.52 | 1.10 | | | | |
| | 2000 | false | 0.48 | 1.14 | | | | |
| CSAR14 | 500 | true | 0.72 | 1.40 | 0.61 | 1.75 | | |
| | 500 | false | 0.66 | 1.51 | | | | |
| | 1000 | true | 0.65 | 1.34 | | | | |
| | 1000 | false | 0.59 | 1.67 | | | | |
| | 2000 | true | 0.70 | 1.32 | | | | |
| | 2000 | false | 0.74 | 1.22 | | | | |
| average | 500 | true | 0.68 | 1.46 | 0.62 | 1.72 | | 1.38 |
| | 500 | false | 0.63 | 1.52 | | | | |
| | 1000 | true | 0.67 | 1.40 | | | | |
| | 1000 | false | 0.68 | 1.42 | | | | |
| | 2000 | true | 0.69 | 1.38 | | | | |
| | 2000 | false | 0.71 | | | | | |
Bold font is used to stress the best correlation coefficient and RMSE for the selected data set.
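The "average" rows appear to be unweighted means over the five test sets. This is an inference from the reported numbers, not something the source states. A minimal check for the graphDelta entries at 500 epochs with multi-task training, using values copied from the table:

```python
# (R, RMSE) for graphDelta, 500 epochs, multi-task, copied from the table.
graphdelta_500_mt = {
    "CASF2016": (0.82, 1.22),
    "CSAR NRC HiQ set1": (0.74, 1.67),
    "CSAR NRC HiQ set2": (0.60, 1.86),
    "CSAR12": (0.52, 1.16),
    "CSAR14": (0.72, 1.40),
}

n_sets = len(graphdelta_500_mt)
# Unweighted mean: every test set counts equally, regardless of its size.
mean_r = sum(r for r, _ in graphdelta_500_mt.values()) / n_sets
mean_rmse = sum(e for _, e in graphdelta_500_mt.values()) / n_sets

print(f"mean R = {mean_r:.2f}, mean RMSE = {mean_rmse:.2f}")
```

The rounded means agree with the 0.68 and 1.46 listed in the first "average" row.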
Figure 4. Results of prediction (graphDelta, 2000 epochs, single task) for the CASF2016 data set: (left) histogram of correlation coefficients computed for all targets from CASF2016; (right) the prediction results with the trend line.
Results of graphDelta Evaluation on the Data Set Used for FEP and MM-PBSA Evaluation (graphDelta, 2000 Epochs, Multi Task)
| graphDelta | RF-score | FEP or MM-PBSA | ||||||
|---|---|---|---|---|---|---|---|---|
| subset | RMSE | RMSE | RMSE | RMSE | ||||
| p38 | 0.64 | 1.56 | 0.36 | 1.57 | 0.48 | 1.03 | ||
| PTP1B | 0.46 | 1.22 | 0.58 | 0.93 | 0.26 | 1.22 | ||
| thrombin | 0.39 | 0.74 | 0.58 | 0.08 | 0.71 | 0.93 | ||
| Tyk2 | 0.17 | 1.08 | 0.05 | 1.23 | 0.41 | 0.94 | ||
| Bace | 0.65 | 0.78 | –0.06 | 0.84 | –0.14 | 1.03 | ||
| CDK2 | 0.19 | 1.94 | 1.26 | –0.23 | 1.05 | 0.48 | ||
| JNK1 | 0.33 | 1.53 | 0.69 | 1.18 | 0.5 | 1.00 | ||
| MCL1 | 0.22 | 1.12 | 0.34 | 1.04 | 0.52 | 1.41 | ||
| AMPA | 0.39 | 1.37 | 0.74 | 0.38 | 1.71 | 0.62 | ||
| average | 0.35 | 1.31 | 0.41 | 1.07 | 0.26 | 1.00 | ||
Bold font is used to stress the best correlation coefficient and RMSE for the selected data set. MM-PBSA data are provided for AMPA receptor ligands, while FEP data are provided for other targets.
Figure 2. Illustration of the G2 computation.
Figure 3. General description of the MPNN forward pass with interaction net architecture.
Figure 5. Results of prediction (graphDelta, 2000 epochs, single task) for CSAR data sets.
Figure 6. Results of prediction (graphDelta, 2000 epochs, multi task) for data sets used to assess FEP[52] and MM-PBSA[53] performance.
Network Architecture of Message, Update, and Readout Functions
| | layer | message in | message out | message BN | update in | update out | update BN | readout in | readout out | readout BN |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 751 | 200 | yes | 473 | 200 | yes | | | |
| | 2 | 200 | 100 | yes | 200 | 100 | yes | | | |
| | 3 | 100 | 100 | no | 100 | 100 | no | | | |
| 2 | 1 | 205 | 200 | yes | 200 | 200 | yes | | | |
| | 2 | 200 | 100 | yes | 200 | 100 | yes | | | |
| | 3 | 100 | 100 | no | 100 | 100 | no | | | |
| 3 | 1 | 205 | 200 | yes | 200 | 200 | yes | 1100 | 300 | yes |
| | 2 | 200 | 100 | yes | 200 | 100 | yes | 300 | 200 | yes |
| | 3 | 100 | 100 | no | 100 | 100 | no | 200 | 100 | yes |
| | 4 | | | | | | | 100 | 2 | no |
“In” and “Out” denote the number of input and output neurons in the current layer, and “BN” denotes the application of a batch normalization layer.
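The layer sizes in the table can be sanity-checked with a small sketch. It rests on several assumptions that the source does not confirm: the first column is read as the message-passing step, the fourth-layer entry (100 to 2) is assigned to the readout function because it continues the readout chain, and every layer is treated as a plain fully connected layer with a bias term. The names `MESSAGE`, `UPDATE`, `READOUT`, `is_chained`, and `dense_params` are hypothetical.

```python
# (in, out, batch_norm) per layer, copied from the table.
# Keys 1-3 are read as message-passing steps (an assumption).
MESSAGE = {
    1: [(751, 200, True), (200, 100, True), (100, 100, False)],
    2: [(205, 200, True), (200, 100, True), (100, 100, False)],
    3: [(205, 200, True), (200, 100, True), (100, 100, False)],
}
UPDATE = {
    1: [(473, 200, True), (200, 100, True), (100, 100, False)],
    2: [(200, 200, True), (200, 100, True), (100, 100, False)],
    3: [(200, 200, True), (200, 100, True), (100, 100, False)],
}
# Layer 4 (100 -> 2) is assigned to the readout by assumption.
READOUT = [(1100, 300, True), (300, 200, True), (200, 100, True), (100, 2, False)]

def is_chained(layers):
    """True if each layer's output width equals the next layer's input width."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(layers, layers[1:]))

def dense_params(layers):
    """Weights plus biases for fully connected layers (assumed to have a bias).

    Batch-normalization parameters are deliberately not counted.
    """
    return sum(n_in * n_out + n_out for n_in, n_out, _ in layers)

for step in (1, 2, 3):
    print(step, is_chained(MESSAGE[step]), dense_params(MESSAGE[step]),
          is_chained(UPDATE[step]), dense_params(UPDATE[step]))
print("readout", is_chained(READOUT), dense_params(READOUT))
```

Under these assumptions every stack is internally consistent, which supports reading the 100 to 2 entry as the final readout layer.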