Lina Dong1, Xiaoyang Qu2, Yuan Zhao3, Binju Wang2.
Abstract
Accurate prediction of protein-ligand binding free energies is important in enzyme engineering and drug discovery. The molecular mechanics/generalized Born surface area (MM/GBSA) approach is widely used to estimate ligand-binding affinities, but its performance relies heavily on the accuracy of its energy components. A hybrid strategy combining MM/GBSA and machine learning (ML) has been developed to predict the binding free energies of protein-ligand systems. Based on the MM/GBSA energy terms and several features associated with protein-ligand interactions, our ML-based scoring function, GXLE, performs much better than MM/GBSA without entropy. In particular, the transferability of the GXLE model is highlighted by its good ranking power when predicting the binding affinities of different ligands, whether docked structures or crystal structures are used. The GXLE scoring function and its code are freely available and can be used to correct the binding free energies computed by MM/GBSA.
Year: 2021 PMID: 34901645 PMCID: PMC8655939 DOI: 10.1021/acsomega.1c04996
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Summary of the Data Sets
| data set | source | number |
|---|---|---|
| training set | PDBbind refined set (before 2018) | 3511 |
| validation set | PDBbind refined set (after 2018) | 301 |
| test set | CASF-2016 | 285 |
Figure 1. Performance of eight ML models on the validation set with three different feature sets (G, G + X, and G + X + L). (A) Pearson’s correlation coefficient (Rp). (B) Mean square error (MSE).
Figure 2. Pearson correlation coefficients and mean-squared error between the experimental data and the predicted binding free energies: (A) calculated by MM/GBSA, (B) X-Score, (C) AutoDock Vina, and (D) GXLE on the validation set.
Figure 3. Pearson correlation coefficients and mean-squared error between the experimental data and predicted binding free energies: (A) calculated by MM/GBSA, (B) X-Score, (C) AutoDock Vina, and (D) GXLE on the test set CASF-2016.
Figure 4. Performance of scoring functions on the CASF-2016 benchmark. (A) Scoring power measured by the Pearson correlation coefficient and (B) ranking power measured by the Spearman correlation coefficient. GXLE’s results are colored orange and those of the other scoring functions are blue.
Figure 5. Interaction between the ligand and the protein in the crystal structure 4D0L. (A) Interaction analysis of the binding pocket, brown for the hydrophobic part and blue for the hydrophilic part. (B) Interaction of the important residues, green for hydrogen bonding and pink for hydrophobicity. (C) Molecular formulas of six inhibitors. Ligand c is contained in the crystal structure (PDB id: 4D0L), while the other ligands (a, b, d, e, and f) are docked to the pocket of the target.
Evaluation of the Ranking Power of Selected Scoring Functions Using a Set of PI4KIIIβ Inhibitors
| ID | IC50 (nM) | GXLE | GBSA | X-score | D-score | PMF-score | G-score | ChemScore | Vina |
|---|---|---|---|---|---|---|---|---|---|
| a | 0.98 | –10.09 | –46.05 | 6.37 | –160.81 | –50.87 | –127.24 | –24.44 | –5.12 |
| b | 6.1 | –9.95 | –52.02 | 6.35 | –150.30 | –54.91 | –125.00 | –27.58 | –4.46 |
| c | 19 | –9.83 | –44.94 | 6.10 | –47.85 | 91.31 | –149.75 | –7.71 | –4.47 |
| d | 220 | –9.68 | –57.55 | 6.30 | –190.07 | –76.92 | –248.21 | –25.67 | –4.77 |
| e | 316 | –6.31 | –29.61 | 5.70 | –86.48 | –66.81 | –162.86 | –23.89 | –6.17 |
| f | 1250 | –3.80 | –20.36 | 5.00 | –83.48 | –31.16 | –125.92 | –12.76 | –2.87 |
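Ranking power on a ligand series like this one is commonly summarized with the Spearman rank correlation between the measured IC50 values and the predicted scores. The following is a minimal sketch using only the Python standard library and four of the score columns transcribed from the table above. The helper names (`average_ranks`, `pearson`, `spearman`) are illustrative and are not taken from the GXLE code. It assumes that lower IC50 means a more potent ligand and that, for energy-like scores, a more negative value means stronger predicted binding, so a correlation close to +1 indicates correct ranking. X-score values appear to follow the opposite convention (higher is better), in which case a correlation close to -1 would indicate correct ranking.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Ligands a-f, values transcribed from the table above.
ic50_nm = [0.98, 6.1, 19, 220, 316, 1250]
scores = {
    "GXLE": [-10.09, -9.95, -9.83, -9.68, -6.31, -3.80],
    "GBSA": [-46.05, -52.02, -44.94, -57.55, -29.61, -20.36],
    "X-score": [6.37, 6.35, 6.10, 6.30, 5.70, 5.00],
    "Vina": [-5.12, -4.46, -4.47, -4.77, -6.17, -2.87],
}

rho = {name: spearman(ic50_nm, values) for name, values in scores.items()}
for name, value in rho.items():
    print(f"{name:8s} {value:+.3f}")
```

Because the GXLE scores increase monotonically from ligand a to ligand f, their rank correlation with IC50 is exactly 1 under these assumptions. With only six ligands, any such coefficient is a coarse summary and should be read with caution.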
Performances of GXLE, MM/GBSA, X-Score, AutoDock Vina, and PSH-ML Evaluated against a Set Consisting of 10 Selected Diverse Biological Targets^a
| test target | number | GXLE | | MM/GBSA | | X-score | | AutoDock Vina | | PSH-ML | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BACE-1 | 73 | 0.833 | 0.746 | 0.836 | 0.783 | 0.810 | 0.711 | 0.660 | 0.710 | 0.836 | 0.752 |
| CHK1 | 15 | 0.777 | 0.737 | 0.687 | 0.293 | 0.736 | 0.564 | 0.876 | 0.787 | 0.924 | 0.755 |
| DPP4 | 13 | 0.467 | 0.399 | 0.197 | 0.250 | 0.456 | 0.285 | 0.191 | 0.316 | 0.711 | 0.301 |
| ER | 7 | 0.844 | 0.857 | 0.794 | 0.786 | 0.642 | 0.857 | 0.601 | 0.750 | 0.391 | 0.571 |
| LTA-4H | 22 | 0.770 | 0.873 | 0.700 | 0.749 | 0.774 | 0.859 | 0.606 | 0.613 | 0.769 | 0.894 |
| P38a | 18 | 0.793 | 0.780 | 0.689 | 0.706 | 0.814 | 0.851 | 0.765 | 0.764 | 0.849 | 0.685 |
| PPAR | 11 | 0.750 | 0.645 | 0.485 | 0.600 | 0.756 | 0.581 | 0.734 | 0.573 | 0.528 | 0.509 |
| PTP1B | 14 | 0.675 | 0.737 | –0.292 | –0.189 | 0.713 | 0.724 | –0.025 | 0.070 | 0.526 | 0.530 |
| thrombin | 15 | 0.840 | 0.679 | 0.926 | 0.821 | 0.823 | 0.707 | 0.621 | 0.546 | 0.874 | 0.732 |
| renin | 22 | 0.611 | 0.514 | –0.165 | 0.225 | 0.398 | 0.463 | 0.187 | 0.213 | 0.601 | 0.423 |
| average values | | 0.736 | 0.697 | 0.486 | 0.502 | 0.692 | 0.660 | 0.522 | 0.534 | 0.701 | 0.615 |
^a BACE-1, β-secretase 1; CHK1, serine/threonine-protein kinase Chk1; DPP4, dipeptidyl peptidase 4; ER, estrogen receptor; LTA-4H, leukotriene A-4 hydrolase; P38a, mitogen-activated protein kinase 14; PPAR-γ, peroxisome proliferator-activated receptor γ; and PTP1B, protein tyrosine phosphatase 1B.
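The "average values" row of the table above can be checked with simple arithmetic. The sketch below recomputes each column as an unweighted mean over the 10 targets, that is, without weighting by the number of ligands per target. The per-target values are transcribed from the table (two values per method, in table order); the variable names are illustrative only.

```python
# Per-target values in table order: two columns each for
# GXLE, MM/GBSA, X-score, AutoDock Vina, and PSH-ML.
table = {
    "BACE-1":   [0.833, 0.746, 0.836, 0.783, 0.810, 0.711, 0.660, 0.710, 0.836, 0.752],
    "CHK1":     [0.777, 0.737, 0.687, 0.293, 0.736, 0.564, 0.876, 0.787, 0.924, 0.755],
    "DPP4":     [0.467, 0.399, 0.197, 0.250, 0.456, 0.285, 0.191, 0.316, 0.711, 0.301],
    "ER":       [0.844, 0.857, 0.794, 0.786, 0.642, 0.857, 0.601, 0.750, 0.391, 0.571],
    "LTA-4H":   [0.770, 0.873, 0.700, 0.749, 0.774, 0.859, 0.606, 0.613, 0.769, 0.894],
    "P38a":     [0.793, 0.780, 0.689, 0.706, 0.814, 0.851, 0.765, 0.764, 0.849, 0.685],
    "PPAR":     [0.750, 0.645, 0.485, 0.600, 0.756, 0.581, 0.734, 0.573, 0.528, 0.509],
    "PTP1B":    [0.675, 0.737, -0.292, -0.189, 0.713, 0.724, -0.025, 0.070, 0.526, 0.530],
    "thrombin": [0.840, 0.679, 0.926, 0.821, 0.823, 0.707, 0.621, 0.546, 0.874, 0.732],
    "renin":    [0.611, 0.514, -0.165, 0.225, 0.398, 0.463, 0.187, 0.213, 0.601, 0.423],
}

# "average values" row as reported in the table.
reported_average = [0.736, 0.697, 0.486, 0.502, 0.692, 0.660, 0.522, 0.534, 0.701, 0.615]

# Transpose rows into columns, then take the plain mean of each column.
columns = list(zip(*table.values()))
unweighted = [round(sum(col) / len(col), 3) for col in columns]

matches = [abs(u - r) < 1e-9 for u, r in zip(unweighted, reported_average)]
print(unweighted)
print(all(matches))
```

If the check passes, the reported averages treat every target equally, so a target with 7 ligands (ER) contributes as much to the average as one with 73 (BACE-1).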