Pin Chen, Yaobin Ke, Yutong Lu, Yunfei Du, Jiahui Li, Hui Yan, Huiying Zhao, Yaoqi Zhou, Yuedong Yang.
Abstract
Performance of structure-based molecular docking depends largely on the accuracy of the scoring function. One important type of scoring function is the knowledge-based potential, derived from known three-dimensional structures of proteins and/or protein-ligand complexes. This study seeks to improve DLIGAND, a knowledge-based protein-ligand potential based on a distance-scaled finite ideal-gas reference (DFIRE) state, by expanding the representation of protein atoms from 13 mol2 atom types to 167 residue-specific atom types and by training on a recently updated dataset of 12,450 monomer protein chains. We found that the updated version, DLIGAND2, consistently improves on DLIGAND in predicting binding affinities from either native complex structures or docking-generated poses. More importantly, DLIGAND2 shows a 52% increase over DLIGAND in the enrichment factor for the top 1% of predictions on the DUD-E decoy set, and consistently improves on AutoDock Vina and other statistical energy functions in all three benchmark tests. We further found that DLIGAND2 outperforms the empirical and machine-learning methods compared for virtual screening on new targets that are not homologous to the DUD-E training set. Because DLIGAND2 performs best among parameter-free statistical potentials and is among the best in all performance measures, it should be useful for re-assessing the poses generated by docking software or as one term in other scoring functions. The program is available at https://github.com/sysu-yanglab/DLIGAND2.
Keywords: Docking; Knowledge-based energy function; Protein–ligand interaction
Year: 2019 PMID: 31392430 PMCID: PMC6686496 DOI: 10.1186/s13321-019-0373-4
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Atomic interaction potentials as a function of distance: a between ligand atom type S.3 and the main-chain O atom of ASP or ARG in DLIGAND2, or their common mol2 atom type (O.2) in DLIGAND; b between ligand atom type N.am and the CB atom of GLU or the CE atom of LYS in DLIGAND2, or their common mol2 atom type (C.3) in DLIGAND
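Distance-dependent potentials such as those in Fig. 1 can be illustrated with a minimal sketch of a DFIRE-style energy for one atom-type pair. It assumes the general DFIRE form, in which the observed pair count in each distance bin is compared with the count expected from the cutoff bin scaled by (r / r_cut)^alpha. The function name, the input layout, the unit energy scale `eta_kt`, and the handling of unobserved pairs are illustrative assumptions; this record does not state the parameters DLIGAND2 uses.

```python
import math

def dfire_potential(counts, r_mid, bin_width, alpha=1.61, eta_kt=1.0):
    """Return one DFIRE-style energy per distance bin for a single atom-type pair.

    counts[k]     observed number of pairs in bin k (the last bin is the cutoff bin)
    r_mid[k]      midpoint distance of bin k
    bin_width[k]  width of bin k
    alpha         distance-scaling exponent (1.61 in the original DFIRE work)
    eta_kt        overall energy scale (hypothetical unit value)
    """
    n_cut, r_cut, dr_cut = counts[-1], r_mid[-1], bin_width[-1]
    energies = []
    for n_obs, r, dr in zip(counts, r_mid, bin_width):
        if n_obs <= 0 or n_cut <= 0:
            # Assumption: unobserved pairs get zero energy in this sketch.
            energies.append(0.0)
            continue
        # Expected count for a finite ideal gas, scaled from the cutoff bin.
        n_expected = (r / r_cut) ** alpha * (dr / dr_cut) * n_cut
        energies.append(-eta_kt * math.log(n_obs / n_expected))
    return energies
```

A pair observed more often than the reference state predicts at a given distance receives a negative (favourable) energy, and the cutoff bin is zero by construction.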
Fig. 2 Comparison between predicted and experimentally measured protein–ligand binding free energies for the 195 complexes of the CASF-2013 test set: a DLIGAND, with a correlation coefficient of 0.526; b DLIGAND2, with a correlation coefficient of 0.572. The solid line is the regression fit
Comparisons of 30 scoring functions on the CASF-2013 dataset
| Function | PCC | RMSE | Description | Year |
|---|---|---|---|---|
| RF-Score-v2 | 0.803 (a) | 1.54 | Machine learning | 2014 |
| ID-Score | 0.753 (b) | 1.63 | Descriptor-based and empirical | 2013 |
| ΔvinaRF20 | 0.686 (c) | 1.64 | Machine learning | 2016 |
| AutoDockHybrid | 0.64 | n.a. | Force fields and machine learning | 2016 |
| X-ScoreHM | 0.614 | 1.78 | Empirical | 2002 |
| ΔSASA | 0.606 | 1.79 | Empirical | 2014 |
| ChemScore@SYBYL | 0.592 | 1.82 | Empirical | 1998 |
| ChemPLP@GOLD | 0.579 | 1.84 | Empirical | 2009 |
| DLIGAND2 | 0.572 | | | |
| SMoG2016 | 0.57 (d) | 1.68 | Knowledge-based and empirical | 2016 |
| PLP1@DS | 0.568 | 1.86 | Empirical | 2000 |
| AutoDock Vina | 0.563 (e) | 1.87 | Knowledge-based and empirical | 2010 |
| G-Score@SYBYL | 0.558 | 1.87 | Energy-based | 1997 |
| ASP@GOLD | 0.556 | 1.88 | Statistical potential | 2005 |
| ASE@MOE | 0.544 | 1.89 | Empirical | n.a. |
| ChemScore@GOLD | 0.536 | 1.90 | Empirical | 2003 |
| DLIGAND | 0.526 | | | |
| D-Score@SYBYL | 0.526 | 1.92 | Energy-based | 2001 |
| Alpha-HB@MOE | 0.511 | 1.94 | Empirical | n.a. |
| LUDI3@DS | 0.487 | 1.97 | Empirical | 1998 |
| GoldScore@GOLD | 0.483 | 1.97 | Energy-based | 1997 |
| Affinity-dG@MOE | 0.482 | 1.98 | Empirical | n.a. |
| LigScore2@DS | 0.456 | 2.02 | Empirical | 2005 |
| GlideScore-SP | 0.452 | 2.03 | Energy-based | 2006 |
| SMoG2001 | 0.418 | 3.39 | Knowledge-based | 2001 |
| Jain@DS | 0.408 | 2.05 | Empirical | 2006 |
| PMF@DS | 0.364 | 2.11 | Statistical potential | 2006 |
| GlideScore-XP | 0.277 | 2.18 | Energy-based | 2004 |
| London-dG@MOE | 0.242 | 2.19 | Empirical | n.a. |
| PMF@SYBYL | 0.221 | 2.20 | Statistical potential | 1999 |
The results for 23 scoring functions were collected from Li [5]; the results for RF-Score-v2, ID-Score, ΔvinaRF20 and SMoG2016 (labeled a, b, c and d) were collected from Ballester [57], Li [20], Wang [58] and Theau [31], respectively; and the results for DLIGAND2, AutoDock Vina and DLIGAND were calculated in this work with default options
n.a. not available
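The two columns of the table above, the Pearson correlation coefficient (PCC) and the root mean squared error (RMSE), can be sketched as follows. This is an illustration only: it assumes the RMSE is taken after a linear regression of the measured affinities on the predicted scores (as the regression line in Fig. 2 suggests), and the function name is hypothetical. The sign of the PCC depends on whether a scoring function reports more negative values for stronger binding.

```python
import math

def pearson_and_rmse(predicted, measured):
    """Return (PCC, RMSE) between predicted scores and measured affinities.

    The RMSE is computed on the residuals of a least-squares line that maps
    predicted scores onto measured affinities (an assumption of this sketch).
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    s_pm = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))

    pcc = s_pm / math.sqrt(s_pp * s_mm)

    # Least-squares line: measured ≈ slope * predicted + intercept.
    slope = s_pm / s_pp
    intercept = mean_m - slope * mean_p
    sq_err = sum((m - (slope * p + intercept)) ** 2
                 for p, m in zip(predicted, measured))
    rmse = math.sqrt(sq_err / n)
    return pcc, rmse
```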
Success rates for the evaluation of docking power ranked by top three poses
| Scoring function | Top pose (%) | Top two poses (%) | Top three poses (%) |
|---|---|---|---|
| ChemPLP@GOLD | 81 | 86.7 | 89.7 |
| ChemScore@GOLD | 77.9 | 83.1 | 88.2 |
| GlideScore-SP | 78.5 | 85.6 | 87.7 |
| ASP@GOLD | 71.8 | 81.5 | 87.2 |
| LigScore2@DS | 76.9 | 84.1 | 86.7 |
| PLP1@DS | 77.4 | 84.1 | 86.2 |
| PLP2@DS | 74.4 | 81.5 | 86.2 |
| Alpha-HB@MOE | 75.4 | 82.6 | 86.2 |
| GoldScore@GOLD | 71.3 | 81 | 85.6 |
| GlideScore-XP | 74.4 | 82.6 | 85.6 |
| LUDI1@DS | 59 | 75.4 | 83.1 |
| LUDI2@DS | 65.6 | 75.4 | 81.5 |
| LigScore1@DS | 65.1 | 74.9 | 81 |
| Affinity-dG@MOE | 63.1 | 74.9 | 81 |
| London-dG@MOE | 59.5 | 73.8 | 78.5 |
| X-ScoreHM | 61 | 73.3 | 77.9 |
| ChemScore@SYBYL | 59.5 | 69.2 | 75.4 |
| X-Score | 59.5 | 69.2 | 75.4 |
| | | | |
| X-ScoreHP | 54.4 | 67.7 | 73.8 |
| LUDI3@DS | 48.7 | 65.1 | 72.8 |
| GScore@SYBYL | 45.1 | 61.5 | 72.3 |
| X-ScoreHS | 54.4 | 66.7 | 72.3 |
| Jain@DS | 48.2 | 62.1 | 70.8 |
| PMF@SYBYL | 51.8 | 60 | 66.7 |
| PMF04@DS | 51.8 | 62.6 | 66.2 |
| ASE@MOE | 51.3 | 60 | 63.6 |
| PMF@DS | 44.1 | 52.3 | 60 |
| | | | |
| dSAS | 21.5 | 33.3 | 45.1 |
| DScore@SYBYL | 18.5 | 29.7 | 42.6 |
Results (excluding DLIGAND2 and DLIGAND) are cited from Li [5]. A prediction counts as a success when the RMSD between one of the best-scored binding poses and the native binding pose is less than 2.0 Å
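The success criterion in the note above can be sketched in a few lines. The sketch assumes each complex comes with a list of `(score, rmsd)` pairs for its candidate poses and that a lower score means a better pose; both the data layout and the function name are hypothetical.

```python
def docking_success_rate(complexes, top_n=1, rmsd_cutoff=2.0):
    """Percentage of complexes with a near-native pose among the top_n scored poses.

    complexes: list of pose lists; each pose is a (score, rmsd_to_native) tuple.
    Assumption: lower score = better pose.
    """
    hits = 0
    for poses in complexes:
        best = sorted(poses, key=lambda pose: pose[0])[:top_n]
        if any(rmsd < rmsd_cutoff for _, rmsd in best):
            hits += 1
    return 100.0 * hits / len(complexes)
```

Calling it with `top_n=1`, `2` and `3` corresponds to the three columns of the table.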
Success rates (%) for the evaluation of ranking power ranked by high-level results on optimized structures
| Score function | Crystal structures, high-level (%) | Crystal structures, low-level (%) | Optimized structures, high-level (%) | Optimized structures, low-level (%) |
|---|---|---|---|---|
| X-ScoreHM | 58.5 | 72.3 | 56.9 | 73.8 |
| ChemScore@SYBYL | 53.8 | 67.7 | 52.3 | 69.2 |
| D-Score@SYBYL | 49.2 | 63.1 | 52.3 | 63.1 |
| LigScore1@DS | 52.3 | 61.5 | 50.8 | 63.1 |
| ΔSAS | 49.2 | 67.7 | 50.8 | 69.2 |
| | | | | |
| PLP2@DS | 55.4 | 72.3 | 47.7 | 67.7 |
| Alpha-HB@MOE | 52.3 | 66.2 | 47.7 | 64.6 |
| ChemPLP@GOLD | 58.5 | 72.3 | 46.2 | 61.5 |
| G-Score@SYBYL | 52.3 | 72.3 | 46.2 | 61.5 |
| | | | | |
| PMF@DS | 49.2 | 66.2 | 46.2 | 63.1 |
| LUDI1@DS | 52.3 | 69.2 | 44.6 | 66.2 |
| Jain@DS | 41.5 | 58.5 | 44.6 | 63.1 |
| GoldScore@GOLD | 55.4 | 76.9 | 43.1 | 66.2 |
| ASE@MOE | 40 | 64.6 | 43.1 | 63.1 |
| London-dG@MOE | 43.1 | 60 | 40 | 60 |
| ASP@GOLD | 47.7 | 72.3 | 38.5 | 60 |
| Affinity-dG@MOE | 53.8 | 66.2 | 36.9 | 50.8 |
| ChemScore@GOLD | 46.2 | 63.1 | 33.8 | 53.8 |
| GlideScore-XP | 35.4 | 47.7 | 32.3 | 46.2 |
| PMF@SYBYL | 43.1 | 61.5 | 30.8 | 53.8 |
| GlideScore-SP | 43.1 | 56.9 | 21.5 | 38.5 |
Results (excluding DLIGAND2 and DLIGAND) are cited from Li [5]
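This record does not define the "high-level" and "low-level" success rates. The sketch below assumes the usual CASF-2013 reading: within each cluster of ligands bound to the same target, high-level success means the whole cluster is ranked in the correct affinity order, and low-level success means only the strongest binder is ranked first. The data layout, the function name, and the convention that lower scores and higher affinity values (for example pKd-like) are better are all assumptions.

```python
def ranking_power(clusters):
    """Return (high_level %, low_level %) ranking success over all clusters.

    clusters: list of clusters; each cluster is a list of (score, affinity)
    tuples for ligands of one target. Assumptions: lower score = predicted
    stronger binding; higher affinity value = measured stronger binding.
    """
    high = low = 0
    for cluster in clusters:
        idx = range(len(cluster))
        by_score = sorted(idx, key=lambda i: cluster[i][0])
        by_affinity = sorted(idx, key=lambda i: -cluster[i][1])
        if by_score == by_affinity:
            high += 1  # entire cluster ranked correctly
        if by_score[0] == by_affinity[0]:
            low += 1   # strongest binder ranked first
    n = len(clusters)
    return 100.0 * high / n, 100.0 * low / n
```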
Pearson correlation coefficients and root mean squared error between experimental binding affinity and binding affinity predicted by DLIGAND, DLIGAND2, and X-Score using docking poses generated by eight docking programs along with the results from the docking programs
| Docking program | PCC, self | PCC, DLIGAND | PCC, DLIGAND2 | PCC, X-Score | RMSE, self | RMSE, DLIGAND | RMSE, DLIGAND2 | RMSE, X-Score |
|---|---|---|---|---|---|---|---|---|
| AutoDock | 0.404 | 0.465 | 0.537 | 0.547 | 1.91 | 1.77 | 1.69 | 1.68 |
| AutoDock Vina | 0.501 | 0.459 | 0.519 | 0.536 | 1.74 | 1.78 | 1.72 | 1.69 |
| rDock | 0.102 | 0.463 | 0.535 | 0.507 | 2.01 | 1.78 | 1.70 | 1.76 |
| LeDock | 0.426 | 0.457 | 0.532 | 0.54 | 1.82 | 1.78 | 1.69 | 1.69 |
| UCSF DOCK | 0.195 | 0.427 | 0.498 | 0.488 | 1.97 | 1.81 | 1.74 | 1.76 |
| iDock | 0.485 | 0.461 | 0.522 | 0.54 | 1.75 | 1.78 | 1.71 | 1.69 |
| GalaxyDock | 0.487 | 0.464 | 0.537 | 0.532 | 1.75 | 1.78 | 1.69 | 1.71 |
| iGEMDOCK | 0.384 | 0.444 | 0.501 | 0.502 | 1.85 | 1.80 | 1.76 | 1.82 |
| Average | 0.373 | 0.455 | 0.523 | 0.524 | 1.85 | 1.79 | 1.71 | 1.73 |
The performance of six scoring functions on the DUD-E dataset
| Scoring function | LogAUC (%) | EF1% | EF5% | EF10% |
|---|---|---|---|---|
| DLIGAND2 | | | 3.31 | 2.55 |
| AutoDock Vina | 9.96 | 5.12 | | |
| ΔvinaRF20 | 9.00 | 6.38 | | 2.58 |
| DLIGAND | 7.61 | 4.40 | 2.74 | 2.23 |
| X-ScoreHM | 7.25 | 4.06 | 2.68 | 2.19 |
| ID-Score | 2.47 | 1.61 | 1.42 | 1.36 |
The highest value in each column is shown in italics
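The enrichment factors in the table above (EF1%, EF5%, EF10%) follow the standard definition: the fraction of actives among the top-ranked x% of compounds divided by the fraction of actives in the whole library. A minimal sketch follows, assuming that lower scores rank first and that the size of the selected set is rounded to the nearest whole compound; the function name and the rounding rule are illustrative choices and are not taken from the paper.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor for the top `fraction` of a ranked compound library.

    scores:    one score per compound (assumption: lower = better)
    is_active: one boolean per compound, True for known actives
    """
    n_total = len(scores)
    n_selected = max(1, round(n_total * fraction))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    actives_selected = sum(1 for _, active in ranked[:n_selected] if active)
    actives_total = sum(1 for active in is_active if active)
    return (actives_selected / n_selected) / (actives_total / n_total)
```

A value of 1 corresponds to random selection, so an EF1% of about 5 means the top 1% of the ranking holds roughly five times as many actives as chance would give.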
Enrichment factor values (EF1%) by DLIGAND2, AutoDock Vina, ΔvinaRF20, DLIGAND, X-ScoreHM, ID-Score on eight protein categories
| Category | DLIGAND2 | Vina | ΔvinaRF20 | DLIGAND | X-ScoreHM | ID-Score |
|---|---|---|---|---|---|---|
| Cytochrome P450 | | 1.93 | 3.77 | 5.10 | 3.59 | 0.56 |
| GPCR | | 2.48 | 4.17 | 3.79 | 1.56 | 1.49 |
| Ion channel | 0.84 | | 3.48 | 0.51 | 0.00 | 0.83 |
| Kinase | | 6.10 | 7.50 | 4.41 | 5.93 | 1.04 |
| Miscellaneous | 6.89 | 5.65 | | 6.63 | 4.46 | 5.36 |
| Nuclear receptors | 5.42 | 9.14 | | 4.57 | 4.01 | 1.69 |
| Other enzymes | 3.23 | 3.88 | | 2.52 | 2.64 | 1.21 |
| Protease | | 4.63 | 6.99 | 8.65 | 5.58 | 2.57 |
| Average | | 4.79 | 6.01 | 4.52 | 3.47 | 1.84 |
Italic fonts highlight the highest value in each category
Fig. 3 Receiver operating characteristic (ROC) curves for the target protein PTN1 by different scoring methods
Fig. 4 The average EF1% on the DEKOIS 2.0 benchmark as a function of the number of targets, sorted by increasing sequence identity (seqid, computed by blastpgp) to the DUD-E targets