Hossam M Ashtawy, Nihar R Mahapatra.
Abstract
BACKGROUND: Accurately predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive power has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we present novel SFs employing a large ensemble of neural networks (NN) in conjunction with a diverse set of physicochemical and geometrical features characterizing protein-ligand complexes to predict binding affinity.
Year: 2015 PMID: 25734685 PMCID: PMC4347622 DOI: 10.1186/1471-2105-16-S4-S8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Multi-layered perceptron (feed-forward neural network) used to predict the binding affinity of a protein-ligand complex characterized by a set of features. This model represents SNN-Score, the single neural network scoring function we build.
Figure 2. BgN-Score: ensemble neural network SF using the bagging approach.
Figure 3. BsN-Score: ensemble neural network SF using the boosting approach.
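The bagging idea behind an ensemble such as BgN-Score can be illustrated with a minimal sketch: each network is trained on a bootstrap resample of the training complexes, and the ensemble prediction is the average of the member predictions. Everything below is an assumption made for illustration and is not the authors' implementation: the tiny one-hidden-layer network, its hyperparameters, the helper names (`train_mlp`, `bagging_ensemble`, `predict`), and the synthetic two-feature data.

```python
import math
import random

def train_mlp(X, y, hidden=4, epochs=100, lr=0.05, rng=None):
    """Train a one-hidden-layer tanh network with plain SGD (illustrative only)."""
    rng = rng or random.Random(0)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
             for j in range(hidden)]
        return h, sum(W2[j] * h[j] for j in range(hidden)) + b2

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x)
            err = out - t  # derivative of 0.5 * (out - t)^2 with respect to out
            for j in range(hidden):
                # Backpropagate through tanh using W2[j] before it is updated.
                grad_h = err * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(d):
                    W1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err
    return lambda x: forward(x)[1]

def bagging_ensemble(X, y, n_models=5, seed=0):
    """Train n_models networks, each on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(train_mlp([X[i] for i in idx], [y[i] for i in idx], rng=rng))
    return models

def predict(models, x):
    """Ensemble prediction: the mean of the individual network outputs."""
    return sum(m(x) for m in models) / len(models)

# Synthetic stand-in data (hypothetical): two features, one affinity-like target.
data_rng = random.Random(42)
X_demo = [[data_rng.random(), data_rng.random()] for _ in range(30)]
y_demo = [x[0] - 0.5 * x[1] + data_rng.gauss(0, 0.05) for x in X_demo]

models = bagging_ensemble(X_demo, y_demo)
print(predict(models, [0.5, 0.5]))
```

A boosting ensemble would differ in that its members are trained sequentially, each one focusing on the errors left by the previous ones, rather than independently on resampled data.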
The 16 conventional scoring functions and the molecular docking software in which they are implemented
| Scoring function (SF) | Software | Type of SF | Reference |
|---|---|---|---|
| Jain | Discovery Studio | Empirical | [ |
| LigScore | Discovery Studio | Knowledge-based | [ |
| Ludi | Discovery Studio | Empirical | [ |
| PLP | Discovery Studio | Empirical | [ |
| PMF | Discovery Studio | Knowledge-based | [ |
| ChemScore | SYBYL | Empirical | [ |
| D-Score | SYBYL | Force-field based | [ |
| G-Score | SYBYL | Force-field based | [ |
| F-Score | SYBYL | Empirical | [ |
| PMF-Score¹ | SYBYL | Knowledge-based | [ |
| ASP | GOLD | Empirical | [ |
| ChemScore² | GOLD | Empirical | [ |
| GoldScore³ | GOLD | Force-field based | [ |
| GlideScore | Glide | Empirical | [ |
| DrugScore | | Knowledge-based | [ |
| X-Score | | Empirical | [ |
¹ SYBYL's implementation of PMF
² GOLD's implementation of ChemScore
³ GOLD's implementation of G-Score
Comparison of the scoring powers of BsN-Score, BgN-Score, SNN-Score, Random Forests (RF), and 16 conventional SFs on the core test set Cr
| Scoring function | N¹ | Rp² | Rs³ | SD⁴ | RMSEtest⁵ | RMSEtrain⁶ |
|---|---|---|---|---|---|---|
| BsN-Score::XARG | 195 | 0.816 | 0.799 | 1.38 | 1.386 | 1.366 |
| BgN-Score::XARG | 195 | 0.804 | 0.798 | 1.42 | 1.449 | 1.403 |
| RF::XARG | 195 | 0.801 | 0.790 | 1.43 | 1.498 | 1.442 |
| SNN-Score::X | 195 | 0.675 | 0.685 | 1.76 | 1.760 | 1.704 |
| X-Score::HMScore | 195 | 0.644 | 0.705 | 1.83 | 1.865 | 1.730 |
| DrugScoreCSD | 195 | 0.569 | 0.627 | 1.96 | - | - |
| SYBYL::ChemScore | 195 | 0.555 | 0.585 | 1.98 | - | - |
| DS::PLP1 | 195 | 0.545 | 0.588 | 2.00 | - | - |
| GOLD::ASP | 195 | 0.534 | 0.577 | 2.02 | - | - |
| SYBYL::G-Score | 195 | 0.492 | 0.536 | 2.08 | - | - |
| DS::LUDI3 | 195 | 0.487 | 0.478 | 2.09 | - | - |
| DS::LigScore2 | 193 | 0.464 | 0.507 | 2.12 | - | - |
| GlideScore-XP | 178 | 0.457 | 0.435 | 2.14 | - | - |
| DS::PMF | 193 | 0.445 | 0.448 | 2.14 | - | - |
| GOLD::ChemScore | 178 | 0.441 | 0.452 | 2.15 | - | - |
| SYBYL::D-Score | 195 | 0.392 | 0.447 | 2.19 | - | - |
| DS::Jain | 189 | 0.316 | 0.346 | 2.24 | - | - |
| GOLD::GoldScore | 169 | 0.295 | 0.322 | 2.29 | - | - |
| SYBYL::PMF-Score | 190 | 0.268 | 0.273 | 2.29 | - | - |
| SYBYL::F-Score | 185 | 0.216 | 0.243 | 2.35 | - | - |
¹ Number of complexes in Cr with positive (favorable) binding scores using this SF [14].
² Rp is the Pearson correlation coefficient between predicted and measured binding affinity (BA) values of complexes in Cr.
³ Rs is the Spearman correlation coefficient between predicted and measured BA values of complexes in Cr.
⁴ SD is the standard deviation of errors between predicted and measured BA values of complexes in Cr based on Equation 3 in [15].
⁵ RMSEtest is the root-mean-square error between predicted and measured BA values of the test complexes in Cr. Test RMSE is not available for conventional SFs, except for X-Score::HMScore, which we re-constructed.
⁶ RMSEtrain is the root-mean-square error between predicted and measured BA values of out-of-sample complexes in the training set Pr. Training RMSE is not available for conventional SFs, except for X-Score::HMScore, which we re-constructed.
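The four scoring-power statistics defined in these footnotes can be computed as in the following sketch. It uses only the Python standard library, and the five predicted and measured values are made-up numbers for illustration. One assumption is worth flagging: Equation 3 of [15] is not reproduced in this record, so `sd_regression` assumes the common definition in which SD is the residual standard deviation of a linear fit of measured on predicted values, with an N - 1 denominator.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(pred, meas):
    """Root-mean-square error between predicted and measured values."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / len(pred))

def sd_regression(pred, meas):
    """Residual SD of the fit meas = a + b * pred (assumed form, N - 1 denominator)."""
    n = len(pred)
    mp, mm = sum(pred) / n, sum(meas) / n
    b = (sum((p - mp) * (m - mm) for p, m in zip(pred, meas))
         / sum((p - mp) ** 2 for p in pred))
    a = mm - b * mp
    rss = sum((m - (a + b * p)) ** 2 for p, m in zip(pred, meas))
    return math.sqrt(rss / (n - 1))

# Made-up binding affinities (for example, in pKd units), not values from the study.
predicted = [5.1, 6.0, 7.2, 4.3, 8.0]
measured = [5.0, 6.5, 7.0, 4.0, 8.4]

print(pearson(predicted, measured))
print(spearman(predicted, measured))
print(sd_regression(predicted, measured))
print(rmse(predicted, measured))
```

Under the assumed definition, SD measures scatter around the best-fit line and so ignores any systematic offset or scaling of the scores, whereas RMSE penalizes them; this is one reason the two columns can differ for the same SF.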
Comparison of the scoring powers of BsN-Score, BgN-Score, SNN-Score, Random Forests (RF), and the four top-performing conventional SFs on four protein-family-specific test sets.
| HIV protease ( | | | | | | Trypsin ( | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Scoring function | Rp¹ | Rs² | SD³ | RMSE⁴ | D⁵ | Scoring function | Rp¹ | Rs² | SD³ | RMSE⁴ | D⁵ |
| X-Score::HPScore | 0.341 | 0.339 | 1.54 | 1.509 | N | SYBYL::ChemScore | 0.829 | 0.773 | 0.95 | - | U |
| BsN-Score::XARG | 0.290 | 0.230 | 1.56 | 1.705 | Y | DS::Ludi2 | 0.823 | 0.791 | 0.96 | - | U |
| RF::XARG | 0.289 | 0.219 | 1.519 | 1.719 | Y | X-Score::HSScore | 0.817 | 0.824 | 0.97 | 1.401 | N |
| BgN-Score::XARG | 0.287 | 0.209 | 1.58 | 1.860 | Y | DS::PLP2 | 0.797 | 0.774 | 1.02 | - | U |
| SYBYL::ChemScore | 0.255 | 0.228 | 1.58 | - | U | BgN-Score::XAR | 0.776 | 0.719 | 1.06 | 1.070 | Y |
| DrugScore::PairSurf | 0.225 | 0.170 | 1.59 | - | U | RF::XAR | 0.774 | 0.753 | 1.07 | 1.133 | Y |
| DS::PMF04 | 0.183 | 0.200 | 1.61 | - | U | BsN-Score::AR | 0.766 | 0.709 | 1.08 | 1.119 | Y |
| SNN-Score::X | 0.039 | 0.048 | 1.64 | 2.255 | Y | SNN-Score::X | 0.735 | 0.672 | 1.14 | 1.209 | Y |
| RF::XARG | 0.964 | 0.975 | 0.44 | 0.588 | N | BsN-Score::XARG | 0.937 | 0.920 | 0.59 | 0.678 | N |
| BsN-Score::XARG | 0.918 | 0.922 | 0.64 | 0.710 | N | RF::XARG | 0.934 | 0.08 | 0.60 | 0.657 | N |
| BgN-Score::XARG | 0.848 | 0.808 | 1.02 | 1.024 | N | BgN-Score::XARG | 0.892 | 0.848 | 0.76 | 0.805 | N |
| SNN-Score::X | 0.748 | 0.716 | 1.08 | 1.085 | N | SNN-Score::X | 0.829 | 0.789 | 0.940 | 0.957 | N |
| DS::PLP2 | 0.800 | 0.772 | 0.84 | - | U | SNN-Score::X | 0.756 | 0.704 | 1.38 | 1.433 | Y |
| SYBYL::G-Score | 0.706 | 0.646 | 0.99 | - | U | BgN-Score::XARG | 0.722 | 0.726 | 1.48 | 1.552 | Y |
| SYBYL::ChemScore | 0.699 | 0.631 | 1.00 | - | U | BsN-Score::XARG | 0.699 | 0.637 | 1.58 | 1.603 | Y |
| BsN-Score::X | 0.674 | 0.434 | 1.03 | 3.418 | Y | RF::XARG | 0.697 | 0.693 | 1.52 | 1.674 | Y |
| SNN-Score::X | 0.631 | 0.451 | 1.08 | 3.561 | Y | DS::PLP1 | 0.667 | 0.672 | 1.58 | - | U |
| SYBYL::PMF-Score | 0.627 | 0.618 | 1.09 | - | U | SYBYL::G-Score | 0.667 | 0.626 | 1.58 | - | U |
| BgN-Score::XA | 0.625 | 0.423 | 1.09 | 3.642 | Y | X-Score::HSScore | 0.666 | 0.586 | 1.58 | 1.737 | N |
| RF::XARG | 0.601 | 0.374 | 1.11 | 3.393 | Y | DrugScore::Pair | 0.651 | 0.622 | 1.61 | - | U |
| BsN-Score::XARG | 0.948 | 0.921 | 0.44 | 1.004 | N | BsN-Score::XARG | 0.913 | 0.938 | 0.86 | 1.155 | N |
| RF::XARG | 0.910 | 0.860 | 0.57 | 1.140 | N | RF::XARG | 0.910 | 0.934 | 0.86 | 1.125 | N |
| BgN-Score::XARG | 0.884 | 0.766 | 0.65 | 1.320 | N | BgN-Score::XARG | 0.858 | 0.876 | 1.08 | 1.320 | N |
| SNN-Score::X | 0.652 | 0.310 | 1.05 | 1.687 | N | SNN-Score::X | 0.761 | 0.756 | 1.37 | 1.374 | N |
¹ Rp is the Pearson correlation coefficient between predicted and measured binding affinity (BA) values of complexes in this protein-family-specific test set.
² Rs is the Spearman correlation coefficient between predicted and measured BA values of complexes in this protein-family-specific test set.
³ SD is the standard deviation of errors between predicted and measured BA values of complexes in this protein-family-specific test set.
⁴ RMSE is the root-mean-square error between predicted and measured BA values of the test complexes in this protein-family-specific test set. Test RMSE is not available for conventional SFs, except for X-Score, which we re-constructed. Training RMSE is not reported in this table because the values are very similar to RMSEtrain in Table 1, owing to the overlap between the training data sets of the two experiments.
⁵ D indicates whether the test set complexes are disjoint from (D = Y) or overlap with (D = N) the training set complexes for NN and RF models. Any overlap between the training and test data of the conventional SFs is unknown to us (D = U).
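The D flag amounts to a set comparison between the identifiers of the training and test complexes. The sketch below shows one way to compute it and to derive a disjoint training set; the function names and the complex identifiers are placeholders invented for illustration, not data from the study.

```python
def disjoint_flag(train_ids, test_ids):
    """Return 'Y' if no test complex appears in the training set, else 'N'."""
    return "Y" if set(train_ids).isdisjoint(test_ids) else "N"

def make_disjoint(train_ids, test_ids):
    """Drop from the training list every complex that also appears in the test set."""
    test = set(test_ids)
    return [c for c in train_ids if c not in test]

# Placeholder identifiers (hypothetical), standing in for complex IDs.
train = ["1abc", "2xyz", "3def", "4ghi"]
family_test = ["2xyz", "5jkl"]

print(disjoint_flag(train, family_test))                              # N
print(disjoint_flag(make_disjoint(train, family_test), family_test))  # Y
```

For the conventional SFs the training complexes are not known, so no such check is possible, which is what D = U records.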