Fatemah Alghamedy, Jeevith Bopaiah, Derek Jones, Xiaofei Zhang, Heidi L Weiss, Sally R Ellingson.
Abstract
Drug discovery is an expensive, lengthy, and sometimes dangerous process. The ability to make accurate computational predictions of drug binding would greatly improve the cost-effectiveness and safety of drug discovery and development. This study combines ensemble docking, in which multiple protein conformations extracted from a molecular dynamics trajectory are used for docking calculations, with additional biomedical data sources and machine learning algorithms to improve the prediction of drug binding. We found that these methods greatly increase the accuracy of classifying a compound as active or decoy compared with docking scores alone. The best results come from an individual protein conformation whose binding features correlate well with the active vs. decoy classification, in which case we achieve over 99% accuracy. The ability to confidently make accurate predictions of drug binding would allow for computational polypharmacological networks with insights into side-effect prediction, drug repurposing, and drug efficacy.
Year: 2018 PMID: 29888034 PMCID: PMC5961778
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
The total number of features in each category for the base dataset, the best 100 features for the whole dataset, and the best features for each protein conformation (Conf 1-7).
| Category | Base | Best | Conf 1 | Conf 2 | Conf 3 | Conf 4 | Conf 5 | Conf 6 | Conf 7 |
|---|---|---|---|---|---|---|---|---|---|
| Drugs | 2,792 | 99 | 10 | 54 | 20 | 7 | 5 | 10 | 7 |
| Protein | 103 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Binding | 109 | 1 | 0 | 1 | 0 | 8 | 0 | 0 | 8 |
| Total | 3,004 | 100 | 10 | 55 | 20 | 15 | 5 | 10 | 15 |
Metrics used in this study
| Name | Definition | Formula |
|---|---|---|
| Youden’s index | Performance of a dichotomous test. A value of 1 indicates a perfect test and a value of 0 indicates a test with no discriminating ability. | |
| AUC | Area under the curve for the ROC curve. The probability | |
| Accuracy | Percent of correct predictions | |
| F1 | Harmonic mean of precision and recall | |
| Precision | Positive predictive value | |
| Recall | True positive rate | |
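
For reference, the threshold-based metrics in this table are conventionally defined from confusion-matrix counts. The block below is a sketch of those textbook definitions, not a transcription of the study's own formulas; the symbols $TP$, $TN$, $FP$, and $FN$ (true/false positives and negatives, with "active" as the positive class) are an assumed notation. AUC is omitted because it summarizes the whole ROC curve rather than a single threshold.

```latex
\begin{aligned}
J &= \text{sensitivity} + \text{specificity} - 1
   = \frac{TP}{TP + FN} + \frac{TN}{TN + FP} - 1 \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```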
Youden’s index for each model.
| Value | Model 1, Conf 1 | Model 1, Conf 2 | Model 1, Conf 3 | Model 1, Conf 4 | Model 1, Conf 5 | Model 1, Conf 6 | Model 1, Conf 7 | Model 2 | Model 3 |
|---|---|---|---|---|---|---|---|---|---|
| Docking score cut-off | -9.2 | -8.2 | -8.5 | -2.8 | -8.3 | -8 | 37.4 | -8.1 | -8.9 |
| Best J value | 0.1600 | 0.0829 | 0.0832 | 0.1052 | 0.1142 | 0.2103 | 0.0125 | 0.0874 | 0.1283 |
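
A docking score cut-off with the best J value can be found by scanning candidate thresholds and keeping the one that maximizes Youden's index. The sketch below illustrates that idea under stated assumptions; it is not the study's code. The function name `youden_best_cutoff`, the rule that a compound is predicted active when its score is at or below the cut-off (more negative meaning stronger predicted binding), and the toy scores are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def youden_best_cutoff(
    scores: Sequence[float], is_active: Sequence[bool]
) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Assumption: a compound is predicted active when its docking score
    is less than or equal to the cut-off.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_active = sum(1 for a in is_active if a)
    n_decoy = len(is_active) - n_active
    if n_active == 0 or n_decoy == 0:
        raise ValueError("need at least one active and one decoy")

    best_cutoff: Optional[float] = None
    best_j = float("-inf")
    # Every distinct observed score is a candidate cut-off.
    for cutoff in sorted(set(scores)):
        # True positives: actives scored at or below the cut-off.
        tp = sum(1 for s, a in zip(scores, is_active) if a and s <= cutoff)
        # True negatives: decoys scored above the cut-off.
        tn = sum(1 for s, a in zip(scores, is_active) if not a and s > cutoff)
        j = tp / n_active + tn / n_decoy - 1.0
        if j > best_j:  # ties keep the most negative cut-off
            best_cutoff, best_j = cutoff, j
    assert best_cutoff is not None
    return best_cutoff, best_j


# Toy data (hypothetical, not from the study): two actives, two decoys.
toy_scores = [-10.0, -9.0, -6.0, -5.0]
toy_labels = [True, True, False, False]
print(youden_best_cutoff(toy_scores, toy_labels))
```

The exhaustive scan is quadratic in the number of compounds, which keeps the logic easy to follow; for large screening libraries a single pass over sorted scores would give the same result more efficiently.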
Model 1. Conf = conformation; MD = molecular docking; ML = machine learning model; Acc. = Accuracy; Prec. = Precision
Model 2. MD = molecular docking; ML = machine learning model; Acc. = Accuracy; Prec. = Precision
Models 2 and 3 using the best 100 features
Model 3. MD = molecular docking; ML = machine learning model; Acc. = Accuracy; Prec. = Precision