Thierry Hanser1, Fabian P Steinmetz2, Jeffrey Plante3, Friedrich Rippmann2, Mireille Krier2.
Abstract
In this paper, we explore how combining different in silico prediction approaches and data sources affects the predictive performance of the resulting system. We use inhibition of the hERG ion channel as the endpoint for this study, as it constitutes a key safety concern in drug development and a potential cause of attrition. We show that combining data sources can improve the relevance of the training set with regard to the target chemical space, leading to improved performance. Similarly, we demonstrate that combining multiple statistical models with each other, and with expert systems, can lead to positive synergistic effects when the confidence in the predictions of the merged systems is taken into account. The best combinations analyzed display good hERG predictivity. Finally, this work demonstrates the suitability of the SOHN methodology for building models for receptor-based endpoints such as hERG inhibition when appropriate pharmacophoric descriptors are used.
Keywords: Combining models; Expert system; Machine learning; Public–private data sharing; QSAR; SOHN; Temporal study; hERG
Year: 2019 PMID: 30712151 PMCID: PMC6689868 DOI: 10.1186/s13321-019-0334-y
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Idealised illustration of a QT interval prolongation as measured by electrocardiography
Fig. 2 Overview of the evaluation experiments. The evaluation was divided into 5 different experiments addressing different ways of combining the prediction models and the training data sources
Derek Nexus performance against Merck test data
| Expert model | ACC | BA | SENS | SPEC | PPV | NPV | MCC | KAPPA |
|---|---|---|---|---|---|---|---|---|
| Derek Nexus | 0.75 | 0.63 | 0.43 | 0.84 | 0.44 | 0.84 | 0.27 | 0.27 |
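The column headers in this and the following tables are not defined in this record. Assuming they carry their usual meanings for a binary classifier (accuracy, balanced accuracy, sensitivity, specificity, positive and negative predictive value, Matthews correlation coefficient, and presumably Cohen's kappa), a minimal sketch of how each is derived from confusion-matrix counts could look like this. The function name `binary_metrics` and the example counts are hypothetical and not taken from the paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the eight table metrics from confusion-matrix counts.

    Assumes every row and column total of the confusion matrix is
    non-zero; otherwise some of the ratios below are undefined.
    """
    n = tp + fp + tn + fn
    sens = tp / (tp + fn)      # sensitivity: positives correctly found
    spec = tn / (tn + fp)      # specificity: negatives correctly found
    ppv = tp / (tp + fp)       # positive predictive value (precision)
    npv = tn / (tn + fn)       # negative predictive value
    acc = (tp + tn) / n        # overall accuracy
    ba = (sens + spec) / 2     # balanced accuracy
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (acc - p_chance) / (1 - p_chance)
    return {
        "ACC": acc, "BA": ba, "SENS": sens, "SPEC": spec,
        "PPV": ppv, "NPV": npv, "MCC": mcc, "KAPPA": kappa,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

Because balanced accuracy is the mean of sensitivity and specificity, it can be cross-checked directly against those two columns in any row.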
RF and SOHN trained with ChEMBL data against the Merck test data
| Statistical models (public) | ACC | BA | SENS | SPEC | PPV | NPV | MCC | KAPPA |
|---|---|---|---|---|---|---|---|---|
| RFChEMBL | 0.74 | 0.57 | 0.26 | 0.88 | 0.37 | 0.81 | 0.16 | 0.15 |
| SOHNChEMBL | 0.73 | 0.66 | 0.54 | 0.78 | 0.42 | 0.86 | 0.30 | 0.29 |
RF and SOHN trained with Merck data, performance against the Merck test data
| Statistical models (private) | ACC | BA | SENS | SPEC | PPV | NPV | MCC | KAPPA |
|---|---|---|---|---|---|---|---|---|
| RFMerck | 0.82 | 0.73 | 0.57 | 0.89 | 0.61 | 0.88 | 0.48 | 0.47 |
| SOHNMerck | 0.82 | 0.75 | 0.63 | 0.87 | 0.59 | 0.89 | 0.49 | 0.48 |
Fig. 3 Individual models (Derek Nexus, RF and SOHN) trained on public and private data, respectively. The performance gain from using private rather than public data is clear in these results. The expert system Derek Nexus is used as a baseline
Combining public and private data for the RF model
| Public + private (RF) | ACC | BA | SENS | SPEC | PPV | NPV | MCC | KAPPA |
|---|---|---|---|---|---|---|---|---|
| 5:0 (100% public) | 0.74 | 0.57 | 0.26 | 0.88 | 0.38 | 0.81 | 0.16 | 0.15 |
| 5:1 | 0.83 | 0.73 | 0.56 | 0.91 | 0.63 | 0.88 | 0.49 | 0.49 |
| 5:2 | 0.83 | 0.73 | 0.56 | 0.91 | 0.63 | 0.88 | 0.49 | 0.49 |
| 5:3 | 0.82 | 0.72 | 0.53 | 0.90 | 0.61 | 0.87 | 0.45 | 0.45 |
| 5:4 | 0.82 | 0.72 | 0.53 | 0.91 | 0.62 | 0.87 | 0.46 | 0.47 |
| 5:5 | 0.83 | 0.73 | 0.56 | 0.91 | 0.64 | 0.88 | 0.49 | 0.49 |
| 4:5 | 0.82 | 0.72 | 0.54 | 0.90 | 0.61 | 0.87 | 0.47 | 0.46 |
| 3:5 | 0.83 | 0.73 | 0.54 | 0.91 | 0.62 | 0.88 | 0.47 | 0.47 |
| 2:5 | 0.84 | 0.75 | 0.59 | 0.91 | 0.65 | 0.89 | 0.52 | 0.51 |
| 1:5 | 0.82 | 0.71 | 0.51 | 0.91 | 0.62 | 0.87 | 0.46 | 0.45 |
| 0:5 (100% private) | 0.82 | 0.73 | 0.57 | 0.89 | 0.61 | 0.88 | 0.48 | 0.47 |
Combining public and private data for the SOHN model
| Public + private (SOHN) | ACC | BA | SENS | SPEC | PPV | NPV | MCC | KAPPA |
|---|---|---|---|---|---|---|---|---|
| 5:0 (100% public) | 0.73 | 0.66 | 0.54 | 0.78 | 0.42 | 0.86 | 0.30 | 0.29 |
| 5:1 | 0.84 | 0.78 | 0.67 | 0.88 | 0.62 | 0.90 | 0.54 | 0.53 |
| 5:2 | 0.83 | 0.78 | 0.69 | 0.87 | 0.60 | 0.91 | 0.53 | 0.53 |
| 5:3 | 0.83 | 0.76 | 0.64 | 0.86 | 0.61 | 0.90 | 0.51 | 0.51 |
| 5:4 | 0.81 | 0.73 | 0.59 | 0.87 | 0.57 | 0.88 | 0.46 | 0.45 |
| 5:5 | 0.83 | 0.74 | 0.59 | 0.89 | 0.61 | 0.88 | 0.49 | 0.48 |
| 4:5 | 0.83 | 0.76 | 0.64 | 0.88 | 0.60 | 0.90 | 0.51 | 0.51 |
| 3:5 | 0.83 | 0.76 | 0.64 | 0.88 | 0.60 | 0.90 | 0.51 | 0.51 |
| 2:5 | 0.83 | 0.76 | 0.63 | 0.89 | 0.62 | 0.89 | 0.52 | 0.52 |
| 1:5 | 0.84 | 0.77 | 0.64 | 0.89 | 0.64 | 0.90 | 0.53 | 0.53 |
| 0:5 (100% private) | 0.82 | 0.75 | 0.63 | 0.87 | 0.59 | 0.89 | 0.49 | 0.48 |
Fig. 4 Combining public and private data with different weights
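This record does not say how the public:private ratios (5:1, 2:5, and so on) were applied during training. One possible reading, shown here purely as an assumption, is that each compound receives a sample weight so that the total weight of the public set and the total weight of the private set stand in the stated ratio. The function name `ratio_sample_weights` and its parameters are hypothetical.

```python
def ratio_sample_weights(n_public: int, n_private: int,
                         public_part: float, private_part: float) -> list:
    """Return per-sample weights, public samples first, then private.

    The weights are scaled so that
    (total public weight) : (total private weight)
    equals public_part : private_part, and all weights sum to 1.
    This is an illustrative scheme, not the paper's documented method.
    """
    if public_part < 0 or private_part < 0:
        raise ValueError("ratio parts must be non-negative")
    total = public_part + private_part
    if total == 0:
        raise ValueError("at least one ratio part must be positive")
    # Share of the total weight given to each source, split evenly
    # across the samples of that source.
    w_public = (public_part / total) / n_public if n_public else 0.0
    w_private = (private_part / total) / n_private if n_private else 0.0
    return [w_public] * n_public + [w_private] * n_private


# Hypothetical set sizes: 4 public and 2 private compounds at ratio 2:5.
weights = ratio_sample_weights(n_public=4, n_private=2,
                               public_part=2, private_part=5)
public_total = sum(weights[:4])
private_total = sum(weights[4:])
```

Weights of this kind could, for example, be passed as the `sample_weight` argument when fitting a scikit-learn random forest. A ratio of 5:0 gives every private compound a weight of zero, matching the "100% public" row.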
Combining the statistical model RF with the expert system Derek Nexus
| RF + Derek Nexus | ACC | BA | SENS | SPEC | PPV | NPV | MCC | KAPPA |
|---|---|---|---|---|---|---|---|---|
| Pure statistical (RF) | 0.82 | 0.73 | 0.57 | 0.89 | 0.61 | 0.88 | 0.48 | 0.47 |
| Confidence < 0.6 | 0.83 | 0.73 | 0.56 | 0.90 | 0.62 | 0.88 | 0.48 | 0.47 |
| Confidence < 0.7 | 0.83 | 0.73 | 0.54 | 0.92 | 0.64 | 0.88 | 0.49 | 0.49 |
| Confidence < 0.8 | 0.82 | 0.72 | 0.53 | 0.90 | 0.61 | 0.87 | 0.45 | 0.45 |
| Confidence < 0.9 | 0.78 | 0.67 | 0.47 | 0.86 | 0.49 | 0.85 | 0.34 | 0.34 |
| Pure expert (Derek Nexus) | 0.75 | 0.64 | 0.43 | 0.80 | 0.40 | 0.84 | 0.27 | 0.27 |
Combining the statistical model SOHN with the expert system Derek Nexus
| SOHN + Derek Nexus | ACC | BA | SENS | SPEC | PPV | NPV | MCC | KAPPA |
|---|---|---|---|---|---|---|---|---|
| Pure statistical (SOHN) | 0.82 | 0.75 | 0.63 | 0.87 | 0.59 | 0.89 | 0.49 | 0.48 |
| Confidence < 0.6 | 0.83 | 0.75 | 0.60 | 0.90 | 0.63 | 0.89 | 0.51 | 0.51 |
| Confidence < 0.7 | 0.84 | 0.75 | 0.59 | 0.91 | 0.64 | 0.89 | 0.51 | 0.51 |
| Confidence < 0.8 | 0.83 | 0.71 | 0.49 | 0.93 | 0.67 | 0.86 | 0.47 | 0.46 |
| Confidence < 0.9 | 0.81 | 0.69 | 0.47 | 0.91 | 0.60 | 0.86 | 0.42 | 0.41 |
| Pure expert (Derek Nexus) | 0.75 | 0.64 | 0.43 | 0.80 | 0.40 | 0.84 | 0.27 | 0.27 |
Fig. 5 Combining statistical models with the expert model
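The "Confidence < t" rows in the two tables above suggest a fallback rule: the statistical prediction is kept unless its confidence falls below the threshold t, in which case the expert system's call is used instead. That reading is an assumption, as the exact rule is not given in this record. A minimal sketch, with the hypothetical function name `combine_with_expert` and invented example predictions:

```python
def combine_with_expert(stat_label: bool, stat_confidence: float,
                        expert_label: bool, threshold: float) -> bool:
    """Pick between a statistical and an expert prediction.

    Assumed rule: if the statistical model's confidence is below the
    threshold, defer to the expert system; otherwise keep the
    statistical prediction.
    """
    if not 0.0 <= stat_confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if stat_confidence < threshold:
        return expert_label
    return stat_label


# Hypothetical (statistical label, confidence, expert label) triples.
predictions = [
    (True, 0.95, False),   # confident statistical call is kept
    (False, 0.65, True),   # below threshold, expert call is used
    (True, 0.55, False),   # below threshold, expert call is used
]
combined = [combine_with_expert(s, c, e, threshold=0.7)
            for s, c, e in predictions]
```

Under this rule, a threshold of 0 reproduces the pure statistical model and a threshold above 1 reproduces the pure expert system. This is consistent with the tables, where higher thresholds move the results toward the Derek Nexus row.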
Combining statistical models
| Models | ACC | BA | SENS | SPEC | PPV | NPV | MCC | KAPPA |
|---|---|---|---|---|---|---|---|---|
| RFMerck+ChEMBL (2:5) | 0.84 | 0.75 | 0.59 | 0.91 | 0.65 | 0.89 | 0.52 | 0.51 |
| SOHNMerck+ChEMBL (2:5) | 0.83 | 0.76 | 0.63 | 0.89 | 0.62 | 0.89 | 0.52 | 0.52 |
| RF-SOHNMerck+ChEMBL (2:5) | 0.85 | 0.78 | 0.66 | 0.91 | 0.67 | 0.90 | 0.57 | 0.57 |
Fig. 6 Combining statistical models
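The abstract states that models are merged by taking the confidence of their predictions into account, but this record does not give the exact rule used for the RF-SOHN combination. One simple rule that fits that description, offered only as an assumption, is to keep whichever model's prediction carries the higher confidence. The function name `most_confident` and the tie-breaking behaviour are hypothetical.

```python
def most_confident(predictions: list) -> tuple:
    """Return the (label, confidence) pair with the highest confidence.

    `predictions` holds one (label, confidence) pair per model.
    Ties go to the model listed first, because max() returns the
    first maximal element it encounters.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return max(predictions, key=lambda pair: pair[1])


# Hypothetical outputs of two models for the same compound.
rf_prediction = (True, 0.60)
sohn_prediction = (False, 0.80)
label, confidence = most_confident([rf_prediction, sohn_prediction])
```

Other merging rules, such as confidence-weighted voting, would fit the abstract's description equally well. The sketch only shows the simplest variant.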
Combining all the models
| Models | ACC | BA | SENS | SPEC | PPV | NPV | MCC | KAPPA |
|---|---|---|---|---|---|---|---|---|
| Pure expert (Derek) | 0.75 | 0.64 | 0.43 | 0.80 | 0.40 | 0.84 | 0.27 | 0.27 |
| RF-SOHNMerck+ChEMBL (2:5) | 0.85 | 0.78 | 0.66 | 0.91 | 0.67 | 0.90 | 0.57 | 0.57 |
| RF-SOHNMerck+ChEMBL+Derek (2:5) | 0.86 | 0.77 | 0.61 | 0.93 | 0.72 | 0.89 | 0.58 | 0.57 |
Fig. 7 Combining all the models and data sources into a single prediction system. We observe a slight gain in performance, mainly driven by trading sensitivity for precision