| Literature DB >> 33430964 |
Noé Sturm, Andreas Mayr, Thanh Le Van, Vladimir Chupakhin, Hugo Ceulemans, Joerg Wegner, Jose-Felipe Golib-Dzib, Nina Jeliazkova, Yves Vandriessche, Stanislav Böhm, Vojtech Cima, Jan Martinovic, Nigel Greene, Tom Vander Aa, Thomas J Ashby, Sepp Hochreiter, Ola Engkvist, Günter Klambauer, Hongming Chen.
Abstract
Artificial intelligence (AI) is undergoing a revolution thanks to breakthroughs of machine learning algorithms in computer vision, speech recognition, natural language processing and generative modelling. Recent work on publicly available pharmaceutical data showed that AI methods are highly promising for drug target prediction. However, the quality of public data may differ from that of industry data, owing to measurements reported by different labs, different measurement techniques, fewer samples, and less diverse and specialized assays. As part of a European-funded project (ExCAPE) that brought together expertise from the pharmaceutical industry, machine learning, and high-performance computing, we investigated how well machine learning models obtained from public data transfer to internal pharmaceutical industry data. Our results show that machine learning models trained on public data can indeed maintain their predictive power to a large degree when applied to industry data. Moreover, we observed that deep learning models outperformed comparable models trained with other machine learning algorithms when applied to internal pharmaceutical company datasets. To our knowledge, this is the first large-scale study evaluating the potential of machine learning, and especially deep learning, directly in industry-scale settings, and the first to investigate the transferability of publicly learned target prediction models to industrial bioactivity prediction pipelines.
Keywords: Big data; ChEMBL; Cheminformatics; Deep learning; Machine learning; Prospective evaluation; PubChem; QSAR; Retrospective evaluation; Structure-based virtual screening
Year: 2020 PMID: 33430964 PMCID: PMC7169028 DOI: 10.1186/s13321-020-00428-5
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Compound distributions across targets for the AstraZeneca and Janssen datasets. In the lower panel, the y-axis shows the number of compounds per target (x-axis), with targets sorted by their number of compounds. The horizontal dashed line marks the maximum number of compounds per target observed in the datasets. In the upper panel, each point represents the activity ratio of a target, with targets sorted as in the lower panel; the curve is a smoothed average
Contingency tables ExCAPE-ML vs. company datasets
| ExCAPE-ML | AstraZeneca: Active | AstraZeneca: Inactive | AstraZeneca: Sum | Janssen: Active | Janssen: Inactive | Janssen: Sum |
|---|---|---|---|---|---|---|
| Active | 0.598 | 0.043 | 0.64 | 0.422 | 0.030 | 0.45 |
| Inactive | 0.077 | 0.283 | 0.36 | 0.013 | 0.535 | 0.55 |
| Sum | 0.67 | 0.33 | 1.00 | 0.44 | 0.56 | 1.00 |
Contingency tables for labels of ExCAPE-ML compounds that are also available in the company datasets. Values are relative frequencies of target-compound pairs labelled active/inactive by ExCAPE-ML (rows) and by the respective company dataset (columns)
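From the relative frequencies above, the overall label agreement and a chance-corrected Cohen's kappa between ExCAPE-ML and the company datasets can be derived. The sketch below computes these for illustration; the kappa values are not figures reported in the paper, only arithmetic on the table entries:

```python
# Agreement between ExCAPE-ML labels and company labels, computed from the
# relative frequencies in the contingency table above. The kappa values
# derived here are illustrative; the paper reports only the raw frequencies.

def agreement_stats(aa, ai, ia, ii):
    """aa/ai/ia/ii: relative frequencies for the (ExCAPE, company) label pairs
    (active, active), (active, inactive), (inactive, active), (inactive, inactive)."""
    p_obs = aa + ii                                   # observed agreement
    excape_active, excape_inactive = aa + ai, ia + ii # ExCAPE-ML marginals
    comp_active, comp_inactive = aa + ia, ai + ii     # company marginals
    p_exp = excape_active * comp_active + excape_inactive * comp_inactive
    kappa = (p_obs - p_exp) / (1.0 - p_exp)           # Cohen's kappa
    return p_obs, kappa

az = agreement_stats(0.598, 0.043, 0.077, 0.283)  # AstraZeneca cells
jj = agreement_stats(0.422, 0.030, 0.013, 0.535)  # Janssen cells
print(f"AstraZeneca: agreement={az[0]:.3f}, kappa={az[1]:.3f}")
print(f"Janssen:     agreement={jj[0]:.3f}, kappa={jj[1]:.3f}")
```

On these frequencies the observed agreement is 0.881 (AstraZeneca) and 0.957 (Janssen), with kappa around 0.74 and 0.91 respectively, indicating substantial label agreement between the public and company data.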
Fig. 2 Prospective and retrospective model evaluation with three folds (A, B, C). White and colored circles represent clusters of compounds; circle size indicates cluster size (number of compounds). Colors indicate the folds to which clusters are assigned; white circles mark folds not used for building or evaluating a particular model. In Stage 1, the inner loop, one of the three folds serves as the training set, one as the test set, and the third is kept aside as a test set for Stage 2a, the outer loop. The respective inner folds used in Stage 1 are merged into training sets for Stage 2a, the retrospective model-testing stage. All folds together are merged into the training set for the full-scale models of Stage 2b, the prospective model-testing stage. Stage 1 provides the hyperparameter selection for both Stage 2a and Stage 2b. For retrospective model testing (Stage 2a), the two respective performance values (Perf X.Y) are averaged in each outer-loop iteration, and the hyperparameter setting with the best ROC-AUC is used to train the Stage 2a models, which finally yield the retrospective performance values (Perf X). For prospective model testing (Stage 2b), all six inner-loop performance values (Perf X.Y) are averaged for hyperparameter selection. A final model trained on all data is then evaluated on the AstraZeneca and Janssen industrial datasets
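The fold bookkeeping of this scheme can be sketched in a few lines, assuming three folds named A, B, C. The compound clustering step and the actual models are omitted; this only enumerates the train/test combinations described in the caption:

```python
# Minimal sketch of the 3-fold nested evaluation scheme described in Fig. 2.
# Fold contents, models, and score functions are placeholders; the paper
# clusters compounds before assigning them to folds, which is omitted here.
from itertools import permutations

folds = ["A", "B", "C"]

# Stage 1 (inner loop): every ordered (train, test) pair of folds, with the
# third fold held out for Stage 2a -- six combinations in total, producing
# the six Perf X.Y values.
inner_runs = []
for train, test in permutations(folds, 2):
    held_out = next(f for f in folds if f not in (train, test))
    inner_runs.append({"train": train, "test": test, "held_out": held_out})

# Stage 2a (outer loop, retrospective): for each held-out fold, train on the
# two remaining folds merged; hyperparameters come from the matching inner runs.
outer_runs = [
    {"train": [f for f in folds if f != held_out], "test": held_out}
    for held_out in folds
]

# Stage 2b (prospective): one full-scale model trained on all three folds,
# with hyperparameters chosen by averaging all six inner-loop scores;
# evaluation then happens on external (industry) data.
full_model_training_set = folds
print(len(inner_runs), len(outer_runs))
```

This makes the counts in the caption explicit: six inner-loop evaluations feed hyperparameter selection, three outer-loop models give the retrospective scores, and one full model is used prospectively.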
Retrospective evaluation performance
| Algorithm | ROC-AUC | Kappa | F1 | Wilcoxon test | Sign test |
|---|---|---|---|---|---|
| DNN | 0.83 ± 0.11 | 0.39 ± 0.23 | 0.58 ± 0.30 | | |
| XGB | 0.81 ± 0.11 | 0.36 ± 0.21 | 0.56 ± 0.30 | 8.01e−48 | 7.90e−50 |
| MF | 0.78 ± 0.11 | 0.15 ± 0.20 | 0.45 ± 0.34 | 1.80e−71 | 1.14e−84 |
Retrospective evaluation performance values (mean ± standard deviation across targets) for the considered machine learning algorithms, together with p-values of Wilcoxon and sign tests comparing the ROC-AUC of the row algorithm with that of DNNs (null hypothesis: DNN AUC < row AUC)
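Paired tests of this kind can be run with scipy. The sketch below uses synthetic per-target AUCs, since the study's per-target values are not reproduced here, and implements the sign test as a binomial test on the number of DNN wins, which is one plausible formulation (the paper does not spell out its exact procedure):

```python
# Paired per-target comparison of two algorithms' AUCs, sketched with scipy
# on synthetic data. Null hypothesis: DNN AUC is not greater than the
# comparison algorithm's AUC.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_targets = 200
auc_xgb = rng.uniform(0.6, 0.95, n_targets)                      # synthetic baseline AUCs
auc_dnn = np.clip(auc_xgb + rng.normal(0.02, 0.03, n_targets), 0, 1)

# Wilcoxon signed-rank test on the paired per-target differences
w_res = stats.wilcoxon(auc_dnn, auc_xgb, alternative="greater")

# Sign test: binomial test on the number of targets where the DNN wins
wins = int(np.sum(auc_dnn > auc_xgb))
ties = int(np.sum(auc_dnn == auc_xgb))
s_res = stats.binomtest(wins, n_targets - ties, 0.5, alternative="greater")

print(f"Wilcoxon p = {w_res.pvalue:.2e}, sign test p = {s_res.pvalue:.2e}")
```

With hundreds of targets, even a small but consistent per-target advantage yields the extremely small p-values seen in the table.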
Fig. 3 ROC-AUC, Kappa and F1-score performances of DNN, XGB and MF models on the ExCAPE-ML dataset. Violin plots show the distribution of individual target performances; boxplots show the interquartile range, with the median drawn as a transparent segment and the average as a black horizontal segment
Prospective evaluation performance
| Algorithm | Metric | AstraZeneca | Janssen |
|---|---|---|---|
| DNN | ROC-AUC | 0.70 ± 0.14 | 0.66 ± 0.16 |
| DNN | Kappa | 0.20 ± 0.19 | 0.15 ± 0.19 |
| DNN | F1 | 0.42 ± 0.26 | 0.43 ± 0.24 |
| XGB | ROC-AUC | 0.67 ± 0.15 | 0.64 ± 0.15 |
| XGB | Kappa | 0.13 ± 0.17 | 0.10 ± 0.17 |
| XGB | F1 | 0.35 ± 0.25 | 0.39 ± 0.27 |
| MF | ROC-AUC | 0.68 ± 0.15 | 0.64 ± 0.15 |
| MF | Kappa | 0.12 ± 0.15 | 0.09 ± 0.14 |
| MF | F1 | 0.35 ± 0.29 | 0.38 ± 0.30 |
Prospective evaluation performance values (mean and standard deviation across targets) for the considered machine learning algorithms
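For reference, the three reported metrics can be computed per target with scikit-learn. The toy labels and scores below are illustrative, not data from the study:

```python
# The three metrics reported above, computed with scikit-learn on a toy
# set of predictions for a single hypothetical target (illustrative only).
import numpy as np
from sklearn.metrics import roc_auc_score, cohen_kappa_score, f1_score

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])  # active = 1
y_score = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1, 0.55, 0.45])
y_pred = (y_score >= 0.5).astype(int)              # threshold at 0.5

auc = roc_auc_score(y_true, y_score)       # ranking quality, threshold-free
kappa = cohen_kappa_score(y_true, y_pred)  # agreement corrected for chance
f1 = f1_score(y_true, y_pred)              # harmonic mean of precision/recall

print(f"ROC-AUC={auc:.2f}, Kappa={kappa:.2f}, F1={f1:.2f}")
```

Note that ROC-AUC is computed from the raw scores while Kappa and F1 require a classification threshold, which is why the three metrics can rank algorithms differently.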
Fig. 4 ROC-AUC, Kappa and F1-score performances of DNN, XGB and MF models on the AstraZeneca and Janssen datasets. Violin plots show the distribution of individual target performances; boxplots show the interquartile range, with the median drawn as a transparent segment and the average as a black horizontal segment
Prospective Performance Comparison
| Algorithm | Wilcoxon test: AstraZeneca | Wilcoxon test: Janssen | Sign test: AstraZeneca | Sign test: Janssen |
|---|---|---|---|---|
| XGB | 7.17e−15 | 2.62e−15 | 1.27e−14 | 6.84e−14 |
| MF | 4.01e−09 | 3.23e−11 | 2.98e−08 | 3.04e−11 |
p-values comparing the prospective ROC-AUC evaluation performance of the row algorithm with that of DNNs using two statistical tests (null hypothesis: DNN AUC < row AUC)
Fig. 5Target family breakdown for ExCAPE-ML, AstraZeneca and Janssen predictions. The numbers on the horizontal axis represent the number of targets corresponding to the target family and dataset. The vertical axis represents the AUC value
Target Family Performance Comparison
| Target family | ExCAPE-ML: Targets w./sz. | ExCAPE-ML: p-value | AstraZeneca: Targets w./sz. | AstraZeneca: p-value | Janssen: Targets w./sz. | Janssen: p-value |
|---|---|---|---|---|---|---|
| Oxidoreductase | 25/37 | | 7/18 | 3.91e−01 | 16/32 | 3.77e−02 |
| Transferase | | | | | | |
| Hydrolase | 63/92 | | 21/47 | 6.96e−02 | 36/77 | |
| Lyase | 4/8 | 2.59e−01 | 0/0 | | 4/8 | 2.59e−01 |
| Isomerase | 4/6 | 1.00e−01 | 0/1 | 1.00e+00 | 4/6 | 1.00e−01 |
| GPCR Fam. A | 76/94 | | 34/70 | | 66/93 | |
| GPCR Fam. B | 3/5 | 2.10e−01 | 0/5 | 1.00e+00 | 2/5 | 5.39e−01 |
| GPCR Fam. C | 3/5 | 2.10e−01 | 1/5 | 8.68e−01 | 5/5 | |
| Nuclear Hormone Receptor | 14/20 | | 9/17 | 7.55e−02 | 13/19 | |
| Reader | 2/7 | 7.37e−01 | 1/1 | 3.33e−01 | 2/2 | 1.11e−01 |
| Eraser | 7/9 | | 5/7 | 4.53e−02 | 4/7 | 1.73e−01 |
| Writer | 3/3 | 3.70e−02 | 1/1 | 3.33e−01 | 0/1 | 1.00e+00 |
| Ligand-gated | 3/6 | 3.20e−01 | 1/4 | 8.02e−01 | 2/6 | 6.49e−01 |
| Voltage-gated | 6/12 | 1.78e−01 | 3/9 | 6.23e−01 | 6/12 | 1.78e−01 |
| Primary active | 3/4 | 1.11e−01 | 0/2 | 1.00e+00 | 2/4 | 4.07e−01 |
| Electrochem. | 7/10 | 1.97e−02 | 2/8 | 8.05e−01 | 3/8 | 5.32e−01 |
| Overall | 364/476 | | 162/338 | | 225/438 | |
Number of targets won (w.) by DNNs from a target family, size of target family (sz.) and p-values of binomial tests for each target family class, with the null hypothesis that the probability of being the best method for a certain target is less than 1/3 for DNNs when compared to XGB and MF
p-values below the significance threshold of 0.01 are in italics
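The per-family p-values follow a one-sided binomial test with success probability 1/3 (with three algorithms, a method that is no better than its competitors wins a given target with probability 1/3). A short scipy sketch reproduces two entries of the table above:

```python
# Per-family binomial test for the number of targets won by DNNs.
# H0: the probability of a DNN win is at most 1/3 (three competing methods).
from scipy.stats import binomtest

def family_pvalue(wins, size):
    """One-sided binomial test of wins out of size, success probability 1/3."""
    return binomtest(wins, size, 1/3, alternative="greater").pvalue

# Reproduces two entries of the table above:
print(f"Writer, ExCAPE-ML (3/3):        p = {family_pvalue(3, 3):.2e}")   # 3.70e-02
print(f"Electrochem., ExCAPE-ML (7/10): p = {family_pvalue(7, 10):.2e}")  # 1.97e-02
```

The 3/3 case makes the null explicit: winning all three targets by chance has probability (1/3)³ ≈ 0.037, exactly the tabulated value.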