Thomas Blaschke, Christian Feldmann, Jürgen Bajorath.
Abstract
Compounds with the ability to interact with multiple targets, also called promiscuous compounds, provide the basis for polypharmacological drug discovery. In recent years, a plethora of structural analogs with different promiscuity has been identified. Nevertheless, the molecular origins of promiscuity remain to be elucidated. In this study, we systematically extracted different structural analogs with varying promiscuity using the matched molecular pair (MMP) formalism from public biological screening and medicinal chemistry data. Care was taken to eliminate all compounds with potential false-positive activity annotations from the analysis. Promiscuity predictions were then attempted at the level of compound pairs representing promiscuity cliffs (PCs; formed by analogs with large promiscuity differences) and corresponding non-PC MMPs (analog pairs without significant promiscuity differences). To address this prediction task, different machine learning models were generated and the results were compared with single compound predictions. PCs encoding promiscuity differences were found to contain more structure-promiscuity relationship information than sets of individual promiscuous compounds. In addition, feature analysis was carried out revealing key contributions to the correct prediction of PCs and non-PC MMPs via machine learning.
Keywords: deep learning; machine learning; multitarget activity; polypharmacology; promiscuity; structure-promiscuity relationships
Year: 2020 PMID: 32881355 PMCID: PMC7816223 DOI: 10.1002/minf.202000196
Source DB: PubMed Journal: Mol Inform ISSN: 1868-1743 Impact factor: 3.353
Reported are the numbers of PCs for different ΔPD thresholds.
| Data source | ΔPD | PCs | Cpds[a] | Non‐prom.[b] | Prom.[c] |
|---|---|---|---|---|---|
| PubChem screening compounds | 2–3 | 273,862 | 152,971 | 111,399 | 41,572 |
| | 4–5 | 57,608 | 46,302 | 35,751 | 10,551 |
| | 6–7 | 18,759 | 17,268 | 13,640 | 3628 |
| | 8–9 | 6403 | 6762 | 5364 | 1398 |
| | ≥10 | 5750 | 5694 | 4706 | 998 |
| Kinase inhibitors | ≥10 | 5615 | 4187 | 3588 | 599 |
[a] Number of compounds
[b] Number of non‐promiscuous compounds
[c] Number of promiscuous compounds
Figure 1. The network‐based selection strategy is schematically illustrated. Red nodes represent promiscuous compounds in PCs with ΔPD ≥ 10, gray nodes represent non‐promiscuous PC partners (PD ≤ 1), and edges represent the formation of pairwise PCs.
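The pair definition in the caption above can be illustrated with a short sketch. This is not the authors' implementation. It assumes that PD is a per-compound promiscuity value (for example, a count of targets), that a hypothetical `pd_values` dictionary maps compound identifiers to PD, and that `mmps` lists analog pairs already identified as MMPs. The ΔPD ≥ 10 and PD ≤ 1 thresholds follow the caption; all names and the toy data are made up for illustration.

```python
from collections import defaultdict


def find_promiscuity_cliffs(pd_values, mmps, min_delta=10, max_partner_pd=1):
    """Return the MMPs that qualify as PCs as (promiscuous, partner, delta_pd)."""
    cliffs = []
    for a, b in mmps:
        # Order the pair so that `high` is the more promiscuous analog.
        high, low = (a, b) if pd_values[a] >= pd_values[b] else (b, a)
        delta = pd_values[high] - pd_values[low]
        # A PC needs a large PD difference and a non-promiscuous partner.
        if delta >= min_delta and pd_values[low] <= max_partner_pd:
            cliffs.append((high, low, delta))
    return cliffs


def build_pc_network(cliffs):
    """Build an adjacency map: nodes are compounds, edges are pairwise PCs."""
    network = defaultdict(set)
    for high, low, _ in cliffs:
        network[high].add(low)
        network[low].add(high)
    return dict(network)


# Hypothetical toy data (not taken from the study).
pd_values = {"c1": 12, "c2": 1, "c3": 0, "c4": 5, "c5": 11}
mmps = [("c1", "c2"), ("c3", "c1"), ("c4", "c1"), ("c5", "c2"), ("c4", "c3")]

cliffs = find_promiscuity_cliffs(pd_values, mmps)
network = build_pc_network(cliffs)
```

In this toy example, `c1` and `c5` would be the red (promiscuous) nodes and `c2` and `c3` the gray (non-promiscuous) nodes; `c4` forms no PC and is therefore absent from the network.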
Figure 2. In (a), the fingerprint representation of a PC is shown. The MMPFP consists of the core fingerprint (common bits) and two substituent fingerprints (unique bits for compounds 1 and 2, respectively). (b) illustrates pair‐based similarity assessment combining contributions from individual fingerprint components.
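The three-part representation described in the caption can be sketched with plain set operations. The sketch assumes each compound is given as a set of "on" bit positions from some structural fingerprint. The caption does not specify how the component contributions are combined, so the unweighted mean of component-wise Tanimoto similarities used in `pair_similarity` is an assumption made purely for illustration, as are all function names.

```python
def mmp_fingerprint(bits_1, bits_2):
    """Split two compound fingerprints into (core, substituent 1, substituent 2)."""
    core = bits_1 & bits_2    # bits common to both compounds
    sub_1 = bits_1 - bits_2   # bits unique to compound 1
    sub_2 = bits_2 - bits_1   # bits unique to compound 2
    return core, sub_1, sub_2


def tanimoto(a, b):
    """Tanimoto coefficient of two bit sets."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def pair_similarity(mmpfp_a, mmpfp_b):
    """Assumed combination rule: mean Tanimoto over the three components."""
    sims = [tanimoto(x, y) for x, y in zip(mmpfp_a, mmpfp_b)]
    return sum(sims) / len(sims)


# Hypothetical bit sets for two compound pairs (not real fingerprints).
pair_a = mmp_fingerprint({1, 2, 3, 7}, {1, 2, 3, 9})
pair_b = mmp_fingerprint({1, 2, 4, 7}, {1, 2, 4, 8})
similarity = pair_similarity(pair_a, pair_b)
```

Note that this comparison depends on which analog is labeled compound 1 and which compound 2, so a consistent ordering rule would be needed in practice.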
Reported are the ACC, MCC, F1, and ROC AUC values using 1‐NN, k‐NN, SVM, RF, and DNN models predicting PC formation.
| Data source | Metric | 1‐NN | k‐NN | SVM | RF | DNN |
|---|---|---|---|---|---|---|
| PubChem screening compounds | ACC | 0.71 | 0.70 | 0.78 | 0.78 | 0.76 |
| | MCC | 0.42 | 0.40 | 0.56 | 0.56 | 0.53 |
| | F1 | 0.72 | 0.71 | 0.77 | 0.77 | 0.75 |
| | ROC AUC | 0.71 | 0.77 | 0.85 | 0.86 | 0.84 |
| Kinase inhibitors | ACC | 0.64 | 0.64 | 0.64 | 0.71 | 0.70 |
| | MCC | 0.28 | 0.28 | 0.31 | 0.42 | 0.40 |
| | F1 | 0.66 | 0.66 | 0.56 | 0.71 | 0.66 |
| | ROC AUC | 0.64 | 0.64 | 0.72 | 0.78 | 0.77 |
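For readers less familiar with the reported metrics, the following sketch shows the standard definitions of accuracy (ACC), the Matthews correlation coefficient (MCC), and F1 computed from binary confusion-matrix counts, plus a pairwise-comparison form of ROC AUC. It is a generic illustration and not the evaluation code used in the study; the function names and example numbers are hypothetical.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute ACC, MCC, and F1 from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "mcc": mcc, "f1": f1}


def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The loop is quadratic, which is fine
    for a small illustration but not for large data sets.
    """
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores (not taken from the tables).
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

On a balanced test set, an ACC of 0.5 and an MCC of 0 correspond to random guessing, which is why MCC values in the tables look lower than the ACC values for the same model.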
Reported are the ACC, MCC, F1, and ROC AUC values using 1‐NN, k‐NN, SVM, RF, and DNN models predicting promiscuity at the level of single compounds.
| Data source | Metric | 1‐NN | k‐NN | SVM | RF | DNN |
|---|---|---|---|---|---|---|
| PubChem screening compounds | ACC | 0.63 | 0.65 | 0.68 | 0.71 | 0.69 |
| | MCC | 0.27 | 0.31 | 0.37 | 0.42 | 0.39 |
| | F1 | 0.65 | 0.66 | 0.69 | 0.72 | 0.67 |
| | ROC AUC | 0.63 | 0.71 | 0.76 | 0.79 | 0.77 |
| Kinase inhibitors | ACC | 0.59 | 0.60 | 0.62 | 0.62 | 0.58 |
| | MCC | 0.19 | 0.22 | 0.24 | 0.25 | 0.16 |
| | F1 | 0.63 | 0.66 | 0.65 | 0.65 | 0.60 |
| | ROC AUC | 0.59 | 0.63 | 0.64 | 0.64 | 0.60 |
Figure 3. In (a), ROC curves are shown for predictions of PCs with ΔPD ≥ 10. In (b), prediction accuracy is compared for distinguishing between PCs with varying ΔPD values and non‐PC MMPs.
Figure 4. The graphs report the cumulative fraction of corresponding subsets of eliminated MMPFP features (top: positive features; bottom: negative features). Eliminated features were mapped to the different CFP, S1FP, and S2FP components of MMPFPs.