Lindsey Burggraaff, Herman W T van Vlijmen, Adriaan P IJzerman, Gerard J P van Westen.
Abstract
The development of drugs is often hampered by off-target interactions that lead to adverse effects. Computational methods to assess the selectivity of ligands are therefore of high interest. Currently, selectivity is often deduced from bioactivity predictions of a ligand for multiple targets, each made by an individual machine learning model. Here we show that modeling selectivity directly, by using the affinity difference between two drug targets as the output value, leads to more accurate selectivity predictions. We test multiple approaches, including classification and regression with different definitions of the selectivity classes, on a dataset of ligands for the A1 and A2A adenosine receptors. Finally, we present a regression model that predicts selectivity between these two drug targets by training directly on the difference in bioactivity, thereby modeling the selectivity-window. The model performed well in fivefold cross-validation: ROC 0.88 ± 0.04 for A1AR-selective and 0.80 ± 0.07 for A2AAR-selective compounds. To increase the accuracy of this selectivity model further, inactive compounds were identified and removed prior to selectivity prediction by a combination of statistical models and structure-based docking. As a result, selectivity between the A1 and A2A adenosine receptors was predicted effectively using the selectivity-window model. The approach presented here can be readily applied to other selectivity cases.
Keywords: A1 adenosine receptor; A2A adenosine receptor; GPCR; Modeling; QSAR; Selectivity; Selectivity window
Year: 2020; PMID: 33431012; PMCID: PMC7222572; DOI: 10.1186/s13321-020-00438-3
Source DB: PubMed; Journal: J Cheminform; ISSN: 1758-2946; Impact factor: 5.514
Dataset characteristics: number of compounds, distribution of activities and chemical similarity within the dataset
| Dataset | Description | Total number of compounds | Protein | pActivity (min) | pActivity (median) | pActivity (max) | Mean similarity (Tanimoto FCFP4) |
|---|---|---|---|---|---|---|---|
| A1AR bioactivity dataset | Compounds with measured activity for the A1AR | 2774 | A1AR | 4.05 | 6.43 | 10.52 | 0.18 |
| A2AAR bioactivity dataset | Compounds with measured activity for the A2AAR | 3123 | A2AAR | 4.00 | 6.91 | 11.00 | 0.18 |
| A1AR/A2AAR dataset | Compounds with measured activity for both the A1AR and A2AAR with classification A1AR/A2AAR/dual/non-binder | 1106 | A1AR | 4.33 | 6.52 | 10.52 | 0.19 |
| | | | A2AAR | 4.30 | 6.83 | 10.80 | |
| Semi-selective compounds | Compounds with measured activity for both the A1AR and A2AAR that do not fit into a class | 855 | A1AR | 4.37 | 6.37 | 10.02 | 0.20 |
| | | | A2AAR | 4.34 | 7.09 | 10.38 | |
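The similarity column above is a Tanimoto coefficient on FCFP4 fingerprints. As a minimal sketch of how such a value can be computed, the example below treats each fingerprint as a set of on-bit indices and averages the coefficient over all compound pairs. The toy fingerprints, the function names, and the choice of an all-pairs mean are assumptions made for illustration; the source does not state how the mean was aggregated.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def mean_pairwise_tanimoto(fingerprints: list) -> float:
    """Average Tanimoto coefficient over all unordered pairs of fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical on-bit sets standing in for real FCFP4 fingerprints.
toy_fps = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8}]
print(round(mean_pairwise_tanimoto(toy_fps), 3))  # 0.111
```

In practice the on-bit sets would come from a cheminformatics toolkit; only the set arithmetic is shown here.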
Fig. 1 Distribution of activities of the different compound classes for the A1AR (a) and A2AAR (b). The compounds from the A1AR and A2AAR bioactivity datasets that did not belong to any of the classes of the A1AR/A2AAR dataset are called “single points”.
Performance of selectivity classification models
| Classification model | Class | MCC | Sensitivity | Specificity | PPV | NPV | ROC |
|---|---|---|---|---|---|---|---|
| QSAR 2-class | A1AR | 0.40 ± 0.13 | 0.62 ± 0.16 | 0.76 ± 0.11 | 0.57 ± 0.12 | 0.86 ± 0.07 | 0.87 ± 0.06 |
| | A2AAR | 0.40 ± 0.13 | 0.76 ± 0.11 | 0.62 ± 0.16 | 0.86 ± 0.07 | 0.57 ± 0.12 | 0.87 ± 0.06 |
| QSAR 3-class | A1AR | 0.22 ± 0.15 | 0.25 ± 0.15 | 0.96 ± 0.02 | 0.31 ± 0.14 | 0.93 ± 0.02 | 0.76 ± 0.06 |
| | A2AAR | 0.20 ± 0.17 | 0.33 ± 0.16 | 0.88 ± 0.02 | 0.35 ± 0.14 | 0.83 ± 0.04 | 0.62 ± 0.09 |
| | Dual | 0.10 ± 0.09 | 0.81 ± 0.03 | 0.29 ± 0.10 | 0.75 ± 0.02 | 0.35 ± 0.08 | 0.58 ± 0.06 |
| QSAR 3-class | A1AR | 0.00 ± 0.07 | 0.11 ± 0.05 | 0.88 ± 0.07 | 0.10 ± 0.06 | 0.91 ± 0.02 | 0.64 ± 0.09 |
| | A2AAR | 0.36 ± 0.12 | 0.47 ± 0.14 | 0.85 ± 0.10 | 0.59 ± 0.11 | 0.85 ± 0.02 | 0.65 ± 0.09 |
| | Non-binder | 0.07 ± 0.13 | 0.67 ± 0.11 | 0.39 ± 0.12 | 0.70 ± 0.05 | 0.37 ± 0.12 | 0.50 ± 0.09 |
| QSAR 4-class | A1AR | 0.11 ± 0.10 | 0.12 ± 0.09 | 0.97 ± 0.01 | 0.18 ± 0.11 | 0.95 ± 0.01 | 0.70 ± 0.07 |
| | A2AAR | 0.25 ± 0.16 | 0.29 ± 0.13 | 0.94 ± 0.02 | 0.39 ± 0.16 | 0.90 ± 0.02 | 0.67 ± 0.09 |
| | Dual | 0.09 ± 0.05 | 0.51 ± 0.05 | 0.58 ± 0.08 | 0.50 ± 0.05 | 0.60 ± 0.02 | 0.58 ± 0.05 |
| | Non-binder | 0.15 ± 0.06 | 0.51 ± 0.09 | 0.64 ± 0.06 | 0.47 ± 0.04 | 0.68 ± 0.05 | 0.57 ± 0.07 |
Means of fivefold cross-validation with standard error of the mean (SEM). The class indicates the performance for that particular selectivity class: A1AR-selective, A2AAR-selective, dual (non-selective), and non-binders
MCC: Matthews correlation coefficient; PPV: positive predictive value; NPV: negative predictive value; ROC: receiver operating characteristic
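The per-class metrics in this table can be derived from a one-vs-rest confusion matrix, and the ± values are the SEM over cross-validation folds. The sketch below shows one way to compute them in plain Python. The function names and toy labels are hypothetical, ROC is left out because it needs ranked scores rather than hard labels, and undefined ratios (for example PPV when nothing is predicted positive) are returned as NaN as an assumption of this sketch.

```python
import math


def one_vs_rest_metrics(y_true, y_pred, positive):
    """MCC, sensitivity, specificity, PPV and NPV for one class against all others."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # NaN marks a metric that is undefined for this fold.
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": ratio(tp * tn - fp * fn, mcc_denominator),
        "Sensitivity": ratio(tp, tp + fn),
        "Specificity": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }


def mean_and_sem(values):
    """Mean and standard error of the mean over per-fold values (needs >= 2 values)."""
    n = len(values)
    mean = sum(values) / n
    sample_variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(sample_variance / n)


# Hypothetical labels for a single fold.
y_true = ["A1AR", "A1AR", "dual", "dual", "A2AAR", "non-binder"]
y_pred = ["A1AR", "dual", "dual", "A1AR", "A2AAR", "non-binder"]
print(one_vs_rest_metrics(y_true, y_pred, positive="A1AR"))
```

Running `one_vs_rest_metrics` once per fold and passing the per-fold values of a metric to `mean_and_sem` gives numbers in the same "mean ± SEM" form as the table.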
Fig. 2 Chemical similarity of compounds of the selectivity classes A1AR-selective, A2AAR-selective, dual, and non-binders. The chemical similarity is visualized with t-SNE [20] based on FCFP4 fingerprints. (a) The used chemical clusters of the compounds: A1AR-selective, A2AAR-selective, dual binders, and non-binders. (b) Clusters based on chemical similarity; each color-symbol combination represents a unique cluster (136 clusters in total).
Performances of classification and regression bioactivity models for the A1AR and A2AAR
| Protein | Model training type | Dataset in training | Validation set (only A1AR or A2AAR compounds, respective of the protein) | MCC | Sensitivity | Specificity | PPV | NPV | ROC |
|---|---|---|---|---|---|---|---|---|---|
| A1AR | Classification | A1AR compounds in the A1AR/A2AAR dataset + semi-selective compounds | A1AR/A2AAR dataset + semi-selective compounds | −0.09 ± 0.06 | 0.44 ± 0.09 | 0.48 ± 0.11 | 0.46 ± 0.09 | 0.45 ± 0.06 | 0.41 ± 0.05 |
| | Classification | A1AR bioactivity dataset | A1AR/A2AAR dataset + semi-selective compounds | −0.16 ± 0.05 | 0.39 ± 0.08 | 0.45 ± 0.10 | 0.42 ± 0.09 | 0.41 ± 0.06 | 0.39 ± 0.04 |
| | Regression | A1AR compounds in the A1AR/A2AAR dataset + semi-selective compounds | A1AR/A2AAR dataset + semi-selective compounds | 0.09 ± 0.04 | 0.53 ± 0.09 | 0.56 ± 0.08 | 0.54 ± 0.09 | 0.54 ± 0.08 | 0.61 ± 0.03 |
| | Regression | A1AR bioactivity dataset | A1AR/A2AAR dataset + semi-selective compounds | 0.04 ± 0.06 | 0.46 ± 0.08 | 0.58 ± 0.08 | 0.52 ± 0.10 | 0.52 ± 0.08 | 0.59 ± 0.05 |
| | Regression | A1AR bioactivity dataset | A1AR bioactivity dataset | 0.06 ± 0.07 | 0.49 ± 0.07 | 0.58 ± 0.08 | 0.53 ± 0.07 | 0.54 ± 0.06 | 0.60 ± 0.05 |
| A2AAR | Classification | A2AAR compounds in the A1AR/A2AAR dataset + semi-selective compounds | A1AR/A2AAR dataset + semi-selective compounds | 0.11 ± 0.09 | 0.59 ± 0.10 | 0.50 ± 0.13 | 0.73 ± 0.05 | 0.39 ± 0.05 | 0.59 ± 0.06 |
| | Classification | A2AAR bioactivity dataset | A1AR/A2AAR dataset + semi-selective compounds | 0.16 ± 0.11 | 0.57 ± 0.12 | 0.56 ± 0.13 | 0.75 ± 0.06 | 0.45 ± 0.09 | 0.61 ± 0.07 |
| | Regression | A2AAR compounds in the A1AR/A2AAR dataset + semi-selective compounds | A1AR/A2AAR dataset + semi-selective compounds | 0.12 ± 0.10 | 0.69 ± 0.10 | 0.40 ± 0.08 | 0.70 ± 0.04 | 0.47 ± 0.10 | 0.64 ± 0.06 |
| | Regression | A2AAR bioactivity dataset | A1AR/A2AAR dataset + semi-selective compounds | 0.21 ± 0.07 | 0.64 ± 0.10 | 0.56 ± 0.10 | 0.76 ± 0.04 | 0.46 ± 0.04 | 0.69 ± 0.05 |
| | Regression | A2AAR bioactivity dataset | A2AAR bioactivity dataset | 0.19 ± 0.07 | 0.63 ± 0.11 | 0.54 ± 0.09 | 0.70 ± 0.04 | 0.50 ± 0.05 | 0.69 ± 0.05 |
Query compounds were categorized based on post-classification of the bioactivity predictions: predicted pActivity < 6.5 = inactive and predicted pActivity ≥ 6.5 = active
MCC: Matthews correlation coefficient; PPV: positive predictive value; NPV: negative predictive value; ROC: receiver operating characteristic
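The post-classification rule stated for this table (predicted pActivity < 6.5 is inactive, ≥ 6.5 is active) amounts to a single threshold on the regression output. A minimal sketch, with a hypothetical function name and made-up predictions:

```python
ACTIVITY_THRESHOLD = 6.5  # pActivity cutoff stated in the table footnote


def post_classify_activity(predicted_pactivity, threshold=ACTIVITY_THRESHOLD):
    """Label each predicted pActivity as 'active' (>= threshold) or 'inactive'."""
    return [
        "active" if value >= threshold else "inactive"
        for value in predicted_pactivity
    ]


# Hypothetical regression outputs for three query compounds.
print(post_classify_activity([6.49, 6.5, 7.2]))
# ['inactive', 'active', 'active']
```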
Performances of selectivity modeling using the two-step A1AR-A2AAR difference model or the selectivity-window model
| Model | Class | MCC | Sensitivity | Specificity | PPV | NPV | ROC |
|---|---|---|---|---|---|---|---|
| Trained on all double points, tested on all double points (non-binders and semi-selective compounds always true/false negative) | | | | | | | |
| A1AR-A2AAR difference | A1AR | 0.15 ± 0.13 | 0.17 ± 0.14 | 0.99 ± 0.00 | 0.18 ± 0.12 | 0.98 ± 0.01 | 0.76 ± 0.09 |
| | A2AAR | 0.11 ± 0.07 | 0.19 ± 0.09 | 0.94 ± 0.02 | 0.15 ± 0.07 | 0.94 ± 0.01 | 0.67 ± 0.14 |
| | Dual | 0.26 ± 0.05 | 0.76 ± 0.02 | 0.50 ± 0.07 | 0.68 ± 0.02 | 0.59 ± 0.03 | 0.66 ± 0.02 |
| Selectivity-window | A1AR | 0.07** ± 0.09 | 0.03 ± 0.03 | 0.99 ± 0.00 | 0.10** ± 0.10 | 0.97 ± 0.00 | 0.87 ± 0.03 |
| | A2AAR | 0.22 ± 0.12 | 0.15 ± 0.09 | 0.99 ± 0.00 | 0.42 ± 0.18 | 0.94 ± 0.01 | 0.74 ± 0.07 |
| | Dual | 0.36 ± 0.06 | 0.85 ± 0.03 | 0.48 ± 0.03 | 0.69 ± 0.01 | 0.70 ± 0.06 | 0.70 ± 0.02 |
| Trained on all double points, tested on only A1AR, A2AAR, and dual class | | | | | | | |
| A1AR-A2AAR difference | A1AR | 0.19 ± 0.16 | 0.17 ± 0.14 | 0.99 ± 0.00 | 0.28 ± 0.18 | 0.96 ± 0.01 | 0.75 ± 0.09 |
| | A2AAR | 0.28* ± 0.12 | 0.19 ± 0.09 | 0.98 ± 0.01 | 0.48* ± 0.15 | 0.91 ± 0.01 | 0.72 ± 0.15 |
| | Dual | 0.25 ± 0.07 | 0.76 ± 0.02 | 0.57 ± 0.11 | 0.91 ± 0.02 | 0.28 ± 0.04 | 0.70 ± 0.05 |
| Selectivity-window | A1AR | 0.17** ± 0.23 | 0.03 ± 0.03 | 0.99 ± 0.01 | 0.50** ± 0.50 | 0.92 ± 0.01 | 0.88 ± 0.04 |
| | A2AAR | 0.32* ± 0.15 | 0.15 ± 0.09 | 1.00 ± 0.00 | 0.75* ± 0.25 | 0.82 ± 0.03 | 0.80 ± 0.07 |
| | Dual | 0.33 ± 0.11 | 0.84 ± 0.04 | 0.46 ± 0.06 | 0.80 ± 0.02 | 0.55 ± 0.10 | 0.66 ± 0.04 |
| Trained on only A1AR, A2AAR, and dual class, tested on only A1AR, A2AAR, and dual class | | | | | | | |
| Selectivity-window | A1AR | −0.05*** ± 0.00 | 0.00 ± 0.00 | 0.99 ± 0.01 | 0.00*** ± 0.00 | 0.92 ± 0.01 | 0.81 ± 0.05 |
| | A2AAR | 0.23 ± 0.18 | 0.25 ± 0.12 | 0.94 ± 0.03 | 0.43 ± 0.22 | 0.83 ± 0.03 | 0.66 ± 0.11 |
| | Dual | 0.04 ± 0.12 | 0.69 ± 0.05 | 0.35 ± 0.09 | 0.73 ± 0.03 | 0.32 ± 0.09 | 0.53 ± 0.08 |
The class indicates the performance for that particular selectivity class: A1AR-selective, A2AAR-selective, and dual (non-selective). The query compounds were categorized based on post-classification of the selectivity predictions: A1AR-selective when pActivity difference ≥ 2, A2AAR-selective when pActivity difference ≤ −2, and dual binder when pActivity difference ≥ −1 and ≤ 1
MCC: Matthews correlation coefficient; PPV: positive predictive value; NPV: negative predictive value; ROC: receiver operating characteristic
*1/5 folds failed, **3/5 folds failed, ***4/5 folds failed
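The thresholds in the footnote above map a predicted pActivity difference onto a selectivity class. The sketch below illustrates that mapping. It assumes the difference is taken as A1AR minus A2AAR, which is inferred from the sign of the thresholds (positive values are A1AR-selective) rather than stated explicitly. The function names and the "unassigned" label for values between the dual and selective ranges are hypothetical.

```python
def selectivity_window(pactivity_a1ar: float, pactivity_a2aar: float) -> float:
    """pActivity difference, assumed here to be A1AR minus A2AAR."""
    return pactivity_a1ar - pactivity_a2aar


def classify_selectivity(
    window: float,
    a1ar_min: float = 2.0,    # A1AR-selective when window >= 2
    a2aar_max: float = -2.0,  # A2AAR-selective when window <= -2
    dual_low: float = -1.0,   # dual binder when -1 <= window <= 1
    dual_high: float = 1.0,
) -> str:
    """Post-classify a (predicted) selectivity window into a selectivity class."""
    if window >= a1ar_min:
        return "A1AR-selective"
    if window <= a2aar_max:
        return "A2AAR-selective"
    if dual_low <= window <= dual_high:
        return "dual"
    # Hypothetical label for values that fall between the dual and selective ranges.
    return "unassigned"


# Hypothetical pActivity pairs given as (A1AR, A2AAR).
for a1ar, a2aar in [(8.5, 6.0), (6.0, 8.5), (7.0, 6.8), (7.5, 6.0)]:
    print(classify_selectivity(selectivity_window(a1ar, a2aar)))
```

The same classifier applies to both rows of models in the table: a two-step difference model would subtract two separately predicted pActivity values to obtain the window, whereas a selectivity-window model predicts the window directly.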
Fig. 3 Chemical structures of compounds with predictions by different selectivity models. The compounds were wrongly predicted with the two-step A1AR-A2AAR model and correctly predicted with the selectivity-window model. Predictions are indicated as: predicted A1AR-selective (A1AR), A2AAR-selective (A2AAR), and as dual binder (Dual) for ligands CHEMBL260788 (a), CHEMBL3596506 (b), and CHEMBL201750 (c).
Fig. 4 Relationship between experimental and predicted selectivity. Predicted selectivity values are shown for the selectivity-window model. A1AR-selective classification thresholds are shown as orange lines (dotted = old threshold, solid = new threshold).
Fig. 5 Positive predictive value (PPV) of compounds predicted to be A1AR- or A2AAR-selective. The PPV is the number of experimentally validated selective compounds divided by the total number of predicted selective compounds. PPVs are shown for different filters: no bioactivity filter, statistical bioactivity, bioactivity based on docking score, and consensus bioactivity (statistical bioactivity and structure-based docking).
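The PPV definition in this caption, and the effect of filtering predictions by bioactivity, can be sketched with set operations on compound identifiers. The compound IDs are made up, and treating the consensus filter as "flagged active by both the statistical model and docking" is an assumption of this sketch; the caption does not say how the two are combined.

```python
def ppv(predicted_selective: set, experimentally_selective: set) -> float:
    """Validated selective compounds divided by all compounds predicted selective."""
    if not predicted_selective:
        return float("nan")  # undefined when nothing is predicted selective
    return len(predicted_selective & experimentally_selective) / len(predicted_selective)


def apply_bioactivity_filter(predicted_selective: set, predicted_active: set) -> set:
    """Keep only the predicted-selective compounds that pass the bioactivity filter."""
    return predicted_selective & predicted_active


# Hypothetical compound identifiers.
predicted = {"c1", "c2", "c3", "c4"}
validated = {"c1", "c2"}
statistical_active = {"c1", "c2", "c3"}
docking_active = {"c1", "c2", "c4"}
# Assumption: consensus means active according to both filters.
consensus_active = statistical_active & docking_active

print(ppv(predicted, validated))  # 0.5
print(ppv(apply_bioactivity_filter(predicted, consensus_active), validated))  # 1.0
```

Removing predicted inactives shrinks the denominator without touching the true positives that pass the filter, which is why a stricter filter can raise the PPV.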
Fig. 6 Docked poses of compounds in their corresponding targets. Poses of two non-binders in the A1AR (CHEMBL1800792 in a and CHEMBL372580 in b), an A1AR-selective compound (CHEMBL204780 in c), and an A2AAR-selective compound (CHEMBL371436 in d). Docked poses are compared to the co-crystallized ligands shown in orange. Hydrogen bonds between ligands and Asn6.55 are shown in yellow.
Performance of the selectivity-window model on an external validation set
| Model | Class | MCC | Sensitivity | Specificity | PPV | NPV | ROC |
|---|---|---|---|---|---|---|---|
| Selectivity-window | A1AR | 0.13 | 0.39 | 0.83 | 0.12 | 0.96 | 0.75 |
| | A2AAR | 0.40 | 0.24 | 1.00 | 0.70 | 0.97 | 0.66 |
| | Dual | 0.02 | 0.81 | 0.21 | 0.64 | 0.39 | 0.37 |
| Selectivity-window and bioactivity filtered | A1AR | 0.18 | 1.00 | 0.16 | 0.21 | 1.00 | 0.66 |
| | A2AAR | 0.88 | 1.00 | 0.97 | 0.80 | 1.00 | 0.98 |
| | Dual | – | 0.00 | 1.00 | – | 0.28 | 0.72 |
The query compounds were categorized based on post hoc optimized classification of the selectivity predictions: A1AR-selective when selectivity-window ≥ 0.5, A2AAR-selective when selectivity-window ≤ −2, and dual binder when selectivity-window ≥ −1 and < 0.5
MCC: Matthews correlation coefficient; PPV: positive predictive value; NPV: negative predictive value; ROC: receiver operating characteristic
Properties of different subsets in the A1AR/A2AAR dataset, A1AR bioactivity dataset, and A2AAR bioactivity dataset
| Dataset | Subset | Chemical similarity (Tanimoto FCFP4) | Number of compounds | Number of actives (pActivity ≥ 6.5) | Number of inactives (pActivity < 6.5) | Number of A1AR-selectives | Number of A2AAR-selectives | Number of dual binders | Number of non-binders |
|---|---|---|---|---|---|---|---|---|---|
| A1AR/A2AAR dataset | 1 | 0.26 | 362 | n.a. | n.a. | 11 | 52 | 146 | 153 |
| | 2 | 0.21 | 261 | n.a. | n.a. | 11 | 38 | 111 | 101 |
| | 3 | 0.20 | 171 | n.a. | n.a. | 12 | 21 | 86 | 52 |
| | 4 | 0.21 | 156 | n.a. | n.a. | 10 | 20 | 70 | 56 |
| | 5 | 0.19 | 156 | n.a. | n.a. | 6 | 15 | 66 | 69 |
| A1AR bioactivity dataset | 1 | 0.23 | 718 | 501 | 217 | n.a. | n.a. | n.a. | n.a. |
| | 2 | 0.20 | 551 | 304 | 247 | n.a. | n.a. | n.a. | n.a. |
| | 3 | 0.20 | 524 | 281 | 243 | n.a. | n.a. | n.a. | n.a. |
| | 4 | 0.19 | 477 | 261 | 216 | n.a. | n.a. | n.a. | n.a. |
| | 5 | 0.19 | 504 | 306 | 198 | n.a. | n.a. | n.a. | n.a. |
| A2AAR bioactivity dataset | 1 | 0.25 | 1463 | 994 | 469 | n.a. | n.a. | n.a. | n.a. |
| | 2 | 0.22 | 467 | 263 | 204 | n.a. | n.a. | n.a. | n.a. |
| | 3 | 0.23 | 460 | 312 | 148 | n.a. | n.a. | n.a. | n.a. |
| | 4 | 0.20 | 416 | 256 | 160 | n.a. | n.a. | n.a. | n.a. |
| | 5 | 0.19 | 317 | 191 | 126 | n.a. | n.a. | n.a. | n.a. |