Lindsey Burggraaff, Amber van Veen, Chi Chung Lam, Herman W T van Vlijmen, Adriaan P IJzerman, Gerard J P van Westen.
Abstract
Proteins often have both orthosteric and allosteric binding sites. Endogenous ligands, such as hormones and neurotransmitters, bind to the orthosteric site, whereas synthetic ligands may bind to either orthosteric or allosteric sites; the latter have become a focal point in drug discovery. Allosteric modulators usually bind to a protein noncompetitively with its endogenous ligand or substrate. Growing interest in allosteric modulators has substantially increased the number of these compounds, and of their associated data such as binding data, in chemical libraries and databases. Although this data surge fuels research on allosteric modulators, binding data are unfortunately not always clearly labeled as allosteric or orthosteric. Allosteric binding data are therefore difficult to retrieve from databases that contain a mixture of allosteric and orthosteric compounds, and this mixture decreases model performance when statistical methods, such as machine learning models, are applied. In previous work we generated an allosteric data subset of ChEMBL release 14. In the current study, an improved text mining approach is used to retrieve the allosteric and orthosteric binding types from the literature in ChEMBL release 22. Moreover, convolutional deep neural networks were constructed to predict the binding types of compounds for class A G protein-coupled receptors (GPCRs). Temporal split validation showed the model to be predictive, with Matthews correlation coefficient (MCC) = 0.54, sensitivity (allosteric) = 0.54, and sensitivity (orthosteric) = 0.94. Finally, this study shows that including accurate binding types as a descriptor improves bioactivity predictions (MCC improved from 0.27 to 0.34; validated for class A GPCRs, trained on all GPCRs). Although the focus of this study is mainly on class A GPCRs, binding types for all protein classes in ChEMBL were obtained and explored.
The data set is included as a supplement to this study, allowing the reader to select the compounds and binding types of interest.
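The temporal split and the metrics quoted in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the record schema (a dict with a `year` key), the cutoff year, the string labels, and the choice of "allosteric" as the positive class are all assumptions, and the helper names are hypothetical.

```python
import math
from typing import Iterable, Sequence


def temporal_split(records: Sequence[dict], cutoff_year: int):
    """Hypothetical helper: train on records before cutoff_year, test on the rest.

    Assumes each record is a dict with a "year" key (illustrative schema only).
    """
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


def binding_type_metrics(y_true: Iterable[str], y_pred: Iterable[str]) -> dict:
    """MCC and per-class sensitivity, assuming "allosteric" is the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == "allosteric":
            if p == "allosteric":
                tp += 1
            else:
                fn += 1
        elif p == "allosteric":
            fp += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC = 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "mcc": mcc,
        # Recall on allosteric compounds.
        "sensitivity_allosteric": tp / (tp + fn) if tp + fn else 0.0,
        # Recall on orthosteric compounds (the specificity of the allosteric call).
        "sensitivity_orthosteric": tn / (tn + fp) if tn + fp else 0.0,
    }
```

With "allosteric" as the positive class, the orthosteric sensitivity is the same quantity as the specificity of the allosteric prediction, so both per-class values come from one confusion matrix.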
Year: 2020 PMID: 32931270 PMCID: PMC7592116 DOI: 10.1021/acs.jcim.0c00695
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
Figure 1. Ligand binding sites observed in crystal structures of class A GPCRs. Orthosteric ligands are shown in green, allosteric modulators in orange. The following crystal structures are included (PDB): 4MBS,[10] 4MQT,[10] 4N6H,[11] 4NTJ,[12] 4PHU,[13] 5LWE,[14] 5NDZ,[15] 5NLX,[16] 5T1A,[2] 5TZR,[17] 5TZY,[17] 5X7D,[18] 6C1Q,[19] and 6C1R.[20]
Physicochemical Properties of Orthosteric and Allosteric Compounds in ChEMBL

| property | allosteric median | allosteric MAD | allosteric mean ± SD | orthosteric median | orthosteric MAD | orthosteric mean ± SD |
|---|---|---|---|---|---|---|
| pChEMBL | 6.4 | 0.9 | 6.5 ± 1.2 | 7.1 | 1.0 | 7.1 ± 1.4 |
| molecular weight (Da) | 383 | 61 | 404 ± 135 | 434 | 93 | 541 ± 474 |
| ALogP | 3.8 | 1.0 | 3.8 ± 2.0 | 3.8 | 1.3 | 3.1 ± 3.9 |
| num. H donors | 1 | 1 | 1 ± 2 | 1 | 1 | 4 ± 8 |
| num. H acceptors | 4 | 1 | 4 ± 2 | 5 | 2 | 6 ± 8 |
| fraction rotatable bonds | 0.17 | 0.05 | 0.18 ± 0.09 | 0.20 | 0.06 | 0.22 ± 0.12 |
| fraction aromatic bonds | 0.50 | 0.11 | 0.47 ± 0.17 | 0.43 | 0.12 | 0.42 ± 0.18 |
The fractions of rotatable bonds and aromatic bonds are the number of rotatable (or aromatic) bonds divided by the total number of bonds in each compound. An independent (unpaired) two-tailed t test (α = 0.05) gave p < 0.001 for the difference between the allosteric and orthosteric means of each property, indicating a significant difference. MAD = mean absolute deviation.
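The summary statistics and significance test described in this footnote can be reproduced for any property column with Python's standard library. The sketch below takes MAD as the mean absolute deviation about the mean, following the footnote literally (some authors measure it about the median), and uses Student's pooled-variance t test; the function names are hypothetical. Because the standard library has no t distribution, the two-tailed p-value uses a normal approximation that is reasonable only for large samples; `scipy.stats.ttest_ind` returns the exact value.

```python
import math
import statistics
from statistics import NormalDist


def summarize(values):
    """Median, MAD and mean ± SD for one property (hypothetical helper)."""
    mean = statistics.mean(values)
    # Assumption: MAD measured about the mean, as "mean absolute deviation" suggests.
    mad = statistics.mean(abs(v - mean) for v in values)
    return {
        "median": statistics.median(values),
        "mad": mad,
        "mean": mean,
        "sd": statistics.stdev(values),  # sample standard deviation
    }


def unpaired_t_test(a, b):
    """Student's unpaired t test (pooled variance); two-tailed p via normal approximation."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    # For large n, t is approximately standard normal; an exact p-value would use
    # the t distribution with n1 + n2 - 2 degrees of freedom.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p
```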
Figure 2. Physicochemical properties of orthosteric and allosteric compounds in ChEMBL per protein family. Orange boxes span the 25th to 50th percentile and gray boxes the 50th to 75th percentile, so the border between orange and gray marks the median (50th percentile). The bottom whiskers extend from the minimum value to the 25th percentile, and the top whiskers from the 75th percentile to the maximum value. (A) pChEMBL value; (B) ALogP; (C) fraction of rotatable bonds; (D) fraction of aromatic bonds.
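The box-and-whisker convention described for Figure 2 corresponds to a five-number summary. A minimal sketch with Python's `statistics` module follows; the helper name is hypothetical, and the "inclusive" quartile interpolation is an assumption, since plotting tools differ in how they place the 25th and 75th percentiles.

```python
import statistics


def box_summary(values):
    """Five-number summary: whiskers span min..Q1 and Q3..max, boxes span Q1..Q3."""
    # "inclusive" treats the data as the full population; other methods shift Q1/Q3 slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}
```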
Figure 3. Clustered scaffolds of orthosteric and allosteric compounds for class A GPCRs. Bemis–Murcko scaffolds are depicted with the root scaffold indicated in bold.
DNN Regression Model Performance for Prediction of Bioactivities of Allosteric Modulators for Class A GPCRs

| training data set | added descriptor | MCC | sensitivity | specificity | accuracy | PPV | NPV | RMSE | ROC |
|---|---|---|---|---|---|---|---|---|---|
| class A GPCRs | – | 0.27 | 0.81 | 0.44 | 0.61 | 0.55 | 0.74 | 1.07 | 0.73 |
| class A GPCRs | binding type | 0.30 | 0.74 | 0.56 | 0.64 | 0.58 | 0.72 | 1.15 | 0.72 |
| class A GPCRs | predicted binding type | 0.22 | 0.51 | 0.70 | 0.62 | 0.59 | 0.63 | 1.01 | 0.70 |
| GPCRs | – | 0.27 | 0.78 | 0.48 | 0.62 | 0.56 | 0.73 | 1.01 | 0.74 |
| GPCRs | binding type | 0.34 | 0.78 | 0.55 | 0.65 | 0.59 | 0.75 | 1.09 | 0.73 |
| GPCRs | predicted binding type | 0.15 | 0.52 | 0.63 | 0.58 | 0.54 | 0.61 | 0.99 | 0.67 |
| all proteins in ChEMBL | – | 0.35 | 0.74 | 0.61 | 0.67 | 0.61 | 0.74 | 1.03 | 0.74 |
| all proteins in ChEMBL | binding type | 0.37 | 0.74 | 0.62 | 0.68 | 0.62 | 0.74 | 0.98 | 0.75 |
| all proteins in ChEMBL | predicted binding type | 0.18 | 0.43 | 0.74 | 0.60 | 0.58 | 0.61 | 1.06 | 0.67 |
MCC = Matthews correlation coefficient, PPV = positive predictive value, NPV = negative predictive value, RMSE = root-mean-square error, and ROC = receiver operating characteristic.
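Reporting classification metrics for a regression model implies that measured and predicted pChEMBL values were both split at an activity threshold. That cutoff is not given in this section, so `ACTIVITY_THRESHOLD = 6.5` below is a hypothetical placeholder, as is the function name. RMSE is computed on the raw values, and ROC is computed here as the area under the curve, assuming predicted pChEMBL values serve as ranking scores; the pairwise loop is quadratic and meant for illustration only. MCC follows from the same four confusion-matrix counts.

```python
import math

# Hypothetical pChEMBL cutoff separating "active" from "inactive"; not stated in this section.
ACTIVITY_THRESHOLD = 6.5


def regression_metrics(y_true, y_pred, threshold=ACTIVITY_THRESHOLD):
    """RMSE on raw values plus thresholded classification metrics and ROC AUC."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

    actual = [t >= threshold for t in y_true]
    predicted = [p >= threshold for p in y_pred]
    tp = sum(a and p for a, p in zip(actual, predicted))
    tn = sum(not a and not p for a, p in zip(actual, predicted))
    fp = sum(not a and p for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))

    def ratio(num, den):
        return num / den if den else 0.0

    # ROC AUC: probability that a random active outranks a random inactive (ties count half).
    pos = [p for p, a in zip(y_pred, actual) if a]
    neg = [p for p, a in zip(y_pred, actual) if not a]
    pairs = [1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg]

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, n),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "rmse": rmse,
        "roc_auc": ratio(sum(pairs), len(pairs)),
    }
```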
Figure 4. Performance of DNN regression models for the prediction of bioactivities of class A GPCR allosteric modulators with and without binding type descriptors. MCC = Matthews correlation coefficient, PPV = positive predictive value, and NPV = negative predictive value.