Lindsey Burggraaff, Paul Oranje, Robin Gouka, Pieter van der Pijl, Marian Geldof, Herman W T van Vlijmen, Adriaan P IJzerman, Gerard J P van Westen.
Abstract
Sodium-dependent glucose co-transporter 1 (SGLT1) is a solute carrier responsible for active glucose absorption. SGLT1 is present in both the renal tubules and the small intestine. In contrast, the closely related sodium-dependent glucose co-transporter 2 (SGLT2), a protein that is targeted in the treatment of type 2 diabetes, is expressed only in the renal tubules. Although dual inhibitors of SGLT1 and SGLT2 have been developed, no drugs on the market are targeted at decreasing dietary glucose uptake by SGLT1 in the gastrointestinal tract. Here we aim to identify SGLT1 inhibitors in silico by applying a machine learning approach that does not require structural information, which is absent for SGLT1. We applied proteochemometrics by incorporating compound- and protein-based information into random forest models. We obtained a predictive model with a sensitivity of 0.64 ± 0.06, specificity of 0.93 ± 0.01, positive predictive value of 0.47 ± 0.07, negative predictive value of 0.96 ± 0.01, and Matthews correlation coefficient of 0.49 ± 0.05. After model training, we applied our model in virtual screening to identify novel SGLT1 inhibitors. Of the 77 tested compounds, 30 were experimentally confirmed to inhibit SGLT1 in vitro, giving a hit rate of 39% with activities in the low micromolar range. Moreover, the hit compounds included novel molecules, as reflected by the low similarity of these compounds to the training set (< 0.3). In conclusion, proteochemometric modeling of SGLT1 is a viable strategy for identifying active small molecules. This method may therefore also be applied to the detection of novel small molecules for other transporter proteins.
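The proteochemometric idea in the abstract, combining compound- and protein-based information in a single model input, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the compound names, the descriptor values, the activity labels, and the helper `build_pcm_rows` are hypothetical and are not taken from the study. Each training row is simply the compound descriptor vector concatenated with the descriptor vector of the protein it was measured on.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy descriptors. Real work would use computed molecular
# fingerprints and sequence-derived protein descriptors.
compound_features: Dict[str, List[float]] = {
    "cmpd_A": [1.0, 0.0, 1.0],
    "cmpd_B": [0.0, 1.0, 1.0],
}
protein_features: Dict[str, List[float]] = {
    "hSGLT1": [0.2, 0.7],
    "hSGLT2": [0.3, 0.6],
}

# (compound, protein, active?) triples, invented for illustration only.
bioactivities: List[Tuple[str, str, int]] = [
    ("cmpd_A", "hSGLT1", 1),
    ("cmpd_A", "hSGLT2", 0),
    ("cmpd_B", "hSGLT1", 0),
]


def build_pcm_rows(
    activities: Sequence[Tuple[str, str, int]],
    compounds: Dict[str, List[float]],
    proteins: Dict[str, List[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Concatenate compound and protein descriptors into one row per measurement."""
    X: List[List[float]] = []
    y: List[int] = []
    for compound_id, protein_id, label in activities:
        X.append(list(compounds[compound_id]) + list(proteins[protein_id]))
        y.append(label)
    return X, y


X, y = build_pcm_rows(bioactivities, compound_features, protein_features)
```

The resulting `X` and `y` are in the shape a random forest classifier expects; which implementation and hyperparameters the authors used is not stated in this record.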
Keywords: Cheminformatics; Machine learning; Molecular modeling; Proteochemometrics; SGLT1; Sodium-dependent glucose co-transporter; Sodium-glucose linked transporter
Year: 2019; PMID: 30767155; PMCID: PMC6689890; DOI: 10.1186/s13321-019-0337-8
Source DB: PubMed; Journal: J Cheminform; ISSN: 1758-2946; Impact factor: 5.514
Fig. 1 Chemical space of the public and in-house datasets. a The t-SNE plot shows molecular structure and affinity (pKi for public data and % of (negative) control for in-house data) for representative hSGLT1 compounds. b Molecular weight and ALogP distribution of compounds in the training sets
Fig. 2 Activity threshold grid search. Grid search over the activity thresholds for in-house data (activity as a percentage of the negative control) and public data (pChEMBL value). Model performance was measured using the Matthews correlation coefficient (MCC), which was 0.48 for the final selected thresholds of < 70% for in-house data and pChEMBL > 8.5 for public data
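The threshold search described in the Fig. 2 caption can be sketched as an exhaustive loop over candidate cutoff pairs. This is a minimal sketch under stated assumptions: the candidate grids, the function names, and especially `toy_score` are hypothetical. In a real run the scoring step would relabel the data with the two cutoffs, retrain the model, and return the validated MCC; here a placeholder function that peaks at (70, 8.5) by construction is used solely so the example has a known answer.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def label_inhouse(percent_of_control: float, cutoff: float) -> bool:
    """In-house data: active if remaining activity is below the cutoff."""
    return percent_of_control < cutoff


def label_public(pchembl: float, cutoff: float) -> bool:
    """Public data: active if the pChEMBL value is above the cutoff."""
    return pchembl > cutoff


def grid_search(
    inhouse_cutoffs: Iterable[float],
    public_cutoffs: Iterable[float],
    score: Callable[[float, float], float],
) -> Tuple[float, float, float]:
    """Return (best score, in-house cutoff, public cutoff) over all pairs."""
    best = None
    for ih_cut, pd_cut in product(inhouse_cutoffs, public_cutoffs):
        value = score(ih_cut, pd_cut)
        if best is None or value > best[0]:
            best = (value, ih_cut, pd_cut)
    if best is None:
        raise ValueError("both cutoff grids must be non-empty")
    return best


def toy_score(ih_cut: float, pd_cut: float) -> float:
    # Placeholder for "relabel, retrain, validate, return MCC".
    # Not a model of the study's results.
    return -abs(ih_cut - 70) / 100 - abs(pd_cut - 8.5) / 10


best_score, best_ih, best_pd = grid_search(
    [50, 60, 70, 80, 90], [6.5, 7.5, 8.5, 9.5], toy_score
)
```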
Model performance depends on datasets that are used in training
| Model and validation | Training | Sensitivity | Specificity | PPV | NPV | MCC |
|---|---|---|---|---|---|---|
| QSAR (EV) | PD + IH | 0.76 | 0.86 | 0.42 | 0.96 | 0.48 |
| Public PCM (CV) | PD | 0.01 ± 0.01 | 0.98 ± 0.00 | 0.03 ± 0.06 | 0.91 ± 0.01 | −0.03 ± 0.03 |
| In-house PCM (CV) | IH | 0.69 ± 0.07 | 0.89 ± 0.02 | 0.38 ± 0.06 | 0.97 ± 0.01 | 0.45 ± 0.05 |
| Combined PCM (CV) | PD + IH | 0.64 ± 0.06 | 0.93 ± 0.01 | 0.47 ± 0.07 | 0.96 ± 0.01 | 0.49 ± 0.05 |
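PD: public data; IH: in-house data; EV: external validation on 30% of the data; CV: fivefold cross-validation on 20% of the data per iteration; PPV: positive predictive value; NPV: negative predictive value; MCC: Matthews correlation coefficient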
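For readers less familiar with the five metrics in the table, the sketch below computes them from the four cells of a binary confusion matrix using their standard definitions. The counts in the example call are hypothetical and were chosen only to exercise the function; they are not the study's data.

```python
from math import sqrt
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts: 10 actives and 90 inactives in a validation fold.
metrics = classification_metrics(tp=8, fp=5, tn=85, fn=2)
```

Because inactives heavily outnumber actives in such a fold, specificity and NPV can be high even when PPV is modest, which is one reason MCC is a useful single summary for imbalanced data.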
Fig. 3 Chemical space of the selected compounds compared to the training and screening datasets. a The Diverse set (yellow) and Cluster set (green) are displayed compared to the training (orange and red) and Enamine screening set (blue). The Enamine set is represented by a random selection of 20,000 out of the total of 1,815,674 compounds (~1%) in the screening set to limit t-SNE calculation time. b The molecular weight and ALogP of the Diverse and Cluster set compared to the training and screening sets
Fig. 4 Reference hSGLT1 inhibitors for Cluster set and their inhibitory activity. Inhibitory activities (compared to negative control, where 100% is no inhibition) and chemical structures of four recently identified novel hSGLT1 inhibitors: bepridil, bupivacaine, cloperastine, and trihexyphenidyl
Fig. 5 Clustering of hSGLT1 actives. Active hSGLT1 compounds in the training set clustered into ten chemical clusters (Tanimoto, FCFP6). Molecular structure and affinity (pKi for public data and % of (negative) control for in-house data) for representative cluster compounds are shown. In-house compounds with activity < 70% of (negative) control and public compounds with pChEMBL ≥ 6.5 were used in clustering. a t-SNE plot of the chemical clusters. b The molecular weight and ALogP distribution of compounds in the chemical clusters
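The Tanimoto measure named in the Fig. 5 caption can be sketched on fingerprints represented as sets of "on" bit indices. This is a minimal illustration, not the authors' pipeline: the bit sets below are invented, no real FCFP6 fingerprints are computed, and the helper `max_similarity` is a hypothetical name. The same kind of nearest-neighbour similarity is one plausible way to check whether a candidate falls below a cutoff such as 0.3 relative to a reference set.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: Set[int], references: Iterable[Set[int]]) -> float:
    """Highest Tanimoto similarity between a query and any reference fingerprint."""
    return max(tanimoto(query, ref) for ref in references)


# Invented bit sets standing in for real fingerprints.
reference_set = [{1, 2, 3, 4}, {2, 3, 5, 8}]
candidate = {1, 9, 10, 11, 12}

nearest = max_similarity(candidate, reference_set)
is_dissimilar = nearest < 0.3  # cutoff value used here only as an example
```

A clustering step would then operate on the distance `1 - tanimoto(a, b)`; the specific clustering algorithm used to obtain the ten clusters is not stated in the caption.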