Robert Ancuceanu, Bogdan Tamba, Cristina Silvia Stoicescu, Mihaela Dinu.
Abstract
A prototype of a family of at least nine members, cellular Src tyrosine kinase is a therapeutically interesting target: its inhibition may be useful not only in a number of malignancies but also in a diverse array of conditions, from neurodegenerative pathologies to certain viral infections. Computational methods in drug discovery are considerably cheaper than conventional methods and allow the screening of very large numbers of compounds, at a scale that would be impossible in wet-lab experimental settings. We explored the use of global quantitative structure-activity relationship (QSAR) models and molecular ligand docking in the discovery of new c-src tyrosine kinase inhibitors. Using a dataset of 1038 compounds from the ChEMBL database, we developed over 350 QSAR classification models. A total of 49 models with reasonably good performance were selected, assembled by stacking with a simple majority vote, and used for the virtual screening of over 100,000 compounds. A total of 744 compounds were predicted to be active by at least 50% of the QSAR models; 147 compounds were within the applicability domain and predicted to be active by at least 75% of the models. These 147 compounds were submitted to molecular ligand docking using AutoDock Vina and LeDock, and 89 were predicted to be active based on the energy of binding.
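The two voting thresholds described in the abstract (at least 50% and at least 75% of the models) can be illustrated with a minimal sketch. The compound names, the vote vectors, and the single per-compound applicability-domain flag are all hypothetical; how the authors combined applicability-domain results across the 49 models is not stated here, so the boolean flag is only an assumption.

```python
from typing import Dict, List, Tuple


def vote_fraction(predictions: List[int]) -> float:
    """Fraction of models predicting 'active' (1 = active, 0 = inactive)."""
    return sum(predictions) / len(predictions)


def screen(
    votes: Dict[str, List[int]],
    in_domain: Dict[str, bool],
    majority: float = 0.50,
    strict: float = 0.75,
) -> Tuple[List[str], List[str]]:
    """Return (majority hits, strict hits inside the applicability domain)."""
    majority_hits = [c for c, p in votes.items() if vote_fraction(p) >= majority]
    strict_hits = [
        c
        for c in majority_hits
        if in_domain[c] and vote_fraction(votes[c]) >= strict
    ]
    return majority_hits, strict_hits


# Hypothetical toy data: four compounds scored by four models.
votes = {
    "cpd_A": [1, 1, 1, 1],  # 100% of models vote active
    "cpd_B": [1, 1, 0, 0],  # 50%
    "cpd_C": [1, 0, 0, 0],  # 25%
    "cpd_D": [1, 1, 1, 0],  # 75%, but flagged as outside the domain below
}
in_domain = {"cpd_A": True, "cpd_B": True, "cpd_C": True, "cpd_D": False}

majority_hits, strict_hits = screen(votes, in_domain)
print(majority_hits)
print(strict_hits)
```

In this toy run only `cpd_A` passes both the 75% threshold and the domain check, mirroring how the stricter filter narrows the candidate list before docking.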
Keywords: QSAR; c-src-tyrosine kinase; cancer; drug discovery; molecular descriptors; molecular docking; virtual screening
Year: 2019 PMID: 31861445 PMCID: PMC6981969 DOI: 10.3390/ijms21010019
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Variability of the dataset illustrated by several simple constitutional descriptors or molecular properties. Blue: inactive compounds; red: active compounds. For the Lipinski rule, “No” indicates compounds that do not obey Lipinski’s rule of five and “Yes” compounds that satisfy it; among the latter, active compounds are more frequent. C% indicates the percentage of carbon atoms, N% the percentage of nitrogen atoms, and ALOGP is the Ghose-Crippen octanol-water partition coefficient (logP).
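As a reminder of what the “Yes”/“No” split in Figure 1 encodes, the sketch below checks the four classical rule-of-five criteria (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The property values are hypothetical, and the zero-violation default is an assumption: the caption does not say whether the authors tolerated one violation, as some implementations do.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed molecular properties (values below are hypothetical)."""
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(p: Properties) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ]
    return sum(checks)


def obeys_rule_of_five(p: Properties, max_violations: int = 0) -> bool:
    """True if the compound has at most `max_violations` violations."""
    return lipinski_violations(p) <= max_violations


small = Properties(mol_weight=350.4, logp=3.2, h_donors=2, h_acceptors=5)
large = Properties(mol_weight=720.9, logp=6.1, h_donors=4, h_acceptors=12)

print(obeys_rule_of_five(small), lipinski_violations(small))
print(obeys_rule_of_five(large), lipinski_violations(large))
```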
Figure 2. Dissimilarity matrix illustrating the variability within the dataset, based on the Gower distances between the compounds.
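For numeric descriptors, the Gower distance between two compounds is the mean of the per-feature absolute differences, each scaled by that feature's range in the dataset. The sketch below covers only this numeric case (the general definition also handles categorical variables) and uses hypothetical descriptor values; it is not the authors' implementation.

```python
from typing import List


def gower_matrix(rows: List[List[float]]) -> List[List[float]]:
    """Pairwise Gower distances for purely numeric features."""
    n_feat = len(rows[0])
    # Range of each feature over the whole dataset.
    ranges = []
    for k in range(n_feat):
        column = [r[k] for r in rows]
        ranges.append(max(column) - min(column))

    def dist(a: List[float], b: List[float]) -> float:
        total, used = 0.0, 0
        for k in range(n_feat):
            if ranges[k] == 0:  # constant feature carries no information
                continue
            total += abs(a[k] - b[k]) / ranges[k]
            used += 1
        return total / used if used else 0.0

    return [[dist(a, b) for b in rows] for a in rows]


# Hypothetical descriptor table: three compounds, two numeric descriptors.
rows = [
    [0.0, 10.0],
    [1.0, 20.0],
    [2.0, 30.0],
]
matrix = gower_matrix(rows)
for line in matrix:
    print(line)
```

Because every term is divided by the feature range, each distance lies between 0 and 1, which is what makes a heat-map such as Figure 2 comparable across descriptors with different units.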
Performance of the quantitative structure-activity relationship (QSAR) models selected.
| Model * | BA (%) | PPV (%) | MMCE (%) | AUC (%) | TPR (%) | TNR (%) | Q2 − Q2, rnd |
|---|---|---|---|---|---|---|---|
| RF_anova_23 | 70.24 | 78.26 | 18.60 | 82.56 | 45.39 | 95.08 | 21.33 |
| RF_auc_20 | 70.07 | 78.08 | 18.69 | 82.85 | 45.04 | 95.09 | 21.23 |
| RF_cforest_13 | 70.07 | 79.39 | 18.60 | 82.96 | 44.80 | 95.34 | 21.33 |
| RF_kruskal_30 | 70.52 | 77.42 | 18.60 | 82.61 | 46.35 | 94.68 | 21.33 |
| RF_RFimp_30 | 71.54 | 80.04 | 17.73 | 86.03 | 47.69 | 95.39 | 22.19 |
| RF_RF.SRCimp_20 | 71.01 | 77.44 | 18.31 | 83.76 | 47.18 | 94.83 | 21.62 |
| RF_RF.SRCvarselect_10 | 72.93 | 78.72 | 17.34 | 86.01 | 51.29 | 94.56 | 22.58 |
| RF_impurity_15 | 70.67 | 76.43 | 18.69 | 83.72 | 46.91 | 94.43 | 21.23 |
| RF_permutation_10 | 71.53 | 80.51 | 17.83 | 83.63 | 47.86 | 95.20 | 22.10 |
| RF_univariate_30 | 71.48 | 83.49 | 17.44 | 84.31 | 46.80 | 96.16 | 22.48 |
| SVM_anova_30 | 71.83 | 71.26 | 19.07 | 82.08 | 51.60 | 92.05 | 20.48 |
| SVM_auc_30 | 72.02 | 71.56 | 18.98 | 83.25 | 51.99 | 92.05 | 20.94 |
| SVM_cforest_30 | 75.11 | 74.96 | 17.05 | 85.60 | 57.65 | 92.57 | 22.87 |
| SVM_chi.sq_30 | 71.91 | 75.44 | 18.59 | 82.45 | 50.86 | 92.97 | 21.33 |
| SVM_gainratio_30 | 72.03 | 72.78 | 18.98 | 82.85 | 51.99 | 92.07 | 20.94 |
| SVM_information_30 | 72.44 | 73.34 | 18.59 | 83.91 | 52.54 | 92.35 | 21.33 |
| SVM_kruskal_20 | 72.06 | 72.29 | 18.98 | 82.06 | 52.06 | 92.05 | 20.94 |
| SVM_oneR_30 | 72.49 | 78.08 | 17.73 | 81.16 | 50.68 | 94.31 | 22.19 |
| SVM_RFimp_30 | 74.74 | 74.71 | 17.25 | 86.92 | 57.16 | 92.32 | 22.68 |
| SVM_RF.SRCimp_30 | 75.92 | 77.07 | 16.28 | 86.20 | 58.57 | 93.28 | 23.64 |
| SVM_RF.SRCvarselect_20 | 76.33 | 76.22 | 16.28 | 86.75 | 60.10 | 92.56 | 23.64 |
| SVM_impurity_30 | 73.96 | 73.86 | 17.82 | 84.27 | 55.61 | 92.30 | 22.10 |
| SVM_permutation_20 | 72.14 | 73.82 | 18.59 | 84.37 | 51.58 | 92.71 | 21.33 |
| SVM_relief_30 | 72.42 | 71.93 | 19.08 | 82.15 | 53.57 | 91.26 | 20.84 |
| SVM_sym.uncertain_20 | 71.91 | 73.31 | 18.69 | 83.33 | 50.99 | 92.84 | 21.23 |
| Adabm1_RFimp_30 | 71.06 | 73.50 | 19.08 | 83.49 | 49.11 | 93.00 | 20.84 |
| Adabm1_RF.SRCvarselect_20 | 71.15 | 70.36 | 19.56 | 81.96 | 50.36 | 91.95 | 20.36 |
| Adabm1_impurity_20 | 71.22 | 73.34 | 18.80 | 83.66 | 49.18 | 93.26 | 21.13 |
| Adabm1_univariate_30 | 70.50 | 74.30 | 19.27 | 82.36 | 47.61 | 93.39 | 20.65 |
| BartM_chi.sq_30 | 73.15 | 73.28 | 18.11 | 83.54 | 53.87 | 92.42 | 21.81 |
| BartM_gainratio_20 | 71.61 | 70.19 | 19.37 | 82.45 | 51.57 | 91.64 | 20.56 |
| BartM_information_20 | 73.56 | 73.52 | 17.92 | 84.08 | 54.68 | 92.44 | 22.00 |
| BartM_RFimp_25 | 74.24 | 71.45 | 18.02 | 85.28 | 57.13 | 91.36 | 21.90 |
| BartM_impurity_20 | 73.48 | 70.94 | 18.50 | 83.79 | 55.74 | 91.22 | 21.42 |
| BartM_permutation_22 | 74.70 | 71.64 | 17.82 | 85.04 | 58.17 | 91.23 | 22.10 |
| BartM_sym.uncertain_30 | 73.59 | 71.19 | 18.31 | 84.36 | 55.69 | 91.49 | 21.62 |
| C50_anova_30 | 75.96 | 72.56 | 17.05 | 84.73 | 60.70 | 91.23 | 22.87 |
| C50_auc_20 | 74.00 | 72.03 | 18.12 | 83.75 | 56.80 | 91.19 | 21.81 |
| C50_cforest_20 | 75.08 | 71.62 | 17.73 | 85.06 | 59.32 | 90.84 | 22.19 |
| C50_chi.sq_30 | 75.55 | 70.40 | 17.73 | 83.55 | 60.79 | 90.32 | 22.19 |
| C50_gainratio_30 | 75.26 | 70.85 | 17.82 | 84.43 | 60.08 | 90.45 | 22.10 |
| C50_kruskal_30 | 74.56 | 71.35 | 18.02 | 84.52 | 58.03 | 91.10 | 21.90 |
| C50_oneR_30 | 73.91 | 72.78 | 18.41 | 83.62 | 57.06 | 90.76 | 21.52 |
| C50_RFimp_30 | 78.56 | 75.39 | 15.32 | 87.24 | 65.23 | 91.89 | 24.60 |
| C50_RF.SRCimp_30 | 76.21 | 72.82 | 17.05 | 85.45 | 61.32 | 91.10 | 22.87 |
| C50_RF.SRCvarselect_20 | 77.64 | 72.08 | 16.76 | 87.84 | 65.43 | 89.86 | 23.16 |
| C50_impurity_20 | 76.40 | 76.14 | 16.10 | 86.70 | 60.13 | 92.66 | 23.83 |
| C50_permutation_30 | 75.93 | 72.28 | 16.96 | 86.29 | 60.51 | 91.36 | 22.96 |
| C50_univariate_30 | 75.44 | 70.55 | 17.73 | 85.47 | 60.46 | 90.43 | 22.19 |
* Each model name consists of three parts separated by underscores: the first part indicates the classifier, the second part the feature selection algorithm (in abbreviated form), and the third part the number of features used to build the model. The names of the classification and feature selection algorithms are provided in Section 4. For instance, RF_anova_20 denotes a random forest built on 20 features selected by ANOVA (as implemented in “anova.test” within the “mlr” R package). BA: balanced accuracy; PPV: positive predictive value; MMCE: mean misclassification error; AUC: area under the ROC curve; TPR: true positive rate; TNR: true negative rate; Q2: accuracy; Q2, rnd: most probable random accuracy (as explained in the text).
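The naming convention and the threshold-based metrics in the footnote can be made concrete with a short sketch. The confusion-matrix counts are hypothetical; AUC and Q2, rnd are left out because the first needs continuous scores and the definition of the second is given only in the article text, not in this excerpt.

```python
from typing import Dict, Tuple


def parse_model_name(name: str) -> Tuple[str, str, int]:
    """Split 'classifier_selector_nfeatures' into its three parts."""
    classifier, selector, n_features = name.split("_")
    return classifier, selector, int(n_features)


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Metrics from a binary confusion matrix, returned as fractions."""
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn)           # true positive rate (sensitivity)
    tnr = tn / (tn + fp)           # true negative rate (specificity)
    return {
        "TPR": tpr,
        "TNR": tnr,
        "BA": (tpr + tnr) / 2,     # balanced accuracy
        "PPV": tp / (tp + fp),     # positive predictive value
        "MMCE": (fp + fn) / total, # mean misclassification error
        "Q2": (tp + tn) / total,   # accuracy, equal to 1 - MMCE
    }


parsed = parse_model_name("RF_RF.SRCimp_20")
print(parsed)

# Hypothetical counts, chosen only to resemble the table's imbalance.
metrics = classification_metrics(tp=50, fp=10, tn=190, fn=50)
for key, value in metrics.items():
    print(f"{key}: {100 * value:.2f}%")
```

The same relation can be checked against the table itself: for RF_anova_23, the mean of TPR (45.39) and TNR (95.08) is 70.235, which matches the reported BA of 70.24.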
The most important molecular descriptors associated with the inhibition of the c-src tyrosine kinase.
| Name | Interpretation | Descriptor Block (Group) | Frequency among the Five Most Important Features |
|---|---|---|---|
| SpMax4_Bh(m) | Largest eigenvalue n. 4 of Burden matrix weighted by mass | Burden eigenvalues | 14 |
| DECC | Eccentric topological index | Topological indices | 11 |
| SpMax5_Bh(m) | Largest eigenvalue n. 5 of Burden matrix weighted by mass | Burden eigenvalues | 8 |
| SpMax3_Bh(m) | Largest eigenvalue n. 3 of Burden matrix weighted by mass | Burden eigenvalues | 8 |
| J_D | Balaban-like index from topological distance matrix (Balaban distance connectivity index) | 2D matrix-based descriptors | 6 |
| F06[C–N] | Frequency of C–N at topological distance 6 | 2D Atom Pairs | 5 |
| Chi1_EA(dm) | Connectivity-like index of order 1 from edge adjacency matrix weighted by dipole moment | Edge adjacency indices | 4 |
| P_VSA_MR_6 | P_VSA-like on Molar Refractivity, bin 6 | P_VSA-like descriptors | 3 |
| SpMax6_Bh(m) | Largest eigenvalue n. 6 of Burden matrix weighted by mass | Burden eigenvalues | 3 |
| N-073 | Ar2NH/Ar3N/Ar2N-Al/R..N..R | Atom-centered fragments | 2 |
| F05[C–N] | Frequency of C–N at topological distance 5 | 2D Atom Pairs | 2 |
A total of 19 other descriptors occurred only once among the five most important features identified by each of the 17 feature selection algorithms.
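The frequency column in the table above amounts to counting how often each descriptor appears among the top-ranked features across the feature selection algorithms. A minimal sketch of that tally follows; the rankings are invented for illustration (they reuse descriptor and selector names from the tables but are not the authors' results), and `top_n` is set to 2 only to keep the toy data short.

```python
from collections import Counter
from typing import Dict, List


def top_feature_frequency(rankings: Dict[str, List[str]], top_n: int = 5) -> Counter:
    """Count how often each descriptor appears in the top `top_n` of each ranking."""
    counts: Counter = Counter()
    for ranked in rankings.values():
        counts.update(ranked[:top_n])
    return counts


# Hypothetical rankings, most important descriptor first.
rankings = {
    "anova": ["SpMax4_Bh(m)", "DECC", "J_D"],
    "kruskal": ["DECC", "SpMax4_Bh(m)", "SpMax5_Bh(m)"],
    "RFimp": ["SpMax4_Bh(m)", "J_D", "DECC"],
}

frequency = top_feature_frequency(rankings, top_n=2)
for descriptor, count in frequency.most_common():
    print(descriptor, count)
```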
Figure 3. Variation of the proportion of compounds estimated to be outside the applicability domain (method of Sahigara et al. [34]) for the 49 QSAR models used in virtual screening.
Figure 4. Receiver operating characteristic curve for the performance of molecular docking using LeDock software on the training set (n = 175 compounds, as described in the text).
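The area under a ROC curve like the one in Figure 4 can be computed directly from docking energies and activity labels, since it equals the probability that a randomly chosen active ranks above a randomly chosen inactive. The sketch below uses this pairwise formulation with hypothetical energies and labels; negating the energy so that stronger predicted binding gives a higher score is an assumption about how the scores were oriented.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (active, inactive) pairs ranked correctly; ties count 0.5."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical binding energies and activity labels (1 = active).
energies = [-11.2, -10.4, -9.8, -9.1, -8.5, -7.9]
labels = [1, 1, 0, 1, 0, 0]

# Lower (more negative) energy means stronger predicted binding,
# so the sign is flipped to obtain a "higher is better" score.
scores = [-e for e in energies]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of compounds, which is acceptable for a set of 175 but would be replaced by a rank-based formula for large libraries.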
Compounds predicted to be active by both the assembled QSAR models and ligand docking.
| ZINC Code | Substance Name | Confirmation in Wet Lab Experiments * | Activity Confirmed on Other Tyrosine Kinases * | Presence in the Training Set | Energy of Binding ** |
|---|---|---|---|---|---|
| ZINC000001550477 | Lapatinib | Yes | Yes | Yes | −10.07 (0.67) |
| ZINC000034638188 | Pf-562271 | Yes | Yes | Yes | −9.3 (0.74) |
| ZINC000063298074 | Ilorasertib | Yes | Yes | Yes | −10.09 (0.66) |
| ZINC000034800096 | Gw583373a | No | Yes | No | −11.02 (1.01) |
| ZINC000027184814 | Vibriobactin | NA | No | No | −9.77 (0.74) |
| ZINC000034800093 | Gw580496a | No | Yes | No | −9.33 (1.09) |
| ZINC000150528975 | Vedroprevir | NA | No | No | −11.51 (1.04) |
| ZINC000034800112 | Gw576484x | No | Yes | No | −10.36 (0.84) |
| ZINC000072190218 | Avatrombopag | NA | No | No | −9.28 (0.43) |
| ZINC000034800091 | Gw576609a | No | Yes | No | −11.38 (0.69) |
| ZINC000044418656 | Gw784684x | No | Yes | No | −10.77 (0.93) |
| ZINC000042804069 | Gsk-182497a | No | Yes | No | −9.57 (0.37) |
| ZINC000103297739 | Defactinib | No | Yes | No | −10.23 (0.40) |
| ZINC000004215255 | Cefpimizole | NA | No | No | −10.54 (0.70) |
| ZINC000042834127 | Gsk1751853a | No | Yes | No | −10.34 (1.40) |
| ZINC000014945166 | Gw830365a | No | Yes | No | −9.53 (0.29) |
| ZINC000150339466 | Ciluprevir | NA | No | No | −10.95 (0.88) |
| ZINC000043195317 | Golvatinib | No | Yes | No | −14 (1.06) |
| ZINC000042201866 | Gw566221a | No | Yes | No | −10.06 (0.71) |
| ZINC000095615094 | Patellamide G | NA | No | No | −9.32 (0.79) |
| ZINC000003604326 | Vaneprim | NA | No | No | −11.01 (0.79) |
| ZINC000002007399 | Gw458787a | No | Yes | No | −10.95 (0.76) |
| ZINC000028639340 | Posaconazole | NA | No | No | −10.92 (1.01) |
| ZINC000072122048 | Gsk259178a | No | Yes | No | −12.44 (0.49) |
| ZINC000068204830 | Daclatasvir | NA | No | No | −10.75 (0.42) |
| ZINC000043131420 | Fostamatinib | NA | Yes | No | −10.77 (1.11) |
| ZINC000169289453 | Simeprevir | NA | No | No | −11.45 (0.88) |
| ZINC000042834162 | Gw869810x | No | Yes | No | −12.11 (0.76) |
| ZINC000049709569 | Asperazine | NA | No | No | −11.6 (0.82) |
| ZINC000096928979 | Deleobuvir | NA | No | No | −10.2 (0.68) |
| ZINC000042201868 | Gw568377a | No | No | No | −9.36 (0.60) |
| ZINC000014945147 | Gw809897x | Yes | Yes | No | −10.44 (0.71) |
| ZINC000014945171 | Gw830263a | Yes | Yes | No | −10.53 (0.57) |
| ZINC000014945045 | Gw569530a | No | Yes | No | −9.52 (0.55) |
| ZINC000003925087 | Gw806742x | Yes | Yes | No | −10.43 (0.78) |
| ZINC000095618748 | Candesartan O-Glucuronide | NA | No | No | −9.71 (0.58) |
| ZINC000098052868 | Olcegepant | NA | No | No | −9.55 (0.48) |
| ZINC000049833405 | Preulicyclamide | NA | No | No | −11.13 (0.62) |
| ZINC000034800110 | Gw574782a | No | Yes | No | −10.42 (0.60) |
| ZINC000014965596 | Gw683134a | Yes | Yes | No | −10.91 (0.80) |
| ZINC000034800112 | Gw576484x | No | Yes | No | −9.93 (0.36) |
| ZINC000019862646 | Fedratinib | Yes | Yes | No | −10.23 (0.64) |
| ZINC000150377731 | Bms-247243 | NA | No | No | −10.42 (0.83) |
| ZINC000003986669 | Bx-795 | Yes | Yes | No | −9.28 (0.69) |
| ZINC000095615898 | Tyrokeradine A | NA | No | No | −11.14 (0.76) |
| ZINC000003919988 | L-766892 | NA | No | No | −9.59 (0.67) |
| ZINC000095544067 | Ulithiacyclamide F | NA | No | No | −9.76 (0.52) |
| ZINC000049889335 | Edulirin A | NA | No | No | −11.45 (1.04) |
| ZINC000003995140 | Gw621823a | No | Yes | No | −10.63 (0.63) |
| ZINC000040379218 | Gw684626b | No | Yes | No | −10.46 (0.87) |
| ZINC000034800121 | Gw567808a | No | Yes | No | −10.42 (0.53) |
| ZINC000169306513 | Hydroxyitraconazole | NA | No | No | −9.78 (1.02) |
| ZINC000169368380 | Kni-1039 | NA | No | No | −10.13 (0.41) |
| ZINC000150601177 | Ombitasvir | NA | No | No | −10.07 (0.69) |
| ZINC000040404350 | Gsk-969786a | No | Yes | No | −10.2 (0.75) |
| ZINC000150592451 | Micromide | NA | No | No | −12.96 (1.00) |
| ZINC000028249631 | Pd-170292 | NA | No | No | −10.1 (0.73) |
| ZINC000169366333 | Porphyrin | NA | No | No | −11.05 (0.71) |
| ZINC000034800119 | Gw576924a | No | Yes | No | −10.18 (0.92) |
| ZINC000150362888 | Pyropheophytin B | NA | No | No | −10.23 (0.73) |
| ZINC000100057121 | Tegobuvir | NA | No | No | −10.55 (0.58) |
| ZINC000103213128 | Heptamethylene 1,7-Bis-Imadacloprid | NA | No | No | −9.58 (0.47) |
| ZINC000169291993 | Sansanmycin F | NA | No | No | −9.5 (0.56) |
| ZINC000230052516 | Urobilin | NA | No | No | −10.9 (0.85) |
| ZINC000003994828 | Brecanavir | NA | No | No | −10.41 (0.86) |
| ZINC000169363931 | Ansacarbamitocin C | NA | No | No | −10.56 (0.52) |
| ZINC000095535868 | Rwj-58259 | NA | No | No | −10.09 (0.77) |
| ZINC000003921862 | Tallimustine | NA | No | No | −9.76 (0.67) |
| ZINC000063933734 | Rebastinib | No | Yes | No | −9.73 (0.57) |
| ZINC000095615652 | Patellamide C | NA | No | No | −9.46 (0.73) |
| ZINC000197688172 | S-[(3e,5z)-3,5-Octadienoate | NA | No | No | −9.6 (0.67) |
| ZINC000014965588 | Gw709042a | No | Yes | No | −9.89 (0.89) |
| ZINC000085537136 | Barixibat | NA | No | No | −9.72 (0.56) |
| ZINC000169291499 | Kibdelomycin | NA | No | No | −10.99 (0.66) |
| ZINC000003946578 | Mitratapide | NA | No | No | −10.41 (0.62) |
| ZINC000001481922 | Setipafant | NA | No | No | −10.05 (0.62) |
| ZINC000072173092 | Deoxyvobstusine Lactone | NA | No | No | −9.66 (0.64) |
| ZINC000006717126 | Quarfloxin | NA | No | No | −9.85 (0.78) |
| ZINC000077301904 | Losartan N2-Glucuronide | NA | No | No | −10.86 (1.27) |
| ZINC000150609364 | Pseudoceratinazole A | NA | No | No | −11.38 (0.97) |
| ZINC000095616246 | Ulithiacyclamide E | NA | No | No | −9.35 (0.69) |
| ZINC000068151111 | Narlaprevir | NA | No | No | −9.96 (0.44) |
| ZINC000150351429 | Phytosulfokine B | NA | No | No | −9.7 (0.70) |
| ZINC000003989268 | Ceftaroline Fosamil | NA | No | No | −9.84 (0.62) |
| ZINC000008552132 | Stafac | NA | No | No | −11.01 (0.91) |
| ZINC000095618880 | Clofazimine Glucuronide | NA | No | No | −9.65 (0.58) |
| ZINC000096006065 | Xv638 | NA | No | No | −9.56 (0.57) |
| ZINC000169292535 | Rifapentine | NA | No | No | −12.81 (0.92) |
| ZINC000150341961 | Mafodotin | NA | No | No | −9.32 (0.71) |
* Based on ChEMBL and PubChem data for each substance: “Yes” means that at least limited confirmatory data exist in one of the public databases, “No” means that no such confirmatory data exist, and NA means that no data are available at all. ** As an estimate of the docking error, the standard deviation of the energy of binding, computed from the values of the different clusters of 20 poses, is given in brackets.
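The bracketed values in the last column are a spread over pose-cluster energies. A minimal sketch of such a summary is shown below with hypothetical energies; whether the reported energy is the mean or the best (lowest) cluster value, and whether the sample or population standard deviation was used, are not stated in the footnote, so the sample standard deviation here is an assumption.

```python
import statistics
from typing import List, Tuple


def summarize_energies(cluster_energies: List[float]) -> Tuple[float, float, float]:
    """Return (mean, best, sample SD) of the binding energies of pose clusters."""
    mean_energy = statistics.mean(cluster_energies)
    best_energy = min(cluster_energies)          # most negative = strongest binding
    spread = statistics.stdev(cluster_energies)  # sample SD (n - 1 denominator)
    return mean_energy, best_energy, spread


# Hypothetical cluster energies for one ligand.
cluster_energies = [-10.0, -9.0, -11.0]
mean_energy, best_energy, spread = summarize_energies(cluster_energies)
print(f"{mean_energy:.2f} ({spread:.2f}), best {best_energy:.2f}")
```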
Figure 5. Crystallographic pose of the NAP ligand within c-src tyrosine kinase (in red) and the pose predicted by LeDock (in blue). The rings overlap very closely, whereas the free aliphatic chains overlap less well.
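The visual comparison in Figure 5 is commonly quantified as a root-mean-square deviation (RMSD) between matched atoms of the two poses; the caption itself reports no such number, so this is only an illustration. The sketch assumes both poses are already in the same coordinate frame with atoms listed in the same order, and the coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def per_atom_deviation(reference: List[Coord], predicted: List[Coord]) -> List[float]:
    """Euclidean distance between each pair of matched atoms."""
    if len(reference) != len(predicted):
        raise ValueError("poses must contain the same number of atoms")
    return [math.dist(a, b) for a, b in zip(reference, predicted)]


def rmsd(reference: List[Coord], predicted: List[Coord]) -> float:
    """Root-mean-square deviation over matched atoms (no realignment)."""
    deviations = per_atom_deviation(reference, predicted)
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Hypothetical coordinates in angstroms: the first two atoms stand in for a
# rigid ring (nearly identical), the third for a flexible chain atom.
crystal = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.8, 1.0, 0.0)]
docked = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.8, 1.0, 3.0)]

deviations = per_atom_deviation(crystal, docked)
pose_rmsd = rmsd(crystal, docked)
print(deviations)
print(round(pose_rmsd, 3))
```

Inspecting the per-atom deviations rather than the single RMSD value shows where the mismatch is concentrated, which corresponds to the caption's distinction between ring atoms and chain atoms.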