Guohui Sun, Tengjiao Fan, Xiaodong Sun, Yuxing Hao, Xin Cui, Lijiao Zhao, Ting Ren, Yue Zhou, Rugang Zhong, Yongzhen Peng.
Abstract
O⁶-methylguanine-DNA methyltransferase (MGMT), a unique DNA repair enzyme, can confer resistance to anticancer DNA-alkylating agents that modify the O⁶-position of guanine. Inhibiting MGMT activity in tumors is therefore of great interest to cancer researchers, because it can significantly improve the anticancer efficacy of such alkylating agents. In this study, we performed a quantitative structure-activity relationship (QSAR) and classification study on a total of 134 base analogs with reported ED50 values (50% inhibitory concentration) against MGMT. Molecular information for all compounds was described by quantum chemical descriptors and Dragon descriptors. A genetic algorithm (GA) and multiple linear regression (MLR) analysis were combined to develop the QSAR models. Classification models were generated by seven machine-learning methods based on six types of molecular fingerprints. The performance of all developed models was assessed by internal and external validation techniques. The best QSAR model, based on 84 compounds, gave Q²LOO = 0.83, R² = 0.87, Q²ext = 0.67, and R²ext = 0.69. The QSAR results indicated that topological charge indices, polarizability, ionization potential (IP), and the number of primary aromatic amines are the main contributors to MGMT inhibition by base analogs. For the classification studies, the 10-fold cross-validation accuracies of the top ten models ranged from 0.750 to 0.885. Accuracy on the external test set ranged from 0.800 to 0.880 for all of these models except PubChem-Tree, suggesting satisfactory predictive ability. Three models (Ext-SVM, Ext-Tree and Graph-RF) showed high and reliable predictive accuracy for both the training and external test sets. In addition, several representative substructures characterizing MGMT inhibitors were identified by information gain and substructure frequency analysis.
Our results may be useful for designing and rapidly identifying potential MGMT inhibitors.
Keywords: MGMT; QSAR; anticancer alkylating agents; classification; inhibitors; resistance
Year: 2018 PMID: 30404161 PMCID: PMC6278368 DOI: 10.3390/molecules23112892
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Multi-Criteria Decision Making (MCDM) graphs of the generated models based on 103 (A) and 84 (B) base analogs.
Figure 2. Plots of the experimental versus predicted pED50 values for compounds in the training set and test set of the best models derived from initial (A) and further (B) quantitative structure-activity relationship (QSAR) modeling.
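The fit and internal-validation statistics quoted for the QSAR models (R² and Q²LOO) can be illustrated with a minimal sketch. It assumes an ordinary least-squares MLR with an intercept and uses synthetic descriptor data; the function names and the data are hypothetical and are not taken from the study.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses for descriptor matrix X from fitted coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r2(y, y_pred):
    """Coefficient of determination of predictions against observations."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def q2_loo(X, y):
    """Leave-one-out cross-validated Q2: refit without each compound in turn."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        coef = fit_mlr(X[mask], y[mask])
        y_hat = predict_mlr(coef, X[i:i + 1])[0]
        press += (y[i] - y_hat) ** 2
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - press / ss_tot

# Synthetic stand-in for a descriptor matrix and pED50 values (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + 0.3 * rng.normal(size=30)

r2_fit = r2(y, predict_mlr(fit_mlr(X, y), X))
q2 = q2_loo(X, y)
print(f"R2 = {r2_fit:.3f}, Q2_LOO = {q2:.3f}")
```

For a least-squares model, Q²LOO cannot exceed R², so a large gap between the two is a common warning sign of overfitting.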
Figure 3. Williams plots for the best QSAR models based on model I (A) and model II (B). The horizontal dashed lines represent ±3 standardized residuals and the vertical black line represents the warning leverage h*.
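The leverage axis of a Williams plot can be sketched as follows. This is a minimal illustration that assumes the commonly used definition h* = 3(p + 1)/n, with p descriptors and n training compounds; the source does not state which formula was used, and the training-set size and descriptor values below are hypothetical.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix A (A^T A)^-1 A^T, with A = [1, X]."""
    A = np.column_stack([np.ones(len(X)), X])
    hat = A @ np.linalg.pinv(A.T @ A) @ A.T
    return np.diag(hat)

def warning_leverage(n_compounds, n_descriptors):
    """Assumed warning leverage h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_compounds

# Hypothetical training set: 60 compounds described by 9 descriptors.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(60, 9))

h = leverages(X_train)
h_star = warning_leverage(n_compounds=60, n_descriptors=9)

# Compounds to the right of the vertical h* line in a Williams plot.
high_leverage = np.flatnonzero(h > h_star)
print(f"h* = {h_star:.2f}, compounds above h*: {high_leverage.tolist()}")
```

Compounds with h above h* lie outside the structural applicability domain under this convention, and predictions for them are usually treated with caution.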
Types and chemical meanings of molecular descriptors used in model II.
| Descriptor | Type | Chemical Meaning |
|---|---|---|
| | 2D matrix-based descriptors | logarithmic coefficient sum of the last eigenvector from chi matrix |
| | 2D matrix-based descriptors | Balaban-like index from Barysz matrix weighted by polarizability |
| | 2D matrix-based descriptors | normalized spectral positive sum from Burden matrix weighted by polarizability |
| | 2D matrix-based descriptors | logarithmic coefficient sum of the last eigenvector from Burden matrix weighted by I-State |
| | 2D autocorrelations | Moran autocorrelation of lag 1 weighted by ionization potential |
| | 2D autocorrelations | mean topological charge index of order 4 |
| | 2D Atom Pairs Binary | presence/absence of C-C at topological distance 9 |
| | Functional group counts | number of primary amines (aromatic) |
| | CATS 2D | CATS2D Donor-Acceptor at lag 07 |
Figure 4. Chemical diversity distribution of the training set (N = 104 compounds) and the external test set (N = 25 compounds). (A) Chemical space was analyzed using the molecular weight (MW) and Ghose-Crippen LogKow (ALogP) of each set in the database. N represents the number of compounds in each data set. (B) Heat map of molecular similarity based on Euclidean distances for the training and external test sets. The Euclidean distances were calculated from MACCS keys fingerprints and then normalized.
Figure 5. Performance of 10-fold cross-validation for the training set in 42 classification models. CA, AUC, SE and SP represent the classification accuracy, the area under the ROC curve, sensitivity and specificity, respectively.
Performance of top ten binary classification models for the training and external test sets 1.
| Data Set | Model | CA | AUC | SE | SP | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|
| Training set | Ext-RF | 0.865 | 0.926 | 0.88 | 0.85 | 44 | 46 | 8 | 6 |
| Training set | Ext-LR | 0.885 | 0.922 | 0.90 | 0.87 | 45 | 47 | 7 | 5 |
| Training set | Ext-ANN | 0.846 | 0.913 | 0.88 | 0.81 | 44 | 44 | 10 | 6 |
| Training set | Ext-SVM | 0.865 | 0.912 | 0.88 | 0.85 | 44 | 46 | 8 | 6 |
| Training set | Graph-RF | 0.817 | 0.897 | 0.84 | 0.80 | 42 | 43 | 11 | 8 |
| Training set | PubChem-LR | 0.798 | 0.887 | 0.74 | 0.85 | 37 | 46 | 8 | 13 |
| Training set | Ext-Tree | 0.837 | 0.879 | 0.78 | 0.89 | 39 | 48 | 6 | 11 |
| Training set | PubChem-RF | 0.750 | 0.871 | 0.68 | 0.81 | 34 | 44 | 10 | 16 |
| Training set | Graph-LR | 0.779 | 0.870 | 0.76 | 0.80 | 38 | 43 | 11 | 12 |
| Training set | PubChem-Tree | 0.827 | 0.867 | 0.82 | 0.83 | 41 | 45 | 9 | 9 |
| External test set | Ext-RF | 0.840 | 0.930 | 0.75 | 0.92 | 9 | 12 | 1 | 3 |
| External test set | Ext-LR | 0.840 | 0.974 | 0.75 | 0.92 | 9 | 12 | 1 | 3 |
| External test set | Ext-ANN | 0.800 | 0.962 | 0.67 | 0.92 | 8 | 12 | 1 | 4 |
| External test set | Ext-SVM | 0.880 | 0.904 | 0.83 | 0.92 | 10 | 12 | 1 | 2 |
| External test set | Graph-RF | 0.880 | 0.920 | 0.92 | 0.85 | 11 | 11 | 2 | 1 |
| External test set | PubChem-LR | 0.840 | 0.936 | 0.92 | 0.77 | 11 | 10 | 3 | 1 |
| External test set | Ext-Tree | 0.880 | 0.901 | 0.83 | 0.92 | 10 | 12 | 1 | 2 |
| External test set | PubChem-RF | 0.800 | 0.917 | 0.75 | 0.85 | 9 | 11 | 2 | 3 |
| External test set | Graph-LR | 0.840 | 0.936 | 0.75 | 0.92 | 9 | 12 | 1 | 3 |
| External test set | PubChem-Tree | 0.640 | 0.667 | 0.67 | 0.62 | 8 | 8 | 5 | 4 |
1 CA, classification accuracy; AUC, the area under the ROC curve; SE, sensitivity; SP, specificity; TP, the number of true positive compounds; TN, the number of true negative compounds; FP, the number of false positive compounds; FN, the number of false negative compounds.
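The CA, SE and SP columns follow from the confusion counts by the standard definitions, which the short sketch below reproduces for two rows of the table. The function name is hypothetical; AUC is omitted because it cannot be derived from the four counts alone.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "CA": (tp + tn) / total,   # fraction of all compounds classified correctly
        "SE": tp / (tp + fn),      # fraction of positives correctly identified
        "SP": tn / (tn + fp),      # fraction of negatives correctly identified
    }

# Counts copied from the table: Ext-RF (training set) and Ext-SVM (external test set).
ext_rf_train = classification_metrics(tp=44, tn=46, fp=8, fn=6)
ext_svm_test = classification_metrics(tp=10, tn=12, fp=1, fn=2)

for name, m in [("Ext-RF train", ext_rf_train), ("Ext-SVM test", ext_svm_test)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

The counts also imply 50 positive and 54 negative compounds in the training set (TP + FN and TN + FP), and 12 positives and 13 negatives in the external test set.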
Representative privileged substructures obtained from PubChem fingerprint responsible for O⁶-methylguanine-DNA methyltransferase (MGMT) inhibition in base analogs.
| No. | Privileged Substructures | General Substructures | Representative Compounds | IG | FP | FN |
|---|---|---|---|---|---|---|
| FP297 | C-Br | | | 0.096 | 2.08(11) | 0(0) |
| FP327 | C(~Br)(~C) 1 | | | 0.087 | 2.08(11) | 0(0) |
| FP328 | C(~Br)(~C)(~C) | | | 0.087 | 2.08(11) | 0(0) |
| FP330 | C(~Br)(:C) 2 | | | 0.087 | 2.08(11) | 0(0) |
| FP43 | ≥1 Br | | | 0.096 | 2.08(11) | 0(0) |
| FP509 | Br-C:C-C | | | 0.078 | 2.08(9) | 0(0) |
| FP554 | Br-C-C-C | | | 0.078 | 2.08(9) | 0(0) |
| FP670 | Br-C:C:C-C | | | 0.078 | 2.08(9) | 0(0) |
| FP421 | C=S | | | 0.078 | 2.08(9) | 0(0) |
| FP471 | S:C:C:C | | | 0.078 | 2.08(9) | 0(0) |
| FP480 | C:S:C-C | | | 0.078 | 2.08(9) | 0(0) |
| FP513 | S:C:C-[#1] | | | 0.078 | 2.08(9) | 0(0) |
| FP532 | S-C:C-[#1] | | | 0.078 | 2.08(9) | 0(0) |
| FP699 | O-C-C-C-C-C(C)-C | | | 0.096 | 2.08(11) | 0(0) |
| FP776 | CC1CCC(C)CC1 | | | 0.064 | 2.08(11) | 0(0) |
| FP188 | ≥2 saturated or aromatic heteroatom-containing ring size 6 | | | 0.081 | 1.93(13) | 0.14(1) |
| FP648 | O=N-C:C-N | | | 0.073 | 1.92(12) | 0.15(1) |
| FP260 | ≥3 hetero-aromatic rings | | | 0.056 | 1.89(10) | 0.18(1) |
| FP713 | Cc1ccc(C)cc1 | | | 0.056 | 1.89(10) | 0.18(1) |
| FP697 | C-C-C-C-C-C(C)-C | | | 0.048 | 1.87(9) | 0.19(1) |
1 "~" represents "regardless of bond order"; 2 ":" represents bond aromaticity.
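The last two columns of the table appear to give, for each substructure, an enrichment frequency in each class, with the number of compounds containing it in parentheses. A minimal sketch follows, assuming the frequency is (N_fragment_in_class × N_total) / (N_fragment_total × N_class) and that the training set holds 50 inhibitors and 54 non-inhibitors out of 104 compounds (class sizes inferred from the confusion counts of the classification table). Both the formula and the reading of the column headers are assumptions, although they reproduce the tabulated values.

```python
def substructure_frequency(n_frag_class, n_frag_total, n_class, n_total):
    """Enrichment of a substructure in one class relative to the whole set."""
    return (n_frag_class * n_total) / (n_frag_total * n_class)

N_TOTAL, N_POS, N_NEG = 104, 50, 54  # assumed training-set composition

# FP188: 13 inhibitors and 1 non-inhibitor contain the substructure.
fp188_pos = substructure_frequency(13, 14, N_POS, N_TOTAL)
fp188_neg = substructure_frequency(1, 14, N_NEG, N_TOTAL)

# FP297 (C-Br): present in 11 inhibitors and no non-inhibitors.
fp297_pos = substructure_frequency(11, 11, N_POS, N_TOTAL)
fp297_neg = substructure_frequency(0, 11, N_NEG, N_TOTAL)

print(round(fp188_pos, 2), round(fp188_neg, 2))
print(round(fp297_pos, 2), round(fp297_neg, 2))
```

Under this definition a value above 1 means the substructure is over-represented in that class, and 2.08 (= 104/50) is the maximum attainable for the inhibitor class, reached when a substructure occurs only in inhibitors.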