Wei Zhou1,2, Zhijun Dai1, Yuan Chen1, Haiyan Wang3, Zheming Yuan1,2.
Abstract
To design ARC-111 analogues with improved efficiency, we built quantitative structure-activity relationship (QSAR) models for 22 ARC-111 analogues against RPMI8402 tumor cells. First, an optimized support vector regression (SVR) model, built from the literature descriptors with the worst descriptor elimination multi-roundly (WDEM) method, generalized to the test set about as well as the artificial neural network (ANN) model. Second, the high-dimensional descriptor selection nonlinearly (HDSN) and WDEM methods selected seven and 11 more effective descriptors out of 2,923 features, and the SVR models built on these descriptors (SVR3 and SVR4) gave better evaluation measures and more precise predictions for the test set. An interpretability system for the better SVR models was then established. Our analysis offers some useful parameters for designing ARC-111 analogues with enhanced antitumor activity.
Keywords: ARC-111 analogues; QSAR; RPMI8402; high-dimensional descriptor selection nonlinearly (HDSN) method; support vector regression; worst descriptor elimination multi-roundly (WDEM) method
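The abstract names the WDEM method but does not define it. Read literally, one plausible interpretation is a backward elimination loop: each round drops the descriptor whose removal lowers the model error most, and the loop stops when no single removal helps. The Python sketch below illustrates only that reading. The names `eliminate_worst` and `toy_error` are hypothetical, and the toy error function stands in for whatever cross-validated SVR error the authors actually used.

```python
from typing import Callable, List, Sequence, Tuple


def eliminate_worst(
    descriptors: Sequence[str],
    error_of: Callable[[Tuple[str, ...]], float],
) -> Tuple[List[str], float]:
    """Backward elimination: repeatedly drop the descriptor whose removal
    gives the lowest error, until no single removal improves the error."""
    kept = list(descriptors)
    best = error_of(tuple(kept))
    while len(kept) > 1:
        # Score every subset obtained by removing exactly one descriptor.
        trials = [
            (error_of(tuple(d for d in kept if d != drop)), drop)
            for drop in kept
        ]
        trial_error, worst = min(trials)
        if trial_error >= best:
            break  # no removal helps any more
        kept.remove(worst)
        best = trial_error
    return kept, best


def toy_error(subset: Tuple[str, ...]) -> float:
    """Toy stand-in for a cross-validated model error (an assumption, not
    real data): "a" and "b" are informative, other descriptors add noise."""
    informative = {"a": 0.5, "b": 0.3}
    missing = sum(w for name, w in informative.items() if name not in subset)
    noise = 0.05 * sum(1 for name in subset if name not in informative)
    return missing + noise


selected, error = eliminate_worst(["a", "b", "c", "d"], toy_error)
print(selected, error)
```

In a real QSAR workflow `error_of` would train and validate a regression model on the given descriptor subset; passing it in as a callable keeps the elimination loop independent of the model.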
Year: 2012 PMID: 22312310 PMCID: PMC3269744 DOI: 10.3390/ijms13011161
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Comparative quantitative structure-activity relationship (QSAR) modeling for the independent test set, based on the literature dataset.
| | Stepwise MLR | PLS | ANN | SVR1 | SVR2 |
|---|---|---|---|---|---|
| Number of descriptors | 5 | 7 | 9 | 9 | 6 |
| | 0.201 | 0.167 | 0.050 | 0.141 | 0.061 |
| | 0.910 | 0.890 | 0.962 | 0.937 | 0.950 |
| | 0.730 | 0.775 | 0.933 | 0.811 | 0.918 |
Comparative QSAR modeling for the independent test set, based on high-dimensional descriptor selection using support vector regression (SVR).
| | Stepwise MLR | PLS | ANN | SVR3 | SVR4 |
|---|---|---|---|---|---|
| Number of descriptors | 5 | 7 | 9 | 7 | 11 |
| | 0.201 | 0.167 | 0.050 | 0.032 | 0.028 |
| | 0.910 | 0.890 | 0.962 | 0.964 | 0.971 |
| | 0.730 | 0.775 | 0.933 | 0.957 | 0.962 |
Descriptors retained by the high-dimensional descriptor selection nonlinearly (HDSN) and worst descriptor elimination multi-roundly (WDEM) methods, with their F-test values.
| Model | Group name | Descriptor name | F-test value |
|---|---|---|---|
| SVR3 | GSFRAG | | 26.555 |
| | 2D autocorrelations | | 25.175 |
| | Constitutional descriptors | | 12.210 |
| | 2D autocorrelations | | 12.114 |
| | Functional group counts | | 5.898 |
| | Topological charge indices | | 3.687 |
| | Geometrical descriptors | | 2.387 |
| SVR4 | BCUT descriptors | | 11.382 |
| | GSFRAG-L | | 3.771 |
| | Randic molecular profiles | | 3.511 |
| | Eigenvalue-based indices | | 2.456 |
| | Constitutional descriptors | | 2.456 |
| | RDF descriptors | | 2.435 |
| | Walk and path counts | | 2.425 |
| | RDF descriptors | | 2.398 |
| | Topological descriptors | | 2.084 |
| | RDF descriptors | | 1.304 |
| | GETAWAY descriptors | | 0.599 |
Significance levels: p < 0.05 and p < 0.01. Critical values: F0.05(1,10) = 4.96; F0.01(1,10) = 10.04; F0.05(1,6) = 5.99; F0.01(1,6) = 13.74; F0.05(7,10) = 3.14; F0.01(7,10) = 5.2; F0.05(11,6) = 4.03; F0.01(11,6) = 7.8.
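The critical values in the footnote can be compared directly with the tabulated F-test values. The Python sketch below does that under one assumption that the footnote does not state explicitly: single-descriptor tests use the F(1,10) thresholds for SVR3 and the F(1,6) thresholds for SVR4. That pairing is suggested by the F(7,10) and F(11,6) entries, whose first degrees of freedom match the 7 and 11 descriptors of the two models. The helper name `significance` is hypothetical.

```python
# Critical F values copied from the footnote: (threshold at 0.05, threshold at 0.01).
CRITICAL = {
    "SVR3": (4.96, 10.04),  # F(1,10); assumed to apply to SVR3 descriptors
    "SVR4": (5.99, 13.74),  # F(1,6); assumed to apply to SVR4 descriptors
}

# F-test values of the retained descriptors, copied from the table.
F_VALUES = {
    "SVR3": [26.555, 25.175, 12.210, 12.114, 5.898, 3.687, 2.387],
    "SVR4": [11.382, 3.771, 3.511, 2.456, 2.456, 2.435,
             2.425, 2.398, 2.084, 1.304, 0.599],
}


def significance(f_value: float, model: str) -> str:
    """Label an F value by the strictest threshold it exceeds."""
    f05, f01 = CRITICAL[model]
    if f_value > f01:
        return "p < 0.01"
    if f_value > f05:
        return "p < 0.05"
    return "not significant"


for model, values in F_VALUES.items():
    for f_value in values:
        print(f"{model}  F = {f_value:6.3f}  {significance(f_value, model)}")
```

Under that assumption, four SVR3 descriptors exceed the 0.01 threshold and one more exceeds the 0.05 threshold, while only the first SVR4 descriptor exceeds its 0.05 threshold.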
Figure 1. Single-factor effects of features in the SVR3 (A) and SVR4 (B) models.
Figure 2. Four types of ARC-111 analogue structures.
Substituents and activities of 34 ARC-111 analogues.
| Experimental drugs | Theoretical drugs | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Compound | Type | Substituent | pIC50 (expt.) | Compound | Type | Substituent | pIC50 (pred.) | ||||||
| R1 | R2 | R3 | R4 | R1 | R2 | R3 | R4 | ||||||
| 1 | I | Me | Me | 8.699 | 1 | I | Me | Et | 8.651 | ||||
| 2 | Me | Bn | 7.276 | 2 | Me | 8.172 | |||||||
| 3 | Et | Bn | 7.114 | 3 | Et | 7.876 | |||||||
| 4 | Bn | 6.523 | 4 | 7.388 | |||||||||
| 5 | Bn | 6.071 | 5 | 6.908 | |||||||||
| 6 | Bn | Bn | 6.420 | 6 | III | Bn | 6.668 | ||||||
| 7 | Et | Et | 8.222 | 7 | Et | 7.208 | |||||||
| 8 | 8.097 | 8 | 6.904 | ||||||||||
| 9 | H | Me | 9.523 | 9 | 6.617 | ||||||||
| 10 | H | Et | 8.699 | 10 | IV | Et | 6.248 | ||||||
| 11 | H | 8.523 | 11 | 6.102 | |||||||||
| 12 | H | 8.699 | 12 | 6.100 | |||||||||
| 13 | H | Bn | 7.796 | ||||||||||
| 14 | H | H | 8.398 | ||||||||||
| 15 | Me | 8.097 | |||||||||||
| 16 | Et | 8.301 | |||||||||||
| 17 | II | 8.523 | |||||||||||
| 18 | III | H | 8.155 | ||||||||||
| 19 | Me | 7.523 | |||||||||||
| 20 | IV | Bn | 6.398 | ||||||||||
| 21 | H | 7.046 | |||||||||||
| 22 | Me | 6.523 | |||||||||||
Four of the experimental compounds form the test set;
the pIC50 values of the 12 theoretical compounds were predicted by the SVR4 model.
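The activities in the table are pIC50 values. By the standard definition, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so a higher value means a more potent compound. The Python sketch below converts a few tabulated values back to nanomolar IC50. The helper names `pic50_to_nm` and `nm_to_pic50` are hypothetical, and the mol/L convention is assumed because this record does not state the units of the underlying IC50 data.

```python
import math


def pic50_to_nm(pic50: float) -> float:
    """Convert pIC50 (-log10 of IC50 in mol/L) to IC50 in nmol/L."""
    return 10 ** (9 - pic50)


def nm_to_pic50(ic50_nm: float) -> float:
    """Convert IC50 in nmol/L back to pIC50."""
    return 9 - math.log10(ic50_nm)


# Experimental compounds 1, 9 and 5 with their pIC50 values from the table.
for compound, pic50 in [(1, 8.699), (9, 9.523), (5, 6.071)]:
    ic50 = pic50_to_nm(pic50)
    print(f"compound {compound}: pIC50 {pic50} -> IC50 {ic50:.3g} nM")
```

For instance, the pIC50 of 8.699 listed for experimental compound 1 corresponds to an IC50 of about 2 nM under this convention.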
Groups and counts of descriptors generated by the PCLIENT software.
| Group No. | Group of descriptors | Count | Group No. | Group of descriptors | Count |
|---|---|---|---|---|---|
| 1 | Constitutional descriptors | 48 | 13 | RDF descriptors | 150 |
| 2 | Topological descriptors | 119 | 14 | 3D-MoRSE descriptors | 160 |
| 3 | Walk and path counts | 47 | 15 | WHIM descriptors | 99 |
| 4 | Connectivity indices | 33 | 16 | GETAWAY descriptors | 197 |
| 5 | Information indices | 47 | 17 | Functional group counts | 121 |
| 6 | 2D autocorrelations | 96 | 18 | Atom-centered fragments | 120 |
| 7 | Edge adjacency indices | 107 | 19 | Charge descriptors | 14 |
| 8 | BCUT descriptors | 64 | 20 | Molecular properties | 28 |
| 9 | Topological charge indices | 21 | 21 | ET-state Indices | >300 |
| 10 | Eigenvalue-based indices | 44 | 22 | ET-state Properties | 3 |
| 11 | Randic molecular profiles | 41 | 23 | GSFRAG Descriptor | 307 |
| 12 | Geometrical descriptors | 74 | 24 | GSFRAG-L Descriptor | 886 |
| Total: | >3000 | ||||
This group of descriptors did not exist in the default state.
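The stated total can be cross-checked by summing the group counts. The Python tally below copies the counts from the table. The ET-state Indices group is listed only as ">300", so it is kept out of the exact sum and treated as a lower bound of 301 (an assumption made purely for the arithmetic).

```python
# Descriptor counts per group, copied from the table.
# "ET-state Indices" is omitted here because the table gives only ">300".
COUNTS = {
    "Constitutional descriptors": 48,
    "Topological descriptors": 119,
    "Walk and path counts": 47,
    "Connectivity indices": 33,
    "Information indices": 47,
    "2D autocorrelations": 96,
    "Edge adjacency indices": 107,
    "BCUT descriptors": 64,
    "Topological charge indices": 21,
    "Eigenvalue-based indices": 44,
    "Randic molecular profiles": 41,
    "Geometrical descriptors": 74,
    "RDF descriptors": 150,
    "3D-MoRSE descriptors": 160,
    "WHIM descriptors": 99,
    "GETAWAY descriptors": 197,
    "Functional group counts": 121,
    "Atom-centered fragments": 120,
    "Charge descriptors": 14,
    "Molecular properties": 28,
    "ET-state Properties": 3,
    "GSFRAG Descriptor": 307,
    "GSFRAG-L Descriptor": 886,
}

ET_STATE_INDICES_MIN = 301  # assumed lower bound for the ">300" entry

known_total = sum(COUNTS.values())
lower_bound = known_total + ET_STATE_INDICES_MIN
print(f"exactly counted groups: {known_total}")
print(f"lower bound including ET-state Indices: {lower_bound}")
```

The 23 exactly counted groups sum to 2,826, so adding more than 300 ET-state indices is consistent with the ">3000" total in the table.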