Khac-Minh Thai, Thuy-Quyen Nguyen, Trieu-Du Ngo, Thanh-Dao Tran, Thi-Ngoc-Phuong Huynh.
Abstract
Benzo[c]phenanthridine (BCP) derivatives were identified as topoisomerase I (TOP-I) targeting agents with pronounced antitumor activity. In this study, a support vector machine (SVM) model was built on a series of 73 analogues to classify BCP derivatives according to their TOP-I inhibitory activity. The best SVM model, with a total accuracy of 93% on the training set, was obtained using a set of 7 descriptors selected from a larger set via a random forest algorithm. When this SVM classifier was validated internally on a test set of 15 compounds, it reached an overall accuracy of 87% and a Matthews correlation coefficient (MCC) of 0.71. For two external test sets, 89% and 80% of the BCP compounds, respectively, were correctly predicted. The results indicate that our SVM model could be used as a filter for designing new BCP compounds with higher TOP-I inhibitory activity.
Year: 2012 PMID: 22510606 PMCID: PMC6268465 DOI: 10.3390/molecules17044560
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Chemical structures of benzo[c]phenanthridine derivatives.
Molecular descriptors from RF feature selection method.
| Class | Symbol | Definition |
|---|---|---|
| Topological descriptors | D/Dr05 | distance/detour ring index of order 5 |
| | | distance/detour ring index of order 6 |
| Walk and path counts | MPC06 | molecular path count of order 06 |
| | | molecular path count of order 08 |
| | | molecular path count of order 10 |
| 2D frequency fingerprints | F05[C–C] | frequency of C–C at topological distance 05 |
| | | frequency of N–O at topological distance 08 |
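To make the "molecular path count of order k" definition concrete, the sketch below counts simple paths with exactly k bonds in a hydrogen-suppressed molecular graph. It is only an illustration of the plain graph-theoretical count: the function name `count_paths` and the six-membered ring example are assumptions, and descriptor software may apply further scaling to the raw count.

```python
from collections import defaultdict


def count_paths(edges, order):
    """Count simple paths with exactly `order` edges in an undirected graph."""
    if order < 1:
        raise ValueError("order must be at least 1")

    # Build an adjacency map from the bond list.
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    def extend(node, visited, remaining):
        # Depth-first extension that never revisits an atom.
        if remaining == 0:
            return 1
        total = 0
        for neighbour in adjacency[node]:
            if neighbour not in visited:
                total += extend(neighbour, visited | {neighbour}, remaining - 1)
        return total

    directed = sum(extend(start, {start}, order) for start in list(adjacency))
    # Every undirected path is discovered once from each of its two ends.
    return directed // 2


# Hypothetical example graph: a six-membered ring (atoms 0..5).
ring = [(i, (i + 1) % 6) for i in range(6)]
ring_counts = {k: count_paths(ring, k) for k in range(1, 7)}
```

On the ring, every order from 1 to 5 gives six paths, and no path of order 6 exists because it would need seven distinct atoms.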
Optimal parameters (C, γ) for SVM approach.
| Feature selection method | Number of features selected | Range of γ | Best γ | Range of C | Best C | Cross-validation error |
|---|---|---|---|---|---|---|
| mRMR | 10 | 5^[−10:10] | 3.125 | 5^[−10:10] | 1 | 0.21 |
| GA | 16 | 2^[−10:10] | 0.125 | 2^[−10:10] | 16 | 0.15 |
| RF | 7 | 2^[−10:10] | 0.25 | 2^[−10:10] | 4 | 0.17 |
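The parameter ranges in the table are exponent grids: each candidate value is the base raised to an integer power between −10 and 10. The following minimal sketch shows how such a grid can be generated and searched. The names `power_grid`, `grid_search` and `toy_cv_error` are hypothetical, and `toy_cv_error` is an arbitrary placeholder surface, not the cross-validation error of the study; in practice it would be replaced by a real cross-validated SVM error.

```python
import math
from itertools import product


def power_grid(base, low=-10, high=10):
    """Return base**e for every integer exponent e in [low, high]."""
    return [float(base) ** e for e in range(low, high + 1)]


def grid_search(cv_error, c_values, gamma_values):
    """Return the (C, gamma) pair with the smallest cross-validation error."""
    return min(product(c_values, gamma_values), key=lambda pair: cv_error(*pair))


def toy_cv_error(c, gamma):
    # Placeholder only: an arbitrary bowl with its minimum at C = 2**3, gamma = 2**-4.
    return (math.log2(c) - 3) ** 2 + (math.log2(gamma) + 4) ** 2


grid = power_grid(2)
best_c, best_gamma = grid_search(toy_cv_error, grid, grid)
```

With base 2 the grid has 21 values per parameter, so an exhaustive search evaluates 441 (C, γ) pairs. The best values reported for the GA and RF models (0.125, 16, 0.25, 4) all lie on this grid.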
Classification results of the three SVM models built on the descriptor sets selected by the mRMR, GA and RF methods, respectively.
| Evaluation criteria | Training set (mRMR) | Training set (GA) | Training set (RF) | Test set (mRMR) | Test set (GA) | Test set (RF) | External set (mRMR) | External set (GA) | External set (RF) |
|---|---|---|---|---|---|---|---|---|---|
| Number of support vectors | 51 | 40 | 35 | | | | | | |
| Total accuracy | 0.98 | 0.91 | 0.93 | 0.80 | 0.93 | 0.87 | 0.67 | 0.78 | 0.89 |
| Sensitivity | 0.93 | 0.73 | 0.87 | 0.33 | 1.00 | 1.00 | 0.40 | 0.60 | 0.80 |
| Specificity | 1.00 | 0.98 | 0.95 | 0.92 | 0.92 | 0.83 | 1.00 | 1.00 | 1.00 |
| Positive precision | 1.00 | 0.92 | 0.87 | 0.50 | 0.75 | 0.60 | 1.00 | 1.00 | 1.00 |
| Negative precision | 0.98 | 0.91 | 0.95 | 0.85 | 1.00 | 1.00 | 0.57 | 0.67 | 0.80 |
| Matthews correlation coefficient (MCC) | 0.96 | 0.77 | 0.82 | 0.29 | 0.83 | 0.71 | 0.48 | 0.63 | 0.80 |
| Total accuracy of cross-validation | 0.76 | 0.78 | 0.74 | | | | | | |
Classification results from three classification approaches (SVM e1071, SVM kernlab and RF).
| Evaluation criteria | Training set | Test set | External set | Application set (SVM e1071) |
|---|---|---|---|---|
| TP | 13; 6 | 3; 3 | 3; 3 | |
| TN | 41; 36 | 8; 10 | 4; 3 | |
| FP | 2; 9 | 0; 0 | 2; 2 | |
| FN | 2; 7 | 4; 2 | 0; 1 | |
| Total accuracy | 0.93; 0.72 | 0.73; 0.87 | 0.78; 0.67 | |
| Sensitivity | 0.87; 0.46 | 0.43; 0.60 | 1.00; 0.75 | |
| Specificity | 0.95; 0.80 | 1.00; 1.00 | 0.67; 0.60 | |
| Positive precision | 0.87; 0.40 | 1.00; 1.00 | 0.60; 0.60 | |
| Negative precision | 0.95; 0.84 | 0.67; 0.83 | 1.00; 0.75 | |
| MCC | 0.82; 0.25 | 0.54; 0.71 | 0.63; 0.35 | |
| Cross-validation error | 0.20; 0.28 a | | | |
| Y-scrambling total accuracy | | | | |
Chemical structures of ten benzo[c]phenanthridine derivatives in the application set, their topoisomerase I inhibitory activity (REC) and classification results from the final SVM model. Classification terms: “1” denotes a TOP-I active compound with activity equal to or stronger than that of topotecan; “0” denotes a TOP-I inactive compound with activity weaker than that of topotecan.
| No | Chemical structure | Name | REC TOP-I mediated DNA cleavage (Experimental) | TOP-I Active/Inactive | TOP-I classified result from final SVM model |
|---|---|---|---|---|---|
| A1 | | BMC_08_7824_7a | 0.03 | 1 | 1 |
| A2 | | BMC_08_7824_7b | 0.08 | 1 | 1 |
| A3 | | BMC_08_7824_9 | 0.2 | 1 | 1 |
| A4 | | BMC_08_7824_11 | 0.1 | 1 | 1 |
| A5 | | BMC_08_8598_9 | >10 | 0 | 0 |
| A6 | | BMC_08_8598_10 | 0.2 | 1 | 0 |
| A7 | | BMC_08_8598_12 | 0.2 | 1 | 0 |
| A8 | | BMC_08_8598_13 | 0.2 | 1 | 1 |
| A9 | | BMC_08_8598_14 | >10 | 0 | 0 |
| A10 | | BMC_08_8598_15 | >10 | 0 | 0 |
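The 80% figure quoted in the abstract for the second external set can be checked directly against the application-set rows above. The sketch below copies the experimental and predicted classes from the table; the variable names are illustrative only.

```python
# (name, experimental class, class predicted by the final SVM model)
application_set = [
    ("BMC_08_7824_7a", 1, 1),
    ("BMC_08_7824_7b", 1, 1),
    ("BMC_08_7824_9", 1, 1),
    ("BMC_08_7824_11", 1, 1),
    ("BMC_08_8598_9", 0, 0),
    ("BMC_08_8598_10", 1, 0),
    ("BMC_08_8598_12", 1, 0),
    ("BMC_08_8598_13", 1, 1),
    ("BMC_08_8598_14", 0, 0),
    ("BMC_08_8598_15", 0, 0),
]

# Count agreements between experimental and predicted classes.
correct = sum(actual == predicted for _, actual, predicted in application_set)
accuracy = correct / len(application_set)

# Compounds where the model disagrees with the experimental class.
misclassified = [name for name, actual, predicted in application_set
                 if actual != predicted]
```

Eight of the ten compounds are classified correctly; the two errors are both active compounds predicted as inactive.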
Chemical structures of 82 benzo[c]phenanthridine derivatives, their topoisomerase I inhibitory activity (REC) and classification results from the final SVM model. Classification terms: “1” denotes a TOP-I active compound with activity equal to or stronger than that of topotecan; “0” denotes a TOP-I inactive compound with activity weaker than that of topotecan.
| No | Chemical structure | Name | REC TOP-I mediated DNA cleavage (Experimental) | TOP-I Active/Inactive | TOP-I classified result from final SVM model |
|---|---|---|---|---|---|
| 1 | | Nitidine | 10 | 0 | 0 |
| 2 | | BMC_03_3795_10a | 8 | 0 | 0 |
| 3 | | BMC_03_3795_10b | 200 | 0 | 0 |
| 4 | | BMC_03_3795_10c | 200 | 0 | 0 |
| 5 | | BMC_03_3795_10d | >1000 | 0 | 0 |
| 6 | | BMC_03_3795_10e | 500 | 0 | 0 |
| 7 | | BMC_03_3795_10f | 10 | 0 | 0 |
| 8 | | BMC_03_3795_11a | >1000 | 0 | 0 |
| 9 | | BMC_03_3795_11b | 100 | 0 | 0 |
| 10 | | BMC_03_3795_12d | >1000 | 0 | 0 |
| 11 | | BMC_03_2061_03a | 0.5 | 1 | 0 |
| 12 | | BMC_03_2061_03b | >1000 | 0 | 0 |
| 13 | | BMC_03_2061_03c | 0.3 | 1 | 1 |
| 14 | | BMC_03_2061_03d | 1000 | 0 | 0 |
| 15 | | BMC_03_2061_03e | 1 | 1 | 0 |
| 16 | | BMC_03_2061_03f | 1000 | 0 | 0 |
| 17 | | BMCL_02_3333_03c | 10 | 0 | 0 |
| 18 | | BMC_03_2061_03h | 50 | 0 | 0 |
| 19 | | BMC_03_2061_03i | 1 | 1 | 0 |
| 20 | | BMC_03_2061_03j | >1000 | 0 | 0 |
| 21 | | BMC_03_2061_03k | 0.8–1.0 | 1 | 1 |
| 22 | | BMC_03_2061_04a | 0.8 | 1 | 0 |
| 23 | | BMC_03_2061_04b | 100 | 0 | 1 |
| 24 | | BMC_03_2061_09k | 10 | 0 | 0 |
| 25 | | JMC_03_2254_02 | 0.3 | 1 | 0 |
| 26 | | JMC_03_2254_03 | 6 | 0 | 0 |
| 27 | | JMC_03_2254_05a | 1000 | 0 | 1 |
| 28 | | JMC_03_2254_05b | 0.1 | 1 | 1 |
| 29 | | JMC_03_2254_06a | 15 | 0 | 0 |
| 30 | | JMC_03_2254_06b | 0.5 | 1 | 0 |
| 31 | | JMC_03_2254_05c | 0.2 | 1 | 0 |
| 32 | | JMC_03_2254_06c | 8.0 | 0 | 0 |
| 33 | | JMC_03_2254_16a | 10 | 0 | 1 |
| 34 | | JMC_03_2254_17a | 500 | 0 | 0 |
| 35 | | JMC_03_2254_02 | 0.5 | 1 | 1 |
| 36 | | BMCL_02_3333_02 | >1000 | 0 | 0 |
| 37 | | BMCL_02_3333_03 | 200 | 0 | 0 |
| 38 | | BMCL_02_3333_04a | 0.3 | 1 | 1 |
| 39 | | BMCL_02_3333_04b | 1000 | 0 | 0 |
| 40 | | BMCL_02_3333_04c | 30 | 0 | 0 |
| 41 | | BMCL_02_3333_04d | 1.0 | 1 | 0 |
| 42 | | LDDD_04_198_01 | 0.03 | 1 | 0 |
| 43 | | LDDD_04_198_02 | 2.0 | 0 | 0 |
| 44 | | LDDD_04_198_03 | 2.0 | 0 | 0 |
| 45 | | BMC_04_3731_03a | 9 | 0 | 0 |
| 46 | | BMC_04_3731_03b | 6 | 0 | 0 |
| 47 | | BMC_04_3731_03c | 2 | 0 | 0 |
| 48 | | BMC_04_3731_03d | >300 | 0 | 0 |
| 49 | | BMC_04_3731_04a | 100 | 0 | 0 |
| 50 | | BMC_04_3731_04b | 12 | 0 | 0 |
| 51 | | BMC_04_3731_04c | 6 | 0 | 0 |
| 52 | | BMC_04_0795_04 | >300 | 0 | 0 |
| 53 | | BMC_04_3731_05 | 10 | 0 | 0 |
| 54 | | BMC_04_0795_01b | 0.3 | 1 | 0 |
| 55 | | BMC_04_0795_01f | 0.2 | 1 | 1 |
| 56 | | BMC_04_0795_01g | 2.0 | 0 | 0 |
| 57 | | BMC_04_0795_01h | 20 | 0 | 0 |
| 58 | | BMC_04_0795_03 | 10 | 0 | 0 |
| 59 | | BMC_04_0795_02 | >1000 | 0 | 0 |
| 60 | | BMC_04_0795_04 | 0.8 | 1 | 1 |
| 61 | | BMC_04_0795_05 | 5 | 0 | 0 |
| 62 | | BMC_05_6782_07c | 1.0 | 1 | 0 |
| 63 | | BMC_05_6782_07d | 8 | 0 | 0 |
| 64 | | BMC_05_6782_08 | 0.6 | 1 | 1 |
| 65 | | BMC_05_6782_09c | 0.4 | 1 | 0 |
| 66 | | BMC_05_6782_09d | 10 | 0 | 0 |
| 67 | | BMC_05_6782_09e | 0.7 | 1 | 1 |
| 68 | | BMC_05_6782_09f | 12 | 0 | 0 |
| 69 | | BMC_05_6782_09g | 60 | 0 | 0 |
| 70 | | BMC_05_6782_09h | 0.6 | 1 | 1 |
| 71 | | BMC_05_6782_09i | 1.5 | 0 | 0 |
| 72 | | BMC_05_6782_09j | 0.3 | 1 | 1 |
| 73 | | BMC_05_6782_10 | 3 | 0 | 1 |
| 74 | | BMC_06_3131_10c | 1.2 | 0 | 0 |
| 75 | | BMC_06_3131_10d | 0.07 | 1 | 1 |
| 76 | | BMC_06_3131_10g | 9 | 0 | 0 |
| 77 | | BMC_06_3131_10i | 0.45 | 1 | 1 |
| 78 | | BMC_06_3131_10j | >100 | 0 | 0 |
| 79 | | BMC_06_3131_10f | 3 | 0 | 0 |
| 80 | | BMC_06_3131_10k | 13 | 0 | 0 |
| 81 | | BMC_06_3131_10l | 0.35 | 1 | 1 |
| 82 | | BMC_06_3131_10m | 0.15 | 1 | 1 |
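The Active/Inactive column above can be derived from the REC column with a simple threshold rule. The sketch below assumes a cut-off of REC ≤ 1.0 for "equal to or stronger than topotecan"; this threshold is inferred from the table entries (REC 1.0 is labelled 1, REC 1.2 is labelled 0) and is not stated explicitly in the text. The function name `rec_to_class` and the handling of ranges (the upper end is used) are likewise assumptions.

```python
def rec_to_class(rec_text, threshold=1.0):
    """Map a REC table entry to 1 (active) or 0 (inactive).

    Assumption: REC <= threshold means activity equal to or stronger
    than that of topotecan; the threshold is inferred from the table.
    """
    text = rec_text.strip().replace("–", "-")
    if text.startswith(">"):
        # Open-ended entry such as ">1000": only a lower bound is known.
        lower_bound = float(text[1:])
        if lower_bound >= threshold:
            return 0
        raise ValueError(f"cannot classify open-ended value {rec_text!r}")
    # For a range such as "0.8-1.0", use the upper (weaker) end.
    upper_end = float(text.split("-")[-1])
    return 1 if upper_end <= threshold else 0


examples = ["0.03", "1.0", "1.2", "0.8–1.0", ">1000"]
labels = [rec_to_class(entry) for entry in examples]
```

For the five sample entries, taken from the REC column of the table, the rule returns the same classes as the Active/Inactive column.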
Division of the BCP datasets.
| Dataset | Total compounds | Actives a | Inactives b |
|---|---|---|---|
| Whole set | 82 | 24 | 58 |
| Training set | 58 | 15 | 43 |
| Test set | 15 | 5 | 10 |
| External set | 9 | 4 | 5 |
| Application set (External set 2) | 10 | 7 | 3 |
a Actives: compounds whose activity is equal to or stronger than that of topotecan; b Inactives: compounds whose activity is weaker than that of topotecan.
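The counts in the division table can be checked for consistency, and they also show how imbalanced the classes are. This short sketch copies the numbers from the table; the variable names are illustrative.

```python
# (actives, inactives) per subset, copied from the table above
subsets = {
    "Training set": (15, 43),
    "Test set": (5, 10),
    "External set": (4, 5),
}
whole_set = (24, 58)

# The three subsets should add up to the whole set of 82 compounds.
total_actives = sum(actives for actives, _ in subsets.values())
total_inactives = sum(inactives for _, inactives in subsets.values())

# Fraction of active compounds in each subset and in the whole set.
active_fraction = {
    name: actives / (actives + inactives)
    for name, (actives, inactives) in {**subsets, "Whole set": whole_set}.items()
}
```

Actives make up about 29% of the whole set (24 of 82) and about 26% of the training set. With this imbalance, a classifier that always predicts "inactive" would already reach roughly 71% accuracy on the whole set.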
Figure 2. Process of SVM classification model.
Figure 3. Selection of optimal parameters of the kernel function using the grid algorithm.