Rongyuan Chen¹, Zhixiong He², Shaonian Huang¹, Lizhi Shen¹, Xiancheng Zhou¹.
Abstract
Breast cancer is one of the most widespread and fatal cancers in women. Anticancer drugs that inhibit the estrogen receptor α subtype (ERα) can greatly improve the cure rate for breast cancer patients, so the research and development of such drugs is urgent. In this paper, the problem of screening promising anticancer drug candidates is formulated as an optimization problem. First, a graph model is used to extract low-dimensional features with strong discriminative and descriptive power from the various attributes of the candidate compounds, and kernel functions then map these features into a high-dimensional space. Next, a quantitative model of ERα biological activity and a support vector machine (SVM) classification model for ADMET properties are constructed. Sequential least squares programming (SLSQP) is then used to solve the ERα biological activity model. The experimental results on the anticancer data sets show that, compared with principal component analysis (PCA), the graph model constructed in this paper reduces the mean absolute error (MAE), mean squared error (MSE), and root mean square error (RMSE) by 6.4%, 15%, and 7.8%, respectively. For classification prediction, compared with PCA, the recall and precision of this method are improved by 19.5% and 12.41%, respectively. Finally, an optimal biological activity value (IC50_nM) of 34.6 and an inhibitory biological activity value (pIC50) of 7.46 were obtained.
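The abstract says kernel functions map the extracted features into a high-dimensional space but does not name the kernel. As an illustration only, assuming a Gaussian (RBF) kernel K(x, z) = exp(-γ‖x − z‖²), which is a common SVM default, the sketch below computes kernel values and the pairwise Gram matrix an SVM works with. The value γ = 0.5 and the toy compounds are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2).

    It equals an inner product in an implicit high-dimensional feature space,
    which is what lets an SVM draw nonlinear decision boundaries.
    gamma=0.5 is an illustrative assumption, not a value from the paper.
    """
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples: List[Sequence[float]], gamma: float = 0.5) -> List[List[float]]:
    """Pairwise kernel matrix K[i][j] = rbf_kernel(samples[i], samples[j])."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]


# Hypothetical compounds described by two extracted features each.
compounds = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
K = gram_matrix(compounds)
for row in K:
    print([round(v, 4) for v in row])
```

Nearby compounds get kernel values close to 1 and distant ones values near 0; the diagonal is always exactly 1.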
Year: 2022 PMID: 36081436 PMCID: PMC9448531 DOI: 10.1155/2022/8418048
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.809
Figure 1. The overall framework.
Figure 2. Graph model framework of the SVM-based ADMET property classification and prediction model for compounds.
Degree change of selected feature nodes in the minimum spanning tree.
| Feature | Degree (MST 1) | Degree (MST 2) | Degree change | Feature | Degree (MST 1) | Degree (MST 2) | Degree change |
|---|---|---|---|---|---|---|---|
| nHCsatu | 190 | 23 | 167 | MDEN-11 | 142 | 33 | 109 |
| nG12Ring | 156 | 0 | 156 | MLFER_BH | 178 | 69 | 109 |
| SHCsatu | 167 | 16 | 151 | maxdNH | 140 | 31 | 109 |
| nHBint3 | 166 | 29 | 137 | mindNH | 139 | 31 | 108 |
| ETA_Shape_Y | 70 | 207 | 137 | minHdNH | 138 | 31 | 107 |
| maxHBd | 38 | 167 | 129 | maxaasC | 73 | 179 | 106 |
| SsNH2 | 158 | 29 | 129 | nBondsD2 | 211 | 106 | 105 |
| nsNH2 | 158 | 30 | 128 | minsCH3 | 31 | 135 | 104 |
| nHsNH2 | 158 | 30 | 128 | MDEN-12 | 137 | 33 | 104 |
| SHsNH2 | 158 | 32 | 126 | SHBint2 | 173 | 69 | 104 |
| ATSm1 | 158 | 33 | 125 | minHBd | 49 | 151 | 102 |
| SHdNH | 155 | 31 | 124 | nBase | 162 | 61 | 101 |
| minHCsats | 12 | 136 | 124 | minaaaC | 43 | 144 | 101 |
| SdNH | 154 | 31 | 123 | minHBint6 | 2 | 102 | 100 |
| nHdNH | 154 | 31 | 123 | SHBint8 | 156 | 57 | 99 |
| ndNH | 154 | 31 | 123 | SHsOH | 81 | 180 | 99 |
| ATSc5 | 142 | 20 | 122 | nHBint2 | 163 | 66 | 97 |
| nHBint7 | 157 | 37 | 120 | SsssCH | 179 | 82 | 97 |
| nHBint8 | 162 | 46 | 116 | SaaaC | 43 | 139 | 96 |
| MLFER_BO | 179 | 68 | 111 | maxsCH3 | 43 | 138 | 95 |
| SHBint3 | 167 | 56 | 111 | maxaaaC | 46 | 141 | 95 |
| ndssC | 168 | 58 | 110 | nsssCH | 194 | 100 | 94 |
| XLogP | 73 | 183 | 110 | minHsOH | 68 | 161 | 93 |
| nHBint10 | 165 | 55 | 110 | maxssssC | 96 | 4 | 92 |
| maxHdNH | 141 | 31 | 110 | SdO | 200 | 108 | 92 |
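In every row of the table above, the degree change equals the absolute difference of the two degree columns (for example, nHCsatu: 190 − 23 = 167; ETA_Shape_Y: |70 − 207| = 137). The paper's exact graph construction is not reproduced here, so the sketch below assumes a common setup: features are nodes of a complete graph, edge weights are a distance such as 1 − |r| between features, and Prim's algorithm builds the minimum spanning tree. The function names and the toy "star" and "chain" distance matrices are hypothetical.

```python
from typing import Dict, List, Tuple


def prim_mst(dist: List[List[float]]) -> List[Tuple[int, int]]:
    """MST edges of a complete graph from a symmetric distance matrix (Prim, O(n^2))."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge weight connecting v to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def degrees(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Number of MST edges incident to each node."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def degree_change(dist_a: List[List[float]], dist_b: List[List[float]],
                  names: List[str]) -> Dict[str, Tuple[int, int, int]]:
    """For each feature: (degree in MST A, degree in MST B, |difference|)."""
    deg_a = degrees(len(names), prim_mst(dist_a))
    deg_b = degrees(len(names), prim_mst(dist_b))
    return {name: (a, b, abs(a - b)) for name, a, b in zip(names, deg_a, deg_b)}


# Hypothetical distances: feature f0 is a hub in graph A, graph B is a chain.
names = ["f0", "f1", "f2", "f3"]
star = [[0, .1, .1, .1], [.1, 0, .9, .9], [.1, .9, 0, .9], [.1, .9, .9, 0]]
chain = [[0, .1, .9, .9], [.1, 0, .1, .9], [.9, .1, 0, .1], [.9, .9, .1, 0]]
for name, (a, b, change) in degree_change(star, chain, names).items():
    print(name, a, b, change)
```

Sorting the resulting changes in descending order gives a ranking of the same shape as the table.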
Weights of the top 15 feature variables extracted by the graph model.
| Feature number | Feature name | Weight |
|---|---|---|
| 659 | MDEC-23 | 0.110333 |
| 587 | LipoaffinityIndex | 0.100936 |
| 406 | minsssN | 0.075303 |
| 476 | maxHsOH | 0.036822 |
| 531 | maxssO | 0.036599 |
| 357 | minHsOH | 0.031988 |
| 56 | C1SP2 | 0.025563 |
| 39 | BCUTc-1l | 0.024249 |
| 673 | MLFER_A | 0.016138 |
| 652 | MLogP | 0.016063 |
| 79 | VC-5 | 0.012582 |
| 639 | nHBAcc | 0.010918 |
| 351 | minHBint5 | 0.010303 |
| 410 | minsOH | 0.009069 |
| 103 | CrippenLogP | 0.007243 |
Error results of the two feature extraction algorithms.
| Algorithm | MAE | MSE | RMSE |
|---|---|---|---|
| Graph model | 0.5217 | 0.5044 | 0.7102 |
| PCA | 0.5571 | 0.5936 | 0.7704 |
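MAE, MSE, and RMSE are standard regression errors, and RMSE is the square root of MSE (√0.5044 ≈ 0.7102 and √0.5936 ≈ 0.7704, consistent with the table). The sketch below defines the three metrics and recomputes, from the table values, the relative reductions quoted in the abstract. The helper name `relative_reduction` is hypothetical.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: the square root of MSE."""
    return math.sqrt(mse(y_true, y_pred))


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100


# Table values as (graph model, PCA).
table = {"MAE": (0.5217, 0.5571), "MSE": (0.5044, 0.5936), "RMSE": (0.7102, 0.7704)}
for metric, (graph, pca) in table.items():
    print(f"{metric}: {relative_reduction(pca, graph):.1f}% lower than PCA")
```

This prints reductions of 6.4%, 15.0%, and 7.8%, matching the figures in the abstract.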
Figure 3. Fitting performance of the optimized model for ERα inhibitory biological activity.
Comparison of model evaluation results.
| Algorithm | MAE | MSE | RMSE | Score |
|---|---|---|---|---|
| SLSQP | 0.6587 | 0.7715 | 0.8783 | 0.6231 |
| Adaboost | 0.8492 | 1.0127 | 1.0063 | 0.5023 |
| Lasso | 0.9721 | 1.3849 | 1.1767 | 0.3236 |
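The table compares an SLSQP-solved model with AdaBoost and Lasso, but the exact objective the authors minimized is not given here. As a hedged sketch, assuming a linear model fitted by least squares with box bounds on the coefficients, SciPy's `scipy.optimize.minimize` with `method="SLSQP"` can solve it. The synthetic data, the bounds of ±2, and all variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Hypothetical data: 50 compounds x 3 descriptors and a pIC50-like target.
X = rng.normal(size=(50, 3))
true_w = np.array([0.8, -0.5, 0.0])
y = 6.0 + X @ true_w + 0.05 * rng.normal(size=50)


def objective(params: np.ndarray) -> float:
    """Mean squared error of a linear model; params = [intercept, w1, w2, w3]."""
    residual = y - (params[0] + X @ params[1:])
    return float(np.mean(residual ** 2))


def gradient(params: np.ndarray) -> np.ndarray:
    """Analytic gradient of the MSE with respect to [intercept, weights]."""
    residual = y - (params[0] + X @ params[1:])
    grad_b = -2.0 * np.mean(residual)
    grad_w = -2.0 * X.T @ residual / len(y)
    return np.concatenate(([grad_b], grad_w))


# Intercept unbounded; weights limited to [-2, 2] (an illustrative assumption).
bounds = [(None, None)] + [(-2.0, 2.0)] * 3
result = minimize(objective, x0=np.zeros(4), jac=gradient,
                  method="SLSQP", bounds=bounds)
print(result.success, np.round(result.x, 2))
```

SLSQP is useful here mainly because it accepts bounds and equality or inequality constraints directly; for an unconstrained least-squares fit, an ordinary linear solver would give the same coefficients.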
Regression coefficient result (partial display).
| Regression coefficient | Regression coefficient value |
|---|---|
| | 1.5541 |
| | -0.0029 |
| | 0.2517 |
| | -0.0141 |
| | 0 |
| | -0.3898 |
| | 0 |
| | 0.1859 |
| | 0 |
| | 0 |
| | 0 |
| | 0.4203 |
| | -0.0173 |
| | 0 |
| ⋮ | ⋮ |
| | -0.1225 |
| | -0.0051 |
| | -0.0038 |
| | -0.0008 |
| | 0.0001 |
Figure 4. Operation flow chart for Question 3.
Figure 5. Classification results for the Caco-2 property.
Figure 6. Classification results for the CYP3A4 property.
Figure 7. Classification results for the hERG property.
Figure 8. Classification results for the HOB property.
Figure 9. Classification results for the MN property.
Classification performance of the graph model-SVM prediction models for each ADMET property.
| ADMET | Accuracy | Recall | Precision | F1-score |
|---|---|---|---|---|
| Caco-2 | 0.8127 | 0.8580 | 0.7616 | 0.8141 |
| CYP3A4 | 0.8734 | 0.9379 | 0.8947 | 0.8704 |
| hERG | 0.8025 | 0.7713 | 0.8643 | 0.8034 |
| HOB | 0.7392 | 0.0381 | 0.6667 | 0.6420 |
| MN | 0.7443 | 1.0000 | 0.7443 | 0.6352 |
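Accuracy, recall, precision, and F1-score all derive from the confusion counts (TP, FP, FN, TN). The F1-scores in the table do not equal the harmonic mean of the listed precision and recall (for Caco-2, 2 × 0.7616 × 0.8580 / (0.7616 + 0.8580) ≈ 0.807, not 0.8141), so the authors likely used a different averaging, such as a support-weighted F1. The sketch below computes only positive-class binary metrics; the toy labels are hypothetical, and returning 0.0 on a zero denominator is one common convention.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy plus positive-class recall, precision, and F1 from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Hypothetical labels for one ADMET property (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Here TP = 2, FN = 2, FP = 1, and TN = 3, so recall is 0.5 while precision is 2/3. This illustrates how a property such as HOB can show low recall alongside moderate precision.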
Performance comparison of the ADMET property classification prediction models under the two classification algorithms.
| ADMET | Accuracy (PCA-SVM) | Accuracy (MST-SVM) | Recall (PCA-SVM) | Recall (MST-SVM) | Precision (PCA-SVM) | Precision (MST-SVM) | F1-score (PCA-SVM) | F1-score (MST-SVM) |
|---|---|---|---|---|---|---|---|---|
| Caco-2 | 0.8633 | 0.8127 | 0.9136 | 0.8580 | 0.7878 | 0.7916 | 0.8643 | 0.8141 |
| CYP3A4 | 0.8785 | 0.8734 | 0.9000 | 0.9379 | 0.9321 | 0.8947 | 0.8802 | 0.8704 |
| hERG | 0.8228 | 0.8025 | 0.8430 | 0.7713 | 0.8430 | 0.8643 | 0.8228 | 0.8034 |
| HOB | 0.7342 | 0.7392 | 0.0000 | 0.7381 | 0.0000 | 0.6667 | 0.6212 | 0.6420 |
| MN | 0.7722 | 0.7443 | 0.9966 | 1.0000 | 0.7670 | 0.7443 | 0.6990 | 0.6352 |
Solution results of the bioactivity optimization model for ERα inhibition.
| Category | Variable | Value |
|---|---|---|
| Characteristic variable | MDEC-23 | 10 |
| | LipoaffinityIndex | 12 |
| | minsssN | 0 |
| | maxHsOH | 0.5 |
| | maxssO | 6.5 |
| | minHsOH | 0.6 |
| | C1SP2 | 0 |
| | BCUTc-1l | -0.36 |
| | MLFER_A | 1 |
| | VC-5 | 0.3 |
| | nHBAcc | 3 |
| | minHBint5 | 3 |
| | ATSc3 | -0.1 |
| | MDEC-33 | 12.9 |
| | TopoPSA | 80 |
| Biological activity | IC50_nM | 34.6 |
| Inhibitory bioactivity | pIC50 | 7.46 |
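pIC50 is the negative base-10 logarithm of IC50 expressed in mol/L. Because 1 nM = 10⁻⁹ M, an IC50 given in nM converts as pIC50 = 9 − log10(IC50_nM). The sketch below checks that the two reported values are consistent: 9 − log10(34.6) ≈ 7.46. The function names are hypothetical.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); with IC50 in nM this is 9 - log10(IC50_nM)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Inverse conversion: IC50_nM = 10 ** (9 - pIC50)."""
    return 10.0 ** (9.0 - pic50)


print(round(ic50_nm_to_pic50(34.6), 2))
```

This prints 7.46. Converting the rounded value 7.46 back gives about 34.7 nM, so the small gap to 34.6 comes only from rounding pIC50 to two decimals.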