Yongfei Qin, Chao Li, Xia Shi, Weigang Wang.
Abstract
The development of breast cancer is closely linked to estrogen receptor α (ERα), which is also considered an important target for breast cancer treatment. Compounds that antagonize ERα activity are therefore potential drug candidates. In drug development, to save manpower and resources, potentially active compounds are often screened by building a compound activity prediction model. For the 1974 compounds collected, the 20 molecular descriptors that most strongly affected biological activity were selected using LASSO regression combined with 10-fold cross-validation. A regression model based on a fully connected multilayer perceptron (MLP) neural network was then constructed to predict the bioactivity values of 50 new compounds. The mean squared error (MSE) was used as the loss function to assess the validity of the model. The MLP-based regression model reached a loss of 0.0146 on the validation set, which suggests that the model is well trained and that the prediction strategy is valid. The methods developed in this paper may provide a reference for the development of anti-breast cancer drugs.
Keywords: LASSO regression; MLP; biological activity; breast cancer drug candidates; neural
Year: 2022 PMID: 35910022 PMCID: PMC9326362 DOI: 10.3389/fbioe.2022.946329
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
FIGURE 1. Perceptron model.
FIGURE 2. Multilayer perceptron model.
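
To make the step from a single perceptron to a multilayer perceptron concrete, the following is a minimal sketch of a fully connected forward pass. It is not the network used in the paper: the layer sizes, the weights, and the ReLU activation are all hypothetical values chosen for illustration, and the function names `dense` and `mlp_forward` are invented here.

```python
def relu(z):
    """Rectified linear unit; the activation choice is an assumption."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def mlp_forward(x, layers):
    """Pass the input through each (weights, biases, activation) layer in turn."""
    hidden = x
    for weights, biases, activation in layers:
        hidden = dense(hidden, weights, biases, activation)
    return hidden


# Hypothetical toy network: 2 inputs -> 2 hidden units (ReLU) -> 1 linear output.
layers = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, -0.5], relu),
    ([[2.0, 1.0]], [0.1], None),
]
prediction = mlp_forward([1.0, 2.0], layers)
print(prediction)
```

A single perceptron corresponds to one row of one `dense` layer; stacking layers with a nonlinearity between them gives the multilayer form. For regression, the final layer is left linear so the output is an unbounded real value.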
FIGURE 3. Diagonal matrix plot of correlation coefficients (10 variables).
FIGURE 4. Plot of the variation of the L1 parity against the regression coefficient.
FIGURE 5. MSE with curve.
Cross-validating lambda values (measure: mean squared error).
| | Lambda | Index | Measure | SE | Nonzero |
|---|---|---|---|---|---|
| Min | 0.000663 | 73 | 0.3232 | 0.01272 | 281 |
| 1se | 0.003225 | 56 | 0.3344 | 0.01698 | 163 |
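
The "Min" and "1se" rows follow the usual convention for cross-validated LASSO: "Min" is the lambda with the lowest cross-validated error, and "1se" is the largest lambda whose error is still within one standard error of that minimum, which yields a sparser model. The sketch below applies that rule to the two rows reported above. The function name `pick_lambda_1se` is hypothetical, and a real cross-validation run would supply the full lambda path, not just two points.

```python
def pick_lambda_1se(cv_results):
    """Return (lambda_min, lambda_1se) from (lambda, mean_mse, std_error) tuples."""
    # Lambda with the lowest mean cross-validated error.
    best = min(cv_results, key=lambda row: row[1])
    # Any lambda whose error is within one standard error of the minimum qualifies.
    threshold = best[1] + best[2]
    within = [row for row in cv_results if row[1] <= threshold]
    # Among the qualifying lambdas, the largest gives the strongest regularization.
    lambda_1se = max(within, key=lambda row: row[0])[0]
    return best[0], lambda_1se


# The two rows reported in the table: (lambda, measure, SE).
cv_results = [
    (0.000663, 0.3232, 0.01272),
    (0.003225, 0.3344, 0.01698),
]
lambda_min, lambda_1se = pick_lambda_1se(cv_results)
print(lambda_min, lambda_1se)
```

With the reported values the threshold is 0.3232 + 0.01272 = 0.33592, and the "1se" error of 0.3344 falls below it, consistent with the table. The larger lambda keeps 163 nonzero coefficients instead of 281.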
Molecular descriptors with significant effect on pIC50 bioactivity values (TOP20).
| Variable | Coefficient | Absolute coefficient |
|---|---|---|
| nHBAcc | −0.34921 | 0.349211 |
| SP-7 | −0.25978 | 0.259782 |
| C3SP2 | 0.198333 | 0.198333 |
| ndO | 0.16484 | 0.16484 |
| nsssCH | 0.163161 | 0.163161 |
| ECCEN | 0.145899 | 0.145899 |
| MLFER_A | 0.140239 | 0.140239 |
| mindO | −0.13936 | 0.139355 |
| ATSm5 | 0.138853 | 0.138853 |
| minHAvin | 0.137933 | 0.137933 |
| MDEC-34 | −0.12988 | 0.129875 |
| BCUTp-1h | 0.129423 | 0.129423 |
| SsF | 0.123461 | 0.123461 |
| mindsCH | −0.12217 | 0.122166 |
| C1SP3 | 0.117622 | 0.117622 |
| maxHsOH | 0.11567 | 0.11567 |
| nHBint7 | −0.11027 | 0.110271 |
| MDEN-22 | 0.099217 | 0.099217 |
| ATSp5 | −0.09359 | 0.093587 |
| SHBint8 | 0.091518 | 0.091518 |
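
The table is ordered by the absolute value of each LASSO coefficient, so a descriptor with a large negative coefficient (such as nHBAcc) ranks above one with a smaller positive coefficient. A minimal sketch of that ranking step is shown below, using five coefficients copied from the table. The function name `top_k_by_abs_coefficient` is hypothetical, and the sketch assumes the descriptors were standardized before fitting so that coefficient magnitudes are comparable.

```python
def top_k_by_abs_coefficient(coefficients, k):
    """Rank descriptors by |coefficient| and keep the k largest nonzero ones."""
    # LASSO sets unimportant coefficients exactly to zero; drop those first.
    nonzero = {name: c for name, c in coefficients.items() if c != 0.0}
    ranked = sorted(nonzero.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# A few coefficients from the table, deliberately out of order.
coefficients = {
    "ndO": 0.16484,
    "nHBAcc": -0.34921,
    "SHBint8": 0.091518,
    "SP-7": -0.25978,
    "C3SP2": 0.198333,
}
top3 = top_k_by_abs_coefficient(coefficients, 3)
for name, coefficient in top3:
    print(name, coefficient)
```

The sign is kept alongside the rank because it indicates direction: a negative coefficient means a higher descriptor value is associated with a lower predicted pIC50.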
FIGURE 6. Plot of model explainable deviation against variable coefficients.
FIGURE 7. Filter variable classification statistics.
FIGURE 8. Network structure diagram.
FIGURE 9. Loss function diagram.
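
The loss tracked during training is the mean squared error named in the abstract: the average of the squared differences between predicted and observed values. A minimal sketch follows; the pIC50 values in the example are hypothetical and are not taken from the paper's data.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted pIC50 values, for illustration only.
observed = [6.0, 7.0, 8.0]
predicted = [6.1, 6.9, 8.2]
loss = mean_squared_error(observed, predicted)
print(round(loss, 4))
```

Because the errors are squared, the loss is expressed in squared target units, so its magnitude depends on how the target was scaled before training.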
Table of predicted results.
| Index | IC50 (nM) | pIC50 | Index | IC50 (nM) | pIC50 |
|---|---|---|---|---|---|
| 1 | 625.279085 | 6.203926 | 26 | 7550.487226 | 5.122025 |
| 2 | 1514.918949 | 5.819611 | 27 | 2975.551524 | 5.526433 |
| 3 | 1230.488010 | 5.909923 | 28 | 1056.888664 | 5.975971 |
| 4 | 80.848224 | 7.092329 | 29 | 154.518702 | 6.811019 |
| 5 | 48.221256 | 7.316762 | 30 | 242.322471 | 6.615606 |
| 6 | 39.612235 | 7.402171 | 31 | 3283.679462 | 5.483639 |
| 7 | 2.341306 | 8.630542 | 32 | 8099.091247 | 5.091564 |
| 8 | 23.148631 | 7.635475 | 33 | 1947.127430 | 5.710606 |
| 9 | 47.355375 | 7.324631 | 34 | 8526.529200 | 5.069228 |
| 10 | 34.096260 | 7.467293 | 35 | 8391.913980 | 5.076139 |
| 11 | 31.653379 | 7.499580 | 36 | 472.583992 | 6.325521 |
| 12 | 32.697205 | 7.485489 | 37 | 466.080905 | 6.331539 |
| 13 | 25.456401 | 7.594203 | 38 | 4262.275525 | 5.370359 |
| 14 | 23.424709 | 7.630326 | 39 | 458.985483 | 6.338201 |
| 15 | 9.698761 | 8.013284 | 40 | 441.836676 | 6.354738 |
| 16 | 9.306946 | 8.031193 | 41 | 426.397463 | 6.370185 |
| 17 | 26.925417 | 7.569838 | 42 | 429.814207 | 6.366719 |
| 18 | 36.208316 | 7.441192 | 43 | 662.573029 | 6.178766 |
| 19 | 157.062175 | 6.803928 | 44 | 145.994719 | 6.835663 |
| 20 | 1953.101219 | 5.709275 | 45 | 426.397463 | 6.370185 |
| 21 | 26.404043 | 7.578329 | 46 | 5860.424608 | 5.232071 |
| 22 | 2333.532620 | 5.631986 | 47 | 5528.679594 | 5.257379 |
| 23 | 894.967968 | 6.048193 | 48 | 5671.119197 | 5.246331 |
| 24 | 1616.214377 | 5.791501 | 49 | 4657.397312 | 5.331857 |
| 25 | 8979.197446 | 5.046763 | 50 | 996.189669 | 6.001658 |
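
The two value columns in this table are consistent with the standard conversion pIC50 = 9 − log10(IC50 in nM), which is the negative base-10 logarithm of IC50 expressed in mol/L. The sketch below implements the conversion in both directions and checks it against rows 1 and 7 of the table. The function names are hypothetical, and the record does not state which of the two quantities the network predicted directly.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert IC50 in nanomolar to pIC50 (-log10 of IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm).
    return 9.0 - math.log10(ic50_nm)


def ic50_nm_from_pic50(pic50):
    """Inverse conversion: pIC50 back to IC50 in nanomolar."""
    return 10.0 ** (9.0 - pic50)


# Rows 1 and 7 of the table: (IC50 in nM, reported pIC50).
for ic50_nm, reported in [(625.279085, 6.203926), (2.341306, 8.630542)]:
    print(ic50_nm, round(pic50_from_ic50_nm(ic50_nm), 6), reported)
```

A lower IC50 means a more potent compound and therefore a higher pIC50, which is why compound 7 (about 2.3 nM) has the largest pIC50 in the table.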