| Literature DB >> 32117768 |
Qiuji Cui, Shuai Lu, Bingwei Ni, Xian Zeng, Ying Tan, Ya Dong Chen, Hongping Zhao.
Abstract
Aqueous solubility is an important physicochemical property of compounds in anti-cancer drug discovery. Artificial intelligence solubility prediction tools have scored impressive performances by employing regression, machine learning, and deep learning methods. The reported performances vary significantly partly because of the different datasets used. Solubility prediction on novel compounds needs to be improved, which may be achieved by going deeper with deep learning. We constructed deeper-net models of ~20-layer modified ResNet convolutional neural network architecture, which were trained and tested with 9,943 compounds encoded by molecular fingerprints. Retrospectively tested by 62 recently-published novel compounds, one deeper-net model outperformed four established tools, shallow-net models, and four human experts. Deeper-net models also outperformed others in predicting the solubility values of a series of novel compounds newly-synthesized for anti-cancer drug discovery. Solubility prediction may be improved by going deeper with deep learning. Our deeper-net models are accessible at http://www.npbdb.net/solubility/index.jsp.Entities:
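The abstract states that compounds were encoded by molecular fingerprints before being fed to the network. As a rough illustration of the fixed-length bit-vector encoding idea only: the sketch below hashes character n-grams of a SMILES string, whereas real molecular fingerprints (e.g., ECFP as produced by RDKit) hash chemical substructures. The function name, bit length, and n-gram scheme are our assumptions, not the paper's method.

```python
import hashlib

def ngram_fingerprint(smiles, n_bits=64, n=3):
    """Toy hashed fingerprint: hash character n-grams of a SMILES string
    into a fixed-length bit vector. Real molecular fingerprints hash
    substructures, not characters; this only illustrates the encoding idea."""
    bits = [0] * n_bits
    for i in range(len(smiles) - n + 1):
        gram = smiles[i:i + n]
        # Stable hash of the n-gram, folded into the bit-vector length
        h = int(hashlib.md5(gram.encode()).hexdigest(), 16) % n_bits
        bits[h] = 1
    return bits

fp = ngram_fingerprint("CCO")  # ethanol SMILES; a single 3-gram sets one bit
print(len(fp), sum(fp))  # 64 1
```

A fixed-length vector like this is what makes compounds of different sizes comparable inputs for a neural network.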
Keywords: anti-cancer drug discovery; aqueous solubility; artificial intelligence; chemical compounds; deep learning
Year: 2020 PMID: 32117768 PMCID: PMC7026387 DOI: 10.3389/fonc.2020.00121
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. The molecular structures and experimental solubility S values of six recently published novel compounds.
Figure 2. The architecture of the 20-layer ResNet-like CNN deep learning model. (A) A ResNet-like CNN deep learning model with 20 parameter layers. "conv1d x,y" denotes a 1D convolution layer with kernel size x and y filters. The curved arrows are shortcut connections; a shortcut connection with a parameter layer increases dimensions. Colors indicate layer classes in the architecture: green marks the first layer, white the last layer, gray the parameter layer of a shortcut connection, and the remaining colors mark the residual layers. The color change of the residual layers from purple to blue to yellow indicates the tensor dimension change from 9 to 18 to 36. (B) The shortcut connection in the architecture of the ResNet-like CNN deep learning model. Shortcut connections simply perform identity mapping by skipping one or more layers (20). Their outputs are added to the outputs of the stacked layers without extra parameters or computational complexity.
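The identity-shortcut mechanism described in panel (B) can be sketched in a few lines. This is a minimal NumPy illustration under our own assumptions (the `conv1d` helper, kernel sizes, and ReLU placement are not taken from the paper); it shows only the core idea that the block's input is added back to the stacked-layer output with no extra parameters.

```python
import numpy as np

def conv1d(x, kernels):
    """Naive 1D convolution with 'same' padding. x: (length, channels);
    kernels: (kernel_size, in_channels, out_channels)."""
    k, c_in, c_out = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], c_out))
    for i in range(x.shape[0]):
        # Contract the (k, c_in) window against the kernel tensor
        out[i] = np.tensordot(xp[i:i + k], kernels, axes=([0, 1], [0, 1]))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, k1, k2):
    """Two stacked conv layers plus an identity shortcut: y = relu(F(x) + x)."""
    f = conv1d(relu(conv1d(x, k1)), k2)
    return relu(f + x)  # the shortcut adds x itself, with no extra parameters

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 9))           # length 32, 9 channels (cf. Figure 2)
k1 = rng.normal(size=(3, 9, 9)) * 0.1
k2 = rng.normal(size=(3, 9, 9)) * 0.1
y = residual_block(x, k1, k2)
print(y.shape)  # (32, 9): identity mapping preserves the tensor dimension
```

Because the shortcut is a plain addition, gradients can flow around the stacked layers, which is what makes training the deeper nets described here tractable.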
Performance on the logS prediction of 62 recently published novel compounds.
| Method | P | RMSE | Within 10-fold (%) |
| MOE V2016.0802 | <0.2 | 0.908 | 74.2 |
| QikProp 2018-4 QP18 | <0.2 | 0.926 | 69.4 |
| QikProp 2018-4 CIQP18 | <0.2 | 1.162 | 54.8 |
| ALOGPS V2.1 | 0.160 | 0.814 | 77.4 |
| 4-layer DNN model | 0.307 | 0.739 | 80.7 |
| 1-layer DNN model | 0.086 | 0.849 | 72.6 |
| 6-layer DNN model | 0.264 | 0.762 | 79.0 |
| 8-layer ResNet-like model | <0.2 | 0.982 | 66.1 |
| 14-layer ResNet-like model | 0.133 | 0.827 | 74.2 |
| 20-layer ResNet-like model | | | |
| 26-layer ResNet-like model | 0.075 | 0.854 | 77.4 |
The performance of the established tools and of the shallow-net and deeper-net deep learning models in predicting the experimental logS values of 62 recently published novel compounds. The best performance values are in bold font.
Percent of predicted logS value within 10-fold of experimental value.
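The "within 10-fold" metric in the footnote corresponds to a predicted logS within ±1 log unit of the experimental value, since a 10-fold difference in S is 1 unit in logS. A minimal sketch of how such a score could be computed (the function name and example values are ours, not data from the paper):

```python
def percent_within_10fold(logs_pred, logs_exp):
    """Percentage of compounds whose predicted logS is within +/-1 log unit
    (i.e., within 10-fold in S) of the experimental logS."""
    hits = sum(1 for p, e in zip(logs_pred, logs_exp) if abs(p - e) <= 1.0)
    return 100.0 * hits / len(logs_exp)

# Illustrative values only:
pred = [-3.2, -4.8, -2.1, -5.5]
exp = [-3.0, -3.5, -2.3, -4.9]
print(percent_within_10fold(pred, exp))  # 75.0
```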
Performance on the solubility category prediction.
| Predictor | Accuracy (%) |
| Expert 1 | 6.5 |
| Expert 2 | 8.1 |
| Expert 3 | 11.3 |
| Expert 4 | 74.2 |
| MOE V2016.0802 | 91.9 |
| QikProp 2018-4 QP18 | 85.5 |
| QikProp 2018-4 CIQP18 | 87.1 |
| ALOGPS V2.1 | 82.3 |
| 4-layer DNN model | 79.0 |
| 1-layer DNN model | 79.0 |
| 6-layer DNN model | 82.3 |
| 8-layer ResNet-like model | 80.7 |
| 14-layer ResNet-like model | 87.1 |
| 20-layer ResNet-like model | 85.5 |
| 26-layer ResNet-like model | 83.9 |
The performance of human experts, the established tools, and the shallow-net and deeper-net deep learning models in predicting the solubility category of 62 recently published novel compounds. The solubility categories are practically insoluble or insoluble (<0.1 g/L), slightly soluble (0.1–10 g/L), soluble (10–100 g/L), and freely soluble (>100 g/L).
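The four category thresholds in the caption can be expressed as a simple mapping from solubility in g/L to a label. A minimal sketch (the function name is ours, and the inclusive/exclusive handling of the 0.1, 10, and 100 g/L boundaries is an assumption, since the caption's ranges do not specify it):

```python
def solubility_category(s_g_per_l):
    """Map aqueous solubility S (in g/L) to the categories used here:
    <0.1 practically insoluble or insoluble; 0.1-10 slightly soluble;
    10-100 soluble; >100 freely soluble."""
    if s_g_per_l < 0.1:
        return "practically insoluble or insoluble"
    if s_g_per_l <= 10:
        return "slightly soluble"
    if s_g_per_l <= 100:
        return "soluble"
    return "freely soluble"

print(solubility_category(0.05))   # practically insoluble or insoluble
print(solubility_category(5.0))    # slightly soluble
print(solubility_category(250.0))  # freely soluble
```

Category accuracy, as reported in the table above, is then the fraction of compounds whose predicted S falls in the same category as the experimental S.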
Figure 3. The molecular structures and experimental solubility S values (in mg/mL) of five novel compounds synthesized for a drug discovery project, with solubility values measured for the first time in this work.
Performance on the logS prediction of 5 novel compounds.
| Method | RMSE | Within 10-fold (%) |
| MOE V2016.0802 | 2.293 | <20 |
| QikProp 2018-4 QP18 | 2.717 | 20 |
| QikProp 2018-4 CIQP18 | 2.308 | 20 |
| ALOGPS V2.1 | 1.073 | 60 |
| 4-layer DNN model | 1.325 | 60 |
| 1-layer DNN model | 1.502 | 60 |
| 6-layer DNN model | 1.494 | 40 |
| 8-layer ResNet-like model | 1.646 | 60 |
| 14-layer ResNet-like model | 0.982 | 60 |
| 20-layer ResNet-like model | 0.811 | 60 |
| 26-layer ResNet-like model | | |
The performance of the established tools and of the shallow-net and deeper-net deep learning models in predicting the experimental logS values of 5 novel compounds (quantitative values measured in this work). The best performance value is in bold font.
Percent of predicted logS value within 10-fold of experimental value.