Tingjun Xu, Weiming Chen, Junhong Zhou, Jingfang Dai, Yingyong Li, Yingli Zhao.
Abstract
Chemically unstable natural products tend to react during extraction, purification, or identification and turn into contaminants, the so-called "artifacts". Identifying artifacts, however, requires considerable investment in technical equipment, time, and human resources. To reveal these reactive natural products and their artifacts by computational means, we set up a virtual screening system that searches for cases in a biochemical database. The screening system is based on deep learning models that predict the two main classes of conversion reactions from natural products to artifacts, namely solvolysis and oxidation. We reviewed a set of result data to check the validity of the screening system and screened out a batch of reactive natural products and their probable artifacts. This work provides some insights into the formation of natural product artifacts, and the result data may serve as warnings about the improper handling of biological matrices in multicomponent extraction.
Keywords: artifact; deep learning; natural product; virtual screening
Year: 2020 PMID: 33121010 PMCID: PMC7692644 DOI: 10.3390/biom10111486
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Example of a set of relational data: natural products from Thalictrum delavayi.
| No. | Biological Source | Natural Product |
|---|---|---|
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
| 10 | | |
Figure 1. Illustration of the virtual screening system for discovering reactive natural products and their probable artifacts [17,18].
Figure 2. (A) Architecture of the neural networks for predicting the reactions of natural products to artifacts. (B) Illustration of the convolutional neural network (CNN)-based neural networks in training mode.
Key hyperparameters of the best-performing CNN models.
| Class of CNN Model | Batch Size | Epoch | Latent Dimensionality of Encoding Space | Latent Dimensionality of Decoding Space | Optimizer |
|---|---|---|---|---|---|
| Solvolysis of methanol | 64 | 100 | 256 | 64 | Adam |
| Solvolysis of ethanol | 64 | 500 | 256 | 64 | Adam |
| Solvolysis of acetone | 64 | 100 | 256 | 64 | Adam |
| Solvolysis of dichloromethane | 64 | 500 | 256 | 64 | Adam |
| Solvolysis of chloroform | 64 | 1000 | 256 | 64 | Adam |
| Solvolysis of water | 64 | 500 | 256 | 64 | Adam |
| Oxidation | 64 | 500 | 256 | 64 | Adam |
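For readers who want to reuse these settings, the table can be captured as a small configuration object. The following is only a sketch: the names `CnnHyperparameters`, `EPOCHS_BY_MODEL`, and `HYPERPARAMETERS` are hypothetical and do not come from the paper, and only the numeric values and the optimizer name are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnnHyperparameters:
    """One row of the hyperparameter table (field names are illustrative)."""
    batch_size: int
    epochs: int
    encoder_latent_dim: int
    decoder_latent_dim: int
    optimizer: str


# Values shared by every model class in the table.
_SHARED = dict(
    batch_size=64,
    encoder_latent_dim=256,
    decoder_latent_dim=64,
    optimizer="Adam",
)

# The number of epochs is the only column that differs between model classes.
EPOCHS_BY_MODEL = {
    "Solvolysis of methanol": 100,
    "Solvolysis of ethanol": 500,
    "Solvolysis of acetone": 100,
    "Solvolysis of dichloromethane": 500,
    "Solvolysis of chloroform": 1000,
    "Solvolysis of water": 500,
    "Oxidation": 500,
}

HYPERPARAMETERS = {
    name: CnnHyperparameters(epochs=epochs, **_SHARED)
    for name, epochs in EPOCHS_BY_MODEL.items()
}
```

Keeping the shared values in one place makes it visible that, according to the table, the seven models differ only in the number of training epochs.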
Performance of the CNN models on the validation data set.
| Class of CNN Model | Success | Concordance | Accuracy |
|---|---|---|---|
| Solvolysis of methanol | 88.21% | 0.93 | 75.72% |
| Solvolysis of ethanol | 86.80% | 0.87 | 78.27% |
| Solvolysis of acetone | 98.18% | 0.97 | 87.91% |
| Solvolysis of dichloromethane | 95.00% | 0.97 | 89.64% |
| Solvolysis of chloroform | 88.64% | 0.96 | 85.23% |
| Solvolysis of water | 82.33% | 0.86 | 70.40% |
| Oxidation | 86.86% | 0.85 | 71.07% |
Success: percentage of model-generated SMILES strings that encode a valid molecular structure; Concordance: average sequence match ratio between target and predicted SMILES strings (0 = totally different, 1 = exact match); Accuracy: percentage of predictions whose chemical structure is identical to the target (same InChIKey for target and predicted SMILES strings).
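The three metrics defined above can be illustrated with a short sketch. Several points are assumptions rather than the authors' implementation: the function name `evaluate_predictions` is hypothetical, all three metrics are computed over the full validation set, and the "sequence match ratio" is taken to be the ratio returned by Python's `difflib.SequenceMatcher`. Structure handling is delegated to a caller-supplied `structure_key` function that returns a canonical identifier (for example an InChIKey) or `None` for an invalid SMILES string.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Sequence


def evaluate_predictions(
    targets: Sequence[str],
    predictions: Sequence[str],
    structure_key: Callable[[str], Optional[str]],
) -> Dict[str, float]:
    """Compute success (%), concordance (0..1), and accuracy (%).

    structure_key maps a SMILES string to a canonical structure identifier,
    or returns None when the string does not encode a valid molecule.
    """
    if len(targets) != len(predictions) or len(targets) == 0:
        raise ValueError("targets and predictions must be non-empty and equal in length")

    total = len(targets)
    valid = 0        # predictions that parse as a molecule
    identical = 0    # predictions with the same structure as the target
    ratio_sum = 0.0  # running sum of string match ratios

    for target, predicted in zip(targets, predictions):
        # Concordance: character-level similarity of the two SMILES strings.
        ratio_sum += SequenceMatcher(None, target, predicted).ratio()

        predicted_key = structure_key(predicted)
        if predicted_key is None:
            continue  # invalid SMILES counts against success and accuracy
        valid += 1
        if predicted_key == structure_key(target):
            identical += 1

    return {
        "success": 100.0 * valid / total,
        "concordance": ratio_sum / total,
        "accuracy": 100.0 * identical / total,
    }
```

With RDKit installed, one possible `structure_key` parses the string with `Chem.MolFromSmiles`, which returns `None` for unparsable input, and otherwise returns `Chem.MolToInchiKey(mol)`. Comparing InChIKeys rather than raw strings matters because one molecule can be written as many different SMILES strings, which is also why concordance can be below 1 for a prediction that still counts as accurate.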
Figure 3. Some typical cases of reactive natural products and their probable artifacts caused by solvolysis in the result data set.
Figure 4. Some typical cases of reactive natural products and their probable artifacts caused by oxidation in the result data set.