Seo Hyun Shin, Seung Man Oh, Jung Han Yoon Park, Ki Won Lee, Hee Yang.
Abstract
BACKGROUND: Owing to their diverse bioactivity, natural products (NPs) have been developed as commercial products in the pharmaceutical, food, and cosmetic sectors, both as natural compounds (NCs) and in the form of extracts. Following administration, NCs typically interact with multiple target proteins to elicit their effects. Various machine learning models have been developed to predict multi-target modulating NCs with desired physiological effects. However, because existing chemical-protein interaction datasets are mostly single-labeled and limited, these models struggle to predict new chemical-protein interactions. New techniques are needed to overcome these limitations.
Keywords: Chemical-protein interaction; Deep learning; Multi-target prediction; Natural compounds; Siamese neural network
Year: 2022 PMID: 35672685 PMCID: PMC9175487 DOI: 10.1186/s12859-022-04752-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 OptNCMiner model flowchart
Data used to construct the base dataset, transfer learning dataset, and few-shot learning dataset
| Dataset | Target gene | Target protein | Data source | Active compounds | Inactive compounds |
|---|---|---|---|---|---|
| Base dataset (actives | ADORA2A | Adenosine receptor A2a | ExCAPE-DB | 5077 | 591 |
| | BRCA1 | Breast cancer type 1 susceptibility protein | ExCAPE-DB | 8619 | 43,095 |
| | CNR1 | Cannabinoid receptor 1 | ExCAPE-DB | 5125 | 397 |
| | DRD2 | D(2) dopamine receptor | ExCAPE-DB | 8037 | 40,185 |
| | HTR1A | 5-hydroxytryptamine receptor 1A | ExCAPE-DB | 6339 | 31,695 |
| | KCNH2 | Potassium voltage-gated channel subfamily H member 2 | ExCAPE-DB | 5327 | 26,635 |
| | LMNA | Prelamin-A/C | ExCAPE-DB | 14,533 | 72,665 |
| | OPRM1 | Mu-type opioid receptor | ExCAPE-DB | 5665 | 2872 |
| | SLC6A4 | Sodium-dependent serotonin transporter | ExCAPE-DB | 6912 | 370 |
| | TARDBP | TAR DNA-binding protein 43 | ExCAPE-DB | 12,193 | 60,965 |
| | TDP1 | Tyrosyl-DNA phosphodiesterase 1 | ExCAPE-DB | 23,129 | 115,645 |
| Transfer learning dataset (1000 > actives > 500) | ADRA2A | Alpha-2A adrenergic receptor | ExCAPE-DB | 816 | 39 |
| | GRIN1 | Glutamate receptor ionotropic | ExCAPE-DB | 553 | 92 |
| | HTR3A | 5-hydroxytryptamine receptor 3A | ExCAPE-DB | 565 | 65 |
| | MINK1 | Misshapen-like kinase 1 | ExCAPE-DB | 929 | 8 |
| | PKM2 | Pyruvate kinase PKM | ExCAPE-DB | 546 | 2730 |
| | POLK | DNA polymerase kappa | LIT-PCBA | 772 | 3860 |
| | VDR | Vitamin D3 receptor | LIT-PCBA | 884 | 4420 |
| Few-shot learning dataset (100 > actives) | ADRB2 | Beta 2 adrenergic receptor | LIT-PCBA | 17 | 170 |
| | ESR | Estrogen receptor alpha | LIT-PCBA | 13 | 130 |
| | IDH1 | Isocitrate dehydrogenase | LIT-PCBA | 39 | 390 |
| | MTOR | Mammalian target of rapamycin complex 1 | LIT-PCBA | 97 | 970 |
| | OPRK1 | Kappa opioid receptor | LIT-PCBA | 24 | 5460 |
| | PPARG | Peroxisome proliferator-activated receptor gamma | LIT-PCBA | 27 | 270 |
| | TP53 | Cellular tumor antigen p53 | LIT-PCBA | 79 | 790 |
Fig. 2 The distribution of (a) physicochemical properties and (b) chemical structures in the base dataset, transfer learning dataset, and few-shot learning dataset
The performance of OptNCMiner and baseline models with the base dataset and transfer learning dataset
| Model | Performance metric¹ | Base dataset | Transfer learning dataset |
|---|---|---|---|
| OptNCMiner | Recall | 0.833 | 0.871 |
| | AUROC | 0.632 | 0.787 |
| | Accuracy | 0.440 | 0.713 |
| Cosine similarity | Recall | 0.573 | 0.696 |
| | AUROC | 0.643 | 0.761 |
| | Accuracy | 0.708 | 0.818 |
| Naïve Bayes classifier | Recall | 0.322 | 0.483 |
| | AUROC | 0.623 | 0.696 |
| | Accuracy | 0.909 | 0.887 |
| Logistic regression | Recall | 0.212 | 0.581 |
| | AUROC | 0.606 | 0.785 |
| | Accuracy | 0.978 | 0.969 |
| Random forest | Recall | 0.677 | 0.479 |
| | AUROC | 0.343 | 0.241 |
| | Accuracy | 0.028 | 0.027 |
| Multi-layer perceptron | Recall | 0.361 | 0.824 |
| | AUROC | 0.676 | 0.818 |
| | Accuracy | 0.972 | 0.899 |

¹ All performance metrics are weighted averages of the results for all proteins in the dataset
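Several baselines in the table above pair high accuracy with low recall. A minimal Python sketch of how this can arise under class imbalance follows. It is an illustration only: the `all_inactive_baseline` helper is hypothetical, the active and inactive counts are copied from three rows of the dataset table, and it assumes an evaluation set with the same class balance as those totals, which the source does not state.

```python
# Active/inactive counts copied from three rows of the dataset table.
counts = {
    "DRD2": (8037, 40185),
    "ADORA2A": (5077, 591),
    "ADRB2": (17, 170),
}


def all_inactive_baseline(actives, inactives):
    """Accuracy and recall of a hypothetical classifier that labels every compound inactive."""
    total = actives + inactives
    accuracy = inactives / total  # every inactive is correct, every active is missed
    recall = 0.0                  # no active compound is recovered
    return accuracy, recall


for gene, (actives, inactives) in counts.items():
    accuracy, recall = all_inactive_baseline(actives, inactives)
    active_fraction = actives / (actives + inactives)
    print(f"{gene}: active fraction={active_fraction:.3f}, "
          f"accuracy={accuracy:.3f}, recall={recall:.1f}")
```

Under this assumption, accuracy alone can look strong for targets dominated by inactives even when no actives are found, which is one reason to read recall and AUROC alongside it.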
Performance of OptNCMiner with the few-shot learning dataset
| Target protein | Recall | AUROC | Accuracy | Count |
|---|---|---|---|---|
| Beta 2 adrenergic receptor | 0.488 | 0.400 | 0.450 | 5 |
| Estrogen receptor alpha | 0.585 | 1.000 | 0.764 | 5 |
| Isocitrate dehydrogenase | 0.488 | 0.600 | 0.536 | 5 |
| Mammalian target of rapamycin complex 1 | 0.537 | 0.889 | 0.663 | 9 |
| Kappa opioid receptor | 0.659 | 1.000 | 0.806 | 5 |
| Peroxisome proliferator-activated receptor gamma | 0.610 | 1.000 | 0.778 | 5 |
| Cellular tumor antigen p53 | 0.537 | 0.857 | 0.664 | 7 |
| Weighted average | 0.555 | 0.829 | 0.665 | 41 |
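The "Weighted average" row can be checked against the per-protein rows. The sketch below assumes the weights are the values in the Count column, which the table does not state explicitly; the `weighted_average` helper and variable names are illustrative.

```python
# (target protein, recall, AUROC, accuracy, count), copied from the table above.
rows = [
    ("Beta 2 adrenergic receptor", 0.488, 0.400, 0.450, 5),
    ("Estrogen receptor alpha", 0.585, 1.000, 0.764, 5),
    ("Isocitrate dehydrogenase", 0.488, 0.600, 0.536, 5),
    ("Mammalian target of rapamycin complex 1", 0.537, 0.889, 0.663, 9),
    ("Kappa opioid receptor", 0.659, 1.000, 0.806, 5),
    ("Peroxisome proliferator-activated receptor gamma", 0.610, 1.000, 0.778, 5),
    ("Cellular tumor antigen p53", 0.537, 0.857, 0.664, 7),
]


def weighted_average(values, weights):
    """Weighted mean: sum(v * w) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Assumption: the Count column supplies the weights.
weights = [row[4] for row in rows]
summary = {
    name: round(weighted_average([row[col] for row in rows], weights), 3)
    for col, name in ((1, "recall"), (2, "auroc"), (3, "accuracy"))
}
print(sum(weights), summary)
```

With these inputs the counts sum to 41 and the three rounded averages match the reported row (0.555, 0.829, 0.665), which is consistent with count-based weighting.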
Fig. 3 (a) In silico docking scores for false positives from the few-shot learning dataset and (b) two molecular docking results for the compound-protein interactions with the lowest in silico docking scores (highest binding affinity)
Fig. 4 OptNCMiner predicts that ginger contains 33 NCs that regulate 8 different target proteins associated with T2DM complications