Guadalupe Gonzalez, Shunwang Gong, Ivan Laponogov, Michael Bronstein, Kirill Veselkov.
Abstract
BACKGROUND: Recent efforts in the field of nutritional science have allowed the discovery of disease-beating molecules within foods based on the commonality of bioactive food molecules to FDA-approved drugs. The pioneering work in this field used an unsupervised network propagation algorithm to learn the systemic-wide effect on the human interactome of 1962 FDA-approved drugs and a supervised algorithm to predict anticancer therapeutics using the learned representations. Then, a set of bioactive molecules within foods was fed into the model, which predicted molecules with cancer-beating potential. The employed methodology consisted of disjoint unsupervised feature generation and classification tasks, which can result in sub-optimal learned drug representations with respect to the classification task. Additionally, due to the disjoint nature of the tasks, the employed approach proved cumbersome to optimize, requiring testing of thousands of hyperparameter combinations and significant computational resources. To overcome the technical limitations highlighted above, we represent each drug as a graph (human interactome) with its targets as binary node features on the graph and formulate the problem as a graph classification task. To solve this task, inspired by the success of graph neural networks in graph classification problems, we use an end-to-end graph neural network model operating directly on the graphs, which learns drug representations to optimize model performance in the prediction of anticancer therapeutics.
Keywords: Cancer research; Genomics; Graph deep learning; Hyperfoods; Systems biology
Year: 2021 PMID: 34099048 PMCID: PMC8182908 DOI: 10.1186/s40246-021-00333-4
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Fig. 1 Drug targets are represented as a binary signal on the PPI. We use a GNN to generate a graph embedding representing the systemic-wide effect of the drug on the PPI. We then feed this representation to an MLP for the anticancer prediction task. The model is trained in an end-to-end fashion. After model training, we feed bioactive molecules within foods to the model for the prediction of anticancer food molecules.
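The pipeline in the caption (binary target signal on the PPI, GNN embedding, MLP classifier) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration: the five-protein toy network, the single GCN-style layer, the mean readout, the layer sizes, and the random untrained weights are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy PPI with 5 proteins (hypothetical edges, for illustration only).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Symmetrically normalized adjacency with self-loops (GCN-style propagation).
A_hat = A + np.eye(n)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
P = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def drug_signal(targets, n_nodes):
    """Binary node feature: 1 where the protein is a target of the drug."""
    x = np.zeros((n_nodes, 1))
    x[list(targets), 0] = 1.0
    return x

def forward(x, W_gnn, W_mlp, b_mlp):
    h = np.maximum(P @ x @ W_gnn, 0.0)   # one graph convolution + ReLU
    g = h.mean(axis=0)                   # mean readout (assumed) -> graph embedding
    logit = g @ W_mlp + b_mlp            # single-layer classifier head
    return 1.0 / (1.0 + np.exp(-logit))  # score in (0, 1)

# Random, untrained weights: the output is not a meaningful prediction.
W_gnn = rng.normal(size=(1, 8))
W_mlp = rng.normal(size=8)
b_mlp = 0.0

x = drug_signal({1, 3}, n)               # hypothetical drug hitting proteins 1 and 3
p = float(forward(x, W_gnn, W_mlp, b_mlp))
print(p)
```

In an end-to-end setup, `W_gnn`, `W_mlp`, and `b_mlp` would be updated jointly by backpropagating the classification loss, which is what distinguishes this design from a separate feature-generation step followed by a classifier.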
Hyperparameter space searched
| Hyperparameter | Search space |
|---|---|
| Learning rate | 5 × 10⁻⁴, 5 × 10⁻³ |
| L2-regularization | 1 × 10⁻⁵, 1 × 10⁻⁴, 5 × 10⁻⁴ |
| Number of convolutional layers | 1, 2, 3 |
| Number of dropout layers | 1, 2 |
| Batch normalization | |
| Feature normalization | |
| n-hops for ChebNet | 2, 4, 6 |
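As a rough illustration of the size of this search space, the sketch below enumerates the Cartesian product of the values listed in the table. It assumes an exhaustive grid search, which the table does not confirm, and it leaves out batch normalization and feature normalization because their candidate values are not given here. The dictionary keys are hypothetical names.

```python
from itertools import product

# Values copied from the table; key names are illustrative.
search_space = {
    "learning_rate": [5e-4, 5e-3],
    "l2_regularization": [1e-5, 1e-4, 5e-4],
    "num_conv_layers": [1, 2, 3],
    "num_dropout_layers": [1, 2],
    "cheb_n_hops": [2, 4, 6],  # only meaningful for ChebNet
}

def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(search_space))
print(len(configs))  # 2 * 3 * 3 * 2 * 3 = 108
```

For models other than ChebNet the `cheb_n_hops` axis would be dropped, leaving 36 combinations under the same assumptions.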
Time complexity of neural layers in O notation
| Layer/model | Time complexity | Layers | Running time (ms) |
|---|---|---|---|
| GCN | | 1 | 5 |
| | | 2 | 6 |
| | | 3 | 7 |
| GraphSAGE | | 1 | 3.5 |
| | | 2 | 4.5 |
| | | 3 | 6 |
| ChebNet | | 1 | 4 |
| | | 2 | 5 |
| | | 3 | 6 |
Time complexity of neural models expressed in running time per training iteration per sample
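For ChebNet, the hop count controls how many times the graph operator is applied to the signal. The sketch below shows the standard Chebyshev three-term recurrence on a toy path graph, where each extra polynomial order costs one more multiplication by the scaled Laplacian. The toy graph, the approximation of the largest eigenvalue by 2, and the mapping between the n-hops hyperparameter and the order `K` are assumptions, not details from the paper.

```python
import numpy as np

def scaled_laplacian(A):
    """Return L~ = L - I, using the common lambda_max ~= 2 approximation."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    L = np.eye(len(A)) - A_norm          # normalized Laplacian
    return L - np.eye(len(A))

def chebyshev_basis(L_tilde, x, K):
    """Return [T_0(L~)x, ..., T_{K-1}(L~)x] via T_k = 2 L~ T_{k-1} - T_{k-2}."""
    terms = [x]
    if K > 1:
        terms.append(L_tilde @ x)
    for _ in range(2, K):
        terms.append(2.0 * (L_tilde @ terms[-1]) - terms[-2])
    return terms

# Path graph 0 - 1 - 2 - 3 with a binary signal on node 0.
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = A[i + 1, i] = 1.0
x = np.array([1.0, 0.0, 0.0, 0.0])

L_tilde = scaled_laplacian(A)
basis = chebyshev_basis(L_tilde, x, K=3)
```

A ChebNet layer would combine these basis terms with learned weights; the term of order k only reaches nodes within k hops of the nonzero entries of `x`, so node 3 stays at zero for all three terms here.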
Summary of results (%) on anticancer drug prediction
| Method | ACC | F1 | AUPR | Precision ac | Recall ac | Precision non-ac | Recall non-ac |
|---|---|---|---|---|---|---|---|
| SVM | 79.26 ± 4.2 | 52.12 ± 5.92 | 53.35 ± 10.97 | 41.50 ± 6.75 | 69.12 ± 10.08 | 96.31 ± 1.06 | 88.74 ± 3.20 |
| RWR + SVM | 81.13 ± 3.79 | 51.84 ± 5.79 | 67.43 ± 8.14 | 38.98 ± 5.38 | 75.08 ± 6.92 | 96.90 ± 0.83 | 86.67 ± 2.37 |
| MLP | 80.62 ± 3.81 | 66.53 ± 5.02 | 69.05 ± 5.01 | 69.75 ± 6.74 | 64.55 ± 8.23 | 96.02 ± 0.85 | 96.68 ± 1.30 |
| GCN | 80.52 ± 3.33 | 63.95 ± 3.90 | 66.45 ± 5.82 | 63.33 ± 5.72 | 65.51 ± 7.42 | 96.08 ± 0.76 | 95.54 ± 1.38 |
| GraphSAGE | 78.27 ± 6.11 | 59.93 ± 6.53 | 64.42 ± 9.96 | 61.04 ± 5.72 | 61.15 ± 13.48 | 95.62 ± 1.37 | 95.38 ± 1.51 |
| ChebNet | | | | | | | |
| MLP-P | 76.72 ± 2.68 | 54.40 ± 3.56 | 59.79 ± 7.64 | 51.67 ± 11.33 | 60.73 ± 7.81 | 95.44 ± 0.72 | 92.72 ± 3.18 |
| GCN-P | 78.70 ± 5.36 | 57.43 ± 7.61 | 60.03 ± 8.48 | 52.77 ± 7.69 | 64.03 ± 11.05 | 95.83 ± 1.18 | 93.37 ± 1.72 |
| GraphSAGE-P | 77.09 ± 4.18 | 54.07 ± 4.88 | 60.55 ± 9.51 | 48.87 ± 4.06 | 61.64 ± 9.65 | 95.53 ± 0.96 | 92.55 ± 1.95 |
| ChebNet-P | 76.10 ± 2.67 | 55.71 ± 4.46 | 59.68 ± 9.53 | 53.72 ± 4.07 | 57.86 ± 4.96 | 95.17 ± 0.53 | 94.35 ± 0.44 |
ACC = balanced accuracy, F1 = harmonic mean of precision and recall, AUPR = area under the precision-recall curve, ac = anticancer, non-ac = non-anticancer
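The definitions above can be made concrete with a small sketch that computes balanced accuracy, precision, recall, and F1 from binary labels. The labels are made up for illustration and do not come from the study; AUPR is omitted because it needs continuous scores rather than hard predictions.

```python
def classification_metrics(y_true, y_pred):
    """Metrics for the positive (anticancer = 1) class plus balanced accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # recall of class 1
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall of class 0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    balanced_acc = (recall + specificity) / 2          # mean per-class recall
    return {"precision": precision, "recall": recall,
            "f1": f1, "balanced_accuracy": balanced_acc}

# Made-up imbalanced example: 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Balanced accuracy averages the recall of the two classes, so a classifier that labels everything as the majority class scores 0.5 rather than the majority fraction.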
Fig. 2 Precision-recall curve of Baseline and ChebNet models across all splits
Anticancer likeness of food molecules was computed using the best-performing neural model
| ID | Name | Description |
|---|---|---|
| FDB001084 | Pancreatin | Digestive enzyme. Used in replacement therapy. Used to prepare protein hydrolysates for pre- and post-operative diets. |
| FDB006967 | Anthracene | Organic compounds containing a system of three linearly fused benzene rings. Anthracene can be found in sorrel. Anthracene is formally rated as an unfounded non-carcinogenic (IARC 3) potentially toxic compound. |
| FDB008856 | 2,2’-Bis(4-hydroxyphenyl) propane | Potential food contaminant arising from its use in reusable polycarbonate food containers such as water carboys, baby bottles and kitchen utensils. |
| FDB011663 | Coumestrol | Coumestrol is a natural organic compound in the class of phytochemicals known as coumestans. It has garnered research interest because of its estrogenic activity and its prevalence in some foods, such as soybeans and herbs such as Pueraria mirifica. Coumestrol is a phytoestrogen, mimicking the biological activity of estrogens. |
| FDB011828 | Genistein | Genistein is a phenolic compound belonging to the isoflavonoid group. Isoflavonoids are found mainly in soybean. Genistein and daidzein (another isoflavonoid) represent the major phytochemicals found in this plant. |
| FDB012375 | Pterostilbene | Pterostilbene is a member of the class of compounds known as stilbenes. Pterostilbene can be found in common grape and grape wine. Pterostilbene is a stilbenoid chemically related to resveratrol. |
| FDB012974 | Mercenene | Found in the common clam Mercenaria mercenaria and Mercenaria campechiensis |
| FDB014654 | Myristicin | Natural organic compound present in the essential oil of nutmeg and to a lesser extent in other spices such as parsley and dill. |
| FDB016593 | 2,5-Dihydro-4,5-dimethyl-2-(1-methylpropyl) thiazole | Flavoring ingredient. Reported in hydrolyzed vegetable protein. |
| FDB020870 | 1-Methyl-6-phenyl-1H-imidazo[4,5-b]pyridin-2-amine | Food-related mutagen, reported to be the most abundant heterocyclic amine found in cooked meat and fish. |
| FDB022056 | 5a-Androstane-3a,17a-diol | Steroid compound. |
| FDB022182 | Isourso-deoxycholic acid | Bile acid. |
| FDB022318 | 11alpha-Hydroxy-progesterone | Steroid hormone involved in the female menstrual cycle, pregnancy (supports gestation) and embryogenesis of humans and other species. |
| FDB023086 | Dihydro-testosterone | Potent androgenic metabolite of testosterone. |
| FDB023772 | Testosterone enanthate | Testosterone enanthate is used in androgen substitution. |
| FDB024072 | 5b-Dihydro-testosterone | Intermediate in androgen and estrogen metabolism. |
| FDB028898 | Methyl-arsonite | Found in the arsenate detoxification I pathway. |
| FDB030068 | Platinum | Member of the class of compounds known as homogeneous transition metal compounds. Platinum can be found in a number of food items such as white cabbage, sunburst squash (pattypan squash), potato, and broccoli. |
| FDB030278 | 17- | It belongs to the gluco/mineralocorticoids, progestogins, and derivatives class of compounds. |
| FDB030678 | Androst-4-en-3,17-dione | It belongs to the androgens and derivatives class of compounds. |
Twenty molecules were predicted as anticancer molecules in addition to those reported in [12]. An extended description and additional information for each molecule can be found in Additional file 2.