Yun Xiong, Mengjie Guo, Lu Ruan, Xiangnan Kong, Chunlei Tang, Yangyong Zhu, Wei Wang.
Abstract
BACKGROUND: Identifying the complex biological mechanisms of diseases is an important goal of biomedical research. Recently, the rapid growth of data in genomics, epigenomics, metagenomics, proteomics, metabolomics, nutriomics, etc., has given rise to systematic biological approaches to exploring complex diseases. However, the gap between the production of these data and our capacity to analyze them has gradually widened. Furthermore, we observe that networks can represent many of the above-mentioned data types. Based on the vector representations learned by network embedding methods, entities that lie in close proximity but currently have no direct link are very likely to be related, and are therefore promising candidates for biological investigation.
Keywords: Disease association prediction; Heterogeneous network; Network embedding
Year: 2019 PMID: 31865913 PMCID: PMC6927100 DOI: 10.1186/s12920-019-0623-3
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 An illustrative example of heterogeneous biological network embedding. The left panel shows one disease, two miRNAs, two genes, and their known links, which are denoted by solid edges. The right panel shows their projection into a two-dimensional space in a small region around the disease, obtained by network embedding. The four red dashed links denote the top links predicted by our model. (a) Sub-network relation visualization. (b) Network embedding visualization
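The idea in the abstract and in Fig. 1, that entities close together in the embedding space but not yet linked are candidate associations, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the node names, the two-dimensional vectors, and the use of cosine similarity as the proximity score. It is not the scoring function of the model described in this article.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidate_links(embeddings, known_links):
    """Rank node pairs that have no known link by embedding similarity, best first."""
    known = {frozenset(pair) for pair in known_links}
    scored = []
    for a, b in combinations(sorted(embeddings), 2):
        if frozenset((a, b)) in known:
            continue  # already linked, so not a prediction target
        scored.append((cosine(embeddings[a], embeddings[b]), a, b))
    scored.sort(reverse=True)
    return [(a, b, score) for score, a, b in scored]


# Hypothetical toy embeddings; real embeddings would be learned from the network.
embeddings = {
    "disease": (1.0, 0.0),
    "gene_1": (0.9, 0.1),
    "gene_2": (0.0, 1.0),
    "mirna_1": (0.8, 0.3),
}
known_links = [("gene_1", "mirna_1")]
candidates = rank_candidate_links(embeddings, known_links)
```

In this toy setting the highest-ranked unlinked pair is the disease and `gene_1`, whose vectors point in nearly the same direction.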
Description of each sub-network of the constructed heterogeneous network
| Network | Type | Number of links | Weight | Source |
|---|---|---|---|---|
| Gene (proteins) interaction network | G - G | 39,240 | 1 | HPRD |
| microRNA similarity network | M - M | 56,289 | 0 to 1 | MISIM |
| Disease phenotype similarity network | D - D | 3,162,016 | 0 to 1 | MimMiner |
| Gene-Disease association network | G - D | 19,714 | 0 to 1 | DisGeNET |
| Gene-miRNA interaction network | G - M | 21,259 | 0.3 or 1 | miRTarBase |
| miRNA-Disease association network | M - D | 878 | 1 | Chen et al. |
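As a rough illustration of how sub-networks like those in the table could be merged into one heterogeneous network, the sketch below stores typed nodes and weighted, undirected links in a plain adjacency dictionary. The class name, the node identifiers, and the example weights are hypothetical; the article's actual data structures are not described in this record.

```python
from collections import defaultdict


class HeteroNetwork:
    """Undirected, weighted network whose nodes carry a type label (G, M or D)."""

    def __init__(self):
        self.node_type = {}            # node id -> "G", "M" or "D"
        self.adj = defaultdict(dict)   # node id -> {neighbour id: weight}

    def add_link(self, u, u_type, v, v_type, weight=1.0):
        """Add one undirected link; weights are assumed to lie in [0, 1]."""
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be between 0 and 1")
        self.node_type[u] = u_type
        self.node_type[v] = v_type
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def link_counts(self):
        """Count links per pair of node types, each undirected link once."""
        counts = defaultdict(int)
        for u, neighbours in self.adj.items():
            for v in neighbours:
                if u < v:  # visit each undirected link from one side only
                    pair = tuple(sorted((self.node_type[u], self.node_type[v])))
                    counts[pair] += 1
        return dict(counts)


# Hypothetical toy links, one per sub-network type used here.
net = HeteroNetwork()
net.add_link("gene_a", "G", "gene_b", "G", 1.0)
net.add_link("gene_a", "G", "mirna_x", "M", 0.3)
net.add_link("mirna_x", "M", "disease_1", "D", 1.0)
net.add_link("gene_b", "G", "disease_1", "D", 0.5)
counts = net.link_counts()
```

Keeping the type label on each node rather than on each link lets the same structure answer both per-type counts, as in the table, and type-constrained neighbour queries.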
Fig. 2 Network schema of the constructed heterogeneous network. The solid black lines denote the links observed in the real-world network, and the red dashed lines denote the links we want to predict
Meta paths and their random walk-based measures between gene-disease and miRNA-disease
| | With 2 types of nodes | Measure | With 3 types of nodes | Measure |
|---|---|---|---|---|
| gene-disease | Gene | 9364 | Gene | 16103 |
| | Gene | 8658 | Gene | 10465 |
| | Gene | 14422 | Gene | 16084 |
| | Gene | 10184 | Gene | 16460 |
| miRNA-disease | miRNA | 19381 | miRNA | 14820 |
| | miRNA | 21323 | miRNA | 10011 |
| | miRNA | 19540 | miRNA | 15481 |
| | miRNA | 21335 | miRNA | 14626 |
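To make the notion of a meta path concrete, the following sketch performs a random walk that is only allowed to step to neighbours of the node type prescribed by a given meta path, choosing among them in proportion to link weight. This is a generic meta-path-guided walk written under stated assumptions (the toy graph, the node names, and the weighting rule are all hypothetical); it does not reproduce the measure reported in the table.

```python
import random


def meta_path_walk(adj, node_type, start, meta_path, rng):
    """Walk from `start`, visiting node types in the order given by `meta_path`.

    Stops early if the current node has no positively weighted neighbour
    of the next required type.
    """
    if node_type[start] != meta_path[0]:
        raise ValueError("start node type must match the first meta-path type")
    walk = [start]
    for wanted in meta_path[1:]:
        current = walk[-1]
        options = [(v, w) for v, w in adj[current].items()
                   if node_type[v] == wanted and w > 0]
        if not options:
            break
        nodes, weights = zip(*options)
        # Weighted choice among the neighbours of the required type.
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


# Hypothetical toy network: G = gene, M = miRNA, D = disease.
node_type = {"gene_a": "G", "gene_b": "G", "mirna_x": "M", "disease_1": "D"}
adj = {
    "gene_a": {"gene_b": 1.0, "mirna_x": 0.3},
    "gene_b": {"gene_a": 1.0, "disease_1": 0.5},
    "mirna_x": {"gene_a": 0.3, "disease_1": 1.0},
    "disease_1": {"gene_b": 0.5, "mirna_x": 1.0},
}
rng = random.Random(0)  # seeded for reproducibility
walk_gmd = meta_path_walk(adj, node_type, "gene_a", ("G", "M", "D"), rng)
walk_ggd = meta_path_walk(adj, node_type, "gene_a", ("G", "G", "D"), rng)
```

In this toy graph each step has exactly one admissible neighbour, so the two walks are fully determined by their meta paths: one passes through the miRNA and the other through the second gene.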
AUROC Score on Gene-Disease Association Prediction
| Method/Training ratio | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|
| CATAPULT | 0.611 | 0.619 | 0.622 | 0.659 | 0.685 |
| HSMP | 0.621 | 0.625 | 0.679 | 0.708 | 0.747 |
| HSSVM | 0.609 | 0.653 | 0.693 | 0.734 | 0.779 |
| DeepWalk | 0.454 | 0.461 | 0.481 | 0.433 | 0.477 |
| LINE(1st+2nd) | 0.638 | 0.655 | 0.647 | 0.667 | 0.661 |
| DGI | 0.523 | 0.527 | 0.549 | 0.561 | 0.534 |
| TransE | 0.488 | 0.496 | 0.492 | 0.488 | 0.496 |
| AspEm | 0.667 | 0.659 | 0.657 | 0.681 | |
| HeteWalk | 0.638 |
The best performance is in bold
AUROC Score on miRNA-Disease Association Prediction
| Method/Training ratio | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|
| CATAPULT | 0.811 | 0.833 | 0.843 | 0.867 | 0.877 |
| HSMP | 0.833 | 0.864 | 0.878 | 0.899 | 0.869 |
| HSSVM | 0.841 | 0.877 | 0.902 | 0.922 | 0.932 |
| DeepWalk | 0.498 | 0.511 | 0.534 | 0.611 | 0.677 |
| LINE(1st+2nd) | 0.780 | 0.795 | 0.829 | 0.813 | 0.804 |
| DGI | 0.501 | 0.483 | 0.496 | 0.516 | 0.512 |
| TransE | 0.473 | 0.477 | 0.481 | 0.469 | 0.464 |
| AspEm | 0.765 | 0.819 | 0.761 | 0.849 | 0.819 |
| HeteWalk |
The best performance is in bold
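For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen true association receives a higher score than a randomly chosen non-association, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not data from the article, and the function is an illustrative implementation rather than the evaluation code used by the authors.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = held-out true association, 0 = sampled non-association.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs are ordered correctly
```

A score of 0.5 corresponds to random ranking, which is a useful reference point when reading the rows of the two tables that sit close to that value.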
Fig. 3 Performance on different networks. The left panel shows the AUROC score of miRNA-disease association prediction for two comparable methods and our method; the blue bar denotes the results on a sub-network containing only miRNA and disease data, and the orange bar the results on the whole heterogeneous network. The right panel shows the AUROC score of gene-disease association prediction; the blue bar denotes the results on a sub-network containing only gene and disease data, and the orange bar the results on the whole network. (a) miRNA-disease association prediction. (b) Gene-disease association prediction
Fig. 4 Parameter sensitivity. The green line denotes the results on gene-disease association prediction, while the red line denotes the results on miRNA-disease association prediction. (a) AUROC for different embedding dimensions. (b) AUROC for different numbers of walks
Top 10 unknown disease-related associations predicted by HeteWalk
| Rank | Gene | Rank | miRNA | Rank | Gene | Rank | miRNA |
|---|---|---|---|---|---|---|---|
| Leukemia OMIM: 601626 | | | | Alzheimer disease OMIM: 104300 | | | |
| 2 | TNF | 3 | hsa-mir-21 | 2 | GRN | 1 | hsa-mir-223 |
| 4 | APOE | 4 | hsa-mir-17 | 8 | CHMP2B | 2 | hsa-mir-659 |
| 5 | ATM | 7 | hsa-mir-146a | 10 | TNF | 3 | hsa-let-7c |
| 6 | PRRX1 | 8 | hsa-mir-510 | 12 | CEBPA | 4 | hsa-mir-21 |
| 7 | CD81 | 10 | hsa-mir-20b | 13 | ATM | 5 | hsa-mir-15a |
| 8 | USP8 | 11 | hsa-mir-331 | 15 | PPARG | 6 | hsa-mir-16-1 |
| 9 | PPARG | 12 | hsa-mir-155 | 16 | BCR | 7 | hsa-mir-17 |
| 10 | IL1B | 13 | hsa-mir-143 | 17 | ABL1 | 8 | hsa-mir-155 |
| 11 | SH2B1 | 14 | hsa-mir-539 | 18 | USP8 | 9 | hsa-mir-510 |
| 12 | IL6 | 15 | hsa-mir-192 | 19 | HNF1B | 11 | hsa-let-7a-1 |
| Insulin resistance OMIM: 125853 | | | | Prostate cancer OMIM: 176807 | | | |
| 1 | BCR | 1 | hsa-mir-659 | 1 | ATM | 1 | hsa-mir-223 |
| 2 | ABL1 | 2 | hsa-mir-21 | 2 | ZNF804A | 2 | hsa-mir-21 |
| 4 | ARID3B | 3 | hsa-mir-223 | 3 | BEND2 | 4 | hsa-mir-144 |
| 8 | MAST1 | 4 | hsa-let-7c | 4 | TBP | 5 | hsa-mir-331 |
| 9 | CEBPA | 5 | hsa-mir-16-1 | 5 | PLTP | 6 | hsa-mir-17 |
| 11 | CDH8 | 6 | hsa-mir-15a | 6 | ELP5 | 8 | hsa-mir-510 |
| 12 | ZNF609 | 7 | hsa-mir-17 | 7 | KLHL35 | 10 | hsa-mir-143 |
| 13 | TBP | 8 | hsa-mir-155 | 8 | ENTPD6 | 11 | hsa-mir-20b |
| 14 | IL1RAPL1 | 9 | hsa-mir-146a | 9 | RBP2 | 12 | hsa-mir-425 |
| 15 | ENTPD6 | 10 | hsa-mir-510 | 10 | U2AF2 | 14 | hsa-let-7a-1 |
| Schizophrenia OMIM: 181500 | | | | Breast cancer OMIM: 114480 | | | |
| 1 | CEBPA | 1 | hsa-mir-21 | 1 | PHKG1 | 2 | hsa-let-7c |
| 2 | TNF | 2 | hsa-let-7c | 2 | FGF4 | 3 | hsa-mir-223 |
| 3 | EVPL | 3 | hsa-mir-223 | 3 | CEBPA | 4 | hsa-mir-16-1 |
| 4 | PPARG | 4 | hsa-mir-16-1 | 4 | EVPL | 7 | hsa-mir-15a |
| 5 | AKT2 | 5 | hsa-mir-15a | 5 | HAVCR1 | 10 | hsa-mir-539 |
| 6 | HAVCR1 | 6 | hsa-mir-146a | 6 | BCR | 12 | hsa-mir-20b |
| 7 | PHKG1 | 7 | hsa-mir-155 | 7 | TBP | 13 | hsa-mir-484 |
| 8 | APOE | 8 | hsa-mir-510 | 8 | PPARG | 14 | hsa-mir-192 |
| 9 | ENPP1 | 9 | hsa-mir-17 | 9 | CDH1 | 15 | hsa-mir-93 |
| 10 | FGF4 | 10 | hsa-mir-20b | 10 | AKT2 | 16 | hsa-mir-614 |
| Gastric cancer OMIM: 137215 | | | | Colorectal cancer OMIM: 114500 | | | |
| 1 | FTO | 2 | hsa-mir-146a | 1 | ESRRB | 1 | hsa-mir-146a |
| 2 | NTRK1 | 3 | hsa-mir-155 | 2 | COL3A1 | 2 | hsa-mir-16-1 |
| 3 | PCSK1 | 5 | hsa-mir-539 | 3 | GNA11 | 4 | hsa-mir-155 |
| 4 | MSH6 | 6 | hsa-mir-484 | 4 | GDF1 | 5 | hsa-mir-20b |
| 5 | RAI1 | 7 | hsa-let-7c | 5 | ZMPSTE24 | 6 | hsa-mir-93 |
| 6 | DICER1 | 8 | hsa-mir-192 | 6 | COL4A5 | 7 | hsa-mir-192 |
| 7 | DHH | 9 | hsa-mir-614 | 7 | KIF11 | 8 | hsa-mir-539 |
| 8 | MC3R | 10 | hsa-mir-21 | 8 | CLCN2 | 10 | hsa-mir-181b-1 |
| 9 | NOG | 11 | hsa-mir-181b-1 | 10 | REST | 11 | hsa-mir-510 |
| 10 | GDF1 | 12 | hsa-mir-34b | 11 | SCN3B | 12 | hsa-mir-203a |
For each disease, the top-ranked genes are in the left column while the top-ranked miRNAs are in the right. The numbers denote their original ranking before known associations are removed in the results
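The footnote explains why the rank numbers in the table have gaps: candidates are ranked first, and known associations are dropped afterwards, so each remaining entry keeps its original position. A minimal sketch of that bookkeeping follows; the function name, the entity labels, and the scores are hypothetical.

```python
def top_unknown(scores, known, k=10):
    """Return up to k (original_rank, entity) pairs, skipping known associations.

    Ranks are assigned over all candidates before filtering, so gaps in the
    returned ranks mark positions that were held by known associations.
    """
    # Sort by descending score; break ties by name for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    result = []
    for rank, (entity, _score) in enumerate(ranked, start=1):
        if entity in known:
            continue
        result.append((rank, entity))
        if len(result) == k:
            break
    return result


# Toy scores for candidate entities of one disease (made-up values).
toy_scores = {"entity_a": 0.9, "entity_b": 0.8, "entity_c": 0.7, "entity_d": 0.6}
toy_known = {"entity_a", "entity_c"}
toy_top = top_unknown(toy_scores, toy_known, k=2)
```

Here the two unknown candidates keep ranks 2 and 4, mirroring the non-consecutive rank numbers seen in the table.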
Top 10 diseases associated with the given miRNAs predicted by HeteWalk
| Rank | Disease | Verified |
|---|---|---|
| hsa-mir-21 | ||
| 3 | 188550 Nonmedullary Thyroid cancer 1 | miR2Disease |
| 5 | 608232 Chronic myeloid leukemia | PhenomiR |
| 6 | 266600 Inflammatory bowel disease 1 | HMDD |
| 8 | 607464 Thyroid carcinoma | |
| 9 | 273300 Male germ cell tumor | |
| 10 | 151430 B-cell lymphoma 2 | PhenomiR |
| 11 | 155601 Cutaneous malignant melanoma | PhenomiR |
| 12 | 145500 Hypertension | HMDD |
| 13 | 256700 Neuroblastoma | HMDD |
| 14 | 176807 Prostate cancer | PhenomiR, HMDD, miR2Disease |
| hsa-let-7a-1 | ||
| 2 | 155255 Medulloblastoma | PhenomiR |
| 4 | 176807 Prostate cancer | PhenomiR, HMDD, miR2Disease |
| 6 | 256700 Neuroblastoma | PhenomiR |
| 7 | 608232 Chronic myeloid leukemia | PhenomiR |
| 9 | 151430 B-cell lymphoma 2 | PhenomiR |
| 10 | 150699 Uterine leiomyoma | |
| 12 | 600634 Pituitary adenoma | miR2Disease |
| 15 | 236000 Hodgkin lymphoma | PhenomiR, HMDD, miR2Disease |
| 16 | 607464 Thyroid carcinoma | |
| 18 | 226150 Enterocolitis | |
| hsa-mir-125b-1 | ||
| 1 | 137800 Glioma susceptibility 1 | miR2Disease |
| 2 | 266600 Inflammatory bowel disease 1 | |
| 4 | 188550 Nonmedullary Thyroid cancer 1 | HMDD |
| 5 | 273300 Male germ cell tumor | |
| 6 | 608232 Chronic myeloid leukemia | PhenomiR |
| 7 | 155601 Cutaneous malignant melanoma | HMDD |
| 9 | 145500 Hypertension | |
| 10 | 181500 Schizophrenia | |
| 11 | 151430 B-cell lymphoma 2 | PhenomiR |
| 13 | 260350 Pancreatic cancer | PhenomiR, HMDD, miR2Disease |
| hsa-mir-155 | ||
| 2 | 188550 Nonmedullary Thyroid cancer 1 | HMDD |
| 3 | 273300 Male germ cell tumor | |
| 4 | 137800 Glioma susceptibility 1 | HMDD |
| 6 | 155601 Cutaneous malignant melanoma | HMDD |
| 7 | 608232 Chronic myeloid leukemia | PhenomiR |
| 8 | 256700 Neuroblastoma | |
| 10 | 601626 Acute myeloid leukemia | PhenomiR, HMDD |
| 12 | 226150 Enterocolitis | |
| 13 | 114500 Colorectal cancer | PhenomiR, HMDD |
| 15 | 176807 Prostate cancer | PhenomiR |
The first column shows the rankings of the predictions among all diseases, the second presents the disease names and OMIM IDs, and the third indicates the databases in which the predicted associations are verified
Top 10 diseases associated with the given miRNAs predicted by CATAPULT
| Rank | hsa-mir-21 | Rank | hsa-let-7a-1 | Rank | hsa-mir-125b-1 | Rank | hsa-mir-155 |
|---|---|---|---|---|---|---|---|
| 4 | | 7 | | 3 | | 4 | |
| 7 | 273300 Male germ cell tumor | 9 | | 4 | | 6 | 151430 B-cell lymphoma 2 |
| 9 | 155601 Cutaneous malignant melanoma | 10 | 273300 Male germ cell tumor | 6 | 273300 Male germ cell tumor | 8 | 273300 Male germ cell tumor |
| 11 | | 13 | 188550 Nonmedullary Thyroid cancer 1 | 7 | | 9 | 155601 Cutaneous malignant melanoma |
| 13 | | 14 | 137800 Glioma susceptibility 1 | 9 | 155601 Cutaneous malignant melanoma | 10 | |
| 14 | | 15 | 226150 Enterocolitis | 10 | 114500 Colorectal cancer | 12 | 114500 Colorectal cancer |
| 15 | 226150 Enterocolitis | 17 | | 11 | 226150 Enterocolitis | 13 | |
| 16 | 181500 Schizophrenia | 19 | 605027 Non-Hodgkin Lymphoma | 12 | 236000 Hodgkin lymphoma | 14 | 226150 Enterocolitis |
| 17 | 131440 Myeloproliferative disorder with eosinophilia | 20 | 266600 Inflammatory bowel disease 1 | 13 | | 15 | 158350 Cowden syndrome 1 |
| 18 | 605027 Non-Hodgkin Lymphoma | 21 | 268210 Rhabdomyosarcoma | 14 | 266600 Inflammatory bowel disease 1 | 16 | 600634 Pituitary adenoma |
Known associations are omitted and verified records are in bold. The first column indicates their original rankings
Top 10 diseases associated with the given miRNAs predicted by HSMP
| Rank | hsa-mir-21 | Rank | hsa-let-7a-1 | Rank | hsa-mir-125b-1 | Rank | hsa-mir-155 |
|---|---|---|---|---|---|---|---|
| 3 | 155601 Cutaneous malignant melanoma | 5 | | 3 | 266600 Inflammatory bowel disease 1 | 3 | |
| 4 | | 8 | | 5 | | 4 | 273300 Male germ cell tumor |
| 5 | | 9 | | 6 | 273300 Male germ cell tumor | 5 | |
| 6 | 151400 Leukemia | 11 | 181500 Schizophrenia | 7 | | 7 | |
| 8 | | 12 | 131440 Myeloproliferative disorder with eosinophilia | 9 | | 10 | 256700 Neuroblastoma |
| 9 | | 14 | | 10 | 181500 Schizophrenia | 11 | 155255 Medulloblastoma |
| 11 | 137580 Tourette syndrome | 16 | 236000 Hodgkin lymphoma | 11 | | 12 | |
| 14 | 273300 Male germ cell tumor | 17 | | 12 | | 13 | 174050 Polycystic liver disease 1 |
| 15 | | 18 | 268210 Rhabdomyosarcoma | 13 | 158350 Cowden syndrome 1 | 14 | 137580 Tourette syndrome |
| 16 | 131440 Myeloproliferative disorder with eosinophilia | 19 | 192600 Cardiomyopathy | 14 | 600634 Pituitary adenoma | 15 | 125853 Diabetes type 2 |
Known associations are omitted and verified records are in bold. The first column indicates their original rankings
Top 10 diseases associated with the given miRNAs predicted by HSSVM
| Rank | hsa-mir-21 | Rank | hsa-let-7a-1 | Rank | hsa-mir-125b-1 | Rank | hsa-mir-155 |
|---|---|---|---|---|---|---|---|
| 3 | | 6 | | 4 | 114500 Colorectal cancer | 3 | |
| 4 | 155601 Cutaneous malignant melanoma | 8 | | 5 | 266600 Inflammatory bowel disease 1 | 5 | |
| 5 | | 9 | | 6 | 145500 Hypertension | 6 | 256700 Neuroblastoma |
| 7 | | 11 | 131440 Myeloproliferative disorder with eosinophilia | 7 | 601626 Acute myeloid leukemia | 8 | |
| 8 | | 13 | 608232 Chronic myeloid leukemia | 9 | 226150 Enterocolitis | 9 | 273300 Male germ cell tumor |
| 10 | | 14 | 268210 Rhabdomyosarcoma | 10 | | 11 | |
| 12 | 601665 Obesity | 15 | | 11 | 268210 Rhabdomyosarcoma | 12 | 125853 Diabetes type 2 |
| 13 | 273300 Male germ cell tumor | 16 | 150699 Uterine leiomyoma | 12 | 273300 Male germ cell tumor | 13 | |
| 14 | 607464 Thyroid carcinoma | 18 | | 13 | 600634 Pituitary adenoma | 14 | 600634 Pituitary adenoma |
| 15 | 247640 Lymphoblastic leukemia | 19 | | 14 | 266600 Inflammatory bowel disease 1 | 15 | 158350 Cowden syndrome 1 |
Known associations are omitted and verified records are in bold. The first column indicates their original rankings