Liang Ding, Minghui Wang, Dongdong Sun, Ao Li.
Abstract
Accumulating evidence indicates that lncRNAs play an important role in various complex human diseases. However, the number of known disease-related lncRNAs is still comparatively small, and experimental identification is time-consuming and labor-intensive. Developing computational methods for inferring potential lncRNA-disease associations has therefore become an active research topic; such methods can help researchers explore complex human diseases at the molecular level and improve disease diagnosis, therapy, prognosis and prevention. In this paper, we propose a novel method for predicting lncRNA-disease associations via an lncRNA-disease-gene tripartite graph (TPGLDA), which integrates gene-disease associations with lncRNA-disease associations. Compared to previous studies, TPGLDA can better delineate the heterogeneity of coding and non-coding gene-disease associations and can effectively identify potential lncRNA-disease associations. Under leave-one-out cross validation, TPGLDA achieves an AUC of 93.9%, which demonstrates its good predictive performance. Moreover, the top 5 predictions for lung cancer, hepatocellular carcinoma and ovarian cancer are manually confirmed against relevant databases and the literature, providing evidence of the performance and potential value of TPGLDA in identifying potential lncRNA-disease associations. Matlab and R code for TPGLDA is available at https://github.com/USTC-HIlab/TPGLDA.
Year: 2018 PMID: 29348552 PMCID: PMC5773503 DOI: 10.1038/s41598-018-19357-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The flowchart of TPGLDA. (a) Construct the lncRNA-disease and gene-disease adjacency matrices. Calculate interaction profiles for isolated nodes and integrate them into the adjacency matrices for subsequent resource allocation. (b) Construct the lncRNA-disease-gene tripartite graph. (c) Perform resource allocation on the tripartite graph to build the potential lncRNA-disease associations. (d) Calculate the resource score (Rscore) of candidate lncRNAs and rank the candidates for each disease by Rscore in descending order.
Figure 2. Performance comparison of TPGLDA, LRLSLDA and KRWRH in terms of ROC curves and AUC based on LOOCV. TPGLDA achieves the highest AUC, 0.939. The baseline indicates random performance.
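As a rough illustration of the AUC reported above, the following minimal sketch computes AUC as the probability that a held-out known association is scored above an unknown candidate. The function name `auc` and the toy scores are hypothetical; the exact LOOCV protocol used for TPGLDA is not given in this text, so this is only one common way to compute the statistic.

```python
def auc(pos_scores, neg_scores):
    """Pairwise (Mann-Whitney) AUC: share of positive/negative pairs
    in which the positive is ranked higher; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores (not from the paper): two held-out known associations
# and two unknown candidates.
toy_auc = auc([0.9, 0.4], [0.5, 0.1])
print(toy_auc)
```

A value of 0.5 corresponds to the random baseline mentioned in the caption, and 1.0 to a perfect ranking.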
Comparison with other computational approaches at two stringency levels (Sp = 99.0% and Sp = 95.0%).
| | TPGLDA | KRWRH | LRLSLDA |
|---|---|---|---|
| | 53.5% | 11.7% | 10.7% |
| | 97.8% | 96.7% | 96.7% |
| | 59.2% | 24.1% | 22.6% |
| | 55.2% | 15.2% | 14.0% |
| | 76.9% | 42.6% | 35.2% |
| | 94.5% | 93.6% | 93.4% |
| | 29.4% | 18.7% | 16.0% |
| | 45.4% | 25.4% | 20.7% |
Figure 3. The average AUCs across all the diseases at different top k cutoffs.
Figure 4. The average recall across all the diseases at different top k cutoffs.
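To make the top k recall in Figure 4 concrete, here is a minimal sketch assuming recall at k is the fraction of a disease's known lncRNAs that appear among its k highest-scoring candidates, averaged over diseases. The function names and the toy scores are hypothetical and are not taken from the TPGLDA code.

```python
def recall_at_k(scores, positives, k):
    """scores: dict mapping candidate name -> predicted score.
    positives: set of names known to be associated with the disease."""
    if not positives:
        return 0.0
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = sum(1 for name in ranked[:k] if name in positives)
    return hits / len(positives)


def mean_recall_at_k(per_disease, k):
    """per_disease: list of (scores, positives) pairs, one per disease."""
    values = [recall_at_k(s, p, k) for s, p in per_disease]
    return sum(values) / len(values)


# Toy input (made-up names and scores) for two diseases.
toy = [
    ({"lncA": 0.9, "lncB": 0.5, "lncC": 0.1}, {"lncA", "lncC"}),
    ({"lncA": 0.2, "lncB": 0.8, "lncC": 0.6}, {"lncB"}),
]
r1 = mean_recall_at_k(toy, 1)
r3 = mean_recall_at_k(toy, 3)
print(r1, r3)
```

Recall can only grow as k increases, which is why such curves rise toward 1 at large cutoffs.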
Prediction results for TPGLDA, KRWRH and LRLSLDA under leave-one-out cross validation on 15 diseases.
| Disease name | No. of associated lncRNAs | AUC (TPGLDA) | AUC (KRWRH) | AUC (LRLSLDA) |
|---|---|---|---|---|
| Gastric Cancer | 24 | 0.893 | 0.832 | 0.756 |
| Colorectal Cancer | 21 | 0.884 | 0.782 | 0.687 |
| Breast Cancer | 20 | 0.852 | 0.675 | 0.655 |
| Hepatocellular Carcinoma | 20 | 0.911 | 0.891 | 0.751 |
| Non-Small Cell Lung Cancer | 15 | 0.799 | 0.759 | 0.765 |
| Prostate Cancer | 13 | 0.886 | 0.807 | 0.758 |
| Esophageal Squamous Cell Carcinoma | 13 | 0.822 | 0.835 | 0.739 |
| Ovarian Cancer | 12 | 0.892 | 0.731 | 0.768 |
| Bladder Cancer | 11 | 0.883 | 0.774 | 0.765 |
| Lung Cancer | 9 | 0.828 | 0.737 | 0.750 |
| Melanoma | 9 | 0.939 | 0.627 | 0.815 |
| Glioma | 9 | 0.820 | 0.710 | 0.808 |
| Tumor | 8 | 0.950 | 0.786 | 0.625 |
| Schizophrenia | 8 | 0.860 | 0.854 | 0.630 |
| Papillary Thyroid Carcinoma | 7 | 0.892 | 0.700 | 0.835 |
The top 5 predictions computed by TPGLDA for Lung Cancer, Hepatocellular Carcinoma and Ovarian Cancer, with confirmation of the associations from related databases and literature.
| LncRNA | TPGLDA’s rank | Evidence (PMID) | Description |
|---|---|---|---|
| Lung Cancer | | | |
| GAS5 | 1 | 25925741,24357161 | Lnc2Cancer,LncRNA2Target |
| CDKN2B-AS1 | 2 | 21489289,26408699 | MNDR,Lnc2Cancer |
| UCA1 | 3 | 26380024 | Lnc2Cancer |
| PVT1 | 4 | 26493997;26493997 | Lnc2Cancer,literature |
| HNF1A-AS1 | 5 | 25863539 | literature |
| Hepatocellular Carcinoma | | | |
| GAS5 | 1 | 26404135, 26163879 | Lnc2Cancer, literature |
| SOX2-OT | 2 | 26097588 | Lnc2Cancer |
| PVT1 | 3 | 25624916 | Lnc2Cancer |
| LINC00152 | 4 | 27351280, 26356260 | Lnc2Cancer, literature |
| UCA1 | 5 | 27215316, 27167190 | Lnc2Cancer, literature |
| Ovarian Cancer | | | |
| MEG3 | 1 | 24859196 | Lnc2Cancer,LncRNA2Target |
| GAS5 | 2 | 26503132 | Lnc2Cancer |
| CCAT2 | 3 | 27283598 | Lnc2Cancer |
| BANCR | 4 | unconfirmed | unconfirmed |
| CDKN2B-AS1 | 5 | 27095571 | Lnc2Cancer |
Figure 5. Operating principle of resource allocation in an lncRNA-disease-gene tripartite graph consisting of three lncRNAs, five diseases and four genes. The blue circles, green squares and purple triangles represent lncRNAs in L, diseases in D and genes in G, respectively. (a) For the target lncRNA, the initial resources (1, 1, 0, 0, 1) are located on D. (b) In the first step, each disease distributes its resource evenly to its neighboring nodes on both sides, according to the degree of the disease. (c) In the second step, the resources flow back to D from L and G, giving the final resource vector located on D.
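The two-step flow in Figure 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab/R implementation: the function name `allocate_resources` and the toy edges are hypothetical, a disease's degree is assumed to count its lncRNA and gene neighbors together, and each lncRNA or gene is assumed to return what it holds evenly to its own disease neighbors. Only the graph size (three lncRNAs, five diseases, four genes) and the initial vector (1, 1, 0, 0, 1) come from the caption.

```python
def allocate_resources(target, lnc_dis, gene_dis):
    """lnc_dis[i][k] = 1 if lncRNA i is linked to disease k;
    gene_dis[j][k] = 1 if gene j is linked to disease k.
    Returns the final resource on each disease for the target lncRNA."""
    n_dis = len(lnc_dis[0])
    # (a) Initial resource: the target lncRNA's known disease links.
    initial = [float(x) for x in lnc_dis[target]]
    # Assumed disease degree: neighbors on both sides counted together.
    dis_degree = [
        sum(row[k] for row in lnc_dis) + sum(row[k] for row in gene_dis)
        for k in range(n_dis)
    ]

    def spread(adj):
        # (b) Each disease splits its resource evenly among its neighbors.
        return [
            sum(initial[k] / dis_degree[k]
                for k in range(n_dis) if row[k] and dis_degree[k])
            for row in adj
        ]

    def flow_back(adj, held):
        # (c) Each lncRNA/gene returns its resource evenly to its diseases.
        out = [0.0] * n_dis
        for row, amount in zip(adj, held):
            degree = sum(row)
            if degree == 0:
                continue
            for k in range(n_dis):
                if row[k]:
                    out[k] += amount / degree
        return out

    from_lnc = flow_back(lnc_dis, spread(lnc_dis))
    from_gene = flow_back(gene_dis, spread(gene_dis))
    return [a + b for a, b in zip(from_lnc, from_gene)]


# Hypothetical toy graph; row 0 is the target lncRNA with links (1, 1, 0, 0, 1).
LNC_DIS = [[1, 1, 0, 0, 1], [0, 1, 1, 0, 0], [0, 0, 1, 1, 1]]
GENE_DIS = [[1, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 1, 1]]
final = allocate_resources(0, LNC_DIS, GENE_DIS)
print(final)
```

Under these assumptions the total resource is conserved across both steps, so the final vector sums to the initial total of 3. Diseases not yet linked to the target lncRNA that end up with nonzero resource are the candidate associations to rank.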