Ji-Ren Zhou1, Zhu-Hong You1, Li Cheng1, Bo-Ya Ji1,2.
Abstract
Uncovering additional long non-coding RNA (lncRNA)-disease associations has become increasingly important for developing treatments for complex human diseases. Identifying lncRNA biomarkers and lncRNA-disease associations is central to diagnosis and treatment. However, traditional experimental methods are expensive and time-consuming, while the enormous amounts of data in public biological databases can be exploited by computational methods to predict lncRNA-disease associations. In this study, we propose a novel computational method to predict lncRNA-disease associations. First, a heterogeneous network is constructed by integrating the associations among microRNA (miRNA), lncRNA, protein, drug, and disease. Second, high-order proximity preserved embedding (HOPE) is used to embed the nodes of the network. Finally, a rotation forest classifier is trained as the prediction model. In 5-fold cross-validation, our method achieved an area under the curve (AUC) of 0.8328 ± 0.0236. Compared with four other classifiers, the proposed method achieved the highest AUC. In addition, we conducted case studies on three cancers with high mortality rates. The results show that, for each of lung cancer, gastric cancer, and hepatocellular carcinoma, 9 of the top 15 predicted disease-related lncRNAs were confirmed. In conclusion, our method can effectively predict unknown lncRNA-disease associations.
Keywords: deep learning; heterogeneous information networks; lncRNA-disease associations; rotation forest
Year: 2020 PMID: 33425486 PMCID: PMC7773765 DOI: 10.1016/j.omtn.2020.10.040
Source DB: PubMed Journal: Mol Ther Nucleic Acids ISSN: 2162-2531 Impact factor: 8.886
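The embedding step named in the abstract can be illustrated with a minimal sketch of HOPE on a toy graph. It assumes the Katz index as the high-order proximity measure and factorizes the dense proximity matrix directly with an SVD, which is only practical for small graphs. The toy adjacency matrix, the decay value `beta`, and the function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def katz_proximity(adj, beta):
    """Katz proximity S = (I - beta*A)^-1 (beta*A), i.e. the sum of (beta*A)^k for k >= 1."""
    n = adj.shape[0]
    return np.linalg.solve(np.eye(n) - beta * adj, beta * adj)

def hope_embedding(adj, dim, beta):
    """Return source and target embeddings whose inner products approximate S."""
    u, sigma, vt = np.linalg.svd(katz_proximity(adj, beta))
    root = np.sqrt(sigma[:dim])          # split each singular value evenly
    source = u[:, :dim] * root           # one row per node (as edge source)
    target = vt[:dim, :].T * root        # one row per node (as edge target)
    return source, target

# Toy directed graph with 4 nodes (hypothetical data).
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0]], dtype=float)

# beta must be below 1 / spectral_radius(adj) for the Katz series to converge.
beta = 0.1
katz = katz_proximity(adj, beta)
src_full, tgt_full = hope_embedding(adj, dim=4, beta=beta)   # full rank
src_2d, tgt_2d = hope_embedding(adj, dim=2, beta=beta)       # compressed
```

With `dim` equal to the number of nodes, `src_full @ tgt_full.T` reproduces the proximity matrix exactly; a smaller `dim` gives the best low-rank approximation in the least-squares sense.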
Figure 1. The ROC curves of our method
The AUC is the area under the receiver operating characteristic (ROC) curve.
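As a small illustration of what the AUC measures, the sketch below computes it as the fraction of positive-negative pairs in which the positive sample receives the higher score (ties count as half). This pairwise form is equivalent to the area under the ROC curve. The labels and scores are made-up toy values, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count pairwise wins of positives over negatives; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive-negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```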
Figure 2. The PR curves of our method
The AUPR is the area under the precision-recall (PR) curve.
5-fold cross-validation results of our method
| Fold | Acc. (%) | Sen. (%) | Spec. (%) | Prec. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| 0 | 78.87 | 76.49 | 81.25 | 80.31 | 57.80 | 84.86 |
| 1 | 75.15 | 70.83 | 79.46 | 77.52 | 50.49 | 79.63 |
| 2 | 81.55 | 77.68 | 85.42 | 84.19 | 63.28 | 85.78 |
| 3 | 78.12 | 74.40 | 81.85 | 80.39 | 56.41 | 82.84 |
| 4 | 79.76 | 75.89 | 83.63 | 82.26 | 59.70 | 83.28 |
| Average | 78.69 ± 2.36 | 75.06 ± 2.64 | 82.32 ± 2.28 | 80.93 ± 2.48 | 57.54 ± 4.71 | 83.28 ± 2.36 |
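The columns of this table can be related to a confusion matrix with the standard definitions, sketched below. The counts used for fold 0 are hypothetical: they were back-calculated under the assumption of 336 positive and 336 negative test pairs and are not stated in the source, although they do reproduce the fold 0 row. The last lines show that the "Average" AUC entry is consistent with the mean and the sample standard deviation (n − 1 denominator) of the five fold values.

```python
import math
import statistics

def confusion_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics, returned as percentages."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "acc": 100 * (tp + tn) / (tp + fn + tn + fp),
        "sen": 100 * tp / (tp + fn),           # sensitivity (recall)
        "spec": 100 * tn / (tn + fp),          # specificity
        "prec": 100 * tp / (tp + fp),          # precision
        "mcc": 100 * (tp * tn - fp * fn) / mcc_denominator,
    }

# Hypothetical counts (assumption: 336 positives and 336 negatives in the fold).
fold0 = confusion_metrics(tp=257, fn=79, tn=273, fp=63)

# AUC (%) per fold, copied from the table above.
fold_auc = [84.86, 79.63, 85.78, 82.84, 83.28]
auc_mean = statistics.mean(fold_auc)
auc_std = statistics.stdev(fold_auc)   # sample standard deviation

print({name: round(value, 2) for name, value in fold0.items()})
print(f"{auc_mean:.2f} ± {auc_std:.2f}")
```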
Comparison of different features, used separately and in combination
| Feature | Acc. (%) | Sen. (%) | Spec. (%) | Prec. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| Attribute | 71.85 ± 1.55 | 66.79 ± 3.37 | 76.90 ± 1.63 | 74.31 ± 1.27 | 43.94 ± 2.97 | 71.92 ± 2.09 |
| Behavior | 79.16 ± 2.88 | 71.37 ± 3.11 | 86.96 ± 2.97 | 84.57 ± 3.41 | 59.06 ± 5.81 | 81.88 ± 2.61 |
| Both | 78.69 ± 2.36 | 75.06 ± 2.64 | 82.32 ± 2.28 | 80.93 ± 2.48 | 57.54 ± 4.71 | 83.28 ± 2.36 |
Figure 3. Comparison of different features under 5-fold cross-validation, used separately
Figure 4. Comparison of different features under 5-fold cross-validation, used in combination
Comparison of different classifiers
| Classifier | Acc. (%) | Sen. (%) | Spec. (%) | Prec. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| Decision tree | 73.57 ± 1.74 | 66.07 ± 2.69 | 81.07 ± 2.39 | 77.77 ± 2.23 | 47.71 ± 3.51 | 73.55 ± 1.79 |
| GBDT | 78.69 ± 1.66 | 68.21 ± 2.42 | 89.17 ± 1.24 | 86.29 ± 1.68 | 58.69 ± 3.24 | 81.27 ± 1.30 |
| Naive Bayes | 68.69 ± 2.66 | 48.16 ± 3.33 | 89.23 ± 2.25 | 81.67 ± 4.05 | 40.99 ± 5.64 | 79.02 ± 2.45 |
| Random forest | 78.75 ± 2.43 | 68.75 ± 2.68 | 88.75 ± 2.47 | 85.95 ± 3.05 | 58.69 ± 4.93 | 81.35 ± 1.85 |
| Rotation forest | 78.69 ± 2.36 | 75.06 ± 2.64 | 82.32 ± 2.28 | 80.93 ± 2.48 | 57.54 ± 4.71 | 83.28 ± 2.36 |
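What distinguishes rotation forest from the other tree ensembles in this table is its feature rotation step, sketched below for a single ensemble member. The features are split into random disjoint subsets, a PCA is fitted on a bootstrap sample for each subset, and the principal axes are assembled into one rotation matrix; a decision tree would then be trained on the rotated data. This is a generic illustration, not the authors' implementation: the 75% bootstrap fraction, the number of subsets, and the random toy data are assumptions, and the tree-training step is omitted.

```python
import numpy as np

def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Build the block rotation matrix for one rotation forest member."""
    n_samples, n_features = X.shape
    rotation = np.zeros((n_features, n_features))
    # Randomly partition the feature indices into disjoint subsets.
    for cols in np.array_split(rng.permutation(n_features), n_subsets):
        # Bootstrap sample of the rows, restricted to this feature subset.
        size = max(2, int(sample_frac * n_samples))
        rows = rng.choice(n_samples, size=size, replace=True)
        block = X[np.ix_(rows, cols)]
        block = block - block.mean(axis=0)
        # PCA via SVD: the rows of vt are the principal axes; keep all of them.
        _, _, vt = np.linalg.svd(block, full_matrices=True)
        rotation[np.ix_(cols, cols)] = vt.T
    return rotation

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))              # toy feature matrix (hypothetical)
R = rotation_matrix(X, n_subsets=3, rng=rng)
X_rotated = X @ R                         # input for one base decision tree
```

Because every principal component is kept, each block is orthogonal and so is the full matrix `R`. The rotation therefore preserves all information in the features while presenting each tree with differently oriented axes.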
Marked lncRNA associations with lung cancer, gastric cancer, and hepatocellular carcinomas
| lncRNA | Disease | Rank |
|---|---|---|
| HOTAIR | lung cancer | 3, 10 |
| H19 | lung cancer | 6, 7 |
| KCNQ1OT1 | lung cancer | 8 |
| MEG3 | lung cancer | 12 |
| UCA1 | lung cancer | 13 |
| XIST | lung cancer | 14 |
| linc-ROR | lung cancer | 15 |
| H19 | gastric cancer | 1 |
| MEG3 | gastric cancer | 3 |
| HOTAIR | gastric cancer | 6 |
| NEAT1 | gastric cancer | 7 |
| XIST | gastric cancer | 8 |
| UCA1 | gastric cancer | 9 |
| ANRIL | gastric cancer | 10 |
| CASC2 | gastric cancer | 12 |
| linc-ROR | gastric cancer | 13 |
| H19 | hepatocellular carcinomas | 1, 7 |
| HOTAIR | hepatocellular carcinomas | 2, 3 |
| ANRIL | hepatocellular carcinomas | 5 |
| IGF2-AS | hepatocellular carcinomas | 9 |
| MEG3 | hepatocellular carcinomas | 12 |
| TUSC7 | hepatocellular carcinomas | 13 |
| NEAT1 | hepatocellular carcinomas | 15 |
Nine types of associations in the heterogeneous network
| Relationship type | Database | Number of associations |
|---|---|---|
| miRNA-lncRNA | lncRNASNP2 | 8,374 |
| miRNA-disease | HMDD | 16,427 |
| miRNA-protein | miRTarBase | 4,944 |
| lncRNA-disease | lncRNAdisease, lncRNASNP2, lnc2Cancer | 1,680 |
| lncRNA-protein | lncRNA2Target | 690 |
| protein-disease | DisGeNET | 25,087 |
| drug-protein | DrugBank | 11,107 |
| drug-disease | CTD | 18,416 |
| protein-protein | STRING | 19,237 |
| total | N/A | 105,963 |
Number of nodes of each type in the heterogeneous network
| Node | Number of nodes |
|---|---|
| disease | 2,062 |
| lncRNA | 769 |
| miRNA | 1,023 |
| protein | 1,649 |
| drug | 1,025 |
| total | 6,528 |
Figure 5. The network constructed by the multiple associations among different biomolecules
Figure 6. The directed acyclic graph of gastrointestinal neoplasms, a type of digestive system disease