Dengju Yao1, Tao Zhang1, Xiaojuan Zhan1,2, Shuli Zhang1, Xiaorong Zhan3, Chao Zhang4.
Abstract
Growing evidence shows that abnormal expression of long non-coding RNA (lncRNA) is associated with a variety of human diseases. Accurate identification of disease-related lncRNAs can therefore help to understand lncRNA expression at the molecular level and to explore more effective treatments. Many lncRNA-disease association prediction models have been proposed, but recognizing unknown lncRNA-disease associations remains a challenge. In this work, we propose a computational model for predicting lncRNA-disease associations based on geometric complement heterogeneous information and random forest. First, geometric complement heterogeneous information is used to integrate experimentally verified lncRNA-miRNA interactions and miRNA-disease associations. Second, lncRNA and disease features consisting of their respective similarity coefficients are fused into the input feature space. Third, an autoencoder projects the raw high-dimensional features into a low-dimensional space to learn representations of lncRNAs and diseases. Finally, the low-dimensional lncRNA and disease features are fused to train a random forest classifier for lncRNA-disease association prediction. Under five-fold cross-validation, the AUC (area under the receiver operating characteristic curve) is 0.9897 and the AUPR (area under the precision-recall curve) is 0.7040, higher than those of several state-of-the-art lncRNA-disease association prediction models. In addition, case studies on colon and stomach cancer indicate that our model predicts disease-related lncRNAs well.
Keywords: autoencoder; geometric complement heterogeneous information; lncRNA-disease association prediction; machine learning; random forest
Year: 2022 PMID: 36092871 PMCID: PMC9448985 DOI: 10.3389/fgene.2022.995532
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
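The abstract reports both AUC and AUPR, and the two can differ widely when positives are rare. The following is a minimal sketch of how the two metrics can be computed from labels and scores. It is not the authors' evaluation code: the function names `roc_auc` and `average_precision`, the toy labels and scores are all hypothetical, and average precision is used here as one common step-wise estimate of the area under the precision-recall curve.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Averages the precision measured at the rank of each positive sample.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical toy data: 2 positive pairs among 10 candidates.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print("AUC :", roc_auc(labels, scores))
print("AUPR:", average_precision(labels, scores))
```

In this toy set only two negatives outrank a positive, which barely lowers the AUC but halves the precision at the second positive, so the AUPR is the lower of the two values.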
FIGURE 1. The flowchart of constructing the GCHIRFLDA model.
The AUCs under different lncRNA/disease feature dimensions.
| Dimension | | | | | | |
|---|---|---|---|---|---|---|
| | 0.9576 | 0.9724 | 0.9768 | 0.9782 | 0.9750 | 0.9724 |
| | 0.9492 | 0.9753 | 0.9775 | 0.9809 | 0.9804 | 0.9788 |
| | 0.9577 | 0.9760 | 0.9791 | 0.9833 | 0.9842 | 0.9826 |
| | 0.9561 | 0.9764 | 0.9808 | 0.9872 | 0.9884 | 0.9877 |
| | 0.9539 | 0.9736 | 0.9804 | 0.9874 | | 0.9889 |
| | 0.9109 | 0.9711 | 0.9793 | 0.9880 | 0.9891 | 0.9890 |
FIGURE 2. The ROC Curves of different classifiers in the GCHIRFLDA model.
FIGURE 3. The Precision-Recall Curves of different classifiers in the GCHIRFLDA model.
The performance comparison of different classifiers in the GCHIRFLDA model.
| Classifier | AUC | AUPR | Recall | Accuracy | F1-score |
|---|---|---|---|---|---|
| Xgboost | 0.9815 | 0.4544 | 0.9523 | 0.9182 | 0.9523 |
| RF | | | | | |
| C50 | 0.9513 | 0.1517 | 0.9340 | 0.8724 | 0.9265 |
| GBDT | 0.9497 | 0.2348 | 0.8942 | 0.8701 | 0.9253 |
| SVM | 0.9832 | 0.5826 | 0.9243 | 0.9313 | 0.9595 |
| LightGBM | 0.9832 | 0.5250 | 0.9428 | 0.9215 | 0.9541 |
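The Recall, Accuracy and F1-score columns above are threshold-based metrics derived from a confusion matrix. The sketch below shows the standard definitions on a hypothetical set of binary predictions. The function name `classification_metrics` and the toy labels are assumptions for illustration; the source does not state the decision threshold or which class is treated as positive, so this is not a reproduction of the table.

```python
def classification_metrics(y_true, y_pred):
    """Return recall, accuracy and F1-score for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against empty denominators so the function never divides by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical predictions: 3 true associations, 5 non-associations.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Unlike AUC and AUPR, these values change with the cut-off applied to the classifier's scores, so they are only comparable across classifiers evaluated at the same threshold convention.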
The AUCs and AUPRs of different LDA prediction models.
| Method | AUC | AUPR |
|---|---|---|
| GCHIRFLDA | | |
| GAERF | 0.980 | 0.491 |
| GCNLDA | 0.959 | 0.223 |
| CNNLDA | 0.952 | 0.251 |
| LDAP | 0.863 | 0.166 |
| MFLDA | 0.626 | 0.066 |
| Ping’s Method | 0.871 | 0.219 |
| SIMCLDA | 0.746 | 0.095 |
FIGURE 4. The ROC Curves of different LDA prediction models.
The top 20 colon cancer-related lncRNA candidates predicted by the GCHIRFLDA model.
| lncRNA | Rank | Evidence |
|---|---|---|
| CDKN2B-AS1 | 1 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| PVT1 | 2 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| UCA1 | 3 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| NEAT1 | 4 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| KCNQ1OT1 | 5 | Lnc2Cancer 3.0 |
| XIST | 6 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| GAS5 | 7 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| SPRY4-IT1 | 8 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| MIR17HG | 9 | Literature ( |
| TUG1 | 10 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| BANCR | 11 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| HOTTIP | 12 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| BCYRN1 | 13 | LncRNADisease v2.0 |
| HNF1A-AS1 | 14 | Lnc2Cancer 3.0 |
| AFAP1-AS1 | 15 | Lnc2Cancer 3.0 |
| HULC | 16 | Lnc2Cancer 3.0 |
| TUSC7 | 17 | Lnc2Cancer 3.0 |
| KIRREL3-AS3 | 18 | unknown |
| LSINCT5 | 19 | unknown |
| NPTN-IT1 | 20 | unknown |
The top 20 stomach cancer-related lncRNA candidates predicted by the GCHIRFLDA model.
| lncRNA | Rank | Evidence |
|---|---|---|
| MALAT1 | 1 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| XIST | 2 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| NEAT1 | 3 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| CCAT2 | 4 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| TUG1 | 5 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| KCNQ1OT1 | 6 | Lnc2Cancer 3.0 |
| HOTTIP | 7 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| WT1-AS | 8 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| HNF1A-AS1 | 9 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| HULC | 10 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| MIR17HG | 11 | Literature ( |
| CRNDE | 12 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| NPTN-IT1 | 13 | Lnc2Cancer 3.0 & LncRNADisease v2.0 |
| LINC00675 | 14 | Lnc2Cancer 3.0 |
| KIRREL3-AS3 | 15 | unknown |
| TP53COR1 | 16 | unknown |
| BCYRN1 | 17 | Lnc2Cancer 3.0 |
| HOTAIRM1 | 18 | Lnc2Cancer 3.0 |
| AFAP1-AS1 | 19 | LncRNADisease v2.0 |
| LINC01133 | 20 | Lnc2Cancer 3.0 |
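
The case-study tables list the top-ranked lncRNAs for a disease together with the databases that support each one. The sketch below shows one way such a table could be assembled from predicted scores. The function `rank_candidates`, the placeholder lncRNA names, the scores and the evidence mapping are all hypothetical; the source does not describe how the authors built their lists.

```python
def rank_candidates(scores, evidence, k):
    """Return the k highest-scoring lncRNAs as (name, rank, evidence) rows.

    scores:   dict mapping lncRNA name -> predicted association score
    evidence: dict mapping lncRNA name -> set of supporting database names
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    rows = []
    for rank, (name, _score) in enumerate(ranked, start=1):
        sources = evidence.get(name)
        # Candidates with no supporting record are labeled "unknown".
        label = " & ".join(sorted(sources)) if sources else "unknown"
        rows.append((name, rank, label))
    return rows


# Hypothetical scores for one disease and a hypothetical evidence lookup.
scores = {"lncA": 0.91, "lncB": 0.42, "lncC": 0.77, "lncD": 0.65}
evidence = {
    "lncA": {"Lnc2Cancer 3.0", "LncRNADisease v2.0"},
    "lncD": {"Lnc2Cancer 3.0"},
}

rows = rank_candidates(scores, evidence, k=3)
for name, rank, label in rows:
    print(f"| {name} | {rank} | {label} |")
```

Candidates labeled "unknown" are the ones without a supporting database record, which is how the unverified entries appear in the tables above.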