Li Wang, Cheng Zhong.
Abstract
BACKGROUND: Long non-coding RNAs (lncRNAs) are related to human diseases through their regulation of gene expression. Identifying lncRNA-disease associations (LDAs) will contribute to the diagnosis, treatment, and prognosis of diseases. However, identifying LDAs through biological experiments is time-consuming, costly and inefficient. Therefore, developing efficient and accurate computational methods for predicting LDAs is of great significance.
Keywords: Disease similarity based on gene–gene interaction network; Gaussian interaction profile kernel similarity of lncRNAs; Graph attention network; lncRNA-disease association prediction
Year: 2022 PMID: 34983363 PMCID: PMC8729153 DOI: 10.1186/s12859-021-04548-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Performance comparison of prediction methods under the CVP, CVL and CVD settings on Dataset 1. a–b Performance of all methods under the CVP cross-validation setting. c–d Performance of all methods under the CVL cross-validation setting. e–f Performance of all methods under the CVD cross-validation setting
Fig. 2 Performance comparison of prediction methods under the CVP, CVL and CVD settings on Dataset 2. a–b Performance of all methods under the CVP cross-validation setting. c–d Performance of all methods under the CVL cross-validation setting. e–f Performance of all methods under the CVD cross-validation setting
Fig. 3 Performance comparison of prediction methods under the CVP, CVL and CVD settings on Dataset 3. a–b Performance of all methods under the CVP cross-validation setting. c–d Performance of all methods under the CVL cross-validation setting. e–f Performance of all methods under the CVD cross-validation setting
Experiment results of six methods on Dataset1 under CVP setting
| gGATLDA | BiWalkLDA | SIMCLDA | MFLDA | BiGAN | GCRFLDA | |
|---|---|---|---|---|---|---|
| AUC | 0.8435 | 0.7836 | 0.7223 | 0.5246 | 0.8120 | |
| Precision | 0.7538 | 0.6822 | 0.6928 | 0.4972 | 0.7273 | |
| Recall | 0.7968 | 0.7591 | 0.6705 | 0.5025 | 0.7025 | |
| AUPR | 0.8727 | 0.8203 | 0.7895 | 0.5029 | 0.7806 | |
| Accuracy | 0.7768 | 0.6866 | 0.6432 | 0.4992 | 0.7473 | |
| F1-Score | 0.7740 | 0.7087 | 0.6552 | 0.4995 | 0.7127 |
The best results in each row are represented in bold
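As a reminder of how the six reported metrics relate to one another, the following is a minimal sketch that computes precision, recall, accuracy, F1-score and AUC from binary labels and predicted scores. It is not the authors' evaluation code: the function names, the toy data and the 0.5 decision threshold are assumptions made for illustration, and AUPR is left out to keep the example short.

```python
def threshold_metrics(y_true, y_score, threshold=0.5):
    """Precision, recall, accuracy and F1 at a fixed score threshold.

    The 0.5 default is an illustrative assumption, not a value from the paper.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def roc_auc(y_true, y_score):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win. This pairwise form is O(P * N) and is
    meant for small illustrative inputs only.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = known association, 0 = sampled non-association.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.4, 0.1]
metrics = threshold_metrics(labels, scores)
auc = roc_auc(labels, scores)
```

Precision, recall, accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of scores alone, which is why a method can rank well on one group of metrics and poorly on the other.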
Experiment results of six methods on Dataset2 under CVP setting
| gGATLDA | BiWalkLDA | SIMCLDA | MFLDA | BiGAN | GCRFLDA | |
|---|---|---|---|---|---|---|
| AUC | 0.6499 ± 0.0022 | 0.8433 ± 0.0035 | 0.8270 ± 0.0033 | 0.8932 ± 0.0118 | 0.9548 | |
| Precision | 0.4958 ± 0.0040 | 0.6979 ± 0.0114 | 0.9261 ± 0.0368 | 0.8031 ± 0.0129 | 0.8840 | |
| Recall | 0.8466 ± 0.0264 | 0.8997 ± 0.0103 | 0.5905 ± 0.0646 | 0.7990 ± 0.0443 | 0.8689 | |
| AUPR | 0.7419 ± 0.0036 | 0.8824 ± 0.0053 | 0.8720 ± 0.0027 | 0.8857 ± 0.0200 | 0.9512 | |
| Accuracy | 0.4930 ± 0.0065 | 0.7549 ± 0.0080 | 0.7698 ± 0.0166 | 0.8016 ± 0.0214 | 0.8859 | |
| F1-Score | 0.6253 ± 0.0104 | 0.7859 ± 0.0041 | 0.7174 ± 0.0358 | 0.8005 ± 0.0261 | 0.8755 |
The best results in each row are represented in bold
Experiment results of six methods on Dataset3 under CVP setting
| gGATLDA | BiWalkLDA | SIMCLDA | MFLDA | BiGAN | GCRFLDA | |
|---|---|---|---|---|---|---|
| AUC | 0.8185 ± 0.0024 | 0.8465 ± 0.0030 | 0.8478 ± 0.0048 | 0.9045 ± 0.0185 | 0.9583 | |
| Precision | 0.7980 ± 0.0367 | 0.6370 ± 0.0033 | 0.7247 ± 0.0142 | 0.8667 ± 0.1310 | 0.6572 ± 0.0073 | |
| Recall | 0.7297 ± 0.0121 | 0.8475 ± 0.0162 | 0.6942 ± 0.1089 | 0.9495 ± 0.0132 | 0.8632 | |
| AUPR | 0.8416 ± 0.0031 | 0.8450 ± 0.0053 | 0.8860 ± 0.0032 | 0.9058 ± 0.0192 | 0.9548 | |
| Accuracy | 0.8670 ± 0.0271 | 0.6568 ± 0.0032 | 0.7623 ± 0.0065 | 0.7652 ± 0.0867 | 0.7270 ± 0.0088 | |
| F1-Score | 0.6801 ± 0.0049 | 0.7810 ± 0.0022 | 0.7523 ± 0.0324 | 0.7767 ± 0.0068 | 0.8817 |
The best results in each row are represented in bold
Fig. 4 Performance comparison of prediction methods using different disease similarities. a–b ROC and PR curves of the prediction methods using different disease similarities on Dataset1. c–d ROC and PR curves of the prediction methods using different disease similarities on Dataset2
Influence of different hops on the prediction model
| Metric | Dataset1, hop = 1 | Dataset1, hop = 2 | Dataset1, hop = 3 | Dataset2, hop = 1 | Dataset2, hop = 2 | Dataset2, hop = 3 |
|---|---|---|---|---|---|---|
| AUC | 0.948 | 0.943 | 0.945 | 0.986 | 0.982 | 0.953 |
| Precision | 0.731 | 0.794 | 0.754 | 0.658 | 0.698 | 0.730 |
| Recall | 0.965 | 0.900 | 0.926 | 0.999 | 0.995 | 0.988 |
| AUPR | 0.953 | 0.948 | 0.951 | 0.985 | 0.983 | 0.950 |
| Accuracy | 0.799 | 0.825 | 0.800 | 0.732 | 0.777 | 0.802 |
| F1-Score | 0.830 | 0.838 | 0.824 | 0.791 | 0.819 | 0.837 |
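The table does not define "hop". Assuming it denotes the radius of the neighborhood gathered around a target lncRNA-disease pair in the bipartite association graph, the sketch below shows how the node set grows with the hop count. The graph, node labels and the `h_hop_nodes` helper are hypothetical and serve only to illustrate the idea; they are not taken from gGATLDA.

```python
from collections import deque


def h_hop_nodes(adj, seeds, hops):
    """Return every node within `hops` edges of any seed node (multi-source BFS)."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand beyond the requested radius
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


# Hypothetical bipartite graph: lncRNA nodes on one side, disease nodes on the other.
edges = [
    ("lnc:A", "dis:X"), ("lnc:B", "dis:X"), ("lnc:B", "dis:Y"),
    ("lnc:C", "dis:Y"), ("lnc:C", "dis:Z"), ("lnc:D", "dis:Z"),
]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

target_pair = ["lnc:A", "dis:X"]
hop1 = h_hop_nodes(adj, target_pair, 1)
hop2 = h_hop_nodes(adj, target_pair, 2)
hop3 = h_hop_nodes(adj, target_pair, 3)
```

Under this reading, a larger hop count supplies more context per pair at a higher computational cost. In the table above, AUC and AUPR are highest at hop = 1 on both datasets, while precision and accuracy peak at hop = 2 or hop = 3.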
Fig. 5 Hyper-parameter optimization results for F1-score, accuracy, recall, AUC and AUPR. a Results for different numbers of epochs. b Results for different batch sizes. c Results for different learning rates
Top 15 predicted lncRNAs associated with breast cancer
| Rank (gene–gene interaction network similarity) | lncRNA | Evidence | Rank (disease semantic similarity) | lncRNA | Evidence |
|---|---|---|---|---|---|
| 1 | KCNQ1OT1 | Lnc2Cancer 3.0 | 1 | TRAF3IP2-AS1 | PMID: 30157476 |
| 2 | UCA1 | Lnc2Cancer 3.0 | 2 | DLX6-AS1 | Lnc2Cancer 3.0 |
| 3 | MIAT | Lnc2Cancer 3.0 | 3 | MINA | PMID: 30254753 |
| 4 | MINA | PMID: 30254753 | 4 | KCNQ1OT1 | Lnc2Cancer 3.0 |
| 5 | NPTN-IT1 | Lnc2Cancer 3.0 | 5 | NEAT1 | Lnc2Cancer 3.0 |
| 6 | LincRNA-p21 | Lnc2Cancer 3.0 | 6 | LincRNA-p21 | Lnc2Cancer 3.0 |
| 7 | IGF2-AS | PMID: 33175607 | 7 | UCA1 | Lnc2Cancer 3.0 |
| 8 | DRAIC | LncRNADisease v2.0 | 8 | SOX2-OT | Lnc2Cancer 3.0 |
| 9 | NEAT1 | Lnc2Cancer 3.0 | 9 | NPTN-IT1 | Lnc2Cancer 3.0 |
| 10 | PCAT29 | PMID: 32521844 | 10 | HULC | Lnc2Cancer 3.0 |
| 11 | HULC | Lnc2Cancer 3.0 | 11 | CRNDE | LncRNADisease v2.0 |
| 12 | CCND1 | LncRNADisease v2.0 | 12 | TUSC7 | Lnc2Cancer 3.0 |
| 13 | SPRY4-IT1 | Lnc2Cancer 3.0 | 13 | 7SK | unconfirmed |
| 14 | SOX2-OT | Lnc2Cancer 3.0 | 14 | WT1-AS | LncRNADisease v2.0 |
| 15 | TUSC7 | Lnc2Cancer 3.0 | 15 | ESCCAL-1 | unconfirmed |
Top 15 predicted lncRNAs associated with gastric cancer
| Rank (gene–gene interaction network similarity) | lncRNA | Evidence | Rank (disease semantic similarity) | lncRNA | Evidence |
|---|---|---|---|---|---|
| 1 | KCNQ1OT1 | Lnc2Cancer 3.0 | 1 | TRAF3IP2-AS1 | PMID: 25370763 |
| 2 | SOX2-OT | Lnc2Cancer 3.0 | 2 | SOX2-OT | Lnc2Cancer 3.0 |
| 3 | LincRNA-p21 | Lnc2Cancer 3.0 | 3 | DLX6-AS1 | Lnc2Cancer 3.0 |
| 4 | XIST | LncRNADisease v2.0 | 4 | NEAT1 | Lnc2Cancer 3.0 |
| 5 | NPTN-IT1 | Lnc2Cancer 3.0 | 5 | MALAT1 | Lnc2Cancer 3.0 |
| 6 | MIAT | Lnc2Cancer 3.0 | 6 | GAS5 | Lnc2Cancer 3.0 |
| 7 | DRAIC | Lnc2Cancer 3.0 | 7 | XIST | LncRNADisease v2.0 |
| 8 | MALAT1 | Lnc2Cancer 3.0 | 8 | LincRNA-p21 | Lnc2Cancer 3.0 |
| 9 | HULC | Lnc2Cancer 3.0 | 9 | KCNQ1OT1 | Lnc2Cancer 3.0 |
| 10 | IGF2-AS | PMID: 31183590 | 10 | NPTN-IT1 | Lnc2Cancer 3.0 |
| 11 | NEAT1 | Lnc2Cancer 3.0 | 11 | HULC | Lnc2Cancer 3.0 |
| 12 | PCAT29 | LncRNADisease v2.0 | 12 | TUG1 | Lnc2Cancer 3.0 |
| 13 | AIR | Lnc2Cancer 3.0 | 13 | MIAT | Lnc2Cancer 3.0 |
| 14 | GAS5 | Lnc2Cancer 3.0 | 14 | DRAIC | Lnc2Cancer 3.0 |
| 15 | TUG1 | Lnc2Cancer 3.0 | 15 | SRA1 | unconfirmed |
Top 15 predicted lncRNAs associated with prostate cancer
| Rank (gene–gene interaction network similarity) | lncRNA | Evidence | Rank (disease semantic similarity) | lncRNA | Evidence |
|---|---|---|---|---|---|
| 1 | H19 | LncRNADisease v2.0 | 1 | TRAF3IP2-AS1 | unconfirmed |
| 2 | MALAT1 | Lnc2Cancer 3.0 | 2 | DLX6-AS1 | PMID: 33035382 |
| 3 | TRAF3IP2-AS1 | unconfirmed | 3 | SNHG11 | Lnc2Cancer 3.0 |
| 4 | PVT1 | Lnc2Cancer 3.0 | 4 | H19 | LncRNADisease v2.0 |
| 5 | MEG3 | Lnc2Cancer 3.0 | 5 | IGF2-AS | Lnc2Cancer 3.0 |
| 6 | XIST | Lnc2Cancer 3.0 | 6 | TERC | LncRNADisease v2.0 |
| 7 | CDKN2B-AS1 | LncRNADisease v2.0 | 7 | GAS5 | Lnc2Cancer 3.0 |
| 8 | UCA1 | Lnc2Cancer 3.0 | 8 | MALAT1 | Lnc2Cancer 3.0 |
| 9 | KCNQ1OT1 | Lnc2Cancer 3.0 | 9 | C1QTNF9B-AS1 | Lnc2Cancer 3.0 |
| 10 | GAS5 | Lnc2Cancer 3.0 | 10 | MEG3 | Lnc2Cancer 3.0 |
| 11 | IGF2-AS | Lnc2Cancer 3.0 | 11 | XIST | Lnc2Cancer 3.0 |
| 12 | HOTAIR | Lnc2Cancer 3.0 | 12 | PVT1 | Lnc2Cancer 3.0 |
| 13 | TUG1 | Lnc2Cancer 3.0 | 13 | HOTAIR | Lnc2Cancer 3.0 |
| 14 | TERC | LncRNADisease v2.0 | 14 | KCNQ1OT1 | Lnc2Cancer 3.0 |
| 15 | CTBP1-AS | Lnc2Cancer 3.0 | 15 | CDKN2B-AS1 | LncRNADisease v2.0 |
Top 15 predicted lncRNAs associated with renal carcinoma
| Rank (gene–gene interaction network similarity) | lncRNA | Evidence | Rank (disease semantic similarity) | lncRNA | Evidence |
|---|---|---|---|---|---|
| 1 | TRAF3IP2-AS1 | PMID: 33741027 | 1 | TRAF3IP2-AS1 | PMID: 33741027 |
| 2 | H19 | LncRNADisease v2.0 | 2 | DLX6-AS1 | Lnc2Cancer 3.0 |
| 3 | XIST | Lnc2Cancer 3.0 | 3 | SNHG11 | PMID: 32126023 |
| 4 | CDKN2B-AS1 | Lnc2Cancer 3.0 | 4 | H19 | LncRNADisease v2.0 |
| 5 | MALAT1 | Lnc2Cancer 3.0 | 5 | MALAT1 | Lnc2Cancer 3.0 |
| 6 | MIAT | PMID: 30041179 | 6 | CDKN2B-AS1 | Lnc2Cancer 3.0 |
| 7 | UCA1 | Lnc2Cancer 3.0 | 7 | XIST | Lnc2Cancer 3.0 |
| 8 | DRAIC | LncRNADisease v2.0 | 8 | MIAT | PMID: 30041179 |
| 9 | MIR17HG | PMID: 24511118 | 9 | GAS5 | Lnc2Cancer 3.0 |
| 10 | MEG3 | Lnc2Cancer 3.0 | 10 | MEG3 | Lnc2Cancer 3.0 |
| 11 | KCNQ1OT1 | LncRNADisease v2.0 | 11 | NEAT1 | Lnc2Cancer 3.0 |
| 12 | NEAT1 | Lnc2Cancer 3.0 | 12 | KCNQ1OT1 | LncRNADisease v2.0 |
| 13 | TUG1 | Lnc2Cancer 3.0 | 13 | UCA1 | Lnc2Cancer 3.0 |
| 14 | PCAT29 | LncRNADisease v2.0 | 14 | LSINCT5 | unconfirmed |
| 15 | MINA | unconfirmed | 15 | MIR17HG | PMID: 24511118 |
Fig. 6 Venn diagrams of the two datasets
Three benchmark datasets
| Datasets | lncRNAs | Diseases | Associations |
|---|---|---|---|
| Dataset1 | 285 | 226 | 621 |
| Dataset2 | 240 | 412 | 2697 |
| Dataset3 | 443 | 608 | 3207 |
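To put the dataset sizes in perspective, the short calculation below derives the density of each lncRNA-disease association matrix, that is, the fraction of all possible lncRNA-disease pairs that are known associations. The counts come from the table above; the `density` helper and variable names are illustrative.

```python
# (lncRNAs, diseases, known associations) as listed in the table above.
datasets = {
    "Dataset1": (285, 226, 621),
    "Dataset2": (240, 412, 2697),
    "Dataset3": (443, 608, 3207),
}


def density(n_lncrnas, n_diseases, n_associations):
    """Fraction of lncRNA-disease pairs that are known associations."""
    return n_associations / (n_lncrnas * n_diseases)


densities = {name: density(*counts) for name, counts in datasets.items()}

for name, value in densities.items():
    print(f"{name}: {value:.4%} of pairs are known associations")
```

Known associations cover roughly 1 to 3 percent of all pairs in each dataset, so the association matrices are very sparse.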
Fig. 7 Procedure of the gGATLDA method