Ping Xuan¹, Nan Sheng¹, Tiangang Zhang², Yong Liu¹, Yahong Guo³.
Abstract
It is well known that aberrant expression of long non-coding RNAs (lncRNAs) is closely related to the physiological and pathological processes of diseases. Therefore, inferring potential lncRNA-disease associations helps us understand the molecular pathogenesis of diseases. Most previous methods have concentrated on constructing shallow learning models to predict lncRNA-disease associations, but they failed to deeply integrate heterogeneous multi-source data and to learn low-dimensional feature representations from these data. We propose a method based on a convolutional neural network with an attention mechanism and a convolutional autoencoder for predicting candidate disease-related lncRNAs, and refer to it as CNNDLP. CNNDLP integrates multiple kinds of data from heterogeneous sources, including the associations, interactions, and similarities related to lncRNAs, diseases, and miRNAs. Two different embedding layers are established by combining diverse biological premises about the conditions under which lncRNAs are likely to be associated with diseases. We construct a novel prediction model based on the convolutional neural network with an attention mechanism and a convolutional autoencoder to learn the attention representations and the low-dimensional network representations of lncRNA-disease pairs from the embedding layers. Different adjacent edges among the lncRNA, miRNA, and disease nodes contribute differently to association prediction. Hence, an attention mechanism at the adjacent edge level is established, and the left side of the model learns the attention representation of a lncRNA-disease pair. A new type of lncRNA similarity and a new type of disease similarity are calculated by incorporating the topological structures of multiple bipartite networks.
The low-dimensional network representation of a lncRNA-disease pair is further learned by the autoencoder-based convolutional neural network on the right side of the model. Cross-validation results confirm that CNNDLP has superior prediction performance compared to state-of-the-art methods. Case studies on stomach cancer, breast cancer, and prostate cancer further demonstrate the ability of CNNDLP to discover potential disease-related lncRNAs.
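The abstract describes attention at the adjacent-edge level without giving its formulas. The sketch below shows one common additive form of attention: tanh scoring followed by softmax normalization. It assumes that each adjacent edge of a lncRNA-disease pair has already been summarized as a feature vector. The function name `edge_attention` and the scoring parameters `w` and `b` are hypothetical, and CNNDLP's actual attention layer may be parameterized differently.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def edge_attention(edges: List[Vector], w: Vector, b: float = 0.0) -> Tuple[List[float], Vector]:
    """Weight each adjacent-edge feature vector by a relevance score (hypothetical layer).

    edges: one feature vector per adjacent edge (e.g. one row of an embedding matrix).
    w, b:  scoring parameters; in a trained model they would be learned.
    Returns (attention weights, attention-weighted sum of the edge vectors).
    """
    # One scalar relevance per edge: tanh(w . e_k + b)
    scores = [math.tanh(sum(wi * xi for wi, xi in zip(w, e)) + b) for e in edges]
    alphas = softmax(scores)
    dim = len(edges[0])
    # Weighted sum: edges with larger attention weights contribute more.
    pooled = [sum(a * e[d] for a, e in zip(alphas, edges)) for d in range(dim)]
    return alphas, pooled


if __name__ == "__main__":
    edges = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.0, 0.0, 0.0]]
    alphas, pooled = edge_attention(edges, w=[0.8, -0.3, 0.5])
    print([round(a, 3) for a in alphas], [round(p, 3) for p in pooled])
```

Edges whose vectors score higher under `w` receive larger weights, so they dominate the pooled vector. In a full model, `w` and `b` would be trained together with the convolutional layers rather than fixed by hand.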
Keywords: attention at adjacent edge level; convolutional neural networks; feature learning based on convolutional autoencoder; lncRNA-disease association prediction; similarity calculation based on multiple bipartite networks
Year: 2019 PMID: 31480319 PMCID: PMC6747450 DOI: 10.3390/ijms20174260
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. ROC curves and PR curves of CNNDLP and other methods for all diseases.
AUCs of CNNDLP and other methods on all the diseases and 10 well-characterized diseases.
| Disease Name | CNNDLP | Ping’s Method | AUC | SIMCLDA | MFLDA | CNNLDA |
|---|---|---|---|---|---|---|
| Prostate cancer | | 0.826 | 0.710 | 0.874 | 0.553 | 0.897 |
| Stomach cancer | 0.947 | 0.930 | 0.928 | 0.864 | 0.467 | |
| Lung cancer | | 0.911 | 0.882 | 0.790 | 0.676 | 0.940 |
| Breast cancer | | 0.872 | 0.830 | 0.742 | 0.517 | 0.836 |
| Reproductive organ cancer | | 0.818 | 0.742 | 0.707 | 0.740 | 0.922 |
| Ovarian cancer | | 0.913 | 0.857 | 0.786 | 0.558 | 0.942 |
| Hematologic cancer | | 0.908 | 0.903 | 0.828 | 0.716 | 0.934 |
| Kidney cancer | | 0.979 | 0.977 | 0.728 | 0.677 | 0.956 |
| Liver cancer | | 0.910 | 0.898 | 0.799 | 0.634 | 0.918 |
| Thoracic cancer | | 0.860 | 0.792 | 0.792 | 0.649 | 0.890 |
| Average AUC of 405 diseases | | 0.870 | 0.745 | 0.745 | 0.626 | 0.952 |
The bold values indicate the highest AUCs.
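The AUC in the table is the area under the ROC curve. It equals the probability that a randomly chosen known association is scored above a randomly chosen unlabeled pair, with ties counting as half. The minimal sketch below computes it in that pairwise form. It assumes, without confirmation from the source, that the "Average AUC of 405 diseases" row is an unweighted mean of per-disease AUCs. The function names are illustrative.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels: 1 = known association, 0 = unlabeled/negative. Ties count as 0.5.
    Pairwise O(P * N) version, intended for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_auc(per_disease: Dict[str, Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted mean of per-disease AUCs (an assumed averaging scheme)."""
    return mean(auc(labels, scores) for labels, scores in per_disease.values())
```

For example, `auc([1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.1])` counts 5 winning pairs out of 6, giving 5/6.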
AUPRs of CNNDLP and other methods on all the diseases and 10 well-characterized diseases.
| Disease Name | CNNDLP | Ping’s Method | AUPR | SIMCLDA | MFLDA | CNNLDA |
|---|---|---|---|---|---|---|
| Prostate cancer | | 0.333 | 0.297 | 0.176 | 0.092 | 0.390 |
| Stomach cancer | | 0.364 | 0.094 | 0.138 | 0.008 | 0.286 |
| Lung cancer | | 0.437 | 0.363 | 0.131 | 0.171 | 0.058 |
| Breast cancer | 0.485 | 0.403 | 0.396 | 0.047 | 0.031 | |
| Reproductive organ cancer | | 0.281 | 0.240 | 0.130 | 0.103 | 0.091 |
| Ovarian cancer | | 0.483 | 0.427 | 0.027 | 0.023 | 0.526 |
| Hematologic cancer | | 0.403 | 0.370 | 0.216 | 0.121 | 0.523 |
| Kidney cancer | 0.569 | | 0.462 | 0.030 | 0.034 | 0.584 |
| Liver cancer | 0.630 | 0.498 | 0.511 | 0.140 | 0.110 | |
| Thoracic cancer | | 0.383 | 0.364 | 0.155 | 0.102 | 0.890 |
| Average AUPR of 405 diseases | | 0.152 | 0.127 | 0.059 | 0.039 | 0.251 |
The bold values indicate the highest AUPRs.
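AUPR is the area under the precision-recall curve. It is sensitive to the fraction of positives, so it is usually far below the AUC when known associations are sparse. The sketch below uses average precision, which is one common step-wise estimator of AUPR: the mean of the precision values measured at the rank of each positive. The source does not state which interpolation CNNDLP's evaluation used, so treat this as an assumption. The function name is illustrative.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPR estimate: mean precision at the rank of each positive.

    Ties in score are broken by input order (a simplification).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive labels")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank  # precision at this cutoff
    return ap / total_pos
```

With labels `[1, 0, 1, 0, 0]` and scores `[0.9, 0.8, 0.7, 0.3, 0.1]`, the positives sit at ranks 1 and 3. The precisions there are 1 and 2/3, so the average precision is 5/6.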
Figure 2. Recall values of the top k candidates for CNNDLP and four other methods.
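Recall at top k is the fraction of all held-out positives that appear among the k highest-scoring candidates. The minimal sketch below computes it for a list of cutoffs. It assumes the curve in Figure 2 is computed per disease in this standard way; the function names are illustrative.

```python
from typing import List, Sequence


def recall_at_k(labels: Sequence[int], scores: Sequence[float], k: int) -> float:
    """Fraction of all positives ranked within the top k (ties keep input order)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive labels")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:k]) / total_pos


def recall_curve(labels: Sequence[int], scores: Sequence[float], ks: Sequence[int]) -> List[float]:
    """Recall at each cutoff in ks, e.g. the points of a top-k recall plot."""
    return [recall_at_k(labels, scores, k) for k in ks]
```

For labels `[1, 0, 1, 0, 0]` and scores `[0.9, 0.8, 0.7, 0.3, 0.1]`, the recalls at k = 1, 2, 3 are 0.5, 0.5, and 1.0.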
Comparison of different methods based on AUCs with the paired Wilcoxon test.
| SIMCLDA | Ping’s Method | MFLDA | LDAP | CNNLDA |
|---|---|---|---|---|
| 9.2454 × 10⁻⁶ | 0.00048 | 5.9940 × 10⁻⁷ | 0.00121 | 0.00773 |
| 8.3473 × 10⁻⁷ | 0.04174 | 3.5037 × 10⁻⁸ | 0.00126 | 0.00024 |
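The caption names the paired Wilcoxon test but not its exact inputs. The hedged sketch below assumes that a baseline's per-disease scores are paired with CNNDLP's scores on the same diseases. It computes the signed-rank statistic and a two-sided p-value from the normal approximation, using only the standard library. Zero differences are dropped and tied magnitudes get average ranks. It applies no tie or continuity correction, which is a simplification. The function name is illustrative; for real analyses, `scipy.stats.wilcoxon` is the usual implementation.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided paired Wilcoxon signed-rank test, normal approximation.

    Returns (W, p) with W = min(W+, W-).
    """
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    w = min(w_plus, w_minus)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p
```

With roughly 400 paired diseases, the normal approximation is adequate. This is consistent with the very small p-values in the table, although the sketch cannot reproduce them without the per-disease scores.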
The top 15 stomach cancer-related candidate lncRNAs.
| Rank | lncRNA Name | Description | Rank | lncRNA Name | Description |
|---|---|---|---|---|---|
| 1 | SPRY4-IT1 | Lnc2Cancer, LncRNADisease | 9 | CDKN2B-AS1 | LncRNADisease |
| 2 | TINCR | Lnc2Cancer, LncRNADisease | 10 | CCAT1 | Lnc2Cancer, LncRNADisease |
| 3 | H19 | Lnc2Cancer, LncRNADisease | 11 | HOTAIR | Lnc2Cancer, LncRNADisease |
| 4 | TUSC7 | Lnc2Cancer, LncRNADisease | 12 | GACAT2 | LncRNADisease |
| 5 | BANCR | Lnc2Cancer, LncRNADisease | 13 | UCA1 | Lnc2Cancer, LncRNADisease |
| 6 | MEG3 | Lnc2Cancer, LncRNADisease | 14 | PVT1 | Lnc2Cancer, LncRNADisease |
| 7 | GAS5 | Lnc2Cancer, LncRNADisease | 15 | MEG8 | literature |
| 8 | GHET1 | Lnc2Cancer, LncRNADisease |
The top 15 breast cancer-related candidate lncRNAs.
| Rank | lncRNA Name | Description | Rank | lncRNA Name | Description |
|---|---|---|---|---|---|
| 1 | SOX2-OT | Lnc2Cancer, LncRNADisease | 9 | CCAT1 | Lnc2Cancer, LncRNADisease |
| 2 | HOTAIR | Lnc2Cancer, LncRNADisease | 10 | GAS5 | Lnc2Cancer, LncRNADisease |
| 3 | LINC00472 | Lnc2Cancer, LncRNADisease | 11 | MIR124-2HG | literature |
| 4 | BCYRN1 | LncRNADisease | 12 | XIST | Lnc2Cancer, LncRNADisease |
| 5 | LINC-PINT | literature | 13 | LINC-ROR | Lnc2Cancer, LncRNADisease |
| 6 | MALAT1 | Lnc2Cancer, LncRNADisease | 14 | PANDAR | Lnc2Cancer, LncRNADisease |
| 7 | CDKN2B-AS1 | LncRNADisease | 15 | AFAP1-AS1 | Lnc2Cancer |
| 8 | SPRY4-IT1 | Lnc2Cancer, LncRNADisease |
The top 15 prostate cancer-related candidate lncRNAs.
| Rank | lncRNA Name | Description | Rank | lncRNA Name | Description |
|---|---|---|---|---|---|
| 1 | CDKN2B-AS1 | LncRNADisease | 9 | HOTAIR | Lnc2Cancer, LncRNADisease |
| 2 | PCGEM1 | Lnc2Cancer, LncRNADisease | 10 | LINC00963 | Lnc2Cancer, LncRNADisease |
| 3 | PVT1 | Lnc2Cancer, LncRNADisease | 11 | H19 | Lnc2Cancer, LncRNADisease |
| 4 | GAS5 | Lnc2Cancer, LncRNADisease | 12 | MEG3 | Lnc2Cancer, LncRNADisease |
| 5 | HOTTIP | Lnc2Cancer, LncRNADisease | 13 | TUG1 | Lnc2Cancer, LncRNADisease |
| 6 | NEAT1 | Lnc2Cancer, LncRNADisease | 14 | PCA3 | Lnc2Cancer, LncRNADisease |
| 7 | PCAT5 | Lnc2Cancer | 15 | DANCR | Lnc2Cancer, LncRNADisease |
| 8 | PRINS | Lnc2Cancer, LncRNADisease |
Figure 3. Construction and representation of multiple bipartite graphs. (a) Construct a lncRNA-disease association bipartite graph from the known associations between lncRNAs and diseases, and its matrix representation. (b) Construct a lncRNA-miRNA interaction bipartite graph from the known lncRNA-miRNA interactions, and its matrix representation. (c) Construct a miRNA-disease association bipartite graph from the known miRNA-disease associations, and its matrix representation. (d) Calculate the lncRNA similarity and construct its matrix representation. (e) Calculate the disease similarity and construct its matrix representation.
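Each bipartite graph in the caption is stored as a binary matrix. The entry for a row node and a column node is 1 when an edge between them is known and 0 otherwise. The minimal sketch below builds such a matrix from a list of edges. The node names in the usage line are toy examples, not the paper's dataset, and the function names are illustrative.

```python
from typing import Iterable, List, Tuple


def association_matrix(pairs: Iterable[Tuple[str, str]],
                       rows: List[str], cols: List[str]) -> List[List[int]]:
    """Binary adjacency matrix of a bipartite graph: M[i][j] = 1 if rows[i]--cols[j] is known."""
    r_index = {name: i for i, name in enumerate(rows)}
    c_index = {name: j for j, name in enumerate(cols)}
    m = [[0] * len(cols) for _ in rows]
    for r, c in pairs:
        m[r_index[r]][c_index[c]] = 1
    return m


def transpose(m: List[List[int]]) -> List[List[int]]:
    """View the same bipartite graph from the column side (e.g. disease x lncRNA)."""
    return [list(col) for col in zip(*m)]
```

For example, `association_matrix([("HOTAIR", "breast cancer"), ("H19", "stomach cancer")], ["H19", "HOTAIR"], ["breast cancer", "stomach cancer"])` returns `[[0, 1], [1, 0]]`. The lncRNA-miRNA and miRNA-disease graphs are built the same way with different row and column lists.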
Figure 4. Construction of the left embedding layer matrix of a lncRNA-disease pair. (a) Construct the first part of the matrix by exploiting the lncRNA similarities and the lncRNA-disease associations. (b) Construct the second part of the matrix by integrating the lncRNA-disease associations and the disease similarities. (c) Construct the third part of the matrix by incorporating the lncRNA-miRNA interactions and the miRNA-disease associations. (d) Concatenate the three parts of the matrix.
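The caption lists which data sources feed each part of the embedding but not the exact layout. The sketch below is a hypothetical reading of steps (a) to (d) for the pair (lncRNA i, disease j). Each part joins one row or column from each of two matrices, and the three parts are flattened into a single vector. Whether the paper stacks the parts as matrix rows instead is not stated here. All names, such as `pair_embedding`, `lm`, and `md`, are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def column(m: Matrix, j: int) -> Vector:
    """Column j of a row-major matrix."""
    return [row[j] for row in m]


def pair_embedding(i: int, j: int, sim_l: Matrix, a: Matrix, sim_d: Matrix,
                   lm: Matrix, md: Matrix) -> Tuple[List[Vector], Vector]:
    """Hypothetical three-part embedding for the pair (lncRNA i, disease j).

    sim_l: lncRNA similarity (n_l x n_l)    a:  lncRNA-disease associations (n_l x n_d)
    sim_d: disease similarity (n_d x n_d)   lm: lncRNA-miRNA interactions (n_l x n_m)
    md:    miRNA-disease associations (n_m x n_d)
    """
    part1 = sim_l[i] + a[i]          # (a) lncRNA i's similarities and its disease associations
    part2 = column(a, j) + sim_d[j]  # (b) disease j's lncRNA associations and its similarities
    part3 = lm[i] + column(md, j)    # (c) lncRNA i's miRNA partners and disease j's miRNAs
    flat = part1 + part2 + part3     # (d) concatenation, flattened here for simplicity
    return [part1, part2, part3], flat
```

Under cross-validation, the held-out entry `a[i][j]` would have to be set to 0 before the embedding is built. Otherwise the label being predicted leaks into the model input.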
Figure 5. Calculation of the first type of lncRNA similarity and the first type of disease similarity. (a) The lncRNA-disease association bipartite network. (b) Calculate the lncRNA similarities based on their common associated disease nodes. (c) Compute the disease similarities based on their common associated lncRNA nodes. (d) Calculate the lncRNA similarities according to their associated similar disease nodes. (e) Calculate the disease similarities according to their associated similar lncRNA nodes.
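The caption does not give formulas, so the sketch below uses two standard stand-ins, named plainly. Neither is confirmed to be CNNDLP's own definition. For panels (b) and (c), it uses Jaccard similarity of the shared neighbour sets. For panels (d) and (e), it uses a best-match-average rule of the kind common in lncRNA functional-similarity methods: each neighbour of one node is matched to its most similar neighbour of the other, and the matches are averaged. The helper names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of common neighbours, e.g. diseases associated with both lncRNAs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_average(a: Set[str], b: Set[str], sim: Callable[[str, str], float]) -> float:
    """Similarity of two neighbour sets through each member's most similar counterpart."""
    if not a or not b:
        return 0.0
    s_ab = sum(max(sim(x, y) for y in b) for x in a)
    s_ba = sum(max(sim(y, x) for x in a) for y in b)
    return (s_ab + s_ba) / (len(a) + len(b))


def make_sim(table: Dict[FrozenSet[str], float]) -> Callable[[str, str], float]:
    """Symmetric lookup: identical nodes score 1, unknown pairs score 0."""
    def sim(x: str, y: str) -> float:
        return 1.0 if x == y else table.get(frozenset((x, y)), 0.0)
    return sim
```

Suppose two lncRNAs are linked to the disease sets {d1, d2} and {d2, d3}. Their Jaccard similarity is 1/3. The best-match average also rewards d1 and d3 being similar to each other. Swapping the roles of lncRNAs and diseases gives the disease-side versions in panels (c) and (e).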
Figure 6. Construction of the prediction model based on the convolutional neural network and convolutional autoencoder for learning the attention representation and the low-dimensional network representation.
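Both branches of the model are built from convolutional layers applied to an embedding matrix. The minimal sketch below shows the core forward operations in plain Python: a "valid" 2-D convolution (implemented as cross-correlation, as deep-learning libraries do) with a ReLU, followed by non-overlapping max pooling. Kernel sizes, filter counts, and the decoder half of the autoencoder are not given in the source and are omitted. The function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, k: Matrix, bias: float = 0.0) -> Matrix:
    """'Valid' 2-D cross-correlation of x with kernel k, followed by ReLU."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    out: Matrix = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            s = bias + sum(k[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def max_pool(x: Matrix, size: int) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [max(x[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, len(x[0]) - size + 1, size)]
        for r in range(0, len(x) - size + 1, size)
    ]
```

Applied to a 3 × 4 input with a 2 × 2 diagonal kernel, the convolution yields a 2 × 3 feature map, and 2 × 2 pooling reduces it to a single value. An autoencoder branch would pair encoder layers like these with a decoder trained to reconstruct the input; the encoder output then serves as the low-dimensional representation.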