Ping Xuan, Shuxiang Pan, Tiangang Zhang, Yong Liu, Hao Sun.
Abstract
Aberrant expression of long non-coding RNAs (lncRNAs) is often associated with disease, and identifying disease-related lncRNAs helps to elucidate complex pathogenesis. Recent methods for predicting lncRNA-disease associations integrate pertinent heterogeneous data. However, they fail to deeply integrate the topological information of the heterogeneous network comprising lncRNAs, diseases, and miRNAs. We proposed a novel method based on a graph convolutional network and a convolutional neural network, referred to as GCNLDA, to infer disease-related lncRNA candidates. First, a heterogeneous network containing lncRNA, disease, and miRNA nodes was constructed. The embedding matrix of a lncRNA-disease node pair was then constructed according to various biological premises about lncRNAs, diseases, and miRNAs. A new framework based on a graph convolutional network and a convolutional neural network was developed to learn the network and local representations of the lncRNA-disease pair. On the left side of the framework, an autoencoder based on graph convolution deeply integrated the topological information within the heterogeneous lncRNA-disease-miRNA network. Moreover, because different node features contribute differently to association prediction, an attention mechanism at the node feature level was constructed. The left side thus learnt the network representation of the lncRNA-disease pair. The convolutional neural networks on the right side of the framework learnt the local representation of the pair by focusing on the similarities, associations, and interactions related only to that pair. Compared with several state-of-the-art prediction methods, GCNLDA achieved superior performance. Case studies on stomach cancer, osteosarcoma, and lung cancer confirmed that GCNLDA effectively discovers potential lncRNA-disease associations.
Keywords: attention mechanism at node feature level; convolutional neural network; graph convolutional network; lncRNA-disease association prediction
Year: 2019 PMID: 31480350 PMCID: PMC6769579 DOI: 10.3390/cells8091012
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. Construction and representation of a heterogeneous network with three different node types. (a) The lncRNA network (LncNet) and its adjacency matrix were constructed by calculating the functional similarity of the lncRNAs according to their associated diseases. (b) Calculation of the functional similarity of the miRNAs based on their related diseases and construction of the miRNA network (MirNet) and its adjacency matrix. (c) Establishment of the connexion between LncNet and the disease network (DisNet) based on known lncRNA-disease associations and construction of the adjacency matrix. (d) Connexion of LncNet and MirNet according to known interactions between lncRNAs and miRNAs and construction of the adjacency matrix. (e) Connexion of the miRNAs and diseases according to known miRNA-disease associations and construction of the adjacency matrix C. (f) Computation of the similarities based on the DAGs of the diseases and construction of DisNet and its adjacency matrix. (g) LncNet, DisNet, MirNet, and the connexions among them were used to construct the heterogeneous network LncDisMirNet and its adjacency matrix.
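The block structure described in panels (a) to (g) can be sketched as follows. This is a minimal illustration only, assuming the three similarity matrices and the three association or interaction matrices are already available as nested lists. The function name `build_heterogeneous_adjacency`, the block ordering (lncRNA, disease, miRNA), and all toy values are hypothetical and are not taken from the paper.

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def hstack(*blocks):
    """Concatenate matrices that share a row count, side by side."""
    return [sum((list(b[i]) for b in blocks), []) for i in range(len(blocks[0]))]


def build_heterogeneous_adjacency(lnc_sim, dis_sim, mir_sim,
                                  lnc_dis, lnc_mir, mir_dis):
    """Assemble a symmetric block matrix over lncRNA, disease, and miRNA nodes.

    Assumed block layout (hypothetical ordering):
        [ lnc_sim     lnc_dis    lnc_mir   ]
        [ lnc_dis^T   dis_sim    mir_dis^T ]
        [ lnc_mir^T   mir_dis    mir_sim   ]
    """
    top = hstack(lnc_sim, lnc_dis, lnc_mir)
    mid = hstack(transpose(lnc_dis), dis_sim, transpose(mir_dis))
    bot = hstack(transpose(lnc_mir), mir_dis, mir_sim)
    return top + mid + bot


# Toy data (hypothetical): 2 lncRNAs, 2 diseases, 1 miRNA.
lnc_sim = [[1.0, 0.3], [0.3, 1.0]]   # lncRNA functional similarity
dis_sim = [[1.0, 0.5], [0.5, 1.0]]   # disease similarity
mir_sim = [[1.0]]                    # miRNA functional similarity
lnc_dis = [[1, 0], [0, 1]]           # known lncRNA-disease associations
lnc_mir = [[1], [0]]                 # known lncRNA-miRNA interactions
mir_dis = [[0, 1]]                   # known miRNA-disease associations

adjacency = build_heterogeneous_adjacency(
    lnc_sim, dis_sim, mir_sim, lnc_dis, lnc_mir, mir_dis
)
for row in adjacency:
    print(row)
```

Placing the transposed association blocks below the diagonal keeps the assembled matrix symmetric, which matches an undirected heterogeneous network.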
Figure 2. Overall model structure. (a) Establish the attention mechanism at the feature levels and the autoencoder based on graph convolution. (b) Construct the convolutional and pooling layers.
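Panel (a) relies on graph convolution over the heterogeneous network. As a rough illustration, the sketch below applies one propagation step of the widely used rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). Whether GCNLDA uses exactly this normalization is an assumption; the feature-level attention and the decoder of the autoencoder are omitted, and the function names and toy values are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2), with D the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph-convolution step followed by ReLU."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


# Toy graph (hypothetical): nodes 0 and 1 are linked, node 2 is isolated.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[2.0], [-1.0]]  # maps 2 input features to 1 hidden unit

hidden = gcn_layer(adj, features, weights)
print(hidden)
```

Nodes 0 and 1 end up with the same hidden value because each averages its own features with its neighbour's, which is the sense in which the layer mixes topological information into node representations.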
Figure 3. Construction of the embedding matrix of a lncRNA-disease pair. (a) Construction of the first part of the embedding matrix based on the similarity between and the other lncRNAs and the association between and all lncRNAs. (b) The second part of the embedding matrix was constructed based on the similarity between and the other lncRNA and the association between and the other diseases. (c) Construction of the third part using the lncRNA-miRNA interactions and miRNA-disease associations. (d) Construction of the final embedding matrix by combining the representations of the first, second, and third parts.
Figure 4. Receiver operating characteristic (ROC) and precision-recall (PR) curves of GCNLDA and other methods for all diseases. (a) ROC curves of all the methods; (b) PR curves of all the methods.
Area under the ROC curves (AUCs) of GCNLDA and other methods for all the diseases and 10 well-characterized diseases.
| Disease Name | GCNLDA | SIMCLDA | Ping’s Method | MFLDA | LDAP |
|---|---|---|---|---|---|
| Average AUC on 405 diseases | | 0.746 | 0.871 | 0.626 | 0.863 |
| respiratory system cancer | | 0.789 | 0.911 | 0.719 | 0.891 |
| organ system cancer | | 0.82 | 0.95 | 0.729 | 0.884 |
| intestinal cancer | | 0.811 | 0.909 | 0.559 | 0.905 |
| prostate cancer | | 0.873 | 0.826 | 0.553 | 0.71 |
| lung cancer | | 0.79 | 0.911 | 0.676 | 0.883 |
| breast cancer | | 0.742 | 0.871 | 0.517 | 0.83 |
| reproductive organ cancer | | 0.707 | 0.818 | 0.74 | 0.742 |
| gastrointestinal system cancer | | 0.784 | 0.896 | 0.582 | 0.867 |
| liver cancer | | 0.799 | 0.91 | 0.634 | 0.898 |
| hepatocellular carcinoma | | 0.765 | 0.903 | 0.688 | 0.902 |
The bold values indicate the higher AUCs.
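For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen true association is scored higher than a randomly chosen non-association. The sketch below computes it directly from that pairwise definition; it is a minimal illustration, not the evaluation code of the paper, and the function name and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 1 marks a known association for one disease.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The quadratic loop is kept for clarity; a rank-based formulation gives the same value more efficiently on large candidate lists.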
AUPRs of GCNLDA and other methods for all the diseases and 10 well-characterized diseases.
| Disease Name | GCNLDA | SIMCLDA | Ping’s Method | MFLDA | LDAP |
|---|---|---|---|---|---|
| Average AUPR on 405 diseases | | 0.166 | 0.219 | 0.095 | 0.066 |
| respiratory system cancer | | 0.149 | 0.414 | 0.072 | 0.303 |
| organ system cancer | | 0.411 | 0.765 | 0.338 | 0.628 |
| intestinal cancer | | 0.141 | 0.252 | 0.042 | 0.246 |
| prostate cancer | | 0.176 | 0.333 | 0.095 | 0.297 |
| lung cancer | | 0.138 | 0.334 | 0.008 | 0.094 |
| breast cancer | 0.623 | 0.445 | | 0.476 | 0.629 |
| reproductive organ cancer | | 0.047 | 0.403 | 0.031 | 0.396 |
| gastrointestinal system cancer | | 0.130 | 0.271 | 0.104 | 0.238 |
| liver cancer | | 0.201 | 0.526 | 0.086 | 0.498 |
| hepatocellular carcinoma | | 0.096 | 0.239 | 0.082 | 0.303 |
The bold values indicate the higher AUPRs.
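The area under the precision-recall curve can be estimated in several ways. The sketch below uses average precision, one common step-wise estimator; whether the paper used this estimator or a trapezoidal one is not stated here, so treat the choice as an assumption. The function name and the toy data are hypothetical, and ties between scores are broken by input order.

```python
def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive")
    # Sort candidates from highest to lowest score (stable for ties).
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Toy example (hypothetical): two true associations among six candidates.
labels = [0, 1, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.6, 0.5, 0.2, 0.1]
aupr = average_precision(labels, scores)
print(aupr)
```

Because precision is penalized by every non-association ranked above a true one, this score is typically much lower than AUC when known associations are rare among the candidates.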
A pairwise comparison with a paired Wilcoxon-test on the prediction results.
| SIMCLDA | Ping’s Method | MFLDA | LDAP |
|---|---|---|---|
| 1.131026 × 10^-106 | 1.494908 × 10^-44 | 4.534043 × 10^-124 | 4.291344 × 10^-50 |
| 1.342560 × 10^-89 | 2.204929 × 10^-29 | 1.567472 × 10^-112 | 2.844473 × 10^-48 |
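A paired Wilcoxon signed-rank test of the kind summarized above can be sketched as follows. This is a simplified two-sided version using the normal approximation without tie or continuity corrections, so its p-values will not match an exact implementation; the sidedness used in the paper is not stated here and is an assumption. The function name and the per-disease scores are hypothetical and are not the paper's data.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, two-sided p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_value


# Hypothetical per-disease AUCs for two methods (illustrative only).
method_a = [0.96, 0.94, 0.97, 0.95, 0.90, 0.93, 0.92, 0.94, 0.96, 0.95]
method_b = [0.75, 0.79, 0.82, 0.81, 0.87, 0.79, 0.74, 0.71, 0.78, 0.80]
w_plus, p_value = wilcoxon_signed_rank(method_a, method_b)
print(w_plus, p_value)
```

In practice a library routine such as `scipy.stats.wilcoxon` is preferable, since it handles exact distributions and corrections that this sketch leaves out.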
Figure 5. Average recalls across all tested diseases under different top k cutoffs.
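The quantity plotted in this figure, recall within the top k ranked candidates averaged over diseases, can be computed as in the sketch below. It is a minimal illustration under the assumption that each disease has a list of candidate labels and prediction scores; the function names and toy values are hypothetical.

```python
def recall_at_k(labels, scores, k):
    """Fraction of a disease's true associations found in its top k candidates."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("recall needs at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = sum(y for _, y in ranked[:k])
    return hits / total_pos


def average_recall_at_k(per_disease, k):
    """Mean recall at k over a list of (labels, scores) pairs, one per disease."""
    recalls = [recall_at_k(labels, scores, k) for labels, scores in per_disease]
    return sum(recalls) / len(recalls)


# Toy example (hypothetical): two diseases with four lncRNA candidates each.
per_disease = [
    ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]),  # one of two positives in the top 2
    ([0, 1, 0, 0], [0.2, 0.7, 0.6, 0.1]),  # the only positive is ranked first
]
avg_recall_top2 = average_recall_at_k(per_disease, k=2)
print(avg_recall_top2)
```

A higher value at small k means more of the known associations are concentrated at the top of the ranking, which is what matters when only a few candidates can be validated experimentally.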
The top 15 candidate lncRNAs for stomach cancer, osteosarcoma, and lung cancer.
| Disease Name | Rank | lncRNA | Evidence | Rank | lncRNA | Evidence |
|---|---|---|---|---|---|---|
| Stomach cancer | 1 | MALAT1 | Lnc2Cancer, LncRNADisease | 9 | HULC | Lnc2Cancer, LncRNADisease |
| | 2 | NEAT1 | Lnc2Cancer, LncRNADisease | 10 | CCAT2 | Lnc2Cancer, LncRNADisease |
| | 3 | MIR17HG | Literature [ | 11 | KCNQ1OT1 | Lnc2Cancer |
| | 4 | HOTTIP | Lnc2Cancer, LncRNADisease | 12 | BCYRN1 | LncRNADisease* |
| | 5 | TUG1 | Lnc2Cancer, LncRNADisease | 13 | CASC2 | Lnc2Cancer, LncRNADisease |
| | 6 | HNF1A-AS1 | Lnc2Cancer, LncRNADisease | 14 | PANDAR | Lnc2Cancer, LncRNADisease |
| | 7 | XIST | Lnc2Cancer, LncRNADisease | 15 | PCAT1 | LncRNADisease* |
| | 8 | AFAP1-AS1 | Lnc2Cancer | | | |
| Osteosarcoma | 1 | H19 | Lnc2Cancer, LncRNADisease | 9 | LINC00675 | LncRNADisease* |
| | 2 | GAS5 | Lnc2Cancer | 10 | BCYRN1 | LncRNADisease* |
| | 3 | PVT1 | Lnc2Cancer | 11 | CCAT2 | Lnc2Cancer |
| | 4 | NEAT1 | Lnc2Cancer | 12 | CASC2 | Lnc2Cancer |
| | 5 | EWSAT1 | Lnc2Cancer | 13 | CCAT1 | Lnc2Cancer |
| | 6 | AFAP1-AS1 | Literature [ | 14 | TP73-AS1 | Lnc2Cancer |
| | 7 | CDKN2B-AS1 | LncRNADisease | 15 | PCA3 | LncRNADisease* |
| | 8 | SPRY4-IT1 | Lnc2Cancer | | | |
| Lung cancer | 1 | KCNQ1OT1 | Lnc2Cancer | 9 | IGF2-AS | Lnc2Cancer |
| | 2 | HOTTIP | Lnc2Cancer, LncRNADisease | 10 | PCAT1 | LncRNADisease |
| | 3 | SPRY4-IT1 | Lnc2Cancer, LncRNADisease | 11 | CASC2 | Lnc2Cancer, LncRNADisease |
| | 4 | TP73-AS1 | Lnc2Cancer | 12 | ESRG | LncRNADisease* |
| | 5 | MIAT | Lnc2Cancer | 13 | PCA3 | LncRNADisease* |
| | 6 | MIR155HG | Literature [ | 14 | SNHG12 | Lnc2Cancer |
| | 7 | LINC00675 | LncRNADisease* | 15 | TUSC7 | Lnc2Cancer |
| | 8 | SOX2-OT | LncRNADisease | | | |
“Lnc2Cancer” means the lncRNA candidate was included in the Lnc2Cancer database. “LncRNADisease” means the candidate was included among the experimentally verified data in LncRNADisease. “LncRNADisease*” means the candidate was included among the predicted data in LncRNADisease. “Literature” means the candidate was supported in published studies.