Ping Xuan, Yangkun Cao, Tiangang Zhang, Rui Kong, Zhaogong Zhang.
Abstract
Many studies have indicated that aberrant expression of long non-coding RNA genes (lncRNAs) is closely related to human diseases. Identifying disease-related lncRNAs (disease lncRNAs) is critical for understanding the pathogenesis and etiology of diseases. Most previous methods prioritize potential disease lncRNAs using shallow learning, and they fail to extract deep and complex feature representations of lncRNA-disease associations. Furthermore, nearly all of these methods ignore the discriminative contributions that the similarity, association, and interaction relationships among lncRNAs, diseases, and miRNAs make to association prediction. We present a method, referred to as CNNLDA, that is based on dual convolutional neural networks with attention mechanisms and predicts candidate disease lncRNAs. CNNLDA deeply integrates multiple data sources: the lncRNA similarities, the disease similarities, the lncRNA-disease associations, the lncRNA-miRNA interactions, and the miRNA-disease associations. Diverse biological premises about lncRNAs, miRNAs, and diseases are combined to construct the feature matrix from a biological perspective. A novel framework based on the dual convolutional neural networks is developed to learn the global and attention representations of the lncRNA-disease associations. The left part of the framework exploits the information contained in the feature matrix to learn the global representation of lncRNA-disease associations. The different connection relationships among the lncRNA, miRNA, and disease nodes, and the different features of these nodes, contribute unequally to the association prediction. Hence we present attention mechanisms at the relationship level and the feature level, and the right part of the framework learns the attention representation of associations.
The experimental results based on cross validation indicate that CNNLDA outperforms several state-of-the-art methods. Case studies on stomach cancer, lung cancer, and colon cancer further demonstrate CNNLDA's ability to discover potential disease lncRNAs.
Keywords: attention at feature level; attention at relationship level; dual convolutional neural networks; lncRNA-disease prediction; lncRNA-miRNA interactions
Year: 2019 PMID: 31130990 PMCID: PMC6509943 DOI: 10.3389/fgene.2019.00416
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Construction of the feature matrix of lncRNA l1 and disease d2. (A) Construct the first part of the feature matrix by integrating the lncRNA similarities and the lncRNA-disease associations. (B) Construct the second part by incorporating the lncRNA-disease associations and the disease similarities. (C) Construct the third part by exploiting the lncRNA-miRNA interactions and the miRNA-disease associations. (D) Concatenate these three parts to form the feature matrix.
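The construction in panels (A) to (D) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name, the argument names, and the two-row layout (one row for the lncRNA, one row for the disease) are hypothetical and are not given in the caption; matrices are nested lists, with `mir_dis` indexed as miRNA by disease.

```python
def build_feature_matrix(lnc_sim, dis_sim, lnc_dis, lnc_mir, mir_dis, i, j):
    """Sketch of a feature matrix for lncRNA i and disease j.

    Assumed shapes (hypothetical names):
      lnc_sim: n_lnc x n_lnc   lncRNA similarities
      dis_sim: n_dis x n_dis   disease similarities
      lnc_dis: n_lnc x n_dis   lncRNA-disease associations
      lnc_mir: n_lnc x n_mir   lncRNA-miRNA interactions
      mir_dis: n_mir x n_dis   miRNA-disease associations
    """
    # (A) lncRNA similarities of i, and the lncRNAs associated with disease j
    part_a = [list(lnc_sim[i]), [row[j] for row in lnc_dis]]
    # (B) diseases associated with lncRNA i, and disease similarities of j
    part_b = [list(lnc_dis[i]), list(dis_sim[j])]
    # (C) miRNAs interacting with lncRNA i, and miRNAs associated with disease j
    part_c = [list(lnc_mir[i]), [row[j] for row in mir_dis]]
    # (D) concatenate the three parts column-wise, row by row
    return [part_a[r] + part_b[r] + part_c[r] for r in range(2)]
```

With `n_lnc` lncRNAs, `n_dis` diseases, and `n_mir` miRNAs, each of the two rows has `n_lnc + n_dis + n_mir` entries under this assumed layout.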
Figure 2. Construction of the framework based on the dual convolutional neural networks for learning the global and attention representations. (A) Construct the convolutional and pooling layers. (B) Establish the attention mechanism at the feature and relationship levels. (C) Construct the final module to estimate the association score.
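Panel (B) refers to attention at the feature level. The following is only a minimal sketch of a generic softmax attention over the entries of a feature vector. The scoring form, the per-feature parameters `w` and `b`, and the function names are assumptions made for illustration; this section does not give the parameterization that CNNLDA actually uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_attention(x, w, b):
    """Reweight features x by attention weights (hypothetical scoring form).

    Each feature gets a score tanh(w_k * x_k + b_k); the scores are
    normalised with softmax, and each feature is scaled by its weight.
    Returns (reweighted features, attention weights).
    """
    scores = [math.tanh(wk * xk + bk) for xk, wk, bk in zip(x, w, b)]
    alpha = softmax(scores)
    return [a * xk for a, xk in zip(alpha, x)], alpha
```

The attention weights sum to 1, so features with higher scores keep a larger share of their original value. In a trained model `w` and `b` would be learned rather than supplied by hand.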
Figure 3. ROC curves and PR curves of CNNLDA and other methods for all the diseases. (A) ROC curves of all the methods. (B) PR curves of all the methods.
AUCs of ROC curves of CNNLDA and other methods for all of the diseases and 10 well-characterized diseases.
| Average AUC on 402 diseases | 0.746 | 0.871 | 0.626 | 0.863 | |
| Respiratory system cancer | 0.885 | 0.789 | 0.719 | 0.891 | |
| Organ system cancer | 0.82 | 0.95 | 0.729 | 0.884 | |
| Intestinal cancer | 0.811 | 0.909 | 0.559 | 0.905 | |
| Prostate cancer | 0.873 | 0.826 | 0.553 | 0.71 | |
| Lung cancer | 0.79 | 0.911 | 0.676 | 0.883 | |
| Breast cancer | 0.742 | 0.871 | 0.517 | 0.83 | |
| Reproductive organ cancer | 0.707 | 0.818 | 0.74 | 0.742 | |
| Gastrointestinal system cancer | 0.784 | 0.896 | 0.582 | 0.867 | |
| Liver cancer | 0.799 | 0.91 | 0.634 | 0.898 | |
| Hepatocellular carcinoma | 0.765 | 0.903 | 0.688 | 0.902 | |
Bold values indicate the highest AUC.
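For readers unfamiliar with the metric, the area under the ROC curve for one disease can be computed from ranks alone. The sketch below uses the rank-sum (Mann-Whitney) identity; the function name and the toy inputs are illustrative, and this is not the evaluation code behind the table.

```python
def roc_auc(labels, scores):
    """AUC of the ROC curve via the rank-sum identity.

    labels: 1 for a known association, 0 otherwise.
    scores: predicted association scores (higher means more likely).
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and pairs[end + 1][0] == pairs[start][0]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[k] = average_rank
        start = end + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to ranking every known association above every unknown pair.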
AUPRs of PR curves of CNNLDA and other methods for all of the diseases and 10 well-characterized diseases.
| Average AUPR on 402 diseases | 0.095 | 0.219 | 0.066 | 0.166 | |
| Respiratory system cancer | 0.245 | 0.149 | 0.072 | 0.303 | |
| Organ system cancer | 0.411 | 0.765 | 0.338 | 0.628 | |
| Intestinal cancer | 0.141 | 0.252 | 0.042 | 0.246 | |
| Prostate cancer | 0.176 | 0.333 | 0.095 | 0.297 | |
| Lung cancer | 0.058 | 0.138 | 0.008 | 0.094 | |
| Breast cancer | 0.445 | 0.803 | 0.476 | 0.629 | |
| Reproductive organ cancer | 0.091 | 0.047 | 0.031 | 0.396 | |
| Gastrointestinal system cancer | 0.130 | 0.271 | 0.104 | 0.238 | |
| Liver cancer | 0.201 | 0.526 | 0.086 | 0.498 | |
| Hepatocellular carcinoma | 0.096 | 0.239 | 0.082 | 0.303 | |
Bold values indicate the highest AUPR.
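The area under the PR curve is often estimated by average precision, which is what the sketch below computes. Whether the reported AUPRs use this estimator or a trapezoidal one is not stated here, so treat the choice as an assumption; the function name is illustrative, and ties in score are broken by input order.

```python
def average_precision(labels, scores):
    """Average precision as one estimator of the area under the PR curve.

    Candidates are ranked by descending score; the precision observed at
    each known association (label 1) is averaged over all positives.
    """
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    n_pos = sum(1 for label in labels if label == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    hits = 0
    precision_sum = 0.0
    for rank, k in enumerate(order, start=1):
        if labels[k] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_pos
```

Because known associations are rare relative to all lncRNA-disease pairs, AUPR values tend to be much lower than AUC values for the same predictions, which matches the scale of the numbers in the table.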
Pairwise comparison using a paired Wilcoxon test on the prediction results in terms of AUCs and AUPRs.
| 7.2911e-116 | 7.7561e-53 | 1.3120e-133 | 3.7677e-64 | |
| 1.7468e-41 | 0.0455 | 5.0559e-52 | 4.8014e-09 |
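A paired Wilcoxon (signed-rank) test compares two methods on the same set of diseases. The sketch below is a simplified two-sided version that uses the normal approximation without tie or continuity corrections, so its p-values are approximate; the function name is illustrative and this is not the code that produced the values in the table.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    x, y: paired measurements, e.g. per-disease AUCs of two methods.
    Returns (W_plus, approximate p-value). Zero differences are dropped.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_value
```

For small samples or heavily tied data, an exact or tie-corrected test from a statistics library is preferable to this approximation.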
Figure 4. The average recalls across all the tested diseases under different top k cutoffs.
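Recall at a top k cutoff is the fraction of a disease's known associated lncRNAs that appear among its k highest-scoring candidates. A minimal sketch, with an illustrative function name and ties broken by input order:

```python
def recall_at_k(labels, scores, k):
    """Fraction of positives ranked within the top k candidates by score."""
    n_pos = sum(1 for label in labels if label == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    order = sorted(range(len(scores)), key=lambda idx: -scores[idx])
    found = sum(1 for idx in order[:k] if labels[idx] == 1)
    return found / n_pos


print(recall_at_k([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], 1))  # 0.5
```

Averaging this value over all tested diseases for each k gives a curve of the kind the caption describes.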
The candidate lncRNAs associated with stomach cancer, lung cancer and colon cancer.
| Disease | Rank | lncRNA | Evidence | Rank | lncRNA | Evidence |
| Stomach cancer | 1 | XIST | LncRNADisease, Lnc2Cancer | 9 | HULC | LncRNADisease, Lnc2Cancer |
| | 2 | NEAT1 | LncRNADisease, Lnc2Cancer | 10 | PCAT1 | Lnc2Cancer |
| | 3 | SOX2-OT | Lnc2Cancer | 11 | HOTTIP | LncRNADisease, Lnc2Cancer |
| | 4 | CCAT2 | LncRNADisease, Lnc2Cancer | 12 | KCNQ1OT1 | literature1 Sun et al., |
| | 5 | TUG1 | LncRNADisease, Lnc2Cancer | 13 | WT1-AS | LncRNADisease, Lnc2Cancer |
| | 6 | MALAT1 | LncRNADisease, Lnc2Cancer | 14 | NPTN-IT1 | miRCancer, StarBase |
| | 7 | BCYRN1 | Lnc2Cancer | 15 | MIR17HG | literature1 Bahari et al., |
| | 8 | HCP5 | literature2 Mo et al., | | | |
| Lung cancer | 1 | HOTTIP | LncRNADisease, Lnc2Cancer | 9 | LINC00663 | Lnc2Cancer |
| | 2 | PCA3 | unconfirmed | 10 | SOX2-OT | LncRNADisease |
| | 3 | LINC00675 | unconfirmed | 11 | MIAT | Lnc2Cancer |
| | 4 | HULC | literature1 Zhang et al., | 12 | LINC00312 | Lnc2Cancer |
| | 5 | KCNQ1OT1 | Lnc2Cancer | 13 | TINCR | Lnc2Cancer |
| | 6 | SNHG12 | Lnc2Cancer | 14 | LINC00961 | Lnc2Cancer |
| | 7 | CBR3-AS1 | miRCancer, StarBase | 15 | GHET1 | Lnc2Cancer |
| | 8 | TUSC7 | Lnc2Cancer | | | |
| Colon cancer | 1 | PVT1 | Lnc2Cancer | 9 | SNHG4 | miRCancer, StarBase |
| | 2 | UCA1 | LncRNADisease, Lnc2Cancer | 10 | SPRY4-IT1 | literature1 Shen et al., |
| | 3 | NEAT1 | Lnc2Cancer | 11 | BANCR | Lnc2Cancer |
| | 4 | WT1-AS | Lnc2Cancer | 12 | HULC | Lnc2Cancer |
| | 5 | CDKN2B-AS1 | Lnc2Cancer | 13 | LSINCT5 | Lnc2Cancer |
| | 6 | BCYRN1 | literature1 Gu et al., | 14 | KCNQ1OT1 | Lnc2Cancer |
| | 7 | GAS5 | Lnc2Cancer | 15 | HNF1A-AS1 | Lnc2Cancer |
| | 8 | HOTAIRM1 | Lnc2Cancer | | | |
(1) “Lnc2Cancer” and “LncRNADisease” are manually curated databases. (2) “literature.