Wen Li, Shulin Wang, Junlin Xu, Guo Mao, Geng Tian, Jialiang Yang.
Abstract
Current studies have shown that long non-coding RNAs (lncRNAs) play a crucial role in a variety of fundamental biological processes related to complex human diseases. Predicting latent disease-lncRNA associations helps to explain the pathogenesis of complex human diseases at the lncRNA level, and it contributes to the detection of disease biomarkers and to the diagnosis, treatment, prognosis, and prevention of disease. Nevertheless, accurately identifying latent disease-lncRNA associations remains a challenging and urgent task. Discovering latent links through biological experiments is time-consuming and costly, which necessitates the development of computational prediction models. In this study, the prediction problem is recast as matrix completion in a recommendation-system framework, in which the unknown items of the rating matrix are filled in. A novel method named faster randomized matrix completion for latent disease-lncRNA association prediction (FRMCLDA) is proposed, which applies an improved randomized partial SVD (rSVD-BKI) to a heterogeneous bilayer network. First, related data sources and experimentally validated information on diseases and lncRNAs are integrated to construct a heterogeneous bilayer network. Next, the network is formalized as a comprehensive adjacency matrix comprising the lncRNA similarity matrix, the disease similarity matrix, and the disease-lncRNA association matrix, in which the uncertain disease-lncRNA associations are treated as blank items. Then, a matrix approximating the original adjacency matrix is constructed, and its predicted scores fill the blank items. Constructing this approximate matrix is equivalent to solving a nuclear norm minimization problem. Finally, a faster singular value thresholding algorithm, which uses a randomized partial SVD combined with a new subspace reuse technique, is applied to complete the adjacency matrix.
The results of leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV) experiments on three different benchmark databases have confirmed the effectiveness and adaptability of FRMCLDA in inferring latent relationships of disease-lncRNA pairs, and in inferring lncRNAs correlated with novel diseases without any prior interaction information. Additionally, case studies have shown that FRMCLDA is able to effectively predict latent lncRNAs correlated with three widespread malignancies: prostate cancer, colon cancer, and gastric cancer.
Keywords: association prediction; faster SVT; heterogeneous bilayer network; matrix completion; randomized partial SVD; similarity measurements
Year: 2019 PMID: 31572428 PMCID: PMC6749816 DOI: 10.3389/fgene.2019.00769
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
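The completion step described in the abstract can be illustrated with a minimal sketch. It assumes the adjacency matrix has the block layout [[LS, DLᵀ], [DL, DS]] and it uses NumPy's full SVD in place of the randomized partial SVD (rSVD-BKI) with subspace reuse, so it shows plain singular value thresholding rather than FRMCLDA itself. The function name `svt_complete`, the toy matrices, and the values of `tau` and `step` are illustrative assumptions, not values from the study.

```python
import numpy as np

def svt_complete(A, mask, tau, step=1.2, n_iter=500, tol=1e-4):
    """Plain singular value thresholding (SVT) for matrix completion.

    A    : matrix holding the observed entries (other entries are ignored)
    mask : boolean matrix, True where the entry of A is observed
    tau  : soft-threshold applied to the singular values
    """
    Y = np.zeros_like(A, dtype=float)
    X = np.zeros_like(A, dtype=float)
    norm_obs = np.linalg.norm(A[mask])
    for _ in range(n_iter):
        # Full SVD here; the paper's method would use a randomized partial SVD.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        s = np.maximum(s - tau, 0.0)          # shrink singular values
        X = (U * s) @ Vt                      # low-rank estimate
        residual = np.where(mask, A - X, 0.0) # error on observed entries only
        if np.linalg.norm(residual) <= tol * norm_obs:
            break
        Y += step * residual
    return X

# Toy data, made up for illustration: 3 lncRNAs and 2 diseases.
LS = np.array([[1.0, 0.8, 0.1],
               [0.8, 1.0, 0.2],
               [0.1, 0.2, 1.0]])
DS = np.array([[1.0, 0.3],
               [0.3, 1.0]])
DL = np.array([[1.0, 0.0, 0.0],   # rows: diseases, columns: lncRNAs
               [0.0, 0.0, 1.0]])

# Assumed block layout of the heterogeneous adjacency matrix.
A = np.block([[LS, DL.T], [DL, DS]])

# Similarities and known associations are observed; zeros in DL are blanks.
known = DL > 0
mask = np.block([[np.ones_like(LS, dtype=bool), known.T],
                 [known, np.ones_like(DS, dtype=bool)]])

X = svt_complete(A, mask, tau=5.0)
scores = X[LS.shape[0]:, :LS.shape[0]]  # completed disease-lncRNA block
```

In this sketch, the entries of `scores` at the blank positions of `DL` play the role of predicted association scores, and candidates for a disease would be ranked by sorting its row.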
Details of three benchmark datasets.
| Datasets | Number of known associations | Number of lncRNAs | Number of diseases | Sparsity of the matrix DL | Weights in integrated similarity |
|---|---|---|---|---|---|
| Dataset 1 | 352 | 156 | 190 | 1.187 × 10⁻² | |
| Dataset 2 | 540 | 115 | 178 | 2.638 × 10⁻² | |
| Dataset 3 | 621 | 258 | 226 | 1.065 × 10⁻² | |
The sparsity is the ratio of the number of known associations to the size of the matrix (the number of all possible associations).
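The sparsity definition can be checked directly against the three rows of the table. The short sketch below recomputes the ratio from the reported counts; the function name `sparsity` is an illustrative choice.

```python
def sparsity(n_known, n_lncrnas, n_diseases):
    """Known associations divided by all possible lncRNA-disease pairs."""
    return n_known / (n_lncrnas * n_diseases)

# (known associations, lncRNAs, diseases), in the order of the table rows
rows = [(352, 156, 190), (540, 115, 178), (621, 258, 226)]
values = [sparsity(*row) for row in rows]

for row, value in zip(rows, values):
    print(row, f"{value:.6f}")
```

The recomputed ratios agree with the sparsity column to within one unit in the last reported digit.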
Figure 1. Scheme of FRMCLDA for inferring latent disease-related lncRNAs by matrix recovery.
The effect of cosine similarity on AUC under 5-fold CV on Dataset 2.
| No | Only combining | Only combining | Combining |
|---|---|---|---|
| 0.7995 ± 0.0044 | 0.8705 ± 0.0050 | 0.8510 ± 0.0032 | 0.9145 ± 0.0013 |
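As a rough illustration of the similarity measure compared in this table, the sketch below computes cosine similarity between association profiles. It assumes that each lncRNA (or disease) is represented by its column (or row) of the disease-lncRNA association matrix; the record does not spell out the exact construction, and `cosine_similarity_rows` and the toy matrix are hypothetical.

```python
import numpy as np

def cosine_similarity_rows(profiles):
    """Pairwise cosine similarity between the rows of `profiles`.

    Rows that are all zero get similarity 0 with every row, including themselves.
    """
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    safe_norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    unit = profiles / safe_norms
    return unit @ unit.T

# Toy association matrix (made up): rows are diseases, columns are lncRNAs.
DL = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]])

lnc_cos = cosine_similarity_rows(DL.T)  # lncRNA-lncRNA similarity
dis_cos = cosine_similarity_rows(DL)    # disease-disease similarity
```

Here the first and third lncRNAs share the same disease profile, so their cosine similarity is 1, while the two diseases share no lncRNA and have similarity 0.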
Results of the contribution test on prediction performance by LOOCV on Dataset 2.
| Data source replaced by a random matrix | Average AUC over 20 randomizations |
|---|---|
| lncRNA similarity matrix (LS) | 0.8615 ± 0.0061 |
| Disease similarity matrix (DS) | 0.8081 ± 0.0059 |
| disease-lncRNA association matrix (DL) | 0.5332 ± 0.0174 |
Time usage of FRMCLDA for heterogeneous networks of different sizes.
| Size of the heterogeneous network | CPU time (s) |
|---|---|
| 156 × 190 | 2.1758 ± 0.2826 |
| 115 × 178 | 1.5367 ± 0.1799 |
| 258 × 226 | 3.9016 ± 0.2703 |
Figure 2. Overall performance assessment of FRMCLDA, BPLLDA, SIMCLDA, KATZLDA and LRLSLDA in predicting disease-lncRNA relationships on Dataset 1 by global LOOCV.
Figure 3. Performance assessment of LRLSLDA, GrwLDA, BPLLDA and FRMCLDA in inferring novel disease-correlated lncRNAs on Dataset 1 by local LOOCV. (A) ROC curve of inferring novel disease-related lncRNAs. (B) PR curve of inferring novel disease-related lncRNAs.
Predicting novel disease-related lncRNAs by deleting known associations for each disease.
| Known but deleted breast cancer-related lncRNAs | Rank number | Known but deleted breast cancer-related lncRNAs | Rank number |
|---|---|---|---|
| BCAR4 | 13 | LSINCT5 | 7 |
| BCYRN1 | 6 | MALAT1 | 2 |
| CDKN2B-AS1 | 4 | MEG3 | 3 |
| DSCAM-AS1 | 8 | MIR31HG | 14 |
| GAS5 | 10 | PINC | 15 |
| H19 | 1 | PVT1 | 5 |
| HOTAIR | 9 | SRA1 | 11 |
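The rank numbers in this table come from scoring all candidate lncRNAs for a disease after its known associations have been deleted, then locating the deleted lncRNAs in the sorted list. A minimal sketch of that ranking step is given below; the scores are made up, and `ranks_of_heldout` is a hypothetical helper rather than part of the published method.

```python
import numpy as np

def ranks_of_heldout(scores, heldout_idx):
    """Return the 1-based rank of each held-out candidate.

    scores      : predicted scores for all candidate lncRNAs of one disease
    heldout_idx : indices of the lncRNAs whose known associations were deleted
    """
    order = np.argsort(-scores, kind="stable")     # best score first
    rank = np.empty_like(order)
    rank[order] = np.arange(1, len(scores) + 1)    # position in the sorted list
    return {int(i): int(rank[i]) for i in heldout_idx}

# Toy scores for four candidate lncRNAs; candidates 1 and 2 were held out.
scores = np.array([0.2, 0.9, 0.5, 0.7])
heldout_ranks = ranks_of_heldout(scores, [1, 2])
```

A low rank for a held-out lncRNA indicates that the model recovered the deleted association near the top of its candidate list.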
Figure 4. Performance of FRMCLDA, KATZLDA and SIMCLDA on inferring lncRNAs by global 5-fold cross-validation on Dataset 2. (A) ROC curve of predicting disease-lncRNA associations. (B) PR curve of predicting disease-lncRNA associations. (C) Results of precision at every rank. (D) Results of recall at every rank.
Figure 5. Performance of FRMCLDA, KATZLDA and SIMCLDA on inferring lncRNAs by global 5-fold cross-validation on Dataset 3. (A) ROC curve of predicting disease-lncRNA associations. (B) PR curve of predicting disease-lncRNA associations. (C) Results of precision at every rank. (D) Results of recall at every rank.
Precision-rank on dataset 2.
| Method | Top 20 | Top 40 | Top 60 | Top 80 | Top 100 | Top 120 | Top 140 |
|---|---|---|---|---|---|---|---|
| Precision | | | | | | | |
| FRMCLDA | 0.8800 | 0.5150 | 0.4233 | 0.3775 | 0.3440 | 0.3200 | 0.3029 |
| SIMCLDA | 0.5300 | 0.4150 | 0.3667 | 0.3175 | 0.2860 | 0.2671 | 0.2343 |
| KATZLDA | 0.2100 | 0.1500 | 0.1500 | 0.1300 | 0.1200 | 0.1167 | 0.1086 |
| Recall | | | | | | | |
| FRMCLDA | 0.1630 | 0.1707 | 0.2352 | 0.2796 | 0.3185 | 0.3556 | 0.3926 |
| SIMCLDA | 0.0981 | 0.1537 | 0.2037 | 0.2352 | 0.2648 | 0.2907 | 0.3037 |
| KATZLDA | 0.0389 | 0.0556 | 0.0833 | 0.0963 | 0.1111 | 0.1296 | 0.1407 |
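Precision and recall at a rank cutoff can be computed from a ranked list of candidates as in the sketch below. The labels are made up, and the way the study aggregates counts across cross-validation folds is not stated here, so the helper `precision_recall_at_k` only illustrates the two definitions.

```python
def precision_recall_at_k(ranked_labels, k, n_positives):
    """Precision and recall among the top-k candidates.

    ranked_labels : 0/1 labels of candidates, sorted by decreasing score
    k             : rank cutoff
    n_positives   : total number of true associations in the test set
    """
    if k <= 0 or n_positives <= 0:
        raise ValueError("k and n_positives must be positive")
    hits = sum(ranked_labels[:k])
    return hits / k, hits / n_positives

# Toy ranking: 1 marks a true association, 0 a non-association.
ranked_labels = [1, 1, 0, 1, 0, 0, 1, 0]
n_positives = sum(ranked_labels)
curve = {k: precision_recall_at_k(ranked_labels, k, n_positives) for k in (2, 4, 8)}
```

As the cutoff grows, precision tends to fall while recall rises, which is the pattern visible in both blocks of the table.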
The top-20 lncRNAs predicted for prostate cancer.
| Rank | LncRNA | Pubmed ID | Rank | LncRNA | Pubmed ID |
|---|---|---|---|---|---|
| 1 | H19 | 24988946 | 11 | IGF2-AS | 27507663 |
| 2 | HOTAIR | 23936419 | 12 | PCAT1 | 22664915 |
| 3 | MEG3 | 14602737 | 13 | LincRNA-p21 | 27976428 |
| 4 | MALAT1 | 23845456 | 14 | PTENpg1 | not found |
| 5 | CDKN2B-AS1 | 20541999 | 15 | PRNCR1 | 20874843 |
| 6 | PVT1 | 23728290 | 16 | SNHG16 | not found |
| 7 | GAS5 | 22664915 | 17 | MINA | not found |
| 8 | Linc00963 | 24691949 | 18 | SRA1 | 16607388 |
| 9 | C1QTNF9B-AS1 | 27507663 | 19 | NEAT1 | 25415230 |
| 10 | UCA1 | 27686228 | 20 | LSINCT5 | not found |
The top-20 lncRNAs predicted for colon cancer.
| Rank | Name of LncRNA | Pubmed ID | Rank | Name of LncRNA | Pubmed ID |
|---|---|---|---|---|---|
| 1 | CDKN2B-AS1 | 26708220 | 11 | DRAIC | Not found |
| 2 | PVT1 | 25043044 | 12 | IGF2-AS | Not found |
| 3 | GAS5 | 25326054 | 13 | NPTN-IT1 | 23395002 |
| 4 | LincRNA-p21 | 26656491 | 14 | XIST | 29679755 |
| 5 | UCA1 | 26885155 | 15 | PCAT29 | Not found |
| 6 | KCNQ1OT1 | 16965397 | 16 | LSINCT5 | 25526476 |
| 7 | TUG1 | 27634385 | 17 | anti-NOS2A | Not found |
| 8 | MINA | Not found | 18 | HIF1A-AS2 | 29278853 |
| 9 | BCYRN1 | 29625226 | 19 | SNHG16 | 24519959 |
| 10 | MIAT | 29686537 | 20 | HIF1A-AS1 | 28946548 |
The top-20 lncRNAs predicted for gastric cancer.
| Rank | Name of LncRNA | Pubmed ID | Rank | Name of LncRNA | Pubmed ID |
|---|---|---|---|---|---|
| 1 | GAS5 | 27827524 | 11 | SNHG16 | 29081409 |
| 2 | MALAT1 | 27486823 | 12 | PTENpg1 | 25694351 |
| 3 | LincRNA-p21 | 28969031 | 13 | PCAT29 | 25700553 |
| 4 | BCYRN1 | 29435146 | 14 | XIST | 29053187 |
| 5 | KCNQ1OT1 | Not found | 15 | BDNF-AS1 | Not found |
| 6 | IGF2-AS | Not found | 16 | HIF1A-AS1 | 26722487 |
| 7 | TUG1 | 27983921 | 17 | HIF1A-AS2 | 25686741 |
| 8 | NPTN-IT1 | 28951520 | 18 | lncRNA-ATB | 28115163 |
| 9 | MIAT | 29039602 | 19 | HAR1B | Not found |
| 10 | DRAIC | 25700553 | 20 | CCAT2 | 29435046 |
Figure 6. Network of the top-50 predicted associations of prostate cancer, colon cancer and gastric cancer on Dataset 3. Circles and triangles represent lncRNAs and diseases, respectively.