Zhen-Hao Guo, Zhu-Hong You, Yan-Bin Wang, Hai-Cheng Yi, Zhan-Heng Chen.
Abstract
Long non-coding RNAs (lncRNAs) play critical roles in the occurrence and development of various diseases. Determining lncRNA-disease associations would therefore provide new insights into disease pathogenesis, diagnosis, and gene treatment. Because traditional experimental approaches struggle to detect potential human lncRNA-disease associations in the vast amount of biological data, computational methods could be of significant value. In this paper, we propose a novel computational method named LDASR to identify associations between lncRNAs and diseases by analyzing known lncRNA-disease associations. First, the feature vectors of the lncRNA-disease pairs are obtained by integrating lncRNA Gaussian interaction profile kernel similarity, disease semantic similarity, and disease Gaussian interaction profile kernel similarity. Second, an autoencoder neural network is employed to reduce the feature dimension and obtain the optimal feature subspace from the original feature set. Finally, Rotation Forest is used to predict lncRNA-disease associations. The proposed method achieves excellent performance, with an AUC of 0.9502 in leave-one-out cross-validation (LOOCV) and 0.9428 in 5-fold cross-validation, significantly outperforming previous methods. Moreover, two kinds of case studies on identifying lncRNAs associated with colorectal cancer and glioma further demonstrate the capability of LDASR to identify novel lncRNA-disease associations. These promising experimental results show that LDASR can be an excellent addition to biomedical research in the future.
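The abstract names Gaussian interaction profile (GIP) kernel similarity but does not spell out the formula. The following is a minimal sketch assuming the commonly used formulation, in which the similarity of two entities is `exp(-gamma * ||IP(a) - IP(b)||^2)` and `gamma` is a bandwidth parameter divided by the mean squared norm of all interaction profiles. The function name `gip_kernel`, the parameter `gamma_prime`, and the toy adjacency matrix are hypothetical and not taken from the paper.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel between the rows of a binary interaction matrix.

    Assumed formulation (not confirmed by the source):
    K[i][j] = exp(-gamma * ||row_i - row_j||^2), with
    gamma = gamma_prime / mean(||row_k||^2).
    """
    n = len(profiles)
    # Normalise the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Hypothetical toy adjacency matrix: rows are lncRNAs, columns are diseases.
adjacency = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
lnc_sim = gip_kernel(adjacency)
# For diseases, the columns of the adjacency matrix serve as profiles.
disease_sim = gip_kernel([list(col) for col in zip(*adjacency)])
```

In this toy example the first two lncRNAs share a disease, so `lnc_sim[0][1]` is larger than `lnc_sim[0][2]`, where the profiles do not overlap at all.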
Keywords: Biocomputational Method; Bioinformatics; Computational Bioinformatics
Year: 2019 PMID: 31494494 PMCID: PMC6733997 DOI: 10.1016/j.isci.2019.08.030
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. Flowchart of LDASR
Step 1: Build three similarity matrices for diseases by combining semantic information and Gaussian kernel information. Step 2: Build one similarity matrix for lncRNAs. Step 3: Extract similarity feature vectors for diseases and lncRNAs from the disease similarity matrix and the lncRNA similarity matrix. Step 4: Extract equal numbers of positive and negative samples from the adjacency matrix to construct the dataset used in this paper. Step 5: Select the most valuable features and reduce feature noise using an autoencoder. Step 6: Feed the more discriminative feature vectors into the Rotation Forest ensemble classifier for training, validation, and prediction. For the construction of the disease semantic matrix, see also Figure S1.
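Steps 3 and 4 of the flowchart can be illustrated with a short sketch. It rests on two assumptions that the caption does not confirm: the feature vector of a pair is the lncRNA's similarity row concatenated with the disease's similarity row, and negatives are drawn uniformly at random from pairs with no known association. The function `build_dataset` and all toy matrices are hypothetical.

```python
import random

def build_dataset(adjacency, lnc_sim, dis_sim, seed=0):
    """Return (features, labels) with as many negatives as positives.

    Assumptions: features are concatenated similarity rows, and
    unlabelled pairs are sampled at random to serve as negatives.
    """
    rng = random.Random(seed)
    n_lnc, n_dis = len(adjacency), len(adjacency[0])
    pairs = [(i, j) for i in range(n_lnc) for j in range(n_dis)]
    positives = [(i, j) for i, j in pairs if adjacency[i][j] == 1]
    unknown = [(i, j) for i, j in pairs if adjacency[i][j] == 0]
    # Balance the classes: draw exactly len(positives) unlabelled pairs.
    negatives = rng.sample(unknown, len(positives))
    features, labels = [], []
    for label, group in ((1, positives), (0, negatives)):
        for i, j in group:
            features.append(list(lnc_sim[i]) + list(dis_sim[j]))
            labels.append(label)
    return features, labels

# Hypothetical toy data: 3 lncRNAs, 4 diseases, 4 known associations.
adjacency = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
# Identity matrices stand in for real similarity matrices.
lnc_sim = [[1.0 if a == b else 0.0 for b in range(3)] for a in range(3)]
dis_sim = [[1.0 if a == b else 0.0 for b in range(4)] for a in range(4)]

features, labels = build_dataset(adjacency, lnc_sim, dis_sim)
```

Each resulting vector has length 3 + 4 = 7, and the dataset holds four positive and four negative samples. In the pipeline described by the caption, such vectors would then pass through the autoencoder before classification.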
Figure 2. The ROC and AUC of LDASR in LOOCV Based on the v2017 Dataset (3,530 lncRNA-Disease Associations)
Figure 3. The ROCs and AUCs of LDASR in 5-Fold Cross-validation Based on the v2017 Dataset (3,530 lncRNA-Disease Associations)
Five-fold Cross-validation Results Performed by LDASR on the v2017 Dataset (3,530 lncRNA-Disease Associations)
| Fold | Acc. (%) | Sen. (%) | Spec. (%) | Prec. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| 0 | 83.85 | 90.08 | 77.62 | 80.10 | 68.24 | 93.11 |
| 1 | 85.27 | 88.95 | 81.59 | 82.85 | 70.73 | 93.19 |
| 2 | 84.42 | 89.80 | 79.04 | 81.07 | 69.24 | 94.78 |
| 3 | 88.10 | 91.22 | 84.99 | 85.87 | 76.35 | 95.27 |
| 4 | 86.97 | 90.65 | 83.29 | 84.43 | 74.14 | 95.08 |
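The columns of this table are not independent. Assuming each test fold contains equal numbers of positive and negative samples (consistent with the balanced sampling in Step 4 of Figure 1), accuracy, precision, and MCC follow from sensitivity and specificity alone. The sketch below recomputes them from the tabulated values and averages the per-fold AUCs; the helper `balanced_metrics` is a hypothetical name.

```python
import math

# (sensitivity %, specificity %, AUC %) per fold, copied from the table.
folds = [
    (90.08, 77.62, 93.11),
    (88.95, 81.59, 93.19),
    (89.80, 79.04, 94.78),
    (91.22, 84.99, 95.27),
    (90.65, 83.29, 95.08),
]

def balanced_metrics(sen_pct, spec_pct):
    """Accuracy, precision and MCC (in %) assuming equal class sizes."""
    # With balanced classes, rates can stand in for counts.
    tp, tn = sen_pct / 100, spec_pct / 100
    fn, fp = 1 - tp, 1 - tn
    acc = (tp + tn) / 2
    prec = tp / (tp + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return 100 * acc, 100 * prec, 100 * mcc

derived = [balanced_metrics(sen, spec) for sen, spec, _ in folds]
mean_auc = sum(auc for _, _, auc in folds) / len(folds)

for fold, (acc, prec, mcc) in enumerate(derived):
    print(f"fold {fold}: acc={acc:.2f} prec={prec:.2f} mcc={mcc:.2f}")
print(f"mean AUC = {mean_auc:.3f}%")
```

The derived values agree with the tabulated accuracy, precision, and MCC to within rounding of the inputs, and the mean AUC of about 94.29% is in line with the 0.9428 reported in the abstract for 5-fold cross-validation.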
Figure 4. Comparison with Random Forest, Logistic Regression, Naive Bayes, and SVM in 5-Fold Cross-validation Based on the v2017 Dataset (3,530 lncRNA-Disease Associations)
Figure 5. Comparison of the AUCs Obtained under LOOCV by LDASR, LRLSLDA, LRLSLDA-LNCSIM1, and LRLSLDA-LNCSIM2 Based on the v2012 Dataset (586 lncRNA-Disease Associations)
Top 10 Colorectal Cancer-Associated lncRNAs Predicted by LDASR
| Num | lncRNA | Confirmed Database |
|---|---|---|
| 1 | snhg3 | LncRNAWiki |
| 2 | linc00237 | Unconfirmed |
| 3 | kcna2 | Unconfirmed |
| 4 | xist | LncRNADisease/MNDR 2.0 |
| 5 | cahm | LncRNADisease/CRlncRNA |
| 6 | bx649059 | LncRNADisease/CRlncRNA/MNDR 2.0 |
| 7 | ab073614 | Lnc2Cancer |
| 8 | bx648207 | LncRNADisease/CRlncRNA/MNDR 2.0 |
| 9 | ak123657 | LncRNADisease/CRlncRNA/MNDR 2.0 |
| 10 | fas-as1 | Unconfirmed |
Top 10 Glioma-Associated lncRNAs Predicted by LDASR
| Num | lncRNA | Confirmed Database |
|---|---|---|
| 1 | zfat-as1 | Unconfirmed |
| 2 | xist | CRlncRNA/MNDR 2.0 |
| 3 | spry4-it1 | LncRNADisease/CRlncRNA/MNDR 2.0 |
| 4 | cytor | MNDR 2.0 |
| 5 | neat1 | LncRNADisease/CRlncRNA |
| 6 | meg3 | LncRNADisease/CRlncRNA |
| 7 | malat1 | LncRNADisease/CRlncRNA/MNDR 2.0 |
| 8 | cdkn2b-as1 | LncRNADisease |
| 9 | h19 | LncRNADisease/CRlncRNA |
| 10 | hotair | LncRNADisease/CRlncRNA/MNDR 2.0 |
Top 10 Prostate Cancer-Associated lncRNAs Predicted by LDASR
| Num | lncRNA | Confirmed Database |
|---|---|---|
| 1 | pcat29 | LncRNADisease/CRlncRNA/MNDR 2.0 |
| 2 | tug1 | Unconfirmed |
| 3 | malat1 | LncRNADisease/CRlncRNA/MNDR 2.0 |
| 4 | hif1a-as2 | Unconfirmed |
| 5 | h19 | LncRNADisease/CRlncRNA |
| 6 | dleu1 | LncRNADisease/MNDR 2.0 |
| 7 | dgcr5 | Unconfirmed |
| 8 | cytor | Unconfirmed |
| 9 | cdkn2b-as3 | Unconfirmed |
| 10 | cdkn2b-as11 | LncRNADisease/CRlncRNA/MNDR 2.0/Lnc2Cancer |