Jian-Yu Shi, Hua Huang, Yan-Ning Zhang, Yu-Xi Long, Siu-Ming Yiu.
Abstract
BACKGROUND: In the human genome, long non-coding RNAs (lncRNAs) have attracted increasing attention because their dysfunction is involved in many diseases. However, most associations between lncRNAs and diseases (LDA) remain unknown. Because identifying disease-related lncRNAs in vivo is costly, computational approaches are promising: they can accelerate the identification of candidate associations and provide clues to the mechanisms underlying various lncRNA-caused diseases. Existing computational approaches usually focus only on predicting new associations between lncRNAs that already have known disease associations and diseases that already have known lncRNA associations. They also handle only binary lncRNA-disease associations (whether a pair is associated or not), which cannot reveal other biological facts, such as the number of proteins involved in an LDA or how strong the association is (i.e., the intensity of the LDA).
Keywords: Continued; Discrete; Graph regression; Prediction; Semantic similarity; Sequence feature; lncRNA-disease association
Year: 2017 PMID: 29322937 PMCID: PMC5763297 DOI: 10.1186/s12920-017-0305-y
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1. Graph regression for predicting the associations between lncRNAs and diseases. Three graphs are listed from top to bottom. Circle nodes and rounded square nodes denote lncRNAs and diseases, respectively. In the two similarity graphs, lines denote the similarities between nodes. In the association graph, solid lines linking nodes represent LDAs and dashed lines denote the pairs to be predicted. T1, T2, T3 and T4 denote the four prediction tasks.
Fig. 2. A toy example illustrating three kinds of LDAs involving 3 lncRNAs and 4 diseases. The first row shows, from left to right, a 3×5 binary lncRNA-protein interaction matrix, a 5×5 protein similarity matrix and a 5×4 binary gene-disease association matrix. The second row lists three kinds of 3×4 LDA matrices: a binary matrix, a discrete matrix and a continued matrix. Entries highlighted in different colors in the discrete and continued matrices have different values. Binary LDA provides only coarse information about whether a lncRNA is associated with a disease, while discrete LDA and continued LDA provide the number of proteins involved in the LDA and the intensity of the LDA, respectively.
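The three kinds of LDA matrix in the toy example can be sketched with a few matrix products. This is only an illustration under stated assumptions: the excerpt does not give the exact formulas, so the counting rule (shared proteins) and the similarity-weighted product used for the continued matrix are assumptions, and all input values and names (`lnc_prot`, `prot_sim`, `prot_dis`, `build_lda_matrices`) are hypothetical.

```python
import numpy as np

# Hypothetical toy inputs; the values are invented for illustration only.
# lncRNA-protein interactions: 3 lncRNAs x 5 proteins, binary.
lnc_prot = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
])
# Protein similarity: 5 x 5, symmetric, ones on the diagonal.
prot_sim = np.array([
    [1.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.2, 0.0, 0.0],
    [0.0, 0.2, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.3],
    [0.0, 0.0, 0.0, 0.3, 1.0],
])
# Gene (protein)-disease associations: 5 proteins x 4 diseases, binary.
prot_dis = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])


def build_lda_matrices(lnc_prot, prot_sim, prot_dis):
    """Return (binary, discrete, continued) 3 x 4 LDA matrices.

    Assumed rules, not confirmed by the source:
      discrete  = number of proteins shared by a lncRNA and a disease
      binary    = 1 where at least one protein is shared
      continued = similarity-weighted sum over all protein pairs
    """
    discrete = lnc_prot @ prot_dis
    binary = (discrete > 0).astype(int)
    continued = lnc_prot @ prot_sim @ prot_dis
    return binary, discrete, continued


binary, discrete, continued = build_lda_matrices(lnc_prot, prot_sim, prot_dis)
print(binary)
print(discrete)
print(np.round(continued, 2))
```

Under these assumptions, the continued matrix can be nonzero where the binary matrix is zero, because similar proteins contribute even when no protein is shared directly.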
Fig. 3. Comparison with three state-of-the-art approaches in the traditional scenario T1
Comparison with three state-of-the-art models in Scenarios T2 and T3
| Scenario | Measure | MLKNN | RLS | GRUF |
|---|---|---|---|---|
| T2 | AUC | 0.8334 | | 0.8482 |
| T2 | AUPR | 0.1064 | 0.1443 | |
| T3 | AUC | 0.8377 | 0.5915 | |
| T3 | AUPR | 0.1742 | 0.0971 | |
The best values are bold
Performance of GRUF in comprehensive scenarios in terms of AUC
| Scenario | Binary | Discrete | Continued |
|---|---|---|---|
| T1 (10CV) | 0.8916 | 0.8900 | 0.9148 |
| T2 (10CV) | 0.7505 | 0.7412 | 0.8176 |
| T3 (10CV) | 0.8487 | 0.8361 | 0.8060 |
| T4 (10×10 CV) | 0.6080 | 0.6070 | 0.6078 |
Performance of GRUF in comprehensive scenarios in terms of Correlation
| Scenario | Binary | Discrete | Continued |
|---|---|---|---|
| T1 (10CV) | 0.1525 | 0.4012 | |
| T2 (10CV) | 0.1498 | 0.1774 | |
| T3 (10CV) | 0.1206 | 0.1515 | |
| T4 (10×10 CV) | 0.1463 | 0.1541 | |
The italic entries denote the best
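For readers unfamiliar with the measures in the two performance tables, the following is a minimal sketch of how an AUC and a correlation between predicted scores and known association values could be computed. It is not the authors' evaluation code: the function names are hypothetical, and the choice of Pearson correlation is an assumption, since the excerpt does not say which correlation coefficient was used.

```python
import numpy as np


def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive pair is
    scored higher than a randomly chosen negative pair (ties count 0.5).
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative label")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins) / (pos.size * neg.size)


def pearson_correlation(truth, scores):
    """Pearson correlation between true values and predicted scores
    (assumed here; the source only says 'Correlation')."""
    return float(np.corrcoef(truth, scores)[0, 1])


# Hypothetical usage on invented numbers.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.1, 0.8, 0.2]
print(auc_score(example_labels, example_scores))
print(pearson_correlation([1, 2, 3], [2, 4, 6]))
```

The pairwise form is quadratic in the number of pairs, which is acceptable for a small illustration; a rank-based formulation would be preferable for large association matrices.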
The validation of potential lncRNA-disease associations in novel prediction
| Rank | lncRNA | Disease | Validation |
|---|---|---|---|
| 1 | DLX6-AS1 | breast neoplasms, male | N/A |
| 2 | H19 | breast neoplasms, male | [ |
| 3 | CDKN2B-AS1 | breast neoplasms, male | DB |
| 4 | DLX6-AS1 | musculoskeletal abnormalities | N/A |
| 5 | 7SK | liver neoplasms, experimental | [ |
DB: LncRNADisease;
N/A: no finding in the medical literature or LncRNADisease