Bo Wang, RunJie Liu, XiaoDong Zheng, XiaoXin Du, ZhengFei Wang.
Abstract
In recent years, with the continuous development of high-throughput biotechnology, growing evidence shows that lncRNAs play an essential role in biological processes and are related to the occurrence of various diseases. However, because traditional biological experiments are costly and time-consuming, the number of experimentally verified lncRNA-disease associations is very small. Computational prediction of lncRNA-disease associations is therefore an important complement to experimental work: building a prediction model from existing data and predicting unknown lncRNA-disease associations can make biological experiments more targeted and improve their accuracy. An accurate and efficient method is needed to predict the relationship between lncRNAs and diseases and to help biologists in the diagnosis and treatment of diseases. Most current lncRNA-disease association prediction methods consider neither the model instability caused by real data nor the possibility that the predictive model overfits. This paper proposes a lncRNA-disease association prediction model (ENCFLDA) that combines an elastic net with matrix decomposition and collaborative filtering. The method uses existing lncRNA-miRNA association data and miRNA-disease association data to predict unknown lncRNA-disease associations. First, since the known lncRNA-disease association matrix is very sparse, cosine similarity and KNN are used to update it. The matrix is then updated by matrix decomposition combined with an elastic net algorithm, to increase the stability of the overall prediction model and reduce overfitting. The final prediction matrix is then obtained through lncRNA-based collaborative filtering. In simulation experiments, ENCFLDA reaches an AUC of 0.9148 under leave-one-out cross-validation (LOOCV), which is higher than the results of the latest models.
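The first step of the abstract (densifying a sparse association matrix with cosine similarity and KNN) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity_rows` and `knn_fill`, the choice to measure lncRNA similarity on lncRNA-miRNA profiles, the similarity-weighted averaging, and the toy matrices are all assumptions made for illustration.

```python
import numpy as np

def cosine_similarity_rows(M):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows get similarity 0 instead of NaN
    unit = M / norms
    return unit @ unit.T

def knn_fill(assoc, feats, k=2):
    """Hypothetical helper: raise entries of `assoc` using the
    similarity-weighted average of the k most similar rows,
    where similarity is computed on the rows of `feats`."""
    S = cosine_similarity_rows(feats)
    np.fill_diagonal(S, 0.0)  # a row is not its own neighbour
    filled = assoc.astype(float).copy()
    for i in range(assoc.shape[0]):
        nbrs = np.argsort(S[i])[::-1][:k]  # indices of the k largest similarities
        w = S[i, nbrs]
        if w.sum() == 0:
            continue  # no similar rows, leave this row unchanged
        est = w @ assoc[nbrs] / w.sum()
        # keep known associations, only raise unknown (zero) entries
        filled[i] = np.maximum(filled[i], est)
    return filled

# Toy data (assumed): 3 lncRNAs x 3 miRNAs, and 3 lncRNAs x 2 diseases.
lnc_mirna = np.array([[1, 1, 0],
                      [1, 1, 0],
                      [0, 0, 1]])
assoc = np.array([[1, 0],
                  [0, 0],
                  [0, 1]])

filled = knn_fill(assoc, lnc_mirna, k=1)
print(filled)
```

In this toy case the second lncRNA has no known disease association but shares its miRNA profile with the first, so it inherits the first lncRNA's association; the third lncRNA has no similar neighbour and is left unchanged.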
Year: 2022 PMID: 35882886 PMCID: PMC9325687 DOI: 10.1038/s41598-022-16594-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. ROC comparison between ENCFLDA and other advanced models on the same data set.
Figure 2. AUPR comparison between ENCFLDA and other advanced models on the same data set.
Figure 3. ROC under different parameter values and the transformation curve of a parameter over the range [0,1].
The contributions of all components of the proposed method.
| KNN based on cosine similarity | Matrix decomposition | Collaborative filtering | AUC | AUPR |
|---|---|---|---|---|
| × | √ | √ | 0.8843 | 0.0343 |
| √ | × | √ | 0.8916 | 0.0414 |
| √ | √ | × | 0.8962 | 0.0512 |
| √ | √ | √ | 0.9148 | 0.1082 |
Candidate lncRNAs ranked in the top 15 for the two case studies, with the related literature.
| Disease | lncRNA | Evidence(PMID) | Rank |
|---|---|---|---|
| Lung Neoplasms | XIST | 29130102,31632059 | 1 |
| Lung Neoplasms | MALAT1 | 23243023 | 3 |
| Lung Neoplasms | KCNQ1OT1 | 30471108 | 4 |
| Lung Neoplasms | OIP5-AS1 | 32774481 | 6 |
| Lung Neoplasms | NEAT1 | 28615056 | 7 |
| Lung Neoplasms | HCG18 | 32559619 | 8 |
| Lung Neoplasms | DCP1A | 32034313 | 9 |
| Lung Neoplasms | SNHG16 | 31071307 | 11 |
| Lung Neoplasms | FGD5-AS1 | 31919528 | 13 |
| Breast Neoplasms | OIP5-AS1 | 32945479 | 3 |
| Breast Neoplasms | SNHG16 | 32945479 | 5 |
| Breast Neoplasms | SCAMP1 | 29497041 | 6 |
| Breast Neoplasms | FGD5-AS1 | 33880593 | 13 |
| Breast Neoplasms | LINC00657 | 32996041 | 14 |
| Breast Neoplasms | TUG1 | 28950664 | 15 |
Figure 4. Differential expression and survival analysis of genes in normal and tumor samples.
Figure 5. KEGG gene sets enriched in small cell lung cancer for samples with high gene expression.
Figure 6. Flow chart of ENCFLDA applied to lncRNA-disease association prediction.
Figure 7. Constraint domain of ridge regression.
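As background for the constraint-domain figure, the standard textbook forms of the ridge constraint and the elastic net penalty are given below. The symbols $X$, $y$, $w$, $t$, $\lambda_1$, and $\lambda_2$ are generic notation assumed here, not taken from the paper, and the paper's matrix-decomposition objective may differ in detail.

```latex
% Ridge regression in constrained form: the feasible set is an L2 ball.
\min_{w} \; \lVert y - Xw \rVert_2^2
\quad \text{subject to} \quad \lVert w \rVert_2^2 \le t

% Elastic net in penalized form: a weighted sum of the L1 (lasso)
% and squared L2 (ridge) penalties.
\min_{w} \; \lVert y - Xw \rVert_2^2
  + \lambda_1 \lVert w \rVert_1
  + \lambda_2 \lVert w \rVert_2^2
```

Setting $\lambda_1 = 0$ recovers ridge regression and setting $\lambda_2 = 0$ recovers the lasso, which is why the elastic net is often described as trading off sparsity against stability.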