Lei Deng1, Dayun Liu1, Yizhan Li1, Runqi Wang1, Junyi Liu2, Jiaxuan Zhang3, Hui Liu4.
Abstract
BACKGROUND: Increasing evidence shows that circRNAs play an essential regulatory role in diseases through interactions with disease-related miRNAs. Identifying circRNA-disease associations is of great significance to the precise diagnosis and treatment of diseases. However, traditional biological experiments are usually time-consuming and expensive. Hence, it is necessary to develop a computational framework to infer unknown associations between circRNAs and diseases.
Keywords: CircRNA-disease associations; High-order features; Multi-source data; Neural network
Year: 2022 PMID: 36241972 PMCID: PMC9569055 DOI: 10.1186/s12859-022-04976-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Five-fold cross-validation results on circFunBase dataset
| Validation set | AUC | Accuracy | Precision | Recall | F1_score |
|---|---|---|---|---|---|
| 1 | 0.9903 | 0.9505 | 0.9461 | 0.9541 | 0.9501 |
| 2 | 0.9924 | 0.9631 | 0.9546 | 0.9760 | 0.9652 |
| 3 | 0.9908 | 0.9296 | 0.8811 | 0.9878 | 0.9314 |
| 4 | 0.9879 | 0.9371 | 0.9259 | 0.9519 | 0.9387 |
| 5 | 0.9907 | 0.9463 | 0.9157 | 0.9812 | 0.9473 |
| Average | 0.9904 | 0.9453 | 0.9246 | 0.9702 | 0.9463 |
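The accuracy, precision, recall and F1 columns can be related to one another with the standard confusion-matrix definitions. The sketch below assumes those standard definitions apply here (the record itself does not spell them out); the function names and the example counts are illustrative and not taken from the paper.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of true positives that are recovered
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to exercise the formulas.
example = classification_metrics(tp=90, fp=10, tn=85, fn=15)

# Consistency check against validation set 1 in the table above
# (precision 0.9461, recall 0.9541, reported F1_score 0.9501).
fold1_f1 = f1_from(0.9461, 0.9541)
```

Under these definitions, the precision and recall reported for validation set 1 give an F1 that agrees with the tabulated 0.9501 to within rounding.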
Fig. 1 ROC curves obtained by MSPCD on the circFunBase dataset
The comparison of different methods based on five-fold cross-validation
| Model | AUC | Accuracy | Precision | Recall | F1_score |
|---|---|---|---|---|---|
| MSPCD | 0.9904 | 0.9453 | 0.9246 | 0.9702 | 0.9463 |
| DMFCDA | 0.9492 | 0.8954 | 0.8816 | 0.9149 | 0.8978 |
| KATZCPDA | 0.9208 | 0.9103 | 0.9204 | 0.8837 | 0.9016 |
| AE_RF | 0.9079 | 0.9079 | 0.9689 | 0.8426 | 0.9006 |
| GBDTCDA | 0.9064 | 0.8899 | 0.9004 | 0.8603 | 0.8798 |
| IMS-CDA | 0.8773 | 0.8403 | 0.8771 | 0.8156 | 0.8452 |
| AE_DNN | 0.7816 | 0.7055 | 0.7707 | 0.6024 | 0.6649 |
Fig. 2 ROC curves obtained by different methods on the circFunBase dataset
Fig. 3 ROC curves obtained by different methods on the independent testing dataset
Five-fold cross-validation results on circR2Disease dataset
| Validation set | AUC | Accuracy | Precision | Recall | F1_score |
|---|---|---|---|---|---|
| 1 | 0.9325 | 0.8907 | 0.9090 | 0.8800 | 0.8943 |
| 2 | 0.9643 | 0.9285 | 0.9115 | 0.9363 | 0.9237 |
| 3 | 0.9551 | 0.9201 | 0.9370 | 0.9153 | 0.9260 |
| 4 | 0.9517 | 0.9240 | 0.9292 | 0.9130 | 0.9210 |
| 5 | 0.9595 | 0.9156 | 0.8730 | 0.9649 | 0.9166 |
| Average | 0.9526 | 0.9157 | 0.9119 | 0.9219 | 0.9163 |
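For readers unfamiliar with the protocol, a five-fold split can be sketched as follows. This is a minimal illustration that assumes the samples are shuffled once and partitioned into five near-equal folds, each serving as the validation set in turn; how the authors sampled negative pairs or seeded the split is not stated in this record, and `five_fold_indices`, the seed and the sample count of 612 are placeholders.

```python
import numpy as np


def five_fold_indices(n_samples: int, n_folds: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, n_folds)  # near-equal fold sizes
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, test_idx


# Illustrative sample count only; the real sample set would also
# include whatever negative pairs are used for training.
splits = list(five_fold_indices(612))
```

Each sample appears in exactly one validation fold, and the per-fold metrics are then averaged as in the "Average" row.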
Fig. 4 ROC curves obtained by MSPCD on the circR2Disease dataset
Fig. 5 ROC curves obtained by seven methods on the circR2Disease dataset
Fig. 6 Effects of the length of the high-order feature
The comparison of different classifiers based on five-fold cross-validation
| Classifiers | AUC | Accuracy | Precision | Recall | F1_score |
|---|---|---|---|---|---|
| RF | 0.8983 | 0.7828 | 0.7903 | 0.7690 | 0.7794 |
| SVM | 0.9697 | 0.9433 | 0.9277 | 0.9617 | 0.9443 |
| DNN | 0.9763 | 0.9279 | 0.9326 | 0.9240 | 0.9274 |
| MSPCD | 0.9904 | 0.9453 | 0.9246 | 0.9702 | 0.9463 |
Fig. 7 Histograms of the results of different classifiers based on five-fold cross-validation
Top 15 circRNA-disease associations predicted by MSPCD on the circFunBase dataset
| CircRNA | Disease | Evidence (PMID) |
|---|---|---|
| hsa_circ_0067997 | Gastric cancer | PMID: 30688097 |
| hsa_circ_0082081 | Basal cell cancer | Unconfirmed |
| hsa_circ_0054537 | Coronary artery disease | Unconfirmed |
| hsa_circ_0007534 | Cervical cancer | PMID: 31445025 |
| hsa_circ_0004872 | Gastric cancer | PMID: 33172486 |
| hsa_circ_0053764 | Acute myocardial infarction | Unconfirmed |
| hsa_circ_0084192 | Cervical cancer | Unconfirmed |
| hsa_circ_0044556 | Colorectal cancer | PMID: 32884449 |
| hsa_circ_0028319 | Cutaneous squamous cell cancer | Unconfirmed |
| hsa_circ_0078616 | Ovarian aging | Unconfirmed |
| hsa_circRNA_401801 | Colorectal cancer | Unconfirmed |
| hsa_circ_0007536 | Tuberculosis | Unconfirmed |
| hsa_circ_0030428 | Hypertension | Unconfirmed |
| hsa_circ_0001361 | Bladder cancer | PMID: 31705065 |
| hsa_circ_0023546 | Cholangiocarcinoma | Unconfirmed |
Statistics of the constructed dataset
| Dataset | No. circRNAs | No. diseases | No. known associations | Association density |
|---|---|---|---|---|
| circFunBase | 2957 | 67 | 2984 | 0.0150 |
| circR2Disease | 533 | 89 | 612 | 0.0129 |
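The association density column appears to be the fraction of all possible circRNA-disease pairs that are known associations. The sketch below assumes that definition (it is not stated in this record); under it, the computed values agree with the tabulated ones to within about 1e-4. The function name is illustrative.

```python
def association_density(n_circrnas: int, n_diseases: int, n_known: int) -> float:
    """Known associations divided by the number of possible circRNA-disease pairs."""
    return n_known / (n_circrnas * n_diseases)


# Counts taken from the table above.
circfunbase_density = association_density(2957, 67, 2984)
circr2disease_density = association_density(533, 89, 612)
```

Both matrices are therefore very sparse: fewer than 2% of the possible pairs are labelled as associated.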
Fig. 8 Fusion of multi-source data to obtain circRNA similarity and disease similarity
Fig. 9 Overview of the proposed MSPCD method for predicting circRNA-disease associations. First, it takes the similarities of circRNA i and disease j as input and outputs their high-order non-linear features through three fully connected layers. Second, the dot product of these features gives the high-level interactive feature of i and j. This result is concatenated with the high-order features to form a new vector, which is fed into a DNN to predict the association between the circRNA and the disease
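The pipeline described in the Fig. 9 caption can be sketched as a forward pass in NumPy. This is a rough illustration under several assumptions, not the authors' implementation: the layer widths, ReLU activations, sigmoid output and random weights are placeholders, and "dot product" is read here as a scalar inner product (an element-wise product would be another plausible reading). Input sizes follow the circR2Disease counts (533 circRNAs, 89 diseases).

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def init_mlp(sizes):
    """Random (weight, bias) pairs for consecutive layer sizes; illustrative only."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(x, params):
    """Apply fully connected layers with ReLU activations."""
    for w, b in params:
        x = relu(x @ w + b)
    return x


def forward(circ_sim, dis_sim, circ_tower, dis_tower, head, out):
    # Step 1: three fully connected layers per entity give high-order features.
    h_circ = mlp(circ_sim, circ_tower)
    h_dis = mlp(dis_sim, dis_tower)
    # Step 2: dot product as the interactive feature (assumed scalar per pair).
    interaction = np.sum(h_circ * h_dis, axis=-1, keepdims=True)
    # Step 3: concatenate features and interaction, then feed a DNN.
    z = mlp(np.concatenate([h_circ, h_dis, interaction], axis=-1), head)
    w, b = out
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))  # association score in (0, 1)


# Hypothetical sizes: 533 circRNAs, 89 diseases, 64-dimensional features.
circ_tower = init_mlp([533, 256, 128, 64])
dis_tower = init_mlp([89, 256, 128, 64])
head = init_mlp([2 * 64 + 1, 64, 32])
out = (rng.normal(0.0, 0.1, (32, 1)), np.zeros(1))

# A batch of 4 circRNA-disease pairs with random stand-in similarity rows.
circ_sim = rng.random((4, 533))
dis_sim = rng.random((4, 89))
scores = forward(circ_sim, dis_sim, circ_tower, dis_tower, head, out)
```

With untrained random weights the scores carry no meaning; the point is only the data flow from similarity rows to a single association score per pair.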