Ruifen Cao, Chuan He, Pijing Wei, Yansen Su, Junfeng Xia, Chunhou Zheng.
Abstract
Circular RNAs (circRNAs) are covalently closed, single-stranded RNA molecules with many biological functions. Previous experiments have shown that circRNAs are involved in numerous biological processes, particularly regulatory ones, and that they are associated with complex human diseases. Predicting the associations between circRNAs and diseases (circRNA-disease associations) is therefore useful for disease prevention, diagnosis and treatment. In this work, we propose a novel computational approach, GGCDA, based on the Graph Attention Network (GAT) and the Graph Convolutional Network (GCN), to predict circRNA-disease associations. First, GGCDA combines circRNA sequence similarity, disease semantic similarity and the corresponding Gaussian interaction profile kernel similarities, and then applies the random walk with restart (RWR) algorithm to obtain preliminary features of circRNAs and diseases. Second, a heterogeneous graph is constructed from the known circRNA-disease association network and the calculated circRNA and disease similarities. Third, a multi-head GAT is adopted to assign different weights to circRNA and disease features, and a GCN is then employed to aggregate each node's own features with those of its adjacent nodes, yielding multi-view circRNA and disease features. Finally, a multi-layer fully connected neural network predicts the associations between circRNAs and diseases. In comparison with state-of-the-art methods, GGCDA achieves AUC values of 0.9625 and 0.9485 under fivefold cross-validation on two datasets, and an AUC of 0.8227 on the independent test set. Case studies further demonstrate that our approach is promising for discovering potential circRNA-disease associations.
Keywords: circRNA-disease associations; circular RNAs; graph attention network; graph convolutional network; random walk with restart algorithm
Year: 2022 PMID: 35883487 PMCID: PMC9313348 DOI: 10.3390/biom12070932
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
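The first step described in the abstract (Gaussian interaction profile kernel similarity followed by a random walk with restart) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the bandwidth parameter `gamma_prime`, the restart probability `restart`, the row normalisation of the similarity matrix, and the toy association matrix are all assumptions made for illustration. The kernel uses the commonly used definition in which the bandwidth is scaled by the mean squared norm of the interaction profiles.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Each row is the binary association profile of one entity (for example,
    one circRNA against all diseases). `gamma_prime` is an assumed default.
    """
    n = len(profiles)
    # Bandwidth: gamma' divided by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


def random_walk_with_restart(sim, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker restarting at node `seed`."""
    n = len(sim)
    # Row-normalise the similarity matrix into transition probabilities.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total if total > 0 else 0.0 for v in row])
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        # Follow an edge with probability (1 - restart), else jump back to the seed.
        nxt = [
            (1 - restart) * sum(p[i] * trans[i][j] for i in range(n)) + restart * p0[j]
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy data: 3 circRNAs (rows) x 3 diseases (columns).
associations = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
circ_sim = gip_kernel(associations)
circ_features = [random_walk_with_restart(circ_sim, s) for s in range(len(circ_sim))]
```

In this sketch, each row of `circ_features` is the converged probability vector for one seed circRNA, which is one plausible way to read "preliminary features"; the same two functions applied to the transposed association matrix would give the disease side.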
The datasets used.
| Datasets | circRNA Numbers | Disease Numbers | Associations |
|---|---|---|---|
| CircR2Disease | 590 | 88 | 651 |
| DATA | 809 | 119 | 944 |
| MNDR | 2175 | 154 | 2785 |
Figure 1. Illustration of the heterogeneous network.
Figure 2. The framework of GGCDA, composed of four parts: (I) feature fusion and initialization; (II) circRNA-disease heterogeneous network construction; (III) feature representation based on the combination of multi-head GAT and GCN; (IV) fully connected layer for prediction.
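Parts (II) and (III) of the framework can be sketched in a few lines of pure Python. The block layout of the heterogeneous adjacency matrix, the symmetric normalisation with self-loops, and all function names below are assumptions for illustration; the record does not spell out these details. The sketch also omits the learned weight matrices, the nonlinearity, and the multi-head attention step, so it shows only the neighbourhood aggregation idea behind a GCN layer.

```python
import math


def build_hetero_adjacency(circ_sim, dis_sim, assoc):
    """Stack [[circ_sim, assoc], [assoc^T, dis_sim]] into one square matrix.

    circ_sim: n_c x n_c circRNA similarity, dis_sim: n_d x n_d disease
    similarity, assoc: n_c x n_d known circRNA-disease associations.
    """
    n_c, n_d = len(circ_sim), len(dis_sim)
    top = [list(circ_sim[i]) + list(assoc[i]) for i in range(n_c)]
    bottom = [[assoc[i][j] for i in range(n_c)] + list(dis_sim[j]) for j in range(n_d)]
    return top + bottom


def gcn_propagate(adj, features):
    """One weight-free aggregation step: D^-1/2 (A + I) D^-1/2 X."""
    n = len(adj)
    # Add self-loops so each node keeps its own features when aggregating.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    dim = len(features[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            w = inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j]
            for k in range(dim):
                row[k] += w * features[j][k]
        out.append(row)
    return out


# Hypothetical toy data: 2 circRNAs and 1 disease.
circ_sim = [[1.0, 0.5], [0.5, 1.0]]
dis_sim = [[1.0]]
assoc = [[1], [0]]
adj = build_hetero_adjacency(circ_sim, dis_sim, assoc)
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
aggregated = gcn_propagate(adj, identity)
```

With identity input features, `aggregated` is simply the normalised adjacency matrix, which makes it easy to inspect how strongly each circRNA or disease node draws on its neighbours.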
Results of fivefold cross-validation (FFCV) on CircR2Disease achieved by GGCDA.
| Fold | AUC | AUPR | ACC | PRE | REC | F1-Score |
|---|---|---|---|---|---|---|
| 1 | 0.9753 | 0.9743 | 0.8969 | 0.8421 | 0.9771 | 0.9046 |
| 2 | 0.9718 | 0.9608 | 0.9346 | 0.8951 | 0.9846 | 0.9377 |
| 3 | 0.9665 | 0.9555 | 0.9077 | 0.8581 | 0.9769 | 0.9136 |
| 4 | 0.9863 | 0.9847 | 0.8885 | 0.8217 | 0.9923 | 0.8989 |
| 5 | 0.9631 | 0.9349 | 0.9038 | 0.8387 | 1.0000 | 0.9123 |
| Average | 0.9726 | 0.9620 | 0.9063 | 0.8511 | 0.9861 | 0.9134 |
| Average (10) | 0.9625 | 0.9422 | 0.9172 | 0.8700 | 0.9822 | 0.9224 |
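The columns in the table above are related by standard definitions, which the following minimal sketch makes explicit. The function names and the example confusion-matrix counts are hypothetical; only the fold values passed to `f1_score` and `fold_mean` are taken from the table. F1 is the harmonic mean of precision (PRE) and recall (REC), and the "Average" row is the arithmetic mean over the five folds.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "PRE": precision,
        "REC": recall,
        "F1": f1_score(precision, recall),
    }


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fold_mean(values):
    """Arithmetic mean of a per-fold metric."""
    return sum(values) / len(values)


# Fold 1 of the table: PRE = 0.8421, REC = 0.9771.
fold1_f1 = round(f1_score(0.8421, 0.9771), 4)  # 0.9046, as reported

# AUC over the five folds of the table.
mean_auc = round(fold_mean([0.9753, 0.9718, 0.9665, 0.9863, 0.9631]), 4)  # 0.9726

# Hypothetical counts, only to show the call shape.
example = metrics_from_counts(tp=8, fp=2, tn=7, fn=3)
```

Recomputing F1 from the reported precision and recall in this way is a quick consistency check on any row of the table.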
Figure 3. AUC and AUPR curves of FFCV obtained by GGCDA on CircR2Disease.
Figure 4. Impact of different parameters on model performance: (a) Comparison of AUC and AUPR values for different numbers of attention heads. (b) Comparison of AUC and AUPR values for different numbers of GAT layers. (c) Comparison of AUC and AUPR values for different numbers of GCN layers. (d) Comparison of AUC and AUPR values for different embedding sizes. (e) Comparison of AUC and AUPR values for different penalty factors.
Figure 5. Performance comparison with different components.
The FFCV AUC values achieved by the various models.
| Methods | DMMCDA | NCPCDA | GCNCDA | RWRKNN | GATCDA | GGCDA |
|---|---|---|---|---|---|---|
| AUC | 0.9598 | 0.9201 | 0.9090 | 0.9333 | 0.9011 | 0.9625 |
Results of FFCV on DATA achieved by GGCDA.
| Fold | AUC | AUPR | ACC | PRE | REC | F1-Score |
|---|---|---|---|---|---|---|
| 1 | 0.9608 | 0.9522 | 0.9312 | 0.9095 | 0.9577 | 0.9330 |
| 2 | 0.9553 | 0.9375 | 0.9339 | 0.9100 | 0.9630 | 0.9357 |
| 3 | 0.9503 | 0.9174 | 0.9339 | 0.9184 | 0.9523 | 0.9351 |
| 4 | 0.9685 | 0.9620 | 0.9206 | 0.8804 | 0.9735 | 0.9246 |
| 5 | 0.9511 | 0.9537 | 0.9069 | 0.8964 | 0.9202 | 0.9081 |
| Average | 0.9572 | 0.9446 | 0.9253 | 0.9029 | 0.9534 | 0.9273 |
| Average (10) | 0.9485 | 0.9266 | 0.9116 | 0.8827 | 0.9505 | 0.9150 |
Figure 6. AUC and AUPR curves of FFCV obtained by GGCDA on DATA.
Results of the GGCDA on the independent test set.
| AUC | AUPR | ACC | PRE | REC | F1-Score |
|---|---|---|---|---|---|
| 0.8227 | 0.7836 | 0.7832 | 0.7651 | 0.8173 | 0.7903 |
The top 10 hepatocellular carcinoma-related candidate circRNAs.
| Disease | circRNA | PMID |
|---|---|---|
| Hepatocellular carcinoma | hsa_circ_0000284 | 29415990 |
| | hsa_circ_0001141 | 28636993 |
| | hsa_circ_0001946 | 28892615 |
| | hsa_circ_0001649 | 26600397 |
| | hsa_circRNA_102049 | 28710406 |
| | hsa_circ_0001445 | 29378234 |
| | hsa_circ_0001821 | unconfirmed |
| | hsa_circ_0067934 | 29458020 |
| | hsa_circ_0023404 | unconfirmed |
| | hsa_circRNA_103387 | 28710406 |
The top 10 breast cancer-related candidate circRNAs.
| Disease | circRNA | PMID |
|---|---|---|
| Breast cancer | hsa_circ_0000284 | 27050392 |
| | hsa_circ_0001141 | unconfirmed |
| | hsa_circ_0001946 | 28049499 |
| | hsa_circ_0007534 | 29593432 |
| | hsa_circ_0001821 | 27928058 |
| | hsa_circ_0001313 | 28249903 |
| | circ-Foxo3 | 27886165 |
| | hsa_circ_0014717 | unconfirmed |
| | hsa_circ_0002113 | 28803498 |
| | hsa_circ_0004771 | 28484086 |
The top 10 colorectal cancer-related candidate circRNAs.
| Disease | circRNA | PMID |
|---|---|---|
| Colorectal cancer | hsa_circ_000753 | 29364478 |
| | hsa_circ_000114 | 26110611 |
| | hsa_circ_000131 | 28249903 |
| | hsa_circ_000182 | 30591054 |
| | hsa_circ_000194 | 28174233 |
| | hsa_circ_000028 | 27050392 |
| | hsa_circ_000164 | 29421663 |
| | hsa_circ_0067934 | unconfirmed |
| | hsa_circ_001471 | 29571246 |
| | hsa_circ_000050 | 28656150 |