Shijia Zhou1, Weicheng Sun1, Ping Zhang1, Li Li1,2.
Abstract
Pseudogenes were originally regarded as non-functional components scattered in the genome during evolution. Recent studies have shown that pseudogenes can be transcribed into long non-coding RNAs and play key roles at multiple functional levels in different physiological and pathological processes. MicroRNAs (miRNAs) are a class of non-coding RNAs that play important regulatory roles in cells. Numerous studies have shown that pseudogenes and miRNAs interact and form a competing endogenous RNA (ceRNA) network with mRNAs that regulates biological processes and is involved in disease. Exploring the associations between pseudogenes and miRNAs will therefore facilitate the clinical diagnosis of some diseases. Here, we propose a prediction model, PMGAE (Pseudogene-MiRNA association prediction based on the Graph Auto-Encoder), which combines feature fusion, a graph auto-encoder (GAE), and eXtreme Gradient Boosting (XGBoost). First, we calculated three types of similarity between nodes (Jaccard, cosine, and Pearson) based on the biological characteristics of pseudogenes and miRNAs. Next, we fused these similarities to construct a similarity profile that serves as the initial feature representation of each node. We then aggregated the similarity profiles and the known associations of the nodes through a GAE to obtain a low-dimensional representation vector for each node. Finally, we fed these representation vectors into an XGBoost classifier to predict new pseudogene-miRNA associations (PMAs). Five-fold cross validation shows that PMGAE achieves a mean AUC of 0.8634 and a mean AUPR of 0.8966. Case studies further substantiated the reliability of PMGAE for mining PMAs and for studying endogenous RNA networks in relation to diseases.
Keywords: ceRNA network; extreme gradient boosting; feature fusion; graph auto-encoder; microRNA; pseudogene
Year: 2021 PMID: 34966413 PMCID: PMC8710693 DOI: 10.3389/fgene.2021.781277
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Flowchart of PMGAE.
FIGURE 2. AUC (A) and AUPR (B) of PMGAE using five-fold cross validation. Insets represent the zoom-in view of local regions.
FIGURE 3. Comparison of AUC (A) and AUPR (B) of PMGAE and MF-based models.
FIGURE 4. Clustering results of nodes before (A) and after (B) embedding.
Model performance comparison using similarity profile fusions and using individual similarity profiles.
| Methods | Acc. | Sen. | Spec. | Prec. | AUC | AUPR |
|---|---|---|---|---|---|---|
| Jaccard | 0.7641 | 0.6443 | 0.8838 | 0.8475 | 0.8416 | 0.8676 |
| Pearson | 0.7633 | 0.6555 | 0.8710 | 0.8356 | 0.8381 | 0.8637 |
| Cosine | 0.7901 | 0.6491 | 0.9310 | 0.9040 | 0.8562 | 0.8872 |
| Cosine + Jaccard | 0.7927 | 0.6433 | 0.9421 | 0.9176 | 0.8607 | 0.8912 |
| Cosine + Pearson | 0.7964 | 0.6396 | 0.9533 | 0.9320 | 0.8591 | 0.8935 |
| Jaccard + Pearson | 0.7954 | 0.6460 | 0.9448 | 0.9214 | 0.8565 | 0.8913 |
| Full fusion | 0.8015 | 0.6592 | 0.9437 | 0.9216 | 0.8632 | 0.8966 |
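The three similarity measures compared in this table can be illustrated with a minimal sketch. It assumes each node is described by a binary feature vector (here, a toy association profile) and that fusion means concatenating the three similarity rows; the record does not state the actual fusion rule, so `fused_profile` and the toy matrix `assoc` are hypothetical.

```python
import math

def jaccard(u, v):
    """Jaccard similarity of two binary vectors: shared 1s over union of 1s."""
    inter = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    return inter / union if union else 0.0

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pearson(u, v):
    """Pearson correlation coefficient between two vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def fused_profile(profiles, i):
    """Assumed fusion rule: concatenate node i's Jaccard, cosine and
    Pearson similarities to every node (including itself)."""
    feats = []
    for sim in (jaccard, cosine, pearson):
        feats.extend(sim(profiles[i], p) for p in profiles)
    return feats

# Hypothetical toy data: 3 pseudogenes x 4 miRNAs, 1 = known association.
assoc = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
profile0 = fused_profile(assoc, 0)  # 3 measures x 3 nodes = 9 features
```

Under this assumption, a pairwise fusion such as "Cosine + Jaccard" would correspond to concatenating only two of the three similarity rows.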
FIGURE 5. Model performance using various embedding methods.
FIGURE 6. AUC (A) and AUPR (B) using various classifiers.
FIGURE 7. AUC and AUPR of various hidden unit setups in the first (A) and second (B) layers of GAE.
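To make the "first layer" and "second layer" hidden units concrete, the following is a rough sketch of a forward pass through a two-layer graph auto-encoder. It assumes the commonly used design of a graph-convolutional encoder with an inner-product decoder; whether PMGAE uses exactly this architecture is not stated here. The layer sizes (3 and 2), the toy graph, and the random untrained weights are all hypothetical.

```python
import math
import random

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

def gae_forward(adj, feats, w0, w1):
    """Encoder: Z = A_norm * ReLU(A_norm * X * W0) * W1.
    Decoder: sigmoid(Z * Z^T), a score for every node pair."""
    a_norm = normalize_adjacency(adj)
    hidden = relu(matmul(a_norm, matmul(feats, w0)))   # first layer
    z = matmul(a_norm, matmul(hidden, w1))             # second layer (embedding)
    z_t = [list(col) for col in zip(*z)]
    recon = [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in matmul(z, z_t)]
    return z, recon

# Hypothetical toy graph: nodes 0-1 are pseudogenes, nodes 2-3 are miRNAs.
adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
feats = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # placeholder features

random.seed(0)
w0 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]  # 4 inputs -> 3 hidden units
w1 = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]  # 3 hidden -> 2 embedding units
z, recon = gae_forward(adj, feats, w0, w1)
```

In a trained model the weights would be fitted so that `recon` approximates the known associations, and the rows of `z` would be the low-dimensional node vectors passed to the downstream classifier.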
Model performance under various setups of positive: negative sample ratios.
| Evaluation metric | 1:1 | 1:2 | 1:5 | 1:10 | 1:20 |
|---|---|---|---|---|---|
| AUC | 0.8632 | 0.8548 | 0.8557 | 0.8596 | 0.8626 |
| AUPR | 0.8966 | 0.8388 | 0.7653 | 0.7193 | 0.6693 |
| Acc. | 0.8015 | 0.8523 | 0.9218 | 0.9554 | 0.9753 |
| Sen. | 0.6592 | 0.6008 | 0.5594 | 0.5419 | 0.5196 |
| Spec. | 0.9437 | 0.9782 | 0.9943 | 0.9968 | 0.9981 |
| Prec. | 0.9216 | 0.9323 | 0.9513 | 0.9447 | 0.9323 |
| MCC | 0.6292 | 0.6646 | 0.6938 | 0.6965 | 0.6858 |
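The threshold-based metrics in this table follow from a confusion matrix, and a small sketch helps explain why accuracy rises with class imbalance while sensitivity falls. The counts below are hypothetical, chosen only to roughly match the rates reported in the 1:20 column (1,000 positives and 20,000 negatives); they are not taken from the study.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Standard metrics derived from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "acc": (tp + tn) / (tp + fp + tn + fn),   # overall accuracy
        "sen": tp / (tp + fn),                    # sensitivity (recall)
        "spec": tn / (tn + fp),                   # specificity
        "prec": tp / (tp + fp),                   # precision
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,  # Matthews corr. coef.
    }

# Hypothetical counts approximating the 1:20 setting.
m = confusion_metrics(tp=520, fp=38, tn=19962, fn=480)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is about 0.975 even though only about half of the positives are recovered, because the many correctly rejected negatives dominate the total. MCC uses all four cells of the confusion matrix and is therefore less inflated by the majority class.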
The top 15 candidate miRNAs associated with pseudogenes RPLP0P2, HLA-H, and HLA-J and the evidence from starBase.
| Rank | RPLP0P2 miRNA | RPLP0P2 starBase | HLA-H miRNA | HLA-H starBase | HLA-J miRNA | HLA-J starBase |
|---|---|---|---|---|---|---|
| 1 | hsa-miR-15a-5p | Yes | hsa-miR-15a-5p | Yes | hsa-miR-497-5p | Yes |
| 2 | hsa-miR-424-5p | Yes | hsa-miR-15b-5p | Yes | hsa-miR-424-5p | Yes |
| 3 | hsa-miR-15b-5p | Yes | hsa-miR-16-5p | Yes | hsa-miR-195-5p | Yes |
| 4 | hsa-miR-195-5p | Yes | hsa-miR-195-5p | Yes | hsa-miR-16-5p | Yes |
| 5 | hsa-miR-497-5p | Yes | hsa-miR-497-5p | Yes | hsa-miR-15b-5p | Yes |
| 6 | hsa-miR-16-5p | Yes | hsa-miR-424-5p | Yes | hsa-miR-15a-5p | Yes |
| 7 | hsa-miR-34c-5p | No | hsa-miR-199b-5p | No | hsa-miR-23c | Yes |
| 8 | hsa-miR-449a | No | hsa-miR-3619-5p | Yes | hsa-miR-103a-3p | No |
| 9 | hsa-miR-378b | No | hsa-miR-761 | Yes | hsa-miR-204-5p | No |
| 10 | hsa-miR-320c | Yes | hsa-miR-106b-5p | No | hsa-miR-3619-5p | Yes |
| 11 | hsa-miR-761 | Yes | hsa-miR-125a-5p | Yes | hsa-miR-134-5p | Yes |
| 12 | hsa-miR-99a-5p | No | hsa-miR-4319 | Yes | hsa-miR-613 | No |
| 13 | hsa-miR-320d | Yes | hsa-miR-146a-5p | No | hsa-miR-29b-3p | Yes |
| 14 | hsa-let-7d-5p | Yes | hsa-miR-875-5p | Yes | hsa-miR-125b-3p | No |
| 15 | hsa-let-7b-5p | Yes | hsa-miR-503-5p | Yes | hsa-miR-761 | Yes |