Syed Danish Ali, Hilal Tayara, Kil To Chong.
Abstract
Piwi-interacting RNAs (piRNAs) play a pivotal role in maintaining genome integrity through repression of transposable elements and regulation of gene stability, and they are associated with the progression of various diseases. Cost-efficient computational methods for identifying piRNA disease associations can improve the efficacy of disease-specific drug development. To this end, we developed piRDA, a simple, robust, and efficient deep learning method for identifying piRNA disease associations. The proposed architecture extracts the most significant and abstract information from raw sequences represented as a simplified piRNA disease pair, without any feature engineering. Two-step positive unlabeled learning and a bootstrapping technique are used to avoid the false-negative and biased predictions that arise when dealing with positive unlabeled data. The performance of piRDA is evaluated using k-fold cross-validation. Compared with state-of-the-art methods, piRDA is significantly better on all performance evaluation measures for the identification of piRNA disease associations. The proposed computational method could therefore serve as a supportive and practical tool for research on basic disease mechanisms and for pharmaceutical research, both in academia and in drug design. The proposed model can be accessed through a publicly available and user-friendly web tool at http://nsclbio.jbnu.ac.kr/tools/piRDA/.
Keywords: Convolutional Neural Network; Deep learning; Positive unlabeled learning; Reliable negative sample; Sequence analysis; Web-server; piRNA disease associations
Year: 2022 PMID: 35317234 PMCID: PMC8908038 DOI: 10.1016/j.csbj.2022.02.026
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. The overall workflow of the proposed piRDA architecture for identifying piRNA disease associations.
Fig. 2. Illustration of reliable negative selection. (a) Positive and unlabeled data samples. (b) Training with random negatives. (c) Unlabeled samples arranged according to their prediction scores.
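The reliable negative selection of Fig. 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy nearest-centroid scorer (`fit_score_fn`) stands in for the network, the 2-D points are made-up data, and the helper names, the number of bootstrap rounds, and the number of negatives kept are all hypothetical.

```python
import random

def centroid(rows):
    # Column-wise mean of a list of equal-length tuples
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_score_fn(pos, neg):
    # Toy stand-in classifier (assumption): a higher score means the sample
    # is closer to the positive centroid than to the negative centroid
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, cn) - sq_dist(x, cp)

def select_reliable_negatives(pos, unlabeled, n_rounds=10, n_keep=None, seed=0):
    rng = random.Random(seed)
    n_keep = n_keep or len(pos)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        # Step 1: treat a bootstrap draw of unlabeled samples as negatives
        drawn = set(rng.choices(range(len(unlabeled)), k=len(pos)))
        score = fit_score_fn(pos, [unlabeled[i] for i in drawn])
        # Score only the unlabeled samples left out of this draw
        for i, x in enumerate(unlabeled):
            if i not in drawn:
                totals[i] += score(x)
                counts[i] += 1
    # Rank by average score; the lowest-scoring samples are kept as reliable negatives
    ranked = sorted((totals[i] / counts[i], i) for i in range(len(unlabeled)) if counts[i])
    return [i for _, i in ranked[:n_keep]]

# Made-up data: labeled positives, then unlabeled = 5 hidden positives + 10 true negatives
positives = [(2.0, 2.0), (2.2, 1.8), (1.8, 2.2), (2.1, 2.1), (1.9, 1.9)]
hidden_pos = [(1.5, 1.5), (1.6, 1.4), (1.4, 1.6), (1.5, 1.6), (1.6, 1.5)]
true_neg = [(-2.0 - 0.1 * i, -2.0 + 0.1 * i) for i in range(10)]
unlabeled = hidden_pos + true_neg

reliable = select_reliable_negatives(positives, unlabeled, n_keep=5)
# Step 2: retrain on positives versus the selected reliable negatives
final_score = fit_score_fn(positives, [unlabeled[i] for i in reliable])
```

In this toy setup the hidden positives (indices 0 to 4) are never selected, so the second-step model is trained without the false negatives that a purely random negative sample could contain.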
Diseases and their corresponding one-hot vectors (DAOHV).
| No. | Disease | DAOHV |
|---|---|---|
| 1 | Renal cell carcinoma | [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] |
| 2 | Lung cancer | [0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] |
| 3 | Breast cancer | [0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] |
| 4 | Pancreatic carcinoma | [0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] |
| 5 | Head and neck (squamous cell) carcinoma | [0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] |
| 6 | Lung cancer (lung adenocarcinoma) | [0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] |
| 7 | Alzheimer’s disease | [0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0] |
| 8 | Cardiovascular diseases (CDC, CF, CCS) cardiac regeneration | [0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0] |
| 9 | Head and neck cancer | [0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0] |
| 10 | Gastric cancer | [0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0] |
| 11 | Colon cancer | [0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0] |
| 12 | Non-small cell lung carcinoma (NSCLC) | [0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0] |
| 13 | Prostate cancer | [0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0] |
| 14 | Dysplastic liver nodules and hepatocellular carcinoma | [0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0] |
| 15 | Rheumatoid arthritis | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0] |
| 16 | Testicular germ cell carcinoma | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0] |
| 17 | Endometrial carcinogenesis | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0] |
| 18 | Male infertility | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0] |
| 19 | Leukemia | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0] |
| 20 | Heart stroke | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0] |
| 21 | Ovarian cancer | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1] |
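The encoding in the table above follows a simple rule: the vector has one position per disease, and only the position matching the disease number is set to 1. A minimal sketch, in which the `DISEASES` list and the `disease_one_hot` helper are illustrative names rather than part of piRDA:

```python
# Disease names in table order (position = table number - 1)
DISEASES = [
    "Renal cell carcinoma", "Lung cancer", "Breast cancer", "Pancreatic carcinoma",
    "Head and neck (squamous cell) carcinoma", "Lung cancer (lung adenocarcinoma)",
    "Alzheimer's disease", "Cardiovascular diseases (CDC, CF, CCS) cardiac regeneration",
    "Head and neck cancer", "Gastric cancer", "Colon cancer",
    "Non-small cell lung carcinoma (NSCLC)", "Prostate cancer",
    "Dysplastic liver nodules and hepatocellular carcinoma", "Rheumatoid arthritis",
    "Testicular germ cell carcinoma", "Endometrial carcinogenesis", "Male infertility",
    "Leukemia", "Heart stroke", "Ovarian cancer",
]

def disease_one_hot(name):
    """Return a 21-element list with a single 1 at the disease's position."""
    vec = [0] * len(DISEASES)
    vec[DISEASES.index(name)] = 1  # raises ValueError for an unknown disease
    return vec

lung = disease_one_hot("Lung cancer")
ovarian = disease_one_hot("Ovarian cancer")
```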
Fig. 3. Detailed architecture of the proposed method piRDA. The convolutional block comprises a convolution layer with ReLU activation, followed by group normalization and max-pooling layers. The dense block consists of two fully connected layers with dropout and ReLU activation, followed by a sigmoid activation function that outputs the predicted association score.
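The layer sequence named in the Fig. 3 caption can be traced with a small NumPy forward pass. This is only a sketch under assumptions: the input shape (4 channels by 32 positions), the filter count, kernel size, pooling size, hidden width, and the order of normalization and pooling are hypothetical choices, the weights are random, and the group normalization omits learned scale and shift parameters.

```python
import numpy as np

def conv1d_relu(x, w, b):
    """Valid 1-D convolution plus ReLU. x: (C_in, L), w: (C_out, C_in, K)."""
    c_out, _, k = w.shape
    out_len = x.shape[1] - k + 1
    out = np.empty((c_out, out_len))
    for t in range(out_len):
        # Contract over input channels and kernel positions
        out[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1])) + b
    return np.maximum(out, 0.0)

def group_norm(x, groups, eps=1e-5):
    """Normalize each group of consecutive channels to zero mean, unit variance."""
    c, length = x.shape
    g = x.reshape(groups, -1)
    g = (g - g.mean(axis=1, keepdims=True)) / np.sqrt(g.var(axis=1, keepdims=True) + eps)
    return g.reshape(c, length)

def max_pool1d(x, size):
    """Non-overlapping max pooling along the length axis."""
    c, length = x.shape
    trimmed = (length // size) * size
    return x[:, :trimmed].reshape(c, -1, size).max(axis=2)

def forward(x, p):
    h = conv1d_relu(x, p["conv_w"], p["conv_b"])      # convolution + ReLU
    h = group_norm(h, groups=2)                       # group normalization
    h = max_pool1d(h, size=2).reshape(-1)             # max pooling, then flatten
    h = np.maximum(p["fc1_w"] @ h + p["fc1_b"], 0.0)  # dense + ReLU (dropout is a no-op at inference)
    z = p["fc2_w"] @ h + p["fc2_b"]                   # second dense layer
    return 1.0 / (1.0 + np.exp(-z))                   # sigmoid score in (0, 1)

rng = np.random.default_rng(0)
c_in, length, c_out, k, hidden = 4, 32, 8, 3, 16      # hypothetical sizes
flat = c_out * ((length - k + 1) // 2)
params = {
    "conv_w": 0.1 * rng.normal(size=(c_out, c_in, k)), "conv_b": np.zeros(c_out),
    "fc1_w": 0.1 * rng.normal(size=(hidden, flat)), "fc1_b": np.zeros(hidden),
    "fc2_w": 0.1 * rng.normal(size=(1, hidden)), "fc2_b": np.zeros(1),
}
x = np.eye(c_in)[rng.integers(0, c_in, size=length)].T  # random one-hot input, shape (4, 32)
score = float(forward(x, params)[0])
```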
Summary of performance comparison of piRDA with existing methods for identifying piRNA disease associations.
| Metric | piRDA | iPiDA-sHN | iPiDi-PUL |
|---|---|---|---|
| Acc | 0.913 | 0.736 | 0.589 |
| Sn | 0.909 | 0.779 | 0.281 |
| Sp | 0.918 | 0.694 | 0.897 |
| Mcc | 0.827 | – | – |
| RI | 0.056 | 0.307 | 0.322 |
| AUC | 0.951 | 0.887 | 0.856 |
| AUPRC | 0.931 | 0.834 | 0.764 |
"–" denotes not applicable.
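For reference, the threshold-based measures in the table (Acc, Sn, Sp, Mcc) follow from the four confusion-matrix counts using their standard definitions. The sketch below assumes those standard definitions; the counts in the example are made up and are not piRDA results, and RI, AUC, and AUPRC are not covered.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Standard accuracy, sensitivity, specificity, and Matthews correlation."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sn = tp / (tp + fn)   # sensitivity: fraction of true positives recovered
    sp = tn / (tn + fp)   # specificity: fraction of true negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "Mcc": mcc}

# Made-up counts for illustration only
example = binary_metrics(tp=90, fn=10, tn=80, fp=20)
```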
Fig. 4. ROC curves of the five-fold cross-validation, with the associated AUC and standard deviation.
Fig. 5. Precision-recall curves (PRC) of the five-fold cross-validation, with the associated AUPRC and standard deviation.
Fig. 6. ROC curves, with AUC and standard deviation, of the sub 10-fold cross-validation.
Fig. 7. Precision-recall curves (PRC), with AUC and standard deviation, of the sub 10-fold cross-validation.
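The cross-validation summaries in Figs. 4 to 7 rest on two steps: partitioning the samples into k folds, then reporting the mean and standard deviation of a per-fold score. A minimal sketch of both, assuming a plain shuffled split (the paper's exact splitting scheme is not stated here); the per-fold AUC values are hypothetical placeholders, not results from piRDA.

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once and deal them into k near-equal test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

splits = list(train_test_splits(n_samples=23, k=5))

# Hypothetical per-fold AUC values, for illustration only
fold_auc = [0.94, 0.95, 0.96, 0.95, 0.95]
mean_auc = statistics.mean(fold_auc)
std_auc = statistics.stdev(fold_auc)  # sample standard deviation across folds
```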
Fig. 8. Clusters of positive and negative piRNA disease association features, obtained from the hidden-layer activations of the proposed method and visualized using UMAP.
Fig. 9. Comparison of evaluation measures between piRDA and existing methods for identifying piRNA disease associations.
Summary of piRDA performance for identifying piRNA disease associations using independent piRNA IDs.
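| piRNA ID | Association | Reported |
|---|---|---|
| piR-hsa-23317 | Cardiovascular diseases | Li et al. |
| piR-hsa-1207 | Cardiovascular diseases | Li et al. |
| piR-hsa-24016 | Cardiovascular diseases | Li et al. |
| piR-hsa-26593 | Cardiovascular diseases | Li et al. |
| piR-hsa-29114 | Cardiovascular diseases | Li et al. |
| piR-hsa-26686 | Renal cell carcinoma | Wu et al. |
| piR-hsa-20266 | Renal cell carcinoma | Fu et al. |
| piR-hsa-25783 | Alzheimer disease | Roy et al. |
| piR-hsa-28467 | Alzheimer disease | Roy et al. |
| piR-hsa-24016 | Alzheimer disease | Roy et al. |
| piR-hsa-2107 | Alzheimer disease | Roy et al. |
| piR-hsa-820 | Alzheimer disease | Roy et al. |
| piR-hsa-515 | Alzheimer disease | Roy et al. |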
piRNA IDs refer to piRBase [28].