Jinlong Li¹, Xingyu Chen¹, Qixing Huang¹, Yang Wang², Yun Xie¹, Zong Dai², Xiaoyong Zou³, Zhanchao Li⁴,⁵.
Abstract
Increasing evidence indicates that miRNAs play vital roles in biological processes and are closely related to various human diseases. Research on miRNA-disease associations benefits not only disease prevention, diagnosis and treatment, but also new drug identification and lead compound discovery. A novel sequence- and symptom-based random forest model (Seq-SymRF) was developed to identify potential associations between miRNAs and diseases. Features derived from sequence information and from clinical symptoms were used to characterize miRNAs and diseases, respectively. In addition, a clustering method based on Euclidean distance was adopted to construct reliable negative samples. In fivefold cross-validation, Seq-SymRF achieved an accuracy of 98.00%, a specificity of 99.43%, a sensitivity of 96.58%, a precision of 99.40% and a Matthews correlation coefficient of 0.9604. The areas under the receiver operating characteristic curve and the precision-recall curve were 0.9967 and 0.9975, respectively. Case studies were also conducted on leukemia, breast neoplasms and hsa-mir-21. Most of the top-25 predicted disease-related miRNAs (19/25 for leukemia; 20/25 for breast neoplasms) and 15 of the top-25 predicted miRNA-related diseases were confirmed by the literature and the dbDEMC database. Seq-SymRF is expected to serve as a powerful high-throughput virtual screening tool for drug research and development. All source code can be downloaded from https://github.com/LeeKamlong/Seq-SymRF.
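The five threshold-dependent scores quoted above follow the standard confusion-matrix definitions. The sketch below is a minimal illustration, not the Seq-SymRF code: it computes the scores from hypothetical counts of true/false positives and negatives, and the function name `classification_metrics` and the dictionary keys are our own.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity, specificity, precision and MCC from a confusion matrix.

    Standard definitions; the function name and keys are illustrative only.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    sen = tp / (tp + fn)  # sensitivity = recall = true positive rate
    spe = tn / (tn + fp)  # specificity = true negative rate
    pre = tp / (tp + fp)  # precision = positive predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"Acc": acc, "Sen": sen, "Spe": spe, "Pre": pre, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for 100 positive and 100 negative miRNA-disease pairs.
    metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

With these hypothetical counts the script prints Acc 0.9250, Sen 0.9000, Spe 0.9500, Pre 0.9474 and MCC 0.8511. When positive and negative test samples are equally numerous, accuracy reduces to the mean of sensitivity and specificity; the reported figures are consistent with this, since (96.58% + 99.43%)/2 ≈ 98.0%.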
Year: 2020 PMID: 33087810 PMCID: PMC7578641 DOI: 10.1038/s41598-020-75005-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Flowchart of the proposed method.
Figure 2. Similarity and statistical results for the benchmark data set. (A–C) Similarity between any two miRNAs, any two diseases and any two miRNA-disease associations, respectively. (D) Distribution of the miRNA, disease and miRNA-disease association similarity values.
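Figure 2 implies that every miRNA, disease and miRNA-disease association is represented as a vector whose pairwise similarity can be measured. The exact features are not specified here, so the sketch below assumes normalized k-mer frequencies for the miRNA sequence, a binary symptom vector for the disease, simple concatenation to form an association vector, and cosine similarity as the measure. All function names and example inputs are hypothetical.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGU"


def kmer_composition(seq: str, k: int = 3) -> list[float]:
    """Normalized k-mer frequency vector (length 4**k) of an RNA sequence."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing ambiguous bases such as N
            counts[index[kmer]] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def association_vector(mirna_seq: str, symptoms: list[int], k: int = 3) -> list[float]:
    """Hypothetical association feature: miRNA k-mer profile + disease symptom flags."""
    return kmer_composition(mirna_seq, k) + [float(s) for s in symptoms]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Made-up miRNA-like sequences and symptom vectors (1 = symptom present).
    a = association_vector("ACGUACGGAUCCUAGCAUGCAA", [1, 0, 1, 1])
    b = association_vector("ACGUUCGGAUCAUAGCUUGCAA", [1, 0, 1, 0])
    print(f"association similarity: {cosine_similarity(a, b):.3f}")
```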
Figure 3. Comparison of different negative sample selection strategies. (A) Comparison of prediction results between the two negative sample selection methods. (B) ROC and PR curves for each selection strategy. Negative samples selected from the reliable negative samples (RN) are shown in blue, and those selected from the unlabeled negative samples (UN) are shown in red.
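The abstract states that reliable negatives (RN) were obtained by clustering with Euclidean distance but gives no further detail. As a simplified stand-in, the sketch below ranks unlabeled miRNA-disease pairs by their Euclidean distance to the centroid of the known positive pairs and keeps the farthest ones as negatives. This centroid heuristic and all names are assumptions for illustration, not the published procedure.

```python
import math


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_reliable_negatives(positives, unlabeled, n_negatives):
    """Indices of the n_negatives unlabeled vectors farthest from the positive centroid.

    Hypothetical heuristic: pairs that lie far from every known association
    are assumed to be unlikely true associations.
    """
    center = centroid(positives)
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: euclidean(unlabeled[i], center),
        reverse=True,  # farthest first
    )
    return ranked[:n_negatives]


if __name__ == "__main__":
    positives = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]]  # toy feature vectors
    unlabeled = [[1.0, 1.0], [5.0, 5.0], [0.0, 0.0], [-3.0, 4.0]]
    print(select_reliable_negatives(positives, unlabeled, n_negatives=2))
```

On the toy data the script prints `[1, 3]`: the two unlabeled points farthest from the positive centroid.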
Figure 4. Comparison of seven evaluation metrics across different thresholds.
Figure 5. Comparison of results across different positive-to-negative sample ratios.
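Figure 5 varies the ratio of positive to negative samples. One simple way to build such training sets, sketched below under the assumption that negatives are drawn at random without replacement from a pool such as the reliable negatives, is to sample `ratio` negatives per positive. The function name and its parameters are hypothetical.

```python
import random


def build_training_set(positives, negative_pool, ratio=1, seed=0):
    """Return shuffled (sample, label) pairs with `ratio` negatives per positive."""
    n_neg = ratio * len(positives)
    if n_neg > len(negative_pool):
        raise ValueError("negative pool is too small for the requested ratio")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    negatives = rng.sample(negative_pool, n_neg)  # sampling without replacement
    samples = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    rng.shuffle(samples)
    return samples


if __name__ == "__main__":
    pos = [f"pos{i}" for i in range(10)]
    pool = [f"neg{i}" for i in range(100)]
    for r in (1, 2, 3):
        data = build_training_set(pos, pool, ratio=r)
        # Prints ratio, total samples and number of positives.
        print(r, len(data), sum(label for _, label in data))
```

The script prints `1 20 10`, `2 30 10` and `3 40 10`: the positives stay fixed while the negative share grows with the ratio.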
Results of fivefold cross-validation on the different non-redundant data sets.
| Threshold | Acc (%) | Sen (%) | Spe (%) | Pre (%) | MCC | AUROC | AUPRC |
|---|---|---|---|---|---|---|---|
| 0.9 | 97.98 | 96.34 | 99.61 | 99.60 | 0.9601 | 0.9968 | 0.9974 |
| 0.8 | 97.87 | 96.42 | 99.32 | 99.30 | 0.9578 | 0.9955 | 0.9965 |
| 0.7 | 97.39 | 95.47 | 99.31 | 99.29 | 0.9486 | 0.9973 | 0.9978 |
| 0.6 | 97.57 | 95.35 | 99.80 | 99.80 | 0.9524 | 0.9950 | 0.9964 |
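The table values come from fivefold cross-validation. A minimal stratified split, which keeps the positive-to-negative proportion roughly equal in every fold, can be written in plain Python as follows; whether the original experiments stratified the folds is an assumption here, and the function name is hypothetical. scikit-learn's `StratifiedKFold` (with `shuffle=True`) provides comparable behavior.

```python
import random


def stratified_kfold_indices(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal class members round-robin into folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


if __name__ == "__main__":
    labels = [1] * 50 + [0] * 50  # hypothetical balanced data set
    for fold, (train, test) in enumerate(stratified_kfold_indices(labels), start=1):
        n_pos = sum(labels[i] for i in test)
        print(f"fold {fold}: train={len(train)} test={len(test)} test positives={n_pos}")
```

Every sample lands in exactly one test fold, so averaging the per-fold metrics yields a single cross-validated estimate for each row of the table.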
Results on different non-redundant data sets.
| Threshold | Acc (%) | Sen (%) | Spe (%) | Pre (%) | MCC | AUROC | AUPRC |
|---|---|---|---|---|---|---|---|
| 0.7 | 97.89 | 96.40 | 99.37 | 99.35 | 0.9582 | 0.9958 | 0.9967 |
| 0.6 | 98.03 | 96.51 | 99.55 | 99.54 | 0.9609 | 0.9955 | 0.9963 |
| 0.5 | 97.97 | 96.52 | 99.43 | 99.41 | 0.9598 | 0.9974 | 0.9979 |
| 0.4 | 97.90 | 96.41 | 99.40 | 99.38 | 0.9550 | 0.9962 | 0.9975 |
| 0.3 | 97.73 | 96.13 | 99.31 | 99.30 | 0.9550 | 0.9977 | 0.9981 |
| 0.2 | 97.33 | 96.04 | 98.68 | 98.61 | 0.9469 | 0.9934 | 0.9942 |
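The threshold column presumably refers to a similarity cutoff used to remove redundant entries before evaluation; this is an assumption, since the exact similarity measure is not given here. A common greedy filter keeps an entry only if its similarity to every entry already kept does not exceed the threshold, so lower thresholds yield smaller, less redundant sets. The sketch uses cosine similarity purely for illustration, and the function names are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def remove_redundant(vectors: list[list[float]], threshold: float) -> list[int]:
    """Greedy filter: keep a vector only if its similarity to every kept one is <= threshold."""
    kept: list[int] = []
    for i, v in enumerate(vectors):
        if all(cosine_similarity(v, vectors[j]) <= threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.7, 0.7]]  # toy feature vectors
    for t in (0.9, 0.6):
        print(t, remove_redundant(vecs, t))
```

The script prints `0.9 [0, 2, 3]` and `0.6 [0, 2]`: the near-duplicate second vector is always dropped, and the stricter cutoff also removes the diagonal vector. Note that the result depends on input order, a known property of greedy redundancy filters.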
Results of fivefold cross-validation on the two non-redundant miRNA-disease association data sets.
| Threshold | Acc (%) | Sen (%) | Spe (%) | Pre (%) | MCC | AUROC | AUPRC |
|---|---|---|---|---|---|---|---|
| 0.9 | 97.88 | 96.44 | 99.33 | 99.31 | 0.9581 | 0.9964 | 0.9973 |
| 0.8 | 97.59 | 95.92 | 99.27 | 99.24 | 0.9524 | 0.9941 | 0.9958 |
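AUROC and AUPRC summarize ranking quality independently of any single decision threshold. The sketch below computes AUROC from its probabilistic definition (the fraction of positive-negative pairs ranked correctly, with ties counting one half) and approximates AUPRC by average precision, which is one common estimator; the original study may have used a different integration scheme, such as the trapezoidal rule. Function names and example scores are hypothetical.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg); fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Mean of precision@rank taken at each positive (ties broken by sort order)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


if __name__ == "__main__":
    y = [1, 1, 0, 1, 0]  # hypothetical labels
    s = [0.9, 0.8, 0.7, 0.4, 0.3]  # hypothetical predicted scores
    print(f"AUROC = {auroc(y, s):.4f}, AP = {average_precision(y, s):.4f}")
```

For the toy scores, five of the six positive-negative pairs are ordered correctly (AUROC 0.8333), and the average precision is (1 + 1 + 0.75)/3 ≈ 0.9167.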
Figure 6. Performance comparison of Seq-SymRF with WBSMDA, RLSMDA, PBMDA, GBDT-LR, TCRMDA and ABMDA.