Zheng Wei, Dengju Yao, Xiaojuan Zhan, Shuli Zhang.
Abstract
A growing number of studies have shown that microRNAs (miRNAs) play a critical role in regulating gene expression, and that aberrant miRNA expression is often associated with a variety of complex human diseases. Because identifying disease-associated miRNAs through biological experiments is costly and inefficient, researchers have turned to computational methods for predicting potential disease-associated miRNAs. Considering that existing methods are flawed in how they construct the negative sample set, we proposed a clustering-based sampling method for miRNA-disease association prediction (CSMDA). First, we integrated multiple sources of miRNA and disease similarity information to represent miRNA-disease pairs. Second, we applied a clustering-based sampling method to avoid introducing potential positive samples when constructing the negative sample set. Third, we employed a random forest-based feature selection method to reduce noise and redundant information in the high-dimensional feature space. Finally, we implemented an ensemble learning framework that predicts miRNA-disease associations by soft voting. Under five-fold cross-validation, the Precision, Recall, F1-score, AUROC, and AUPR of the CSMDA reached 0.9676, 0.9545, 0.9610, 0.9928, and 0.9940, respectively. In addition, case studies on three cancers showed that the top 20 potentially associated miRNAs predicted by the CSMDA were confirmed by the dbDEMC database or the literature. These results demonstrate that the CSMDA can accurately predict potential disease-associated miRNAs.
Keywords: clustering; computational methods; ensemble learning; miRNA-disease association; sampling
Year: 2022 PMID: 36176298 PMCID: PMC9513605 DOI: 10.3389/fgene.2022.995535
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. The method of sample representation.
FIGURE 2. The method of constructing a negative sample set.
FIGURE 3. Ensemble learning framework.
FIGURE 4. The silhouette coefficient of clustering results under different numbers of clusters.
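Figure 4 evaluates clusterings by their silhouette coefficient. As a rough illustration of that metric only, the sketch below computes the mean silhouette coefficient for a given cluster assignment. It assumes Euclidean distance and uses made-up 2-D points; the function names and the data are hypothetical and are not taken from the paper, which does not state its distance measure here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (pure Python, O(n^2))."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Made-up toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = silhouette_score(points, good_labels)
bad_score = silhouette_score(points, bad_labels)
```

Repeating this over several candidate numbers of clusters and keeping the one with the highest score is one common way to choose the number of clusters.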
Performance comparison of the CSMDA using different base classifiers.
| Model | Precision | Recall | F1-score | AUROC | AUPR |
|---|---|---|---|---|---|
| CSMDA-AB | 0.9567 | 0.9267 | 0.9414 | 0.9885 | 0.9901 |
| CSMDA-ERT | 0.9666 | 0.9514 | 0.9589 | 0.9907 | 0.9926 |
| CSMDA-RF | | 0.9468 | 0.9582 | 0.9912 | 0.9929 |
| CSMDA-XGB | 0.9674 | | | | |
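The abstract states that the final prediction combines classifiers by soft voting. A minimal sketch of that voting rule follows, assuming each base classifier outputs a positive-class probability for every miRNA-disease pair. The function names, the equal default weights, the 0.5 threshold, and the probabilities are all illustrative assumptions, not details from the paper.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of positive-class probabilities across classifiers.

    prob_lists[m][i] is the probability that classifier m assigns to sample i.
    """
    if not prob_lists:
        raise ValueError("need at least one classifier")
    n_samples = len(prob_lists[0])
    if any(len(probs) != n_samples for probs in prob_lists):
        raise ValueError("all classifiers must score the same samples")
    if weights is None:
        weights = [1.0] * len(prob_lists)  # assumption: equal weights
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def predict(avg_probs, threshold=0.5):
    """Label a pair as associated (1) when its averaged probability is high enough."""
    return [1 if p >= threshold else 0 for p in avg_probs]


# Made-up probabilities from three hypothetical base classifiers for four pairs.
clf_probs = [
    [0.9, 0.4, 0.2, 0.6],
    [0.8, 0.6, 0.1, 0.4],
    [0.7, 0.7, 0.3, 0.3],
]
avg = soft_vote(clf_probs)
labels = predict(avg)
```

Unlike hard (majority) voting, averaging probabilities lets a confident classifier outweigh two uncertain ones, and the averaged score can be ranked directly to produce a top-20 list.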
Performance comparison of the CSMDA under different dimension training samples.
| Model | Precision | Recall | F1-score | AUROC | AUPR |
|---|---|---|---|---|---|
| CSMDA-NOFS | 0.9674 | 0.9543 | 0.9608 | 0.9927 | 0.9939 |
| CSMDA-FS75 | | 0.9545 | | | |
| CSMDA-FS50 | 0.9667 | | 0.9608 | 0.9927 | 0.9939 |
| CSMDA-FS25 | 0.9657 | 0.9540 | 0.9598 | 0.9916 | 0.9930 |
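The table above compares training samples of different dimensions. The sketch below illustrates one way such reduced feature sets could be produced, under the assumption that the FS75, FS50, and FS25 suffixes denote the percentage of features retained after ranking by importance. The importance scores are made up; in practice they might come from a fitted random forest (for example, the `feature_importances_` attribute of a scikit-learn `RandomForestClassifier`), but the source does not say which implementation was used. The function names are hypothetical.

```python
def select_top_fraction(importances, fraction):
    """Return the sorted indices of the top `fraction` of features by importance."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(importances) * fraction))
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def project(rows, kept):
    """Keep only the selected feature columns of every sample."""
    return [[row[i] for i in kept] for row in rows]


# Made-up importance scores for eight hypothetical features.
importances = [0.06, 0.30, 0.10, 0.02, 0.25, 0.08, 0.15, 0.04]

kept_75 = select_top_fraction(importances, 0.75)  # 6 of 8 features
kept_25 = select_top_fraction(importances, 0.25)  # 2 of 8 features

# One made-up sample reduced to the 25% feature set.
reduced = project([[10, 11, 12, 13, 14, 15, 16, 17]], kept_25)
```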
FIGURE 5The distribution of features from miRNAs and diseases among the top X features.
Performance comparison of the CSMDA with other MDA prediction models.
| Model | Precision | Recall | F1-score | AUROC | AUPR |
|---|---|---|---|---|---|
| ABMDA [19] | 0.8213 ± 0.0033 | 0.8371 ± 0.0044 | 0.8290 ± 0.0030 | 0.9023 ± 0.0021 | 0.8879 ± 0.0032 |
| ANMDA [22] | 0.8561 ± 0.0017 | 0.8728 ± 0.0020 | 0.8643 ± 0.0014 | 0.9373 ± 0.0005 | 0.9328 ± 0.0008 |
| GAEMDA [21] | 0.8146 ± 0.0031 | 0.9111 ± 0.0028 | 0.8597 ± 0.0010 | 0.9352 ± 0.0001 | 0.8850 ± 0.0010 |
| GBDT-LR [20] | 0.8403 ± 0.0026 | 0.8567 ± 0.0031 | 0.8484 ± 0.0021 | 0.9246 ± 0.0010 | 0.9177 ± 0.0015 |
| IRFMDA [18] | 0.8447 ± 0.0021 | 0.8598 ± 0.0025 | 0.8521 ± 0.0016 | 0.9267 ± 0.0009 | 0.9222 ± 0.0012 |
| ERMDA [23] | 0.8740 ± 0.0039 | 0.9043 ± 0.0019 | 0.8889 ± 0.0022 | 0.9561 ± 0.0013 | 0.9542 ± 0.0020 |
| CSMDA | | | | | |
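The comparison table reports each metric as a mean with a ± spread. As a small illustration of how such an entry can be produced from per-fold scores, the sketch below uses Python's `statistics` module. It assumes the spread is a sample standard deviation over five folds, which the source does not confirm, and the fold scores are made up.

```python
from statistics import mean, stdev


def summarize(fold_scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(fold_scores), stdev(fold_scores)


def format_entry(fold_scores):
    """Format fold scores as a 'mean ± std' table entry with four decimals."""
    m, s = summarize(fold_scores)
    return f"{m:.4f} ± {s:.4f}"


# Made-up AUROC values for five hypothetical cross-validation folds.
auroc_folds = [0.955, 0.957, 0.956, 0.958, 0.954]
m, s = summarize(auroc_folds)
entry = format_entry(auroc_folds)
```

If the spread were instead computed over repeated runs, or as a population standard deviation (`statistics.pstdev`), the second number would differ slightly.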
The top 20 miRNAs for three cancers predicted by the CSMDA.
| Disease | Rank | miRNA | Evidence |
|---|---|---|---|
| breast cancer | 1 | hsa-mir-195 | dbDEMC |
| | 2 | hsa-mir-146a | dbDEMC |
| | 3 | hsa-mir-24 | dbDEMC |
| | 4 | hsa-let-7e | dbDEMC |
| | 5 | hsa-mir-9 | dbDEMC |
| | 6 | hsa-mir-219 | dbDEMC |
| | 7 | hsa-mir-148a | dbDEMC |
| | 8 | hsa-mir-218 | dbDEMC |
| | 9 | hsa-let-7a | dbDEMC |
| | 10 | hsa-mir-29a | dbDEMC |
| | 11 | hsa-mir-223 | dbDEMC |
| | 12 | hsa-mir-30d | dbDEMC |
| | 13 | hsa-mir-92a | dbDEMC |
| | 14 | hsa-mir-210 | dbDEMC |
| | 15 | hsa-mir-200c | dbDEMC |
| | 16 | hsa-mir-17 | dbDEMC |
| | 17 | hsa-mir-214 | dbDEMC |
| | 18 | hsa-mir-372 | dbDEMC |
| | 19 | hsa-mir-106b | dbDEMC |
| | 20 | hsa-mir-221 | dbDEMC |
| colon cancer | 1 | hsa-mir-24 | dbDEMC |
| | 2 | hsa-mir-20a | dbDEMC |
| | 3 | hsa-mir-125b | dbDEMC |
| | 4 | hsa-mir-182 | dbDEMC |
| | 5 | hsa-mir-29a | dbDEMC |
| | 6 | hsa-mir-214 | dbDEMC |
| | 7 | hsa-mir-17 | dbDEMC |
| | 8 | hsa-mir-21 | dbDEMC |
| | 9 | hsa-mir-30b | dbDEMC |
| | 10 | hsa-mir-29b | dbDEMC |
| | 11 | hsa-mir-19b | dbDEMC |
| | 12 | hsa-mir-19a | dbDEMC |
| | 13 | hsa-mir-18a | dbDEMC |
| | 14 | hsa-mir-141 | dbDEMC |
| | 15 | hsa-mir-155 | dbDEMC |
| | 16 | hsa-mir-223 | dbDEMC |
| | 17 | hsa-mir-127 | dbDEMC |
| | 18 | hsa-mir-34c | Hiyoshi, Y., et al. [40] |
| | 19 | hsa-mir-1 | dbDEMC |
| | 20 | hsa-mir-126 | dbDEMC |
| lung cancer | 1 | hsa-mir-29c | dbDEMC |
| | 2 | hsa-mir-92a | dbDEMC |
| | 3 | hsa-mir-206 | dbDEMC |
| | 4 | hsa-mir-214 | dbDEMC |
| | 5 | hsa-mir-183 | dbDEMC |
| | 6 | hsa-mir-210 | dbDEMC |
| | 7 | hsa-mir-142 | dbDEMC |
| | 8 | hsa-mir-221 | dbDEMC |
| | 9 | hsa-mir-30e | dbDEMC |
| | 10 | hsa-mir-24 | dbDEMC |
| | 11 | hsa-mir-223 | dbDEMC |
| | 12 | hsa-mir-20b | dbDEMC |
| | 13 | hsa-mir-193b | dbDEMC |
| | 14 | hsa-mir-191 | dbDEMC |
| | 15 | hsa-mir-22 | dbDEMC |
| | 16 | hsa-mir-124 | dbDEMC |
| | 17 | hsa-mir-18b | dbDEMC |
| | 18 | hsa-mir-30a | dbDEMC |
| | 19 | hsa-mir-148a | dbDEMC |
| | 20 | hsa-mir-15b | dbDEMC |