Thi Ngan Dong1, Johanna Schrader2, Stefanie Mücke3, Megha Khosla4.
Abstract
MicroRNAs (miRNAs) are a highly conserved class of non-coding RNAs that play an important role in many diseases. Identifying miRNA-disease associations can pave the way for better clinical diagnosis and for finding potential drug targets. We propose a biologically motivated, data-driven approach to miRNA-disease association prediction that overcomes the data scarcity problem by exploiting information from multiple data sources. The key idea is to enrich the existing miRNA/disease-protein-coding gene (PCG) associations via a message passing framework and then to use disease ontology information for further feature filtering. The enriched and filtered PCG associations are used to construct an interconnected miRNA-PCG-disease network, on which a structural deep network embedding (SDNE) model is trained. Finally, the pre-trained embeddings are concatenated with biologically relevant features from the miRNA family and disease semantic similarity to form the pair input representations for a Random Forest classifier, which predicts the miRNA-disease association probabilities. We present large-scale comparative experiments, ablation studies, and case studies to showcase our approach's superiority. We also make the model's predictions for 1618 miRNAs and 3679 diseases, along with all related information, publicly available at http://software.mpm.leibniz-ai-lab.de/ to foster assessment and future adoption.
Year: 2022 PMID: 36171337 PMCID: PMC9519928 DOI: 10.1038/s41598-022-20529-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Simpler variants of MPM, where ‘✓’ and ‘×’ denote the presence and absence of the corresponding component/module.
| Model | Message Passing | Feature Selection | SDNE | Random Forest classifier | PCG associations |
|---|---|---|---|---|---|
| | × | ✓ | ✓ | ✓ | ✓ |
| | ✓ | × | ✓ | ✓ | ✓ |
| | ✓ | ✓ | × | ✓ | ✓ |
| | × | × | ✓ | ✓ | × |
| | × | × | ✓ | ✓ | ✓ |
Association data statistics: the number of associations, miRNAs, and diseases in each dataset.
| Dataset | Associations | miRNAs | Diseases |
|---|---|---|---|
| | 4592 | 442 | 309 |
| | 10,494 | 742 | 545 |
| | 10,980 | 742 | 591 |
| | 4311 | 382 | 226 |
| | 6388 | 697 | 509 |
| | 4734 | 638 | 227 |
Results for all models on the three large independent test sets.
| Method | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | AUC | AP | AUC | AP | AUC | AP | AUC | AP | AUC | AP | AUC | AP | AUC | AP | AUC | AP | AUC | AP |
| | 0.542 | 0.554 | 0.541 | 0.207 | 0.542 | 0.118 | 0.532 | 0.549 | 0.53 | 0.202 | 0.53 | 0.115 | 0.513 | 0.517 | 0.513 | 0.176 | 0.512 | 0.097 |
| | 0.657 | 0.622 | 0.656 | 0.256 | 0.656 | 0.149 | 0.644 | 0.621 | 0.645 | 0.261 | 0.645 | 0.153 | 0.638 | 0.617 | 0.638 | 0.257 | 0.638 | 0.15 |
| | 0.698 | 0.624 | 0.698 | 0.256 | 0.698 | 0.148 | 0.716 | 0.643 | 0.718 | 0.281 | 0.719 | 0.167 | 0.704 | 0.648 | 0.703 | 0.291 | 0.704 | 0.176 |
| | 0.838 | 0.831 | 0.838 | 0.542 | 0.838 | 0.395 | 0.865 | 0.857 | 0.866 | 0.597 | 0.866 | 0.452 | 0.859 | 0.853 | 0.859 | 0.581 | 0.858 | 0.435 |
| | 0.832 | 0.826 | 0.832 | 0.534 | 0.832 | 0.385 | 0.827 | 0.819 | 0.827 | 0.519 | 0.827 | 0.37 | 0.811 | 0.812 | 0.812 | 0.514 | 0.811 | 0.368 |
| | 0.499 | 0.5 | 0.499 | 0.167 | 0.499 | 0.091 | 0.499 | 0.5 | 0.499 | 0.167 | 0.499 | 0.091 | 0.499 | 0.5 | 0.499 | 0.167 | 0.499 | 0.091 |
| SOTA Improvement | ||||||||||||||||||
| | 0.846 | 0.84 | 0.846 | 0.564 | 0.847 | 0.418 | 0.866 | 0.859 | 0.866 | 0.602 | 0.867 | 0.46 | 0.859 | 0.86 | 0.859 | 0.607 | 0.859 | 0.468 |
| | 0.814 | 0.809 | 0.814 | 0.503 | 0.814 | 0.357 | 0.823 | 0.818 | 0.823 | 0.519 | 0.823 | 0.373 | 0.814 | 0.819 | 0.814 | 0.533 | 0.814 | 0.391 |
| | 0.824 | 0.816 | 0.824 | 0.516 | 0.824 | 0.369 | 0.836 | 0.828 | 0.836 | 0.538 | 0.836 | 0.391 | 0.831 | 0.832 | 0.831 | 0.554 | 0.831 | 0.411 |
| | 0.837 | 0.83 | 0.837 | 0.546 | 0.837 | 0.401 | 0.842 | 0.834 | 0.842 | 0.552 | 0.843 | 0.408 | 0.846 | 0.847 | 0.846 | 0.581 | 0.846 | 0.439 |
The percentage improvement over the state-of-the-art models is shown in italics.
For each test set, the three AUC/AP column pairs report results with positive:negative ratios of 1:1, 1:5, and 1:10, respectively. Bold font highlights the best scores.
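The effect of the positive:negative ratio on AP can be checked with a small sketch. The row in the table above whose AP values are 0.5, 0.167, and 0.091 is consistent with a chance-level scorer, because an uninformative ranking has an expected AP close to the fraction of positives: 1/2, 1/6, and 1/11. The function names and the simulation setup below are illustrative assumptions, not part of MPM.

```python
import random


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


def chance_level_ap(negatives_per_positive):
    """Positive prevalence, which the AP of an uninformative scorer approaches."""
    return 1.0 / (1.0 + negatives_per_positive)


def simulate_random_scorer(negatives_per_positive, n_pos=2000, seed=0):
    """AP of uniformly random scores on a test set with the given class ratio.

    n_pos and seed are arbitrary choices made for this illustration.
    """
    rng = random.Random(seed)
    labels = [1] * n_pos + [0] * (n_pos * negatives_per_positive)
    scores = [rng.random() for _ in labels]
    return average_precision(labels, scores)


if __name__ == "__main__":
    for ratio in (1, 5, 10):
        print(f"1:{ratio}  chance AP = {chance_level_ap(ratio):.3f}  "
              f"simulated AP = {simulate_random_scorer(ratio):.3f}")
```

AUC does not depend on class prevalence in this way, which is why the AUC columns stay nearly constant across the three ratios while the AP columns drop.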
The AP scores for the 18 complete test sets for new diseases, averaged over 20 experimental runs.
| Disease | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| D001749 | 0.089 | 0.340 | 0.77 | 0.446 | 0.103 | 0.77 | 0.567 | 0.589 | 0.58 | |
| D001943 | 0.824 | 0.160 | 0.507 | 0.414 | 0.205 | 0.811 | 0.679 | 0.693 | 0.654 | |
| D002289 | 0.108 | 0.303 | 0.800 | 0.278 | 0.132 | 0.795 | 0.662 | 0.678 | 0.589 | |
| D002292 | 0.082 | 0.238 | 0.653 | 0.285 | 0.087 | 0.67 | 0.51 | 0.525 | 0.531 | |
| D002294 | 0.186 | 0.241 | 0.608 | 0.384 | 0.064 | 0.646 | 0.529 | 0.531 | 0.493 | |
| D003110 | 0.069 | 0.242 | 0.600 | 0.271 | 0.078 | 0.619 | 0.487 | 0.54 | 0.515 | |
| D005909 | 0.123 | 0.369 | 0.712 | 0.418 | 0.109 | 0.726 | 0.597 | 0.63 | 0.523 | |
| D005910 | 0.759 | 0.112 | 0.246 | 0.731 | 0.409 | 0.117 | 0.642 | 0.66 | 0.626 | |
| D006333 | 0.669 | 0.180 | 0.300 | 0.651 | 0.395 | 0.088 | 0.578 | 0.602 | 0.566 | |
| D008175 | 0.115 | 0.437 | 0.749 | 0.375 | 0.138 | 0.751 | 0.615 | 0.62 | 0.611 | |
| D008545 | 0.108 | 0.355 | 0.706 | 0.365 | 0.117 | 0.715 | 0.58 | 0.598 | 0.558 | |
| D010051 | 0.114 | 0.400 | 0.760 | 0.388 | 0.118 | 0.782 | 0.505 | 0.654 | 0.579 | |
| D010190 | 0.749 | 0.088 | 0.366 | 0.744 | 0.373 | 0.098 | 0.589 | 0.622 | 0.598 | |
| D011471 | 0.733 | 0.116 | 0.395 | 0.653 | 0.330 | 0.135 | 0.618 | 0.633 | 0.569 | |
| D012516 | 0.262 | 0.323 | 0.658 | 0.349 | 0.098 | 0.699 | 0.546 | 0.585 | 0.55 | |
| D013274 | 0.132 | 0.503 | 0.835 | 0.249 | 0.161 | 0.811 | 0.657 | 0.693 | 0.643 | |
| D015179 | 0.134 | 0.463 | 0.797 | 0.340 | 0.171 | 0.785 | 0.645 | 0.693 | 0.614 | |
| D015470 | 0.158 | 0.259 | 0.625 | 0.290 | 0.069 | 0.653 | 0.509 | 0.513 | 0.497 |
Results for 5-fold cross-validation on the HMDD2 and HMDD3 datasets.
| Dataset | Method | AUC | AP | Sensitivity | Specificity | Accuracy | Precision | F1 | MCC |
|---|---|---|---|---|---|---|---|---|---|
| HMDD2 | 0.89 ± 0.01 | 80.7 ± 1.2 | 81.5 ± 1.4 | 81.1 ± 1.0 | 81.3 ± 1.2 | 81.0 ± 1.0 | 62.2 ± 2.1 | ||
| 0.88 ± 0.01 | 0.87 ± 0.01 | 70.2 ± 26.0 | 77.2 ± 10.1 | 77.9 ± 17.6 | 71.0 ± 26.2 | 54.6 ± 19.9 | |||
| 0.72 ± 0.01 | 0.68 ± 0.01 | 66.9 ± 1.6 | 72.4 ± 1.8 | 69.7 ± 1.1 | 70.8 ± 1.3 | 68.8 ± 1.1 | 39.4 ± 2.1 | ||
| 0.52 ± 0.02 | 0.61 ± 0.02 | 36.0 ± 48.0 | 64.0 ± 48.0 | 50.0 ± 0.0 | 18.0 ± 24.0 | 24.0 ± 32.0 | 0.0 ± 0.0 | ||
| 0.9 ± 0.01 | 81.4 ± 1.1 | 81.5 ± 1.6 | 81.4 ± 1.0 | 81.5 ± 1.3 | 81.4 ± 0.9 | 62.9 ± 2.0 | |||
| 83.0 ± 2.3 | 82.5 ± 2.2 | ||||||||
| 0.5 ± 0.01 | 0.51 ± 0.01 | 0.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 66.7 ± 0.0 | 0.0 ± 0.0 | |||
| HMDD3 | 0.91 ± 0.0 | 0.91 ± 0.01 | 83.8 ± 0.8 | 82.0 ± 0.9 | 82.9 ± 0.6 | 82.3 ± 0.8 | 83.0 ± 0.6 | 65.8 ± 1.2 | |
| 0.89 ± 0.01 | 0.89 ± 0.01 | 84.6 ± 1.7 | 80.7 ± 2.1 | 82.7 ± 0.7 | 81.5 ± 1.5 | 83.0 ± 0.7 | 65.4 ± 1.4 | ||
| 0.76 ± 0.01 | 0.71 ± 0.01 | 71.6 ± 1.2 | 74.4 ± 1.1 | 73.0 ± 0.6 | 73.7 ± 0.7 | 72.6 ± 0.7 | 46.1 ± 1.1 | ||
| 0.48 ± 0.01 | 0.59 ± 0.01 | 48.0 ± 50.0 | 52.0 ± 50.0 | 50.0 ± 0.0 | 24.0 ± 25.0 | 32.0 ± 33.3 | 0.0 ± 0.0 | ||
| 0.91 ± 0.0 | 0.91 ± 0.01 | 84.1 ± 0.7 | 82.0 ± 1.0 | 83.0 ± 0.6 | 82.4 ± 0.8 | 83.2 ± 0.6 | 66.1 ± 1.2 | ||
| 85.2 ± 1.7 | |||||||||
| 0.5 ± 0.0 | 0.5 ± 0.0 | 0.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 66.7 ± 0.0 | 0.0 ± 0.0 |
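The threshold-based columns of the cross-validation table (sensitivity, specificity, accuracy, precision, F1, MCC) all derive from a confusion matrix. The sketch below uses the standard definitions; the function name and the convention of returning 0 for undefined ratios are assumptions made for illustration. Values are returned as fractions, whereas the table reports percentages.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; undefined ratios are returned as 0.0."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    # Matthews correlation coefficient; 0.0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # A balanced toy fold with 80% of each class classified correctly.
    print(binary_metrics(tp=80, fp=20, tn=80, fn=20))
    # A degenerate classifier that labels every pair negative.
    print(binary_metrics(tp=0, fp=0, tn=100, fn=100))
```

On a balanced fold, a classifier that assigns every pair to a single class reaches 50% accuracy with an MCC of 0, which is why MCC is a useful complement to accuracy.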
MPM’s average prediction scores for Down Syndrome and all 1618 miRNAs.
| Rank | miRNA | Pred. | Rank | miRNA | Pred. |
|---|---|---|---|---|---|
The associated miRNAs are marked in italics. The model training data does not contain the association data for Down Syndrome.
MPM’s prediction results for Down Syndrome and the miRNAs located on chromosome 21.
| Rank | miRNA | Pred. | Rank | miRNA | Pred. |
|---|---|---|---|---|---|
| 1 | | | 11 | hsa-mir-4760 | 0.172962854437391 |
| 2 | | | 12 | hsa-mir-5692b | 0.168364046134056 |
| 3 | | | 13 | hsa-mir-6508 | 0.163143029370321 |
| 4 | | | 14 | hsa-mir-6070 | 0.16232917173827 |
| 5 | hsa-mir-548x | 0.239129159103197 | 15 | hsa-mir-6815 | 0.159395572782035 |
| 6 | hsa-mir-3648-1 | 0.206785057828119 | 16 | hsa-mir-8069-1 | 0.155993241075239 |
| 7 | hsa-mir-4759 | 0.200771150543586 | 17 | hsa-mir-6724-1 | 0.153456269809843 |
| 8 | hsa-mir-3197 | 0.19795748172893 | 18 | hsa-mir-6501 | 0.152740622433185 |
| 9 | hsa-mir-6130 | 0.194382789321313 | 19 | hsa-mir-6814 | 0.145666592873055 |
| 10 | hsa-mir-4327 | 0.176297567535453 |
Italics highlight the associated miRNAs. The model training data does not contain the association data for Down Syndrome.
The predicted association probabilities for the true-positive (marked in italics) and true-negative miRNAs [66] for Parkinson's disease.
| Rank | miRNA | Pred. | Rank | miRNA | Pred. | Rank | miRNA | Pred. | Rank | miRNA | Pred. | Rank | miRNA | Pred. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | hsa-mir-7-1 | 0.99 | 23 | hsa-mir-127 | 0.96 | 45 | hsa-mir-99a | 0.92 | 67 | hsa-mir-25 | 0.85 | 89 | hsa-mir-149 | 0.62 |
| 2 | hsa-mir-30d | 0.99 | 24 | hsa-mir-145 | 0.96 | 46 | hsa-mir-19a | 0.92 | 68 | hsa-mir-23a | 0.85 | 90 | hsa-mir-1264 | 0.62 |
| 3 | | 0.99 | 25 | hsa-mir-195 | 0.96 | 47 | | 0.92 | 69 | hsa-mir-191 | 0.85 | 91 | hsa-mir-744 | 0.61 |
| 4 | | 0.99 | 26 | | 0.96 | 48 | hsa-mir-1301 | 0.91 | 70 | hsa-mir-140 | 0.84 | 92 | hsa-mir-301b | 0.6 |
| 5 | hsa-mir-335 | 0.99 | 27 | hsa-mir-338 | 0.96 | 49 | hsa-mir-30b | 0.91 | 71 | hsa-mir-136 | 0.83 | 93 | hsa-mir-154 | 0.59 |
| 6 | | 0.99 | 28 | hsa-mir-222 | 0.96 | 50 | hsa-mir-152 | 0.9 | 72 | hsa-mir-16-2 | 0.82 | 94 | hsa-mir-184 | 0.55 |
| 7 | | 0.98 | 29 | | 0.96 | 51 | hsa-mir-125b-2 | 0.9 | 73 | hsa-mir-98 | 0.82 | 95 | hsa-mir-223 | 0.54 |
| 8 | | 0.98 | 30 | hsa-mir-22 | 0.96 | 52 | hsa-mir-125a | 0.9 | 74 | hsa-mir-27b | 0.81 | 96 | hsa-mir-532 | 0.49 |
| 9 | hsa-mir-151a | 0.98 | 31 | hsa-mir-299 | 0.96 | 53 | hsa-mir-137 | 0.9 | 75 | hsa-mir-345 | 0.81 | 97 | hsa-mir-1296 | 0.48 |
| 10 | hsa-mir-126 | 0.98 | 32 | hsa-mir-424 | 0.95 | 54 | hsa-mir-204 | 0.89 | 76 | hsa-mir-142 | 0.8 | 98 | hsa-mir-873 | 0.44 |
| 11 | hsa-mir-7-2 | 0.98 | 33 | hsa-mir-21 | 0.95 | 55 | hsa-mir-224 | 0.89 | 77 | hsa-mir-708 | 0.8 | 99 | hsa-mir-125b-1 | 0.42 |
| 12 | hsa-mir-146b | 0.98 | 34 | hsa-mir-17 | 0.95 | 56 | hsa-mir-148b | 0.89 | 78 | hsa-mir-1249 | 0.78 | 100 | hsa-mir-1298 | 0.35 |
| 13 | hsa-mir-29b-2 | 0.98 | 35 | hsa-mir-148a | 0.94 | 57 | hsa-mir-409 | 0.89 | 79 | hsa-mir-190a | 0.78 | 101 | hsa-mir-939 | 0.34 |
| 14 | hsa-mir-30a | 0.98 | 36 | hsa-mir-143 | 0.94 | 58 | hsa-mir-504 | 0.89 | 80 | hsa-mir-129-1 | 0.77 | 102 | hsa-mir-488 | 0.29 |
| 15 | hsa-mir-199b | 0.98 | 37 | hsa-mir-28 | 0.94 | 59 | hsa-mir-186 | 0.89 | 81 | hsa-mir-331 | 0.76 | 103 | hsa-mir-330 | 0.24 |
| 16 | hsa-mir-34c | 0.98 | 38 | hsa-mir-425 | 0.93 | 60 | hsa-mir-448 | 0.88 | 82 | hsa-mir-181c | 0.75 | 104 | hsa-mir-192 | 0.2 |
| 17 | | 0.98 | 39 | hsa-mir-10b | 0.93 | 61 | hsa-mir-769 | 0.87 | 83 | hsa-mir-150 | 0.73 | 105 | hsa-mir-626 | 0.19 |
| 18 | | 0.97 | 40 | | 0.93 | 62 | hsa-mir-1248 | 0.87 | 84 | hsa-mir-489 | 0.72 | 106 | hsa-mir-26b | 0.16 |
| 19 | | 0.97 | 41 | hsa-mir-99b | 0.93 | 63 | hsa-mir-92a-2 | 0.87 | 85 | hsa-mir-505 | 0.68 | 107 | hsa-mir-577 | 0.16 |
| 20 | hsa-mir-10a | 0.97 | 42 | hsa-mir-543 | 0.93 | 64 | hsa-mir-328 | 0.86 | 86 | hsa-mir-203a | 0.67 | 108 | hsa-mir-654 | 0.15 |
| 21 | hsa-mir-16-1 | 0.97 | 43 | hsa-mir-34b | 0.93 | 65 | hsa-mir-92a-1 | 0.86 | 87 | hsa-mir-454 | 0.65 | 109 | hsa-mir-378a | 0.15 |
| 22 | hsa-mir-30c-2 | 0.97 | 44 | hsa-mir-431 | 0.92 | 66 | hsa-mir-20a | 0.85 | 88 | hsa-mir-130a | 0.64 | 110 | hsa-mir-501 | 0.12 |
Figure 1. The Kaplan–Meier survival curve of PBLL patients.
The miRNAs with the highest prediction scores that also appear in the list of associated miRNAs output by the survival analysis.
| Rank | miRNA | Pred. | Rank | miRNA | Pred. | Rank | miRNA | Pred. | Rank | miRNA | Pred. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | hsa-mir-17 | 0.98 | 17 | hsa-mir-145 | 0.93 | 37 | hsa-mir-130a | 0.84 | 58 | hsa-mir-200c | 0.75 |
| 3 | hsa-mir-20a | 0.98 | 18 | hsa-mir-143 | 0.92 | 38 | hsa-mir-125a | 0.83 | 61 | hsa-mir-149 | 0.75 |
| 4 | hsa-mir-155 | 0.98 | 19 | hsa-mir-26a-1 | 0.92 | 40 | hsa-mir-204 | 0.83 | 62 | hsa-mir-100 | 0.74 |
| 5 | hsa-mir-16-1 | 0.97 | 23 | hsa-mir-31 | 0.91 | 45 | hsa-mir-122 | 0.81 | 63 | hsa-mir-200b | 0.74 |
| 6 | hsa-mir-150 | 0.97 | 24 | hsa-mir-181a-2 | 0.9 | 46 | hsa-mir-25 | 0.81 | 64 | hsa-mir-192 | 0.74 |
| 7 | hsa-mir-34a | 0.96 | 25 | hsa-mir-19b-1 | 0.9 | 47 | hsa-mir-15b | 0.81 | 71 | hsa-mir-16-2 | 0.73 |
| 9 | hsa-mir-146a | 0.95 | 27 | hsa-mir-22 | 0.89 | 48 | hsa-mir-148a | 0.8 | 72 | hsa-mir-98 | 0.73 |
| 10 | hsa-mir-18a | 0.95 | 29 | hsa-mir-92a-1 | 0.86 | 51 | hsa-mir-132 | 0.79 | 73 | hsa-mir-107 | 0.72 |
| 14 | hsa-mir-19a | 0.94 | 31 | hsa-mir-106b | 0.85 | 54 | hsa-mir-106a | 0.78 | 75 | hsa-mir-335 | 0.72 |
| 15 | hsa-mir-15a | 0.94 | 33 | hsa-mir-181b-1 | 0.85 | 56 | hsa-mir-378a | 0.76 | 76 | hsa-mir-26b | 0.72 |
Figure 2. Kaplan–Meier survival curves of PBLL patients stratified by the miRNAs with the highest prediction scores.
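For readers unfamiliar with how such survival curves are computed, the following is a minimal sketch of the Kaplan–Meier product-limit estimator. The function name and the toy data are illustrative assumptions; how the patients were grouped for the stratified curves is not described here, so the sketch covers a single group only.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the survival function.

    durations: follow-up time of each patient.
    events:    1 if the event was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy data: five patients, two of them censored (event flag 0).
    for t, s in kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]):
        print(f"t = {t}: S(t) = {s:.3f}")
```

A stratified plot is obtained by calling the estimator once per patient group and drawing the resulting step functions on the same axes.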
Figure 3. MPM’s architecture. MPM consists of a message passing layer (section “The message passing framework/module”), a feature selection module with a side supervised task (section “The feature selection module”), a structural deep network embedding (SDNE) model (section “The structural embedding learning”), and a binary classifier (section “The classification module”).
Statistics for the side data sources: the number of interactions/associations (|E|) and the numbers of miRNAs, diseases, and PCGs in each network.
| Network | Interactions/associations | miRNAs | Diseases | PCGs |
|---|---|---|---|---|
| miRNA-PCG | 345,357 | 1618 | – | 23,611 |
| Disease-PCG | 510,782 | – | 3679 | 23,611 |
| Protein functional interactions | 423,672 | – | – | 23,611 |
Figure 4. An example of the protein functional interaction network, with the various relation types highlighted in different colors.
Figure 5. An example of how the message passing framework functions. The numbers inside the circles are the node IDs, and ‘w’ is the node feature weight (as described in section “The message passing framework/module”). In the first iteration, new weights for nodes 4, 6, and 7 are calculated according to equation (1). During the second iteration, only the weight of node 6 is updated.
Figure 6. The final miRNA-disease input pair representation.
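As described in the abstract, the pair representation concatenates the pre-trained embeddings with the miRNA family and disease semantic similarity features. A minimal sketch of that concatenation follows; the function name, the argument names, the ordering of the four parts, and the toy dimensions are all assumptions made for illustration.

```python
def pair_representation(mirna_embedding, disease_embedding,
                        mirna_family_features, disease_similarity_features):
    """Concatenate the four feature groups into one flat input vector.

    The ordering of the groups is an assumption; any fixed ordering works
    as long as it is the same for every miRNA-disease pair.
    """
    return (list(mirna_embedding) + list(disease_embedding)
            + list(mirna_family_features) + list(disease_similarity_features))


if __name__ == "__main__":
    # Toy vectors with made-up dimensions, purely for illustration.
    vector = pair_representation(
        mirna_embedding=[0.1, 0.2],
        disease_embedding=[0.3, 0.4],
        mirna_family_features=[1, 0, 0],
        disease_similarity_features=[0.5],
    )
    print(len(vector), vector)
```

Each such vector would be one input row for the downstream classifier, with the label indicating whether the miRNA-disease pair is a known association.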