Mingyan Tang, Chenzhe Liu, Dayun Liu, Junyi Liu, Jiaqi Liu, Lei Deng.
Abstract
MicroRNAs (miRNAs) are non-coding RNA molecules that contribute significantly to diverse biological processes, and their mutation and dysregulation are closely related to the occurrence, development, and treatment of human diseases. Identifying potential miRNA-disease associations therefore helps to elucidate the mechanisms of tumorigenesis and to find effective treatments for diseases. Because traditional biological experiments for determining miRNA-disease associations are expensive, a growing number of computational models are being used to compensate for this limitation. In this study, we propose PMDFI, a novel ensemble learning method that predicts potential miRNA-disease associations based on high-order feature interactions. We first use a stacked autoencoder to extract meaningful high-order features from the original similarity matrix, then perform feature interaction learning, and finally use an integrated model composed of multiple random forests and logistic regression to make comprehensive predictions. The experimental results show that PMDFI performs well in predicting potential miRNA-disease associations, with average area under the ROC curve (AUC) scores of 0.9404 and 0.9415 in 5-fold and 10-fold cross-validation, respectively.
Keywords: feature interactions; high-order features; logistic regression; miRNA-disease associations; random forest
Year: 2021 PMID: 33897768 PMCID: PMC8063614 DOI: 10.3389/fgene.2021.656107
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Flowchart of the PMDFI model for predicting potential microRNA (miRNA)–disease associations. The model can be divided into four parts: data set collection and processing, high-order feature extraction, feature interaction, and an integrated learning model. First, we gather miRNA–disease associations from HMDD v2.0 and form the similarity matrix between miRNAs and diseases; second, we adopt a stacked autoencoder to extract high-order features; then, we use the feature interaction layer to learn the interactions between different features. Finally, we combine multiple random forests (RFs) with logistic regression to predict potential miRNA–disease associations.
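As a rough illustration of the feature interaction step, the sketch below builds second-order cross features by multiplying every pair of components in a concatenated miRNA and disease feature vector. The pairwise-product form, the function name `interaction_features`, and the toy vectors are assumptions made for illustration only; the caption does not state how the interaction layer in PMDFI is actually defined.

```python
from itertools import combinations


def interaction_features(mirna_vec, disease_vec):
    """Concatenate two feature vectors and append all pairwise products.

    This is one common way to form explicit second-order interactions;
    it is an assumed stand-in, not the layer described in the paper.
    """
    base = list(mirna_vec) + list(disease_vec)
    # One cross term for every distinct pair of components.
    crosses = [a * b for a, b in combinations(base, 2)]
    return base + crosses


# Hypothetical 2-dimensional high-order features for one miRNA-disease pair.
features = interaction_features([0.5, 2.0], [4.0, 1.0])
print(features)
```

With d input components this produces d(d-1)/2 extra features, so in practice it would only be applied to the compact representation produced by the autoencoder rather than to the raw similarity vectors.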
Figure 2. Extraction of high-order features based on an autoencoder.
The results of 5-fold and 10-fold cross-validation obtained by PMDFI.
| 5-CV | 0.9404 | 0.9373 | 0.8663 | 0.8812 | 0.8736 |
| 10-CV | 0.9415 | 0.9385 | 0.8669 | 0.8832 | 0.8748 |
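The averaged scores above come from k-fold cross-validation. A minimal sketch of that protocol, using only the standard library, is given below: it splits the samples into k folds, scores each held-out fold, and averages the per-fold area under the ROC curve. The helper names (`roc_auc`, `k_fold_indices`, `cross_validated_auc`), the midpoint scorer, and the toy data are hypothetical and stand in for the real model and data set.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k, seed=None):
    """Split indices 0..n_samples-1 into k folds; shuffle first when a seed is given."""
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_auc(features, labels, k, fit, predict, seed=None):
    """Train on k-1 folds, score the held-out fold, and average the k AUC values."""
    aucs = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [predict(model, features[i]) for i in held_out]
        aucs.append(roc_auc([labels[i] for i in held_out], scores))
    return sum(aucs) / len(aucs)


# Hypothetical stand-in model: score a sample by its offset from the
# midpoint between the two class means seen during training.
def fit_midpoint(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def score_offset(midpoint, x):
    return x - midpoint


# Toy, perfectly separable data: 20 samples with alternating labels.
toy_labels = [i % 2 for i in range(20)]
toy_features = [y + 0.1 * (i % 3) for i, y in enumerate(toy_labels)]
mean_auc = cross_validated_auc(toy_features, toy_labels, 5, fit_midpoint, score_offset)
print(mean_auc)
```

The pairwise AUC used here is quadratic in the number of samples, which is fine for a sketch; a rank-based computation would be preferable on a full association data set. Passing a `seed` shuffles the samples before splitting, in which case a fold can end up with a single class and `roc_auc` raises an error, so stratified splitting is the safer choice for imbalanced data.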
The comparison of different methods based on 5-fold cross-validation.
| PMDFI | 0.9404 | 0.9373 | 0.8663 | 0.8812 | 0.8736 |
| GBDT-LR | 0.9274 | 0.9014 | 0.8315 | 0.8273 | 0.8302 |
| LMTRDA | 0.8479 | 0.8217 | 0.8013 | 0.6190 | 0.7076 |
| RFMDA | 0.7388 | 0.7034 | 0.6253 | 0.9548 | 0.7453 |
Figure 3. Histograms of the results of different methods based on 5-fold cross-validation.
Comparison of the performance of four interactive cross features.
| D1( | 0.9106 | 0.9093 | 0.8289 | 0.8388 | 0.8338 |
| D2( | 0.9283 | 0.9240 | 0.8513 | 0.8692 | 0.8601 |
| D3( | 0.9239 | 0.9193 | 0.8381 | 0.8642 | 0.8509 |
| D4( | 0.9392 | 0.9334 | 0.8630 | 0.8834 | 0.8730 |
| PMDFI | 0.9404 | 0.9373 | 0.8663 | 0.8812 | 0.8736 |
Figure 4. Line chart of area under the ROC curve (AUC) and area under the PR curve (AUPR) scores of different interaction cross features.
Figure 5. The ROC curves of different classifier models.
The specific outcomes based on different feature representation methods.
| FeaRep1 | 0.9083 | 0.9119 | 0.8430 | 0.8543 | 0.8486 |
| FeaRep2 | 0.9307 | 0.9252 | 0.8554 | 0.8731 | 0.8641 |
| FeaRep3 | 0.9367 | 0.9327 | 0.8619 | 0.8746 | 0.8682 |
| PMDFI | 0.9404 | 0.9373 | 0.8663 | 0.8812 | 0.8736 |
Figure 6. Histograms comparing performance based on different feature representation methods.
The candidate miRNAs associated with breast cancer, melanoma, and lymphoma.
| Disease | miRNA | Evidence |
|---|---|---|
| Breast cancer | hsa-mir-150 | dbDEMC 2.0;miRCancer |
| Breast cancer | hsa-mir-15b | dbDEMC 2.0 |
| Breast cancer | hsa-mir-130a | dbDEMC 2.0;miRCancer |
| Breast cancer | hsa-mir-196b | dbDEMC 2.0 |
| Breast cancer | hsa-mir-98 | dbDEMC 2.0;miRCancer |
| Breast cancer | hsa-mir-106a | dbDEMC 2.0;miRCancer |
| Breast cancer | hsa-mir-142 | miRCancer |
| Breast cancer | hsa-mir-378a | Unconfirmed |
| Breast cancer | hsa-mir-30e | miRCancer |
| Breast cancer | hsa-mir-372 | dbDEMC 2.0;miRCancer |
| Melanoma | hsa-mir-150 | miRCancer |
| Melanoma | hsa-mir-373 | miRCancer |
| Melanoma | hsa-mir-127 | dbDEMC 2.0 |
| Melanoma | hsa-mir-181b | dbDEMC 2.0 |
| Melanoma | hsa-mir-10b | dbDEMC 2.0;miRCancer |
| Melanoma | hsa-mir-224 | dbDEMC 2.0;miRCancer |
| Melanoma | hsa-mir-101 | dbDEMC 2.0;miRCancer |
| Melanoma | hsa-mir-223 | dbDEMC 2.0 |
| Melanoma | hsa-mir-27a | dbDEMC 2.0;miRCancer |
| Melanoma | hsa-mir-30c | dbDEMC 2.0 |
| Lymphoma | hsa-mir-34a | dbDEMC 2.0;miRCancer |
| Lymphoma | hsa-mir-34c | Unconfirmed |
| Lymphoma | hsa-mir-9 | dbDEMC 2.0;miRCancer |
| Lymphoma | hsa-mir-29a | dbDEMC 2.0;miRCancer |
| Lymphoma | hsa-mir-222 | dbDEMC 2.0 |
| Lymphoma | hsa-mir-7a | dbDEMC 2.0 |
| Lymphoma | hsa-mir-29b | dbDEMC 2.0;miRCancer |
| Lymphoma | hsa-mir-181b | dbDEMC 2.0 |
| Lymphoma | hsa-mir-145 | dbDEMC 2.0;miRCancer |
| Lymphoma | hsa-mir-221 | dbDEMC 2.0 |
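To summarize how many of the listed candidates have supporting evidence, the short sketch below tallies the evidence column of the table above. The entries are copied from the table in row order; the constant names and the `summarize` helper are hypothetical and serve only this illustration.

```python
from collections import Counter

BOTH = "dbDEMC 2.0;miRCancer"
DBDEMC = "dbDEMC 2.0"
MIRCANCER = "miRCancer"
UNCONFIRMED = "Unconfirmed"

# Evidence column copied from the table above, top to bottom (30 rows).
evidence = [
    BOTH, DBDEMC, BOTH, DBDEMC, BOTH, BOTH, MIRCANCER, UNCONFIRMED, MIRCANCER, BOTH,
    MIRCANCER, MIRCANCER, DBDEMC, DBDEMC, BOTH, BOTH, BOTH, DBDEMC, BOTH, DBDEMC,
    BOTH, UNCONFIRMED, BOTH, BOTH, DBDEMC, DBDEMC, BOTH, DBDEMC, BOTH, DBDEMC,
]


def summarize(evidence_column):
    """Return (confirmed count, total count, per-database support counts)."""
    confirmed = [e for e in evidence_column if e != UNCONFIRMED]
    # A row such as "dbDEMC 2.0;miRCancer" counts once for each database.
    per_database = Counter(db for e in confirmed for db in e.split(";"))
    return len(confirmed), len(evidence_column), per_database


confirmed, total, per_database = summarize(evidence)
print(f"{confirmed} of {total} candidates confirmed")
print(dict(per_database))
```

Counting each database separately shows how much of the support rests on a single source and how much is corroborated by both.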