Denis Pallez, Julien Gardès, Claude Pasquier.
Abstract
MicroRNAs, small non-coding elements involved in gene regulation, are promising biomarkers for various diseases such as cancers. They are potentially powerful biotechnological tools for early diagnosis and gene therapies. However, experimental verification of microRNA-disease associations is time-consuming and costly, which makes computational modeling an attractive alternative. Previously, we designed MiRAI, a predictive method based on distributional semantics, to identify new associations between microRNA molecules and human diseases. Our preliminary results showed very good prediction scores compared to other available methods. However, MiRAI's performance depends on numerous parameters that cannot be tuned manually. In this study, a parallel evolutionary algorithm is proposed for finding an optimal configuration of our predictive method. The automatically parametrized version of MiRAI achieved excellent performance. It highlighted new miRNA-disease associations, especially the potential implication of mir-188 and mir-795 in various diseases. In addition, our method made it possible to detect several putative false associations contained in the reference database.
Year: 2017 PMID: 28874691 PMCID: PMC5585369 DOI: 10.1038/s41598-017-10065-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Illustration of the method. (a) miRNAs are characterized by several kinds of data that are stored in distinct matrices. (b) Each matrix is processed by a dedicated method that transforms it into a weighted matrix in which the strength of an association between a miRNA and a characteristic is represented by a floating-point number. (c) The matrices are concatenated. (d) Similarities and dissimilarities between miRNAs and diseases are highlighted by LSA. (e) Evolutionary computation is used to select the data sources, to tune the matrix transformations, to determine the size of the latent space, and to choose whether or not to expand the terms used for LSA queries.
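Steps (b) and (c) of the figure can be illustrated with a minimal sketch, assuming a plain TF-IDF weighting and toy data. The function names, the toy matrices, and the cosine comparison are hypothetical illustrations, not MiRAI's actual code, and the LSA reduction of step (d) is deliberately left out.

```python
import math

def tf_idf(matrix):
    """Weight a miRNA x term count matrix with TF-IDF (one row per miRNA)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Document frequency: number of miRNAs in which each term occurs.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_cols)]
    weighted = []
    for row in matrix:
        total = sum(row)
        weighted.append([
            (row[j] / total) * math.log(n_rows / df[j]) if total and df[j] else 0.0
            for j in range(n_cols)
        ])
    return weighted

def concatenate(*matrices):
    """Join matrices column-wise; all must share the same miRNA row order."""
    return [sum((list(m[i]) for m in matrices), []) for i in range(len(matrices[0]))]

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy data (assumed): 3 miRNAs, 3 abstract terms, 2 family membership flags.
abstract_counts = [[2, 0, 1], [0, 3, 1], [1, 0, 1]]
family_flags = [[1, 0], [0, 1], [1, 0]]

features = concatenate(tf_idf(abstract_counts), family_flags)
```

In this toy setting the first and third miRNAs share a family and a term, so `cosine(features[0], features[2])` is close to 1, while the first and second share nothing and score 0.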
Description and encoding of parameters.
| Id | Description of the parameters | Type | Binary encoding |
|---|---|---|---|
| a | Inclusion of data sources | | |
| | Use of target names | 1 binary | bit 1 |
| | Use of family data | 1 binary | bit 2 |
| | Use the proximity with neighbor miRNAs | 1 binary | bit 3 |
| | Use the abstract of associated PubMed papers | 1 binary | bit 4 |
| | Use the abstract of associated MiRBase papers | 1 binary | bit 5 |
| | Use the description of the miRNA in MiRBase | 1 binary | bit 6 |
| b | Transformation of data sources | | |
| | Applying NBI on miRNA-target links | 1 binary | bit 7 |
| | Applying TF-IDF on PubMed abstracts | 1 binary | bit 8 |
| | Applying TF-IDF on MiRBase abstracts | 1 binary | bit 9 |
| | Applying TF-IDF on MiRBase descriptions | 1 binary | bit 10 |
| | Inference of subsumed diseases in matrix | 1 binary | bit 11 |
| | Discretization of disease similarities | x floats | bits [12–30] |
| c | Dimension of reduced space | 1 integer | bits [31–34] |
| d | Inference of subsumed disease in the query | 1 binary | bit 35 |
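The bit layout in the table can be unpacked with a short decoder. This is a sketch under assumptions: the field names are hypothetical, the most-significant-bit-first order for the integer is assumed, and bits 12–30 are returned raw because the table does not say how the floats are encoded or how the 4-bit integer maps to an actual latent-space size.

```python
# Hypothetical names for the eleven single-bit flags, in table order.
FLAG_NAMES = [
    "use_targets", "use_family", "use_neighbors",            # bits 1-3
    "use_pubmed_abstracts", "use_mirbase_abstracts",         # bits 4-5
    "use_mirbase_description",                               # bit 6
    "nbi_on_targets", "tfidf_pubmed",                        # bits 7-8
    "tfidf_mirbase_abstracts", "tfidf_mirbase_description",  # bits 9-10
    "infer_subsumed_in_matrix",                              # bit 11
]

def decode(bits):
    """Split a 35-character bit string into the fields listed in the table."""
    if len(bits) != 35 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 35 binary digits")
    config = {name: bits[i] == "1" for i, name in enumerate(FLAG_NAMES)}
    # Bits 12-30: kept as a raw string, since the float encoding is not given.
    config["discretization_bits"] = bits[11:30]
    # Bits 31-34: read as an unsigned integer, MSB first (an assumption).
    config["dimension_code"] = int(bits[30:34], 2)
    # Bit 35: expansion of the query with subsumed diseases.
    config["infer_subsumed_in_query"] = bits[34] == "1"
    return config

example = decode("101000" + "01000" + "0" * 19 + "1010" + "1")
```

Here `example["dimension_code"]` is 10 and `example["tfidf_pubmed"]` is `True`; mapping the code to a real dimension would need information that the table does not provide.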
Parameter values of algorithm 1.
| Parameter | Value |
|---|---|
| Surrogate model exploration/exploitation | |
| Population size | |
| Number of cores | |
| Max. real evaluations | |
| Max. surrogate evaluations | |
| Scaling factor | |
| Crossover rate | |
| Individual length | |
| Probability estimation operator | |
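To show what a scaling factor and a crossover rate typically control, here is a sketch of one trial-vector construction in classic differential evolution (DE/rand/1/bin) on real-valued vectors. This is an assumption made for illustration only: the record does not describe algorithm 1 in detail, and MiRAI's individuals are binary, so the function `de_trial` and the values 0.5 and 0.9 are hypothetical and are not the paper's settings.

```python
import random

def de_trial(population, target_index, scaling_factor, crossover_rate, rng):
    """Build one DE/rand/1/bin trial vector for population[target_index]."""
    target = population[target_index]
    # Pick three distinct individuals, all different from the target.
    others = [i for i in range(len(population)) if i != target_index]
    r1, r2, r3 = rng.sample(others, 3)
    a, b, c = population[r1], population[r2], population[r3]
    # One randomly chosen gene always comes from the mutant vector.
    forced = rng.randrange(len(target))
    trial = []
    for j in range(len(target)):
        if j == forced or rng.random() < crossover_rate:
            # Mutation: base vector plus scaled difference of two others.
            trial.append(a[j] + scaling_factor * (b[j] - c[j]))
        else:
            # Gene inherited unchanged from the target individual.
            trial.append(target[j])
    return trial

rng = random.Random(0)
population = [[rng.random() for _ in range(5)] for _ in range(6)]
trial = de_trial(population, 0, scaling_factor=0.5, crossover_rate=0.9, rng=rng)
```

A larger scaling factor makes the mutant move further from its base vector, and a larger crossover rate copies more mutant genes into the trial; with a crossover rate of 0 only the single forced gene changes.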
Figure 2. Average population fitness during MiRAI optimization (averaged over 9 independent runs).
Figure 3. Best population fitness during MiRAI optimization (averaged over 9 independent runs).
Figure 4. Average precision obtained for 10 different levels of recall using 15 or 83 diseases.
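Precision at fixed recall levels, as plotted in Figure 4, can be computed per disease from a ranked list of candidate miRNAs and then averaged over diseases. The sketch below assumes interpolated precision (the best precision reached at or beyond each recall level) and evenly spaced levels from 0.1 to 1.0; the record does not state which variant the authors used, and the function name and toy ranking are hypothetical.

```python
def precision_at_recall_levels(ranked_labels, levels):
    """Interpolated precision at each recall level for one ranked list.

    ranked_labels holds 1 for a known association and 0 otherwise,
    ordered from the highest to the lowest predicted score.
    """
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    points = []  # (recall, precision) after each retrieved item
    hits = 0
    for rank, label in enumerate(ranked_labels, start=1):
        hits += label
        points.append((hits / total_pos, hits / rank))
    # Interpolation: best precision among points whose recall >= level.
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]

levels = [i / 10 for i in range(1, 11)]
ranked = [1, 1, 0, 1, 0, 0, 1, 0]  # toy ranking for one disease
curve = precision_at_recall_levels(ranked, levels)
```

For this toy ranking the precision is 1.0 up to a recall of 0.5, drops to 0.75 for recall levels 0.6 and 0.7, and to 4/7 for the remaining levels.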
Prediction results for diseases associated with the largest number of miRNAs.
| Disease name | RWRMDA | Chen | HDMP | RLSMDA | MIDP | Liu | MiRAI | MiRAI + EA |
|---|---|---|---|---|---|---|---|---|
| 2017 | ||||||||
| Acute myeloid leukemia | 0.839 | 0.716 | 0.858 | 0.853 | 0.913 | 0.871 | 0.895 | 0.906 |
| Breast neoplasms | 0.785 | 0.653 | 0.801 | 0.832 | 0.838 | 0.826 | 0.864 | 0.858 |
| Colorectal neoplasms | 0.793 | 0.662 | 0.802 | 0.831 | 0.845 | 0.833 | 0.864 | 0.868 |
| Glioblastoma | 0.680 | 0.607 | 0.700 | 0.714 | 0.786 | 0.839 | 0.898 | 0.872 |
| Heart failure | 0.722 | 0.761 | 0.770 | 0.738 | 0.821 | 0.812 | 0.796 | 0.847 |
| Liver carcinoma | 0.749 | 0.613 | 0.759 | 0.794 | 0.807 | 0.802 | 0.808 | 0.825 |
| Lung neoplasms | 0.827 | 0.606 | 0.835 | 0.855 | 0.876 | 0.925 | 0.904 | 0.926 |
| Melanoma | 0.784 | 0.642 | 0.790 | 0.807 | 0.837 | 0.834 | 0.849 | 0.875 |
| Ovarian neoplasms | 0.882 | 0.644 | 0.884 | 0.909 | 0.923 | 0.896 | 0.874 | 0.906 |
| Pancreatic neoplasms | 0.871 | 0.684 | 0.895 | 0.887 | 0.945 | 0.901 | 0.928 | 0.925 |
| Prostatic neoplasms | 0.823 | 0.629 | 0.854 | 0.841 | 0.882 | 0.842 | 0.871 | 0.872 |
| Renal cell carcinoma | 0.815 | 0.627 | 0.833 | 0.839 | 0.862 | 0.815 | 0.869 | 0.882 |
| Squamous carcinoma | 0.819 | 0.676 | 0.820 | 0.849 | 0.870 | 0.872 | 0.883 | 0.888 |
| Stomach neoplasms | 0.779 | 0.628 | 0.787 | 0.797 | 0.821 | 0.798 | 0.815 | 0.848 |
| Bladder neoplasms | 0.821 | 0.632 | 0.850 | 0.845 | 0.897 | 0.851 | 0.884 | 0.900 |
The AUC scores of MiRAI configured with an evolutionary algorithm (MiRAI + EA) are compared with the scores of manually configured MiRAI and 6 other methods.
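For readers unfamiliar with the metric, an AUC score can be computed from predicted scores and known labels as the fraction of positive/negative pairs that are ranked correctly, with ties counted as one half. This is a generic sketch of that rank-based definition; the function name and the toy scores are hypothetical and unrelated to the values in the table.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels holds 1 for a known miRNA-disease association, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs where the positive scores higher;
    # a tie contributes one half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
toy_labels = [1, 0, 1, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
```

In the toy example four of the six positive/negative pairs are ordered correctly, so `toy_auc` is 2/3; a perfect ranking gives 1.0 and a random one about 0.5.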
Figure 5. Main biological processes and pathways known for mir-188 and mir-765.