Prabina Kumar Meher, Subhrajit Satpathy, Atmakuri Ramakrishna Rao.
Abstract
MicroRNAs (miRNAs) are a class of non-coding RNA that play a vital role in regulating several physiological and developmental processes. The subcellular localization of miRNAs and their abundance in the native cell are central to maintaining physiological homeostasis. In addition, the RNA silencing activity of miRNAs is influenced by their localization and stability. A computational method for predicting the subcellular localization of miRNAs is therefore desirable. In this work, we propose a computational method for predicting the subcellular localizations of miRNAs based on principal component scores of thermodynamic properties, structural properties and pseudo compositions of di-nucleotides. Prediction accuracy was assessed by fivefold cross validation, where ~63-71% AUC-ROC and ~69-76% AUC-PR were observed. When evaluated on an independent test set, > 50% of localizations were correctly predicted. The developed model also achieved higher accuracy than existing methods. A user-friendly prediction server, "miRNALoc", is freely accessible at https://cabgrid.res.in:8080/mirnaloc/ , with which users can predict the localizations of miRNAs.
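As an illustration of the kind of sequence-derived feature the abstract refers to, the sketch below computes plain di-nucleotide frequencies for one miRNA sequence. It is a simplification under stated assumptions: the abstract names *pseudo* di-nucleotide compositions, thermodynamic and structural properties, and principal component scores, none of which are reproduced here. The function name `dinucleotide_composition` and the example sequence are hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"
# All 16 ordered pairs: AA, AC, AG, AU, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=2)]


def dinucleotide_composition(seq):
    """Return the 16 relative di-nucleotide frequencies of an RNA sequence.

    Assumption: plain overlapping-pair frequencies, not the pseudo
    composition used in the paper.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


# Hypothetical example sequence, used only to show the call.
features = dinucleotide_composition("UGAGGUAGUAGGUUGUAUAGUU")
```

A feature matrix built this way (one row per sequence) could then be passed to a principal component analysis before classification; that step is omitted here.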
Year: 2020 PMID: 32884018 PMCID: PMC7471944 DOI: 10.1038/s41598-020-71381-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of the positive and negative datasets.
| Localization type | Positive | ND-I | ND-II |
|---|---|---|---|
| Axon | 16 | 830 | 951 |
| Circulating | 69 | 775 | |
| Cytoplasm | 67 | 808 | |
| Exosome | 524 | 415 | |
| Extracellular vesicle | 25 | 829 | |
| Microvesicle | 21 | 818 | |
| Mitochondrion | 191 | 659 | |
| Nucleus | 42 | 799 | |
The last column (ND-II) is the negative dataset collected from the miRBase database. The numbers of sequences shown were obtained after removing redundancy at a sequence identity cut-off of 0.8 using the CD-HIT program.
Figure 1 (A) Distribution of sequences of the test set over the number of localizations. (B) Number of sequences of the test set present in different locations. (C) Distribution of sequences present in more than one location. (D) Heat map of AUC-ROC for four different kernels with all four feature sets. (E) ROC curves for all four feature sets with all eight localizations.
Optimum parameter values of the RBF kernel for predicting miRNAs in eight subcellular localizations; sample datasets were used for the optimization analysis.
| Localization | γ (gamma) | C (cost) | error |
|---|---|---|---|
| Axon | 0.125 | 2 | 0.05 |
| Circulating | 0.25 | 2 | 0.145 |
| Cytoplasm | 0.125 | 1 | 0.137 |
| Exosome | 0.065 | 8 | 0.121 |
| Extracellular vesicle | 0.125 | 2 | 0.11 |
| Microvesicle | 0.125 | 1 | 0.106 |
| Mitochondrion | 0.125 | 4 | 0.081 |
| Nucleus | 0.125 | 2 | 0.112 |
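To make the role of γ in the table above concrete, the following sketch evaluates an RBF kernel between two feature vectors. It assumes the common parameterization k(x, y) = exp(-γ‖x − y‖²); the source does not state which form the authors used, and the vectors are made-up values. The cost parameter C belongs to the SVM training objective and does not appear in the kernel itself.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) for two equal-length vectors.

    Assumption: this is the standard parameterization; the source only
    reports gamma values, not the kernel formula.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


GAMMA_AXON = 0.125  # gamma reported for the Axon classifier in the table

# Made-up 2-dimensional feature vectors, for illustration only.
x = [0.0, 0.0]
y = [2.0, 2.0]
similarity = rbf_kernel(x, y, GAMMA_AXON)  # squared distance 8, so exp(-1)
```

A larger γ makes the similarity fall off faster with distance, which is why γ is typically tuned jointly with C.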
Prediction accuracy of the proposed model (SVM with PrinComp features).
| Class | AUC-ROC, first dataset (Positive + ND-I) | AUC-PR, first dataset (Positive + ND-I) | AUC-ROC, second dataset (Positive + ND-II) | AUC-PR, second dataset (Positive + ND-II) |
|---|---|---|---|---|
| Axon | 0.715 (0.062) | 0.761 (0.071) | 0.714 (0.053) | 0.765 (0.062) |
| Circulating | 0.675 (0.037) | 0.696 (0.047) | 0.744 (0.027) | 0.782 (0.031) |
| Cytoplasm | 0.671 (0.033) | 0.690 (0.047) | 0.712 (0.027) | 0.752 (0.035) |
| Exosome | 0.971 (0.005) | 0.973 (0.004) | 0.452 (0.019) | 0.505 (0.014) |
| Extracellular Vesicle | 0.702 (0.058) | 0.700 (0.076) | 0.755 (0.043) | 0.765 (0.064) |
| Microvesicle | 0.717 (0.043) | 0.792 (0.039) | 0.749 (0.047) | 0.810 (0.049) |
| Mitochondrion | 0.672 (0.017) | 0.734 (0.024) | 0.712 (0.014) | 0.773 (0.019) |
| Nucleus | 0.635 (0.043) | 0.704 (0.055) | 0.646 (0.041) | 0.719 (0.055) |
Accuracies were measured by fivefold cross validation, with the experiment repeated 100 times.
Values in parentheses denote the standard error.
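For readers unfamiliar with the AUC-ROC values reported above, the sketch below computes AUC-ROC as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. This pairwise formulation is a standard equivalent of the area under the ROC curve; it is not taken from the source, and the labels and scores are made-up.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels; scores: classifier scores.
    Quadratic in the number of samples, which is fine for a small sketch.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up labels and scores: 3 of the 4 positive/negative pairs are ordered
# correctly, so the AUC is 0.75.
example_auc = auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In a repeated cross-validation setting, a value like this would be computed per fold and per repetition and then averaged.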
Estimates of the performance metrics for the proposed model (SVM with PrinComp features).
| Dataset | Localization | Sensitivity | Specificity | F1-score | MCC |
|---|---|---|---|---|---|
| First dataset (Positive + ND-I) | Axon | 0.704 ± 0.027 | 0.740 ± 0.011 | 0.721 ± 0.016 | 0.695 ± 0.031 |
| | Circulating | 0.631 ± 0.030 | 0.728 ± 0.008 | 0.676 ± 0.018 | 0.613 ± 0.029 |
| | Cytoplasm | 0.657 ± 0.023 | 0.690 ± 0.016 | 0.672 ± 0.017 | 0.597 ± 0.033 |
| | Exosome | 0.724 ± 0.004 | 0.686 ± 0.004 | 0.706 ± 0.003 | 0.661 ± 0.006 |
| | Extracellular vesicle | 0.713 ± 0.021 | 0.731 ± 0.010 | 0.722 ± 0.013 | 0.694 ± 0.025 |
| | Microvesicle | 0.674 ± 0.027 | 0.742 ± 0.008 | 0.706 ± 0.015 | 0.669 ± 0.027 |
| | Mitochondrion | 0.646 ± 0.015 | 0.665 ± 0.011 | 0.654 ± 0.010 | 0.561 ± 0.019 |
| | Nucleus | 0.513 ± 0.048 | 0.741 ± 0.011 | 0.610 ± 0.031 | 0.524 ± 0.041 |
| Second dataset (Positive + ND-II) | Axon | 0.747 ± 0.023 | 0.791 ± 0.008 | 0.768 ± 0.013 | 0.739 ± 0.025 |
| | Circulating | 0.689 ± 0.020 | 0.786 ± 0.007 | 0.734 ± 0.012 | 0.679 ± 0.020 |
| | Cytoplasm | 0.717 ± 0.021 | 0.741 ± 0.014 | 0.728 ± 0.014 | 0.658 ± 0.026 |
| | Exosome | 0.615 ± 0.007 | 0.684 ± 0.006 | 0.644 ± 0.006 | 0.501 ± 0.011 |
| | Extracellular vesicle | 0.774 ± 0.013 | 0.787 ± 0.010 | 0.780 ± 0.008 | 0.761 ± 0.016 |
| | Microvesicle | 0.753 ± 0.021 | 0.787 ± 0.010 | 0.769 ± 0.013 | 0.740 ± 0.024 |
| | Mitochondrion | 0.694 ± 0.015 | 0.731 ± 0.011 | 0.711 ± 0.010 | 0.626 ± 0.019 |
| | Nucleus | 0.557 ± 0.035 | 0.788 ± 0.012 | 0.656 ± 0.024 | 0.566 ± 0.034 |
Accuracies were computed by fivefold cross validation, with the experiment repeated 100 times for each localization.
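The four metrics in the table above can be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions; the source does not give its formulas, so treating them as identical is an assumption, and the counts in the example are made-up rather than taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, F1-score and MCC from confusion counts.

    Assumption: standard definitions; undefined ratios are returned as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up counts for one localization: 100 positives and 100 negatives.
metrics = binary_metrics(tp=70, fp=25, tn=75, fn=30)
```

With these counts, sensitivity is 0.70 and specificity is 0.75; F1 and MCC follow from the same four numbers.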
Figure 2 Accuracy of machine learning methods in terms of AUC-ROC and AUC-PR with regard to prediction of localizations of miRNAs.
Figure 3 (A) Number of sequences observed and correctly predicted in different localizations. (B) Confusion matrix of the number of localizations observed and predicted.
Figure 4 Snapshot of the (A) web server and (B) result page.