Prabina Kumar Meher, Shbana Begam, Tanmaya Kumar Sahu, Ajit Gupta, Anuj Kumar, Upendra Kumar, Atmakuri Ramakrishna Rao, Krishna Pal Singh, Om Parkash Dhankher.
Abstract
MicroRNAs (miRNAs) play a significant role in plant responses to abiotic stresses. Identifying abiotic stress-responsive miRNAs is therefore important for crop breeding programmes that aim to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational method for predicting miRNAs associated with abiotic stresses. Three types of datasets were used for prediction: miRNA, Pre-miRNA, and Pre-miRNA + miRNA. Pseudo K-tuple nucleotide compositional features were generated for each sequence to transform the sequence data into numeric feature vectors. A support vector machine (SVM) was employed for prediction. The area under the receiver operating characteristic curve (auROC) was 70.21%, 69.71%, and 77.94%, and the area under the precision-recall curve (auPRC) was 69.96%, 65.64%, and 77.32%, for the miRNA, Pre-miRNA, and Pre-miRNA + miRNA datasets, respectively. Overall prediction accuracies on the independent test sets were 62.33%, 64.85%, and 69.21%, respectively, for the three datasets. The SVM also achieved higher accuracy than other learning methods such as random forest, extreme gradient boosting, and adaptive boosting. To make the method easy to use, an online prediction server, "ASRmiRNA", has been developed. The proposed approach is believed to supplement existing efforts to identify abiotic stress-responsive miRNAs and Pre-miRNAs.
Keywords: abiotic stress; computational biology; machine learning; miRNAs; stress-responsive genes
Year: 2022 PMID: 35163534 PMCID: PMC8835813 DOI: 10.3390/ijms23031612
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
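The abstract states that each sequence is converted into a numeric feature vector via pseudo K-tuple nucleotide composition. As a rough illustration only, the sketch below computes the plain K-tuple (k-mer) frequency part of such a vector. It deliberately omits the "pseudo" correlation components, and the function name `ktuple_composition`, the choice `k=2`, and the toy sequence are assumptions made for this example, not details taken from the study.

```python
from itertools import product

ALPHABET = "ACGU"  # assumption: RNA sequences over these four letters


def ktuple_composition(seq, k=2):
    """Return normalized frequencies of all 4**k k-tuples in lexicographic order.

    This is only the plain k-tuple composition; the pseudo (sequence-order
    correlation) components of a full pseudo K-tuple feature are not computed.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


# Toy sequence, invented for illustration (not from the study's datasets).
features = ktuple_composition("AUGGCUAGCUAGGAUCCGAUA", k=2)
print(len(features))  # one entry per possible dinucleotide
```

Vectors of this kind have a fixed length regardless of sequence length, which is what allows miRNA and Pre-miRNA sequences of different sizes to be passed to the same classifier.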
Figure 1. Feature selection for the miRNA, Pre-miRNA, and Pre-miRNA + miRNA datasets. The optimal number of features was selected based on the highest auROC and auPRC values. A total of 200, 250, and 500 features were selected for the miRNA, Pre-miRNA, and Pre-miRNA + miRNA datasets, respectively.
Performance metrics of the support vector machine (SVM) for predicting abiotic stress-associated miRNAs and Pre-miRNAs. The predictions were made with the selected feature sets. Prediction accuracies were higher with the Pre-miRNA + miRNA dataset than with the miRNA and Pre-miRNA datasets.
| Dataset | Sen (%) | Spe (%) | Acc (%) | Pre (%) | F-Score (%) | auROC (%) | auPRC (%) |
|---|---|---|---|---|---|---|---|
| miRNA | 66.13 | 64.53 | 65.33 | 65.09 | 65.61 | 70.21 | 69.96 |
| Pre-miRNA | 69.20 | 63.60 | 66.40 | 65.53 | 67.31 | 69.71 | 65.64 |
| Pre-miRNA + miRNA | 74.00 | 68.80 | 71.40 | 70.34 | 72.12 | 77.94 | 77.32 |
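The columns of the table above follow the usual confusion-matrix definitions: sensitivity, specificity, accuracy, precision, and F-score. The sketch below shows how they relate to one another. The counts are hypothetical, chosen on a balanced set of 10,000 positives and 10,000 negatives so that they match the sensitivity and specificity reported for the miRNA row; they are not the study's actual class sizes.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics as percentages."""
    sen = tp / (tp + fn)                    # sensitivity (recall)
    spe = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)   # overall accuracy
    pre = tp / (tp + fp)                    # precision
    f_score = 2 * pre * sen / (pre + sen)   # harmonic mean of precision and recall
    values = {"Sen": sen, "Spe": spe, "Acc": acc, "Pre": pre, "F-Score": f_score}
    return {name: round(100 * v, 2) for name, v in values.items()}


# Hypothetical counts: 10,000 positives and 10,000 negatives (an assumption),
# set to match Sen = 66.13% and Spe = 64.53% from the miRNA row.
metrics = binary_metrics(tp=6613, fn=3387, tn=6453, fp=3547)
print(metrics)
```

With balanced classes, these counts also give the accuracy, precision, and F-score listed in the miRNA row, which is consistent with the metrics having been computed on equal numbers of positive and negative sequences.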
Performance metrics of the random forest (RF), adaptive boosting (ADB), and extreme gradient boosting (XGB) methods. The performance of RF, ADB, and XGB was analyzed using the selected feature sets for predicting abiotic stress-responsive miRNAs and Pre-miRNAs. RF achieved higher accuracies than the other two methods, although the differences among the three learning methods were small. For all three methods, accuracies were highest with the Pre-miRNA + miRNA dataset.
| Dataset | Method | Sen (%) | Spe (%) | Acc (%) | Pre (%) | F-Score (%) | auROC (%) | auPRC (%) |
|---|---|---|---|---|---|---|---|---|
| miRNA | RF | 55.20 | 58.13 | 56.66 | 56.86 | 56.02 | 58.88 | 58.25 |
| miRNA | XGB | 51.21 | 56.00 | 53.61 | 53.78 | 52.46 | 54.79 | 56.03 |
| miRNA | ADB | 52.26 | 57.06 | 54.67 | 54.91 | 53.55 | 57.45 | 57.01 |
| Pre-miRNA | RF | 65.60 | 58.50 | 62.20 | 61.42 | 63.44 | 64.25 | 58.03 |
| Pre-miRNA | XGB | 55.61 | 56.40 | 56.00 | 56.04 | 55.82 | 58.26 | 54.91 |
| Pre-miRNA | ADB | 58.01 | 60.00 | 59.00 | 59.18 | 58.58 | 62.28 | 57.86 |
| Pre-miRNA + miRNA | RF | 63.20 | 62.00 | 62.60 | 62.45 | 62.82 | 64.63 | 60.28 |
| Pre-miRNA + miRNA | XGB | 62.20 | 61.60 | 62.00 | 61.90 | 62.15 | 62.56 | 59.64 |
| Pre-miRNA + miRNA | ADB | 61.60 | 59.60 | 60.60 | 60.39 | 60.99 | 63.55 | 59.96 |
Figure 2. Density graphs of the prediction probabilities for the different machine learning methods. Most SVM prediction probabilities lie above the random-guess level (0.5), more so than for the random forest (RF), extreme gradient boosting (XGB), and adaptive boosting (ADB) methods. XGB is the lowest performer among the methods considered. The variability in prediction probabilities is lowest for ADB and highest for XGB.
Prediction accuracies with 5-fold cross validation (5F-CV) and leave-one-out cross validation (LOOCV). Prediction accuracies are higher for the Pre-miRNA + miRNA dataset than for the other two datasets. The performance with 5F-CV and LOOCV is similar across all metrics. Among the metrics, auROC and auPRC take the highest values.
| Dataset | Cross-Validation | Sen (%) | Spe (%) | Acc (%) | Pre (%) | F-Score (%) | auROC (%) | auPRC (%) |
|---|---|---|---|---|---|---|---|---|
| miRNA | 5-Fold | 66.13 | 64.53 | 65.33 | 65.09 | 65.61 | 70.21 | 69.96 |
| miRNA | Leave-One-Out | 67.02 | 63.56 | 65.29 | 64.78 | 65.88 | 70.28 | 70.17 |
| Pre-miRNA | 5-Fold | 69.20 | 63.60 | 66.40 | 65.53 | 67.31 | 69.71 | 65.64 |
| Pre-miRNA | Leave-One-Out | 66.93 | 64.54 | 65.74 | 65.37 | 66.14 | 70.52 | 65.48 |
| miRNA + Pre-miRNA | 5-Fold | 74.00 | 68.80 | 71.40 | 70.34 | 72.12 | 77.94 | 77.32 |
| miRNA + Pre-miRNA | Leave-One-Out | 74.50 | 68.53 | 71.51 | 70.30 | 72.34 | 79.35 | 78.73 |
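The two cross-validation schemes compared in the table above differ only in how the data are partitioned: 5-fold holds out one fifth of the samples at a time, while leave-one-out holds out a single sample at a time. A minimal sketch of the index splitting follows. The helper name `kfold_indices` is hypothetical, and contiguous folds without shuffling are a simplification; in practice the samples would normally be shuffled, and possibly stratified by class, first.

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n samples.

    Setting k == n yields leave-one-out cross validation.
    """
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop


n_samples = 10  # toy size, for illustration only
five_fold = list(kfold_indices(n_samples, 5))
leave_one_out = list(kfold_indices(n_samples, n_samples))
print(len(five_fold), len(leave_one_out))
```

Each sample appears in exactly one test fold under both schemes, so the per-sample predictions can be pooled to compute the metrics reported in the table.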
Summary of the independent datasets and their prediction accuracies. The accuracies are on par with those of 5-fold cross validation and are again highest for the miRNA + Pre-miRNA dataset.
| Dataset | Positive (#Sequences) | Negative (#Sequences) | Sensitivity (%) | Specificity (%) | Accuracy (%) |
|---|---|---|---|---|---|
| miRNA | 72 | 100 | 66.66 | 58.00 | 62.33 |
| Pre-miRNA | 70 | 100 | 65.71 | 64.00 | 64.85 |
| miRNA + Pre-miRNA | 70 | 100 | 71.42 | 67.00 | 69.21 |
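Because the independent sets are unbalanced (70 to 72 positives against 100 negatives), it matters how accuracy is defined. The sketch below compares the mean of sensitivity and specificity with the raw fraction of correct predictions. The true-positive and true-negative counts are inferred from the reported percentages and are not stated in the source, so they should be read as an assumption.

```python
# pos and neg come from the table; tp and tn are INFERRED from the reported
# sensitivity and specificity and are not given in the source.
INDEPENDENT_SETS = {
    "miRNA": dict(pos=72, neg=100, tp=48, tn=58),
    "Pre-miRNA": dict(pos=70, neg=100, tp=46, tn=64),
    "miRNA + Pre-miRNA": dict(pos=70, neg=100, tp=50, tn=67),
}


def accuracies(pos, neg, tp, tn):
    """Return sensitivity, specificity, their mean, and raw accuracy (all in %)."""
    sen = 100 * tp / pos
    spe = 100 * tn / neg
    balanced = (sen + spe) / 2           # mean of sensitivity and specificity
    raw = 100 * (tp + tn) / (pos + neg)  # fraction of all sequences predicted correctly
    return sen, spe, balanced, raw


results = {name: accuracies(**counts) for name, counts in INDEPENDENT_SETS.items()}
for name, (sen, spe, balanced, raw) in results.items():
    print(f"{name}: Sen={sen:.2f} Spe={spe:.2f} balanced={balanced:.2f} raw={raw:.2f}")
```

Under these inferred counts, the accuracy column of the table agrees with the mean of sensitivity and specificity to within rounding, rather than with the raw fraction correct. This suggests, but does not confirm, that the reported accuracy is a class-balanced figure.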
Figure 3. Flow diagram showing the steps involved in the proposed approach for prediction of abiotic stress-responsive Pre-miRNAs and miRNAs.