Yingli Zhong, Ping Xuan, Ke Han, Weiping Zhang, Jianzhong Li.
Abstract
MicroRNAs (miRNAs) play important roles in the diverse biological processes of animals and plants. Although prediction methods based on machine learning can identify nonhomologous and species-specific miRNAs, they suffer from severe class imbalance between real and pseudo pre-miRNAs. We propose a pre-miRNA classification method based on cost-sensitive ensemble learning and refer to it as MiRNAClassify. Through a series of iterations, the information in all the positive and negative samples is fully exploited. In each iteration, a new classifier instance is trained on equal numbers of positive and negative samples, which relieves the negative effect of class imbalance. Each new instance focuses primarily on the samples that are easily misclassified. In addition, the positive samples are assigned a higher cost weight than the negative samples. MiRNAClassify is compared with several state-of-the-art methods and some well-known classification models on human, animal, and plant datasets. The cross-validation results indicate that MiRNAClassify significantly outperforms the other methods and models. The newly added pre-miRNAs are then used to further evaluate the ability of these methods to discover novel pre-miRNAs; MiRNAClassify still achieves consistently superior performance and discovers more pre-miRNAs.
Year: 2015 PMID: 26640803 PMCID: PMC4657081 DOI: 10.1155/2015/960108
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Constructing the integrated model to classify the real/pseudo pre-miRNAs.
Algorithm 1. Algorithm for classifying the real/pseudo pre-miRNAs based on cost-sensitive ensemble learning.
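
The procedure described in the abstract (balanced resampling in each iteration, extra weight on hard samples, and a higher cost for positives) can be sketched as a small boosting loop. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the decision-stump base learner, the AdaBoost-style weight update, the `pos_cost` parameter, and all function names are assumptions introduced here.

```python
import math
import random


def train_stump(samples, weights):
    """Pick the (feature, threshold, sign) rule with the lowest weighted error."""
    best = None
    for f in range(len(samples[0][0])):
        for x, _ in samples:
            for sign in (1, -1):
                err = sum(
                    w for (xi, yi), w in zip(samples, weights)
                    if (sign if xi[f] >= x[f] else -sign) != yi
                )
                if best is None or err < best[0]:
                    best = (err, f, x[f], sign)
    return best[1:]  # (feature, threshold, sign)


def stump_predict(stump, x):
    f, thr, sign = stump
    return sign if x[f] >= thr else -sign


def fit_ensemble(pos, neg, rounds=10, pos_cost=2.0, seed=0):
    """Hypothetical cost-sensitive, balanced boosting loop (+1 real, -1 pseudo)."""
    rng = random.Random(seed)
    data = [(x, 1) for x in pos] + [(x, -1) for x in neg]
    pos_idx = list(range(len(pos)))
    neg_idx = list(range(len(pos), len(data)))
    # Assumption: positives start with a higher cost weight than negatives.
    w = [pos_cost if y == 1 else 1.0 for _, y in data]
    ensemble = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]
        # Balanced training set: every positive plus an equal number of
        # negatives, drawn (with replacement) in proportion to current weight.
        drawn = rng.choices(neg_idx, weights=[w[i] for i in neg_idx], k=len(pos_idx))
        subset = pos_idx + drawn
        stump = train_stump([data[i] for i in subset], [w[i] for i in subset])
        # Weighted error over ALL samples, clamped to keep the log finite.
        err = sum(wi for (x, y), wi in zip(data, w) if stump_predict(stump, x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)
        if err >= 0.5:
            continue  # no better than chance on the weighted data; skip this round
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples (more for positives),
        # lower the weight of correctly classified ones.
        for i, (x, y) in enumerate(data):
            if stump_predict(stump, x) != y:
                w[i] *= math.exp(alpha) * (pos_cost if y == 1 else 1.0)
            else:
                w[i] *= math.exp(-alpha)
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


# Toy, made-up 1-D data: 3 "real" samples against 12 "pseudo" samples.
pos = [[2.0], [2.5], [3.0]]
neg = [[i / 12] for i in range(12)]
model = fit_ensemble(pos, neg)
print(predict(model, [2.2]), predict(model, [0.4]))  # 1 -1
```

Drawing the negatives in proportion to their current weight is one way to let later classifiers concentrate on hard pseudo pre-miRNAs while every training subset stays balanced; other sampling schemes would fit the abstract's description equally well.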
Figure 2. Comparison of the performances of MiRNAClassify and microPred.
Datasets and detailed classification results of MiRNAClassify and microPred.
| Method | Species | Dataset | Type | Size | SE (%) | SP (%) | Gm (%) |
|---|---|---|---|---|---|---|---|
| MiRNAClassify | | MP_positive_set | Real | 691 | 96.40 | 97.03 | 96.71 |
| microPred | | MP_negative_set | Pseudo | 9248 | 90.65 | 92.16 | 91.40 |
| MiRNAClassify | | MP_updated_set | Real | 1186 | 90.56 | | |
| microPred | | MP_updated_set | Real | 1186 | 85.16 | | |
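
The last figure in each complete row is consistent with the geometric mean of SE and SP, a measure commonly used for imbalanced data and written Gm here. The short check below is a sketch under that assumption; the function names are illustrative and not taken from the source.

```python
import math


def sensitivity(tp, fn):
    """SE (%): share of real pre-miRNAs that are predicted as real."""
    return 100.0 * tp / (tp + fn)


def specificity(tn, fp):
    """SP (%): share of pseudo pre-miRNAs that are predicted as pseudo."""
    return 100.0 * tn / (tn + fp)


def geometric_mean(se, sp):
    """Gm (%): geometric mean of SE and SP (assumed meaning of the third value)."""
    return math.sqrt(se * sp)


# SE and SP as reported for MiRNAClassify and microPred above.
print(round(geometric_mean(96.40, 97.03), 2))  # 96.71
print(round(geometric_mean(90.65, 92.16), 2))  # 91.4
```

Because Gm is built from a product, a classifier that labels every sample as pseudo has SE = 0 and therefore Gm = 0, however high its SP.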
Figure 3. Comparison of the performances of MiRNAClassify and PlantMiRNAPred.
Datasets and detailed classification results of MiRNAClassify and PlantMiRNAPred.
| Method | Species | Dataset | Type | Size | SE (%) | SP (%) | Gm (%) |
|---|---|---|---|---|---|---|---|
| MiRNAClassify | | PMP_positive_set | Real | 2043 | 96.57 | 98.35 | 97.46 |
| PlantMiRNAPred | | PMP_negative_set | Pseudo | 2122 | 95.10 | 97.17 | 96.13 |
| MiRNAClassify | | PMP_ath_updated_set | Real | 138 | 92.75 | | |
| PlantMiRNAPred | | PMP_ath_updated_set | Real | 138 | 90.58 | | |
| MiRNAClassify | | PMP_osa_updated_set | Real | 178 | 77.53 | | |
| PlantMiRNAPred | | PMP_osa_updated_set | Real | 178 | 67.42 | | |
| MiRNAClassify | | PMP_gma_updated_set | Real | 488 | 90.78 | | |
| PlantMiRNAPred | | PMP_gma_updated_set | Real | 488 | 86.07 | | |
Figure 4. Comparison of the performances of MiRNAClassify and HuntMi based on 5-fold cross validation.
Datasets and detailed classification results of MiRNAClassify and HuntMi.
| Method | Species | Dataset | Type | Size | SE (%) | SP (%) | Gm (%) |
|---|---|---|---|---|---|---|---|
| MiRNAClassify | | HM_hsa_positive_set | Real | 1406 | 97.15 | 98.35 | 97.75 |
| HuntMi | | HM_hsa_negative_set | Pseudo | 81228 | 95.02 | 96.94 | 95.98 |
| MiRNAClassify | Animal | HM_animal_positive_set | Real | 7053 | 95.74 | 97.55 | 96.64 |
| HuntMi | Animal | HM_animal_negative_set | Pseudo | 218154 | 94.11 | 95.95 | 95.03 |
| MiRNAClassify | Plant | HM_plant_positive_set | Real | 2172 | 93.32 | 97.82 | 95.54 |
| HuntMi | Plant | HM_plant_negative_set | Pseudo | 114929 | 91.71 | 95.87 | 93.77 |
| MiRNAClassify | | HM_hsa_updated_set | Real | 445 | 93.93 | | |
| HuntMi | | HM_hsa_updated_set | Real | 445 | 92.14 | | |
| MiRNAClassify | | HM_ath_updated_set | Real | 68 | 92.65 | | |
| HuntMi | | HM_ath_updated_set | Real | 68 | 91.18 | | |
| MiRNAClassify | | HM_osa_updated_set | Real | 169 | 75.74 | | |
| HuntMi | | HM_osa_updated_set | Real | 169 | 69.82 | | |
| MiRNAClassify | | HM_gma_updated_set | Real | 302 | 92.72 | | |
| HuntMi | | HM_gma_updated_set | Real | 302 | 88.41 | | |
Figure 5. Comparison of the performances of MiRNAClassify and HuntMi on the newly added data.
Datasets and detailed classification results of MiRNAClassify and miRNApre.
| Method | Species | Dataset | Type | Size | SE (%) | SP (%) | Gm (%) |
|---|---|---|---|---|---|---|---|
| MiRNAClassify | | MP_positive_set | Real | 1496 | 98.33 | 98.27 | 98.29 |
| miRNApre | | MP_negative_set | Pseudo | 1446 | 97.66 | 97.23 | 97.44 |
| MiRNAClassify | | MP_updated_set | Real | 355 | 91.27 | | |
| miRNApre | | MP_updated_set | Real | 355 | 90.14 | | |
Figure 6. Comparison of the performances of MiRNAClassify and miRNApre.
The classification results of MiRNAClassify and other methods over the merged datasets and the updated datasets. The updated datasets contain only real pre-miRNAs, so a single value per method is reported for each of them.
| Method | Measure (%) | Merged human dataset | Merged animal dataset | Merged plant dataset | Updated human dataset | Updated ath dataset | Updated osa dataset | Updated gma dataset |
|---|---|---|---|---|---|---|---|---|
| MiRNAClassify | SE | 97.93 | 95.85 | 93.37 | 94.08 | 92.65 | 79.29 | 93.05 |
| | SP | 98.30 | 97.62 | 97.91 | | | | |
| | Gm | 98.11 | 96.73 | 95.61 | | | | |
| microPred | SE | 92.25 | 91.61 | 89.50 | 91.27 | 89.71 | 66.86 | 85.76 |
| | SP | 95.70 | 94.85 | 93.10 | | | | |
| | Gm | 93.96 | 93.21 | 91.28 | | | | |
| PlantMiRNAPred | SE | 93.58 | 92.70 | 91.39 | 92.11 | 89.71 | 70.42 | 88.41 |
| | SP | 91.20 | 88.60 | 87.10 | | | | |
| | Gm | 92.38 | 90.63 | 89.22 | | | | |
| HuntMi | SE | 95.32 | 94.14 | 91.76 | 92.68 | 91.18 | 72.78 | 89.07 |
| | SP | 97.11 | 96.07 | 95.94 | | | | |
| | Gm | 96.21 | 95.10 | 93.83 | | | | |
| miRNApre | SE | 97.39 | 93.49 | 91.71 | 90.14 | 89.71 | 71.01 | 86.09 |
| | SP | 90.90 | 89.80 | 88.10 | | | | |
| | Gm | 94.09 | 91.63 | 89.89 | | | | |
Results of the paired t-test (p-values) comparing the prediction performance of MiRNAClassify with that of each other method.
| Different datasets | microPred | PlantMiRNAPred | HuntMi | miRNApre |
|---|---|---|---|---|
| Merged datasets | 0.0019 | 4.9374 × 10⁻⁴ | 9.7070 × 10⁻⁴ | 0.0050 |
| Updated datasets | 0.0339 | 0.0284 | 0.0354 | 0.0108 |
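
The reported values can be approximated with a one-sided paired t-test. The sketch below assumes, without confirmation from the source, that the paired observations are the Gm values on the three merged datasets and the single values on the four updated datasets, as listed in the results table for MiRNAClassify and microPred. The helper names are illustrative, and the tail probability is obtained by numerically integrating the Student t density using only the standard library.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t), integrating the density from 0 to t with Simpson's rule."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def paired_t_one_sided(a, b):
    """Return (t statistic, one-sided p-value) for H1: mean(a - b) > 0."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    return t, t_upper_tail(t, n - 1)


# Gm (%) on the merged human, animal, and plant datasets.
gm_mirnaclassify = [98.11, 96.73, 95.61]
gm_micropred = [93.96, 93.21, 91.28]
# Values (%) on the updated human, ath, osa, and gma datasets.
upd_mirnaclassify = [94.08, 92.65, 79.29, 93.05]
upd_micropred = [91.27, 89.71, 66.86, 85.76]

t_m, p_merged = paired_t_one_sided(gm_mirnaclassify, gm_micropred)
t_u, p_updated = paired_t_one_sided(upd_mirnaclassify, upd_micropred)
print(round(p_merged, 4))   # 0.0019
print(round(p_updated, 4))  # 0.0339
```

With only three or four paired observations the test has very few degrees of freedom, so p-values of this kind are sensitive to a single dataset and are best read alongside the raw differences.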
The classification results of MiRNAClassify and three classification models over the merged datasets.
| Classification models | SE (%) | SP (%) | Gm (%) |
|---|---|---|---|
| Human | | | |
| SVM | 69.18 | 99.83 | 83.11 |
| SVM + SMOTE | 92.25 | 95.70 | 93.96 |
| Naive Bayes | 87.43 | 96.12 | 91.67 |
| Naive Bayes + SMOTE | 90.24 | 94.43 | 92.31 |
| Random Forest | 67.78 | 99.82 | 82.26 |
| Random Forest + SMOTE | 91.51 | 95.34 | 93.41 |
| MiRNAClassify | 97.93 | 98.30 | 98.11 |
| Animal | | | |
| SVM | 69.03 | 98.14 | 82.31 |
| SVM + SMOTE | 91.61 | 94.85 | 93.21 |
| Naive Bayes | 85.04 | 95.03 | 89.90 |
| Naive Bayes + SMOTE | 90.83 | 92.61 | 91.71 |
| Random Forest | 69.52 | 98.72 | 82.84 |
| Random Forest + SMOTE | 91.12 | 95.01 | 93.05 |
| MiRNAClassify | 95.85 | 97.62 | 96.73 |
| Plant | | | |
| SVM | 68.51 | 99.24 | 82.45 |
| SVM + SMOTE | 89.50 | 93.10 | 91.28 |
| Naive Bayes | 82.91 | 96.75 | 89.57 |
| Naive Bayes + SMOTE | 87.20 | 92.61 | 89.86 |
| Random Forest | 68.32 | 99.35 | 82.39 |
| Random Forest + SMOTE | 89.18 | 92.87 | 91.01 |
| MiRNAClassify | 93.37 | 97.91 | 95.61 |
Results of the paired t-test (p-values) comparing the prediction performance of MiRNAClassify with that of each classification model.
| SVM | Naive Bayes | Random Forest | SVM + SMOTE | Naive Bayes + SMOTE | Random Forest + SMOTE |
|---|---|---|---|---|---|
| 7.3053 × 10⁻⁴ | 6.2651 × 10⁻⁴ | 0.0015 | 0.0019 | 0.0010 | 0.0028 |