Jiandong Ding, Shuigeng Zhou, Jihong Guan.
Abstract
BACKGROUND: MicroRNAs (miRNAs) are derived from larger hairpin RNA precursors and play essential regulatory roles in both animals and plants. A number of computational methods for finding miRNA genes have been proposed in the past decade, yet the problem is far from solved, especially when considering the imbalance between known miRNAs and unidentified miRNAs, and pre-miRNAs with multiple loops or higher minimum free energy (MFE). This paper presents a new computational approach, miRenSVM, for finding miRNA genes. To improve prediction performance, an ensemble support vector machine (SVM) classifier is established to deal with the imbalance issue, and multi-loop features are included to identify pre-miRNAs with multiple loops.
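The abstract does not spell out how the multi-loop features are computed. As a rough illustration only, assuming a predicted secondary structure in dot-bracket notation (the format printed by RNAfold), the number of hairpin loops in a fold can be counted as below. The helper names `count_hairpin_loops` and `has_multiple_hairpins` are hypothetical, and this is not miRenSVM's actual feature definition.

```python
import re

# Hypothetical helper, not the paper's feature definition.
# In dot-bracket notation, "(" followed only by unpaired "." and then ")"
# means that "(" and ")" pair with each other and enclose a hairpin loop.
HAIRPIN = re.compile(r"\(\.+\)")


def count_hairpin_loops(structure: str) -> int:
    """Return the number of hairpin loops in a dot-bracket structure."""
    return len(HAIRPIN.findall(structure))


def has_multiple_hairpins(structure: str) -> bool:
    """True if the fold branches into more than one hairpin loop."""
    return count_hairpin_loops(structure) > 1


if __name__ == "__main__":
    print(count_hairpin_loops("((((...))))"))            # single-stem hairpin
    print(count_hairpin_loops("((((...))..((....))))"))  # branched fold
```

A canonical pre-miRNA folds into one stem-loop, so a count above one flags the branched precursors that the abstract says single-loop features tend to miss.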
Year: 2010 PMID: 21172046 PMCID: PMC3024864 DOI: 10.1186/1471-2105-11-S11-S11
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Classification results obtained by outer 3-fold cross-validation with different feature groups and feature selection (SE: sensitivity; SP: specificity; Gm: geometric mean of SE and SP; Acc: accuracy)
| Feature group | No. of features | SE (%) | SP (%) | Gm (%) | Acc (%) |
|---|---|---|---|---|---|
|  | 32 | 74.93 | 98.39 | 85.87 | 95.64 |
|  | 15 |  | 98.24 | 92.65 | 97.00 |
|  | 18 | 87.07 |  |  |  |
|  | 65 | 87.50 | 98.82 | 92.99 | 97.47 |
|  | 32 | 87.78 | 98.88 | 93.16 | 97.58 |
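The four columns follow the usual definitions: sensitivity (SE) on real pre-miRNAs, specificity (SP) on pseudo pre-miRNAs, their geometric mean (Gm), and overall accuracy (Acc). The reported rows are consistent with Gm = √(SE·SP). A minimal sketch computing all four from a confusion matrix, using a hypothetical function name:

```python
import math


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute SE, SP, Gm and Acc (as percentages) from a confusion matrix.

    tp/fn: real pre-miRNAs predicted positive/negative.
    tn/fp: pseudo pre-miRNAs predicted negative/positive.
    """
    se = tp / (tp + fn)                    # sensitivity on the positive class
    sp = tn / (tn + fp)                    # specificity on the negative class
    gm = math.sqrt(se * sp)                # geometric mean balances both classes
    acc = (tp + tn) / (tp + fn + tn + fp)  # dominated by the larger class
    return {"SE": 100 * se, "SP": 100 * sp, "Gm": 100 * gm, "Acc": 100 * acc}


if __name__ == "__main__":
    print(classification_metrics(tp=90, fn=10, tn=950, fp=50))
    # Gm of the 65-feature row, recomputed from its SE and SP
    print(round(math.sqrt(87.50 * 98.82), 2))
```

Because Gm multiplies SE and SP, a classifier that neglects the minority class is penalized even when Acc looks high, as in the first row (SE 74.93%, Acc 95.64%).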
32 features selected by F-score
| Group | No. of features | Features |
|---|---|---|
|  | 8 |  |
|  | 8 |  |
|  | 16 |  |
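The table only states that these 32 features were selected by F-score. Assuming the common two-class F-score of Chen and Lin (between-class separation of the feature means divided by the within-class sample variances), ranking and selecting features could look like this. The names `f_score` and `select_top_features` are hypothetical, and whether miRenSVM ranks features exactly this way is an assumption.

```python
from statistics import mean


def f_score(pos: list[float], neg: list[float]) -> float:
    """Two-class F-score of one feature (assumed Chen & Lin definition).

    Numerator: squared distances of each class mean from the overall mean.
    Denominator: sum of the two within-class sample variances.
    """
    all_mean = mean(pos + neg)
    m_pos, m_neg = mean(pos), mean(neg)
    between = (m_pos - all_mean) ** 2 + (m_neg - all_mean) ** 2
    within = (sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
              + sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1))
    if within == 0:
        # Constant within each class: perfectly separating or uninformative.
        return float("inf") if between > 0 else 0.0
    return between / within


def select_top_features(pos_rows, neg_rows, top_k: int) -> list[int]:
    """Rank feature columns by F-score and return the top_k column indices."""
    n_features = len(pos_rows[0])
    scores = [f_score([r[j] for r in pos_rows], [r[j] for r in neg_rows])
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:top_k]
```

Calling `select_top_features(pos_rows, neg_rows, 32)` on per-sample feature vectors would keep the 32 highest-scoring columns. The F-score scores each feature independently, so it cannot detect features that are only informative in combination.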
Results of classifier ensembles with different aggregation methods
| Method | SE (%) | SP (%) | Gm (%) | Acc (%) |
|---|---|---|---|---|
|  | 97.23 | 92.10 | 94.63 | 92.70 |
|  |  | 91.08 | 94.44 | 91.89 |
|  | 93.05 | 96.50 |  | 96.10 |
|  | 90.55 | 94.10 |  |  |
For each aggregation method, only the best two results are presented.
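The two aggregation rules compared above can be sketched as follows, assuming each sub-SVM returns a signed decision value (positive meaning real pre-miRNA, roughly proportional to the distance from its hyperplane). Majority vote counts signs; mean distance averages the values and takes the sign. The function names and the tie-breaking toward the negative class are assumptions, not details from the paper.

```python
def majority_vote(decisions: list[float]) -> int:
    """+1 (real pre-miRNA) if more sub-SVMs vote positive than negative.

    Ties go to -1 (pseudo pre-miRNA); this tie rule is an assumption.
    """
    positive = sum(1 for d in decisions if d > 0)
    return 1 if positive > len(decisions) - positive else -1


def mean_distance(decisions: list[float]) -> int:
    """+1 if the average signed decision value over all sub-SVMs is positive."""
    return 1 if sum(decisions) / len(decisions) > 0 else -1


if __name__ == "__main__":
    votes = [0.4, -0.2, -0.1]  # decision values from three sub-SVMs
    print(majority_vote(votes), mean_distance(votes))
```

For the decision values `[0.4, -0.2, -0.1]`, majority vote returns -1 while mean distance returns +1: under mean distance, one confident positive can outweigh two weak negatives.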
Figure 1. Comparison between miRenSVM and other methods. Three representative computational miRNA prediction methods are compared with our miRenSVM. MicroPred achieves the highest SE (94.4%), while miRenSVM achieves the highest SP (96.5%), Gm (94.8%), and Acc (96.1%). The results are obtained by predicting 2060 sequences (250 real and 1810 pseudo pre-miRNAs).
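On an imbalanced test set, Acc is a weighted average of SE and SP with weights given by the class sizes. Assuming Acc is computed over all 2060 test sequences and the reported percentages are rounded, the caption's figures for miRenSVM imply its SE, and the implied Gm agrees with the reported value:

$$
\begin{aligned}
\mathrm{Acc} &= \frac{TP + TN}{P + N} = \frac{P\,\mathrm{SE} + N\,\mathrm{SP}}{P + N}, \qquad P = 250,\ N = 1810 \\
\mathrm{SE} &= \frac{(P + N)\,\mathrm{Acc} - N\,\mathrm{SP}}{P} = \frac{2060 \times 96.1 - 1810 \times 96.5}{250} = \frac{197966 - 174665}{250} \approx 93.2\% \\
\mathrm{Gm} &= \sqrt{\mathrm{SE} \cdot \mathrm{SP}} \approx \sqrt{93.2 \times 96.5} \approx 94.8\%
\end{aligned}
$$

Here SP carries weight N/(P + N) ≈ 0.88 in Acc, which is why Gm rather than Acc is the more informative figure for comparing methods on this test set.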
Figure 2. Construction of training and testing datasets. We built the training and testing datasets step by step. First, we collected data from five different data sources. Then, squid, RNAfold and UNAfold were employed to further filter the data. Finally, we constructed one training set (697 positive samples and 5428 negative samples) and two testing sets: one contains 27 brand-new hsa (human) and aga (Anopheles gambiae) pre-miRNAs, and the other contains 5238 hairpin sequences in miRBase 13.0 from species other than hsa and aga.
Figure 3. The architecture of miRenSVM. The original negative samples in the training set are divided into k equal partitions (k ranges from 1 to the ratio of negative to positive samples). The final decision is made by aggregating the results of k sub-SVM classifiers, each trained on all positive samples and one partition of the negative samples. Two aggregation methods are considered in this work: majority vote and mean distance.
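The partitioning step can be sketched in plain Python, assuming the upper bound on k is the floor of the negative-to-positive ratio (for the training set of 697 positives and 5428 negatives, ⌊5428/697⌋ = 7). The helper names, the seeded random shuffle, and the round-robin split are illustrative assumptions; the caption does not say how samples are assigned to partitions, and the SVM training itself is left out.

```python
import random


def partition_negatives(negatives: list, k: int, seed: int = 0) -> list[list]:
    """Shuffle the negatives and split them round-robin into k near-equal parts."""
    shuffled = negatives[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]  # sizes differ by at most one


def build_training_sets(positives: list, negatives: list, k: int) -> list[list]:
    """Build one labeled training set per sub-SVM.

    Every set contains ALL positives (label +1) plus one negative partition
    (label -1). The bound on k is assumed to be floor(#neg / #pos).
    """
    max_k = len(negatives) // len(positives)
    if not 1 <= k <= max_k:
        raise ValueError(f"k must be in [1, {max_k}]")
    return [[(x, +1) for x in positives] + [(x, -1) for x in part]
            for part in partition_negatives(negatives, k)]


if __name__ == "__main__":
    sets = build_training_sets(list(range(697)), list(range(5428)), k=7)
    print([sum(1 for _, y in s if y == -1) for s in sets])
```

With k = 7, each sub-SVM sees all 697 positives and 775 or 776 negatives, so every sub-problem is roughly balanced. Each training set would then be passed to an SVM learner, and the k outputs combined by majority vote or mean distance.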