Malik Yousef¹, Müşerref Duygu Saçar Demirci², Waleed Khalifa¹, Jens Allmer³.
Abstract
MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and therefore needs to be supplemented with computational miRNA detection. Currently, computational miRNA detection is mainly performed using machine learning, in particular two-class classification. For machine learning, the miRNAs need to be parametrized, and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data are hard to come by. Therefore, it seems preferable to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that accuracy differs by up to 36% among these feature selection methods. The best feature set allowed the training of a one-class classifier that achieved an average accuracy of ~95.6%, thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.
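To illustrate why one-class classification needs only positive training data, and why the chosen feature subset matters so much, here is a minimal sketch. It is not the authors' classifier or any of their feature selection procedures. As a stand-in, the hypothetical `fit_one_class` function fits a centroid-distance model on positive examples only and accepts a sample if its standardized distance to the positive centroid lies within the 95th percentile of the training distances; negatives are used only to measure accuracy. The toy data are invented, with feature 0 informative and feature 1 pure noise.

```python
import math
import statistics


def fit_one_class(positives, features, quantile=0.95):
    """Hypothetical centroid-distance one-class model trained on positives only."""
    # Per-feature mean and population standard deviation of the positive class.
    cols = [[row[f] for row in positives] for f in features]
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero spread

    def distance(row):
        # Euclidean distance to the positive centroid in standardized feature space.
        return math.sqrt(sum(((row[f] - m) / s) ** 2
                             for f, m, s in zip(features, means, stds)))

    # Acceptance threshold: the `quantile` point of the sorted training distances.
    train_d = sorted(distance(r) for r in positives)
    threshold = train_d[min(len(train_d) - 1, int(quantile * len(train_d)))]

    def predict(row):
        return distance(row) <= threshold  # True = accepted as miRNA-like

    return predict


def accuracy(predict, positives, negatives):
    """ACC = (TP + TN) / (P + N); negatives are used for evaluation only."""
    tp = sum(predict(r) for r in positives)
    tn = sum(not predict(r) for r in negatives)
    return (tp + tn) / (len(positives) + len(negatives))


# Invented toy data: feature 0 separates the classes, feature 1 is noise.
pos_train = [(float(a), float(b)) for a in (-1, 0, 1) for b in (0, 5, 10)]
pos_test = [(0.0, 2.0), (0.5, 8.0), (-0.5, 5.0)]
neg_test = [(6.0, 4.0), (-7.0, 6.0), (8.0, 1.0)]

for subset in ([0], [1], [0, 1]):
    model = fit_one_class(pos_train, subset)
    print(subset, accuracy(model, pos_test, neg_test))
```

On this toy set, the informative feature alone rejects every negative (accuracy 1.0), whereas the noise-only subset accepts every negative (accuracy 0.5). This is a miniature version of the abstract's observation that feature selection can shift one-class accuracy substantially.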
Year: 2016 PMID: 27190509 PMCID: PMC4844869 DOI: 10.1155/2016/5670851
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1. Classification accuracy of the combined feature sets on a per-organism basis. The lines connecting the points (measurements) do not represent any mathematical relationship; they were added only as a visual guide to highlight the synchronized variation on a per-organism basis. Supplementary Table 2 contains the underlying data (including sensitivity and specificity) and a plot of each individual feature selection method on a per-species basis.
Figure 2. Accuracy distribution for the seven selected organisms assessed using one-class classification.
Figure 3. Phylogenetic relationship among the selected plant species.
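Comparison of classification accuracies between this study and other published methods (PlantMiRNAPred [32], Triplet-SVM [38], microPred [39], and MotifmiRNAPred [33]). The best performance per organism is highlighted in bold. ACC: accuracy (%). Organism codes: gma, Glycine max; zma, Zea mays; sbi, Sorghum bicolor; ath, Arabidopsis thaliana; ppt, Physcomitrella patens; ptc, Populus trichocarpa; osa, Oryza sativa.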
| Organism | PlantMiRNAPred ACC | Triplet-SVM ACC | microPred ACC | MotifmiRNAPred ACC | This study ACC |
|---|---|---|---|---|---|
| gma | | 74.10 | 86.70 | 89.80 | 97.38 |
| zma | | 66.90 | 93.80 | 94.80 | 93.59 |
| sbi | | 69.50 | 94.60 | 93.50 | 94.25 |
| ath | 92.20 | 76.00 | 89.40 | 93.30 | |
| ppt | 92.40 | 71.40 | 89.50 | 90.20 | |
| ptc | 91.80 | 75.20 | 84.90 | 92.20 | |
| osa | 94.20 | 75.50 | 90.40 | 90.30 | |
| Avg | 95.11 | 72.66 | 89.90 | 92.01 | |
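As a quick consistency check on the comparison table, the Avg row appears to be the arithmetic mean of the seven per-organism accuracies. The sketch below recomputes it for the three methods whose values are listed for all seven organisms (gma, zma, sbi, ath, ppt, ptc, osa, in that order). It also reproduces the abstract's margin of about 0.5%, assuming that margin is this study's reported ~95.6% minus PlantMiRNAPred's 95.11%.

```python
from statistics import fmean

# Per-organism accuracies (%) in the order gma, zma, sbi, ath, ppt, ptc, osa,
# copied from the comparison table for the three fully listed methods.
columns = {
    "Triplet-SVM": [74.10, 66.90, 69.50, 76.00, 71.40, 75.20, 75.50],
    "microPred": [86.70, 93.80, 94.60, 89.40, 89.50, 84.90, 90.40],
    "MotifmiRNAPred": [89.80, 94.80, 93.50, 93.30, 90.20, 92.20, 90.30],
}

# Mean over the seven organisms, rounded to two decimals like the Avg row.
averages = {name: round(fmean(vals), 2) for name, vals in columns.items()}
print(averages)  # {'Triplet-SVM': 72.66, 'microPred': 89.9, 'MotifmiRNAPred': 92.01}

# Margin quoted in the abstract: ~95.6% (this study) vs. 95.11% (PlantMiRNAPred).
margin = round(95.6 - 95.11, 2)
print(margin)  # 0.49
```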