Wei Shen, Ming Chen, Guo Wei, Yan Li.
Abstract
Predicting miRNAs is difficult because the precursors are diverse and the enzymatic processing steps are complex. Although several prediction approaches perform impressively, few achieve full-function recognition of the mature miRNA directly from candidate hairpins across species. Researchers therefore continue to seek a more powerful model that more closely mirrors the biological recognition of miRNA structure. In this report, we describe a novel miRNA prediction algorithm, FOMmiR, which applies a fixed-order Markov model to the secondary structural pattern. The model's parameters were defined and evaluated on a training dataset of 809 human pre-miRNAs and 6441 human pseudo-miRNA hairpins. FOMmiR reached 91% accuracy on the human dataset in 5-fold cross-validation. On the independent test datasets, FOMmiR also predicted well for human and other species, including vertebrates, Drosophila, worms, viruses and even plants, in comparison with well-known algorithms and models. Notably, FOMmiR not only distinguishes miRNA precursors from other hairpins but also locates the position and strand of the mature miRNA. This study therefore provides a miRNA prediction algorithm that achieves full-function recognition of mature miRNAs directly from hairpin sequences. It also offers a new view of biological recognition based on the location of the strongest signal detected by FOMmiR, which might be closely associated with the enzyme cleavage mechanism during miRNA maturation.
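To make the idea of a fixed-order Markov model over structural symbols concrete, the following is a minimal sketch and not the FOMmiR implementation. The model order (2), the additive smoothing, the log-odds scoring against a negative model, and the toy training strings are all assumptions made for illustration; only the six-symbol alphabet comes from the stem-bulge-gap notation in Figure 1.

```python
import math
from collections import defaultdict

# Symbols of the stem-bulge-gap notation (three pair types, loop, bulge, gap).
ALPHABET = "|!:ox-"


def train_markov(sequences, order=2, pseudocount=1.0):
    """Return P(symbol | previous `order` symbols) with additive smoothing.

    The order and the pseudocount are illustrative assumptions.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0

    def prob(context, symbol):
        seen = counts.get(context, {})
        total = sum(seen.values()) + pseudocount * len(ALPHABET)
        return (seen.get(symbol, 0.0) + pseudocount) / total

    return prob


def log_odds(seq, pos_prob, neg_prob, order=2):
    """Sum of log-likelihood ratios (positive model vs negative model)."""
    score = 0.0
    for i in range(order, len(seq)):
        context, symbol = seq[i - order:i], seq[i]
        score += math.log(pos_prob(context, symbol) / neg_prob(context, symbol))
    return score


# Hypothetical toy strings, not real pre-miRNA or pseudo-hairpin data.
positives = ["|||!||:||x|||ooo", "||!|||x||:||oooo"]
negatives = ["x-x|x--x|xx-oooo", "-x|x-xx|--x|oooo"]

pos_model = train_markov(positives)
neg_model = train_markov(negatives)

pos_score = log_odds(positives[0], pos_model, neg_model)
neg_score = log_odds(negatives[0], pos_model, neg_model)
```

In this sketch a hairpin whose symbol transitions resemble the positive training strings receives a positive score, and one that resembles the negative strings receives a negative score. A per-position version of the same sum could be scanned along the hairpin to look for the strongest local signal, but how FOMmiR actually locates the mature region is not specified here.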
Year: 2012 PMID: 23118959 PMCID: PMC3484136 DOI: 10.1371/journal.pone.0048236
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Illustration of the construction of the stem-bulge-gap notation.
In the stem-bulge-gap notation on the bottom line, the symbols ‘|’, ‘!’ and ‘:’ represent the base pairs ‘CG’, ‘AU’ and ‘GU’, respectively, and the symbols ‘o’, ‘x’ and ‘-’ represent the loop, bulge and gap, respectively. In asymmetric bulges, the symmetric part is indicated with ‘x’ and the asymmetric part with ‘-’.
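The symbol mapping in this caption can be sketched as a small encoder. This is an illustrative reading of the caption, not the authors' procedure: the function name `stem_bulge_gap`, the input format (two pre-aligned arms of equal length, with `-` marking a missing base), and the choice to append the loop symbols at the end are all assumptions.

```python
# Pair symbols as given in the Figure 1 caption.
PAIR_SYMBOL = {
    frozenset("CG"): "|",
    frozenset("AU"): "!",
    frozenset("GU"): ":",
}


def stem_bulge_gap(arm5, arm3, loop_len):
    """Encode an aligned hairpin as a stem-bulge-gap string.

    arm5: 5' arm read 5'->3'; arm3: 3' arm read 3'->5'. Both are assumed
    to be aligned column by column, with '-' where one arm has no base.
    """
    if len(arm5) != len(arm3):
        raise ValueError("arms must be aligned to the same length")
    symbols = []
    for a, b in zip(arm5.upper(), arm3.upper()):
        if a == "-" or b == "-":
            symbols.append("-")  # asymmetric part of a bulge
        else:
            # Paired columns get their pair symbol; unpaired columns get 'x'.
            symbols.append(PAIR_SYMBOL.get(frozenset((a, b)), "x"))
    return "".join(symbols) + "o" * loop_len  # terminal loop


# Hypothetical toy hairpin, not a real pre-miRNA.
notation = stem_bulge_gap("GCAUGA-C", "CGUGUCAG", loop_len=3)
```

For the toy input, the columns are two CG pairs, one AU pair, two GU pairs, one unpaired A/C column, one gap column and a final CG pair, followed by a three-base loop.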
Figure 2. Distribution of the signal scores in positive and negative datasets.
Figure 3. Receiver Operating Characteristic Curve of FOMmiR predictor.
The performances of pre-miRNA prediction.
| Method | Year | Algorithm | Sen | Spe | Acc |
| Triplet-SVM | 2005 | Support vector machine | 72.15% | 91.09% | 89.62% |
| MiPred | 2007 | Random Forest | 93.25% | 6.59% | 13.41% |
| CIDmiRNA | 2008 | Stochastic context-free grammar | 75.95% | 96.29% | 94.71% |
| CSHMM | 2010 | Context sensitive HMM | 88.19% | 71.46% | 72.77% |
| FOMmiR | 2012 | Fixed-order Markov model | 89.45% | 91.27% | 91.13% |
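The column abbreviations Sen, Spe and Acc presumably stand for sensitivity, specificity and accuracy. A minimal sketch of their standard definitions follows; the confusion counts are hypothetical and are not the counts behind the table above.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard sensitivity, specificity and accuracy from confusion counts."""
    sen = tp / (tp + fn)                   # fraction of real pre-miRNAs recovered
    spe = tn / (tn + fp)                   # fraction of pseudo hairpins rejected
    acc = (tp + tn) / (tp + fn + tn + fp)  # fraction of all calls that are right
    return sen, spe, acc


# Hypothetical balanced-behaviour classifier on an imbalanced dataset.
sen_a, spe_a, acc_a = classification_metrics(tp=90, fn=10, tn=900, fp=100)

# Hypothetical classifier that labels almost every hairpin as positive.
sen_b, spe_b, acc_b = classification_metrics(tp=95, fn=5, tn=50, fp=950)
```

Accuracy is the class-size-weighted average of sensitivity and specificity, so when negatives far outnumber positives it tracks specificity. That is consistent with the MiPred row, where high sensitivity and very low specificity coincide with low accuracy.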
Comparison of sensitivity across different species.
| Method | Vertebrates (6746) | Plants (3052) | Drosophila (1205) | Worms (580) | Viruses (235) |
| Triplet-SVM | 75.26% | 65.27% | 85.39% | 85.00% | 65.11% |
| MiPred | 92.48% | 47.02% | 93.94% | 95.52% | 96.60% |
| CIDmiRNA | 75.85% | 73.23% | 85.81% | 86.90% | 70.64% |
| CSHMM | 93.60% | 91.43% | 95.68% | 97.76% | 91.06% |
| FOMmiR | 91.76% | 93.55% | 97.18% | 97.07% | 89.79% |
Figure 4. Distribution of distances between the real and predicted mature miRNA region.
Quantitative distribution of miRNA strands in positive training dataset.
| Strand | Predicted 5p | Predicted 3p | Predicted both |
| | 124 | 68 | 25 |
| | 0 | 269 | 7 |
| | 0 | 40 | 207 |
Quantitative distribution of miRNA strands in positive test dataset.
| Strand | Predicted 5p | Predicted 3p | Predicted both |
| | 14 | 45 | 17 |
| | 0 | 54 | 15 |
| | 1 | 40 | 26 |
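One way to summarize the two strand tables is the fraction of hairpins on the diagonal, that is, where the predicted strand matches the row's strand. The sketch below rests on an assumption that cannot be confirmed from the tables as extracted: the row labels are missing, and the rows are taken here to be the annotated strand in the same 5p, 3p, both order as the columns. The helper name `diagonal_fraction` is hypothetical.

```python
def diagonal_fraction(matrix):
    """Fraction of counts on the main diagonal of a square count matrix."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    return agree / total


# Counts copied from the two strand tables.
# ASSUMPTION: rows are the annotated strand, ordered 5p, 3p, both.
training_counts = [[124, 68, 25], [0, 269, 7], [0, 40, 207]]
test_counts = [[14, 45, 17], [0, 54, 15], [1, 40, 26]]

training_agreement = diagonal_fraction(training_counts)
test_agreement = diagonal_fraction(test_counts)
```

If the row order is different from the assumed one, the diagonal no longer corresponds to agreement and these fractions should not be interpreted.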