Sumeet Agarwal, Candida Vaz, Alok Bhattacharya, Ashwin Srinivasan.
Abstract
BACKGROUND: It has become apparent in the last few years that small non-coding RNAs (ncRNAs) play a very significant role in biological regulation. Among these, microRNAs (miRNAs), small regulatory RNAs of 22-23 nucleotides, have been a major object of study because they are involved in some basic biological processes. So far, about 706 miRNAs have been identified in humans alone, and many more are expected to be encoded in the human genome. In this report, a "context-sensitive" Hidden Markov Model (CSHMM) to represent miRNA structures is proposed and tested extensively. We also demonstrate how this model can be used in conjunction with filters as an ab initio method for miRNA identification.
Year: 2010 PMID: 20122201 PMCID: PMC3009500 DOI: 10.1186/1471-2105-11-S1-S29
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The context-sensitive HMM proposed to represent miRNA precursors, with estimated transition probabilities. State P1 emits the upper halves of the stem and symmetric bulges. States S1 and S3 emit the asymmetric bulges in the upper and lower sections, respectively. State S2 emits the loop. States C11 and C12 emit the lower halves of the stem and of the symmetric bulges, respectively (~ refers to probabilities averaged over the four possible top-of-stack symbols).
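The stack mechanism implied by the caption can be illustrated with a toy generator. This is only a sketch under strong assumptions, not the model from the paper: it keeps just three of the states (P1, S2, C11), uses fixed stem and loop lengths instead of the estimated transition probabilities, and the names `sample_hairpin` and `p_pair` and the uniform emission probabilities are hypothetical. P1 pushes each emitted base onto a stack; C11 pops the top-of-stack symbol and emits a base conditioned on it.

```python
import random

# Watson-Crick pairs only (G-U wobble pairs are ignored in this toy).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
BASES = "ACGU"


def sample_hairpin(stem_len, loop_len, p_pair=0.9, seed=0):
    """Generate (upper stem, loop, lower stem) with a stack-based toy model.

    p_pair is a hypothetical probability that C11 emits the complement of
    the popped symbol; it is not a parameter taken from the paper.
    """
    rng = random.Random(seed)
    stack = []

    # State P1: emit the upper half of the stem and remember each symbol.
    upper = []
    for _ in range(stem_len):
        base = rng.choice(BASES)
        upper.append(base)
        stack.append(base)

    # State S2: emit the loop; the stack is not touched.
    loop = [rng.choice(BASES) for _ in range(loop_len)]

    # State C11: pop the top-of-stack symbol and emit conditioned on it.
    lower = []
    while stack:
        top = stack.pop()
        if rng.random() < p_pair:
            lower.append(COMPLEMENT[top])
        else:
            lower.append(rng.choice(BASES))

    return "".join(upper), "".join(loop), "".join(lower)
```

With `p_pair=1.0` the lower half is exactly the reverse complement of the upper half, which is the long-range dependency an ordinary HMM cannot express.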
Table 1. 5-fold cross-validation performance of the CSHMM using a human miRNA dataset.
| Predicted | Actual miRNA | Actual non-miRNA | Total |
|---|---|---|---|
| miRNA | 170 (60.67) | 12 (121.33) | 182 |
| non-miRNA | 30 (139.33) | 388 (278.67) | 418 |
| Total | 200 (dataset D1) | 400 (dataset D2) | 600 |
The number in parentheses following each entry is the expected value of the entry under the hypothesis that the actual class is independent of the predicted one. Estimates of predictive accuracy, sensitivity and specificity from this table are 0.93 (93%), 0.85 (85%) and 0.97 (97%) respectively.
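The arithmetic behind these figures can be checked with a short sketch. The function name `confusion_summary` and its argument names are illustrative, not taken from the paper. Each expected count under independence is (row total × column total) / grand total.

```python
def confusion_summary(tp, fp, fn, tn):
    """Summarise a 2x2 confusion matrix (rows: predicted, columns: actual)."""
    total = tp + fp + fn + tn
    pred_pos, pred_neg = tp + fp, fn + tn   # row totals
    act_pos, act_neg = tp + fn, fp + tn     # column totals

    # Expected counts if predicted and actual classes were independent.
    expected = {
        "tp": pred_pos * act_pos / total,
        "fp": pred_pos * act_neg / total,
        "fn": pred_neg * act_pos / total,
        "tn": pred_neg * act_neg / total,
    }
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / act_pos,   # true positive rate
        "specificity": tn / act_neg,   # true negative rate
        "expected": expected,
    }


# Cross-validation counts: 170 and 12 predicted miRNA, 30 and 388 predicted non-miRNA.
cv = confusion_summary(tp=170, fp=12, fn=30, tn=388)
```

Here `cv["accuracy"]`, `cv["sensitivity"]` and `cv["specificity"]` evaluate to 0.93, 0.85 and 0.97, and `cv["expected"]["tp"]` rounds to 60.67, matching the table.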
Table 2. Predictive performance of CSHMM and miPred on a common test dataset.

(a) CSHMM
| Predicted | Actual miRNA | Actual non-miRNA | Total |
|---|---|---|---|
| miRNA | 63 (16.75) | 4 (50.25) | 67 |
| non-miRNA | 19 (65.25) | 242 (195.75) | 261 |
| Total | 82 (dataset D1) | 246 (dataset D2) | 328 |

(b) miPred
| Predicted | Actual miRNA | Actual non-miRNA | Total |
|---|---|---|---|
| miRNA | 64 (17.25) | 5 (51.75) | 69 |
| non-miRNA | 18 (64.75) | 241 (194.25) | 259 |
| Total | 82 (dataset D1) | 246 (dataset D2) | 328 |
The number in parentheses following each entry is the expected value of the entry under the hypothesis that the actual class is independent of the predicted one. Estimates of predictive accuracy, sensitivity and specificity of CSHMM (a) from this table are 0.930 (93.0%), 0.768 (76.8%), and 0.984 (98.4%) respectively. For miPred (b) these are 0.930 (93.0%), 0.780 (78.0%) and 0.980 (98.0%) respectively.
Figure 2. Receiver-Operating Characteristic (ROC) curve for the CSHMM classifier on the test set. Classification was done for a range of thresholds on the likelihood score, and true and false positive rates were computed for each case. The point in red shows the results of the 'optimal' threshold, as determined by entropy minimization, and corresponds to the results reported in Table 2(a).
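The threshold sweep described in this caption can be sketched as follows. The caption does not specify how the entropy criterion is defined, so the version below is an assumption: it picks the threshold that minimizes the size-weighted class entropy of the two groups the threshold creates. The function names and the toy scores are illustrative only.

```python
import math


def roc_points(scores, labels):
    """Return (threshold, false positive rate, true positive rate) triples."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def class_entropy(pos, neg):
    """Binary entropy (in bits) of a group with pos/neg members."""
    n = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / n
            h -= p * math.log2(p)
    return h


def min_entropy_threshold(scores, labels):
    """Assumed criterion: minimize weighted entropy of the induced split."""
    n = len(scores)
    best = None
    for t in sorted(set(scores)):
        above = [y for s, y in zip(scores, labels) if s >= t]
        below = [y for s, y in zip(scores, labels) if s < t]
        h = (len(above) / n) * class_entropy(sum(above), len(above) - sum(above))
        h += (len(below) / n) * class_entropy(sum(below), len(below) - sum(below))
        if best is None or h < best[1]:
            best = (t, h)
    return best


# Toy likelihood scores (not data from the paper); label 1 marks a miRNA.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
threshold, entropy = min_entropy_threshold(toy_scores, toy_labels)
```

On this perfectly separable toy set, the chosen threshold is 0.7 with entropy 0, which corresponds to the ROC point with false positive rate 0 and true positive rate 1.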