Yask Gupta, Mareike Witte, Steffen Möller, Ralf J Ludwig, Tobias Restle, Detlef Zillikens, Saleh M Ibrahim.
Abstract
Non-coding RNAs (ncRNAs) are known to play important functional roles in the cell. However, their identification and recognition in genomic sequences remain challenging. In silico methods, such as classification tools, offer a fast and reliable way to screen for them, and multiple classifiers have already been developed to predict well-defined subfamilies of RNA. So far, however, out of all the ncRNAs, only tRNA, miRNA and snoRNA can be predicted with satisfactory sensitivity and specificity. We here present ptRNApred, a tool to detect and classify subclasses of non-coding RNA that are involved in the regulation of post-transcriptional modifications or DNA replication, which we here call post-transcriptional RNA (ptRNA). It (i) detects RNA sequences coding for post-transcriptional RNA from the genomic sequence with an overall sensitivity of 91% and a specificity of 94% and (ii) predicts ptRNA subclasses that exist in eukaryotes: snRNA, snoRNA, RNase P, RNase MRP, Y RNA or telomerase RNA. AVAILABILITY: The ptRNApred software is open for public use at http://www.ptrnapred.org/.
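For reference, the sensitivity and specificity quoted above follow the standard definitions, where TP, FN, TN and FP are the numbers of true positives, false negatives, true negatives and false positives. The abstract does not report the underlying counts, so only the formulas are given here.

```latex
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
```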
Year: 2014 PMID: 25303994 PMCID: PMC4267668 DOI: 10.1093/nar/gku918
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Total number of test and training sequences
| ptRNA subclass | Training sequences | Testing sequences |
|---|---|---|
| RNase P | 178 | 90 |
| RNase MRP | 9 | 5 |
| snoRNA + scaRNA | 978 + 9 | 452 + 4 |
| telomerase RNA | 29 | 17 |
| Y RNA | 9 | 5 |
| snRNA | 170 | 85 |
The table displays the number of training and testing sequences of each ptRNA-subclass used for the SVM.
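The split sizes in the table can be summed and compared per subclass. The following is a minimal sketch that only restates the table's numbers; the variable and function names (`counts`, `held_out_share`) are illustrative and not part of ptRNApred.

```python
# Sequence counts copied from the table above: (training, testing) per subclass.
# snoRNA and scaRNA are summed (978 + 9 and 452 + 4), as in the table row.
counts = {
    "RNase P": (178, 90),
    "RNase MRP": (9, 5),
    "snoRNA + scaRNA": (978 + 9, 452 + 4),
    "telomerase RNA": (29, 17),
    "Y RNA": (9, 5),
    "snRNA": (170, 85),
}

total_train = sum(train for train, _ in counts.values())
total_test = sum(test for _, test in counts.values())


def held_out_share(train, test):
    """Share of a subclass's sequences that was reserved for testing."""
    return test / (train + test)


for name, (train, test) in counts.items():
    print(f"{name:16s} train={train:4d} test={test:4d} "
          f"test share={held_out_share(train, test):.2f}")

print(f"total train={total_train}, total test={total_test}")
```

Run as written, this gives 1382 training and 658 testing sequences in total. The snoRNA + scaRNA row accounts for roughly 71% of the training set (987 of 1382), which is worth keeping in mind when reading per-class results.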
Figure 1. C and γ determination and 5-fold cross-validation using LibSVM. The figure shows graphs for different values of the parameters C (the misclassification trade-off) and γ (the inverse width of the RBF kernel) on logarithmic X and Y axes. The axis ranges cover the values that were tested while searching the grid for the optimal C and γ. The colors in the diagram correspond to the accuracies obtained for the different C and γ values. We chose the C and γ values indicated by the green graph in each panel, which marks the highest accuracy. (a) C and γ determination and 5-fold cross-validation of the two-class SVM. The highest 5-fold cross-validation accuracy (92.89%) is achieved when C = 32768 and γ = 0.008. (b) C and γ determination and 5-fold cross-validation of the multi-class SVM. The highest 5-fold cross-validation accuracy (86.69%) is achieved when C = 4 and γ = 0.5.
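The grid search described in the caption can be sketched as follows. This is a generic illustration and not the authors' script: the exponent ranges are assumed, and `toy_accuracy` is a stand-in for a real scorer. In practice the scorer would train an RBF-kernel SVM for each (C, γ) pair and return its 5-fold cross-validation accuracy, for example through LibSVM's `svm-train` with the `-v 5`, `-c` and `-g` options.

```python
import itertools
import math


def log2_grid(start, stop, step):
    """Return [2**start, 2**(start + step), ...] up to and including 2**stop."""
    return [2.0 ** e for e in range(start, stop + 1, step)]


def grid_search(c_values, gamma_values, cv_accuracy):
    """Evaluate every (C, gamma) pair and return the best pair and its accuracy.

    cv_accuracy is a caller-supplied function that is expected to run
    k-fold cross-validation for one pair and return the mean accuracy.
    """
    best_pair, best_acc = None, float("-inf")
    for c, gamma in itertools.product(c_values, gamma_values):
        acc = cv_accuracy(c, gamma)
        if acc > best_acc:
            best_pair, best_acc = (c, gamma), acc
    return best_pair, best_acc


def toy_accuracy(c, gamma):
    # Stand-in for real cross-validation (NOT an SVM): a smooth surface
    # that peaks at C = 2**3 and gamma = 2**-5, chosen arbitrarily.
    return 1.0 - 0.01 * ((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


# Assumed, illustrative exponent ranges; the source does not state its grid.
c_values = log2_grid(-5, 15, 2)
gamma_values = log2_grid(-15, 3, 2)

best, acc = grid_search(c_values, gamma_values, toy_accuracy)
print(best, acc)
```

Searching powers of two keeps the grid evenly spaced on the logarithmic axes that the figure uses.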
Comparison between snoReport and ptRNApred
| Organism | RNA class | Total number of sequences (a) | Number of sequences identified by snoReport (% of total number of sequences) | Number of sequences identified by ptRNApred (% of total number of sequences) |
|---|---|---|---|---|
| Mouse | snoRNA | 1603 | 737 (46%) | 1589 (99%) |
| Human | snoRNA | 1641 | 852 (52%) | 1611 (98%) |
| Human | snoU13 (b) | 245 | 0 (0%) | 245 (100%) |
Murine and human snoRNA datasets were extracted from Ensembl (49), and the performance of ptRNApred was compared with that of snoReport, a well-established tool for snoRNA prediction. ptRNApred achieved higher sensitivity than snoReport (99 versus 46% on the murine and 98 versus 52% on the human set of sequences). For snoU13, a member of the snoRNAs, the difference in sensitivity is even larger (100 versus 0%).
(a) Total number of snoRNA sequences downloaded from Ensembl (49).
(b) snoU13 among the human snoRNA sequences.
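Because every sequence in these datasets is an annotated snoRNA, the percentage identified is the sensitivity on that dataset. The short check below recomputes the percentages from the raw counts in the table; the row labels follow the murine-then-human order given in the text, and the names `rows` and `percent_identified` are illustrative.

```python
# (total sequences, identified by snoReport, identified by ptRNApred),
# copied from the comparison table.
rows = {
    "mouse snoRNA": (1603, 737, 1589),
    "human snoRNA": (1641, 852, 1611),
    "human snoU13": (245, 0, 245),
}


def percent_identified(found, total):
    """Share of known sequences that a tool recovered, as a rounded percentage."""
    return round(100 * found / total)


for label, (total, snoreport, ptrnapred) in rows.items():
    print(f"{label}: snoReport {percent_identified(snoreport, total)}%, "
          f"ptRNApred {percent_identified(ptrnapred, total)}%")
```

The recomputed values agree with the rounded percentages printed in the table.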