Bharat Panwar, Amit Arora, Gajendra P S Raghava.
Abstract
BACKGROUND: Evidence is accumulating that non-coding transcripts, previously thought to be functionally inert, play important roles in various cellular activities. High-throughput techniques such as next-generation sequencing have generated vast amounts of sequence data. It is therefore desirable not only to discriminate between coding and non-coding transcripts, but also to assign non-coding RNA (ncRNA) transcripts to their respective classes (families). Although several algorithms are available for this task, their classification performance remains a major concern. Given the crucial role that non-coding transcripts play in cellular processes, algorithms that can precisely classify ncRNA transcripts are needed.
Year: 2014 PMID: 24521294 PMCID: PMC3925371 DOI: 10.1186/1471-2164-15-127
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The comparative average percent tri-nucleotide compositions (TNC) of non-coding and coding RNAs. The y-axis represents the log2 ratio of non-coding to coding TNC values, and the height and color of the bars represent the magnitude of the log2 ratios. Tri-nucleotides with greater TNC in non-coding RNAs are shown by the upper (red) bars, while the lower (green) bars show tri-nucleotides that occur more often in coding RNA sequences.
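The quantity plotted in Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of overlapping windows, the conversion of T to U, and the small pseudocount are all assumptions made here for the example.

```python
from itertools import product
from math import log2


def kmer_composition(seq, k=3):
    """Percent composition of every k-mer over A, C, G, U (k=3 gives TNC)."""
    seq = seq.upper().replace("T", "U")  # assumption: treat DNA-style input as RNA
    counts = {"".join(p): 0 for p in product("ACGU", repeat=k)}
    total = 0
    # Assumption: overlapping windows with a step of one nucleotide.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
            total += 1
    return {w: (100.0 * c / total if total else 0.0) for w, c in counts.items()}


def average_composition(seqs, k=3):
    """Average the percent composition over a non-empty list of sequences."""
    comps = [kmer_composition(s, k) for s in seqs]
    return {w: sum(c[w] for c in comps) / len(comps) for w in comps[0]}


def log2_ratio(noncoding, coding, pseudo=1e-6):
    """log2(non-coding / coding) per k-mer.

    The pseudocount is a hypothetical safeguard against division by zero.
    """
    return {w: log2((noncoding[w] + pseudo) / (coding[w] + pseudo))
            for w in noncoding}
```

With two lists of sequences, `log2_ratio(average_composition(noncoding_seqs), average_composition(coding_seqs))` gives one value per tri-nucleotide. Positive values correspond to the upper bars (enriched in non-coding RNAs) and negative values to the lower bars.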
SVM based highest prediction performances (on the basis of MCC) of different composition approaches for the discrimination between non-coding and coding RNAs
| MNC | 56.67 | 87.13 | 75.08 | 0.47 | 63.95 | 81.15 | 74.34 | 0.46 | 50.19 | 90.48 | 77.48 | 0.46 |
| DNC | 97.13 | 94.27 | 95.40 | 0.91 | 95.93 | 95.08 | 95.42 | 0.91 | 81.54 | 91.15 | 88.04 | 0.73 |
| TNC | 98.90 | 99.04 | 98.98 | 0.98 | 98.65 | 98.79 | 98.74 | 0.97 | 86.33 | 95.39 | 92.47 | 0.83 |
| TTNC | 99.50 | 99.35 | 99.41 | 0.99 | 99.29 | 99.05 | 99.15 | 0.98 | 89.51 | 94.98 | 93.22 | 0.85 |
| PNC | 98.62 | 98.55 | 98.58 | 0.97 | 98.78 | 98.29 | 98.49 | 0.97 | 88.43 | 95.64 | 93.31 | 0.85 |
| Hybrid | 99.46 | 99.46 | 99.46 | 0.99 | 99.11 | 99.18 | 99.15 | 0.98 | 89.10 | 96.29 | 93.97 | 0.86 |
The MNC, DNC, TNC, TTNC and PNC approaches are mono-, di-, tri-, tetra- and penta-nucleotide compositions, respectively. Hybrid is a combined approach whose prediction is based on the SVM scores predicted by the MNC, DNC, TNC, TTNC and PNC approaches.
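The approaches in the table above are ranked by the Matthews correlation coefficient (MCC). For readers unfamiliar with it, the standard binary definition can be sketched as below. The function name and the example counts are hypothetical and are not taken from the study.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts.

    tp/fn: positives (e.g. non-coding RNAs) predicted correctly/incorrectly.
    tn/fp: negatives (e.g. coding RNAs) predicted correctly/incorrectly.
    Returns a value between -1 (total disagreement) and +1 (perfect prediction).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
example = mcc(tp=90, tn=95, fp=5, fn=10)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes differ in size.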
Figure 2. The comparative average percent tri-nucleotide compositions (TNC) of different non-coding RNA classes for the complete dataset. The diameter of each bubble is scaled according to the TNC value (the value is also shown numerically inside the bubble).
Overall sensitivity (Q ) of different classifiers for the classification of 18 ncRNA classes
| 0.073 | 0.084 | 0.284 | - | 0.397 | |
| 0.057 | 0.074 | 0.058 | 0.315 | 0.398 | |
| 0.060 | 0.075 | 0.128 | 0.314 | 0.429 | |
| 0.057 | 0.074 | 0.089 | 0.314 | 0.407 | |
| 0.060 | 0.079 | 0.108 | 0.214 | 0.056 | |
| 0.073 | 0.081 | 0.102 | 0.283 | 0.422 | |
| 0.055 | 0.079 | 0.121 | 0.400 | 0.433 |
*IPknot – Normalized (-1.0 to 1.0) graph property values were used for all classifiers.
#IPknot – Real graph property values were used for all classifiers.
Figure 3. Relative graph properties of different non-coding RNA classes. The diameter of each bubble is scaled according to the value of the normalized (between 0 and 1) graph property (the value is also shown numerically inside the bubble). The sensitivity of the prediction is depicted by the color range of the bubbles (increasing from green to white). As can be seen, miRNAs have the greatest prediction sensitivity, whereas LEADER RNAs have the least. The blank boxes are where graph property values were predicted to be 0 or null.
Figure 4. Confusion matrix for 18 different classes of non-coding RNAs using the RandomForest algorithm. The QD and QM values show the sensitivity and specificity for each ncRNA class, respectively. The color scale from white to green shows the number of entries, ranging from 0 to 1980.
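Per-class scores of the kind reported alongside the confusion matrix can be derived from the matrix itself. The sketch below rests on two assumptions that the caption does not confirm: rows hold the true classes and columns the predicted classes, and QM follows the common multi-class convention of correct predictions divided by all predictions made for that class. The function name and the toy matrix are hypothetical.

```python
def per_class_scores(matrix):
    """Return (QD list, QM list, overall fraction correct) for a square matrix.

    Assumption: matrix[i][j] counts sequences of true class i predicted as j.
    """
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    # QD: share of each true class that was predicted correctly (sensitivity).
    qd = [matrix[i][i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    # QM: share of predictions for each class that were correct (assumed meaning).
    qm = [matrix[i][i] / col_sums[i] if col_sums[i] else 0.0 for i in range(n)]
    total = sum(row_sums)
    overall = sum(matrix[i][i] for i in range(n)) / total if total else 0.0
    return qd, qm, overall


# Toy 3-class matrix with made-up counts, not data from the study.
toy = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]
qd, qm, overall = per_class_scores(toy)
```

The same function applies unchanged to an 18 x 18 matrix such as the one in Figure 4.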
Figure 5. An overview of RNAcon with an example sequence.