Monir Hajiaghayi, Anne Condon, Holger H Hoos.
Abstract
BACKGROUND: RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on the nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives leverage partition function calculations to find the structure with maximum expected accuracy (MEA) or maximum pseudo-expected accuracy (pseudo-MEA). Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, the F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving the accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters.
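As an illustration of the three accuracy measures named in the abstract, the sketch below computes sensitivity, positive predictive value and their harmonic mean for a single structure. It assumes a structure is represented as a set of `(i, j)` base-pair index tuples and that a predicted pair counts as correct only on an exact match; both are assumptions for this example, and the toy pairs are hypothetical, not data from the study.

```python
def f_measure(predicted, reference):
    """Return (sensitivity, ppv, f) for two collections of base pairs."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)  # pairs present in both structures
    # Sensitivity: fraction of reference pairs that were predicted.
    sensitivity = true_pos / len(reference) if reference else 0.0
    # Positive predictive value: fraction of predicted pairs that are correct.
    ppv = true_pos / len(predicted) if predicted else 0.0
    if sensitivity + ppv == 0:
        return sensitivity, ppv, 0.0
    # F-measure: harmonic mean of sensitivity and PPV.
    f = 2 * sensitivity * ppv / (sensitivity + ppv)
    return sensitivity, ppv, f


# Hypothetical toy structures (not taken from the MT or MA sets).
reference = {(1, 20), (2, 19), (3, 18), (4, 17)}
predicted = {(1, 20), (2, 19), (5, 15)}
sens, ppv, f = f_measure(predicted, reference)
print(f"sensitivity={sens:.3f} ppv={ppv:.3f} F={f:.3f}")
```

Here two of the four reference pairs are recovered and two of the three predicted pairs are correct, so the F-measure is 4/7.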
Year: 2012 PMID: 22296803 PMCID: PMC3347993 DOI: 10.1186/1471-2105-13-22
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Overview of the different RNA classes in MT and MA data sets
| RNA class | No. in MT | mean ± std of length | Avg. similarity | No. in MA | mean ± std of length | Avg. similarity |
|---|---|---|---|---|---|---|
| 16S Ribosomal RNA | 88 | 377.88 ± 167.18 | 0.60 | 675 | 485.66 ± 113.02 | 0.62 |
| 23S Ribosomal RNA | 27 | 460.6 ± 151.3 | 0.53 | 159 | 453.44 ± 117.85 | 0.57 |
| 5S Ribosomal RNA | 309 | 119.5 ± 2.69 | 0.88 | 128 | 120.98 ± 3.21 | 0.88 |
| Group I intron | 16 | 344.88 ± 66.42 | 0.63 | 89 | 368.49 ± 103.58 | 0.63 |
| Group II intron | 3 | 668.7 ± 70.92 | 0.70 | 2 | 578 ± 47 | 0.72 |
| Ribonuclease P RNA | 6 | 382.5 ± 41.66 | 0.74 | 399 | 332.78 ± 52.34 | 0.72 |
| Signal Recognition RNA | 91 | 267.95 ± 61.72 | 0.71 | 364 | 227.04 ± 109.53 | 0.65 |
| Transfer RNA | 484 | 77.48 ± 4.8 | 0.96 | 489 | 77.19 ± 5.13 | 0.95 |
| Total | 1024 | | | 2305 | | |
Overview of different RNA classes in the MT and MA data sets, including the number of RNAs, the mean and standard deviation of the length and the average normalized similarity between RNA sequences in these two data sets for each RNA class.
F-measure prediction accuracy of the MEA, MFE, gC-g1, and gC-pMFmeas algorithms on the MT dataset
| RNA class | Class size | Mean ± std of length | ubcMEA F-meas (BL*) | ubcMFE F-meas (BL*) | gC-g1 F-meas (BL*) | gC-pMFmeas F-meas (BL*) | rsMEA F-meas (Turner99) | rsMFE F-meas (Turner99) |
|---|---|---|---|---|---|---|---|---|
| 16S Ribosomal RNA | 88 | 377.88 ± 167.18 | 0.649 | 0.621 | 0.640 | 0.659 | 0.574 | 0.539 |
| 23S Ribosomal RNA | 27 | 460.6 ± 151.3 | 0.711 | 0.683 | 0.693 | 0.733 | 0.681 | 0.646 |
| 5S Ribosomal RNA | 309 | 119.5 ± 2.69 | 0.739 | 0.743 | 0.725 | 0.746 | 0.625 | 0.642 |
| Group I intron | 16 | 344.88 ± 66.42 | 0.705 | 0.650 | 0.674 | 0.708 | 0.627 | 0.599 |
| Group II intron | 3 | 668.7 ± 70.92 | 0.720 | 0.739 | 0.683 | 0.750 | 0.744 | 0.703 |
| Ribonuclease P RNA | 6 | 382.5 ± 41.66 | 0.471 | 0.460 | 0.519 | 0.495 | 0.517 | 0.522 |
| Signal Recognition RNA | 91 | 267.95 ± 61.72 | 0.641 | 0.621 | 0.633 | 0.637 | 0.518 | 0.557 |
| Transfer RNA | 484 | 77.48 ± 4.8 | 0.718 | 0.775 | 0.727 | 0.782 | 0.726 | 0.727 |
| Unweighted average | | | 0.669 | 0.662 | 0.662 | 0.689 | 0.627 | 0.617 |
| Weighted average | | | 0.710 | 0.732 | 0.707 | 0.743 | 0.660 | 0.665 |
| | | | 0.670 | 0.652 | 0.660 | 0.684 | 0.612 | 0.598 |
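The first two summary rows of the MT table are consistent with an unweighted mean over the eight RNA classes and a mean weighted by class size. The sketch below checks this for the ubcMEA column using the class sizes and F-measures from the table; reading the rows this way is an inference from the arithmetic, and the third summary row is not reproduced by either calculation.

```python
# (class size, ubcMEA F-measure with BL*) per RNA class, from the MT table.
mt_ubcmea = {
    "16S Ribosomal RNA": (88, 0.649),
    "23S Ribosomal RNA": (27, 0.711),
    "5S Ribosomal RNA": (309, 0.739),
    "Group I intron": (16, 0.705),
    "Group II intron": (3, 0.720),
    "Ribonuclease P RNA": (6, 0.471),
    "Signal Recognition RNA": (91, 0.641),
    "Transfer RNA": (484, 0.718),
}

# Unweighted average: every class counts equally, regardless of its size.
unweighted = sum(f for _, f in mt_ubcmea.values()) / len(mt_ubcmea)

# Weighted average: each class contributes in proportion to its size.
total = sum(n for n, _ in mt_ubcmea.values())
weighted = sum(n * f for n, f in mt_ubcmea.values()) / total

print(f"total={total} unweighted={unweighted:.3f} weighted={weighted:.3f}")
```

The weighted mean is dominated by the two largest classes (transfer RNA and 5S ribosomal RNA), which is why it sits well above the unweighted mean for this column.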
F-measure prediction accuracy of the MEA, MFE, gC-g1, and gC-pMFmeas algorithms on the MA dataset
| RNA class | Class size | Mean ± std of length | ubcMEA F-meas (BL*) | ubcMFE F-meas (BL*) | gC-g1 F-meas (BL*) | gC-pMFmeas F-meas (BL*) | rsMEA F-meas (Turner99) | rsMFE F-meas (Turner99) |
|---|---|---|---|---|---|---|---|---|
| 16S Ribosomal RNA | 675 | 485.66 ± 113.02 | 0.625 | 0.645 | 0.630 | 0.665 | 0.561 | 0.521 |
| 23S Ribosomal RNA | 159 | 453.44 ± 117.85 | 0.645 | 0.643 | 0.626 | 0.664 | 0.588 | 0.562 |
| 5S Ribosomal RNA | 128 | 120.98 ± 3.21 | 0.782 | 0.780 | 0.763 | 0.782 | 0.616 | 0.630 |
| Group I intron | 89 | 368.49 ± 103.58 | 0.644 | 0.631 | 0.642 | 0.670 | 0.576 | 0.550 |
| Group II intron | 2 | 578 ± 47 | 0.540 | 0.609 | 0.524 | 0.582 | 0.472 | 0.471 |
| Ribonuclease P RNA | 399 | 332.78 ± 52.34 | 0.643 | 0.603 | 0.656 | 0.678 | 0.615 | 0.575 |
| Signal Recognition RNA | 364 | 227.04 ± 109.53 | 0.730 | 0.721 | 0.680 | 0.708 | 0.609 | 0.625 |
| Transfer RNA | 489 | 77.19 ± 5.13 | 0.706 | 0.764 | 0.719 | 0.773 | 0.727 | 0.726 |
| Unweighted average | | | 0.668 | 0.676 | 0.655 | 0.690 | 0.596 | 0.583 |
| Weighted average | | | 0.672 | 0.682 | 0.669 | 0.708 | 0.618 | 0.600 |
| | | | 0.658 | 0.660 | 0.648 | 0.681 | 0.588 | 0.568 |
Figure 1. Relative performance of the MEA and MFE algorithms on the MT and MA datasets for Ribonuclease P RNAs and Group I introns.
Figure 2. 95% bootstrap percentile confidence intervals for the F-measure average of the ubcMEA and ubcMFE algorithms. 95% bootstrap percentile confidence intervals are shown for the F-measure average of the ubcMEA (dashed red bars) and ubcMFE (solid black bars) algorithms on the MA and S-Full sets and also different RNA classes in MA.
Figure 3. 95% bootstrap percentile confidence intervals for the F-measure average of the gC-pMFmeas and ubcMFE algorithms. 95% bootstrap percentile confidence intervals are shown for the F-measure average of the gC-pMFmeas (dashed red bars) and ubcMFE (solid black bars) algorithms on the MA and S-Full sets and also different RNA classes in MA.
Confidence intervals obtained by the bootstrap percentile method for the MA and S-Full sets and different RNA classes in MA
| RNA class | Class size | ubcMEA CI (CL = 0.95) | ubcMFE CI (CL = 0.95) | rsMEA CI (CL = 0.95) | rsMFE CI (CL = 0.95) | gC-pMFmeas CI (CL = 0.95) |
|---|---|---|---|---|---|---|
| 16S Ribosomal RNA | 675 | (0.611, 0.639) | (0.630, 0.660) | (0.548, 0.574) | (0.507, 0.536) | (0.652, 0.679) |
| 23S Ribosomal RNA | 159 | (0.609, 0.677) | (0.607, 0.677) | (0.556, 0.618) | (0.530, 0.593) | (0.633, 0.693) |
| 5S Ribosomal RNA | 128 | (0.754, 0.807) | (0.751, 0.806) | (0.573, 0.657) | (0.586, 0.671) | (0.758, 0.804) |
| Group I intron | 89 | (0.605, 0.680) | (0.594, 0.666) | (0.540, 0.611) | (0.513, 0.587) | (0.634, 0.704) |
| Ribonuclease P RNA | 399 | (0.629, 0.659) | (0.588, 0.618) | (0.602, 0.628) | (0.561, 0.588) | (0.665, 0.690) |
| Signal Recognition RNA | 364 | (0.708, 0.750) | (0.698, 0.742) | (0.583, 0.635) | (0.599, 0.651) | (0.685, 0.728) |
| Transfer RNA | 489 | (0.683, 0.723) | (0.742, 0.785) | (0.707, 0.747) | (0.705, 0.748) | (0.754, 0.791) |
| MA | 2305 | (0.664, 0.680) | (0.673, 0.691) | (0.610, 0.627) | (0.591, 0.609) | (0.700, 0.715) |
| S-Full | 3246 | (0.673, 0.688) | (0.678, 0.694) | (0.615, 0.631) | (0.598, 0.616) | (0.704, 0.718) |
Confidence intervals are obtained by the bootstrap percentile method for the MA and S-Full sets and different RNA classes in MA (except Group II intron) when ubcMEA and ubcMFE use the BL* parameter set and rsMEA and rsMFE use the Turner99 parameter set and the confidence level (CL) is set to 0.95.
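A bootstrap percentile interval of this kind can be sketched as follows: resample the per-RNA F-measures with replacement, record the mean of each resample, and take the 2.5th and 97.5th percentiles of those means. The function name, the number of resamples, the percentile indexing and the synthetic input are all assumptions made for illustration; the source does not state these details.

```python
import random
import statistics


def bootstrap_percentile_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Mean of each resample drawn with replacement from the observed values.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    lo_idx = round(alpha / 2 * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Synthetic per-RNA F-measures (hypothetical, not data from the study).
data_rng = random.Random(42)
f_scores = [data_rng.betavariate(7, 3) for _ in range(200)]
low, high = bootstrap_percentile_ci(f_scores)
print(f"mean={statistics.fmean(f_scores):.3f} CI=({low:.3f}, {high:.3f})")
```

Because each resample has the same size as the class it is drawn from, the sampling variability of the mean shrinks as the class grows, so larger classes would be expected to give narrower intervals.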
Figure 4. Confidence interval width versus RNA class size in the MA set for the ubcMEA, ubcMFE and gC-pMFmeas methods.
Accuracy comparison of different prediction algorithms with various parameter sets on the S-Full set
| Algorithm | F-measure (T99-MRF) | F-measure (BL*) | F-measure (CG*) | F-measure (Turner99) |
|---|---|---|---|---|
| | 0.582 (0.574,0.591) | | 0.636 (0.628,0.644) | n/a |
| | 0.601 (0.592,0.609) | | 0.671 (0.663,0.679) | n/a |
| | n/a | n/a | n/a | 0.623 (0.615,0.632) |
| | n/a | n/a | n/a | 0.607 (0.598,0.615) |
| - | - | - | ||
The table presents the prediction accuracy of different algorithms with different thermodynamic parameter sets in terms of F-measure. The 95% percentile confidence intervals of their accuracies are shown in parentheses. The parameter set T99-MRF refers to the Turner99 parameters in MultiRNAFold format. BL* and CG* are the parameter sets obtained by the BL and CG approaches of Andronescu et al. [9], respectively. The Turner99 parameter set is the one obtained by Mathews et al. [3]. "n/a" indicates cases in which a given algorithm is not applicable to a parameter set because the parameter set does not match the energy model underlying the algorithm. The highest accuracies for MEA and MFE are shown in bold.
P-value of one-sided permutation test for the MEA versus MFE algorithm on different parameter sets
| Parameter set | T99-MRF | BL* | CG* | Turner99 |
|---|---|---|---|---|
| | 0.999 | 0.848 | 1 | |
P-value of one-sided permutation test is computed for the MEA versus MFE algorithm on different parameter sets; p-values below the standard significance level of 0.05 are shown in bold face and indicate significantly higher prediction accuracy of MEA compared to MFE.
P-value of two-sided permutation test for the MEA and MFE algorithms on different parameter sets
| Parameter set | T99-MRF | BL* | CG* | Turner99 |
|---|---|---|---|---|
| | | 0.299 | | |
P-value of two-sided permutation test is computed for the MEA and MFE algorithms on different parameter sets; p-values below the standard significance level of 0.05 are shown in bold face and indicate significant differences in prediction accuracy.
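The one-sided and two-sided tests can be illustrated with a paired sign-flip permutation test on per-RNA F-measure differences. This is a minimal sketch under stated assumptions: the source names the tests but not their exact procedure, so the pairing, the sign-flip scheme, the function name `permutation_test` and the synthetic scores are all hypothetical.

```python
import random


def permutation_test(scores_a, scores_b, n_permutations=10000,
                     alternative="greater", seed=0):
    """Paired sign-flip permutation test on the mean difference a - b.

    alternative="greater" tests whether A is more accurate than B (one-sided);
    alternative="two-sided" tests for any difference in accuracy.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels A and B are exchangeable
        # within each pair, so every difference may have its sign flipped.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if alternative == "greater":
            extreme += permuted >= observed
        else:
            extreme += abs(permuted) >= abs(observed)
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Synthetic paired scores (hypothetical): method A is slightly better on every RNA.
data_rng = random.Random(1)
scores_b = [data_rng.uniform(0.4, 0.85) for _ in range(50)]
scores_a = [b + data_rng.uniform(0.0, 0.1) for b in scores_b]

p_a_better = permutation_test(scores_a, scores_b, alternative="greater")
p_b_better = permutation_test(scores_b, scores_a, alternative="greater")
p_two_sided = permutation_test(scores_a, scores_b, alternative="two-sided")
print(p_a_better, p_b_better, p_two_sided)
```

A one-sided p-value close to 1, as in the call that asks whether B beats A, means the observed difference points in the opposite direction to the alternative being tested.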