Kishore J Doshi, Jamie J Cannone, Christian W Cobaugh, Robin R Gutell.
Abstract
BACKGROUND: A detailed understanding of an RNA's correct secondary and tertiary structure is crucial to understanding its function and mechanism in the cell. Free energy minimization with energy parameters based on the nearest-neighbor model and comparative analysis are the primary methods for predicting an RNA's secondary structure from its sequence. Version 3.1 of Mfold has been available since 1999. This version contains an expanded sequence dependence of energy parameters and the ability to incorporate coaxial stacking into free energy calculations. We test Mfold 3.1 by performing the largest and most phylogenetically diverse comparison of rRNA and tRNA structures predicted by comparative analysis and Mfold, and we use the results of our tests on 16S and 23S rRNA sequences to assess the improvement between Mfold 2.3 and Mfold 3.1.
Year: 2004 PMID: 15296519 PMCID: PMC514602 DOI: 10.1186/1471-2105-5-105
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Distribution of Comparatively Predicted Secondary Structure Models Analyzed
| | 5S rRNA | 16S rRNA | 23S rRNA | tRNA | Total |
| Structures | 90 | 496 | 256 | 569 | 1,411 |
| Total AGCU Nucleotides1 | 10,777 | 724,475 | 712,575 | 42,283 | 1,490,110 |
| Total Nucleotides2 | 10,819 | 736,412 | 714,723 | 43,189 | 1,505,143 |
| Total Comparative Pairings3 | 3,107 | 191,994 | 178,958 | 11,796 | 385,854 |
| Average Sequence Length | 120 | 1,485 | 2,792 | 76 | - |
| Average Pairings/Structure3 | 35 | 387 | 699 | 21 | - |
| Archaea | 12 | 23 | 17 | 76 | 128 |
| Bacteria | 28 | 195 | 75 | 155 | 453 |
| Eucarya | |||||
| Nuclear | 45 | 133 | 52 | 207 | 437 |
| Chloroplast | 4 | 33 | 31 | 131 | 199 |
| Mitochondrion | 1 | 112 | 81 | - | 194 |
| | Archaea | Bacteria | Nuclear | Chloroplast | Mitochondrion |
| 16S rRNA | |||||
| < 80% Identity | 380 / 75% | 32,456 / 86% | 16,574 / 94% | 746 / 71% | 12,332 / 99% |
| < 50% Identity | 0 / 0% | 0 / 0% | 8,500 / 48% | 98 / 9% | 9,852 / 79% |
| >= 95% Identity | 16 / 3% | 1,588 / 4% | 212 / 1% | 4 / 0.4% | 12 / 0.1% |
| Total Pairs | 506 | 37,830 | 17,556 | 1,056 | 12,432 |
| 23S rRNA | |||||
| < 80% Identity | 236 / 87% | 5,214 / 94% | 2,568 / 97% | 710 / 76% | 6,394 / 99% |
| < 50% Identity | 0 / 0% | 0 / 0% | 1,960 / 74% | 62 / 7% | 5,830 / 90% |
| >= 95% Identity | 6 / 2% | 42 / 1% | 8 / 0.30% | 18 / 2% | 8 / 0.1% |
| Total Pairs | 272 | 5,550 | 2,652 | 930 | 6,480 |
| | Archaea | Bacteria | Nuclear | Chloroplast | Total |
| Alanine (Ala) | 13 | 14 | 6 | 5 | 38 |
| Arginine (Arg) | 4 | 9 | 17 | 12 | 42 |
| Asparagine (Asn) | 3 | 9 | 10 | 3 | 25 |
| Aspartic acid (Asp) | 4 | 4 | 12 | 6 | 26 |
| Cysteine (Cys) | 2 | 5 | 3 | 5 | 15 |
| Glutamine (Gln) | 3 | 6 | 13 | 4 | 26 |
| Glutamic acid (Glu) | 4 | 8 | 23 | 8 | 43 |
| Glycine (Gly) | 6 | 18 | 17 | 9 | 50 |
| Histidine (His) | 3 | 6 | 10 | 7 | 26 |
| Isoleucine (Ile) | 3 | 16 | 8 | 10 | 37 |
| Leucine (Leu) | - | - | - | - | - |
| Lysine (Lys) | 3 | 8 | 15 | 4 | 30 |
| Methionine (Met) | 4 | 7 | 7 | 9 | 27 |
| Phenylalanine (Phe) | 3 | 6 | 16 | 10 | 35 |
| Proline (Pro) | 5 | 10 | 12 | 8 | 35 |
| Serine (Ser) | - | - | - | - | - |
| Threonine (Thr) | 7 | 14 | 6 | 10 | 37 |
| Tryptophan (Trp) | 1 | 6 | 3 | 10 | 20 |
| Tyrosine (Tyr) | 2 | - | 6 | 3 | 11 |
| Valine (Val) | 6 | 9 | 23 | 8 | 46 |
| Total | 76 | 155 | 207 | 131 | 569 |
1 Considers only A, G, C or U nucleotides.
2 Considers all nucleotides.
3 Includes only G:C, A:U and G:U base-pairings predicted with comparative analysis.
4 Average sequence identities for all pairwise comparisons between sequences. The number of pairwise comparisons equals (n² − n), where n is the number of sequences considered.
5 Only Type I tRNAs are considered.
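The "Total Pairs" rows in the pairwise identity table follow directly from the (n² − n) count in footnote 4, which counts each ordered pair of distinct sequences. The short Python sketch below reproduces the 16S rRNA totals from the sequence counts in the first table. The function name and dictionary are illustrative only and are not taken from the study.

```python
def ordered_pairwise_comparisons(n: int) -> int:
    """Number of ordered pairs of distinct sequences among n sequences: n^2 - n."""
    return n * n - n


# 16S rRNA sequence counts per phylogenetic group, copied from the first table.
counts_16s = {
    "Archaea": 23,
    "Bacteria": 195,
    "Nuclear": 133,
    "Chloroplast": 33,
    "Mitochondrion": 112,
}

totals_16s = {group: ordered_pairwise_comparisons(n) for group, n in counts_16s.items()}
print(totals_16s)
# {'Archaea': 506, 'Bacteria': 37830, 'Nuclear': 17556, 'Chloroplast': 1056, 'Mitochondrion': 12432}
```

The same formula applied to the 23S rRNA counts (17, 75, 52, 31, 81) gives 272, 5,550, 2,652, 930 and 6,480, matching the 23S "Total Pairs" row.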
Figure 1. Direct Comparison of Mfold 2.3 and Mfold 3.1 Folding Accuracy for Selected 16S and 23S rRNAs. Base-pairs marked in red are predicted correctly by both Mfold 2.3 and Mfold 3.1. Base-pairs marked in blue are predicted correctly only by Mfold 2.3, and base-pairs marked in green are predicted correctly only by Mfold 3.1. Black base-pairs are not predicted correctly by either version of Mfold. Only canonical base-pairs in the comparative models in the current study and previous Gutell Lab studies are considered. Non-canonical base-pairs in the comparative structure models are not counted. Full-sized versions of each annotated structure diagram are available at our website[36]. A: Archaea 16S rRNA, Haloferax volcanii. B.1: Archaea 23S rRNA, 5' half, Thermococcus celer. B.2: Archaea 23S rRNA, 3' half, Thermococcus celer. C: Eukaryotic Nuclear 16S rRNA, Giardia intestinalis. D.1: Eukaryotic Nuclear 23S rRNA, 5' half, Giardia intestinalis. D.2: Eukaryotic Nuclear 23S rRNA, 3' half, Giardia intestinalis.
Average Accuracy of the Optimal RNA Structure Predicted with Mfold 3.1†
| 5S rRNA | 16S rRNA | 23S rRNA | tRNA | |||||||
| M1 | C2 | P13 | M | C | P24 | M | C | M5 | C | |
| Sequences | 309 | 90 | 56 | 22 | 496 | 72 | 5 | 256 | 484 | 569 |
| Accuracy6,7,8,9 | 78 ± 23 | 71 ± 24 | 46 ± 17 | 51 ± 16 | 41 ± 13 | 44 ± 11 | 57 ± 14 | 41 ± 13 | 83 ± 22 | 69 ± 24 |
| High/Low10 | 98/0 | 81/10 | 77/5 | 74/19 | 74/1 | 100/0 | ||||
| Median | 81 | 41 | 41 | 70 | ||||||
| Distributions | ||||||||||
| ≤ 20% acc11 | 4 | 9 | 4 | 1 | 6 | 2 | ||||
| ≥ 60% acc12 | 77 | 25 | 9 | 6 | 5 | 60 | ||||
| 20%<acc<60%13 | 19 | 66 | 86 | 93 | 89 | 39 | ||||
†All values are percentages unless otherwise indicated. All averages are per sequence averages for folding complete sequences as defined in the Per Sequence Averages section in Methods. C, Current Study; P1, Previous Study by Gutell Lab for 16S rRNA[29]; P2, Previous Study by Gutell Lab for 23S rRNA[30]; M, Previous Study by Mathews et al.[31]. Accuracies from all previous studies are for folding complete sequences.
1 All sequences from the Mathews et al. study (M) were folded with Mfold 3.1 using a window size (W) of 0, percent suboptimality (P) of 20%, maximum number of suboptimals (MAX) of 750 and efn2 re-evaluation and re-ordering.
2 All sequences in the current study (C) were folded with Mfold 3.1 using a window size (W) of 1, percent suboptimality (P) of 5% and efn2 re-evaluation and re-ordering.
3 All sequences in the previous Gutell Lab study on 16S rRNA (P1) were folded with Mfold 2.3 using a window size (W) of 10 and no efn2 re-evaluation and re-ordering.
4 All sequences in the previous Gutell Lab study on 23S rRNA (P2) were folded with Mfold 2.3 using a window size (W) of 20 and no efn2 re-evaluation and re-ordering.
5 Bases modified in tRNA that are subsequently unable to fit into an A form helix were constrained to be single-stranded.
6 Comparative base-pairs that are pseudoknotted were excluded from the analysis in previous Gutell Lab studies (P1, P2), but were included in the current study. The Mathews et al. study included a measure of the percentage of pseudoknotted base-pairs in comparatively predicted structures they considered, but it was unclear if they were included in the analysis.
7 In all studies, only canonical, comparative base-pairs (excluding any base-pairs with IUPAC symbols) were considered. For both the current study (C) and previous Gutell Lab studies (P1, P2), a predicted base-pair was considered correct only if it matched a comparative base-pair exactly. In the Mathews et al. (M) study, a base-pair was considered correct if (1) it matched a comparatively predicted base-pair exactly, or (2) either nucleotide of the Mfold predicted base-pair (X,Y, where X and Y are the positions of the nucleotides in the sequence) was within one nucleotide of its comparatively predicted position (X, Y ± 1 or X ± 1, Y).
8 Accuracy values in bold under the (C) columns for 16S and 23S rRNA represent average prediction accuracies in the current study for just the subset of sequences considered in the previous Gutell Lab studies[29, 30]. The following sequences were considered in previous Gutell Lab studies but excluded from the current study: Olisthodiscus luteus (16S rRNA, Chloroplast) and Sulfolobus solfataricus (23S rRNA, Archaea).
9 When the efn2 re-evaluation and re-ordering step was omitted from our study, the average prediction accuracy was 40 ± 13 for 16S rRNA, 40 ± 13 for 23S rRNA, 69 ± 24 for 5S rRNA, and 66 ± 24 for tRNA. For complete details, see our website[36].
10 Accuracy scores for the best and worst predicted structures in each group.
11 Percentage of predicted structures with an accuracy of 20% or less.
12 Percentage of predicted structures with an accuracy of 60% or higher.
13 Percentage of predicted structures with an accuracy between 20% and 60%.
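Footnote 7 describes two scoring rules: the exact-match rule used in the current and previous Gutell Lab studies, and the one-nucleotide tolerance used by Mathews et al. The Python sketch below contrasts the two on a toy example. It assumes that accuracy is the percentage of comparative base-pairs recovered, and that the tolerant rule is applied by asking whether any predicted pair lies within one position of a comparative pair. The function names and the toy data are hypothetical, and neither study's actual scoring code is reproduced here.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # (5' position, 3' position), 1-based


def exact_accuracy(predicted: Set[Pair], comparative: Set[Pair]) -> float:
    """Percentage of comparative base-pairs matched exactly by the prediction."""
    if not comparative:
        return 0.0
    return 100.0 * len(predicted & comparative) / len(comparative)


def tolerant_accuracy(predicted: Set[Pair], comparative: Set[Pair]) -> float:
    """Percentage of comparative base-pairs matched exactly or with one
    nucleotide shifted by one position (X, Y +/- 1 or X +/- 1, Y)."""
    if not comparative:
        return 0.0
    hits = 0
    for x, y in comparative:
        neighbours = {(x, y), (x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)}
        if neighbours & predicted:
            hits += 1
    return 100.0 * hits / len(comparative)


# Toy data (not from the study): four comparative pairs, four predicted pairs.
comparative = {(1, 20), (2, 19), (3, 18), (8, 13)}
predicted = {(1, 20), (2, 19), (3, 17), (6, 15)}

print(exact_accuracy(predicted, comparative))     # 50.0
print(tolerant_accuracy(predicted, comparative))  # 75.0
```

In this example the slipped pair (3, 17) counts as correct only under the tolerant rule, which is one reason the (M) and (C) columns are not directly comparable.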
Average Accuracy of the Optimal RNA Structure Predicted with Mfold 3.1 Grouped by Phylogeny†
| 5S rRNA | 16S rRNA | 23S rRNA | tRNA | |||
| C | P1 | C | P2 | C | C | |
| Archaea | 79 / 98 / 29 | 68 / 81 / 55 | 62 / 77 / 51 | 59 / 74 / 51 | 58 / 74 / 40 | 73 / 100 / 32 |
| Bacteria | 62 / 94 / 18 | 56 / 69 / 39 | 49 / 68 / 21 | 53 / 66 / 45 | 49 / 66 / 31 | 74 / 100 / 0 |
| Eucarya (n)1 | 75 / 94 / 0 | 30 / 47 / 10 | 34 / 50 / 15 | 41 / 60 / 23 | 42 / 63 / 21 | 61 / 100 / 0 |
| Eucarya (c) | 67 / 85 / 16 | 48 / 71 / 32 | 46 / 71 / 19 | 39 / 54 / 19 | 39 / 49 / 21 | 73 / 100 / 19 |
| Eucarya (m) | 31 / 56 / 17 | 30 / 60 / 5 | 38 / 57 / 24 | 30 / 61 / 1 | ||
| Eucarya (m)2 | 31 / 60 / 5 | |||||
| Eucarya (m)3 | 33 / 60 / 16 | |||||
†All values (average/high/low) shown as percentages unless otherwise indicated. The determination of the accuracy for the structures predicted with Mfold is described in the Methods section, RNA Secondary Structure Prediction and Prediction Accuracy Calculations. C, Current Study; P1, Previous study by the Gutell Lab for 16S rRNA[29]; P2, Previous study by the Gutell Lab for 23S rRNA[30].
1 (n), Nuclear-encoded sequences; (c), Chloroplast-encoded sequences; (m), Mitochondrial-encoded sequences.
2 Based on comparative models with 100 or more canonical base-pairs only.
3 Based on comparative models with 300 or more canonical base-pairs only.
RNA Folding Accuracy of Specific 16S and 23S rRNA Sequences using Mfold 2.3 and 3.1†
| | Previous[29, 30] | Current |
| | 17 | 30 |
| | 17 | 13 |
| | 23 | 24 |
| | 27 | 29 |
| | 22 | 29 |
| | 30 | 33 |
| | 10 | 23 |
| | 18 | 21 |
| | 28 | 25 |
| | 20 | 19 |
| | 19 | 23 |
| | 30 | 31 |
| | 28 | 25 |
| | 27 | 20 |
| | 24 | 29 |
| | 23 | 21 |
| | 24 | 33 |
†All values are percentages unless otherwise indicated. The determination of the accuracy for the structures predicted with Mfold is described in the Methods section, RNA Secondary Structure Prediction and Prediction Accuracy Calculations. Genbank accession numbers are listed in parentheses for each sequence.
Accuracy of Base-pairs Predicted with Mfold 3.1 as a Function of RNA Contact Distance†
| 16S rRNA | 23S rRNA | |||||
| RNA Contact Distance | 496 Structures | 256 Structures | ||||
| Total Base-pairs | % of Total | Total Base-pairs | % of Total | |||
| Total | 191,994 | 100 | 178,958 | 100 | ||
| 2–100 | 145,058 | 76 | 134,085 | 75 | ||
| | ||||||
| | ||||||
| 101+ | 46,936 | 24 | 44,873 | 25 | ||
| | ||||||
| | ||||||
| Total | 223,957 | 100 | 218,908 | 100 | ||
| 2–100 | 150,886 | 67 | 137,780 | 63 | ||
| | ||||||
| | ||||||
| 101+ | 73,071 | 33 | 81,128 | 37 | ||
| | ||||||
| | ||||||
| %C1 | %M | %C | %M | |||
| Total | 81,934 | 43 | 37 | 77,888 | 44 | 36 |
| 2–100 | 75,763 | 52 | 50 | 67,130 | 50 | 49 |
| | ||||||
| | ||||||
| 101+ | 6,171 | 13 | 8 | 10,758 | 24 | 13 |
| | ||||||
| | ||||||
| Current | Previous[29] | Current | Previous[30] | |||
| 2–100 | 50 | 55 | 47 | 53 | ||
| | - | - | ||||
| | - | - | ||||
| 101–200 | 22 | 15 | 26 | 35 | ||
| 201–300 | 10 | 14 | 22 | 21 | ||
| 301–400 | 9 | 13 | 13 | 10 | ||
| 401–500 | 4 | 12 | 16 | 13 | ||
| 501+ | 4 | - | 14 | - | ||
†All base-pairs predicted in the comparative and the Mfold optimal structure predictions including those base-pairs predicted correctly (any base-pairs with IUPAC symbols other than A,G,C, or U are excluded) are grouped by RNA contact distance for 16S and 23S rRNA. RNA contact distance is defined as the number of nucleotides intervening between the 5' and 3' halves of a base-pair. The determination of the accuracy for the structures predicted with Mfold is described in the Methods section, RNA Secondary Structure Prediction and Prediction Accuracy Calculations.
1 %C, the percentage of comparatively predicted base-pairs; %M, the percentage of Mfold predicted base-pairs.
2 The Per Sequence Average (see Per Sequence Average in Methods) percentage of comparative base-pairs in each distance category predicted correctly in the Mfold optimal structure predictions.
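The contact distance grouping used in this table can be sketched in a few lines of Python. The sketch assumes that the contact distance of a pair (i, j) is j − i − 1, one reading of "the number of nucleotides intervening" in the table note, and it pools base-pairs for a single structure rather than computing the per sequence averages reported above. The bin edges follow the table rows, and all names and data are illustrative.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[int, int]

# Bin edges taken from the table rows; None marks the open-ended last bin.
BINS = [(2, 100), (101, 200), (201, 300), (301, 400), (401, 500), (501, None)]


def contact_distance(i: int, j: int) -> int:
    """Nucleotides intervening between the two halves of a base-pair (assumed j - i - 1)."""
    return abs(j - i) - 1


def bin_label(distance: int) -> Optional[str]:
    """Return the label of the bin containing `distance`, or None if below the first bin."""
    for low, high in BINS:
        if distance >= low and (high is None or distance <= high):
            return f"{low}+" if high is None else f"{low}-{high}"
    return None


def accuracy_by_bin(predicted: Set[Pair], comparative: Set[Pair]) -> Dict[str, float]:
    """Percentage of comparative base-pairs predicted exactly, per distance bin."""
    total: Counter = Counter()
    correct: Counter = Counter()
    for i, j in comparative:
        label = bin_label(contact_distance(i, j))
        if label is None:
            continue
        total[label] += 1
        if (i, j) in predicted:
            correct[label] += 1
    return {label: 100.0 * correct[label] / total[label] for label in total}


# Toy data: two short-range and two long-range comparative pairs.
comparative = {(1, 10), (2, 9), (20, 200), (21, 199)}
predicted = {(1, 10), (20, 200)}
print(accuracy_by_bin(predicted, comparative))
```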
Figure 2. Accuracy of Comparatively Predicted Base-pairs from 496 16S rRNA Sequences and RNA Contact Distance. A. The RNA contact distance (the number of nucleotides in the RNA sequence that separate the 5' and 3' nucleotides of a base-pair) for all 191,994 base-pairs in comparative structure models is determined and plotted. B. The 191,994 comparatively predicted base-pairs are divided into seven RNA contact distance bins (see Logarithmic Binning of Base-pairs by Contact Distance for 16S rRNA in Methods) represented by columns. The accuracies for all base-pairs in each bin are also plotted as points.
Average, Minimum and Maximum ΔΔG Values for Pairwise Comparisons of Different Suboptimal Folds†
| Optimal Accuracy1 | 80% | 46% | ||
| Total Fold Predictions (Optimal + Suboptimal) | 750 | 750 | ||
| Total Pairwise Comparisons | 280,875 | 280,875 | ||
| | ||||
| Num of Pairwise Comparisons | 32,378 | 12%2 | 11,016 | 4% |
| ΔΔG Min (kcal/mol) | 0 | 0 | ||
| ΔΔG Max (kcal/mol) | 11 | 8.60 | ||
| ΔΔG Average (kcal/mol) | 2.84 | 2.19 | ||
| | ||||
| Num of Pairwise Comparisons | 102,071 | 36% | 32,360 | 12% |
| ΔΔG Min (kcal/mol) | 0 | 0 | ||
| ΔΔG Max (kcal/mol) | 11 | 8.60 | ||
| ΔΔG Average (kcal/mol) | 2.53 | 2.01 | ||
| | ||||
| Num of Pairwise Comparisons | 121,805 | 43% | 134,037 | 48% |
| ΔΔG Min (kcal/mol) | 0 | 0 | ||
| ΔΔG Max (kcal/mol) | 11 | 8.6 | ||
| ΔΔG Average (kcal/mol) | 2.24 | 1.74 | ||
| | ||||
| Num of Pairwise Comparisons | 24,621 | 9% | 103,462 | 37% |
| ΔΔG Min (kcal/mol) | 0 | 0 | ||
| ΔΔG Max (kcal/mol) | 11 | 8.6 | ||
| ΔΔG Avg (kcal/mol) | 2.24 | 1.82 | ||
†Both sequences are 16S rRNAs. For each sequence, Mfold 3.1 predicts one optimal or minimum free energy fold and 749 suboptimal folds (750 total folds). Pairwise comparisons are grouped based on the structural variation between the two folds compared. For details on how structural variation between two folds is calculated see Materials and Methods. The range of ΔΔG values observed is 0–11 kcal/mol for H. volcanii and 0–8.60 kcal/mol for M. hungatei, and all ΔG values are taken before efn2 re-evaluation.
1 Without efn2 re-evaluation and re-ordering of predicted folds.
2 Percentage of total pairwise comparisons.
Figure 3. ΔΔG vs. Structural Variation for Pairwise Comparisons from the "Suboptimal Population". A set of 750 structure predictions (optimal + top 749 suboptimal) is compared, resulting in a total of 280,875 pairwise comparisons. The ΔΔG (pre efn2 re-evaluation) for two structure predictions is calculated by taking the absolute value of the difference between the ΔG of each structure prediction before efn2 re-evaluation. Structural variation for two structure predictions is calculated by counting the number of nucleotides in each structure prediction that either 1) have different pairing partners or 2) are paired in one structure prediction and unpaired in the other structure prediction (see Suboptimal Structural Variation Score in Methods). The shading within the figure indicates the number of pairwise comparisons that have the same values for both ΔΔG and structural variation score. A: Archaea 16S rRNA, Haloferax volcanii. B: Archaea 16S rRNA, Methanospirillum hungatei.
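The two quantities plotted in this figure can be illustrated with a minimal Python sketch. It assumes that each fold is given as a set of (i, j) base-pairs and that a nucleotide position is counted once if its pairing partner differs between the two folds, including the paired versus unpaired case. The exact scoring in the Methods may differ in detail, and the function names and toy folds are hypothetical.

```python
from math import comb
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def partner_map(pairs: Set[Pair]) -> Dict[int, int]:
    """Map every paired position to its partner (both directions)."""
    partners: Dict[int, int] = {}
    for i, j in pairs:
        partners[i] = j
        partners[j] = i
    return partners


def structural_variation(fold_a: Set[Pair], fold_b: Set[Pair]) -> int:
    """Count positions whose partner differs, or that are paired in only one fold."""
    a, b = partner_map(fold_a), partner_map(fold_b)
    return sum(1 for pos in a.keys() | b.keys() if a.get(pos) != b.get(pos))


def delta_delta_g(dg_a: float, dg_b: float) -> float:
    """Absolute difference between the free energies of two folds (kcal/mol)."""
    return abs(dg_a - dg_b)


# A population of 750 folds yields C(750, 2) unordered pairwise comparisons.
print(comb(750, 2))  # 280875

# Toy folds: position 8 changes partner, position 3 loses and position 4 gains a partner.
fold_a = {(1, 10), (2, 9), (3, 8)}
fold_b = {(1, 10), (2, 9), (4, 8)}
print(structural_variation(fold_a, fold_b))  # 3
print(delta_delta_g(-100.5, -98.0))          # 2.5
```

Unlike the sequence identity comparisons, which count ordered pairs (n² − n), the 280,875 fold comparisons correspond to unordered pairs, n(n − 1)/2 with n = 750.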
Distribution of 16S rRNA Base-pairs Predicted Correctly and Incorrectly†
| | Overall | Archaea | Bacteria | Eucarya (C)1 | Eucarya (M) | Eucarya (N) |
| Comparative | 191,994 | 10,211 | 83,385 | 13,406 | 29,979 | 55,013 |
| Opt Correct2 | 81,934 | 6,376 | 41,032 | 6,105 | 9,459 | 18,962 |
| Subopt Correct3 | 137,000 | 8,570 | 65,177 | 10,032 | 21,201 | 32,020 |
| Opt Incorrect2 | 142,023 | 4,758 | 49,563 | 8,603 | 27,617 | 51,482 |
| Subopt Incorrect3 | 2,372,305 | 101,253 | 947,197 | 161,397 | 472,614 | 689,844 |
| Opt Accuracy2,4 | 41% | 62% | 49% | 46% | 30% | 34% |
| Subopt Accuracy3,4 | 71% | 84% | 78% | 75% | 71% | 59% |
| Avg Improvement5 | 30% | 21% | 29% | 30% | 41% | 24% |
| Best Prediction6 | 92% | 91% | 89% | 92% | 92% | 90% |
| Max Improvement7 | 68% | 35% | 54% | 53% | 68% | 48% |
| Min Improvement8 | 10% | 10% | 12% | 12% | 14% | 11% |
†All 496 16S rRNA sequences are considered. Each sequence is folded for a population of one optimal and 749 suboptimal structure predictions. The determination of the accuracy for the structures predicted with Mfold is described in the Methods section, RNA Secondary Structure Prediction and Prediction Accuracy Calculations. Values are calculated by summing the number of unique base-pairs encountered for each sequence that satisfy each particular category (any base-pairs involving IUPAC symbols other than A,G,C, or U are excluded). For example, Subopt Correct is calculated by summing the number of unique, correctly predicted base-pairs encountered in the population of optimal plus suboptimal structure predictions for each of the 496 16S rRNA sequences. Prediction accuracy when including base-pairs predicted correctly in suboptimal structure predictions is also tabulated.
1 (c), Chloroplast-encoded sequences; (m), Mitochondrial-encoded sequences; (n), Nuclear-encoded sequences.
2 Considering only the optimal prediction.
3 Considering the optimal prediction plus up to 749 suboptimal predictions.
4 Averages calculated on per sequence basis. Please see Per Sequence Averages in Methods.
5 Average improvement in Mfold secondary structure prediction accuracy when pooling base-pairs from both the optimal prediction and suboptimal predictions.
6 The highest Mfold secondary structure prediction accuracy for an individual sequence when pooling base-pairs from both the optimal and suboptimal populations.
7 The largest improvement in Mfold secondary structure prediction accuracy for an individual sequence when pooling base-pairs from both the optimal and suboptimal populations.
8 The smallest improvement in Mfold secondary structure prediction accuracy for an individual sequence when pooling base-pairs from both the optimal and suboptimal populations.
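The pooled ("Subopt") quantities in this table are built from the union of unique base-pairs across the optimal and suboptimal predictions for a sequence. The Python sketch below shows that bookkeeping for one sequence. It assumes that accuracy is the percentage of comparative base-pairs recovered and that folds are sets of (i, j) pairs. The names and the toy data are illustrative and are not taken from the study.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def percent_correct(predicted: Set[Pair], comparative: Set[Pair]) -> float:
    """Percentage of comparative base-pairs present in the predicted set."""
    if not comparative:
        return 0.0
    return 100.0 * len(predicted & comparative) / len(comparative)


def pooled_summary(
    optimal: Set[Pair], suboptimals: List[Set[Pair]], comparative: Set[Pair]
) -> Dict[str, float]:
    """Compare the optimal fold alone with the pooled optimal + suboptimal population."""
    pooled = set(optimal).union(*suboptimals)  # unique base-pairs across all folds
    opt_acc = percent_correct(optimal, comparative)
    pooled_acc = percent_correct(pooled, comparative)
    return {
        "opt_accuracy": opt_acc,
        "subopt_accuracy": pooled_acc,
        "improvement": pooled_acc - opt_acc,
        "opt_incorrect": len(optimal - comparative),
        "subopt_incorrect": len(pooled - comparative),
    }


# Toy data: the optimal fold recovers 1 of 4 comparative pairs, the pool recovers 3.
comparative = {(1, 20), (2, 19), (3, 18), (4, 17)}
optimal = {(1, 20), (5, 16)}
suboptimals = [{(1, 20), (2, 19)}, {(3, 18), (6, 15)}]

summary = pooled_summary(optimal, suboptimals, comparative)
print(summary)
```

Pooling raises the number of correct base-pairs found but, as the "Subopt Incorrect" row shows, it also accumulates many more incorrect ones, so the pooled accuracy is an upper bound on what selecting among suboptimal folds could achieve rather than the accuracy of any single fold.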
Figure 4. Frequency of Base-pair Predictions within a "Suboptimal Population" for Selected 16S rRNAs. The frequency of the prediction of each of the base-pairs in the comparative structure model in a set of 750 structure predictions (optimal + top 749 suboptimal) is displayed on the comparative structure model. Base-pairs marked in red are predicted correctly in all 750 structure predictions. Base-pairs marked in blue are predicted correctly in 600 to 749 structure predictions. Base-pairs marked in magenta are predicted correctly in 151 to 599 structure predictions, base-pairs marked in green are predicted correctly in only 1 to 150 structure predictions, and base-pairs marked in black are not predicted in any of the 750 structure predictions (some are non-canonical or occur in pseudoknots, and thus are not expected to be predicted correctly). Full-sized versions of each annotated structure diagram are available at our website[36]. A: Archaea 16S rRNA, Haloferax volcanii. B: Archaea 16S rRNA, Methanospirillum hungatei.
Frequency of Comparative Base-pairs in 750 Structures Predicted with Mfold 3.1†
| | RNA Contact Distance | | | |
| Frequency1 | 2–100 nt | 101–500 nt | 501+ nt | Total |
| 750 | 21,049 | 417 | 0 | 21,466 |
| 600–749 | 42,362 | 2,805 | 33 | 45,200 |
| 151–599 | 20,775 | 4,594 | 266 | 25,635 |
| 1–150 | 31,285 | 12,253 | 1,161 | 44,699 |
| Correct | 115,471 | 20,069 | 1,460 | 137,000 |
| Never | 29,587 | 22,935 | 2,472 | 54,994 |
| Total | 145,058 | 43,004 | 3,932 | 191,994 |
†For all 496 16S rRNA sequences, a total of 750 structure models are predicted for each sequence (one optimal and 749 suboptimal structure predictions). Every base-pair (excluding any base-pairs involving IUPAC symbols other than A,G,C, or U) in the comparative structure model that appears in a set of 750 structure predictions for a particular sequence is categorized by 1) the number of structure predictions in which it appears and 2) the RNA contact distance. The four bold percentages for each of the three RNA contact distances each total 100%, and reveal the percentage of base-pairs predicted correctly for the four frequency ranges. For example, a total of 115,471 base-pairs with an RNA contact distance of 2–100 nt were predicted correctly. Of those base-pairs, 18% (21,049) were predicted in 750 structure predictions, 37% (42,362) were predicted in 600–749 structure predictions, 18% (20,775) were predicted in 151–599 structure predictions, and 27% (31,285) were predicted in 1–150 structure predictions. In contrast, the three italicized percentages for each of the four frequency ranges, and the "Correct", "Never", and "Total" categories total 100%. For example, 54,994 base-pairs were never predicted in 750 structure predictions. Of those base-pairs, 54% (29,587) have an RNA contact distance of 2–100 nt, 42% (22,935) have an RNA contact distance of 101–500 nt, and 4% (2,472) have an RNA contact distance of 501+ nt.
1 Frequency of prediction throughout a suboptimal population of up to 750 structure predictions.
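The frequency categories in this table can be reproduced for a single sequence by counting how many folds in the population contain each comparative base-pair. The Python sketch below assumes a full population of 750 folds, each represented as a set of (i, j) pairs. The category thresholds are taken from the table rows, while the function names and the synthetic population are hypothetical. Populations with fewer than 750 folds, which the footnote allows for, would need their own handling.

```python
from collections import Counter
from typing import List, Set, Tuple

Pair = Tuple[int, int]
POPULATION = 750  # one optimal plus 749 suboptimal folds


def frequency_category(count: int) -> str:
    """Assign a base-pair to a frequency row based on how many folds contain it."""
    if count <= 0:
        return "Never"
    if count >= POPULATION:
        return "750"
    if count >= 600:
        return "600-749"
    if count >= 151:
        return "151-599"
    return "1-150"


def categorize(folds: List[Set[Pair]], comparative: Set[Pair]) -> Counter:
    """Count comparative base-pairs per frequency category."""
    appearances: Counter = Counter()
    for fold in folds:
        for pair in fold & comparative:
            appearances[pair] += 1
    # Counter returns 0 for pairs that never appeared.
    return Counter(frequency_category(appearances[pair]) for pair in comparative)


# Synthetic population: (1, 10) in every fold, (2, 9) in 600 folds,
# (3, 8) in 10 folds, and (4, 7) in none.
comparative = {(1, 10), (2, 9), (3, 8), (4, 7)}
folds = [
    {(1, 10)}
    | ({(2, 9)} if k < 600 else set())
    | ({(3, 8)} if k < 10 else set())
    for k in range(POPULATION)
]

print(categorize(folds, comparative))
```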