Björn Voss, Robert Giegerich, Marc Rehmsmeier.
Abstract
BACKGROUND: Soon after the first algorithms for RNA folding became available, it was recognised that the prediction of only one energetically optimal structure is insufficient to achieve reliable results. An in-depth analysis of the folding space as a whole appeared necessary to deduce the structural properties of a given RNA molecule reliably. Folding space analysis comprises various methods such as suboptimal folding, computation of base pair probabilities, sampling procedures and abstract shape analysis. Common to many approaches is the idea of partitioning the folding space into classes of structures, for which certain properties can be derived.
Year: 2006 PMID: 16480488 PMCID: PMC1479382 DOI: 10.1186/1741-7007-4-5
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Figure 1. Different interpretations of operators. Trees showing different interpretations of operators: 1.1 as symbolic constructors, 1.2 as the tree representation of the formula that computes the Boltzmann-weighted energy of the structure. Note that the trees are isomorphic.
Secondary structure operators. Operators build terms by application to (sub-)terms. Operators can be interpreted in different ways by algebras, such as the Boltzmann-weighted energy algebra, in which terms evaluate to real numbers. Interpreting operators as mere symbols leads to symbolic terms that represent structures (cf. also Figure 1).
| operator | description |
| SS(l) | single-stranded region l |
| HL(a,l,b) | hairpin loop with single stranded region l, closed by basepair (a,b) |
| SR(a,x,b) | stacking region, closed by basepair (a,b); x is a closed structure |
| BL(a,l,x,b) | bulge left with single stranded region l, closed by basepair (a,b); x is a closed structure |
| BR(a,x,l,b) | bulge right with single stranded region l, closed by basepair (a,b); x is a closed structure |
| IL(a,l,x,l',b) | internal loop with single stranded regions l and l', closed by basepair (a,b); x is a closed structure |
| ML(a,c,b) | multi-loop, closed by basepair (a,b) |
| AD(x,c) | list of adjacent structures; x is a structure, c a (possibly empty) list of adjacent structures |
| E | empty list of adjacent structures |
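The idea that one term can be evaluated under different algebras can be sketched in a few lines of Python. This is only an illustration under stated assumptions: terms are encoded as nested tuples, the helper `evaluate` and the two dictionaries are hypothetical names, only the HL and SR operators are covered, a dot-bracket rendering stands in for the symbolic interpretation, and the energies in `TOY_ENERGY` are invented rather than taken from any real energy model.

```python
import math

RT = 0.6163  # kcal/mol at 37 °C (gas constant times absolute temperature)

# Hypothetical loop energies in kcal/mol, invented for this illustration.
TOY_ENERGY = {"HL": 4.1, "SR": -2.0}


def evaluate(term, algebra):
    """Evaluate a term bottom-up; leaves (bases, regions) are plain strings."""
    if isinstance(term, str):
        return term
    op, *args = term
    return algebra[op](*(evaluate(arg, algebra) for arg in args))


# Algebra 1: render the term as a dot-bracket string.
dot_bracket = {
    "HL": lambda a, l, b: "(" + "." * len(l) + ")",
    "SR": lambda a, x, b: "(" + x + ")",
}

# Algebra 2: Boltzmann weight exp(-E / RT); weights of sub-terms multiply.
boltzmann = {
    "HL": lambda a, l, b: math.exp(-TOY_ENERGY["HL"] / RT),
    "SR": lambda a, x, b: math.exp(-TOY_ENERGY["SR"] / RT) * x,
}

# SR(C, HL(C, UAU, G), G): the stem-loop formed by the sequence CCUAUGG.
term = ("SR", "C", ("HL", "C", "UAU", "G"), "G")

structure = evaluate(term, dot_bracket)  # "((...))"
weight = evaluate(term, boltzmann)       # exp(-(4.1 - 2.0) / RT)
```

Both evaluations traverse the same tree; because energies add, the Boltzmann weights of the sub-terms multiply.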
Basic secondary structure grammar. This grammar is a simplified version, included for illustrative purposes. The grammar that is actually used for calculating shape probabilities is larger, owing to the requirement to be unambiguous; see the discussion in paragraph "A non-ambiguous grammar with correct dangles" and Table 6. Part a) shows the grammar in its algebraic form: ||| separates alternative right-hand sides of productions, ... h denotes the application of the choice function h, and ~~~ denotes juxtaposition of terms. <<< denotes application of the operator on its left-hand side to the arguments on its right-hand side. Operators are as in Table 1, plus ul(x) as an abbreviation for ad(x,e), str for structures, and blk for blocks. The axiom of the grammar is struct. Part b) shows the same grammar in EBNF notation, naturally without the operators to be applied.
| a) |
| struct = str <<< comps ||| |
| str <<< singlestrand ||| |
| str <<< (e <<< empty) ... h |
| block = ad <<< singlestrand ~~~ closed ... h |
| comps = ad <<< block ~~~ comps ||| |
| block |
| ad <<< block ~~~ singlestrand ... h |
| singlestrand = ss <<< region |
| closed = (hl <<< base ~~~ region3 ~~~ base ||| |
| sp <<< base ~~~ closed ~~~ base ||| |
| sr <<< base ~~~ (bl <<< region ~~~ closed) ~~~ base ||| |
| sr <<< base ~~~ (br <<< closed ~~~ region) ~~~ base ||| |
| ml <<< base ~~~ (ad <<< block ~~~ comps) ~~~ base ||| |
| sr <<< base ~~~ (il <<< region ~~~ closed ~~~ region) ~~~ base) |
| 'with' basepairing ... h |
| region3 = region 'with' (minsize 3) |
| b) |
| struct = comps | |
| singlestrand | |
| empty |
| block = singlestrand closed | |
| comps = block comps | |
| block | |
| block singlestrand |
| singlestrand = region |
| closed = base region3 base | |
| base closed base | |
| base region closed base | |
| base closed region base | |
| base region closed region base | |
| base block comps base |
| region3 = base base region |
| region = base | |
| base region |
| base = 'A' | 'C' | 'G' | 'U' |
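To give a feeling for the folding space that such a grammar describes, the following sketch enumerates every secondary structure of a short sequence in dot-bracket notation. It is not the grammar itself: loop types are not distinguished, the function name `structures` is hypothetical, and the set of allowed base pairs (Watson-Crick plus G-U) is an assumption about the 'basepairing' filter. The minimum hairpin size of 3 follows the region3 rule.

```python
from functools import lru_cache

# Assumed pairing rule: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # at least three unpaired bases in a hairpin loop


def structures(seq):
    """Return all dot-bracket structures of seq, each exactly once."""

    @lru_cache(maxsize=None)
    def fold(i, j):
        # All structures of the subsequence seq[i:j].
        if i == j:
            return ("",)
        # Case 1: the first base stays unpaired.
        found = ["." + rest for rest in fold(i + 1, j)]
        # Case 2: the first base pairs with position k.
        for k in range(i + MIN_HAIRPIN + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                for inner in fold(i + 1, k):
                    for rest in fold(k + 1, j):
                        found.append("(" + inner + ")" + rest)
        return tuple(found)

    return list(fold(0, len(seq)))


result = structures("ACCUAUGGG")
print(len(result), "structures, including", ".((...))." in result)
```

Because the first base is either unpaired or paired with exactly one partner, no structure is produced twice. The number of structures grows quickly with sequence length, which is why the folding space is usually analysed through classes of structures rather than listed in full.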
Figure 2. Structures and shapes. Trees representing 2.1 a structure and its shapes according to 2.2 π5 and 2.3 π3.
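The mapping from a structure to its shape can be sketched as follows. The definitions used here are assumptions, since this record does not spell them out: π5 is taken to keep only the nesting of helices and to ignore bulges and internal loops, while π3 additionally opens a new bracket pair whenever a bulge or internal loop interrupts a helix; unpaired regions are dropped in both. The function names are hypothetical.

```python
def pair_table(db):
    """Map each opening position of a dot-bracket string to its partner."""
    stack, partner = [], {}
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            partner[stack.pop()] = pos
    return partner


def shape(db, level=5):
    """Return the shape string of db for abstraction level 5 or 3."""
    partner = pair_table(db)

    def children(i, j):
        # Opening positions of the outermost base pairs within db[i:j].
        kids, pos = [], i
        while pos < j:
            if pos in partner:
                kids.append(pos)
                pos = partner[pos] + 1
            else:
                pos += 1
        return kids

    def walk(i, j):
        out = ""
        for a in children(i, j):
            b = partner[a]
            kids = children(a + 1, b)
            # A pair with exactly one enclosed pair continues the same helix:
            # always at level 5, only if directly stacked at level 3.
            stacked = len(kids) == 1 and kids[0] == a + 1 and partner[a + 1] == b - 1
            same_helix = len(kids) == 1 and (level == 5 or stacked)
            inner = walk(a + 1, b)
            out += inner if same_helix else "[" + inner + "]"
        return out

    return walk(0, len(db))


print(shape("((..((...))..))", level=5))  # []
print(shape("((..((...))..))", level=3))  # [[]]
```

Under these assumptions a hairpin interrupted by an internal loop collapses to [] at level 5 but is resolved as [[]] at level 3, while a multi-loop with two branches yields [[][]] at both levels.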
Figure 3. Shreps of the N. pharaonis tRNA-ala. Shreps of the three most probable shapes of the N. pharaonis tRNA-ala together with the probabilities of the shapes (sorted by increasing energy).
Figure 4. Shreps of the Attenuator. Shreps of the three most probable shapes of the Attenuator together with the probabilities of the shapes (sorted by increasing energy). Together, they cover 0.95 probability, ruling out further shapes of biological importance.
Figure 5. Shreps of the leader of the ptsGHI operon. Shreps of the four most probable shapes of the leader of the ptsGHI operon in B. subtilis together with the probabilities of the shapes (sorted by increasing energy).
Figure 6. Shreps of the four most probable shapes of the C. elegans lin-4 precursor. Shreps of the four most probable shapes of the C. elegans lin-4 precursor at shape abstraction level 3, together with the shape probabilities (sorted by decreasing probability).
Figure 7. Shape probabilities of non-structural and structural RNAs. The sequence in 7.1 shows a result which one would expect for coding sequences where structure plays no role, while 7.2 shows a coding sequence that seems to have a rather well-defined structure.
Comparison of sampling frequencies and exact probabilities. Comparison of sampling frequency and exact probability for the four most probable shapes of the pheS-pheT-Attenuator from E. coli, the Spliced Leader of L. collosoma (gb:S76723/1-56) and the leader of the HIV-1 genome (gb:K02013/1-281), all of which are conformational switches. The sample size for each was 1000 and the analyses were repeated 1000 times.
| Shape | Sampling frequency | Exact probability |
| pheS-pheT-Attenuator (74 nt) | | |
| [] [] | 0.538146 ± 0.012546 | 0.5381897 |
| [] | 0.324908 ± 0.011745 | 0.3243859 |
| [[] []] | 0.097263 ± 0.007509 | 0.0975747 |
| [] [] [] | 0.038984 ± 0.004872 | 0.0388670 |
| Spliced Leader (56 nt) | | |
| [[[[[]]]]] | 0.4966 ± 0.012635 | 0.4962782 |
| [[[[]]]] | 0.348569 ± 0.011618 | 0.3491818 |
| [[[]]] | 0.060008 ± 0.005976 | 0.0595903 |
| [[]] | 0.056138 ± 0.005741 | 0.0559218 |
| HIV-1 Leader (281 nt) | | |
| [] [[] [[] []]] | 0.629139 ± 0.015878 | 0.6164011 |
| [] [[[] [[] []]] []] | 0.337976 ± 0.014817 | 0.3492262 |
| [[] [] [[[] [[] []]] []]] | 0.017246 ± 0.003252 | 0.0169983 |
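How closely a sample of size 1000 can be expected to track the exact probabilities may be illustrated with a small simulation. This is a stand-in, not the sampling procedure of the paper: shapes are drawn directly from the exact probabilities of the pheS-pheT-Attenuator listed above instead of sampling structures and mapping them to shapes, all function names are hypothetical, and the binomial standard error is offered only as a rough yardstick, since the record does not state what the ± values denote.

```python
import math
import random

# Exact shape probabilities of the pheS-pheT-Attenuator, copied from the table.
EXACT = {
    "[] []": 0.5381897,
    "[]": 0.3243859,
    "[[] []]": 0.0975747,
    "[] [] []": 0.0388670,
}


def sample_frequencies(exact, sample_size, rng):
    """Draw shapes according to their exact probabilities and count them."""
    shapes = list(exact) + ["other"]
    weights = list(exact.values()) + [1.0 - sum(exact.values())]
    draws = rng.choices(shapes, weights=weights, k=sample_size)
    return {s: draws.count(s) / sample_size for s in exact}


def binomial_std_error(p, sample_size):
    """Standard error of a frequency estimated from independent draws."""
    return math.sqrt(p * (1.0 - p) / sample_size)


rng = random.Random(0)  # fixed seed so that the run is repeatable
freqs = sample_frequencies(EXACT, 1000, rng)
for s, p in EXACT.items():
    se = binomial_std_error(p, 1000)
    print(f"{s:10s} exact={p:.4f} sampled={freqs[s]:.4f} std.err={se:.4f}")
```

For the most probable shape the binomial standard error is roughly 0.016, so deviations of one or two percentage points between a single sample and the exact value are to be expected at this sample size.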
Comparison of running times for the exact algorithm and the sampling approach. Comparison of running times for the exact algorithm and the sampling approach (1000 samples) on an Intel Xeon 2.8 GHz CPU. (n = sequence length; * computed on an UltraSparc III 900 MHz using 64-bit.)
| n | Sampling | Exact Algorithm |
| 57 nt | 6.42 s | 0.33 s |
| 74 nt | 17.36 s | 0.93 s |
| 94 nt | 69.56 s | 31.85 s |
| 108 nt | 36.24 s | 57.43 s |
| 130* nt | 184.85 s | 12016.68 s |
Figure 8. Variation within shape. Three members of the [] shape of C. elegans lin-4 miRNA precursor. The structure shown in 8.1 is the shrep of the [] shape and also the MFE-structure. 8.2 and 8.3 show members which are structurally dissimilar to the shrep. Note the very low probabilities of the latter two.
Figure 9. tRNA cloverleaf shape members (skating on a winter pond). Complete snapshot of 127 low-energy members of the cloverleaf shape of Myc. capricolum tRNA-Leu in the energy range of 6 kcal/mol above the MFE. All resemble the shrep very closely. Artistic arrangement by S. Konermann.
Shape space sizes. Comparison of the shape space size for the 5 shape levels.
| Shape level | 1 | 2 | 3 | 4 | 5 |
| Growth with | 1.26 | 1.23 | 1.16 | 1.20 | 1.10 |
The full unambiguous grammar in EBNF notation. This is the full unambiguous grammar in EBNF notation. Note that dangling bases are not represented explicitly by a special terminal symbol, but as a 'base'. Their dangling property is accounted for by the derivation path, e.g. the secondary structure .((...)). for sequence 'ACCUAUGGG' will be derived as struct → left_dangle → edanglelr left_dangle → base initstem base left_dangle → base initstem base empty. The two unpaired bases in 'base initstem base' have been derived via 'edanglelr', which accounts for their dangling property. We do not give an explicit representation for dangling bases, because the need to derive them explicitly arises from the energy model and not from treating them as discrete structural elements: a dangling base is nothing more than an unpaired base next to a stem, but it has a non-positive energy contribution that cannot be neglected.
| struct = left_dangle | noleft_dangle |
| left_dangle = base left_dangle | |
| edanglel base noleft_dangle | |
| edanglel (noleft_dangle | empty) | |
| edanglelr left_dangle | |
| empty |
| noleft_dangle = edangler left_dangle | |
| nodangle (noleft_dangle | empty) | |
| nodangle base noleft_dangle |
| edanglel = base initstem |
| edangler = initstem base |
| edanglelr = base initstem base |
| nodangle = initstem |
| initstem = closed |
| closed = stack | hairpin | multiloop | leftB | rightB | iloop |
| multiloop = base base base ml_comps1 base base | |
| base base base ml_comps2 base base | |
| base base ml_comps3 base base base | |
| base base ml_comps2 base base base | |
| base base base ml_comps4 base base base | |
| base base base ml_comps2 base base base | |
| base base base ml_comps1 base base base | |
| base base base ml_comps3 base base base | |
| base base ml_comps2 base base |
| ml_comps1 = block_dl no_dl_no_ss_end | |
| block_dlr dl_or_ss_left_no_ss_end | |
| block_dl base no_dl_no_ss_end |
| ml_comps2 = nodangle no_dl_no_ss_end | |
| edangler dl_or_ss_left_no_ss_end | |
| nodangle base no_dl_no_ss_end |
| ml_comps3 = nodangle no_dl_ss_end | |
| nodangle base no_dl_ss_end |
| ml_comps4 = block_dl no_dl_ss_end | |
| block_dlr dl_or_ss_left_ss_end | |
| block_dl base no_dl_ss_end |
| block_dl = region edanglel | |
| edanglel |
| block_dlr = region edanglelr | |
| edanglelr |
| no_dl_no_ss_end = ml_comps2 | |
| nodangle |
| dl_or_ss_left_no_ss_end = ml_comps1 | |
| block_dl |
| no_dl_ss_end = ml_comps3 | |
| edangler | |
| edangler region |
| dl_or_ss_left_ss_end = ml_comps4 | |
| block_dlr | |
| block_dlr region |
| stack = base closed base |
| hairpin = base base region base base |
| leftB = base base region initstem base base |
| rightB = base base initstem region base base |
| iloop = base base region closed region base base |
| base = 'A' | 'C' | 'G' | 'U' |
| region = base | |
| base region |
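The remark that a dangling base is simply an unpaired base next to a stem can be illustrated with a short sketch that labels each outermost stem of a dot-bracket string with the name of the matching external-dangle rule. This is a deliberate simplification and not a parser for the grammar above: the function name is hypothetical, only the outermost level is inspected, and a single unpaired base between two adjacent stems is reported for both of them, whereas a derivation can attach it to only one.

```python
RULE = {
    (True, True): "edanglelr",   # unpaired base on both sides
    (True, False): "edanglel",   # unpaired base on the left only
    (False, True): "edangler",   # unpaired base on the right only
    (False, False): "nodangle",  # no unpaired neighbour
}


def external_stems(db):
    """Return (start, end, rule) for every outermost stem of db."""
    stems, depth, start = [], 0, 0
    for pos, ch in enumerate(db):
        if ch == "(":
            if depth == 0:
                start = pos  # an outermost stem opens here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:  # the outermost stem closes here
                left = start > 0 and db[start - 1] == "."
                right = pos + 1 < len(db) and db[pos + 1] == "."
                stems.append((start, pos, RULE[(left, right)]))
    return stems


print(external_stems(".((...))."))  # [(1, 7, 'edanglelr')]
```

For the structure .((...)). of 'ACCUAUGGG' the single stem is labelled edanglelr, in line with the derivation given in the table caption.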