Asymmetry in RNA pseudoknots: observation and theory.

Daniel P Aalberts, Nathan O Hodas

Abstract

RNA can fold into a topological structure called a pseudoknot, composed of non-nested double-stranded stems connected by single-stranded loops. Our examination of the PseudoBase database of pseudoknotted RNA structures reveals asymmetries in the stem and loop lengths and provocative composition differences between the loops. By taking into account differences between major and minor grooves of the RNA double helix, we explain much of the asymmetry with a simple polymer physics model and statistical mechanical theory, with only one adjustable parameter.

Year:  2005        PMID: 15831794      PMCID: PMC1079967          DOI: 10.1093/nar/gki508

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Accurately predicting how biological macromolecules fold is one of the great challenges of our day because ‘structure is function’. Encoded in the primary amino acid sequence of proteins are α-helix and β-sheet secondary structures, which assemble into final native folds through additional tertiary contacts. The protein folding problem is notoriously difficult because local secondary and non-local tertiary contacts both contribute significantly to the stability of the final fold. In RNA, however, base-pairing interactions are typically stronger and more specific than tertiary contacts, so it is secondary structure which most influences the final fold. Listing which bases are paired to which other bases uniquely describes the secondary structure of RNA. Base pairs can be annotated with matching left and right parentheses or brackets, and each block of consecutive base pairs with a letter. In the vast majority of cases, RNA adopts ‘nested’ secondary structures composed of consecutive helices separated by bulges or by hairpin turns, such as the AABB (((())))[[[]]] or the ABBA ((([[[[]]]]))) base-pairing patterns. Folding algorithms like mfold (1,2) or ViennaRNA (3) restrict themselves to nested structures to benefit from the algorithmic efficiency of dynamic programming. These algorithms ignore the more unusual non-nested structures of pseudoknot folds, such as the ABAB ((([[[[[)))]]]]] pattern, depicted in Figure 1.
Figure 1

(a) An ABAB-pseudoknot is depicted in planar representation. The structure is composed of two double-helical stems (with s1 and s2 base pairs) and the three single-stranded loops of lengths L1, L2 and L3 nucleotides. (b) The 3D fold of the same knot is depicted. The x, y, z axes point left, out, up. Coaxial stacking interactions between stems 1 and 2 can stabilize the structure, particularly if L2 = 0. Note that loop 1 lies on the major groove side, while loop 3 lies on the minor groove side.
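The distinction between nested and non-nested bracket patterns drawn in the Introduction can be checked mechanically. The following is a minimal sketch, not part of the original work: the function names `parse_pairs` and `is_nested` are hypothetical, and the only assumption is that stems are written with `()` and `[]` as in the patterns quoted above.

```python
def parse_pairs(structure):
    """Return sorted (i, j) base-pair index tuples from a string using () and []."""
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for idx, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(idx)
        elif ch in closers:
            opener = closers[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at position {idx}")
            pairs.append((stacks[opener].pop(), idx))
        # any other character (e.g. '.') is treated as an unpaired base
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def is_nested(pairs):
    """True when no two pairs (i, j) and (k, l) interleave as i < k < j < l."""
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:  # pairs are sorted, so k > i here
            if k < j < l:
                return False
    return True


for name, pattern in [("AABB", "(((())))[[[]]]"),
                      ("ABBA", "((([[[[]]]])))"),
                      ("ABAB", "((([[[[[)))]]]]]")]:
    kind = "nested" if is_nested(parse_pairs(pattern)) else "pseudoknot"
    print(name, kind)
```

The pairwise comparison is quadratic in the number of base pairs, which is adequate for short patterns like these; it is meant only to make the definition of "non-nested" concrete.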

Pseudoknots have attracted attention as important functional structures of viruses and auto-catalytic RNAs. This class of structures is more highly constrained by non-local base pairs and exhibits particular 3D geometries. The general pseudoknot problem has been proven to be NP-complete (4) because of non-local contacts. A number of pseudoknot algorithms have been developed recently (4–13), which search for only a subset of pseudoknot structures (14,15). Algorithms which lack such basic biochemical elements as GU wobble base pairs or basic polymer theory are of questionable value. Furthermore, none of these approaches has been tested against the ensemble of known pseudoknots.

In the following section, we begin by describing the statistics of the pseudoknots in PseudoBase (16). To our knowledge, PseudoBase is the only online database focused on pseudoknots. The statistics illuminate key physical characteristics of pseudoknots: (i) the simplest pseudoknots are the most abundant, (ii) these pseudoknots have asymmetric loop and stem lengths and (iii) their loop compositions differ. The asymmetries in the ensemble of pseudoknots have not been characterized previously. To self-consistently explain the source of these asymmetries, we proceed to develop a polymer physics model and statistical mechanical theory in Section 3. We argue that including the asymmetry of the major versus the minor groove is essential.

CHARACTERIZING PSEUDOBASE AND PSEUDOKNOT ASYMMETRY

PseudoBase is a gold mine of information, allowing us to dig deeply into the properties of pseudoknots. As of January 2005, there are 245 pseudoknots in PseudoBase. After removing duplicate sequences (PKB6 and 9, 25 and 26 and 29, 39 and 40 and 41, 19 and 27, 33 and 34), there are 238 unique pseudoknots. Of these, 230 (97%) are the simple ABAB-pseudoknot variety shown in Figure 1. This most common type of pseudoknot is involved in a number of essential biological processes including RNA self-splicing, translation control and viral frameshifting. It is perhaps not surprising that as the complexity of the knot increases, its likelihood of occurring decreases. In PseudoBase, there are also six ABACBC kissing hairpin structures (PKB150, 163, 169, 171, 173 and 178), and two more exotic structures (PKB71 and 75). The pseudoknot loops occasionally contain an additional self-contained hairpin loop (e.g. ABACCB) but such nested structures do not change the degree of non-nestedness (e.g. ABAB-class), and in these cases PseudoBase only catalogs the A and B stems.

ABAB-pseudoknots are asymmetric. The distributions of stem lengths s1 and s2 are markedly different, as shown in Figure 2a. Excessively long stems are not required for pseudoknot formation; s1 peaks at 3 bp and s2 favors 5 or 6 bp.
Figure 2

The statistics of ABAB-pseudoknots in PseudoBase (obs) with L2 = 0 are compared with our theory (thy). (a) Stems favor different numbers of base pairs s1 and s2. (b) Loop lengths L1 and L3 are also asymmetric.

Loop 2 is often very short (172 of the 230 unique ABAB-pseudoknots, or 75% have L2 = 0; 195 of 230, or 85% have L2 ≤ 1) resulting in favorable coaxial helix stacking interactions which stabilize the pseudoknot. The Turner rules (17) permit helix stacking for L2 ≤ 1. In Section 3 we will present a theory for the ABAB class with stacked stems. In Figure 2b, we also see differences in the distributions of L1 and L3 sizes, including multiple peaks. These features may arise because of differences in tertiary interactions between loops and stems. We observe striking composition biases in the loops of ABAB-pseudoknots. As Table 1 shows, loop 1 is uracil rich while loop 3 tends to be adenine rich, particularly the end of loop 3 which is across from stem 1. These observations are consistent with reports of tertiary contacts (with one to four hydrogen bonds) between loop adenines and the minor grooves of helices, known as A-minor interactions (18–23). Adenine-rich loop 3 is on the minor groove side of stem 1. On the other hand, uracil-rich loop 1 is a more flexible loop (24), and interacts less with the major groove of stem 2 (see Figure 1b).
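The counts and percentages quoted in this section can be cross-checked with a few lines of arithmetic. This is only a bookkeeping sketch using the numbers stated in the text; the variable names are illustrative.

```python
# PseudoBase census as quoted in the text (January 2005).
total_entries = 245
duplicate_groups = [(6, 9), (25, 26, 29), (39, 40, 41), (19, 27), (33, 34)]

# Each group of identical sequences contributes one unique entry.
redundant = sum(len(group) - 1 for group in duplicate_groups)
unique = total_entries - redundant

abab, kissing_hairpins, exotic = 230, 6, 2


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


print("unique pseudoknots:", unique)
print("ABAB share (%):", percent(abab, unique))
print("L2 = 0 share (%):", percent(172, abab))
print("L2 <= 1 share (%):", percent(195, abab))
```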
Table 1

The overall base composition of loops 1 and 3 differs

Composition (%)      A      C      G      U
Loop 1             27.0   15.6   17.9   39.5
Loop 3             46.1   14.3   11.1   28.5
Loop 3 (last)      63.9   11.4    4.4   20.3
Loop 3 (first)     35.1    9.4   11.4   44.1
Stem 1             18.0   27.7   32.1   22.1
Stem 2             19.6   28.2   30.5   21.8

Loop 3 has a high percentage of adenines which makes it prone to A-minor stacking interactions with stem 1. Loop 1 has a high percentage of uracils, making it a more flexible loop and more interaction neutral. The adenines in loop 3 are strongly biased toward the 3′ end of the loop. The large fraction of uracils at the start (5′ side) of loop 3 enhances loop flexibility in the turn.

The asymmetries in the populations of stem and loop lengths have not been explained by previous pseudoknot algorithms and models (4–13). The algorithms in Refs (5–8,13) are all symmetric with respect to stem and loop lengths (i.e. unchanged when an ABAB-pseudoknot is transformed into a BABA-pseudoknot by interchanging stems 1 ↔ 2 and loops 1 ↔ 3). The phenomenological estimates of Gultyaev and co-workers (25) do provide different free energies for loops 1 and 3 but result from ad hoc assumptions rather than polymer physics. We assert that the differences in stem and loop sizes arise primarily from major/minor groove asymmetries and use this asymmetry to reproduce the population of pseudoknots observed in PseudoBase.

ABAB PSEUDOKNOT MODEL

The dominant contributions to the free energy of ABAB-pseudoknots are (i) base-pairing of stems and (ii) entropy of the loops. The overall free energy of the complex is then

$G = G_1 + G_2 - T\,S(s_1, s_2, L_1, L_2, L_3)$   (1)

where $G_j$ is the free energy of helix j and S(s1, s2, L1, L2, L3) is the entropy of the loops.

Stems, RNA duplex

Step one is to describe the base-paired stems. The Cartesian coordinates of complementary bases in double-helical A-form RNA are approximated by a helical parametrization, Equation (2), in which s indexes both the nucleotide on the Watson strand and its complement on the Crick strand. The coordinates of the 4′ carbon from the six double-helical RNA structures which appear in the Protein Data Bank (26) (1AL5, 1RNA, 1RRR, 1RXB, 1SDR and 433D) were incorporated in a least-squares fit to obtain values for the model parameters: the number of base pairs per helical turn Nt = 11.2 ± 0.3, the radius of the 4′ carbon r = 9.9 ± 0.2 Å, the height per stack h = 2.7 ± 0.2 Å, the phase angle between complementary strands φ = 1.6 ± 0.1 rad = 93 ± 4° and the vertical offset between complementary strands Hoff = −4.2 ± 1.4 Å.

Consider the typical ABAB-pseudoknot, with L2 = 0 and helices 1 and 2 stacked. In this configuration loop 1 must traverse the distance from the junction between the helices to the other end of stem 2 across the major groove. This distance, Equation (3), depends on the phase angle θ between the strands. The other loop must traverse the distance from the junction between the helices to the other end of stem 1 across the minor groove; this distance is given by Equation (4). The sign difference in θ and in the Hoff term between Equations (3) and (4) arises from the major/minor groove asymmetry. In Figure 3, we show how the distances differ in the two cases.
Figure 3

The distances across the major and minor grooves, Equations (3) and (4), as a function of the number of bases s in the associated stem. The differences are due to the geometries of the major and minor grooves.
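The origin of the sign difference in θ and in the Hoff term can be illustrated numerically. The sketch below assumes a simple helical parametrization of the two strands built from the fitted parameters quoted above; this parametrization, the function names and the choice of endpoints (index 0 on one strand, index n on the other) are assumptions made for illustration and are not taken from the original equations. No claim is made here about which sign corresponds to the major-groove crossing, since that depends on strand and handedness conventions.

```python
import math

N_T = 11.2    # base pairs per helical turn (fitted value quoted in the text)
R = 9.9       # radius of the C4' atoms, angstroms
H = 2.7       # rise per base pair, angstroms
PHI = 1.6     # phase angle between complementary strands, radians
H_OFF = -4.2  # vertical offset between complementary strands, angstroms


def watson(s):
    """ASSUMED C4' position of nucleotide s on the first strand."""
    angle = 2 * math.pi * s / N_T
    return (R * math.cos(angle), R * math.sin(angle), H * s)


def crick(s):
    """ASSUMED C4' position of the nucleotide paired with s on the other strand."""
    angle = 2 * math.pi * s / N_T + PHI
    return (R * math.cos(angle), R * math.sin(angle), H * s + H_OFF)


def cross_strand_distance(n, sign):
    """Distance from index 0 on one strand to index n on the other.

    sign=+1 goes from watson(0) to crick(n); sign=-1 from crick(0) to watson(n).
    """
    if sign == +1:
        return math.dist(watson(0), crick(n))
    return math.dist(crick(0), watson(n))


def closed_form(n, sign):
    """Same distance written with a phase angle theta and a signed H_OFF term."""
    theta = 2 * math.pi * n / N_T + sign * PHI
    dz = H * n + sign * H_OFF
    return math.sqrt(2 * R * R * (1 - math.cos(theta)) + dz * dz)


for n in range(1, 9):
    print(n, round(cross_strand_distance(n, +1), 1),
          round(cross_strand_distance(n, -1), 1))
```

Under this assumed geometry the two crossing directions give different distances for the same number of steps along the helix, and the only difference between them is the sign attached to φ and to Hoff, which is the kind of asymmetry the text attributes to the two grooves.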

Loops

Step two is to estimate the loop entropy. In the standard Gaussian approximation, a chain of N links of length a has end-to-end separation distance between D and D + d with probability

$P(D, N) = 4\pi D^2 d \left(\frac{3}{2\pi N a^2}\right)^{3/2} \exp\left(-\frac{3 D^2}{2 N a^2}\right)$   (5)

where a = 6.2 Å and d = 0.1 Å is our model's one free parameter. Other polymer physics models, such as the worm-like chain, self-avoiding chain (13) or freely-jointed chain models, could also be used in place of Equation (5). The entropic contribution of loop 1 to Equation (1) can be obtained from Equation (5), taking D to be the major-groove distance of Equation (3) and N = (L1 + 1) links for L1 nucleotides. The contribution of loop 3 follows analogously from the minor-groove distance of Equation (4), and the total entropy is the sum of the two loop contributions.
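The Gaussian approximation used for the loops can be sketched in a few lines. The code below uses the standard textbook end-to-end distribution for an ideal chain, with a = 6.2 Å and d = 0.1 Å as quoted in the text; the function names are hypothetical, and returning the logarithm of the probability as an entropy in units of the gas constant is an assumption of this sketch.

```python
import math

A_LINK = 6.2   # link length a, angstroms (value quoted in the text)
D_SHELL = 0.1  # shell thickness d, angstroms (the model's free parameter)


def end_to_end_probability(distance, n_links, a=A_LINK, d=D_SHELL):
    """Probability that an ideal chain of n_links has end separation in [D, D + d]."""
    if n_links < 1:
        raise ValueError("need at least one link")
    n_a2 = n_links * a * a
    prefactor = (3.0 / (2.0 * math.pi * n_a2)) ** 1.5
    shell = 4.0 * math.pi * distance ** 2 * d
    return shell * prefactor * math.exp(-3.0 * distance ** 2 / (2.0 * n_a2))


def loop_entropy_over_R(distance, loop_length):
    """ln P for a loop of loop_length nucleotides, modeled as loop_length + 1 links."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return math.log(end_to_end_probability(distance, loop_length + 1))


# Illustrative values only: a loop that must span 12 angstroms.
for loop_length in (1, 2, 4, 8):
    print(loop_length, round(loop_entropy_over_R(12.0, loop_length), 2))
```

Because the probability is below one for any single shell of thickness d, the resulting entropy term is negative: closing a loop across a fixed distance always costs entropy, and the cost depends on both the distance to be spanned and the number of links available.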

ABAB probability

The probability of an ABAB-pseudoknot with lengths {s1, s2, L1, L2 = 0, L3} is the product of a degeneracy factor for the ABAB pattern and the likelihood of that pattern resulting in a pseudoknot. The degeneracy of the ABAB pattern is the number of sequences that satisfy the required complementarity, out of all possible sequence patterns. For the sake of simplicity, we ignore bulge loops in stems (which occur in ∼30% of structures) at this stage.

To estimate the free energy of the stems, Gstem, we compose random strings with s1 + s2 consecutive complementary base pairs bookended with mismatch pairs, then calculate their binding free energy using bindigo (27). For the loop entropy, we use the Gaussian approximation, Equation (5), assuming the loops must traverse the distances given by Equations (3) and (4). Together, following Equation (1), these give the Boltzmann factor for ABAB-pseudoknots, with β−1 = RT = 0.62 kcal/mol at 37°C. We estimate the free energy of the optimal nested fold, Gnest, of an ensemble of randomly selected nucleotides using mfold (2), which gives the Boltzmann factor for nested folds. Combining the degeneracy factors and the Boltzmann likelihoods gives the probability of a pseudoknot, Equation (8). The 1 in the denominator of Equation (8) includes the Boltzmann factor for an open polymer configuration (Gopen = 0).

To compare Equation (8) with the histograms of Figure 2, we simply sum over the other degrees of freedom. For example, to obtain the distribution of one length, we sum Equation (8) over the remaining lengths, and analogously for the other sub-ensembles. The agreement of theory and observation is excellent.

Studying the properties of the ensemble can reveal insights into the folding problem that individual cases may not. ABAB-pseudoknots form because of their low energy, with about three-quarters of nucleotides base paired, versus about half of bases paired in nested structures. However, because pseudoknots require many base pairs constrained to the ABAB pattern, they remain unlikely in sequence space.
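Two steps of this procedure lend themselves to a short illustration: evaluating a Boltzmann factor at 37°C, and summing a joint distribution over the other degrees of freedom to obtain a histogram for one length. The sketch below is a hedged example; the joint probabilities are made-up toy numbers, not values from the theory or from PseudoBase, and the function names are hypothetical.

```python
import math
from collections import defaultdict

R_GAS = 1.987e-3     # gas constant, kcal/(mol K)
T_BODY = 310.15      # 37 degrees Celsius in kelvin
RT = R_GAS * T_BODY  # thermal energy, close to the 0.62 kcal/mol quoted in the text


def boltzmann_factor(free_energy):
    """exp(-G / RT) for a free energy G in kcal/mol."""
    return math.exp(-free_energy / RT)


def marginal(joint, axis):
    """Sum a joint distribution keyed by (s1, s2, L1, L3) over all but one index."""
    totals = defaultdict(float)
    for lengths, probability in joint.items():
        totals[lengths[axis]] += probability
    return dict(totals)


# Hypothetical toy distribution, for illustration only.
toy_joint = {
    (3, 5, 1, 4): 0.5,
    (3, 6, 1, 4): 0.25,
    (4, 5, 2, 3): 0.25,
}

print("RT (kcal/mol):", round(RT, 2))
print("open-polymer factor:", boltzmann_factor(0.0))
print("s1 distribution:", marginal(toy_joint, axis=0))
print("s2 distribution:", marginal(toy_joint, axis=1))
```

An open configuration with zero free energy has a Boltzmann factor of exactly one, which is why a 1 appears among the competing terms when the pseudoknot probability is normalized.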

CONCLUSIONS

Pseudoknots are rare compared with conventional nested secondary structures but their structure gives them biological importance. Of the pseudoknots that appear, the ABAB-type are by far the most common. The structures of these ABAB-pseudoknots are asymmetric. We have argued that this asymmetry is due to structural differences between the major and minor groove. Our simple model is consistent with the observed asymmetry of s1 and s2. The statistical mechanical theory, Equation (8), agrees remarkably well with observation, as seen in Figure 2. This suggests that PseudoBase is a representative sample of ABAB-pseudoknot characteristics in nature and that we can now compute pseudoknot abundances in aggregate. Using free energies specific to a given sequence, we can also use the Boltzmann factors to calculate the likelihood of forming a particular pseudoknot.

Models which ignore major/minor groove asymmetry will predict the same free energies for an ABAB-pseudoknot and its BABA counterpart. For example, the symmetry of the theory in Ref. (8) arises because those authors effectively take Hoff = 0 and φ = π rad = 180°, in disagreement with the actual A-form structural asymmetry. We predict that the differences between loops 1 and 3 will destabilize many of the BABA version pseudoknots due to the differences between Equations (3) and (4), the decrease in A-minor interactions and the increased rigidity of the major groove loop.

The rarity of more complicated folds makes comparisons with observed distributions infeasible. Nevertheless, other pseudoknot types like kissing hairpins can be treated with methods similar to those presented. In addition, our simple theory could be extended to permit the possibility of secondary structure within the loops (e.g. an ABACCB structure) and to permit flexibility between the stems when L2 > 0. Our interest in this paper has been to estimate properties of the ensemble of ABAB-pseudoknots and compare those with observed pseudoknots. To study a particular pseudoknot, values specific to its sequence should be used in place of the general Gstem and Gnest average values given.
  19 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  PseudoBase: a database with RNA pseudoknots.

Authors:  F H van Batenburg; A P Gultyaev; C W Pleij; J Ng; J Oliehoek
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

3.  Modeling RNA folding paths with pseudoknots: application to hepatitis delta virus ribozyme.

Authors:  H Isambert; E D Siggia
Journal:  Proc Natl Acad Sci U S A       Date:  2000-06-06       Impact factor: 11.205

4.  Energetics of a strongly pH dependent RNA tertiary structure in a frameshifting pseudoknot.

Authors:  P L Nixon; D P Giedroc
Journal:  J Mol Biol       Date:  2000-02-18       Impact factor: 5.469

5.  An approximation of loop free energy values of RNA H-pseudoknots.

Authors:  A P Gultyaev; F H van Batenburg; C W Pleij
Journal:  RNA       Date:  1999-05       Impact factor: 4.942

6.  Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure.

Authors:  D H Mathews; J Sabina; M Zuker; D H Turner
Journal:  J Mol Biol       Date:  1999-05-21       Impact factor: 5.469

7.  Mfold web server for nucleic acid folding and hybridization prediction.

Authors:  Michael Zuker
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

8.  A partition function algorithm for nucleic acid secondary structure including pseudoknots.

Authors:  Robert M Dirks; Niles A Pierce
Journal:  J Comput Chem       Date:  2003-10       Impact factor: 3.376

9.  Efficient computation of optimal oligo-RNA binding.

Authors:  Nathan O Hodas; Daniel P Aalberts
Journal:  Nucleic Acids Res       Date:  2004-12-17       Impact factor: 16.971

10.  A dynamic programming algorithm for RNA structure prediction including pseudoknots.

Authors:  E Rivas; S R Eddy
Journal:  J Mol Biol       Date:  1999-02-05       Impact factor: 5.469
