Ian Brierley1, Simon Pennell, Robert J C Gilbert. 1. Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK. ib103@mole.bio.cam.ac.uk
Abstract
RNA pseudoknots are structural elements found in almost all classes of RNA. First recognized in the genomes of plant viruses, they are now established as a widespread motif with diverse functions in various biological processes. This Review focuses on viral pseudoknots and their role in virus gene expression and genome replication. Although emphasis is placed on those well defined pseudoknots that are involved in unusual mechanisms of viral translational initiation and elongation, the broader roles of pseudoknots are also discussed, including comparisons with relevant cellular counterparts. The relationship between RNA pseudoknot structure and function is also addressed.
The pioneering work that established pseudoknots as a genuine folding motif in RNA was carried out in the laboratories of Cornelis Pleij, Krijn Rietveld and Leendert Bosch in the early 1980s. These authors were investigating how it was possible for the 3′ ends of some plant virus genomes to possess a number of the functional characteristics of transfer RNAs (tRNAs) yet lack an obvious clover-leaf secondary structure. By applying a 'pseudoknot building principle', it became clear how these viral RNAs could fold into L-shaped structures resembling tRNAs[1,2]. A seminal paper from the same authors[3] subsequently defined the general principles of pseudoknot folding and provided the first examples of pseudoknots in other RNAs: the central pseudoknot of 16S ribosomal RNA (rRNA) and a pseudoknot present in group I introns.
Since then, many more pseudoknots have been discovered, and they are associated with a remarkably diverse range of biological activities (Supplementary information S1, S2 (tables); reviewed in Refs 4–11). They are especially associated with key roles in the replication cycles of numerous animal and plant viruses, including, in humans, the flavivirus hepatitis C virus (HCV)[12], the coronavirus responsible for severe acute respiratory syndrome (SARS-CoV)[13], the oncogenic retrovirus T-cell lymphotropic virus types I and II[14] and certain strains of HIV[15].
The function of a viral pseudoknot is linked logically to its location in the genome (Fig. 1; Supplementary information S1 (table)). So, in non-coding regions (NCRs) of positive-strand RNA viruses (in which the genomic RNA serves as the mRNA template for translation and then as a template for replication), pseudoknots act in the regulation of initiation of protein synthesis and in template recognition by the viral replicase. By contrast, in coding regions they modulate the elongation and termination steps of translation.
Fewer pseudoknots have been documented in the mRNAs of viruses with DNA genomes, but several DNA bacteriophage mRNAs are known to encode pseudoknots[16]. In the 5′ NCRs, these motifs have roles in the regulation of translation initiation, whereas in the coding region they affect translation elongation. Pseudoknots have also been described in the catalytic RNAs of some RNA satellite viruses, where they have a role in genome replication[17].
Figure 1
RNA pseudoknots in virus gene expression.
A schematic of a generic RNA virus genome is shown. Viral pseudoknots have been described in the 5′ non-coding region (NCR), the coding region, the intergenic region (IGR) and the 3′ NCR, where they function in various steps of the replication cycle. Although the majority of examples are from positive-strand RNA viruses, pseudoknots also have a role in the replication cycles of certain DNA viruses, satellite RNA viruses and viroids. For simplicity, viral pseudoknots involved in long-range interactions (including virus genome circularization) or possessing catalytic activity are not shown, but are discussed in the text.
Here, we review selected, well characterized examples of pseudoknots in virus genomes, with an emphasis on structure–function relationships, highlighting recent advances in our understanding of pseudoknot conformation at high resolution and exploiting, where relevant, our improved knowledge of ribosome architecture.
What is an RNA pseudoknot?
As defined originally[3], a pseudoknot is a structure formed upon base-pairing of a single-stranded region of RNA in the loop of a hairpin to a stretch of complementary nucleotides elsewhere in the RNA chain (Fig. 2). Such pseudoknots, referred to as hairpin type (H-type) pseudoknots, have two base-paired stem regions (S1 and S2) and, depending on the number of loop bases that participate in the pseudoknotting interaction[18], two or three single-stranded loops (L1, L2 and L3). In most (>85%; Refs 19, 20) H-type pseudoknots, L2 is absent or very short, and the base-paired stems stack coaxially to form a quasi-continuous helix. In these structures, L1 spans S2 and crosses the deep groove of the helix, whereas L3 spans S1 and crosses the shallow groove (Fig. 2).
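The defining feature of the H-type fold described above, a loop base-pairing with a region outside its own hairpin, can be stated compactly: two base pairs (i, j) and (k, l) cross, with i < k < j < l. The following is a minimal Python sketch of that test. It assumes the structure is written in extended dot-bracket notation, in which round brackets mark S1 and square brackets mark S2; that notation, the function names and the toy structure strings are illustrative assumptions and do not come from the studies reviewed here.

```python
from typing import Dict, List, Tuple

# Bracket families allowed in extended dot-bracket notation (an assumed convention).
BRACKETS: Dict[str, str] = {"(": ")", "[": "]", "{": "}", "<": ">"}


def parse_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) base-pair indices from an extended dot-bracket string."""
    closers = {close: open_ for open_, close in BRACKETS.items()}
    stacks: Dict[str, List[int]] = {open_: [] for open_ in BRACKETS}
    pairs: List[Tuple[int, int]] = []
    for idx, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(idx)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched '{ch}' at position {idx}")
            pairs.append((stack.pop(), idx))
        elif ch != ".":
            raise ValueError(f"unexpected character '{ch}' at position {idx}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched '{open_}' at position {stack[-1]}")
    return sorted(pairs)


def has_pseudoknot(structure: str) -> bool:
    """True if any two base pairs cross, that is i < k < j < l."""
    pairs = parse_pairs(structure)
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:
            # pairs are sorted by their 5' index, so i < k always holds here
            if k < j < l:
                return True
    return False


# Toy structures (hypothetical, not real viral sequences):
# S1 = "(((" ... ")))", S2 = "[[[" ... "]]]", L1 = "..", L2 absent, L3 = "..."
H_TYPE = "(((..[[[)))...]]]"
HAIRPIN = "((((....))))"

print(has_pseudoknot(H_TYPE))   # crossing pairs present
print(has_pseudoknot(HAIRPIN))  # nested pairs only
```

The quadratic pair comparison is adequate for short motifs of this kind; it only classifies a given structure and makes no attempt to predict one from sequence.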
Figure 2
RNA pseudoknot structure.
a | Various structural motifs have been described in RNA[149]. Orthodox secondary structures consist of base-paired regions (stems) connected by single-stranded loops at stem termini (hairpin loop), or in the body of a stem (bulge (B) or interior (I) loop) or at the junction of several stems (multibranched (M) loop). Pseudoknots are considered as a tertiary structure and form when bases in a loop pair with a single-stranded region elsewhere. The hairpin type (H-type) pseudoknot is by far the most common, and this tertiary interaction involves bases in the loop of a hairpin loop. The resultant structure contains two stem regions, S1 and S2, connected by single-stranded loops. In many cases, no unpaired bases are present between the two stems (L2 is zero), and the stems stack coaxially to give a quasi-continuous helix. b | The secondary structure of the pseudoknot of the ribosomal frameshifting signal of simian retrovirus 1 (SRV-1) is shown alongside three dimensional views of the nuclear magnetic resonance model[74]. The stems are shown as surface representations and the loops as ribbons (all structural images were prepared using PyMol). The polarity and handedness of the double helix leads to inequivalence of the loops, with L1 (yellow) crossing the deep groove and L3 (green) crossing the shallow groove. S1 is blue, S2 is red, L1 is yellow and L3 is green. L2 is not present in the example shown.
The geometry of the pseudoknot is such that when S2 is six or seven base pairs in length, L1 can be as short as a single nucleotide[3,21]. However, in some pseudoknots, the loops are much longer and include their own secondary structure elements. Pseudoknots are also formed upon base-pairing of single-stranded bulge (B), interior (I) and multibranched (M) loops with complementary regions elsewhere in the RNA (which themselves can be constrained in a secondary structure, for example in intramolecular hairpin–loop–hairpin–loop (H–H or so-called kissing loop) interactions)[22].
However, unless all of the loop nucleotides are paired, these pseudoknots are generally considered as H-type pseudoknots, as the additional base-pairing interaction is viewed as a substructure within the loop. For this reason, the B, I and M nomenclature (as well as H–H) is not extensively used, and most structures are referred to as H-type pseudoknots and often simply as pseudoknots. An additional issue regarding nomenclature is loop numbering. In the early pseudoknot literature, most examples did not possess unpaired residues between the two stems, so the convention was to name the groove-spanning loops L1 and L2 (now L1 and L3). Here, we have opted for the L1, L2 and L3 nomenclature, which is more generally applicable.
As will be discussed in more detail below, the biological properties of pseudoknots are intimately linked to their structural features[4,5,6,7,18,23]. For example, the geometry of the junction between the stems and the interactions that can occur between the constituent loops and stems are often of great functional relevance[21,24,25,26,27]. Indeed, for many viral pseudoknots, much of the primary sequence is unimportant for function, as long as the conformation and overall stability of the structure are maintained. Where precise nucleotide-sequence requirements have been identified, this is likely to reflect a specific structural necessity, although additional roles might be possible (for example, base-specific recognition by proteins).
Pseudoknots and translation
Most eukaryotic cellular mRNAs are translated in a cap-dependent manner, with the 40S subunit and associated initiation factors scanning along the mRNA until the start codon (AUG) is reached[28]. Efficient translation also requires mRNA circularization, which is brought about by the interaction of the 3′-end poly(A) tail-binding protein (PABP) with initiation factor 4E (eIF4E), bound to the 5′ cap[29].
However, the genomes of many positive-strand RNA viruses lack a cap, a poly(A) tail or both, and translation initiation involves non-standard mechanisms[30,31].
One such example is cap-independent internal ribosome entry, in which the ribosome is recruited internally to a structured region of the mRNA (the internal ribosome entry site (IRES), usually located in the 5′ NCR) and often directly to the start codon. Pseudoknots have been identified in a number of IRESs, and their function is best exemplified in the IRESs of the flavivirus HCV and the dicistrovirus cricket paralysis virus (CrPV) (Fig. 3). Unlike the mammalian picornaviral IRESs, which retain a requirement for many or most canonical initiation factors, the HCV IRES can associate with the mammalian 40S subunit in the absence of translation initiation factors, although the formation of the 48S complex (with the initiation codon locked into the mRNA-binding cleft of the small subunit) requires the participation of the eIF2–GTP–Met-tRNAi ternary complex and eIF3 (reviewed in Ref. 32). The HCV IRES forms a defined secondary structure that contains two major hairpins (domains 2 and 3) and an essential pseudoknot structure[12] at the base of domain 3 (domain 3e/f) (Fig. 3). IRES function requires domains 2 and 3, but initial binding of the 40S subunit is mediated principally by the basal region of domain 3, including stem–loop 3d, with a modest contribution from the pseudoknot[33,34]. Comparisons of HCV IRES–40S (Ref. 35) and IRES–80S (Ref. 36) complexes have revealed that the overall appearance of the IRES is similar in the two complexes[36]. In the 80S complex, the pseudoknot corresponds to an L-shaped density located at the mRNA exit channel in the vicinity of ribosomal protein S5 (rpS5) (Fig. 3; Supplementary information S3 (figure)). The assignment of the pseudoknot to the L-shaped density is based on molecular modelling and is consistent with recent crosslinking data[37].
The three-dimensional (3D) structure of the HCV pseudoknot is not available, but in essence it is an H-type pseudoknot in which L1 is long and highly structured, and includes the entirety of domains 3a–3e. Why a pseudoknot is present at this key location of the IRES is unclear, but it might be linked to a capacity to bind rpS5 (Ref. 37). This protein is present at the mRNA exit channel and also associates with IRES domain 2 (Refs 36, 38). Boehringer and colleagues propose that the HCV IRES domains function synergistically to position the AUG into the ribosomal peptidyl (P) site, coupled to movement of the pseudoknot[36]. In this model, a conformational change of the four-way junction (which includes domains 3a and 3c; Fig. 3) pivoted around domain 3d is transmitted to the pseudoknot, which moves towards the mRNA exit channel, positioning the AUG correctly into the ribosomal P site and allowing subunit joining. The movement could be potentiated by eIF3 binding, as this multisubunit initiation factor makes intimate contacts with the HCV IRES[39].
Figure 3
Pseudoknots and internal ribosome entry.
a | A secondary structure representation of the hepatitis C virus (HCV) internal ribosome entry site (IRES) with the pseudoknot shown in blue. b | A surface representation of the human 80S ribosome (grey) in complex with the HCV IRES (red) derived from the cryo-electron microscopy (cryo-EM) structure[36]. Density corresponding to the pseudoknot is indicated in blue. c | A secondary structure representation of the Plautia stali intestine virus (PSIV) IRES is shown above a ribbon representation of the RNA (domains 1 and 2) derived from the crystal structure[27]. Domain 3 remains to be solved. The secondary structure of the cricket paralysis virus (CrPV) IRES (not shown) is similar. d | A surface representation of the yeast 80S ribosome (grey) is shown in complex with the CrPV IRES (red) derived from the cryo-EM structure[26]. Below is a fit of the density to the modelled CrPV IRES showing the interactions that occur between the various domains and ribosomal components. Ribosomal proteins are cyan, 25S ribosomal RNA (rRNA) is purple and 18S rRNA is brown.
Pseudoknots also have an essential role in the function of the intergenic region (IGR) IRES of CrPV[40] and other dicistroviruses[41]. This remarkable IRES, only ∼200 nucleotides in length, has been described as an RNA-based translation factor[42] because it recruits ribosomes and activates translation without the involvement of initiation factors or initiator tRNA. The ribosomal 40S and 60S subunits bind directly to the IRES[43], which then occupies the ribosomal intersubunit space of the 80S complex and interacts with key components that form the ribosomal aminoacyl (A), P and exit (E) sites.
The three defined domains of this IRES have distinct functional tasks (Fig. 3). Domain 1 contributes to interactions with the 60S subunit in the E- and P-site regions, and domain 2 interacts with the 40S subunit at the E site.
Domain 3, which is located predominantly between the A and P sites and is in a similar orientation to ribosome-bound tRNAs, places its most 3′ nucleotide triplet (GCU) into the decoding region of the A site[11]. Here, it binds the anticodon of tRNAAla, which is brought to the ribosome as part of the ternary complex (elongation factor 1A (eEF1A)–GTP–tRNAAla). Subsequently, tRNAAla is pseudotranslocated (without peptide-bond formation) by eEF2 into the P site, allowing delivery of the next tRNA into the A site and authentic elongation to begin[41,43,44].
Modelling and structural analysis of the CrPV IRES and IRESs from related viruses, including Plautia stali intestine virus (PSIV)[26,27,45,46], has revealed that the structure is dominated by three H-type pseudoknots (Fig. 3), one per domain. Pseudoknot PKI, which essentially forms all of domain 3, is characterized by the possession of an AU-rich S1 and short loops. The folding of the rest of the IRES is dominated by interactions between pseudoknots PKII and PKIII (Fig. 3). The pseudoknot of domain 2, PKIII, is a nested pseudoknot in that it is entirely contained within L3 of the pseudoknot of domain 1, PKII. Cryo-electron microscopy (cryo-EM) and X-ray crystallography studies[26,27] (reviewed in Ref. 11) have indicated a complex folding strategy that forces two small hairpins of PKIII, present as substructures in L1 (stem–loop (SL) IV) and L3 (SL V), to project from a central core and emerge on the same side of the structure to make vital interactions with rpS5 at the E site[46,47,48]. Pivotal to this folding strategy are pseudoknot loop–helix interactions. In PKIII, for example, an L1–S2 major-groove interaction positions SL IV, and a second interdomain interaction occurs between the minor groove of S2 and four L1 bases of the adjacent pseudoknot PKII, which stabilizes the core.
The geometry of the overall fold is such that S2 of PKII stacks on S2 of PKIII to create a wedge-shaped section that occupies the mRNA channel and directs PKI into the decoding site.
Recruitment of the first aminoacyl tRNA to the ribosome requires the participation of eEF2 (Ref. 49). The activation of eEF2 has been linked to an IRES-mediated destabilization of a conserved cellular pseudoknot that is present in helix 18 of the small subunit rRNA (the so-called 530-loop pseudoknot; Supplementary information S2 (table)). This region of rRNA is involved in the enhancement of translational accuracy and tRNA binding[50], and its destabilization (probably by the pseudoknot of domain 3) is likely to be crucial to the IRES mechanism.
The structure of the IGR IRESs illustrates how pseudoknots can be used to direct the global folding of an RNA sequence. The pseudoknot motif naturally provides the potential for coaxial stacking of constituent helices, but also the opportunity for additional stacking with helices present in constituent loops. This allows longer helical domains to be generated, a common feature in the organization of global RNA structures[51]. If the pseudoknot is itself nested in another pseudoknot, additional helical stacking possibilities are created. Superimposed on this is the capacity of the single-stranded loops to interact with constituent stems to add stability, or with other regions of RNA to promote packing of adjacent helices. The compact and complex fold of the IGR IRESs brought about by the nested pseudoknots is a strategy similar to that used to fold certain ribozyme cores, discussed in more detail below.
There is no known ribozyme activity associated with IRESs, however, which indicates that this is a general folding strategy that can be used to satisfy different mechanistic requirements.
One of the first viral pseudoknots to be described[16] is encoded by T-even bacteriophages (such as T2, T4 and T6) and functions in translational autoregulation of the gene 32 protein (gp32), a single-stranded DNA-binding protein that mainly functions in replication of the viral double-stranded genomic DNA. The pseudoknot is located some 40 nucleotides upstream of the translation start site (AUG) of the gp32 mRNA and acts as a specific binding site for gp32 itself. At low protein concentrations, pseudoknot-bound gp32 does not overlap the ribosome-binding site, but as protein levels increase, cooperative binding of multiple copies occurs, nucleated at the pseudoknot-bound gp32 (Ref. 52). This assembly blocks access to the Shine–Dalgarno sequence and so represses translation. Unfortunately, molecular details of the gp32–pseudoknot interaction are lacking, but the structure of the pseudoknot isolated from bacteriophage T2 has been solved by nuclear magnetic resonance (NMR)[53]. It is a classic H-type pseudoknot with short loops and coaxially stacked stems, albeit with S2 rotated by ∼18° with respect to S1 to relieve close phosphate–phosphate contacts at the junction while preserving the stabilizing effects of base stacking. Although the T2 pseudoknot possesses only a single nucleotide in L1 (an A), this is stereochemically feasible as the distance between the two phosphates across the deep groove of A-form RNA reaches a minimum when six or seven base pairs of S2 are bridged[3,21].
Although the gene 32 system represents the only known viral example of pseudoknot involvement in the autoregulation of translation initiation, there are related examples in cellular mRNAs (Supplementary information S2 (table)).
For example, translational repression of the ribosomal S4 α mRNA operon[54,55] and autoregulation of ribosomal protein S15 synthesis[56] require specific binding of the respective proteins to pseudoknots in the 5′ untranslated region (UTR) of their own mRNAs.
RNA pseudoknots in coding regions are principally associated with sites of programmed −1 ribosomal frameshifting. This is a translational mechanism used by many viruses to coordinately express two proteins from a single mRNA at a defined ratio[7,57]. During elongation, ribosomes decode the mRNA in triplet steps and the reading frame is accurately maintained. However, in frameshifting, the ribosome is forced to shift one nucleotide backwards into an overlapping reading frame and translate an entirely new sequence of amino acids (Fig. 4). In retroviruses, frameshifting at the overlap of the gag and pol open-reading frames (ORFs) allows expression of the viral Gag–Pol polyprotein and sets a defined cytoplasmic Gag:Gag–Pol ratio that is optimized for virion assembly and packaging of reverse transcriptase[58]. In other RNA viruses, frameshifting allows expression of RNA-dependent RNA polymerases (RdRps)[59]. Maintaining a precise efficiency of frameshifting has been shown to be crucial to the replication of HIV-1 (Ref. 60) and the retrovirus-like double-stranded RNA virus of yeast, L-A[61]. Similarly, in other RNA viruses, changing the stoichiometry of non-frameshifted and frameshifted products is also likely to be detrimental. In SARS-CoV, for example, components of the viral replication machinery present in the viral polyproteins pp1a and pp1ab (which are expressed by frameshifting) are predicted to form a heterodimer with a stoichiometry of ∼8:1 (Refs 62, 63), a ratio that is consistent with the natural level of frameshifting[64,65]. For these reasons, frameshifting has emerged as a potential target for antiviral therapeutics.
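The change of reading frame described above can be made concrete with a short Python sketch. It decodes a toy mRNA in triplets, slips back by one nucleotide at a chosen point and continues in the −1 frame. It also shows the simple arithmetic linking a product ratio to a frameshift efficiency, under the simplifying assumption that every ribosome yields either the non-frameshifted or the frameshifted product and that both products are equally stable. The sequence, the function names and the position of the shift are hypothetical and chosen for illustration only.

```python
from typing import List


def read_codons(mrna: str, start: int = 0) -> List[str]:
    """Split mrna into complete triplets, beginning at index start."""
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]


def minus_one_frameshift(mrna: str, codons_before_shift: int) -> List[str]:
    """Decode codons_before_shift triplets in the 0 frame, then slip back
    one nucleotide and continue decoding in the -1 frame."""
    if codons_before_shift < 1:
        raise ValueError("at least one codon must be decoded before the shift")
    shift_point = 3 * codons_before_shift
    if shift_point > len(mrna):
        raise ValueError("shift point lies beyond the end of the mRNA")
    zero_frame = read_codons(mrna[:shift_point])
    # The last nucleotide of the preceding codon is read a second time.
    minus_one_frame = read_codons(mrna, shift_point - 1)
    return zero_frame + minus_one_frame


def efficiency_from_ratio(unshifted_per_shifted: float) -> float:
    """Frameshift efficiency implied by an unshifted:shifted product ratio
    (assumes two outcomes only and equal product stability)."""
    return 1.0 / (1.0 + unshifted_per_shifted)


TOY_MRNA = "AUGGCUUUUAAACGGGCU"  # hypothetical sequence, not a real viral signal

print(read_codons(TOY_MRNA))              # 0-frame triplets
print(minus_one_frameshift(TOY_MRNA, 4))  # triplets after a -1 slip
print(round(efficiency_from_ratio(8.0), 3))
```

In the toy sequence the 0-frame codons after the fourth triplet (CGG, GCU) are replaced by ACG and GGC once the ribosome has slipped, which is the sense in which an entirely new sequence of amino acids is translated. Under the stated assumptions, a product ratio of 8:1 corresponds to an efficiency of 1/9, or roughly 11%.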
Figure 5
Pseudoknots and transfer RNA-like structures.
a | The 3′ end of the turnip yellow mosaic virus (TYMV) genomic RNA. In the upper panel, the predicted secondary structure is shown, with pseudoknotting interactions indicated by dashed red lines. Below is a secondary structure representation of the folded molecule, showing the transfer RNA (tRNA)-like structure (TLS) and the pseudoknots in the acceptor arm and upstream of the TLS (UPK). The ribbon representation (boxed) is derived from the nuclear magnetic resonance structure of the acceptor arm pseudoknot[21]. b | For comparison, secondary structure representations of tRNAPhe are also shown. D, dihydrouridine modified bases; T, ribothymidine base.
The tRNA mimicry accounts for the reactivity of the 3′ end of the viral genome with several enzymes that recognize tRNA. These include the CCA nucleotidyl transferase, which adds a 3′ terminal A to complete the 3′ CCA end of the genome upon infection; valyl tRNA synthetase, which aminoacylates the 3′ end with Val; the elongation factor eEF1A, which binds to the TLS to give a viral RNA–eEF1A–GTP ternary complex; and the viral replicase p69/p206 (Refs 96, 97).
An elegant relationship has been unearthed between the TLS-specific reactivities and the TYMV lifecycle. Upon entry into the cell, the 3′ CCA end is completed and aminoacylated, at which point translation of the input virus genome, an obligatory step in the production of the virus replicase, is stimulated synergistically by the 5′ cap and the TLS[97]. The stimulation of translation is maximal when the genome is aminoacylated and is linked to the formation of a viral RNA–eEF1A–GTP ternary complex[97]. How eEF1A enhances translation is not known, but the available evidence hints at an unexpected involvement of this elongation factor in the process of translation initiation.
It has been suggested[98] that binding of the viral RNA ternary complex to the A site of initiating ribosomes could stimulate initiation at the 5′ end (perhaps in a manner similar to the pseudoknot of the IGR IRES domain 3 discussed above), but recent evidence argues against this specific mechanism[99]. Nevertheless, close proximity of the virus genome ends during translation (the circularization observed for most cellular mRNAs) could offer an explanation for the translational synergy afforded by the 5′ cap and the 3′ TLS–eEF1A complex.
The TLS of TYMV is also crucial in the switch between translation and replication. As has been elegantly illustrated for bacteriophage Qβ[100] and poliovirus[101,102], the movement of ribosomes in the 5′→3′ direction is incompatible with negative-strand RNA synthesis, in which the RNA polymerase travels 3′→5′. Thus, following an initial burst of translation, the viral mRNA must be cleared of ribosomes to allow replication from the 3′ end. Negative-strand synthesis in TYMV is initiated from the second C of the 3′ CCA triplet following pseudoknot-dependent binding of the replicase to the TLS[103,104]. It has been demonstrated recently that this reaction is inhibited by the binding of eEF1A to the valylated TLS[105]. So, translation is favoured until the levels of viral RdRp are sufficient to compete with eEF1A for binding to the TLS or perhaps until genomes are sequestered into vesicular sites of virus replication, which might be free of competing eEF1A and aminoacyl tRNA synthetases. The TLS might also be involved in genome packaging: in the related brome mosaic virus, viral RNAs lacking the TLS fail to assemble into virions[106].
In TYMV, a second pseudoknot is present immediately upstream of the TLS, and this pseudoknot also contributes to translational enhancement, probably by acting as a spacing element to present the functional TLS to enzymes[97] (Fig. 5).
The presence of such upstream pseudoknots is not uncommon and indeed, tobamo-, hordei-, furo-, pumo- and certain tymoviruses boast clusters of pseudoknots (between two and seven) in this region[107], which is termed the upstream pseudoknot domain (UPD). So, in tobacco mosaic virus (TMV), for example, a total of five pseudoknots are present in the 3′ UTR, three of which form the quasi-continuous double helical stalk of the UPD (5′ PKIII, PKII, PKI 3′) and two of which (5′ PKb, PKa 3′) are present in the TLS[108]. In the TLS, PKa forms part of the acceptor arm and PKb forms the central core, imposing the tRNA-like shape and orienting the UPD[109]. PKb is a B-type pseudoknot and forms a Y-shaped three-way junction that is proposed from modelling to have some structural similarity to the ribozyme of hepatitis delta virus (HDV; see below), although it displays no catalytic activity[109].
Despite the complexity of folding of the TMV TLS, the UPD seems to have usurped its role, at least in terms of translational enhancement[110]. Although the TLS can be aminoacylated with His and can bind eEF1A[22], the major player is the UPD, in which the highly conserved PKII and PKI can be crosslinked to eEF1A, independently of TLS aminoacylation[111], and form an essential element of the promoter for negative-strand synthesis[112]. The UPD can also bind to the heat-shock chaperone HSP101, although the protein does not appear to be essential for efficient replication of TMV[113]. It seems that there is a general requirement for the binding of certain cellular proteins to the 3′ end that does not necessarily have to be mediated solely by a TLS.
Another model system used to study the role of viral NCR pseudoknots in facilitating translation, replication and the switch between the two is tomato bushy stunt virus (TBSV)[114,115,116,117]. The TBSV genome is uncapped and lacks a poly(A) tail.
It recruits the translation machinery initially to the 3′ NCR, a process requiring a highly structured 3′ cap-independent translational enhancer (3′ CITE), comprising a Y-shaped hairpin domain upstream of a 3′-end pseudoknot. It is thought that upon infection, the pseudoknot is folded (the closed conformation) and translation factors gather on the 3′ CITE. Translation initiation complexes are then transferred to the vicinity of the 5′ NCR by virtue of genome circularization, mediated, at least in part, by a kissing-loop interaction between a stem–loop in the 3′ CITE and a partner close to the 5′ end of the genome (SL III). Translation complexes then scan to the first AUG and begin protein synthesis. Subsequently, the pseudoknot at the 3′ end of the genome is destabilized, possibly by binding of the accumulating viral RdRp, yielding an 'open' complex that is compatible with negative-strand synthesis[117]. Another pseudoknot (PK-TD1) in the 5′ NCR is also required for efficient replication, although its mechanism of action is not fully understood.
Pseudoknots in non-coding regions have also been documented in animal RNA viruses, with the vast majority known to be essential for virus replication (Supplementary information S1 (table)). Mostly, they function in translation, genome replication or the switch between the two. However, they are not well characterized structurally (at high resolution) and few details of the molecular interactions in which they participate are available.
Pseudoknots and ribozymes
HDV is a satellite RNA virus of humans that replicates in association with a helper virus, hepatitis B virus[17]. The circular, single-stranded RNA genome of HDV is replicated through an intermediate, the antigenome, by a double rolling-circle mechanism that requires self-cleavage by closely related genomic and antigenomic versions of a ribozyme.
The HDV ribozymes have been extensively studied as a model for the mechanism of catalytic RNAs[9] and fold into similar structures, characterized by a nested double pseudoknot that helps form the catalytic site and brings great stability to the RNA[118] (Supplementary information S4 (figure)). The nested double-pseudoknot motif is also common in cellular ribozymes (Supplementary information S2 (table)), for example, the HDV-like ribozyme present in an intron of the gene that encodes human cytoplasmic polyadenylation element-binding protein 3 (Ref. 119) and the glmS riboswitch ribozyme present in the 5′ UTR of the gene that encodes glucosamine-6-phosphate synthase in numerous Gram-positive bacteria[120,121]. The intricate connectivity that results from nesting two pseudoknots within each other makes it possible for these short RNAs to adopt complex and stable 3D folds.
A viral telomerase pseudoknot
Telomerases are ribonucleoprotein complexes responsible for the maintenance of telomeres[10]. In addition to a specialized reverse transcriptase (TERT), an RNA component (telomerase RNA (TR)) is present and includes the RNA template for telomere addition and a highly conserved pseudoknot necessary for telomerase activity. Like frameshift-promoting pseudoknots, the TR pseudoknot has extensive triplex interactions at the junction of the two stems[25] (Supplementary information S5 (figure)). It has been proposed that the pseudoknot region makes contacts with TERT[122] and is needed for telomere repeat-addition processivity[123]. Indeed, a switch between the pseudoknot and a partially unfolded form might be important in the translocation of telomerase during telomere addition[124,125,126]. High levels of telomerase activity are detected in a large range of cancers and are closely associated with the immortalization process[127,128]. Recently, a homologue of chicken TR, vTR, has been described in the oncogenic herpesvirus Marek's disease virus[129]. Deletion of the two copies of vTR from the virus genome led to reduced incidence and severity of lymphoma in infected chickens, indicating that vTR has the attributes of an oncogene[130]. An understanding of how vTR supports lymphomagenesis in genetic and molecular terms has the potential to yield new insights into telomerase function in cancer[131].
The versatile pseudoknot
The development of sophisticated algorithms (Box 1; Table 1) to scan genomic sequences for RNA structural motifs has highlighted the prevalence of the pseudoknot fold in the RNA universe, indicating evolutionary selection.
A plausible explanation for this abundance is that pseudoknots offer functional versatility. Some pseudoknots (for example, the BWYV frameshift pseudoknot) are extensively stabilized by both Watson–Crick and non-Watson–Crick interactions[73] and are more stable than an equivalent hairpin (containing a base-paired stem that is equivalent to S1 plus S2). Thus, a very stable motif can be included in an RNA when space (in terms of either genomic coding capacity or molecular dimensions) is at a premium. At the other end of the spectrum, many pseudoknots are considerably less stable than their equivalent hairpin and could conceivably act as regulatory switches, oscillating between stem–loop and pseudoknot conformations in response to environmental signals[132]. Pseudoknots also offer binding sites for proteins or single-stranded loops of RNA. The often extensive intra- and intermolecular contacts that pseudoknots engage in provide many targets for such interactions. Indeed, the in vitro selection of RNA aptamers that bind various biomolecules often generates pseudoknotted RNAs (Box 2). Pseudoknotting can also be the most efficient way of folding RNAs into an active conformation (for example, ribozymes).
Table 1. Examples of pseudoknot prediction programmes
Long-range interactions are also facilitated by this motif to organize global folding (for example, in the ribosome itself; Supplementary information S2 (table)) and to link separate domains of RNA together. In bacteriophage Qβ, although the viral replicase is anchored at a site some 1.2 kb from the 3′ end of the genome where replication initiates, an adjacent pseudoknot forms a long-range interaction that brings the 3′ end to the active site of the polymerase[133]. Such connections also offer the potential for global regulation of gene expression. In barley yellow dwarf virus, L3 of the frameshift-promoting pseudoknot is over 4 kb in length (of a 5.6-kb genome) and links the frameshift region with the 3′ end of the genome[134]. It has been suggested that disruption of this long-range contact is a means to regulate ribosome and replicase traffic on the viral genome[135].
Especially in RNA viruses, there is often enormous selective pressure, and genomes that are optimized for viral fitness rapidly dominate. Although it is plain that some or all of the features of pseudoknots discussed above could be reproduced in other ways, it might be that this can only be achieved at considerable cost to fitness.
Future perspectives
Although 25 years have passed since the first pseudoknot was described[1], much remains to be discovered. Foremost among the tasks ahead is to determine the true frequency of these motifs in natural RNAs. The majority of pseudoknots described in this article were identified as part of the normal process of scientific research, but computational approaches to their identification will be of increasing relevance. To maximize the proportion of genuine candidate pseudoknots identified in such screens, a better understanding of the thermodynamic parameters that govern pseudoknot formation will be invaluable. Additionally, we will need more information about the structure and function of pseudoknots.
Until recently, atomic resolution structural information has been restricted to short, often very stable, pseudoknots. Elucidating the cryo-EM and crystal structures of the IGR IRESs of CrPV[26] and PSIV[27] was a huge step forward, but similar monumental efforts will be required to solve those structures in which the pseudoknot orchestrates the folding of long and complex domains, as is found in many viral NCRs. Several viral pseudoknots remain poorly characterized, both in terms of structure and biological activity. A number of these have been shown to interact with proteins, both viral (including RdRp) and cellular, yet molecular details are lacking. Furthermore, we do not know whether pseudoknots are ever substrates for viral or cellular helicases present in infected cells; such activities could offer an extra level of control of virus gene expression and replication. The use of pseudoknots in antiviral strategies should also be considered more broadly. Antisense oligomers targeting the SARS-CoV frameshift-promoting pseudoknot have marked antiviral activity[136], and pseudoknotted aptamers have been shown to block HIV replication (Box 2).
The study of viral pseudoknots will continue to excite and challenge researchers. Many more pseudoknots undoubtedly remain to be discovered, and it is highly likely that some of these will possess novel activities or have unprecedented functions. The field has come a long way since the early experiments of Pleij and his collaborators, yet there is still much to be discovered about these fascinating motifs.
Box 1
RNA secondary structure prediction is not trivial and is restricted by our incomplete understanding of RNA thermodynamics and folding kinetics, and by long computer processing times[137]. RNAs can also adopt alternative conformations.
Most programmes determine a minimum free-energy structure from the primary sequence, but cannot take into account pseudoknot topology, predominantly owing to the computational complexity[138]. Indeed, it has been calculated that finding a general method to search sequences for minimum free-energy pseudoknots is an intractable task[139]. Nevertheless, a substantial proportion (perhaps a quarter) of the pseudoknot literature deals with computational approaches to pseudoknot identification. Most commonly, a heuristic approach is taken in which the search is restricted to certain pseudoknot types. However, most programmes are effective only for short sequences, as processing time can increase as the third to sixth power of sequence length, depending on the algorithm used. Another approach that has been used successfully is to perform simulations of RNA folding. These methods do not search for minimum free-energy structures and are therefore computationally more efficient. For longer sequences, a more efficient approach is to use pattern matching to perform a primary screen of a sequence database, gathering those sequences with the potential to form pseudoknots of a defined type, and subsequently to analyse only these sequences. Multiple sequence comparisons are also of use. A combination of approaches is usually needed to have confidence in the outcome of a database search and the pseudoknots predicted within. The capacity of the programme to deal with pseudoknots containing stem regions with non-Watson–Crick pairs or bulged residues is often an issue. Despite these limitations, pseudoknot search programmes are a valuable resource. Examples of commonly used programmes and webservers are provided in Table 1 (see also Ref. 140 for a review).
Box 2
Aptamers are small nucleic acid ligands that are generated in vitro against various biological molecules. The majority are produced using the SELEX (systematic evolution of ligands by exponential enrichment) method[141].
A pool of randomized RNAs is selectively enriched over repetitive rounds of target binding and subsequent sequence amplification until the surviving pool consists of sequences that bind to the target with high affinity and specificity. Initial experiments with SELEX focused on viral polymerases, confirming the sequence selectivity of T4 DNA polymerase[141]. These were followed by the isolation of pseudoknotted aptamers able to bind to HIV-1 reverse transcriptase[142]. These aptamers have been shown to bind with picomolar affinity[143] and can block HIV replication in tissue culture[144], inhibiting the polymerase at all stages of genome replication[145]. The crystal structure of aptamer-bound reverse transcriptase shows how the pseudoknot ligand (coloured strand) is bound along the cleft between the polymerase active site and the RNase H domain[146] (see image; stem 1 (S1) is blue, S2 is red, loop 1 (L1) is yellow, L3 is green and additional non-pseudoknot bases are orange; dark grey is p51 and light grey is p66). The pseudoknot holds the polymerase in a 'closed' conformation, blocking its action by competitive inhibition of the primer–template binding site. Because of their specificity, there is considerable interest in the use of aptamers as therapeutics. However, aptamers cross cell membranes inefficiently owing to their hydrophilicity and are mostly restricted to extracellular applications such as the blocking of viral glycoproteins to prevent cell attachment and viral entry[147]. To be effective, such aptamers need to be chemically modified so as to resist host ribonucleases. SELEX experiments regularly isolate pseudoknotted ligands[148], presumably reflecting the range of binding surfaces that pseudoknots can offer.
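The enrichment logic of SELEX described above can be illustrated with a toy calculation. The following Python sketch is not taken from any of the cited studies. It assumes a deliberately simplified model in which each sequence class is retained on the target with a fixed probability and amplification restores the pool to its original size without bias. The sequence-class names, starting abundances and retention probabilities are all hypothetical values chosen for illustration.

```python
def selex_round(freqs, retention):
    """Apply one round of target binding followed by unbiased amplification.

    freqs:     dict mapping sequence class -> fraction of the pool (sums to 1)
    retention: dict mapping sequence class -> assumed probability that a
               molecule of that class stays bound to the target
    """
    # Binding step: each class is depleted in proportion to (1 - retention).
    bound = {seq: frac * retention[seq] for seq, frac in freqs.items()}
    total = sum(bound.values())
    if total == 0:
        raise ValueError("no sequences were retained on the target")
    # Amplification step: renormalise so the pool is back to full size.
    return {seq: amount / total for seq, amount in bound.items()}


def run_selex(freqs, retention, rounds):
    """Return the pool composition before selection and after each round."""
    history = [dict(freqs)]
    for _ in range(rounds):
        freqs = selex_round(freqs, retention)
        history.append(freqs)
    return history


# Hypothetical starting pool: the tight binder is rare (0.1% of molecules).
pool = {"weak_binder": 0.989, "medium_binder": 0.010, "tight_binder": 0.001}
# Hypothetical per-round retention probabilities on the immobilised target.
retention = {"weak_binder": 0.01, "medium_binder": 0.10, "tight_binder": 0.90}

history = run_selex(pool, retention, rounds=8)
for round_number, composition in enumerate(history):
    print(f"round {round_number}: tight binder = {composition['tight_binder']:.4f}")
```

Under these assumed values, the rare tight binder becomes the dominant species within a few rounds, which is the behaviour that repeated selection and amplification is designed to produce. Real selections also involve amplification bias, mutation and non-specific retention, none of which is modelled here.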