Michael T. Banco and Adrian R. Ferré-D'Amaré
Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, Bethesda, Maryland 20892-8012, USA.
Abstract
G-quadruplexes (G4s) are four-stranded nucleic acid structures that arise from the stacking of G-quartets, cyclic arrangements of four guanines engaged in Hoogsteen base-pairing. Until recently, most RNA G4 structures were thought to conform to a sequence pattern in which guanines stacking within the G4 would also be contiguous in sequence (e.g., four successive guanine trinucleotide tracts separated by loop nucleotides). Such a sequence restriction, and the stereochemical constraints inherent to RNA (arising, in particular, from the presence of the 2'-OH), dictate relatively simple RNA G4 structures. Recent crystallographic and solution NMR structure determinations of a number of in vitro selected RNA aptamers have revealed RNA G4 structures of unprecedented complexity. Structures of the Sc1 aptamer that binds an RGG peptide from the Fragile-X mental retardation protein, various fluorescence turn-on aptamers (Corn, Mango, and Spinach), and the spiegelmer that binds the complement protein C5a, in particular, reveal complexity hitherto unsuspected in RNA G4s, including nucleotides in syn conformation, locally inverted strand polarity, and nucleotide quartets that are not all-G. Common to these new structures is that the sequences folding into G4s do not conform to the requirement that guanine stacks arise from consecutive (contiguous in sequence) nucleotides. This review highlights how emancipation from this constraint drastically expands the structural possibilities of RNA G-quadruplexes. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
In 1910, the Swedish biochemist Ivar Bang reported that high concentrations of guanylic acid form a gelatinous material in aqueous solution. Half a century later, Gellert et al. (1962) deduced from fiber diffraction experiments on this material that four guanines could associate into a cyclic, Hoogsteen-paired array, the canonical G-quartet (or G-tetrad, Fig. 1A), and that vertical stacks of such G-quartets could result in a four-stranded structure termed the G-quadruplex (G-tetraplex, G4). Formally, at least two stacked G-quartets are needed to form a G-quadruplex. G4s appear to be widespread in biology (Agarwala et al. 2015; Fay et al. 2017; Saranathan and Vivekanandan 2019; Varshney et al. 2020). Analysis of the human genome revealed over 700,000 putative G4-forming sequences (Chambers et al. 2015). Additionally, recent transcriptome-wide studies identified numerous putative G4-forming sequences (Kwok et al. 2016; Murat et al. 2018; Yang et al. 2018; Sauer et al. 2019; Lee et al. 2020), some of which have been validated biochemically (Kumari et al. 2007; Martadinata and Phan 2009; Arora and Suess 2011).
FIGURE 1.
General structural characteristics of G4s. (A) The cyclic arrangement of guanines in a canonical G-quartet or G-tetrad. G-quadruplexes (G4s) arise from stacking of two or more G-quartets. The glycosidic bond conformations anti and syn, as well as a cation occupying the central pore are noted. Black and yellow dashed lines denote hydrogen bond and metal ion coordination, respectively. (B) As in other nucleic acid structures, the pentose pucker can be C2′-endo or C3′-endo. In these two examples, the glycosidic bond angles are syn. (C) Various connectivities of G4 structures, which include all-parallel, anti-parallel, and mixed. Diamonds represent guanine bases. Four colors (red, orange, blue, green) denote the stacked guanines of three successive G-tracts or stacks. The loops connecting the all-parallel guanine stacks are of the propeller type. Lateral and diagonal loops are noted in the anti-parallel G4. Outline arrowheads in the backbone denote 5′-to-3′ chain direction. (D) Different local strand polarities (directions) of guanines in G-tracts. Three guanines of a G-tract are highlighted in cyan. In the left panel, the 5′ to 3′ direction of the riboses of the three guanines is the same. In the right panel, the direction of one of the guanines is opposite.
At present, there are 246 G4 entries in the structural database (PDB; curated manually), of which 229 are DNA (or DNA–protein complexes) and the remaining 17 are RNA (or RNA–protein complexes). DNA and RNA G4s both exhibit polymorphism (that is, one sequence may form several different G4 arrangements) and share general structural principles. G4s can be formed by either a single nucleic acid chain (unimolecular) or multiple nucleic acid chains (bimolecular, trimolecular, tetramolecular, etc.). The placement of the four guanine carbonyl oxygens near the fourfold axis of the G-quartet generates an electronegative central pore, which is stabilized by coordination of a cation. 
This cation is most commonly K+, but NMR and crystal structures exist in which the axial cation is Na+, Cs+, Sr2+, Ba2+, or even NH4+. Cation coordination by G-quadruplexes was reviewed by Bhattacharyya et al. (2016). The guanines forming the G4 structure can be in the anti or the syn conformation (Fig. 1A), with the latter only possible if steric clash between the ribose and the base is relieved by sugar puckering. The pucker itself is predominantly either C2′-endo or C3′-endo (Fig. 1B). In the case of the DNA double helix, these correspond to B-form and A-form, respectively, and reflect differences in hydration. For the RNA duplex, the steric block of the ribose 2′-OH imposes the A-form (C3′-endo) pucker (Saenger 1984). However, for nonhelical conformations, the most important correlate of the pucker is the distance between the two flanking phosphates, which is 7.0 Å and 5.9 Å for C2′-endo and C3′-endo puckered riboses, respectively (Murray et al. 2003).
The sequential arrangement of the guanine tracts determines the connectivity of a G4 structure, which is categorized as parallel, anti-parallel, or mixed (Fig. 1C). This refers to the relative 5′-to-3′ direction of the four strands that make up the quadruplex. Depending on the connectivity (not “topology”: no knotted G4s have been described, and all known G4s have the topology of a simple line or of multiple simple lines) of the G4 structure, the guanines can stack on each other with the same or inverted 5′-to-3′ polarity (Fig. 1D). Stacking of nucleotides with inverted polarity is commonly observed in nonquadruplex RNA, for instance, when cross-strand stacking occurs within a helix (Jhunjhunwala et al. 2020).
Nucleotides that are not part of a quartet and that connect successive G-tracts (arrangements of stacked guanines parallel to the fourfold axis of the G4) form the loops of the G4 structure. There are three common types of loops: propeller, lateral, and diagonal. 
Propeller loops connect adjacent parallel G-tracts. Lateral and diagonal loops connect adjacent and nonadjacent antiparallel G-tracts, respectively (Fig. 1C). Mixed-connectivity G4s can contain a combination of loop types.
Two simple G4 nucleic acids illustrate how the interplay of the structural principles described above gives rise to stable G4 arrangements. The first is the unimolecular DNA G4 structure formed in the nuclease hypersensitivity element III1 region of the c-Myc oncogene (Ambrus et al. 2005). Overexpression of c-Myc is associated with various cancers (Dang 2012). The c-Myc G4 structure functions as a repressor of the gene (Siddiqui-Jain et al. 2002; Hurley et al. 2006), which has made it an attractive alternative target for new anti-cancer therapeutics (Hurley et al. 2006; Calabrese et al. 2018; Carvalho et al. 2020). This G4 is an all-parallel quadruplex with three canonical G-quartet tiers (Fig. 2A). All the guanine glycosidic bonds are anti, and most of the sugars are in the C2′-endo conformation. All loops connecting the guanine stacks are of the propeller type and consist of one or two unpaired nucleotides. As in the vast majority of G4s (Chung et al. 2015; Winnerdy et al. 2019), the loop connections are all right-handed. The second is the G4 structure formed by the telomeric repeat-containing RNA (TERRA) sequence. Although mammalian telomeres were thought to be transcriptionally silent, long transcripts that include the repeat sequence r(UUAGGG) were identified more recently (Azzalin et al. 2007). Solution NMR structure determination of a stable species formed by TERRA in the presence of K+ revealed a bimolecular G4 with three G-quartet tiers (Fig. 2B; Martadinata and Phan 2009; Collie et al. 2010). The symmetric structure contains two trinucleotide propeller loops, all glycosidic bonds are anti, and most riboses adopt the C3′-endo conformation.
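The connectivity categories and loop definitions given earlier in this section can be written down as a few explicit rules. The sketch below is illustrative only and rests on a simplified model that is an assumption, not part of the original text: the four G-tracts occupy positions 0 to 3 around the quartet (positions differing by one, modulo four, are adjacent), each tract has a local 5′-to-3′ direction of +1 or −1 along the G4 axis, and "mixed" connectivity is taken to mean a (3+1) arrangement of strand directions. The function names are hypothetical and do not belong to any structure-analysis package.

```python
from collections import Counter

def connectivity(directions):
    """Classify four G-tract directions (+1 or -1) as parallel, anti-parallel or mixed."""
    counts = sorted(Counter(directions).values(), reverse=True)
    if counts == [4]:
        return "parallel"        # all four strands run the same way
    if counts == [2, 2]:
        return "anti-parallel"   # two strands up, two strands down
    return "mixed"               # assumed (3+1) arrangement

def loop_type(pos_a, dir_a, pos_b, dir_b):
    """Type of the loop joining two successive G-tracts (positions 0-3 around the quartet)."""
    adjacent = (pos_a - pos_b) % 4 in (1, 3)
    if dir_a == dir_b:
        # propeller loops join adjacent parallel tracts
        return "propeller" if adjacent else "unclassified"
    # lateral: adjacent antiparallel tracts; diagonal: nonadjacent antiparallel tracts
    return "lateral" if adjacent else "diagonal"

def describe(positions, directions):
    """positions[k] and directions[k] describe the k-th G-tract in 5'-to-3' sequence order."""
    loops = [
        loop_type(positions[k], directions[k], positions[k + 1], directions[k + 1])
        for k in range(len(positions) - 1)
    ]
    return connectivity(directions), loops

# Hypothetical examples
print(describe([0, 1, 2, 3], [1, 1, 1, 1]))     # c-Myc-like all-parallel G4
print(describe([0, 1, 3, 2], [1, -1, 1, -1]))   # antiparallel G4 with a diagonal loop
```

The first call returns ('parallel', ['propeller', 'propeller', 'propeller']) and the second ('anti-parallel', ['lateral', 'diagonal', 'lateral']). In a real structure the tract positions and directions would have to be read from coordinates; the sketch only encodes the definitions.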
FIGURE 2.
Simple G4 structures of c-Myc and TERRA. (A) Three-tiered, unimolecular G4 structure adopted by a sequence from the c-Myc promoter (PDB:1XAV; Ambrus et al. 2005). (B) Three-tiered bimolecular RNA G4 structure of TERRA (PDB:3IBK; Collie et al. 2011). (C) Schematic of the c-Myc G4, highlighting canonical DNA G4 structural features. The guanines of the G4 are anti, predominantly C2′-endo, and are in an all-parallel connectivity, with mono- and di-nucleotide propeller loops. See Table 1 for the symbol key. In all figures, residue numbers correspond to those in the PDB depositions. (D) Schematic of TERRA illustrating canonical RNA G4 structural features. The guanines adopt the anti conformation, are arranged with parallel connectivity, and predominantly adopt the C3′-endo sugar pucker.
TABLE 1.
Legend for symbols in G4 schematics
In the past decade, crystallographic structure determination of RNA aptamers evolved in vitro to recognize various ligands revealed that some of them possess G4s of unprecedented complexity. These structures expand the known range of structural possibilities of RNA G4s and are the focus of this review. A number of solution and crystal structures of DNA G4s containing noncanonical structural features, such as bulges, vacancies, inclusions of loop nucleotides, and non-G tetrads, have been solved (Lightfoot et al. 2019). All of these G4 features have now been observed in RNA aptamers and are discussed below. G4 structures adopted by several RNA aptamers challenge the received wisdom regarding the structural scope of RNA G4s. The sequence context of these RNA G4s often does not conform to the canonical sequence motif (G>2–NX–G>2–NX–G>2–NX–G>2). Recent reviews provide an overview of the basic structural principles of RNA G4s (Malgowska et al. 2016) and of the structural diversity of G4s in general (Lightfoot et al. 2019). Here, we emphasize insights into the potentially much higher complexity of RNA G4s from recently reported structures of in vitro selected aptamers. Other aspects of G4 structure, chemistry, and biology, including recognition of G4s by proteins and small molecules, have been extensively reviewed elsewhere (by, among others, Sissi et al. 2011; Mendoza et al. 2016; Hänsel-Hertsch et al. 2017; McRae et al. 2017; Neidle 2017; Sauer and Paeschke 2017; Varshney et al. 2020).
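The canonical sequence motif mentioned above is what simple sequence-based G4 searches look for. The sketch below shows one way to express it as a regular expression, assuming that "G>2" denotes runs of three or more guanines and that loops span 1 to 7 nt; the loop bounds are a commonly used choice adopted here as an assumption, not part of the motif as written. The function name is hypothetical. Because the pattern requires every guanine stack to come from one contiguous G-tract, it cannot detect G4s of the kind discussed in this review.

```python
import re

def find_canonical_g4_motifs(seq, min_tract=3, max_loop=7):
    """Return (start, match) pairs for non-overlapping canonical G4 motifs.

    Assumes four G-tracts of at least `min_tract` guanines separated by loops
    of 1 to `max_loop` nucleotides of any identity.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA input; report RNA letters
    tract = f"G{{{min_tract},}}"          # e.g. G{3,}
    loop = f"[ACGU]{{1,{max_loop}}}"      # e.g. [ACGU]{1,7}
    pattern = re.compile(tract + loop + tract + loop + tract + loop + tract)
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]

# TERRA-like repeats: one motif starting at index 3
print(find_canonical_g4_motifs("UUAGGGUUAGGGUUAGGGUUAGGG"))
```

Only non-overlapping matches are reported; dedicated prediction tools additionally handle overlapping candidates and scoring.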
A GRAPHICAL SHORTHAND FOR G-QUADRUPLEXES
Typically, in schematic representations, G4s are drawn as stacks of four squares or diamonds (G-quartet, each square depicting a guanine) connected by lines (loops, Fig. 1C). Although this suffices to indicate the number of quartets and their connectivity, additional stereochemical information is not immediately apparent. The Leontis and Westhof (2001) symbols are widely used to represent noncanonical base pairs, but are not informative for G-quadruplexes, as G-quartets have but one (cyclically iterated) base-pairing scheme: Hoogsteen to Watson–Crick. We propose a graphical shorthand that captures salient structural features of G-quadruplex nucleic acids (Table 1). Structural similarities and differences between the c-Myc and TERRA G4s, in particular, how variation of sugar pucker concentrates at the ends of G-stacks, are readily apparent using this representation (Fig. 2C,D).
In our convention, each row represents the nucleotides of a quartet, and columns indicate nucleotide stacks. Mixed-sequence and noncanonical quartets are included in the schematics only when these quartets stack directly on a G-quartet. The 5′ and 3′ boundaries of the schematics are the first and last nucleotides that are part of a quartet (canonical or mixed). Strand polarity inversions are indicated as upside-down letters for quartet-forming nucleotides with respect to the 5′-most nucleotide in the schematic. Lines connecting the quadruplex nucleotides are loops (connections between stacks) and bulges (interruptions within stacks). The number of nucleotides in each loop or bulge is indicated within a circle. Expansions of a quartet into a pentad, hexad, etc., by inclusion of a loop nucleotide in the plane of the quartet are denoted by drawing a loop nucleotide adjacent to the quartet row.
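To make the row-and-column convention concrete, the sketch below encodes each quartet-forming nucleotide with the attributes the schematics display (identity, residue number, glycosidic conformation, sugar pucker, and polarity) and prints one text row per quartet, so that columns line up as stacks. It is a hypothetical plain-text stand-in for the drawn symbols of Table 1, not the authors' software: '^' and 'v' replace upright and upside-down letters, 'a'/'s' mark anti/syn, and '3'/'2' mark C3′-endo/C2′-endo. Loops, bulges, and pentad or hexad expansions are not drawn, and the example residues are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Nt:
    base: str               # 'G', 'A', 'C' or 'U'
    resnum: int             # residue number as in the PDB deposition
    syn: bool = False       # glycosidic bond: True = syn, False = anti
    c2_endo: bool = False   # sugar pucker: True = C2'-endo, False = C3'-endo
    inverted: bool = False  # polarity relative to the 5'-most nucleotide

def cell(nt: Nt) -> str:
    """Text symbol for one nucleotide, e.g. '^G12/a3'."""
    polarity = "v" if nt.inverted else "^"  # stands in for an upside-down letter
    conf = "s" if nt.syn else "a"
    pucker = "2" if nt.c2_endo else "3"
    return f"{polarity}{nt.base}{nt.resnum}/{conf}{pucker}"

def render(quartets) -> str:
    """One row per quartet (top to bottom); position k in every row forms stack k."""
    width = max(len(cell(nt)) for row in quartets for nt in row)
    return "\n".join(
        " ".join(cell(nt).ljust(width) for nt in row).rstrip() for row in quartets
    )

# Hypothetical two-tier G4; the last guanine is syn, C2'-endo and locally inverted
top = [Nt("G", 1), Nt("G", 5), Nt("G", 9), Nt("G", 13)]
bottom = [Nt("G", 2), Nt("G", 6), Nt("G", 10),
          Nt("G", 14, syn=True, c2_endo=True, inverted=True)]
print(render([top, bottom]))
```

Padding every cell to a common width keeps the stacks aligned as columns, which is the property the schematics rely on.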
STEREOCHEMICAL CONSTRAINTS ON SIMPLE RNA G-QUADRUPLEXES
Until the NMR and cocrystal structures of the Fragile-X aptamer (Phan et al. 2011), it had been received wisdom that RNA G4s conform to three general stereochemical rules. First, all guanines involved in G-tetrads adopt the anti conformation. Second, RNA G4 structures are limited to a parallel connectivity. Third, the ribose moiety in RNA G4s preferentially adopts the C3′-endo sugar pucker. Although C2′-endo ribose puckers have been observed in RNA G4s, the C3′-endo pseudorotamer was thought to be preferred. For instance, the structure of TERRA contains a mixture of C2′- and C3′-endo puckers, with the majority of riboses exhibiting the latter (Fig. 2D). The sugar puckers were proposed to result from steric clash between the purine bases and the 2′-OHs of guanines forming the G-quartets (Martadinata and Phan 2009). It has been noted that the 2′-OH, as well as other features of RNA, contributes to the structural stability of RNA G4s, resulting in melting temperatures substantially higher than those of their DNA counterparts (Arora and Maiti 2009; Joachimi et al. 2009; Agarwal et al. 2012).
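The three received rules can be phrased as checks on per-nucleotide measurements. The sketch below assumes the common convention that a glycosidic torsion χ within ±90° of 0° is syn and otherwise anti, and it infers the pucker from the distance between the flanking phosphates by assigning whichever reference value quoted earlier (7.0 Å for C2′-endo, 5.9 Å for C3′-endo) is closer. That nearest-value assignment is a simplifying assumption made here for illustration; a rigorous assignment would use the ribose pseudorotation phase. All names, and the example measurements, are hypothetical.

```python
from dataclasses import dataclass

C2_ENDO_PP = 7.0  # Å, flanking phosphate distance for C2'-endo riboses (Murray et al. 2003)
C3_ENDO_PP = 5.9  # Å, flanking phosphate distance for C3'-endo riboses

def glycosidic_conformation(chi_deg):
    """Assumed convention: syn if chi lies within +/-90 degrees of 0, otherwise anti."""
    chi = (chi_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return "syn" if -90.0 < chi <= 90.0 else "anti"

def pucker_from_pp_distance(d):
    """Heuristic: pick the pucker whose reference P-P distance is closer to d (in Å)."""
    return "C2'-endo" if abs(d - C2_ENDO_PP) < abs(d - C3_ENDO_PP) else "C3'-endo"

@dataclass
class QuartetGuanine:
    resnum: int
    chi_deg: float      # glycosidic torsion angle chi, degrees
    pp_distance: float  # distance between the 5' and 3' flanking phosphates, Å
    direction: int      # local 5'-to-3' direction along the G4 axis, +1 or -1

def canonical_rule_violations(guanines):
    """List departures from the three received rules for simple RNA G4s."""
    problems = []
    syn = [g.resnum for g in guanines if glycosidic_conformation(g.chi_deg) == "syn"]
    if syn:  # rule 1: all guanines anti
        problems.append(f"syn guanines: {syn}")
    if len({g.direction for g in guanines}) > 1:  # rule 2: parallel connectivity
        problems.append("connectivity is not all-parallel")
    n_c2 = sum(pucker_from_pp_distance(g.pp_distance) == "C2'-endo" for g in guanines)
    if n_c2 > len(guanines) / 2:  # rule 3: C3'-endo preferred
        problems.append("majority of riboses C2'-endo")
    return problems

example = [
    QuartetGuanine(1, -160.0, 5.9, +1),
    QuartetGuanine(2, 60.0, 7.0, -1),
    QuartetGuanine(3, -150.0, 7.1, +1),
]
print(canonical_rule_violations(example))
```

For the invented example, all three rules are reported as violated, which is the kind of outcome the aptamer structures discussed below produce.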
THE FRAGILE-X APTAMER
The fragile-X mental retardation protein (FMRP) is an essential regulatory RNA-binding protein that is associated with several human disorders, including fragile-X syndrome and autism (De Boulle et al. 1993; Bassell and Warren 2008; Hernandez et al. 2009). FMRP contains four highly conserved nucleic acid binding domains: three KH domains and an arginine-glycine rich domain (RGG). A number of studies determined that the RGG domain of FMRP specifically binds to G-rich mRNA sequences, some of which were shown to fold into unimolecular G4s (Brown et al. 2001; Darnell et al. 2001; Schaeffer et al. 2001; Phan et al. 2011). The G4s in these mRNAs were found to be polymorphic. Therefore, for structural studies, a conformationally uniform RNA G4 (Sc1) was selected in vitro against the full-length protein (Darnell et al. 2001; Ramos et al. 2003).
Solution and cocrystal structures of the RGG peptide complexed with Sc1 (RGG–Sc1) revealed an RNA composed of a three-tier G4 coaxial to an anti-parallel A-form duplex. The two are connected by a junctional mixed-sequence (i.e., not canonical, all-G) quartet (Fig. 3A; Phan et al. 2011; Vasilyev et al. 2015). The connectivity of the Sc1 G4 is unprecedented, and the G4 exhibits several noncanonical structural features (Fig. 3B). The two upper G-quartets (G11•G15•G20•G24 above G12•G16•G21•G25) assemble into a parallel G4 structure consisting of guanines in the anti conformation and backbones aligned in the same direction. Remarkably, the third G-quartet (G9•G6•G18•G26) is inverted relative to the top two, such that the strand polarity of the guanines of the top quartets is opposite that of the bottom (Fig. 3B). Three of the guanines of the third (inverted) G-quartet are not directly connected to the G-tracts, which differs from the classical G4 sequence motif (Patel et al. 2007; Agarwala et al. 2015; Malgowska et al. 2016; Lightfoot et al. 2019). 
These three guanines connect to the bottom G-tracts through mononucleotide loops, whereas G25 connects directly to G26. Furthermore, while the guanines adopt both C2′-endo and C3′-endo puckers, the majority are in the former (DNA-like) conformation.
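The departure from the classical motif can be stated as a simple test on residue numbers: in a canonical G4 each column of stacked guanines is a single G-tract, so reading along the stack the residue numbers step by one in a single direction. The sketch below, with hypothetical function names and invented residue numbers (not the Sc1 coordinates), flags stacks that fail this test; stacks such as those of Sc1, whose bottom-quartet guanines are joined to the G-tracts through loops, would be flagged.

```python
def stack_is_contiguous(stack):
    """True if residue numbers read along one stack change uniformly by +1 or by -1."""
    steps = {b - a for a, b in zip(stack, stack[1:])}
    return len(steps) <= 1 and steps <= {1, -1}

def noncontiguous_stacks(stacks):
    """Indices of stacks whose guanines are not consecutive in sequence."""
    return [i for i, stack in enumerate(stacks) if not stack_is_contiguous(stack)]

# Each tuple lists residue numbers from the top quartet to the bottom quartet
canonical = [(1, 2, 3), (7, 8, 9), (13, 14, 15), (21, 20, 19)]
irregular = [(1, 2, 3), (7, 8, 5), (13, 14, 15), (21, 20, 19)]
print(noncontiguous_stacks(canonical))
print(noncontiguous_stacks(irregular))
```

The first call returns an empty list and the second returns [1]. A uniformly descending stack such as (21, 20, 19) still counts as contiguous, because it is one G-tract read in the opposite direction.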
FIGURE 3.
The Fragile-X peptide-aptamer RNA and DHX36-c-Myc complexes. (A) Cartoon representation of the three-dimensional structure of the RGG-Sc1 complex (PDB:5DE5; Vasilyev et al. 2015). The location of the mixed-sequence quartet is indicated. The RGG peptide is highlighted by its translucent molecular surface. (B) Schematic of the Sc1 quadruplex. (C) Hydrogen bond and coordination network of the mixed-sequence quartet of Sc1. The RGG peptide abuts this quartet. Red and purple spheres depict water molecules and K+, respectively. (D) Cartoon representation of the c-Myc quadruplex bound to the amino-terminal α-helix (DSM) of DHX36 (PDB:5VHE; Chen et al. 2018). (E) Schematic of the c-Myc quadruplex bound to DHX36. (F) Detail of the mixed-sequence quartet.
The mixed-sequence quartet (U8•A17•U28•G29) that joins the G4 and duplex moieties of Sc1 is held together by a novel hydrogen-bond network (Fig. 3C). Two crystallographically well-ordered water molecules participate in inter-nucleobase hydrogen bonding. Other noncanonical tetrads have been observed in G4s, typically positioned at dimer interfaces or forming interactions with proteins (Cheong and Moore 1992; Patel and Hosur 1999; Patel et al. 2000; Pan et al. 2003; Kimura et al. 2009; Chen et al. 2018; Andralojc et al. 2019). For instance, the solution structure of r(UGGUGGU) shows a tetramolecular G4 containing multiple U-tetrads. Of these, the 3′-terminal U-tetrad contributes to the thermal stability of the G4, whereas in the DNA G4 of analogous sequence, the uridines lead to a 30°C lower Tm (Cheong and Moore 1992; Andralojc et al. 2019). The Sc1 structure is stabilized by binding of the RGG peptide to a groove between its G4 and duplex moieties (Fig. 3A). In the peptide-free state, Sc1 adopts multiple conformations, whereas binding of the RGG peptide selects for the G4 structure (Phan et al. 2011). 
In particular, the binding energy of the RGG peptide facilitates formation of the mixed, noncanonical tetrad.
The recent cocrystal structure of the DEAH-box helicase DHX36 bound to the c-Myc DNA G4 is another example of protein binding-mediated stabilization of otherwise unstable mixed-sequence tetrads (Fig. 3D,E; Chen et al. 2018). DHX36 binding to the canonical c-Myc G4 (Fig. 2A) results in removal of the last guanine of the 3′-most G-stack from the quadruplex. The DNA rearranges such that two nucleotides previously in loops form part of the top tetrad of the three-tier quadruplex, whose composition is G•G•A•T (Fig. 3F). Differential scanning calorimetric analyses demonstrated that, in the absence of the helicase, the G4 containing the G•G•A•T tetrad is much less stable than the canonical c-Myc G4 (Chen et al. 2018).
THE SPINACH FLUORESCENCE TURN-ON APTAMER
Fluorescence turn-on aptamers are RNAs selected in vitro to bind and activate otherwise weakly fluorescent small molecules, that is, conditional fluorophores (Bouhedda et al. 2017; Truong and Ferré-D'Amaré 2019). Their most common mechanism of action is to restrain the bound, photoexcited fluorophores in a planar conformation, suppressing nonradiative decay pathways (Trachman and Ferré-D'Amaré 2019). These RNA–fluorophore complexes have attracted great interest as RNA analogs of fluorescent proteins and have been used successfully, both in vitro and in vivo, for the analysis of RNA folding and localization, and as reporters in biosensors (Trachman and Ferré-D'Amaré 2019; Braselmann et al. 2020). Structures of multiple independently evolved fluorescence turn-on aptamers have been reported, revealing diverse and often unrelated three-dimensional architectures. Unexpectedly, many of these aptamers have fluorophore binding sites organized around G4 structural elements. It has been proposed that this may be a consequence of the mechanism of fluorescence activation used by these aptamers: the larger and flatter molecular surface of a G-quartet, compared with those of a base pair or base triple, may facilitate restraining the photoexcited fluorophores (Warner et al. 2014).
Spinach is a fluorescence turn-on aptamer that enhances by 1000-fold the fluorescence of 3,5-difluoro-4-hydroxybenzylidene imidazolinone (DFHBI), a small-molecule analog of the intrinsic fluorophore of green fluorescent protein (Paige et al. 2011). Crystal structures of Spinach in complex with DFHBI revealed an RNA consisting of two A-form duplexes stacked coaxially on either side of a fluorophore binding site organized around a highly irregular, noncanonical quadruplex (Huang et al. 2014; Warner et al. 2014). The presence of a G4 in Spinach had not been predicted, as the RNA does not conform to the standard G4 sequence motif. 
In the Spinach-DFHBI complex structure, the fluorophore is held between a base triple and the top G-quartet (G26•G30•G65•G70) of the G4 motif (Fig. 4A). Subsequently, structures of the reselected variant aptamer iSpinach (Autour et al. 2016), which contains an identical G4 structure, in complex with two different fluorophores, have been reported (Fernandez-Millan et al. 2017; Jeng et al. 2021).
FIGURE 4.
The Spinach fluorescence turn-on RNA aptamer. (A) Cartoon representation of the G4-containing core of Spinach bound to its cognate fluorophore DFHBI (PDB:4TS2; Warner et al. 2014). Locations of the mixed-sequence quartet and base triple are indicated. Dots indicate 28 nt omitted from the figure. (B) Schematic of the quadruplex element of Spinach.
The RNA quadruplex motif of Spinach is among the most complex described to date. Two G-quartets stack onto a mixed-sequence quartet (C28•U66•U73•G74), and the three tiers together coordinate two axial K+ ions (Fig. 4A,B). The connectivity of the G4 is a combination of parallel and antiparallel, and only four of the eight guanines in the two G-quartets are sequential; the other four follow loops of various lengths. One of the loops (formally a diagonal loop) encompasses the base triple that stacks on the fluorophore as well as the entire A-form duplex that stacks coaxially on it. In the standard form of Spinach, this loop is 34 nt long. Before this long loop, the guanines are in the anti conformation and the connectivity is parallel. Following the loop, the connectivity is antiparallel, and the guanines deviate greatly from canonical RNA G4 stereochemistry: three of these guanines adopt the syn conformation and only one is anti. Furthermore, all of these guanines adopt the C2′-endo ribose pucker. The three syn guanines are also in locally opposite polarity to the guanines 5′ of the 34-nt loop. The most notable difference between Spinach and canonical G4s is the drastic deviation from the classical sequence motif following the 34-nt loop. The guanines involved in the G-quartets are interrupted by nucleotides that form bulges or are incorporated into the mixed-sequence quartet. 
The mixed-sequence quartet is an integral part of the Spinach quadruplex and, beyond its composition, is rich in noncanonical features, including three nucleotides in inverted polarity and with C2′-endo puckers.
Analogous to the function of the mixed tetrad in the FMRP–Sc1 complex (Phan et al. 2011; Vasilyev et al. 2015), the mixed tetrad in Spinach aids in the transition from the mixed G4 structure to an anti-parallel duplex. Unlike in the FMRP–Sc1 complex, where the G4 stacks coaxially on a duplex on only one side, the Spinach G4 lies in the middle of a continuous coaxial stack, flanked by A-form duplexes on both sides. The elaborate, noncanonical connectivity of the Spinach G4 must arise, at least in part, as a solution to the problem of transitioning, without interrupting base stacking, from an antiparallel duplex to a four-stranded G4 and back to an antiparallel duplex. Functionally, it has been proposed that the Spinach G4 favors fluorescence activation of its cognate fluorophore because G-quartets are planar and have a larger surface area than a base pair or base triple. In addition, it has been noted that electronic coupling between the fluorophore and the G4, as well as the coordinated axial cations, can modulate the photophysics of the turn-on aptamer–fluorophore complex (Warner et al. 2014; Trachman and Ferré-D'Amaré 2019).
THE CORN FLUORESCENCE TURN-ON APTAMER
To discover fluorescence turn-on aptamers with improved photophysical properties, several additional in vitro selection experiments have been reported (Shelke et al. 2018; Bouhedda et al. 2020; Braselmann et al. 2020). Selection against 3,5-difluoro-4-hydroxybenzylidene-imidazolinone-2-oxime (DFHO), a variant of DFHBI with additional conjugation, resulted in a small (36 nt), yellow fluorescent RNA, termed Corn (Song et al. 2017). The Corn-DFHO cocrystal structure (Warner et al. 2017) revealed a quasi-symmetric (see Jones and Ferré-D'Amaré 2015 for definition) homodimeric RNA, with one molecule of the fluorophore bound at the dimer interface (Fig. 5A). Biochemical studies of Corn demonstrated that the aptamer is a homodimer in solution (Warner et al. 2017). A crystal structure of the unliganded aptamer revealed a fully symmetric dimer with a collapsed fluorophore binding site. Structures of the Corn dimer in complex with the nonspecific fluorophores thioflavin T and thiazole orange also exhibited strict (crystallographic) symmetry, suggesting that local symmetry breaking at the interface is induced by DFHO binding (Sjekloća and Ferré-D'Amaré 2019).
FIGURE 5.
The Corn fluorescence turn-on aptamer. (A) Cartoon representation of the Corn homodimer with the fluorophore DFHO bound at the dimer interface (PDB:5BJO; Warner et al. 2017). Red arrows point to the two mixed-sequence quartets. (B) Schematic of the quadruplex element of Corn. (C) Detail of the A•U•A•U quartet of Corn. Water molecules and K+ are depicted as red and purple spheres, respectively. For structure determination, U17 was replaced with 5-iodouracil (Warner et al. 2017). Subsequent Corn structures, containing RNA lacking halogens, showed the same conformation of the quartet (Sjekloća and Ferré-D'Amaré 2019). (D) Detail of the G•C•G•C quartet of Corn.
Each of the Corn RNA chains forming the dimer (the two “protomers” in the sense of, e.g., Jones and Ferré-D'Amaré 2015) comprises a four-tiered mixed-sequence quadruplex connected to an A-form duplex through an irregular junction. The Corn quadruplex contains two canonical G-quartets and two mixed-sequence quartets (Fig. 5B). The G-quartets of the quadruplex have parallel connectivity (except for one guanine that is locally inverted), all guanine bases are in the anti conformation, and most riboses adopt a C3′-endo pucker. In contrast to the G-quartets, the mixed-sequence quartets have anti-parallel connectivity and unusual base-pairing interactions within each quartet (Fig. 5B). The mixed quartet adjacent to the G-quartets (A10•U17•A21•U27) exhibits noncanonical and water-mediated A•U pairing (Fig. 5C). To our knowledge, such noncanonical interactions mediated by structured waters in mixed quartets have been observed only in the Sc1 and Corn RNAs (Vasilyev et al. 2015; Warner et al. 2017). The next mixed-composition quartet (G9•C18•G20•C28) contains two G•C Watson–Crick base pairs, both of which display a pronounced negative roll (Fig. 5D; Olson et al. 2001) of ∼70°, causing the quartet to buckle. 
Previously, DNAs folding into a tetramolecular “fold-back” quadruplex have been observed adopting similar noncanonical G•C Watson–Crick base pairs that in turn pair into mixed quartets (Chu et al. 2018). Distal to this quartet, a junctional base triple joins the four-tier quadruplex of Corn with its A-form anti-parallel duplex.
THE MANGO FAMILY OF FLUORESCENCE TURN-ON APTAMERS
Selection experiments for small RNA turn-on aptamers with high affinity for the thiazole orange (TO) derivative TO1-Biotin yielded four aptamers: Mango-I, Mango-II, Mango-III, and Mango-IV (Dolgosheina et al. 2014; Autour et al. 2018; Trachman and Ferré-D'Amaré 2019). The sequences of these ∼30-nt aptamers are related, and crystallographic structure determination (Trachman et al. 2017, 2018, 2019, 2020) confirmed that their ligand binding sites are all structured around G4s (Fig. 6). In terms of overall structure, Mango-I and Mango-II (Fig. 6A–C) are the most closely related. Both are built around a three-tiered G4 linked to an A-form double helix by a GAAA-tetraloop-like junction. The sequence of Mango-IV is not highly divergent from those of Mango-I and Mango-II, but this RNA exists as a homodimer, and its G4 stacks coaxially onto an A-form double helix that links the two protomers through a domain-swapping interaction. Mango-III is the most divergent of the four aptamers, both in sequence and in structure. This monomeric aptamer consists of a two-tiered G4 stacked coaxially on a junctional base triple, which mediates the transition to a coaxially stacked A-form duplex. Unlike the other three Mango aptamers, Mango-III incorporates a noncanonical helix formed between a propeller loop of the quadruplex and nucleotides 3′ of the G4, in a manner analogous to a pseudoknot (in this regard reminiscent of Mango-III, a solution NMR structure of a DNA G4 possessing a duplex hairpin located within a bulge was recently reported [Ngoc Nguyen et al. 2020]). Moreover, whereas the TO1-Biotin binding sites of the other three Mango aptamers are open on the side opposite the G4, the Mango-III fluorophore binding site is capped by a long-range, noncanonical Watson–Crick pair (Fig. 6A,D).
The presence of tertiary interactions in Mango-III is consistent with its multiphasic melting behavior, which is unique among aptamers of this family.
FIGURE 6.
The Mango family of fluorescence turn-on RNA aptamers. (A) Cartoon representation of the quadruplex portion of the cocrystal structure of Mango-I bound to the TO1-Biotin fluorophore (PDB: 5V3F; Trachman et al. 2017). (B) Schematic of Mango-I G4. (C) Schematic of the Mango-II G4 (PDB:6C63; Trachman et al. 2018). (D) Cartoon representation of the quadruplex portion of the cocrystal structure of Mango-III bound to TO1-biotin (PDB:6E8T; Trachman et al. 2019). (E) Schematic of the Mango-III G4. (F) Schematic of the Mango-IV G4 (PDB:6V9B; Trachman et al. 2020).
The G4s of the Mango-I, Mango-II, and Mango-IV aptamers are all three-tiered, consisting of two parallel G-quartets stacked on a G-quartet of opposite polarity (Fig. 6B,C,F). The G4 of Mango-III is only two-tiered; its first two G-tracts are parallel, while the last one runs in the opposite direction (Fig. 6E). The guanines of the three-tiered G4s of Mango-I, Mango-II, and Mango-IV are predominantly in the anti conformation, while their sugar puckers are both C2′-endo and C3′-endo. In this regard, the Mango-I and Mango-II quadruplexes are the most similar: most riboses of the two outer G-quartets are C2′-endo, whereas all riboses of the middle G-quartet are C3′-endo. The G4 of Mango-IV differs in that its two parallel G-quartets are composed exclusively of guanines with C3′-endo puckers. The puckers of the Mango-III G4 guanines are a mixture of C2′-endo and C3′-endo. In all four Mango aptamers, multiple bulges interrupt the G-tracts that form the G-quartets; except in Mango-III, these bulges are flanked by locally inverted guanosines. Functionally, the loop and bulge nucleotides of the Mango G4s are very important, as they provide “flaps” that sequester the fluorophore against the flat surface of the quadruplex (Fig. 6A).
The bottom (locally inverted) G-quartets of Mango-I and Mango-II are expanded in plane (i.e., into pentads and hexads) by adenosines that originate in loops. Such expansions by loop nucleotides enlarge the surface area of G-quartets and are commonly observed in dimeric G4 structures and in G4s that interact with small molecules (Zhang et al. 2001; Liu et al. 2002; Mashima et al. 2009; Collie et al. 2011; Martadinata and Phan 2013).
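Much of the comparison above rests on ribose pucker. As a hedged illustration (not the procedure used in the cited structure papers), C2′-endo and C3′-endo labels are conventionally derived from the pseudorotation phase angle P of Altona and Sundaralingam, computed from the five endocyclic ring torsions ν0 to ν4. The Python sketch below assumes torsions in degrees; the function names, the coarse sector boundaries (C3′-endo for 0° ≤ P < 36°, C2′-endo for 144° ≤ P < 180°), and the synthetic test torsions are illustrative assumptions, not measured data.

```python
import math

# sin 36 deg + sin 72 deg, the constant in the Altona-Sundaralingam expression for P
_S = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation(nu0, nu1, nu2, nu3, nu4):
    """Return (P, nu_max) in degrees from the five endocyclic ribose torsions.

    nu0 = C4'-O4'-C1'-C2', nu1 = O4'-C1'-C2'-C3', nu2 = C1'-C2'-C3'-C4',
    nu3 = C2'-C3'-C4'-O4', nu4 = C3'-C4'-O4'-C1' (all in degrees).
    """
    y = (nu4 + nu1) - (nu3 + nu0)
    x = 2.0 * nu2 * _S
    # atan2 resolves the quadrant (equivalent to adding 180 deg when nu2 < 0)
    phase = math.degrees(math.atan2(y, x)) % 360.0
    # Puckering amplitude; algebraically equal to nu2 / cos(P),
    # but numerically stable when P is near 90 or 270 deg
    nu_max = math.hypot(x, y) / (2.0 * _S)
    return phase, nu_max


def pucker_label(phase):
    """Coarse label: C3'-endo for 0 <= P < 36, C2'-endo for 144 <= P < 180."""
    if 0.0 <= phase < 36.0:
        return "C3'-endo"
    if 144.0 <= phase < 180.0:
        return "C2'-endo"
    return "other"


def ideal_torsions(phase, nu_max):
    """Hypothetical helper: ideal ring torsions nu_j = nu_max * cos(P + 144 * (j - 2))."""
    return [nu_max * math.cos(math.radians(phase + 144.0 * (j - 2))) for j in range(5)]


# Usage with synthetic torsions (not taken from any structure discussed here)
for p in (18.0, 162.0):
    P, amp = pseudorotation(*ideal_torsions(p, 38.0))
    print(f"P = {P:.1f} deg, nu_max = {amp:.1f} deg -> {pucker_label(P)}")
```

Using `atan2` and `hypot` rather than the textbook tan P and ν2/cos P forms gives the same answer while avoiding quadrant bookkeeping and division by a near-zero cosine.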
OTHER RNA APTAMERS WITH COMPLEX G4 ELEMENTS
Spiegelmers (from the German word for mirror) are RNA aptamers made from unnatural L-ribonucleotides. Their intrinsic resistance to plasma exo- and endonucleases has made them of interest as novel therapeutics (Vater and Klussmann 2015). Complement is an essential component of human innate immunity (Ricklin et al. 2010). Recognition of pathogens by complement causes the release of the anaphylatoxins C3a and C5a, which function as signaling molecules (Zhou 2012). Elevated levels of C5a have been suggested to be involved in acute and chronic inflammatory disorders. Substantial efforts to develop novel inhibitors targeting complement have been reviewed elsewhere (Ricklin et al. 2018). An in vitro selection experiment produced an L-DNA/L-RNA aptamer, termed NOX-D20, that binds C5a with high affinity (Hoehlig et al. 2013). The cocrystal structure of NOX-D20 in complex with C5a revealed a unique G4-containing nucleic acid (Fig. 7A; Yatime et al. 2015). Bound to C5a, NOX-D20 folds into a V-shaped structure composed of a two-tiered G4 and an anti-parallel duplex. Unusually, the cation coordinated by the NOX-D20 G-quartet is Ca2+. The two-tiered G4 has mixed connectivity (Fig. 7B), and the guanines in the two G-stacks that harbor bulges have locally opposite polarity from the others. The guanines in this G4 adopt both anti and syn conformations and a mixture of C2′-endo and C3′-endo sugar puckers. Two bulges, of 2 and 4 nt, interrupt two of the G-tracts. Interestingly, the tetranucleotide bulge forms a Watson–Crick and a noncanonical base pair with another loop region of NOX-D20; these pairs are important in maintaining the overall fold of the nucleic acid (Yatime et al. 2015).
FIGURE 7.
The NOX-D20 aptamer complex. (A) Cartoon representation of the structure of the NOX-D20 aptamer bound to C5a (PDB:4WB2; Yatime et al. 2015). The C5a four-helix bundle is highlighted by a translucent molecular surface. The antiparallel duplex and G4 structural elements of NOX-D20 are represented in white and salmon, respectively. Green spheres denote Ca2+. (B) Schematic of the NOX-D20 quadruplex.
Prions are proteinaceous infectious particles composed of the prion protein (PrP), and are the etiological agents of multiple neurodegenerative diseases, including bovine spongiform encephalopathy, scrapie in sheep, and Creutzfeldt-Jakob disease (Prusiner 1998). Several aptamers have been produced by in vitro selection that bind preferentially and with high affinity to isoforms of PrP (Weiss et al. 1997; Proske et al. 2002; Rhie et al. 2003; Sekiya et al. 2005; Nishikawa et al. 2007; Murakami et al. 2008), some of which are potential therapeutic agents (Proske et al. 2002; Rhie et al. 2003; Mashima et al. 2013). Among these in vitro selected aptamers, the RNA r(GGAGGAGGAGGA) (R12) was shown by solution NMR to fold into a two-tiered G4 that forms a “tail-to-tail” homodimer in solution (Fig. 8A; Mashima et al. 2009, 2013). Recently, the solution NMR structure of an RNA aptamer in which two R12 sequences are linked in tandem was reported; it resembles the R12 structure (Mashima et al. 2020). In the structure of the peptide-bound R12 complex, the peptide interacts directly with the solvent-exposed face of the G-quartet and with loop nucleotides. Compared with the other G4 RNA aptamers discussed above, R12 has the simplest structure and mostly conforms to canonical RNA G4 stereochemistry. R12 nucleotides adopt the anti conformation and parallel connectivity but have predominantly C2′-endo puckers (Fig. 8B). The G-quartet positioned at the dimer interface forms a hexad through the addition of two loop adenosines, which extend the dimer interface.
The solution NMR structure of the DNA counterpart of R12 instead shows a heptad at the dimer interface, a difference that was proposed to result from differences in sugar pucker (Matsugami et al. 2001; Mashima et al. 2009).
FIGURE 8.
PrP Aptamer. (A) Cartoon representation of the three-dimensional structure of an anti-PrP aptamer bound to a small peptide from the soluble isoform of the protein (PDB:2RSK; Mashima et al. 2013). The PrP-derived peptide is highlighted by its molecular surface. Dashed red line denotes the dimer interface of the aptamer. Adenosines that extend the interfacial G-quartet into a hexad are colored magenta. (B) Schematic of the PrP aptamer illustrating the canonical RNA G4 stereochemistry.
THE STRUCTURAL SYNTAX OF RNA G-QUADRUPLEXES
The recent structure determinations of elaborate RNA G4s show that, once emancipated from the requirement that G-stacks be assembled from consecutive guanines, RNA G4s gain greatly expanded structural possibilities. (A summary of structural parameters observed in the RNA aptamers discussed can be found in Table 2.) The structures described above highlight that, by increasing their sequence complexity, RNA G4s can overcome the stereochemical constraints imposed by the ribose 2′-OH and gain additional conformational freedom. For instance, TERRA has low sequence complexity (Fig. 2D), resulting in a G4 structure that conforms to canonical RNA G4 stereochemistry, whereas the Spinach turn-on aptamer (Fig. 4B) comprises multiple bulges and a 34-nt diagonal loop, which support the intricate architecture of its quadruplex.
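The contrast between G-stacks built from consecutive guanines and those that are not can be made concrete with a simple pattern match. The sketch below assumes the commonly used motif for putative G4 sequences, four G-tracts of at least three guanines separated by loops of 1 to 7 nucleotides; the thresholds and function names are assumptions for illustration, not criteria defined in this review. A TERRA-like UUAGGG repeat satisfies the rule, the R12 sequence satisfies it only when two-guanine tracts are allowed, and a hypothetical sequence with a bulge-interrupted tract fails it.

```python
import re


def canonical_g4_regex(min_tract=3, max_loop=7):
    """Four runs of >= min_tract guanines separated by loops of 1..max_loop nucleotides."""
    tract = f"G{{{min_tract},}}"          # e.g. G{3,}
    loop = f"[ACGU]{{1,{max_loop}}}"      # e.g. [ACGU]{1,7}
    return re.compile(tract + (loop + tract) * 3)


def fits_contiguous_tract_rule(seq, min_tract=3, max_loop=7):
    """True if the sequence contains a G4 motif built only from contiguous G-tracts."""
    rna = seq.upper().replace("T", "U")   # accept DNA-style input as well
    return canonical_g4_regex(min_tract, max_loop).search(rna) is not None


terra_like = "UUAGGGUUAGGGUUAGGGUUAGGG"
r12 = "GGAGGAGGAGGA"
bulged = "GGGUUAGAGGUUAGGGUUAGGG"     # hypothetical: second tract interrupted by an A

print(fits_contiguous_tract_rule(terra_like))
print(fits_contiguous_tract_rule(r12), fits_contiguous_tract_rule(r12, min_tract=2))
print(fits_contiguous_tract_rule(bulged))
```

By construction, a pattern like this recognizes only contiguous G-tracts, so it cannot anticipate G-stacks assembled from nonconsecutive guanines such as those of Spinach, Corn, or the Mango aptamers.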
TABLE 2.
Summary of structural features of representative simple and elaborate G-quadruplexes
The results of genome-wide sequencing suggest that bulges are the most abundant noncanonical structural feature in putative G4 sequences (Chambers et al. 2015). In most of the elaborate RNA G4 structures discussed above, bulges of varying lengths interrupt the G-tracts. Comparison of these bulged nucleotides suggests that they prefer the anti conformation and the C2′-endo sugar pucker. Thus far, only the Mango-IV (Fig. 6F) and NOX-D20 (Fig. 7B) structures have bulged nucleotides that are syn or C3′-endo. Bulges present in DNA and RNA G4 structures exhibit expanded ranges of backbone torsion angles (Meier et al. 2020), which would help overcome the steric barriers imposed by the ribose 2′-OH. The recent RNA G4 structures suggest that inversion of the strand polarity of adjacent guanines (within a G-stack) requires intervening bulges or loops. This is apparent in the inverted G-quartets of Mango-I, Mango-II, and Mango-IV, where bulges precede the local inversion of strand polarity (Fig. 6B,C,F). Spinach (Fig. 4B) also has a bulge and a long loop preceding a guanine with inverted polarity. Interestingly, the RGG-motif–Sc1 complex (Fig. 3B), the Corn turn-on aptamer (Fig. 5B), and the NOX-D20–C5a complex (Fig. 7B) have guanines with inverted polarity immediately before a loop or bulge. In these structures, some of the inverted guanines are directly connected. In all cases, inversion of strand polarity is accompanied by a change in sugar pucker. Nucleotides in the syn conformation also appear to occur preferentially after a bulge or loop.
Another noncanonical structural feature exhibited by some of the RNA G4s is the inclusion of nucleotides in the plane of a quartet (forming pentads, hexads, etc.). Quartets with inclusions have been observed at the interfaces of dimers formed by two stacked protomers (Kolesnikova and Curtis 2019; Lightfoot et al. 2019), and the RNA G4s discussed here follow the same pattern. Inclusions were shown to be indispensable for formation of the d(GGA)4 homodimer (Matsugami et al. 2001). Comparison of the RNA G4s with inclusions shows that the included nucleotide prefers the C2′-endo sugar pucker but shows no preference between the anti and syn conformations. Additionally, an inclusion also appears to affect the sugar pucker of the subsequent guanine in the G4. As shown by Mango-I (Fig. 6B), Mango-II (Fig. 6C), and the anti-PrP aptamer (Fig. 8B), the guanine following the inclusion prefers to be anti and C2′-endo.
Noncanonical quartets adjacent to G-quartets have now been observed in several structures (e.g., RGG–Sc1, Fig. 3; Spinach, Fig. 4; Corn, Fig. 5). These appear to be integral to the structures of these aptamers, serving to connect the G-quadruplexes to other structural elements of the RNAs, as do base triples (e.g., in the structures of Spinach and Mango-III). Although much of the structural work on G4s to date has concentrated on isolated quadruplexes, many quadruplexes are likely to be part of more elaborate structures; as such, the structural solutions for transitioning from a four-stranded structure to other nucleic acid structures (in particular the most common one, the A-form duplex) will likely reveal intricate and idiosyncratic arrangements of nucleotides, not necessarily limited to guanines. Based on recent structures, we suggest that the apparent stereochemical simplicity of G4s, and specifically of RNA G4s, is probably a misrepresentation arising from the difficulty of preparing biochemically tractable samples. This may also resolve the apparent paradox that G4s appear to be more common among in vitro selected than among natural RNAs. That is, cellular G4s may have evolved to be energetically metastable, and therefore challenging to characterize structurally.
Consistent with this is the existence of many helicase proteins (Mendoza et al. 2016) capable of recognizing and destabilizing G4s. The future study of functional nucleic acids, either natural or artificial, that incorporate quadruplexes is likely to increase our understanding of the complex structural syntax of RNA G-quadruplexes, and their interplay with other RNA structural elements.
Nikita Vasilyev; Anna Polonskaia; Jennifer C Darnell; Robert B Darnell; Dinshaw J Patel; Alexander Serganov. Proc Natl Acad Sci U S A, 2015-09-15.
Anh Tuân Phan; Vitaly Kuryavyi; Jennifer C Darnell; Alexander Serganov; Ananya Majumdar; Serge Ilin; Tanya Raslin; Anna Polonskaia; Cynthia Chen; David Clain; Robert B Darnell; Dinshaw J Patel. Nat Struct Mol Biol, 2011-06-05.
Alexandre Rhie; Louise Kirby; Natalie Sayer; Rosanna Wellesley; Petra Disterer; Ian Sylvester; Andrew Gill; James Hope; William James; Abdessamad Tahiri-Alaoui. J Biol Chem, 2003-08-05.
Alexis Autour; Sunny C Y Jeng; Adam D Cawte; Amir Abdolahzadeh; Angela Galli; Shanker S S Panchapakesan; David Rueda; Michael Ryckelynck; Peter J Unrau. Nat Commun, 2018-02-13.
Sajad Shiekh; Golam Mustafa; Sineth G Kodikara; Mohammed Enamul Hoque; Eric Yokie; John J Portman; Hamza Balci. Proc Natl Acad Sci U S A, 2022-07-18.