
Secondary structures in RNA synthesis, splicing and translation.

Ilias Georgakopoulos-Soares, Guillermo E Parada, Martin Hemberg.

Abstract

Even though the functional role of mRNA molecules is primarily decided by the nucleotide sequence, several properties are determined by secondary structure conformations. Examples of secondary structures include long range interactions, hairpins, R-loops and G-quadruplexes and they are formed through interactions of non-adjacent nucleotides. Here, we discuss advances in our understanding of how secondary structures can impact RNA synthesis, splicing, translation and mRNA half-life. During RNA synthesis, secondary structures determine RNA polymerase II (RNAPII) speed, thereby influencing splicing. Splicing is also determined by RNA binding proteins and their binding rates are modulated by secondary structures. For the initiation of translation, secondary structures can control the choice of translation start site. Here, we highlight the mechanisms by which secondary structures modulate these processes, discuss advances in technologies to detect and study them systematically, and consider the roles of RNA secondary structures in disease.
© 2022 The Authors.


Year:  2022        PMID: 35765654      PMCID: PMC9198270          DOI: 10.1016/j.csbj.2022.05.041

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   6.155


Introduction

mRNAs are essential molecules in the cell as they are key to extracting information stored in the DNA. Although the function of mRNA molecules is primarily determined by the nucleotide sequence, some properties are determined by secondary structures. Secondary structures are distinct features, including hairpins, long range interactions, G-quadruplexes, R-loops and pseudoknots, that form through the interactions of non-adjacent nucleotides. Their presence can impact various processes involving the mRNA, including synthesis, splicing and translation. Secondary structures are dynamic and can be modulated by multiple proteins, in particular RNA binding proteins (RBPs), and because they cannot be predicted solely from the primary sequence they are challenging to study. Nevertheless, several assays are available for both in vitro and in vivo profiling. In this Review, we summarize these methods, provide an overview of some of the elucidated and putative functional roles of mRNA secondary structures, and discuss their impact on disease. We discuss the consequences of secondary structure formation for splicing and translation, with particular focus on G-quadruplexes, hairpins and long range interactions, and we consider the mechanisms by which secondary structures contribute to the regulation of mRNA splicing and translation initiation.

RNA secondary structure formation

In RNA, intra- and intermolecular long-range interactions, including hairpins, pseudoknots, and G-quadruplexes, are commonly observed. Hairpins are composed of a hybridized stem and a single stranded loop (Fig. 1a and b) and can contain mismatches and bulges. Pseudoknots contain nested stem-loop structures, with half of one stem intercalated between the two halves of another stem. G-quadruplex formation is driven by the inherent propensity of guanines to self-assemble, in the presence of monovalent cations, into planar structures known as G-quartets [1]. Each G-quartet is composed of four guanine nucleotides that interact with each other through Hoogsteen hydrogen bonds. Consecutive runs of guanines (G-tracts) may lead to the formation of consecutive G-quartets that can stack with each other to form G-quadruplex structures (Fig. 1c). Biophysical properties such as the length of intervening loops between consecutive G-runs influence their formation dynamics. In addition, G-quadruplexes can be intramolecular or intermolecular. During transcription, dynamic hybrid structures between DNA and nascent RNA transcripts can be formed, such as R-loops (Fig. 1d) [2]. R-loops are three stranded hybrid structures in which an RNA molecule invades and hybridizes with one DNA strand, while displacing the other. The size of R-loops can range from <100 base pairs to >2000 base pairs [3]. Formation and stabilization of R-loops is particularly favorable when the non-template strand is G-rich, but it can also be promoted by DNA supercoiling, the presence of DNA nicks, and the formation of G-quartets [3], [4]. R-loop formation, as well as the formation of DNA and RNA G-quadruplexes and other secondary structures, affects transcript elongation rates and can have kinetic repercussions on co-transcriptional events involved in RNA processing, such as alternative splicing [5], [6].
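As an illustration of how G-tracts and intervening loops define a candidate G-quadruplex sequence, the following minimal Python sketch scans an RNA sequence for four G-tracts separated by short loops. The thresholds used here (at least three guanines per tract, loops of 1 to 7 nucleotides) are a commonly used heuristic and are an assumption of this sketch, not values taken from the text above; the function name `find_putative_g4` is hypothetical. A match only indicates sequence-level potential and says nothing about whether the structure folds in vitro or in vivo.

```python
import re

# Heuristic motif (assumption of this sketch): four G-tracts of >= 3 guanines,
# separated by three loops of 1-7 nucleotides each.
G4_PATTERN = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def find_putative_g4(sequence):
    """Return (start, matched_sequence) for non-overlapping putative G4 motifs.

    DNA input is accepted and converted to RNA (T -> U) before scanning.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.group(0)) for m in G4_PATTERN.finditer(rna)]


if __name__ == "__main__":
    # Toy sequence, not taken from any real transcript.
    print(find_putative_g4("AAUGGGAGGGCUGGGUUGGGAA"))
```

Because the scan is purely sequence based, it is best read as a way to nominate candidate regions for experimental methods of the kind discussed in this Review.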
Fig. 1

RNA and DNA-RNA hybrid secondary structures. A. Hairpin formation in which the stem hybridizes with hydrogen bonds while the loop remains single stranded. B. A long range interaction with an imperfect hairpin containing a bulge. C. A G-quartet is formed by four guanines linked with Hoogsteen hydrogen bonds with each other (shown as squares in brown). Hoogsteen base pairing is a type of non-Watson–Crick base pairing. G-quadruplexes are formed by the stacking of multiple G-quartets. D. R-loops are three stranded DNA:RNA hybrid structures that can be formed co-transcriptionally at the template strand. The nascent RNA produced by the RNAPII (shown in green) hybridizes with the template strand to form an R-loop structure, while the non-template strand remains single-stranded. Phosphorylation events in the Carboxy-Terminal Domain (CTD) of RNA polymerase II are shown in yellow. In schematics A, B and D thicker lining of the mRNA indicates exonic regions whereas thinner lining indicates intronic regions. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

A number of methods that probe RNA structures have been developed. Methods such as selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE)-seq [7] and parallel analysis of RNA structure (PARS) [8] were able to identify RNA structures in vitro, while more recent methods can deduce structures in vivo [9], [10]. For instance, RNA in situ conformation sequencing (RIC-seq) [11] is a powerful new method that enables global detection of intra- and intermolecular RNA–RNA interactions, such as duplexes and long-range loop-loop interactions. Cross-linking immunoprecipitation high-throughput sequencing (CLIP-seq) enables the investigation of protein interactions with RNA molecules [12] from which many variant technologies have emerged. 
RNA G-quadruplexes can be characterized transcriptome-wide [13], [14] using rG4-seq, which is a modified sequencing method that stalls at RNA G-quadruplexes, enabling identification of RNA G-quadruplexes in vitro, and RNA G-quadruplexes have also been visualized in cellulo using a specific antibody [15]. Moreover, researchers have developed small molecules, such as carboxy-pyridostatin, a cyanine dye called CyT and Thioflavin T [15], [16], [17], [18], [19], that can shift the equilibrium between the folded and unfolded state of RNA G-quadruplexes and which display preference for RNA over DNA G-quadruplexes. Identification of R-loops has been enabled by usage of specific antibodies [20], [21], [22], [23] and other nuclease-based methods [24], [25].

RNA polymerase speed and secondary structures

A variety of features are associated with RNAPII speed. For instance, the presence of introns and the length of the first intron are both positively correlated with RNAPII speed [26], while nucleosome formation can reduce RNAPII speed [27], [28]. Regions with high propensity of forming DNA, RNA, or hybrid secondary structures are also associated with RNAPII pausing or slower RNAPII speed (Fig. 2a and b) [29], [30], [31]. Slower RNAPII speed can in turn remodel structures: for example, hairpin formation can be inhibited through competition with alternative structures, resulting in reduced binding by stem–loop-binding proteins [30]. In S. cerevisiae and S. pombe, folding energy and GC content in the transcription bubble have been correlated with RNA polymerase distribution, and RNA structures within nascent transcripts promote forward translocation of the polymerase and limit back-tracking [32]. Analyses of nascent RNAs have provided evidence that the formation of secondary structures within introns is associated with more efficient co-transcriptional splicing, which is favored under slower transcriptional rates [32], [33]. Taken together, secondary structures can impact several processes, including promoter-proximal pausing, exon recognition, splicing and transcription termination, as they are all influenced by RNAPII speed.
Fig. 2

Mechanisms by which structure formation influences splicing. A. In the absence of secondary structures, RNAPII elongation rate is higher, which disfavors the recruitment of splicing factors that promote assembly of the spliceosome and exon definition. In this situation exons flanked by weak splice sites may not be recognised, and they are consequently skipped. Exons flanked by strong splice sites can be efficiently recognized by small ribonucleoproteins (snRNPs) U1 and U2, leading to the formation of the pre-spliceosome (complex A) and promoting exon definition and inclusion in the mature mRNA transcripts. B. Formation of secondary structures at DNA and RNA can decrease RNAPII elongation speed. For example, during transcription R-loops formed at the 3′ of genes can be stabilized by non-template DNA G-quadruplex formation. Low transcription rates promote exon inclusion by allowing the formation of secondary structures and binding of proteins that can favor the recognition of weak splice sites that would not be recognized otherwise. An RBP that recognizes and binds to the secondary structure is shown in green whereas an RBP whose binding is inhibited by secondary structure formation is shown in red. C. RNA secondary structures can modulate mRNA interactions with RBPs either promoting or inhibiting their binding at the mRNA molecule. For example, G-quadruplexes formed at the DNA or RNA level can selectively recruit RBPs to influence splicing outcome. In schematics A, B and C, thicker lining of the mRNA indicates exonic regions whereas thinner lining is indicating intronic regions. The dashed line of mRNA molecules indicates that the length of the transcript can be longer than displayed. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


RNA splicing and secondary structures

Pre-mRNA splicing is a key biological process that enables the removal of introns and the joining of exons, eventually resulting in a mature mRNA molecule. Alternative splicing affects approximately 90–95% of mRNA transcripts in humans [34], [35] and most often occurs co-transcriptionally [33], while for a minority of transcripts it occurs post-transcriptionally [36]. Splicing is a highly conserved mechanism [37] that is pivotal for a number of biological processes such as cell growth, differentiation, immune response and neuronal development [38], [39], [40], while aberrant splicing is implicated in multiple diseases [41] including neurological disorders [42] and cancer [43]. Splicing is mediated by the spliceosome complex, which recognizes splice signals, the key ones being the 5′ splice site (5′ss), the 3′ splice site (3′ss), and the branch point. The recognition of these consensus sequences is directed by U1 and U2 small nuclear ribonucleoproteins (snRNPs) and other auxiliary protein factors that are involved in early spliceosomal assembly. Since higher-eukaryotic genes are often interrupted by long introns, early spliceosomal complexes assemble over exons and recognize both splice sites in a process commonly known as exon definition [37]. Nevertheless, computational analyses of vertebrate splice sites have shown that the consensus splicing signals only account for approximately half of the information required to accurately define exon/intron boundaries [34], suggesting that other regulatory elements such as RBP sites and secondary structures are crucial for splice site definition. Splice sites with sequences that are substantially different from the consensus signals are recognized suboptimally (weak splice sites) and are often associated with alternative splicing events. 
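To make the notion of a weak splice site concrete, the sketch below counts how many positions of a candidate 5′ splice site agree with the textbook consensus for major (U2-type) introns, MAG|GURAGU, where M is A or C and R is A or G. The consensus string is standard background knowledge rather than something given in the text above, and a simple match count is only a crude proxy chosen for illustration: it is not the scoring scheme used by the computational analyses cited here, which rely on richer statistical models. The function name `consensus_matches` is hypothetical.

```python
# IUPAC codes needed for the 5' splice site consensus used in this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "M": "AC", "R": "AG"}

# Exon positions -3..-1 followed by intron positions +1..+6 (MAG|GURAGU).
CONSENSUS_5SS = "MAGGURAGU"


def consensus_matches(site):
    """Count positions of a 9-nt 5' splice site that agree with the consensus.

    Accepts RNA or DNA letters; a higher count suggests a 'stronger' site
    under this deliberately simplified scoring.
    """
    rna = site.upper().replace("T", "U")
    if len(rna) != len(CONSENSUS_5SS):
        raise ValueError("expected a 9-nt site: 3 exonic + 6 intronic bases")
    return sum(base in IUPAC[code] for base, code in zip(rna, CONSENSUS_5SS))


if __name__ == "__main__":
    # Toy sites, not taken from any annotated gene.
    for candidate in ("CAGGUAAGU", "AAGGTGAGT", "UCCGUUCAC"):
        print(candidate, consensus_matches(candidate))
```

Under this toy scheme, a site matching all nine positions would be considered strong, while a site that retains little beyond the invariant GU would be a candidate weak site whose usage may depend on RBPs, secondary structure and elongation rate.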
Recent deep learning models can predict splicing events to a large extent from the primary DNA sequence and can integrate the effects of mutations [44], [45]. Even though the RNA structural code has been less explored [46], it is known that the effects of cis-regulatory elements can be modulated by the presence of RNA structures in nascent transcripts and in mature mRNAs [47]. Co-transcriptional transient RNA structure formation can impact splicing through RNAPII pausing and backtracking, which can have a direct kinetic effect on co-transcriptional splicing events [48]. One such example is the human ATE1 gene, where splicing of two mutually exclusive exons is regulated by competing long-range hairpin structures that span up to 30 kb [49]. Mutations that disrupt each of the secondary structures shift the equilibrium between the two exons, indicating direct control of splicing outcome. Reduction of transcription rates can further favor the formation of RNA secondary structures [30] and the binding of splicing regulatory factors that can increase splicing efficiency, thereby allowing the recognition of exons that are flanked by weak splice sites and would otherwise be skipped [5], [50] (Fig. 2a and b).

The interplay between RBPs and secondary structures

During the mRNA lifecycle, RBPs regulate to a significant extent diverse transcriptional and post-transcriptional stages including splicing, transport, translation, stability and degradation. They bind to pre-mRNA molecules in the nucleus and regulate their maturation and transport to the cytoplasm, where they regulate translation and degradation. The number of proteins that can bind to RNA in humans is estimated to be more than 1,500, adding complexity to all the aforementioned processes [51]. RBPs can facilitate or inhibit the recognition of splice sites, thereby acting as splicing enhancers or splicing silencers [46], [52], [53]. The majority of RBP motifs are not bound in vivo, as demonstrated by high-throughput experiments that identify the sites where RBPs bind to endogenous RNAs, such as cross-linking immunoprecipitation followed by high-throughput sequencing (CLIP-seq). One possible explanation is that RNA structures provide additional contextual features beyond the primary motif sequences (Fig. 2b and c), and it has also been shown that RNA secondary structure is predictive of binding [54], [55]. Several studies have shown that during pre-mRNA synthesis the formation of RNA structures influences alternative splicing by diverse mechanisms [56], [57], and that local RNA structure formation can impact splicing by modulating the accessibility of core splicing signals [58], [59], [60] as well as RBP binding sites [58], [61], [62]. An example of how RNA secondary structures can dictate the binding of specific RBPs is provided by MBNL1 and U2AF65, whose binding influences inclusion of the fifth exon of TNNT2 [63], [64]. MBNL1 favors hairpins and, when bound, prevents U2AF65, which favors a linear structure, from binding the polypyrimidine tract, resulting in exon skipping. Additional evidence from mice shows that MBNL1 also binds the hairpin structure of exon F in TNNT3. 
Another example is eIF3, which recognizes and binds to hairpin structures in 5′UTRs to exert translational activation or repression [65]. Other studies have shown preferential binding of RBPs at RNA G-quadruplex sites, e.g. CNBP, which prevents RNA G-quadruplex structure formation and promotes translation [66], and FMRP, which preferentially binds RNA G-quadruplex structures [66], [67]. Secondary structures and RNA binding proteins have been systematically investigated, enabling the identification of preferences of structured RNA for particular proteins [68], [69]. Interestingly, a recent genetic study showed that G-quadruplex sequences at 5′UTRs are selectively constrained and are enriched for RBP sites and for eQTLs, which are loci containing genetic variants that change the expression level of a gene [70].

Helicases as key regulators of secondary structures

Structure formation is to a large extent modulated by enzymes such as eIF4A and DHX29, which can unwind secondary structures, and their importance is demonstrated by their pivotal role in translation initiation [71], [72]. Similarly, the continuous activity of DNA/RNA helicases and ribonucleases H (RNase H1 and H2) resolves R-loop structures [3]. Interestingly, R-loops and G-quadruplexes were both found to be unwound by the helicase DHX9 in humans [73]. DHX9 activity protects single-stranded DNA against damage and preserves genomic stability [74]. RNA G-quadruplexes are known to interact with several proteins [70], [75], [76]. For example, the RNA helicase RHAU (also known as DHX36) resolves mRNA G-quadruplexes [77], [78]. One of its targets is a G-quadruplex at the 5′UTR of Nkx2-5 mRNA, and it has been shown that DHX36-mediated G-quadruplex structure unfolding is required for the gene to be expressed [79]. Another DHX36 target is Gnai2 mRNA, a key regulator of stem cell function and muscle regeneration [78]. DHX36 and DHX9 were also found to modulate translational efficiency by resolving 5′UTR RNA G-quadruplexes [80], while several RBPs such as hnRNP H/F and helicases such as DDX21, DDX17, DDX3X, DDX5 and DDX1 have been found to unwind RNA G-quadruplexes and are also involved in transcription, splicing and translation regulation [81], [82], [83], [84]. Similarly, multiple helicases have been shown to resolve hairpin structures. For instance, UPF1 can resolve RNA hairpins [85], while DDX5 can resolve DNA and RNA G-quadruplexes as well as hairpin structures [86], [87] (Table 1).
Table 1

Important helicases that play a role in unwinding RNA and DNA secondary structures. G4s in the table refer to G-quadruplexes. This is a non-exhaustive list of relevant DNA/RNA helicases. Additional examples are reviewed by [92], [93], [94]. Alternative gene names are listed in parentheses and gene paralogs with homologous functions are separated by “/”.

| Gene name | Target | Molecular function | Associated phenotype upon loss of function experiments |
| --- | --- | --- | --- |
| PIF1 | DNA G4 | Prevents genome instability associated with DNA G4s and R-loops [95], [96]. | Absence or deficiency of PIF1 increases replication stress and induces DNA damage [95], [96]. |
| ERCC2 | DNA G4 | XPD is involved in nucleotide excision repair [97]. Evidence suggests that its helicase activity unwinds G4 during transcription [98]. | Knock down of XPD results in accumulation of G4s [99]. |
| BLM | DNA G4; D-loops; Holliday junctions | Unwinds a variety of DNA structures that emerge during DNA replication, recombination and repair [100]. | Loss of function mutations lead to Bloom syndrome [101]. Absence of BLM is associated with genome instability and excess of sister chromatid exchange events at G4 loci [102]. |
| WRN | DNA G4; R-loops | Prevents genome instability associated with DNA G4s and R-loops [103], [104]. | WRN loss of function leads to accumulation of G4s and expression changes associated with G4-containing promoters [105]. |
| DHX9 (DDX9) | RNA G4; R-loops; H-DNA | Involved in DNA replication, transcription and translation [106]. Resolves R-loop and H-DNA structures to promote genomic stability [107], [108], [109]. Unwinds RNA G4s to control translation [80]. | Absence of DHX9 promotes back-splicing events and induces translational repression of transcripts containing inverted-repeat Alu elements [110]. |
| DHX36 | DNA/RNA G4 | Activates transcription by resolving DNA G4s at promoters [111], [112]. Unwinds RNA G4s to control translation [80], [113] and miRNA biogenesis [114]. | Formation of stress granules and increased protein kinase R (PKR) phosphorylation [113]. Reduced telomerase efficiency and shorter telomeres [115]. Higher UV sensitivity due to lack of p53 expression [116]. |
| DDX5/DDX17 | DNA/RNA G4; RNA hairpins | Paralogues that encode helicases that resolve RNA hairpins and G4s, having a regulatory role in alternative splicing and translation [84], [86], [117]. DDX5 also resolves DNA G4s that control gene transcription [87]. | Knock out leads to mouse embryonic lethality [118]. DDX5/DDX17 absence impairs splicing and miRNA biogenesis during neuronal differentiation [119]. |
| DDX21 | RNA G4; R-loops | Involved in ribosomal RNA biogenesis and anti-viral immune response [120], [121], [122]. | DDX21 knock down results in increased expression of genes with G4 motifs in their 3′UTR [83]. |
| DDX1 | RNA G4 | Converts RNA G4 into R-loops [81]. | DDX1 deficiency impairs class switch recombination in B cells [81]. |
| DDX2A/DDX2B (EIF4A1/EIF4A2) | RNA hairpins; RNA G4 | Paralogues that encode the two subunits of the eukaryotic translation initiation factor 4A (eIF4A). These helicases resolve RNA hairpins and G4s located at the 5′-UTR, which has an impact on mRNA translation efficiency. | DDX2A plays an essential role in spermatogenesis, whereas DDX2B is essential for mouse viability [123]. |
| DDX41 | R-loops | Resolves R-loops that emerge during transcription [124]. | R-loop accumulation and genomic instability due to knock down of DDX41 [124]. |
| DDX39B (UAP56) | R-loops | Spliceosomal helicase with roles in the removal of R-loops [125]. | R-loop accumulation, genomic instability and replication fork stalling [125]. |
| SETX | R-loops | Senataxin removes R-loops to maintain genome integrity [126]. | Knock down of Senataxin results in an increase in R-loops downstream of the poly(A) signal [127]. |
| AQR (EMB4) | R-loops | Intron-binding spliceosomal factor with helicase activity that contributes to R-loop removal [128], [129]. | Genome instability and deficiency in co-transcriptional gene silencing pathways mediated by small RNAs [129], [130]. |
The cellular mechanisms mediating the stabilization and resolution of RNA secondary structures remain incompletely understood, as do the interactions between secondary structures and protein complexes. In addition, the effect of perturbing these mechanisms and their relevance to disease progression are unclear. High throughput screens coupled with short hairpin RNAs (shRNAs) or CRISPR-based technologies have enabled systematic interrogation of the roles of diverse proteins, such as RBPs, helicases, and topoisomerases [88], [89], [90], [91]. Furthermore, mutational analysis with CRISPR-Cas9 could be used to study the effects of secondary structure disruption in vivo or in cellulo. CRISPR-induced mutations that destroy the secondary structure motifs, for example the G-runs of G-quadruplexes or the stem sequence of hairpins, but leave other regulatory sequences such as RBP motifs unchanged, could advance the understanding of how secondary structures determine gene expression.
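The design principle in the preceding paragraph, disrupting G-runs while leaving other regulatory sequences untouched, can be sketched in a few lines of Python. Everything in this example is an illustrative assumption: the function name `disrupt_g_runs`, the choice of G-to-A substitutions, the definition of a G-run as three or more guanines, and the representation of protected regions (for example annotated RBP motifs) as half-open coordinate intervals. It is not a published design procedure, and any real design would also need to check that the substitutions do not create or destroy other motifs.

```python
import re

# A G-run is taken here to be three or more consecutive guanines (assumption).
G_RUN = re.compile(r"G{3,}")


def disrupt_g_runs(sequence, protected=()):
    """Break every G-run by substituting every third G with A.

    protected: iterable of (start, end) half-open intervals that must not be
    mutated, e.g. coordinates of RBP motifs. A run whose target position is
    protected is left intact, so the result should always be re-checked.
    """
    seq = sequence.upper()
    bases = list(seq)
    blocked = {pos for start, end in protected for pos in range(start, end)}
    for run in G_RUN.finditer(seq):
        # Mutating positions start+2, start+5, ... leaves at most two
        # consecutive guanines anywhere within the original run.
        for pos in range(run.start() + 2, run.end(), 3):
            if pos not in blocked:
                bases[pos] = "A"
    return "".join(bases)


if __name__ == "__main__":
    wild_type = "GGGAGGGCUGGGUUGGG"  # toy sequence
    mutant = disrupt_g_runs(wild_type)
    print(mutant, "G-runs left:", bool(G_RUN.search(mutant)))
```

Comparing the wild-type and mutant sequences in a reporter would then separate effects of the structure-forming G-runs from effects of the protected elements, under the assumptions stated above.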

G-quadruplexes as regulators of alternative splicing

G-quadruplex sequences are enriched at promoters and they have been extensively studied in this context [131]. Additionally, G-quadruplexes have been related to splicing, 3′ processing, transcription termination, RNA localization and translation regulation [76]. Interestingly, it has been shown that G-quadruplex sequences are highly enriched in the proximity of both 3′ and 5′ splice sites across a wide range of species. The effect is more pronounced at the non-template strand, suggesting that the G-quadruplexes are formed primarily by the RNA and that they may favor or block the binding of RBPs [132]. One of the first examples of RNA G-quadruplex mediated regulation of alternative splicing was found in the hTERT gene, which encodes the catalytic subunit of the telomerase enzyme; one of its exon skipping events is promoted by the stabilization of intronic G-quadruplexes [133]. Gomez and colleagues hypothesized that RNA G-quadruplex formation can prevent RBP binding to intronic enhancers, leading to exon skipping. However, based on different functional assays, RNA G-quadruplex formation has also been proposed to promote RBP binding to splicing regulatory elements [134], [135], [136]. Since G-quadruplex-dependent splicing events were often demonstrated by introducing mutations at G-quadruplex motifs, it was unclear from these results whether the G-quadruplex structure or the linear form of these G-rich sequences acts as a splicing enhancer. To disentangle these effects, Huang and colleagues showed that mutations that prevent intronic G-quadruplex formation but keep G tracts intact lead to exclusion of an alternative exon in the CD44 gene [137]. Since the CD44 intronic G-quadruplex motif sequence can be bound by two RBPs that have opposite effects on exon exclusion, RNA G-quadruplex formation may function as a switch to promote the binding of one RBP over the other [138]. 
In another recent study where the role of wild-type and mutated G-quadruplex sequences in alternative splicing was tested using a minigene, it was also shown that the presence of an RNA G-quadruplex favors exon inclusion [132], consistent with the aforementioned findings. There is also evidence of an interplay between RNA G-quadruplex stabilization and specific binding proteins such as HNRNP H/F [116], [137] and HNRPU [139] and recent studies suggest that RNA G-quadruplex formation can modulate in vitro RBP binding to mRNA molecules [66]. The genome-wide effect of RNA G-quadruplex formation over splicing factor binding remains unclear. High-throughput screening of chemical compounds via dual-color splicing reporters has identified two small molecules, emetine and cephaeline, that disrupt RNA G-quadruplex formation [140]. Genome-wide evaluation of emetine effects on alternative splicing showed substantial alternative splicing changes after treatment, with nearly 60% being exon skipping events. It was also shown that multiple RBPs colocalize with G-quadruplex motifs flanking splice junctions, suggesting an interplay between RBP binding and RNA G-quadruplex structure formation, which was further corroborated by loss of function experiments followed by RNA-seq, identifying consistent associations for 36 RBPs [132], [137].

Hairpins enable long range RNA interactions during splicing

Long range interactions are important for splicing modulation [141], and they are enriched at weak alternative acceptor splice sites [142]. Some long range interactions can span several kilobases and can bring otherwise distant splice sites into proximity. One of the best-characterized examples of regulation of splicing through RNA structures can be found in D. melanogaster for the DSCAM gene, where RNA-RNA interactions, mediated through multiple structures, regulate the selection of exons within arrays of mutually exclusive exons [143], [144]. In this case, RNA looping can bring splicing elements situated thousands of bases away from each other into close proximity. Hairpins may also directly affect exon skipping events by a mechanism known as “looping-out”, whereby inter-intronic base-pairing RNA interactions can loop out exons to promote their skipping [56]. This mechanism is supported by the enrichment of conserved complementary sequences present in intronic regions flanking exon skipping events [145]. Moreover, the artificial introduction of self-complementary regions across exons suppresses exon inclusion in yeast, suggesting a causal relationship between hairpins and exon skipping [146]. Interestingly, the expansion of self-complementary regions is related to the primate-specific Alu retrotransposon, which is enriched in regions flanking alternative exons, suggesting a role in splicing regulation [147]. During back-splicing, an unconventional splicing mechanism, the second nucleophilic attack is directed at an upstream 3′ splice site, leading to circular RNA (circRNA) products. circRNAs are particularly abundant in the brain, and RNA structures that favor back-splicing are often derived from complementary intronic sequences associated with Alu elements [148]. In zebrafish, hairpin formation between dinucleotide repeats that co-occur at opposite boundaries of an intron mediates splicing without U2AF2, which is a major component of the spliceosome [149]. 
The formation of RNA structures can also enhance RBP regulatory range by bringing distal regulatory elements into close proximity with their exon targets [150]. This can be particularly important for RBFOX2-regulated exons, since more than half of RBFOX2-binding sites are found over 500 bp away from any annotated exon, and it has been shown that long-range RNA hairpin formation is necessary for the regulatory effect of distal binding sites [151]. It has also been shown that hairpin formation can influence splicing regulatory protein binding, with enhancers and silencers having a stronger effect when present in the loop relative to the stem [52], [54], suggesting that RBP binding is inhibited at the stem [58], [61]. In an elegant set of experiments using minigene assays, it was shown that in the case of FGFR2, the formation of a hairpin structure is required for efficient splicing from two mutually exclusive exons, and that its splicing effect does not depend on its primary nucleotide composition [152]. The fibronectin EDA exon is controlled by seven hairpins, and a key exonic splicing enhancer is found in the loop of one of the hairpins, which is in turn bound by splicing regulatory proteins such as SRSF1 [153], [154]. Other examples include a hairpin that modulates the inclusion of the alternative exon 6B of the β-tropomyosin transcript in chicken [155]. It was also shown that a mutation in PS2 that deletes or destabilizes a hairpin in exon 5 results in higher levels of exon inclusion [156]. Importantly, the formation of hairpin structures can be dynamic and respond to environmental changes, an example being the temperature-dependent formation of a hairpin that controls splicing of the APE2 gene in yeast [157]. In addition, alternatively spliced exons display an enrichment for secondary structures, and the evolutionary conservation of many of these structures indicates their important regulatory functions [57]. This is exemplified by conservation of secondary structures over the primary nucleotide sequence, such as a conserved hairpin structure in RB1CC1 [57]. Advances in long-read RNA sequencing technologies will enable improved detection of long-range interactions and their impact on the regulation of alternative splicing events.
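The "looping-out" idea described in this section rests on complementary sequences in the introns flanking an exon. As a minimal sketch of how such candidate pairings could be listed, the Python snippet below searches two intron sequences for short segments that are perfect reverse complements of each other. The function names, the seed length k and the restriction to Watson-Crick pairs are illustrative assumptions and not a method taken from the studies cited above; real analyses additionally consider G-U wobble pairs, mismatches, conservation and folding energies.

```python
# Illustrative sketch only: names, the seed length k and the perfect-match
# criterion are assumptions made for this example.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (Watson-Crick only)."""
    return rna.translate(COMPLEMENT)[::-1]


def complementary_segments(upstream_intron: str, downstream_intron: str, k: int = 8):
    """Return (i, j, kmer) tuples where the k-mer starting at position i of the
    upstream intron is the reverse complement of the k-mer starting at
    position j of the downstream intron, i.e. the two could base-pair."""
    # Index every k-mer of the downstream intron by its start positions.
    index = {}
    for j in range(len(downstream_intron) - k + 1):
        index.setdefault(downstream_intron[j:j + k], []).append(j)

    hits = []
    for i in range(len(upstream_intron) - k + 1):
        kmer = upstream_intron[i:i + k]
        for j in index.get(reverse_complement(kmer), []):
            hits.append((i, j, kmer))
    return hits


if __name__ == "__main__":
    # Toy sequences: the two introns share one 8-nt complementary segment.
    up = "AAAAGGCUCAGCAAAA"
    down = "CCCCGCUGAGCCCCCC"
    for i, j, kmer in complementary_segments(up, down):
        print(f"upstream {i}: {kmer} pairs with downstream {j}")
```

A hit only indicates that pairing is possible in sequence terms; whether the structure forms in vivo depends on competing structures, RBP binding and transcription kinetics, as discussed above.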

The role of RNA structures on RNA stability and decay

The half-life and decay rates of mRNA transcripts in human cells influence protein expression levels. A number of features determine transcript stability, including GC content, transcript length, polyA tail length, RBP sites, microRNA binding sites, and mRNA secondary structures [158], [159], [160], [161], [162], [163]. Structural features of mRNAs dictate to a large extent mRNA half-life, with transcripts that have a structured coding sequence showing higher expression levels [159]. Hairpins in mRNA transcripts can result in increased stability [163], [164], [165], such as when found at the 3′UTR near mRNA cleavage sites. The accessibility of microRNA sites influences mRNA half-life, and secondary structure formation can change the microRNA binding efficiency [166]. For example, the introduction of a hairpin in the 5′UTR of a transcript results in substantial increases in gene expression [167], [168]. Constitutive decay elements are RNA motifs that mediate the destabilization and degradation of mRNA molecules, and contain a hairpin sequence [169] at which Roquin proteins bind to induce the decay of the transcript [170]. Massively parallel reporter assays are high-throughput technologies that enable rapid measurements of thousands of sequences for their regulatory activity and have received widespread adoption in recent years [171], [172], [173], [174]. Multiple variants of this technology have been implemented to study a plethora of gene regulatory elements, including promoters, enhancers, 5′ UTRs, and 3′ UTRs, by placing synthetic sequences in the appropriate location relative to a reporter gene. For constitutive decay elements, massively parallel reporter assay experiments have shown that the destabilizing effect increases as a function of the hairpin length [165].

Secondary structures in translation

Translation can be divided into four phases: initiation, elongation, termination and ribosome recycling [175], [176]. Initiation is the rate-limiting and most regulated step, consisting of several complex programs. The regulation of translation directly impacts protein levels, with most regulatory mechanisms affecting the rate-limiting initiation step [177], [178], [179]. The multifarious effects of translational control can be observed across biological processes including development, differentiation, functions of the nervous system and disease [177], [180]. Initiation can be either cap-dependent or cap-independent [181], [182]. Cap-dependent translation is the most frequently used mechanism in eukaryotes and starts with the binding of eIF4E to the mRNA cap. The most common cap-independent initiation mechanism, often utilized by viral RNAs, involves an internal ribosome entry site (IRES) of structured mRNA. IRES structures can recruit ribosomal subunits and eukaryotic initiation factors [183]. RNA molecules fold in complex configurations, with the presence of RNA secondary structures in the 5′UTR being a major determinant of the rate of translation (Fig. 3a and b) [184], [185], [186]. Moreover, the ribosome itself is a major remodeler of RNA structure [187]. Lower translation rates can not only limit protein abundance, but can also enable correct co-translational protein folding [188], [189]. In addition, secondary structures can influence the recognition of IRESs (Fig. 3c).
Fig. 3

Mechanisms by which RNA structure formation influences translation. A. During cap-dependent translation, translation initiation factors (blue proteins) recognize the mRNA 5′ cap structure (purple circle) and bridge its interaction with the 3′ polyA tail, through polyA binding proteins (PABPs). During translation several helicases actively unwind the mRNA, which could remove secondary structures. This could lead to faster ribosome speeds, which may result in protein misfolding. B. Cap-dependent translation can be regulated by the dynamic formation of secondary structures in the 5′ UTR. Hairpin formation can limit the binding of the ribosome and translation initiation factors, thereby repressing protein translation. The presence of G-quadruplexes in the 5′ UTR may inhibit translation directly, activate upstream ORFs, or promote translation. C. Cap-independent translation can take place in the presence of IRESs, which require highly structured 5′UTR domains that indirectly interact with PABPs to promote mRNA circularisation. Some IRES structures can be activated by RNA G-quadruplex formation. Further formation of RNA secondary structures across the ORF can limit the translation speed and favor a step-by-step modular folding. Additional details on cap-dependent and cap-independent mechanisms are comprehensively reviewed in [234]. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Although the vast majority of eukaryotic translation start sites have an AUG codon, often the first AUG codon is bypassed, resulting in usage of more distal AUG codons and alternative protein isoforms. This process is referred to as leaky scanning, with a proportion of ribosomes initiating translation from downstream start codons. Leaky scanning and translational efficiency are influenced by the presence of secondary structures [8], [190], [191], [192].
Moreover, there is a large proportion of suboptimal start sites that do not contain the canonical start codon. Microsatellite expansions can cause non-AUG initiation [193]. These non-AUG start sites are often associated with alternative translation start [194], [195]. Ribosome profiling is one of the primary methods of identifying the occupancy of elongating ribosomes on mRNAs, therefore providing a direct readout of ribosome decoding rates [176]. Secondary structures can conceal or expose binding sites for translation regulators, and it has been shown that certain RBPs bind preferentially at structured RNA while others have a preference for linear forms [196]. Moreover, formation of secondary structures can change the distance between translation-associated motifs, an example being the distance between the stem-loop and the cap [197]. Secondary structure formation can also promote cap-independent translation, and the disruption of an IRES hairpin can in turn reduce translation efficiency in viral [198], [199] and eukaryotic [200] mRNAs. Riboswitches are components of mRNA molecules that can bind a small molecule and directly control gene expression through RNA conformational changes, without proteins being involved. They are found in both prokaryotes and eukaryotes, with most discovered riboswitches being present in bacteria and archaea [201]. A riboswitch consists of two domains, an aptamer and an expression platform. The aptamer is a receptor for a small molecule, and it is usually located in the 5′UTR of an mRNA, where it forms a secondary structure that binds to the small molecule. The expression platform is the regulatory domain of the riboswitch and it modulates gene expression upon binding of the small molecule. Riboswitches have been found to regulate a number of processes including initiation of translation [202], mRNA decay [203], transcription termination [204] and splicing [205], [206]. For instance, in E. coli, when lysine is present the lysine riboswitch restricts translation initiation and also exposes RNase E cleavage sites [203]. RNA structures can directly interact with the translational machinery and influence the recognition of the translation start [207]. Note that the interaction is complicated by the fact that the translational machinery can unwind and remodel RNA structures [187]. There is also decreased translational efficiency at highly structured 5′UTRs [80], [208]. For example, in the case of BRCA1, a tumor suppressor gene, a longer 5′UTR isoform is expressed only in breast cancer cells, resulting in a 10-fold decrease of translational efficiency due to the formation of a stable complex secondary structure [208]. Finally, the interplay between RNA structure formation and unwinding influences ribosome initiation, scanning and elongation. Therefore, secondary structures can account for differences between mRNA and protein levels [209].

Hairpins enable long range RNA interactions in translation initiation

Early studies indicated that hairpin formation can influence translation efficiency [210]. Hairpins with high thermal stability upstream of the translation start site resulted in reduced translation by up to 85–95%, whereas hairpin formation downstream of an AUG at specific positions resulted in an increase in translation rate by facilitating recognition of initiator codons by ribosomes [211], [212]. Stem length and GC content, both of which increase thermal stability, inhibit translation, while more distant hairpins have a smaller inhibitory effect [213]. Other studies have also indicated that both the GC content of the stem and the positioning of the hairpin relative to the translation start site dramatically influence the translation efficiency [207]. Hairpins at the 5′UTR of ferritin-H and ferritin-L mRNAs act as an iron-responsive element controlling iron levels and are highly dynamic response elements to environmental changes [214]. Another example is a hairpin structure in the c-JUN 5′ UTR which is recognized by eIF3 and is required for initiation of translation [215]. Another study generated a library of half a million 50 bp long 5′UTRs and identified hairpin structures to negatively impact protein levels, especially those with longer stems and shorter loops [216].
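The hairpin properties discussed above (stem length, stem GC content, loop length and position relative to the translation start site) can be tabulated from a 5′UTR sequence. The Python sketch below is a deliberately simple illustration: it lists perfect Watson-Crick stem-loops and reports those four properties for each. The function and field names, the default thresholds and the absence of G-U wobble pairs, bulges or free-energy calculations are all assumptions of this example; it does not reproduce the analyses in the cited studies, which typically rely on thermodynamic folding models.

```python
from typing import List, NamedTuple

# Illustrative sketch only: perfect Watson-Crick stems, hypothetical thresholds.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


class Hairpin(NamedTuple):
    start: int                # 0-based index of the first (5') stem base
    end: int                  # 0-based index of the last (3') stem base
    stem_len: int             # number of base pairs in the stem
    loop_len: int             # number of unpaired bases in the loop
    stem_gc: float            # fraction of G/C among the stem bases
    dist_to_start_codon: int  # bases between the hairpin and the UTR 3' end


def find_hairpins(utr: str, min_stem: int = 4, min_loop: int = 3,
                  max_loop: int = 8) -> List[Hairpin]:
    """List candidate stem-loops in a 5'UTR given as an RNA string.

    The UTR is assumed to end immediately before the start codon, so
    dist_to_start_codon counts the bases 3' of the hairpin within the UTR.
    Nested sub-stems that still satisfy min_stem are reported separately.
    """
    n = len(utr)
    hits = []
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            left = loop_start - 1          # innermost 5' stem base
            right = loop_start + loop_len  # innermost 3' stem base
            stem = 0
            # Extend the stem outwards while the bases can pair.
            while (left - stem >= 0 and right + stem < n
                   and (utr[left - stem], utr[right + stem]) in PAIRS):
                stem += 1
            if stem >= min_stem:
                start, end = left - stem + 1, right + stem - 1
                stem_bases = utr[start:left + 1] + utr[right:end + 1]
                gc = sum(b in "GC" for b in stem_bases) / len(stem_bases)
                hits.append(Hairpin(start, end, stem, loop_len, gc, n - 1 - end))
    return hits


if __name__ == "__main__":
    # Toy 5'UTR with a 4-bp GC-only stem and a 4-nt loop.
    for hp in find_hairpins("AAGGCCGAAAGGCCAAAAA"):
        print(hp)
```

Sorting the resulting records by stem_len, stem_gc or dist_to_start_codon gives a rough way to compare candidates along the axes that the studies above report as relevant for translation efficiency.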

G-quadruplexes in translation initiation

RNA G-quadruplexes are enriched at 5′UTRs (Huppert et al. 2005) where they show a higher frequency at the template strand, suggesting a relative depletion of G-quadruplexes at the RNA level [217]. There is also a difference in the density of G-quadruplexes, with the highest density being found within 50 bp of the start of the 5′UTR and a declining frequency moving away from it [217]. It has been shown that G-quadruplexes in the 5′UTR of mRNAs are inhibitory elements [218], and several studies have since shown that G-quadruplexes at the 5′UTR interfere with the recognition by ribosomes [17], [219], [220], [221], [222], [223]. Specifically, experiments involving luciferase plasmid vectors indicate that G-quadruplexes inhibit expression across 5′UTR regions, perhaps by interfering with ribosome scanning. However, in many of these experiments the researchers used controls where guanines had been substituted for uracils, potentially also interfering with RBP binding sites and the GC content [218], [219]. It has also been shown that G-quadruplexes at 5′UTRs of eukaryotic genes can promote translation by favoring recognition of the IRES [224], [225], [226], [227]. In FGF-2, a gene that is associated with tissue development and repair, a G-quadruplex motif together with two hairpin sequences are found within the IRES, and they promote translation in a cap-independent translational program [225]. A G-quadruplex site in the mRNA encoding the RBP FMRP is a binding site for the protein itself, and it has been suggested that FMRP could in this way control both its own expression levels [228] and its mRNA splicing [134]. In VEGF, an RNA G-quadruplex was shown to be essential for IRES-mediated translation initiation [227], [229], [230]; however, other studies have contested its role and provided evidence for inhibitory functions [231], [232].
A study that used massively parallel reporter assays to investigate mRNA translation found that G-quadruplexes in the 5′UTR act as translational inhibitors, and that knockdown of G-quadruplex resolving helicases aggravated these phenotypes [233]. It was also found that RNA G-quadruplex formation could promote the usage of an upstream translation start site by slowing down the pre-initiation complex scanning [80]. The role of secondary structures was systematically explored in a high-throughput experiment where half a million 50 bp randomly generated 5′UTRs were synthesized and tested in yeast. The results showed that several secondary structures, including RNA G-quadruplexes and hairpins, are important contributors to expression levels [216]. RNA G-quadruplexes can either restrict or promote the recognition by ribosomes and even though there are more studies indicating inhibitory functions, it is not clear which effect is more widespread and what features determine if the G-quadruplex will restrict or promote ribosomal recognition.
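Candidate G-quadruplex sites such as those discussed in this section are often located first by a sequence motif search. The Python sketch below uses the widely used consensus of four runs of at least three guanines separated by loops of 1 to 7 nucleotides. This pattern is an assumption brought in for illustration and is not stated in the text above; the function name is likewise hypothetical. A motif match only marks a putative site: it does not show that the structure folds in cells, and non-canonical G-quadruplexes are missed.

```python
import re

# Assumed consensus for putative G-quadruplex-forming sequences (PQS):
# four G-runs (>= 3 guanines each) separated by three loops of 1-7 nt.
PQS_PATTERN = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def find_pqs(sequence: str):
    """Return (start, end, motif) for non-overlapping PQS matches.

    The input may be RNA or DNA; T is converted to U before matching.
    Coordinates are 0-based, with `end` exclusive.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.end(), m.group()) for m in PQS_PATTERN.finditer(rna)]


if __name__ == "__main__":
    # Toy 5'UTR carrying one minimal motif.
    print(find_pqs("AAGGGAGGGAGGGAGGGAA"))
    # Four copies of the GGGGCC hexanucleotide repeat also satisfy the pattern.
    print(find_pqs("GGGGCC" * 4))
```

Because `finditer` reports non-overlapping matches scanned from left to right, densely packed G-runs yield fewer reported sites than the number of possible alternative foldings.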

Splicing and translation associated secondary structures in disease

Regions that are predisposed to secondary structure formation, such as G-quadruplexes, have an excess of germline and somatic mutations [235], [236]. The functional role of these structures is supported by the observation that eQTLs are enriched at G-quadruplexes within 5′UTRs and splicing quantitative trait loci (sQTLs) are enriched at G-quadruplex motifs flanking splice sites [70], [132]. The accumulation of R-loops is also associated with genomic instability [237], [238], [239], [240]. As secondary structure formation modulates diverse processes including splicing and translation initiation, changes in the mRNA structure have been associated with and can result in human disease. Mutations of alternative splicing factors can lead to R-loop accumulation, which may compromise genomic stability and be relevant in the context of cancer pathogenesis [241], [242]. RNA splicing perturbation by expression of U2AF1 or SRSF2 mutants, mutations that are commonly observed in myelodysplastic syndrome, results in the accumulation of R-loops [243]. In the MAPT gene, also known as tau, there is a hairpin structure at the interface between exon 10 and intron 10 which can mask the splice site [244], [245], and DDX5 was found to be involved in the resolution of this hairpin structure, controlling splicing of MAPT (tau) exon 10 [86]. Mutations at the hairpin result in its destabilization, causing inclusion of exon 10 due to increased association with U1 snRNP [244] and a higher prevalence of neurodegeneration. Hairpin sequences were also identified in the 5′UTR of other transcripts including the amyloid precursor protein [246] and α-synuclein [247], indicating the importance of structure-mediated control of expression levels. In spinal muscular atrophy, a stem-loop RNA structure overlaps with the 5′ splice site of exon 7 of SMN2, and interference with the structure formation is a therapeutic target against the spinal muscular atrophy molecular phenotype [248]. Sulovari et al. showed that variable number tandem repeats were particularly enriched at Alu elements and found an association with genes that are differentially spliced or expressed between human and chimpanzee brains [249]. RNA G-quadruplex structures have been identified in several cancer genes, including TP53 and TERT, where they can modulate splicing and protein isoforms [133], [135]. In CD44 an RNA G-quadruplex in intron 8 functions as a splicing enhancer with roles in the control of the epithelial–mesenchymal transition [137], a process that is important for cancer metastasis [250]. One of the canonical translation initiation factors, eIF4A, is a DEAD-box RNA helicase that can unwind secondary structures, including RNA G-quadruplexes, and its activity is correlated with the number of secondary structures in the 5′UTR [251]. Perturbation of eIF4A can contribute to oncogenesis as it results in formation of RNA G-quadruplexes in the 5′UTRs of mRNAs targeted by eIF4A, including many oncogenes, transcription factors, and epigenetic regulators [252]. The expansion of microsatellite repeats at 5′UTRs has been associated with aberrant translation and has been implicated in multiple disorders [193], [253]. The mechanisms involve the formation of secondary structures that interfere with translation and repeat-associated non-AUG translation. One of the most well-studied examples is the expansion of the hexanucleotide GGGGCC in the first intron of the C9orf72 gene, which results in frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS). These repeats form different secondary structures including G-quadruplexes, R-loops and hairpins [254], [255], [256], which lead to aborted transcription at the repeat site [254].
Expansion of these repeats results in repeat-associated non-AUG translation and the generation of toxic dipeptide proteins [257], while reducing DHX36 levels in cells derived from C9orf72-linked ALS patients results in reduced dipeptide protein burden due to the formation of RNA G-quadruplexes [258]. In ALS and FTD, Nucleolin binds to the G-quadruplex forming hexanucleotide repeat, resulting in its mislocalization in the cell [254]. In addition, a number of other proteins associated with the ALS pathology such as TDP-43, FUS/TLS, hnRNPA1, hnRNPA2B1, hnRNPA3 and EWSR1 interact with the RNA G-quadruplex [259], [260], [261], [262], [263], [264]. Encouragingly, G-quadruplex binding small molecules ameliorate the pathologies associated with ALS and FTD in model systems, indicating that RNA G-quadruplexes can serve as a therapeutic target [265]. Beta-amyloid precursor protein cleaving enzyme 1 (BACE1) encodes a protein that cleaves amyloid precursor protein, resulting in the generation of amyloid-beta peptide, the accumulation of which is a hallmark of Alzheimer’s disease [266]. An RNA G-quadruplex in exon 3 of BACE1 modulates splicing by inhibiting the binding of hnRNP H, thereby promoting a shorter isoform without the proteolytic activity that creates the neurotoxic peptide [267]. ADAM-10 is also associated with Alzheimer’s disease due to its anti-amyloidogenic activity, and an RNA G-quadruplex in its 5′UTR represses its expression [268].

Concluding remarks

RNA secondary structures are pervasive, interact with RNA binding proteins and are linked to a large number of important functions, including transcription, splicing and translation. Even though the functional importance of secondary structures has been repeatedly demonstrated, the contribution of RNA structures in these processes remains incompletely understood due to the difficulties in identifying dynamic RNA structures and their mechanisms of action. High-throughput technologies enable the systematic investigation of RNA secondary structures and the design of experiments to quantify their contribution in transcription, splicing and translation enables directly testing their mechanisms of action. New methods to dynamically identify RNA secondary structures are gradually revealing their widespread and diverse contributions in gene regulation. However, it remains difficult to capture their dynamic changes across cellular conditions and their interplay with proteins. The degree to which RNA secondary structure formation is influenced by the tissue and cell type remains largely unstudied. The availability of large scale single cell assays will enable the investigation of associations between secondary structures, the presence of various sequence motifs, and expression levels of RBPs across different cell types. Even more interesting could be the combination of single cell technologies with different small molecules that stabilize specific structures.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

