Sizing up long non-coding RNAs: do lncRNAs have secondary and tertiary structure?

Irina V Novikova, Scott P Hennelly, Karissa Y Sanbonmatsu.

Abstract

Long noncoding RNAs (lncRNAs) play a key role in many important areas of epigenetics, stem cell biology, cancer, signaling and brain function. This emerging class of RNAs constitutes a large fraction of the transcriptome, with thousands of new lncRNAs reported each year. The molecular mechanisms of these RNAs are not well understood. Currently, very little structural data exist. We review the available lncRNA sequence and secondary structure data. Since almost no tertiary information is available for lncRNAs, we review crystallographic structures for other RNA systems and discuss the possibilities for lncRNAs in the context of existing constraints.

Keywords:  HOTAIR; MALAT; RNA; RNA structure; cancer; epigenetics; hormone receptor; lincRNA; lncRNA; long noncoding RNA; non-coding; secondary structure; structural biology

Year:  2012        PMID: 23267412      PMCID: PMC3527312          DOI: 10.4161/bioa.22592

Source DB:  PubMed          Journal:  Bioarchitecture        ISSN: 1949-0992


Introduction

A new class of RNAs has recently emerged as a key player in many rapidly growing areas of research, including epigenetics, hormone signaling, development, stem cell biology, cancer, brain function and plant biology. The growth of this area has been fueled by recent advances in sequencing technology. These RNAs (long non-coding RNAs, or lncRNAs) are typically 1,000–10,000 residues in length. LncRNAs are often polyadenylated, transcribed by RNA polymerase II and spliced. While some lncRNAs are found in the cytoplasm, most are localized in the nucleus. Many lncRNAs are associated with histone methylation, chromatin remodeling and subsequent epigenetic effects. In the field of epigenetics, the mechanism by which epigenetic factors find their targets remains largely a mystery. The importance of lncRNAs has been underscored in the context of mammalian genomes, where recent evidence suggests that lncRNAs may provide a missing epigenetic link between DNA, histones and methylation factors. In humans, over 70% of the genome is actively transcribed. In contrast, protein-coding genes constitute only 1–2% of the genome. The active transcription of non-protein coding genes gives rise mainly (80–90%) to lncRNAs. While some lncRNAs, such as MALAT1, are highly abundant transcripts, many lncRNAs show low transcript counts. However, low transcription levels do not necessarily reflect lack of functionality. Studies on the stability of lncRNAs have shown that lncRNAs have stabilities comparable to those of mRNAs (albeit slightly less on average). Here, time scales range from 30 min up to 16 h in the case of MALAT1. Protein half-lives range from 30 min to 2 h. We note that transcription rates range from 1–50 kb/min. From these data, a picture of the nucleus emerges, where lncRNAs are synthesized in minutes and may persist for hours.
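The claim that lncRNAs are "synthesized in minutes" follows directly from the quoted transcription rates (1–50 kb/min) and typical lengths (1,000–10,000 residues). A minimal sketch of that arithmetic is given below; the function and constant names are illustrative and not part of the original text.

```python
# Back-of-the-envelope synthesis times from the figures quoted in the text.
# The helper name and constants are illustrative assumptions.

def synthesis_time_min(length_nt: float, rate_kb_per_min: float) -> float:
    """Minutes needed to transcribe `length_nt` nucleotides at a given rate."""
    if rate_kb_per_min <= 0:
        raise ValueError("rate must be positive")
    return (length_nt / 1000.0) / rate_kb_per_min

RATES_KB_PER_MIN = (1.0, 50.0)   # slowest and fastest quoted transcription rates
LENGTHS_NT = (1_000, 10_000)     # typical lncRNA length range quoted in the text

for length in LENGTHS_NT:
    slowest = synthesis_time_min(length, RATES_KB_PER_MIN[0])
    fastest = synthesis_time_min(length, RATES_KB_PER_MIN[1])
    print(f"{length} nt: between {fastest:.2f} and {slowest:.2f} min")
```

Even at the slowest quoted rate, a 10,000-residue transcript takes about 10 min, which is short compared with the 30 min to 16 h persistence times quoted above.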
Long noncoding RNAs (lncRNAs) are very broadly defined by two major characteristics: (1) length of the transcript (> 200 nts) and (2) having little or no potential for translation. Some lncRNAs ('macroRNAs') achieve incredible lengths, extending beyond 90 kb. Examples include the 108 kb Air and the 91 kb kcnq1ot1. The term lncRNA is traditionally reserved for regulatory RNAs. LncRNAs are often further divided into categories based on their position relative to neighboring protein-coding genes. Natural antisense transcripts (NATs) are transcribed from the antisense strand of protein-coding genes, overlapping at least one exon. Large intervening noncoding RNAs, also known as long intergenic noncoding RNAs (lincRNAs), are positioned far from protein-coding genes. Intronic noncoding RNAs are uniquely transcribed from intron regions of protein-coding genes in either the sense or antisense direction. Bidirectional lncRNAs are transcribed in the antisense direction in the region of the promoter of a protein-coding gene. An exact estimation of the number of lncRNAs is quite challenging due to their cell-specific, tissue-specific, developmental stage-specific and disease-specific expression profiles. The most recent estimates place the number of lncRNAs in humans at ~15,000. However, tens of thousands of lncRNAs have been profiled this year alone. In terms of function, nascent paradigms of lncRNA action include, but are not limited to, critical regulatory roles in embryonic stem cell pluripotency, brain function, subcellular compartmentalization, and chromatin remodeling. Many have been linked to various diseases, such as cancer. We note that lncRNAs play key roles in intracellular and extracellular signaling (SRA, Gas5, LINoCR, BC1, BORG and NRON) and stress response (e.g., SAT III, PRINS, npc536, hsr omega transcript, gadd7, Hsr1 and bace1as). More detailed discussion of functional aspects of lncRNAs can be found in several excellent recent review articles.
Due to the large sizes of intact lncRNAs relative to typical biophysical systems, very few structural studies of these RNAs have been performed. By comparison, the high-resolution 3-D structure of the intact ribosome (~5 kb in total) required ~25 years for its solution. For lncRNAs, the following questions remain unanswered: (1) Are lncRNAs highly structured or disordered? (2) Do they contain globular sub-domains, or are they organized linearly in chains of stem-loops? (3) Do lncRNAs exist in ribonucleoprotein complexes or as isolated RNAs that transiently interact with proteins? (4) Do these molecules contain a compact core, or are they more extended? Mechanistic studies of lncRNAs have the potential to be more challenging than those of ribosomes, because lncRNAs are neither as highly conserved nor as highly expressed. Nonetheless, RNA molecules are well known to utilize a wide spectrum of functional elements at the sequence, secondary or tertiary level. RNA interference and RNA silencing leverage sequence specificity to control gene expression. Riboswitch RNAs regulate gene expression via secondary structure. The ribosome uses its complex tertiary structure to synthesize proteins, in a manner analogous to a protein-based molecular machine. LncRNAs may or may not use aspects of each of these three mechanisms to regulate gene expression. In light of the tremendous variety of lncRNAs, it is possible that all three of these mechanisms are employed by lncRNA systems. A great deal of useful information can be produced using modern structural biology techniques. In this review, we provide a summary of current knowledge of sequence and structural features of eukaryotic lncRNAs. Although studies of lncRNA tertiary structure have yet to be performed, we examine known crystallographic structures of other RNAs and explore the possibilities that might occur in lncRNA systems.

Sequence Elements in lncRNAs

Some lncRNAs rely on Watson-Crick base pairing for functional activity. This may be in the form of 'perfect' pairing, where a stretch of the lncRNA forms a continuous sequence of Watson-Crick base pairs with another RNA molecule, such as an mRNA. There are also lncRNAs that implement 'imperfect' pairing, where a stretch of Watson-Crick pairs is interspersed with non-Watson-Crick base pairs. Finally, evidence exists for regions of lncRNAs directly interacting with DNA. In some cases, it has been suggested that base pairing between the RNA and DNA occurs, while in other cases, triple helix mechanisms have been proposed.
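The distinction between 'perfect' and 'imperfect' pairing can be made concrete with a small sketch. The code below aligns two equal-length RNA segments antiparallel and reports whether every position, only some positions, or no position forms a Watson-Crick pair. The function names and the example sequences are hypothetical and chosen for illustration only; real lncRNA-target interactions also involve gaps, bulges and wobble pairs that this toy classifier does not model.

```python
# Toy classifier for 'perfect' versus 'imperfect' Watson-Crick pairing.
# Sequences and function names are illustrative assumptions.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def pairing_profile(lnc_segment: str, target_segment: str) -> list:
    """Per-position Watson-Crick test for two segments aligned antiparallel.

    Both segments are written 5' to 3'; the target is reversed so that
    position i of the lncRNA faces its antiparallel partner.
    """
    if not lnc_segment or len(lnc_segment) != len(target_segment):
        raise ValueError("segments must be non-empty and of equal length")
    partner = target_segment.upper()[::-1]
    return [(a, b) in WATSON_CRICK for a, b in zip(lnc_segment.upper(), partner)]

def classify_pairing(lnc_segment: str, target_segment: str) -> str:
    """Return 'perfect', 'imperfect' or 'none' for the aligned duplex."""
    profile = pairing_profile(lnc_segment, target_segment)
    if all(profile):
        return "perfect"
    if any(profile):
        return "imperfect"
    return "none"

# AUGGC paired with its reverse complement GCCAU: continuous Watson-Crick pairs.
print(classify_pairing("AUGGC", "GCCAU"))
# One G-U wobble in the middle interrupts the Watson-Crick stretch.
print(classify_pairing("AUGGC", "GCUAU"))
```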

miRNA-sequestering lncRNAs

These lncRNAs provide alternative miRNA binding sites to regulate expression levels of protein-coding genes post-transcriptionally. Linc-MD1, involved in muscle differentiation, acts as a competing endogenous RNA (ceRNA), sequestering miR-133 and miR-135 from their target genes.

Half-STAU1-binding site RNAs (1/2-sbsRNAs)

These lncRNAs bind to mRNA 3′-UTRs via Alu elements in a process known as STAU1-mediated mRNA decay. This event involves imperfect base pairing between the Alu element of the lncRNA and the Alu element of the mRNA. The interaction is recognized by the dsRNA-binding protein STAU1 and results in degradation of the mRNA.

Antisense lncRNAs

These lncRNAs may bind to mRNAs, regulating their splicing. The long noncoding Zeb2NAT transcript originates antisense to the 5′ splice site of mRNA Zeb2. The transcript is known to prevent splicing of this mRNA region, preserving the internal ribosome entry site for efficient translation.

Upstream lncRNAs

These lncRNAs may form triplex complexes with DNA promoter regions. One such lncRNA originates upstream of the dihydrofolate reductase gene (DHFR) in humans. Here, upstream transcription is initiated at the alternative minor promoter site, resulting in the decreased occupancy of transcription factors at the major promoter and subsequent repression of gene expression. Moreover, the noncoding RNA product that originated upstream was found to interact directly with a major DNA promoter site, forming a purine-purine-pyrimidine triplex.

Secondary Elements of lncRNA Structure

In addition to sequence, RNA secondary and tertiary structural motifs often play a central role in the mode of action of RNA, be it specific binding, allosteric, catalytic or structural. While few structural studies of lncRNAs exist, we describe below the hints of structure that have been uncovered to date.

Double stem loops in chromatin remodeling

Many lncRNAs have been shown to play an important role in chromatin remodeling. Large-scale identification of functional lncRNAs has resulted from their association with chromatin proteins. For example, a CLIP-seq investigation of RNA associated with the SFRS1 splicing factor uncovered > 6,000 spliced noncoding RNAs with unknown function. While the full functional repertoire of lncRNAs remains to be delineated, it is clear that lncRNAs play a critical role in chromatin remodeling, often acting in trans via association with chromatin modifying enzymes. Lee and coworkers identified > 9,000 lncRNAs in mouse embryonic stem cells that interact with the polycomb repressive complex PRC2. EMSA analysis of a number of PRC2-interacting lncRNAs suggested that binding occurs through EZH2, one of the four polycomb protein subunits of PRC2. The remaining three subunits of PRC2 are thought to further tighten this interaction. The large number of identified lncRNAs, including RepA/Xist, HOTAIR and Air, suggests the presence of certain common features across the PRC2-interacting family of lncRNAs. In addition, lncRNAs can also associate with the LSD1/CoREST/REST complex, critical in H3K4 demethylation. The lincRNA HOTAIR is an excellent example of this bifunctionality. HOTAIR is ~2.2 kb in length and regulates the expression of HoxD genes and a number of other genes by recruiting the LSD1 and PRC2 histone modification complexes to targeted loci. Deletion experiments on HOTAIR narrowed the interaction sites down to two modular regions: (1) a 5′ 300 nt region that binds PRC2, and (2) a 646 nt region located downstream at the 3′-end, responsible for binding to LSD1. This raises the question: while two key motifs are located at the 5′ and 3′-ends of HOTAIR, does the remainder of the sequence have functional importance?
The intervening sequence may provide the required spatial separation between the two interaction sites. Alternatively, it may contain motifs necessary for targeting, or comprise additional protein binding motifs required for activity that are yet to be found. It has been suggested that a double stem-loop RNA motif is present in a PRC2-binding region. Binding motifs for LSD1 have not been identified. Similar to HOTAIR, an A-repeat subregion of Xist, known to bind PRC2, has long been thought to form multiple double stem-loop structures, encoded by the repeat sequences located in this region. Recent detailed chemical probing investigations were not consistent with the proposed arrangements of secondary structure elements and suggested a more complex secondary fold, comprising elongated helical subdomains. Until more detailed structural and mechanistic investigations of PRC2-interacting lncRNAs are performed, the critical RNA motifs for association with PRC2 will remain unclear.

Cloverleaf elements in 3′end processing and in brain evolution

A cloverleaf secondary architecture similar to that found in tRNA has been found in many different regions of long noncoding RNAs. One of its roles has been assigned to 3′-end maturation of the lncRNA transcripts involved in subcellular organization. These include the MALAT1 and NEAT1 lncRNAs, involved in the formation of nuclear speckles and paraspeckles, respectively. Non-canonical end maturation of the MALAT1 ncRNA involves a cloverleaf secondary element at its 3′-end. This subregion is the most conserved element in both the MALAT1 and NEAT1 sequences, adopting tRNA-like architecture. This structural element is responsible for recruiting RNase P (involved in maturation of tRNA molecules) for cleavage and generation of mature MALAT1 transcripts. The remaining cleaved fragment is further processed by RNase Z/tRNA nucleotidyl transferase to yield a tRNA-like transcript (mascRNA), which is then shuttled to the cytoplasm. Interestingly, the mature 3′-end of the MALAT1 sequence comprises only a relatively short, genomically encoded poly(A) stretch, suggested to be stabilized by two conserved U-rich motifs located upstream. The details of this interaction are currently unknown. The same mechanism of 3′-end processing has been determined for the NEAT1_v2 transcript, which likewise generates a small, independent tRNA-like molecule, named menRNA. Interestingly, cloverleaf architecture has been found in a subregion of another lncRNA, HAR1 (human accelerated region), associated with neocortex development. This region covers 118 nucleotides, which are highly conserved across vertebrates (2 nt change between chicken and chimpanzee), but more divergent in humans (18 mutations relative to chimpanzee). Rapid changes in sequence homology between chimpanzee and human have been associated with human brain evolution. In vitro structure probing experiments on human (hHAR) and chimpanzee HAR (cHAR) regions showed distinct secondary folds.
In chimpanzee, cHAR adopts a rather unstable and extended hairpin architecture; in humans, hHAR folds into a cloverleaf-type element, consisting of a 4-way junction. The authors have mentioned that the chimpanzee sequence could possibly adopt a cloverleaf architecture, but may require a stabilizing factor, the necessity of which is likely to be diminished in human HAR.

Secondary structures in steroid receptor chemistry

We have recently performed extensive chemical and enzymatic investigations of another long noncoding RNA, the steroid receptor RNA activator, or SRA. This study produced the first experimentally determined secondary structure of an intact human lncRNA to our knowledge (Fig. 1). SRA co-activates several steroid receptors (e.g., ER, AR, TR, GR, RAR), and it is known to interact directly with many proteins (e.g., SHARP, SLIRP, DAX-1, TR), suggesting that it may play a scaffolding role in the transcription complex. SRA has also been shown to interact with CTCF. This RNA was one of the first lncRNAs discovered in humans. Our biochemical probing revealed a complex 2D architecture, comprising four major subdomain regions. The identified secondary elements range from small modular helical regions to complex multiway junctions. In total, SRA contains 25 helical segments, 16 terminal loops, 15 internal loops and 5 junction regions. We have also noticed that purine-rich sequences are highly conserved and often located in single-stranded regions such as terminal, internal and junction loops. The same trend in structural preference is generally observed for rRNA. The vast majority of helices in our structure were validated by covariance analysis using multiple sequence alignment across vertebrates.
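Tallies such as "25 helical segments, 16 terminal loops" can be derived mechanically once a secondary structure is written in dot-bracket notation. The sketch below counts helical segments (runs of directly stacked base pairs) and terminal loops (hairpins) for a toy structure. The dot-bracket string is a hypothetical example, not the SRA structure, which is not reproduced in this text; the function names are likewise illustrative, and pseudoknots are not handled.

```python
# Counting helical segments and terminal loops in a dot-bracket structure.
# The example string is a made-up toy fold, NOT the SRA secondary structure.

def parse_pairs(dot_bracket: str) -> dict:
    """Map each opening position to its closing partner."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return pairs

def count_helices(pairs: dict) -> int:
    """A helix starts at a pair (i, j) that is not stacked on (i-1, j+1)."""
    return sum(1 for i, j in pairs.items() if pairs.get(i - 1) != j + 1)

def count_terminal_loops(dot_bracket: str, pairs: dict) -> int:
    """A terminal loop is closed by a pair enclosing only unpaired residues."""
    return sum(
        1 for i, j in pairs.items()
        if all(ch == "." for ch in dot_bracket[i + 1:j])
    )

toy = "((..((...))..((...))..))"   # outer helix enclosing two hairpins
toy_pairs = parse_pairs(toy)
print(count_helices(toy_pairs), count_terminal_loops(toy, toy_pairs))
```

Internal loops and multiway junctions could be counted along the same lines by inspecting how many helices branch from each enclosed region.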

Figure 1. The first experimentally determined secondary structure of an intact lncRNA, to our knowledge. The steroid receptor RNA activator (SRA) lncRNA contains 4 subdomains, and 25 helices. The structure was determined using four methods of chemical (SHAPE, in-line, DMS) or enzymatic (RNase V1) probing. Covariance analysis based on multiple sequence alignment across vertebrates was used to help validate the structure. In SHAPE probing (selective 2’-hydroxyl acylation analyzed by primer extension), high reactivity corresponds to high mobility and low likelihood for base pairing; low reactivity corresponds to low mobility and high likelihood for base pairing. Orange, high SHAPE reactivity. Yellow, medium reactivity. Grey, low reactivity. Black, no reactivity. Insets: Red, SHAPE reactivity capillary electrophoresis trace for lncRNA. Green, raw blank trace.

Previously Studied Tertiary Structures of Other RNA Systems

Determination of three-dimensional structures of lncRNAs has not been attempted to date. Here, we review previously solved structures of other RNAs and discuss these in the context of lncRNAs. Prior to 2000, the set of three-dimensional RNA structures roughly consisted of tRNA, various ribozyme and aptamer RNAs, the group I and group II introns, RNA helices and quadruplexes, portions of the bacterial ribosome and components of the spliceosome. The initial high-resolution structures of the ribosome published in 2000 spurred a large number of additional RNA crystallographic studies. These include many different riboswitch RNAs, TLS RNA, RNase P, the signal recognition particle, the HIV-1 frame-shifting element, regions of telomerase RNA, as well as many other ribosome constructs. The ribosome and introns remain the only large RNA (> 200 nts) high-resolution crystallographic structures solved to date. The group I and group II introns are isolated compact RNAs characterized by numerous RNA helices capped with RNA stem-loops (Fig. 2A). The helices are connected through various junctions. Tertiary contacts between helices, loops and junctions also exist.

Figure 2. Examples of RNA tertiary structures solved by crystallography. (A) Group II intron, solved by Pyle, Toor and coworkers. The intron is a highly-structured isolated RNA with compact core. (B) Telomerase RNA solved by Skordalakes and coworkers. (C) RNase P solved by Mondragon and coworkers. RNase P is a highly structured RNA with a single protein binding domain. (D) Ribosome, Ramakrishnan and coworkers. The ribosome is a highly structured and highly compact RNA complex containing ~50 proteins that help stabilize the RNA structure. The ribosome contains a limited number of factor binding sites. Different factors bind to the same binding sites, regulating protein synthesis.

An important component of the telomerase complex is telomerase RNA, which is directly bound to telomerase reverse transcriptase and acts as a template for nucleotide additions of telomeric regions (Fig. 2B). Human telomerase RNA is ~450 nts. In yeast, it was shown that the est1p binding domain on the RNA can be moved around to other locations within the RNA while maintaining function, suggesting regions of telomerase RNA are highly flexible. RNase P is a ribonuclease responsible for cleaving a precursor sequence from tRNA. The structure is dominated by the RNA (Fig. 2C). A small protein component increases the affinity of RNase P for tRNA. This RNA is highly structured and compact, similar to the group I and group II introns. The ribosome is the universally conserved molecular machine responsible for protein synthesis (Fig. 2D). In bacteria, the ribosome consists of two subunits. The small subunit (30S) contains a ~1.5 kb RNA (16S rRNA) and ~20 different proteins. The large subunit (50S) contains a ~3 kb RNA (23S rRNA), a ~120 nt RNA (5S rRNA) and ~35 different proteins. The two subunits fit together, producing a large cavity between the two, through which the tRNAs enter and exit.
Most of the protein factors bind to the ribosome at the GTP-associated center or at one of the three tRNA binding sites. For example, both elongation factors EF-Tu and EF-G bind to the same location on the ribosome at different stages of the elongation cycle. The scaffold of ribosome structure is RNA, while many proteins are interspersed throughout the ribosome, providing structural stability to the overall architecture. Functional factors, which help the ribosome proceed through protein synthesis in a GTP-dependent manner, come on and off the ribosome at various stages of protein synthesis.

Macromolecular Complexes and lncRNA Quaternary Interactions

So far, the main evidence for lncRNA participation in quaternary complex formation has been obtained for NEAT1 transcripts. Here, two NEAT1 isoforms (NEAT1_v1: 3.7 kb and NEAT1_v2: 22.7 kb for human) are expressed from the same promoter with similar expression levels. Both isoforms are involved in the formation of specific nuclear compartments called paraspeckles. These are ribonucleoprotein complexes characterized by three distinct proteins (p54, PSF and PSP1), which all contain RNA-binding motifs. The proposed model of paraspeckle association relies on the initial NEAT1_v2 association with PSF and p54, followed by subsequent recruitment of PSP1 and NEAT1_v1. Depletion of NEAT1_v2, p54 or PSF results in paraspeckle disintegration; however, depletion of PSP1 did not affect the architecture. Based on immuno-hybridization and in situ hybridization electron microscopy studies, paraspeckle association requires association of multiple NEAT1 transcripts, creating a fiber-like network. Interestingly, using DNA probes specific for subregions of NEAT1, the 5′ and 3′ ends of NEAT1 were localized to the periphery of paraspeckles, while the central regions of the longer NEAT1 (NEAT1_v2) transcripts were localized to the inner part of these bodies. The NEAT1 arrangements in the inner region appear to be distributed uniformly. The expression of the shorter isoform of NEAT1 (NEAT1_v1), lacking the 3′ end, cannot rescue paraspeckle formation. These observations suggest the presence of certain functional elements in the RNA molecule. Paraspeckles are also formed with a relatively fixed diameter, but vary in their length. Mouse NEAT1 transcripts (shorter by 2 kb compared with human) have a 9% diameter reduction relative to human, leading the authors to conclude that the length of the NEAT1 transcript might also be the limiting factor in this arrangement. However, it remains unclear how the final NEAT1 arrangement in paraspeckles is accomplished.
There are a number of possibilities here. The arrangement may be the direct result of RNA-RNA interactions between NEAT1 transcripts that create a complex macromolecular platform. Alternatively, proteins may serve as bridges to connect multiple NEAT1_v2 transcripts, especially given that p54 is able to form heterodimers with the PSF and PSP1 proteins.

Perspectives on lncRNA Structure and Mechanism

lncRNAs are not likely to exist in ribosome-like ribonucleoprotein complexes

As the ribosome is the only RNA structure > 1 kb solved to date, we ask the question, are lncRNA systems similar to ribosomes in their structural composition? We note that our recent structural study of the SRA lncRNA revealed RNA secondary structure similar to the ribosome in its overall architecture. This study, which used multiple forms of extensive chemical probing combined with multiple sequence analysis, demonstrated that SRA has 4 sub-domains with numbers of helices and loops comparable (in proportional terms) to a ribosome subunit. We currently have no information on the tertiary structure of this lncRNA or other lncRNAs. In addition, we do not know if lncRNAs exist in ribonucleoprotein complexes (RNPs) or as isolated RNAs. To compare with the ribosome, we note that it consists of a few long RNAs complexed with many unique (i.e., non-identical) proteins. The total number of protein-coding genes in the human genome is estimated to be ~21,000. As most proteins reside in the cytoplasm, we can reasonably estimate that the number of proteins in the nucleus satisfies $N_{\mathrm{protein,nucl}} < 21{,}000$. Many thousands of lncRNAs have been identified, with most residing in the nucleus. As a very conservative estimate, we use $N_{\mathrm{lncRNA,nucl}} > 3{,}000$, giving $N_{\mathrm{lncRNA,nucl}}/N_{\mathrm{protein,nucl}} > 1/7$. We note that many lncRNAs are as large as or larger than the ribosome. In the case of the ribosome itself, the ratio of RNA molecules to protein molecules is $N_{\mathrm{rRNA}}/N_{\mathrm{rp}} \sim 1/25$ for each subunit. Thus, even if every single unique protein encoded in the human genome formed a complex with a lncRNA, we would still not expect lncRNAs to be similar in structural composition to ribosomes. There are not enough unique proteins to form ribosome-like complexes (with ~25 unique proteins) for each lncRNA. Using an optimistic estimate that 1 in 10 of all proteins binds to a lncRNA (at most ~2,100 proteins for more than 3,000 lncRNAs), this would still provide less than 1 unique protein per lncRNA.
Therefore, we conclude that lncRNAs in the nucleus are not likely to exist in ribosome-like RNP complexes. A few lncRNAs could theoretically exist in ribosome-like complexes. These complexes would be more likely to exist in the cytoplasm. The following possibilities remain: (1) lncRNAs exist in RNP complexes with many repeats of a few proteins, (2) lncRNAs exist in complexes with only a few proteins or (3) lncRNAs exist as isolated RNAs that transiently bind proteins as needed for function. We note that in the case of (1), to produce a similar protein density (number of proteins per RNA) to ribosomes, we would require ~10 proteins bound per 1 kb of lncRNA (e.g., 90 identical proteins bound to MALAT1). While (1) is certainly possible, we favor (2) or (3).
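The counting argument above can be checked with a few lines of arithmetic. The sketch below uses only the bounds quoted in the text (fewer than 21,000 unique nuclear proteins, more than 3,000 nuclear lncRNAs, ~25 unique proteins per rRNA, ~10 proteins per kb); the function and constant names are illustrative assumptions.

```python
# Worked arithmetic for the "not enough unique proteins" argument.
# All numbers are the bounds quoted in the text; names are illustrative.

N_PROTEIN_NUCL_MAX = 21_000      # upper bound on unique nuclear proteins
N_LNCRNA_NUCL_MIN = 3_000        # conservative lower bound on nuclear lncRNAs
UNIQUE_PROTEINS_PER_RRNA = 25    # ribosome-like composition, per subunit
PROTEINS_PER_KB = 10             # ribosome-like protein density

def unique_proteins_per_lncrna(n_proteins: int, n_lncrnas: int, one_in: int = 1) -> float:
    """Unique proteins available per lncRNA if one in `one_in` proteins binds a lncRNA."""
    return n_proteins / one_in / n_lncrnas

# Best case: every unique protein is available for lncRNA binding.
best_case = unique_proteins_per_lncrna(N_PROTEIN_NUCL_MAX, N_LNCRNA_NUCL_MIN)
# Optimistic case from the text: 1 in 10 proteins binds a lncRNA.
optimistic = unique_proteins_per_lncrna(N_PROTEIN_NUCL_MAX, N_LNCRNA_NUCL_MIN, one_in=10)

print(f"best case:  {best_case:.1f} unique proteins per lncRNA (ribosome-like needs ~{UNIQUE_PROTEINS_PER_RRNA})")
print(f"optimistic: {optimistic:.1f} unique proteins per lncRNA")

# Possibility (1): 90 identical proteins at ~10 per kb corresponds to ~9 kb of RNA.
implied_length_kb = 90 / PROTEINS_PER_KB
print(f"90 bound proteins imply about {implied_length_kb:.0f} kb of lncRNA")
```

Because both bounds are one-sided (fewer proteins, more lncRNAs), the true ratios can only be lower than the values computed here.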

The large diversity of lncRNAs may produce complexes similar in overall form to telomerase RNA, RNase P or the group I and II introns

While we suspect lncRNA complexes are not similar to ribosomes, we cannot rule out similarity to RNase P, telomerase RNA or the group I and group II introns. In the case of an ‘RNase P-like’ complex, the lncRNA would be highly structured and compact, containing a main protein binding site, where various proteins bind (Fig. 3A). Alternatively, the lncRNA could be decentralized without a compact core. It may contain several distinct protein-binding sites and act as a flexible structural tether, as suggested for the telomerase RNA (Fig. 3B). The lncRNA could also be a stand-alone, highly structured RNA, similar to the group I and group II introns. In this case, the lncRNA may transiently bind proteins as needed. Finally, another possibility is a highly disordered RNA, containing loosely organized protein binding domains (Fig. 3C). Our experimentally determined secondary structure of SRA is highly organized and more suggestive of a structure with characteristics of Figure 3A. We enumerate various combinations of secondary structure and tertiary structure in Table 1, with column 1 corresponding to Figure 3A, column 6 corresponding to Figure 3B, and columns 6–7 possibly corresponding to Figure 3C.

Figure 3. Possibilities for lncRNA three-dimensional architecture. These homology models represent concepts for possible lncRNA 3D structures. (A) lncRNA (pink) contains a compact tertiary core. The lncRNA may have a main protein (green) binding site, responsible for binding various protein factors. (B) De-centralized scaffold. In this scenario, the lncRNA does not have a compact core. The lncRNA may have several protein (yellow) binding sites. (C) Loosely organized protein binding domain with regions of unstructured RNA. The lncRNA may contain several long stretches of disordered single stranded RNA.

Table 1. Possibilities for structural configurations of lncRNAs

Column                                1  2  3  4  5  6  7  8
Core secondary structure?             +  +  +  +  +  -  -  -
Binding domain secondary structure?   +  +  +  -  -  +  +  -
Core tertiary structure?              +  +  -  +  -  -  -  -
Binding domain tertiary structure?    +  -  -  -  -  +  -  -

Columns 1–8 represent different RNA structural configurations. Column 1 represents a highly structured configuration similar to the ribosome. Column 8 represents an unstructured RNA. Columns 2–7 represent various intermediate cases. Columns 6–8 represent a decentralized structural configuration. Our recent study demonstrates that the entire SRA lncRNA has well-organized secondary structure, corresponding to columns 1–3, depending on the degree of tertiary structure. All cases may also include single-stranded regions that become organized upon protein binding.
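The eight configurations of Table 1 can be treated as a small lookup structure, which makes statements such as "SRA corresponds to columns 1–3" easy to verify. The sketch below encodes the four rows of the table as given and filters columns by whichever features are experimentally known; the class, field and function names are illustrative assumptions.

```python
# Table 1 encoded as data: which columns are consistent with known features?
# Row strings copy the +/- entries of Table 1; names are illustrative.

from typing import NamedTuple

class Config(NamedTuple):
    core_secondary: bool
    binding_secondary: bool
    core_tertiary: bool
    binding_tertiary: bool

ROWS = {
    "core_secondary":    "+++++---",
    "binding_secondary": "+++--++-",
    "core_tertiary":     "++-+----",
    "binding_tertiary":  "+----+--",
}

# Build one Config per column (columns are numbered 1 to 8 as in the table).
CONFIGS = {
    col + 1: Config(*(ROWS[field][col] == "+" for field in Config._fields))
    for col in range(8)
}

def matching_columns(**known: bool) -> list:
    """Columns whose entries agree with every experimentally known feature."""
    return [
        col for col, cfg in CONFIGS.items()
        if all(getattr(cfg, name) == value for name, value in known.items())
    ]

# SRA: secondary structure is well organized throughout, tertiary structure unknown.
print(matching_columns(core_secondary=True, binding_secondary=True))
```

Supplying additional keyword arguments (for example `core_tertiary=True`) narrows the candidate columns further as tertiary-structure data become available.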

Possibilities for structure-based mechanisms of lncRNAs

Although many more proteins have been studied in mechanistic detail relative to RNAs, a diverse portfolio of RNA mechanisms has emerged, based on either sequence, secondary or tertiary organization of RNA molecules, as well as combinations of these mechanisms. In sequence-based mechanisms, such as RNA interference by siRNAs and RNA silencing by miRNAs, the RNA plays a very minor structural role. Here, the role of RNA is mainly to add sequence specificity to the process, allowing the RISC complex to find its target and trigger a largely protein-based regulation mechanism. Over the past decade, a new mechanism of regulation has emerged, which is almost entirely based on RNA secondary structure. In riboswitch RNA systems, two secondary structures compete with each other to control termination of transcription (some riboswitch RNAs also control translation by sequestering the start codon). Here, one sequence in the 5′-UTR of the mRNA codes for two different secondary structures. The presence or absence of a metabolite selects one of the two structures, switching gene expression on or off. For example, in the case of the SAM-I riboswitch, the presence of a metabolite (SAM) causes the RNA to fold into a compact aptamer, favoring formation of the transcriptional terminator helix, turning gene expression of SAM synthetase off. In the absence of the metabolite, a second, alternative helix is formed, preventing formation of the transcriptional terminator helix, turning gene expression on. Riboswitches are more ‘secondary structure specific’ than ‘sequence specific’, often mandating stochastic context-free grammar algorithms for searches, as opposed to more conventional BLAST-like searches. Interestingly, artificial riboswitch-like systems were first designed and produced in the lab and only later discovered in bacteria.
RNA mechanisms based on tertiary structure are often allosteric and may be described by ‘induced-fit’ or ‘conformational selection’. In induced-fit, an event, such as protein- or ligand-binding, triggers a large conformational change. In conformational selection, the system is often frustrated between two conformations. A protein- or ligand-binding event shifts the equilibrium to one of the two conformations. In the case of the ribosome, many conformational fluctuations occur simultaneously and at different time scales. Protein binding or GTP hydrolysis events act to synchronize the fluctuations, shifting the equilibrium to the next basin in the energy landscape, allowing the ribosome to progress through the elongation cycle.
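The equilibrium shift in conformational selection can be sketched with a textbook two-state model. This is a generic thermodynamic illustration, not a model taken from the studies discussed here. It assumes an RNA that interconverts between conformations A and B with equilibrium constant K = [B]/[A], and a ligand that binds only conformation B with dissociation constant K_d. The function name, parameter names and numerical values are all hypothetical.

```python
# Minimal two-state conformational-selection sketch (ASSUMED textbook model).
# A <-> B with K_conf = [B]/[A]; the ligand binds only B with dissociation
# constant k_d. `ligand` is the free ligand concentration in the same units
# as k_d.


def fraction_in_b(k_conf, ligand, k_d):
    """Fraction of RNA molecules in conformation B (free plus ligand-bound)."""
    # Statistical weight of B relative to A, increased by ligand binding.
    weight_b = k_conf * (1.0 + ligand / k_d)
    return weight_b / (1.0 + weight_b)


# HYPOTHETICAL numbers: B is disfavoured 10:1 without ligand.
K_CONF = 0.1
K_D = 1.0  # arbitrary concentration units

for ligand in (0.0, 1.0, 10.0, 100.0):
    print(f"[L] = {ligand:6.1f}  fraction in B = {fraction_in_b(K_CONF, ligand, K_D):.3f}")
```

Under these assumptions the ligand does not create conformation B. It stabilizes a state the RNA already samples, so the population of B rises smoothly with ligand concentration.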

Time scales and order of events

In addition to the three-dimensional structure of lncRNAs, the order of events and the kinetics of these systems are essential for mechanistic understanding. For example, crystallographic structures have been solved for many ribosome complexes; however, the mechanism of ribosome translocation is still not understood. Rapid kinetics studies define the overall order of events. Single-molecule studies help elucidate the mechanism for transitions between states. A fusion of structural and kinetic information is required to unlock mechanism. Interestingly, the overall order of events can often be obtained before high-resolution crystallographic structures are available. To illustrate potential time scales involved in lncRNA mechanism, we consider the lncRNA DBE-T, a key component of the epigenetic switch associated with facioscapulohumeral muscular dystrophy (FSHD). This lncRNA is a cis-acting tether that recruits the epigenetic factors D4Z4 and Ash1L to the D4Z4 binding element (DBE) on chromatin, driving histone methylation and 4q35 gene transcription. One possible order of events may be (Fig. 4): (B) transcription, (C) lncRNA folding, (D) epigenetic protein binding to the lncRNA, (E) epigenetic protein binding to the chromatin, and (F) action of the epigenetic protein (e.g., histone methylation). Each of these steps will have its own time scale. Identifying the rate-limiting step will yield significant insight into the mechanism of lncRNA action. Other scenarios are also possible, involving different orders of events and different combinations of steps. For other classes of lncRNAs, entirely different events may occur.
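The notion of a rate-limiting step can be illustrated with a short Python sketch that treats the steps as strictly sequential, so that the total time is the sum of the individual time scales. Only the transcription estimate is tied to figures quoted in the Introduction (lengths of 1,000–10,000 residues and transcription rates of 1–50 kb/min). Every other duration below is a hypothetical placeholder, not a measured value for DBE-T or any other lncRNA, and the strictly sequential model is itself an assumption.

```python
# Sketch: time scales for a strictly sequential pathway (ASSUMED model).

LENGTH_NT = 10_000               # upper end of the typical lncRNA length range
RATES_KB_PER_MIN = (1.0, 50.0)   # slowest and fastest quoted transcription rates


def transcription_time_min(length_nt, rate_kb_per_min):
    """Minutes needed to transcribe length_nt nucleotides at the given rate."""
    return (length_nt / 1000.0) / rate_kb_per_min


def summarize(step_times_min):
    """Return (name of slowest step, total time) for sequential steps."""
    total = sum(step_times_min.values())
    slowest = max(step_times_min, key=step_times_min.get)
    return slowest, total


# Durations in minutes. All values except transcription are HYPOTHETICAL
# placeholders chosen only to make the example concrete.
steps = {
    "transcription": transcription_time_min(LENGTH_NT, RATES_KB_PER_MIN[0]),
    "lncRNA folding": 1.0,
    "factor binds lncRNA": 5.0,
    "factor binds chromatin": 30.0,
    "chromatin marking": 15.0,
}

slowest, total = summarize(steps)
print(f"total = {total:.1f} min, rate-limiting step = {slowest}")
```

With these placeholder values, binding to chromatin dominates the total time. Replacing the placeholders with measured rates would indicate which step limits the pathway in a real system, and relaxing the sequential assumption (for example, allowing co-transcriptional folding) would change the total.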

Figure 4. An example of potentially relevant time scales for lncRNA activity: DBE-T lncRNA. DBE-T is a cis-acting tether that recruits epigenetic factors to chromatin. Steps A–F each have an associated timescale, t. (A) Initial state. (B) Transcription of DBE-T. (C) DBE-T folding. (D) The epigenetic factor binds to the lncRNA. (E) The epigenetic factor binds to chromatin. (F) The epigenetic factor marks the chromatin (e.g., histone methylation).

Conclusions

The structural biology of lncRNAs presents a brave new RNA world, where many fundamental questions have not been addressed. With the thousands of new lncRNAs recently discovered in disparate areas of biology, it is likely that a zoo of distinct structural architectures and structural mechanisms will be revealed. These may be sequence-based, secondary structure based, tertiary structure based, or some unusual combination of these. The diverse range of lncRNA structures will be accompanied by a corresponding array of kinetic mechanisms. As RNA molecules are notoriously difficult to crystallize, it may be useful to first apply alternative strategies to gain three-dimensional information about lncRNAs. Ultimately, the identification of common structural features and structure/function relationships will help us understand the role of lncRNAs in development and disease. As with many established therapeutic strategies, mechanistic understanding will help lay the foundation for development of lncRNA-based therapy.