Michael J Yonkunas, Nathan J Baird. Department of Chemistry & Biochemistry, University of the Sciences, Philadelphia, Pennsylvania 19104, USA.
Abstract
The 3' end of the ∼7 kb lncRNA MALAT1 contains an evolutionarily and structurally conserved element for nuclear expression (ENE) which confers protection from cellular degradation pathways. Formation of an ENE triple helix is required to support transcript accumulation, leading to persistent oncogenic activity of MALAT1 in multiple cancer types. Though the specific mechanism of triplex-mediated protection remains unknown, the MALAT1 ENE triplex has been identified as a promising target for therapeutic intervention. Interestingly, a maturation step of the nascent lncRNA 3' end is required prior to triplex formation. We hypothesize that disruption of the maturation or folding process may be a viable mechanism of inhibition. To assess putative cotranscriptional ENE conformations prior to triplex formation, we perform microsecond MD simulations of a partially folded ENE conformation and the ENE triplex. We identify a highly ordered ENE structure prior to triplex formation. Extensive formation of U•U base pairs within the large U-rich internal loops produces a global rod-like architecture. We present a three-dimensional structure of the isolated ENE motif, the global features of which are consistent with small angle X-ray scattering (SAXS) experiments. Our structural model represents a nonprotective conformation of the MALAT1 ENE, providing a molecular description useful for future mechanistic and inhibition studies. We anticipate that targeting stretches of U•U pairs within the ENE motif will prove advantageous for the design of therapeutics targeting this oncogenic lncRNA.
RNAs arising from noncoding regions comprise roughly two-thirds or more of the human genome (Woo 2018). A growing number of long noncoding RNAs (lncRNAs) actively alter gene expression and promote cellular dysfunction in cancer cells (Arun et al. 2018). Regulation of lncRNA transcripts is not fully understood but is thought to follow a deadenylation-dependent degradation pathway (Conrad et al. 2006; Thompson and Parker 2007; Geisler et al. 2012; Tycowski et al. 2012). Uniquely, several viral and mammalian lncRNAs, including those without a canonical poly(A) tail, maintain high transcript levels by evading degradation (Conrad et al. 2006; Brown et al. 2012; Tycowski et al. 2012, 2016; Wilusz et al. 2012). An evolutionarily conserved RNA element for nuclear expression (ENE) near the 3′ ends of these transcripts confers stability. This ENE motif functions as a cis-acting RNA stability element and has been identified across diverse genomes including viruses, plants, fungi, and mammals (Tycowski et al. 2016). Reports indicate stabilization of both lncRNA and mRNA transcripts by the ENE motif, which leads to evasion of cellular degradation pathways, resulting in high levels of transcript accumulation in the nucleus and cytoplasm (Wang et al. 1999; Conrad and Steitz 2005; Muhlrad and Parker 2005; Conrad et al. 2006; Garneau et al. 2008; Brown et al. 2012; Wilusz et al. 2012). The protective function of the ENE is dependent on formation of a triplex between a large U-rich internal loop and the transcript 3′ A-rich tail (Mitton-Fry et al. 2010; Brown et al. 2012; Wilusz et al. 2012). Sequestration of the 3′ end within the triplex precludes degradation in vitro (Ageeli et al. 2018). Two ENE triplex crystal structures have revealed the RNA 3′ end embedded within a triple helical architecture (Mitton-Fry et al. 2010; Brown et al. 2012). 
The ENE triplex is further stabilized by stacking atop canonical duplexes forming a continuous duplex–triplex–duplex structure.
The lncRNA MALAT1 is overexpressed in many human cancer types (Ji et al. 2003; Lin et al. 2007; Lai et al. 2012) and high transcript levels are dependent on the protective ENE triplex (Brown et al. 2012; Wilusz et al. 2012). Maturation of lncRNA MALAT1 involves cleavage of a tRNA-like domain on the 3′ end of the nascent MALAT1 transcript by RNase P followed by formation of the ENE triplex with the matured A-rich tail. The speed of processing and triplex formation is not precisely known, but the dynamic interconversion between an isolated ENE and a well-formed triplex regulates in vitro degradation (Ageeli et al. 2018). Because the nuclear abundance of the oncogenic MALAT1 is regulated by the ENE triplex, this small 94-nt protective element may be a viable drug target (Brown et al. 2012). Furthermore, the crystal structure of the MALAT1 ENE triplex supports direct structure–function relationship studies. Overall, the small MALAT1 ENE triple helix (M1TH) is readily tractable by both experimental and computational methods, facilitating detailed investigations of triplex-mediated RNA protection and the discovery of potential inhibitory mechanisms to ameliorate cancer progression supported by MALAT1.
Singular structures do not define RNA functions in cells; the interplay between alternate conformations determines RNA function and ultimately cellular regulation (Russell et al. 2002; Mahen et al. 2005; Wickiser et al. 2005; Dethoff et al. 2008; Serganov and Nudler 2013; Saldi et al. 2018). With the discovery of an increasing number of lncRNAs connected to cancer and neurobiological dysfunction (Johnsson and Morris 2014; Huarte 2015; Arun et al. 2018), targeting cotranscriptional structures using high-throughput screening is increasing in popularity (Ganser et al. 2018). 
In fact, owing to the exceptional thermostability of M1TH following triplex formation (Brown et al. 2012; Ageeli et al. 2018), targeting cotranscriptional structures of the MALAT1 ENE may represent a feasible approach to drug discovery, where at minimum, a putative highly ordered structure of the isolated ENE motif may become the focus of therapeutic interventions. Several questions remain unanswered regarding putative cotranscriptional structures of the ENE motif prior to triplex formation. Specifically, the absence of the A-rich tail may render the large U-rich internal loop devoid of structure or may bring the U-rich regions in close proximity to form nonnative interactions. The functional implications of either structural arrangement have not been adequately investigated. Previous reports suggest residual structure within the U-rich internal loop. A temperature-dependent peak in the NMR spectrum of an isolated ENE motif, lacking the triplex-forming A-rich sequence, was implicated in formation of U•U pairs within the U-rich loop (Mitton-Fry et al. 2010). Similarly, UV melt analysis of the isolated ENE motif indicates residual structure within the U-rich regions (Ageeli et al. 2018).
Here we directly investigate the structure and dynamics of the MALAT1 ENE triplex core () and a partially folded ENE conformation () using microsecond molecular dynamics (MD) simulations. We present a quantitative description of the ENE motif structure prior to triplex formation. Limited structural flexibility within the ENE results from highly ordered U•U pairs between the U-rich regions. Small angle X-ray scattering (SAXS) experiments confirm the global properties of our simulated ENE motif structure. Importantly, our results provide a molecular description of a nonprotective conformation of the MALAT1 ENE prior to triplex formation (Ageeli et al. 2018), the structure of which represents a putative target for therapeutic intervention.
RESULTS
The 3′ terminus of the nascent MALAT1 transcript forms a tRNA-like domain. A maturation step is required to form the protective triple helix. Cleavage of the tRNA-like domain by RNase P produces the mature 3′ A-rich tail and subsequent formation of the triplex (Brown et al. 2012; Wilusz et al. 2012; Ageeli et al. 2018), comprising stretches of five and four U•A-U triples interrupted by a C+•G-C triplet and a G-C doublet (Fig. 1A,B; Brown et al. 2012; Wilusz et al. 2012). To better understand the molecular underpinnings guiding triplex formation, we performed all-atom MD simulations. Although folding of RNAs can occur on the single microsecond timescale (Hyeon and Thirumalai 2012; Hori et al. 2018), robust folding pathways require millisecond simulations. Because millisecond all-atom simulations that monitor the folding of RNAs of this size are currently not accessible, we performed comparative analyses of two different structural models of the MALAT1 triplex: (i) a structural model representing the native, functional structure of the triplex core, defined by a crystal structure (PDBID:4PLX) (Brown et al. 2012), which we refer to herein as (Fig. 1A,B) and (ii) a partially folded structure wherein the A-rich tail is not hybridized with the ENE, referred to as (Fig. 1C). We generated the initial structural model by rotating the 3′ A-rich tail (3′-tail) away from the ENE core (Fig. 1C). We reasoned that this arrangement of the tail represents a conformation that may generally describe the ENE motif and 3′ tail structures after maturation by RNase P and prior to formation of the triplex. Our initial structural model treats the ENE and 3′ tail as independent structural regions within a single transcript. We use this structural model to evaluate dynamic structural changes that may occur after RNase P maturation, rather than immediately before triplex formation. 
Importantly, we anticipate that movements of the unstructured 3′ tail may be influenced by nucleotides within the linker. Therefore, in an effort to represent in a more biologically relevant conformation space, we replaced the truncated linker sequence present in the crystal structure with the wild-type human MALAT1 sequence, which we modeled as an ideal A-form helix (see Materials and Methods). Thus, the structure contains 15 linker nucleotides, six additional nucleotides relative to the . Overall, the starting structural models of and have identical atomic coordinates for nucleotides 1–53 (crystallographic numbering), which comprise the ENE core (Fig. 1C). Structural distinctions between the initial models are confined between nucleotide 54 and the 3′ end of the RNA.
FIGURE 1.
M1TH secondary and tertiary structure. The tertiary structure (A) and secondary structure (B) of M1TH (PDBID:4PLX) shown as a ribbon representation and secondary structure diagram, respectively. Each color corresponds to a structural element where the P1 helix is orange, the U-rich regions U1 and U2 are blue, the P2 helix is green, the bulge nucleotides are purple, the linker region is gray, and the 3′ tail is red. (C) The secondary structure diagram of a disrupted 3′ strand M1TH model () is shown using the same structural element coloring as in B.
Intrinsic global dynamics of and
MD simulations of both and were monitored using RMSD of backbone atoms aligned and compared to the final equilibrated structure over simulation time at an increment of 2 psec per frame (see Materials and Methods). From a 1.2 µsec simulation of each RNA system, the RMSD of (Fig. 2, black line) and (Fig. 2, red line) was calculated over an 800 and 760 nsec equilibrium trajectory, respectively (see Materials and Methods). The average RMSD of 5 Å (Fig. 2A) is consistent with simulations of nonribosomal RNAs of similar size (Réblová et al. 2006; Priyakumar and MacKerell 2010; Huang et al. 2013; Aytenfisu et al. 2015; Suresh et al. 2016). The RMSD fluctuates between 15 and 30 Å during the first 100 nsec (Fig. 2A). After this initial period of fluctuating structure, continues with RMSD ∼ 18 ± 4 Å over the course of the simulation. Large fluctuations would be expected for a partially folded RNA, though the global RMSD analysis does not indicate whether these structural dynamics are equally distributed throughout the structure. To address this, we also evaluated the RMSD for nucleotides 1–53, which constitute the ENE motif in both and (Fig. 2A, gray and pink lines, respectively). This comparison highlights the stability of the ENE motif despite the distinct RNA structures. While the ENE motif (light pink) makes a conformational transition at the 100–150 nsec time frame (Fig. 2A), large increases in RMSD of the total structure (red) can be primarily attributed to dynamics of the linker (54–70) and the 3′ tail (71–81), which are both peripheral to the ENE core (Supplemental Fig. S1). Overall, the RMSD calculations clearly indicate a globally dynamic behavior for and a globally static behavior for .
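The backbone RMSD time series described above can be illustrated with a short sketch. This is not the authors' analysis pipeline: it assumes every frame has already been superimposed on the reference structure (no fitting is performed here), and the function names `rmsd` and `rmsd_series` are hypothetical.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two coordinate sets.

    Both arguments are equal-length lists of (x, y, z) tuples for the
    selected backbone atoms. Assumption: the sets are already aligned.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords))


def rmsd_series(trajectory, reference):
    """RMSD of every frame in a trajectory against one reference structure."""
    return [rmsd(frame, reference) for frame in trajectory]
```

Restricting the atom selection passed to `rmsd_series` (for example to nucleotides 1–53) would give the ENE-only curves, under the same alignment assumption.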
FIGURE 2.
Root mean square analysis from MD simulations. (A) RNA backbone RMSD plot of the (black line) compared to (red line) shows the overall deviations from the starting equilibrated structure. The backbone RMSD of only the ENE motif is shown in gray () and pink (). (B) Root mean square fluctuations (RMSF) for (red) and (black). The P2 helix and linker loop exhibit high fluctuations in both simulations while the 3′ tail fluctuates in but not in . A dotted line represents the truncated sequence within the . (C) RMSF mapped to the and structural models according to a color scale from low (blue) to high (red).
To assess the degree of local fluctuations in the structure, we calculated the root mean square fluctuation (RMSF) by comparing the RNA backbone for each trajectory frame to the average structure from the last 800 nsec of simulation. In contrast to RMSD, the RMSF provides average dynamic information at the single nucleotide scale. The RMSF for (black line) and (red line) are shown on the same nucleotide axes (Fig. 2B), where a dashed line indicates the truncated region of the linker (Fig. 1B,C). The fluctuations in the structure are overall larger than . Using the lowest fluctuation for each simulation as a respective normalization (∼4 Å for and ∼8 Å for , Supplemental Fig. S1), we evaluated specific regions of heightened flexibility between the two simulated RNAs. This comparison reveals reduced fluctuations in the U-rich strands (Fig. 1B,C) of relative to (Supplemental Fig. S1). The contrasting relative fluctuation within these regions is surprising because this region is specifically involved in a three-stranded triplex in but a heretofore unknown structure in . In fact, there are only three regions where exhibits a higher relative fluctuation compared to : the 5′ portion of P2 (near nucleotide 15), the 3′ region of P1 (near nucleotide 50), and the 3′ A-rich tail (Fig. 2B; Supplemental Fig. S1). 
The increased fluctuations within P1 and P2 are localized to initial regions of the duplex–triplex junctions due to local structural rearrangements upon removal of the 3′ tail. Increased fluctuations within the 3′ tail are expected given the single-stranded nature of this region in . Mapping the raw fluctuations (Fig. 2B) to the surface of each structure gives a view of the flexibility differences between and (Fig. 2C). Although our simulations demonstrate large increases in both RMSD and RMSF for , structural dynamics are limited to the single-stranded 3′ tail rather than indicative of multiple transiently populated conformations.
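As an illustration of the RMSF analysis, the sketch below computes a per-atom fluctuation about the trajectory-average position. It is a simplified stand-in rather than the authors' procedure: frames are assumed to be pre-aligned, the result is per atom rather than per nucleotide, and the function name `rmsf` is hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom root mean square fluctuation about the average structure.

    `trajectory` is a list of frames; each frame is a list of (x, y, z)
    tuples in a fixed atom order. Assumption: frames are already aligned.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one frame")
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        # Average position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that average.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations
```

Averaging the per-atom values over the backbone atoms of each nucleotide would give a per-nucleotide profile comparable in spirit to Fig. 2B.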
The ENE motif forms distinct local interactions
Over the time-scale of our simulation does not fold into a triplex conformation, nor does dissociate into an unfolded conformation. To assess convergence of our simulations, we use the RMS average correlation function (RAC) (Galindo-Murillo et al. 2015), where a RAC value of 0 Å represents a completely converged time-series data set (see Supplemental Methods). Convergence values of <0.6 Å and 0.1 Å for and , respectively, were calculated (Supplemental Fig. S2). The 90% convergence value of the simulation occurs more quickly than the simulation, demonstrating the structural integrity of our system. We quantitatively assess each trajectory using a 2D RMSD-Rg coordinate space, where Rg represents the radius of gyration, comparing the global motion and overall size for each RNA (Supplemental Fig. S3). Overall, the and simulations are stable around a single minimum, where structural variations are attributed to global stretching or compression dynamics (Supplemental Fig. S4; Supplemental Information).
Not surprisingly, the triplex structure is highly stable (Supplemental Fig. S3A), consistent with systematic UV melt and FRET experiments demonstrating stable triplex formation across a broad range of multiplexed monovalent and MgCl2 solution conditions (Ageeli et al. 2018). To assess the level of local structural variations for the simulated , we quantify the fraction of specific base pair interactions (fbp) over simulation time (see Materials and Methods), represented on a per nucleotide basis using a heat map (Fig. 3A). Local variations near the G40–C71 base pair are quite dynamic. Specifically, U10 (U1) interacts with A68 and A69 (Tail) equivalently over simulation time. Additionally, aside from interacting with their cognate base-pairing partners, U8 and U9 in the U1 region interact with nucleotides A67 and A66 (Tail), respectively, for >20% of simulation time. 
These interactions constitute alternative base-pairings or “slippage” and suggest conformational dynamics in the center of the triplex. Regardless, the primary interactions within the triplex region are base-triple interactions between the Tail (nt. 65–75), U2 (nt. 36–46) and U1 (nt. 6–15). Direct interactions between U1 and U2 are minimal (<6% of simulation time). Unfolding of the triplex or dissociation of the 3′ tail does not occur during simulation. Therefore, this interaction map is consistent with a well-formed triplex, and is supported by experimental analysis wherein the triplex is resistant to 3′–5′ exoribonuclease degradation (Ageeli et al. 2018).
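The fbp quantity used above can be sketched as a simple frame count. The example assumes that base pairs have already been annotated for every frame by some external tool and that each frame is given as a collection of nucleotide-index pairs; the function name `fraction_of_base_pairs` is hypothetical and the indices in the usage note are toy values, not results from the simulations.

```python
from collections import Counter


def fraction_of_base_pairs(frames):
    """Fraction of frames in which each nucleotide pair is observed.

    `frames` is a list with one entry per trajectory frame; each entry is
    an iterable of (i, j) nucleotide-index pairs annotated as paired.
    Returns a dict mapping the sorted pair (i, j) to a value in [0, 1].
    """
    n_frames = len(frames)
    if n_frames == 0:
        raise ValueError("at least one frame is required")
    counts = Counter()
    for pairs in frames:
        # Sort each pair so (i, j) and (j, i) are counted together, and use
        # a set so a pair is counted at most once per frame.
        counts.update({tuple(sorted(pair)) for pair in pairs})
    return {pair: count / n_frames for pair, count in counts.items()}
```

For a toy input of four frames, `fraction_of_base_pairs([[(1, 10)], [(10, 1), (2, 9)], [(1, 10)], [(2, 8)]])` assigns 0.75 to the pair (1, 10) and 0.25 to each of (2, 9) and (2, 8). A heat map such as Fig. 3 could then be filled from this dictionary.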
FIGURE 3.
Fraction of base pairs (fbp) observed during MD simulations. The fbp interactions for each nucleotide are presented as a grayscale heat map for (A) and (B) . Each heat map represents a fingerprint for highly stable base pairs (black) and transient base pairs (lighter gray) throughout the MD simulation. Regions of interest are labeled with text and colored according to the structural diagrams shown in Figure 1.
Previous studies have suggested that the internal loop within the U1 and U2 regions of the isolated ENE motif is highly flexible (Conrad and Steitz 2005; Conrad et al. 2006; Mitton-Fry et al. 2010; Brown et al. 2012; Tycowski et al. 2012, 2016; Donlic et al. 2018). Structural plasticity within this region would result in a dynamic, partially collapsed RNA exhibiting minimal, or highly transient, interactions involving U1 and U2 regions. To assess structural interactions, or lack thereof, within the U-rich regions of the ENE, we calculate fbp for the simulation (Table 1; Fig. 3B). This quantitative description of base pair interactions within the ENE motif reveals a significantly altered fingerprint when compared to (Fig. 3A). The loss of triplex interactions involving the duplex strand (U2), Hoogsteen strand (U1), and the 3′ tail (Tail) are expected, though it is informative to note that their occupancy remains near zero for the duration of the simulation. One new region of interaction is clearly formed during simulation of (Fig. 3B); our analysis reveals a high degree of base pair interactions within the U-rich regions. The types of base pair interactions observed on average in the newly formed U1–U2 region from the simulation are listed in Table 1 and are consistent with cis-Watson–Watson interactions (cWW, Leontis–Westhof notation) observed in antiparallel RNA strands (Leontis and Westhof 2001). 
However, the first U•U pair between U8 and U45 is dominated by a trans-Watson–Watson interaction 69% of the time and a trans-Sugar–Watson interaction 10% of the time. The most dynamic nucleotide in the region, U9, interacts with U42, U43, and U44 in a cis-Watson–Watson configuration ranging from 13% to 55% of simulation time. The majority of remaining U•U pairs are present in cis-Watson–Watson for >99% of simulation time. This new U1–U2 duplex is formed of primarily U•U base pairs (Fig. 4A,B), some of which exhibit a high degree of alternative base-pair interactions. We depict the alternative pair interactions observed during MD simulations on the secondary structure of (Fig. 4C), where a solid blue arrow indicates the highest populated interaction (typically fbp > 80%) and a dotted arrow indicates transient interactions (10% < fbp < 20%). Formation of the U•U pairs leads to a highly ordered backbone structure in which the phosphate–phosphate distances are decreased relative to a standard A-form duplex (Fig. 4D).
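The average phosphate–phosphate distance for a given base pair can be illustrated as follows. This is a minimal sketch assuming the phosphorus-atom coordinates of the two paired nucleotides have been extracted for every frame; the function name `mean_pp_distance` is hypothetical and nothing here reproduces the authors' exact protocol.

```python
import math


def mean_pp_distance(p_first, p_second):
    """Average P–P distance for one base pair over a trajectory.

    `p_first` and `p_second` are equal-length lists of (x, y, z) tuples
    giving the phosphorus position of each partner in every frame.
    The result is in the same length unit as the input coordinates.
    """
    if len(p_first) != len(p_second) or not p_first:
        raise ValueError("inputs must be non-empty and of equal length")
    distances = [math.dist(a, b) for a, b in zip(p_first, p_second)]
    return sum(distances) / len(distances)
```

Repeating the calculation for every detected pair and plotting the values against pair position would yield a profile of the kind shown in Fig. 4D.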
TABLE 1.
Average base pair interactions over 800 nsec simulation time
FIGURE 4.
Base-pair interactions in the M1TH ENE motif. (A) Example of U•U pair observed in simulation at site U8–U45 in a trans-Watson–Crick interaction. The hydrogen bonds in this base pair are present for more than 86% of simulation time. (B) A snapshot of a dynamic region within the ENE including nucleotides U7–U10 on U1 and C41–G47 on U2. Blue dotted lines indicate potential hydrogen bonding. This view effectively depicts the promiscuity of U9, which is within reach of U42 (fbp ∼ 55%), U43 (fbp ∼ 13%), and U44 (fbp ∼ 19%) (Table 1). (C) Zoomed in partial secondary structure of the ENE motif (Fig. 1C) with fbp (Table 1) mapped to each nucleotide pair where a solid blue arrow indicates the highest populated interaction (typically fbp > 80%) and a dotted arrow indicates transient interactions (10% < fbp < 20%). (D) For each base pair detected, the average phosphate–phosphate distance (P–P) was calculated over the MD simulation time. The right axis corresponds to base pairs (black line), and the left axis corresponds to base pairs (red line). Missing values at bulged nucleotides are closed with dotted lines. The horizontal blue dashed lines indicate a structural transition between U1–U2 region with the canonical A-form duplexes (P1 and P2) within each structure. The average phosphate–phosphate distance for helical A-form RNA is indicated by a purple line (van Knippenberg and Hilbers 1986).
Our analysis indicates for the first time that the ENE maintains a significantly ordered structure, rather than adopting highly dynamic, flexible conformations as previously suggested. Our simulation results imply specific structures are maintained within the U-rich regions U1 and U2. To determine whether such structure can be experimentally corroborated, we performed SAXS experiments to compare the global structural properties of the RNA in solution with the properties of our simulated structure (see Materials and Methods). 
The calculated Rg of our simulated ENE is in excellent agreement with experimental SAXS results (Table 2). The experimental Rg for the isolated ENE motif is 27.6 ± 0.6 Å; the calculated Rg from our simulated ENE structure is 27 Å. Additionally, comparison of the probability distribution, or P(r), profiles from experiment and simulation (Supplemental Fig. S5) indicates that both structures are extended, mostly rod-like, with an identical maximum distance (Dmax), within error (Table 2). We attribute small differences in the probability distributions between experimental and simulated ENE motifs to differences in solvation layers between calculated and experimental systems as well as possible over-winding of the structure induced by the MD force-field. Overall, both the simulated and experimental ENE structures maintain a markedly extended conformation, rather than a collapsed structure. Given that MD simulations of result in a highly ordered ENE structure represented by a single conformational minimum, we reasoned that internal structure within the U-rich regions would be necessary to achieve such stability.
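For readers unfamiliar with the quantity, the radius of gyration of a single structure can be computed directly from atomic coordinates. The sketch below is a geometric, optionally mass-weighted Rg; it is an assumption-laden illustration and not the scattering calculation behind Table 2, since it ignores the solvation layer mentioned above. The function name `radius_of_gyration` is hypothetical.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Radius of gyration of one structure.

    `coords` is a list of (x, y, z) tuples; `masses` is an optional list
    of per-atom weights (equal weights are assumed when omitted).
    """
    if not coords:
        raise ValueError("at least one atom is required")
    if masses is None:
        masses = [1.0] * len(coords)
    if len(masses) != len(coords):
        raise ValueError("masses and coords must have the same length")
    total_mass = sum(masses)
    # Weighted centre of the structure.
    centre = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Weighted mean squared distance from that centre.
    weighted_sq = sum(
        m * sum((c[k] - centre[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)
```

Evaluating this per frame, alongside the backbone RMSD, gives the two coordinates of an RMSD-Rg map like the one referenced in Supplemental Fig. S3.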
TABLE 2.
Experimental and theoretical SAXS calculations
A nonprotective conformation of the MALAT1 ENE
Based on the MD-derived high probability base pairs within , we present a new secondary structure (Fig. 5A) depicting the structural interactions within the ENE motif, in which well-ordered U•U pairs are defined with fbp > 0.8 (Fig. 4C). The structure reveals a contiguous double helical stack. This distinct conformation, compared to (Fig. 1B), is a result of rearrangement of the U1 and U2 strands where predominantly new U•U base pairs are formed amid two unpaired nucleotides (U43 and U44 on U2).
FIGURE 5.
A structure of the MALAT1 ENE motif. (A) The secondary structure of the determined by simulation highlights structure within the U1 and U2 regions; blue lines show through-space tertiary interactions. (B) Secondary structure of the isolated wild-type MALAT1 ENE structure including the full-length P2 helix. (C) Three-dimensional model of wild-type MALAT1 ENE shows a non-A-form helical stack in an extended rod-like conformation consistent with SAXS measurements (Table 2; Supplemental Fig. S5).
In light of the very limited interactions between the ENE and adjacent linker and 3′ tail, the ENE motif represents an independent structural domain, comprising only nucleotides 1–53 of our simulated structure. Consequently, we generated an average structure of the ENE motif from the last 500 nsec of production simulation. However, this average ENE structure lacks the wild-type P2 sequence, which was truncated to facilitate crystallization (Fig. 1A,B). Therefore, to achieve a wild-type model of the ENE motif structure, we modeled the wild-type P2 helix as an A-form helix (Fig. 5B). After manually fitting the P2 helix to the apex of the simulated ENE motif structure, the system was re-solvated and equilibrated for 2 nsec to achieve a final structure of the wild-type ENE (Fig. 5C). Theoretical scattering calculations of the wild-type ENE structure (ENE + WT P2) are in excellent agreement with experimental SAXS measurements of the wild-type ENE motif RNA of the same sequence (Table 2; Supplemental Fig. S5). Overall, the SAXS data indicate an extended structure consistent with the elongated duplex-like stack within the ENE + WT P2 structure. It is clear from the new structure of the wild-type ENE (Fig. 5C) that the newly formed U•U base pairs tolerate a non-A-form helical conformation (Fig. 4D). Stability of this region is maintained by a well-ordered helical U•U stack. This unique topology, while rarely observed experimentally (Baeyens et al. 1995), is highly stable, in contrast to structures containing singular U•U base pairs whose stability is weakened relative to U•G or C•A mismatches (Kierzek et al. 1999; Sheng et al. 2013).
DISCUSSION
Many biological processes utilize alternative conformers to regulate RNA function (Russell et al. 2002; Mahen et al. 2005; Wickiser et al. 2005; Dethoff et al. 2008; Serganov and Nudler 2013; Saldi et al. 2018). Cotranscriptional RNA folding may populate alternative structures including nonproductive conformations (Hua et al. 2018). For the lncRNA MALAT1, 3′ end maturation by RNase P is a requisite step for proper formation of a protective 3′ ENE triplex, which supports high cellular abundance and persistent lncRNA oncogenic processes (Brown et al. 2012; Wilusz et al. 2012). Therefore, a cotranscriptionally folded ENE motif, if highly structured prior to triplex formation, would represent an important nonfunctional structure for therapeutic intervention. Recently, we investigated the mechanism of M1TH triplex formation using a battery of biochemical and biophysical approaches. We determined that prior to liberation of the 3′ end upon RNase P cleavage, a stable P2 and a weak P1 helix are partially formed, suggesting a metastable ENE structure exists prior to triplex formation (Ageeli et al. 2018). Protection from degradation is conferred through formation of the triplex (Brown et al. 2016; Ageeli et al. 2018). Dynamic structural interconversions between the isolated ENE structure and the well-formed triplex regulate triplex-mediated protection in vitro. Here, we perform microsecond MD simulations to investigate structural variations within the ENE triplex and the isolated ENE motif. We assess the conformational dynamics of the MALAT1 ENE prior to triplex formation, which we refer to as . Our quantitative results support a highly ordered ENE structure comprising nonnative U•U pairs, leading to an overall extended structure. The global properties of this ENE structure are supported by SAXS Rg and P(r) data, confirming the extended duplex conformation of the MALAT1 ENE. 
While the ENE triplex of and isolated ENE of are identical at the nucleotide level, no tendency to convert between conformations was observed on this timescale. A quantitative assessment of convergence shows a converging trajectory, suggesting that both conformations are indeed distinct, stable conformational states. Therefore, the stable MALAT1 ENE may be entirely formed cotranscriptionally; prior to 3′ end maturation of the nascent MALAT1 transcript, the two duplexes and U-rich regions form a stable, extended rod-like structure, which is insufficient to protect MALAT1 from 3′ to 5′ exoribonucleolytic degradation. We present the first evidence of a highly ordered ENE motif stabilized by nonnative U•U pairs. This stable ENE structure unveils a novel putative therapeutic target. Recent studies have utilized numerous small molecule designs to target U•U base pairs (Arambula et al. 2009; Xia et al. 2014). Similar efforts targeting the ENE structure may yield effective inhibition of the lncRNA MALAT1.
MATERIALS AND METHODS
Simulation system preparation
We simulated two different RNA systems, which we refer to as and . The structure includes the triplex region, a truncated P2 helix, and truncated linker sequence. The atomic coordinates for were obtained from the only available crystal structure of the MALAT1 triplex (Brown et al. 2012). For this study, we used chain A of PDB:4PLX (Brown et al. 2012) with the GTP at position 1 and the modified adenosine at position 76 removed. We generated the starting structure for a partially folded M1TH, which we refer to as , by first rotating the A65 dihedral (A64:O3′, A65:P, A65:O5′, A65:C5′) from chain A by 65° in the x-plane followed by 20° in the y-plane. These manual molecular adjustments resulted in a conformation with the A-rich tail removed from the ENE. Our starting conformation also includes the wild-type basal linker sequence, which we inserted from position U54 to A70 (Wilusz et al. 2008). These inserted nucleotides were modeled as an ideal A-form RNA helix using the threading module of ROSETTA 3.9 (rosettacommons.org). Molecular modeling and visualization of both RNAs were done using VMD (Humphrey et al. 1996). All RNA systems were prepared using tleap within AMBERTOOLS 16 (Case et al. 2016) with the OL3 parameter set for RNA, which incorporates modified glycosidic torsion profiles. Each RNA molecule was placed in an explicit solvent TIP3P box with Cartesian dimensions (116.3, 116.3, 116.3) Å and (125.5, 63.5, 81.4) Å, providing a solvent buffer of 10 Å and 8 Å for and systems, respectively. The simulation system was neutralized with 74 Na+ ions and with 80 Na+ ions. Atoms P, OP1, OP2, O5′, C5′, C4′, C3′, and O3′ were chosen to describe the RNA backbone and placed under harmonic restraints during minimization and initial equilibration phases.
Minimization of each system was conducted by 500 steepest descent steps followed by 9500 conjugate gradient steps, restraining all RNA atoms with a force constant of 10 kcal mol−1 Å−2 to remove unfavorable contacts. For all MD simulations conducted in this work, a simulation time step of 2 fsec was used, all bonds involving hydrogen were constrained with SHAKE (Ryckaert et al. 1977), a nonbonded cutoff of 9 Å was used, and the default Particle Mesh Ewald (PME) parameters for Amber were used to handle long-range electrostatics. Heating of each system was done over 3000 steps of canonical (NVT) MD from 100 to 300 K using a weak coupling thermostat (Berendsen et al. 1984) at constant volume. Next, 500 psec of NVT MD equilibration were performed using the above RNA backbone restraints at a constant temperature of 300 K using a weak coupling thermostat (Berendsen et al. 1984). Five 2-nsec stages of equilibration were then performed using NPT MD, maintained at 300 K using Langevin dynamics with a collision frequency of 3 psec−1, and at constant pressure maintained by a Berendsen barostat. Each stage used a position-restrained RNA backbone, with the restraints gradually released over the equilibration stages from an initial force constant of 10 kcal mol−1 Å−2 to a final force constant of 0.5 kcal mol−1 Å−2. Production unrestrained equilibrium NPT MD simulations were performed on a single Linux machine utilizing NVIDIA P2000 GPU cards running PMEMD CUDA in AMBER 16 (Case et al. 2016). The system was simulated for 1.2 µsec. The system was simulated for 1 µsec. Approximately 800 nsec of equilibrium MD from each simulation system was used in subsequent analyses, as determined by RMSD calculations. Translational and rotational motions were removed from the trajectory by fitting all frames to the first frame of the analysis data set using CPPTRAJ in AMBERTOOLS 16 (Case et al. 2016). The root mean square deviation (RMSD) was calculated as
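$$\mathrm{RMSD}(t) = \sqrt{\frac{1}{N_{\mathrm{atoms}}} \sum_{i=1}^{N_{\mathrm{atoms}}} \left| \mathbf{r}_i(t) - \mathbf{r}_i^{\mathrm{ref}} \right|^2}$$
where $N_{\mathrm{atoms}}$ is the number of atoms, $\mathbf{r}_i(t)$ are the coordinates of atom i at time t after fitting to the reference coordinates $\mathbf{r}_i^{\mathrm{ref}}$, for which the first frame of production simulation was used. The root mean square fluctuation (RMSF) was calculated per nucleotide as
$$\mathrm{RMSF}_i = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left| \mathbf{r}_i(t) - \langle \mathbf{r}_i \rangle \right|^2}$$
where T is the time and $\langle \mathbf{r}_i \rangle$ is the average position of particle i over T.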
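The two definitions above can be sketched in a few lines of NumPy. This is only an illustration, not the CPPTRAJ workflow used in the study: it assumes the trajectory has already been fitted to the reference frame and is held in memory as an array of shape (frames, atoms, 3). The function names `rmsd_per_frame` and `rmsf_per_atom` are hypothetical.

```python
import numpy as np


def rmsd_per_frame(traj, ref):
    """RMSD of every frame against fixed reference coordinates.

    traj: array (n_frames, n_atoms, 3), assumed already fitted to `ref`.
    ref:  array (n_atoms, 3), e.g. the first production frame.
    """
    diff = traj - ref                       # broadcast over frames
    sq_dist = (diff ** 2).sum(axis=2)       # squared displacement per atom
    return np.sqrt(sq_dist.mean(axis=1))    # average over atoms, then root


def rmsf_per_atom(traj):
    """RMSF of every atom about its time-averaged position."""
    mean_pos = traj.mean(axis=0)            # <r_i> over all frames
    sq_dist = ((traj - mean_pos) ** 2).sum(axis=2)
    return np.sqrt(sq_dist.mean(axis=0))    # average over frames, then root


# Toy usage with random coordinates standing in for a fitted trajectory.
rng = np.random.default_rng(0)
toy_traj = rng.normal(size=(100, 8, 3))
toy_rmsd = rmsd_per_frame(toy_traj, toy_traj[0])
toy_rmsf = rmsf_per_atom(toy_traj)
```

A per-nucleotide RMSF, as reported in the text, could then be obtained by combining the per-atom values of each residue; how the atoms were grouped in the original analysis is not stated here.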
Fraction of base pairs
Base pairs were identified using the DSSR package (Lu et al. 2015) of the 3DNA program (Lu and Olson 2003). All MD trajectories were read into DSSR using the -nmr flag; all 29 base pair references were searched at every frame using the --more option, and the results were output as a JSON-formatted file. Analyses were performed using custom R (R Core Team 2018) scripts. The fraction of base pairs, fbp, is calculated as
$$f_{\mathrm{bp}} = \frac{N_{\mathrm{bp}}}{N}$$
where Nbp is the number of frames in a simulation in which a base pair is observed, and N is the total number of frames in the simulation. Heat map plots were constructed using R (R Core Team 2018).
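The fraction defined above amounts to counting, for each base pair, the frames in which it appears. The following Python sketch illustrates that bookkeeping. It is not the authors' R code, and it assumes the DSSR output has already been parsed into one list of (nucleotide, nucleotide) identifier tuples per frame; the function name and the input layout are hypothetical.

```python
from collections import Counter


def fraction_of_base_pairs(frames):
    """Return {pair: f_bp} given one iterable of base pairs per frame.

    frames: list with one entry per trajectory frame; each entry is an
            iterable of (nt1, nt2) identifier tuples (hypothetical layout).
    """
    n_frames = len(frames)
    if n_frames == 0:
        raise ValueError("at least one frame is required")
    counts = Counter()
    for pairs in frames:
        # Sort each pair so (a, b) and (b, a) are the same key, and use a
        # set so a pair is counted at most once per frame.
        counts.update({tuple(sorted(pair)) for pair in pairs})
    return {pair: n / n_frames for pair, n in counts.items()}


# Toy usage: four frames, one of them without any detected base pair.
toy_frames = [
    [("U10", "U45")],
    [("U45", "U10"), ("G1", "C60")],
    [],
    [("U10", "U45")],
]
toy_fbp = fraction_of_base_pairs(toy_frames)
```

The resulting dictionary maps directly onto a residue-by-residue matrix of the kind shown in a heat map.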
Small angle X-ray scattering experiments
In vitro transcription of the wild-type isolated ENE motif was performed using PCR templates containing two 2′OMe at the 5′ end of the template strand as previously described (Ageeli et al. 2018). The sequence of the isolated ENE motif is GGAAGGTTTTTCTTTTCCTGAGAAAACAACACGTATTGTTTTCTCAGGTTTTGCTTTTTGGCCTTT. Samples were purified by size-exclusion chromatography in 20 mM HEPES, pH 7.4, 150 mM NaCl and KCl, and 1 mM MgCl2. Data were collected under continuous flow on beamline ID-12 at the Advanced Photon Source (APS) at Argonne National Laboratory, controlled and collected using beamline-specific programs and scripts. Data were collected for buffer samples and RNA samples separately. Igor Pro (WaveMetrics) was used to examine data quality. A Guinier analysis was used to calculate Rg and estimate error from scattering intensities. Initial calculations of the pair-distance probability distribution, P(r), were conducted using a scan of the maximum molecule distance Dmax using Python scripts (Lipfert et al. 2007) and GNOM (Svergun 1992) within the ATSAS analysis package (embl-hamburg.de/). Dmax was determined based on a smooth plot of P(r) where Dmax corresponds to an abscissa intercept at r = 0. The error in Dmax is reported as the interval over which Dmax results in a P(r) function with a smooth intercept at r = 0.
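As an illustration of the Guinier step mentioned above, the sketch below fits the low-q approximation ln I(q) = ln I(0) − (Rg²/3) q² by linear regression. It is a minimal example under stated assumptions and not the Igor Pro or Python analysis used in the study: the function name `guinier_fit`, the choice of cutoff `q_max`, and the synthetic data are all hypothetical, and no error propagation is included.

```python
import numpy as np


def guinier_fit(q, intensity, q_max):
    """Estimate Rg and I(0) from a linear fit of ln I versus q^2.

    q, intensity: 1D arrays of scattering vector (1/Angstrom) and intensity.
    q_max: upper limit of the Guinier region (an assumption made by the
           caller; a common convention is to keep q * Rg below about 1.3).
    """
    mask = (q <= q_max) & (intensity > 0)
    slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; check the q range")
    rg = np.sqrt(-3.0 * slope)      # slope = -Rg^2 / 3
    i_zero = np.exp(intercept)      # intercept = ln I(0)
    return rg, i_zero


# Synthetic, noise-free profile generated from the Guinier law itself.
true_rg, true_i0 = 25.0, 100.0
q_demo = np.linspace(0.005, 0.05, 30)
i_demo = true_i0 * np.exp(-(q_demo ** 2) * true_rg ** 2 / 3.0)
rg_fit, i0_fit = guinier_fit(q_demo, i_demo, q_max=0.05)
```

For an elongated, rod-like particle the valid Guinier range is typically narrower than for a compact one, so the cutoff would need to be checked against the linearity of the fitted region.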
Theoretical scattering calculations
Coordinates were extracted from single frames of MD simulations and imported into CRYSOL (Svergun et al. 1995) where default parameters were selected. The calculated Rg and maximum molecular diameter excluding hydrogen volume were recorded. Theoretical intensities generated by CRYSOL (Svergun et al. 1995) were imported into GNOM (Svergun 1992) where the predicted maximum molecular diameter is input as Dmax. A scan of P(r), the pair distribution function, was performed at Dmax ± 5 Å based on a smooth plot of P(r) where Dmax corresponds to an abscissa intercept at r = 0. The error in Rg calculated from theoretical scattering profiles generated by CRYSOL is not explicitly reported by GNOM because the theoretical scattering intensities carry no experimental error. A 3% error was therefore calculated and added to the theoretical intensities. The error in Dmax is reported as the interval over which Dmax results in a P(r) function with a smooth intercept at r = 0.
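Two small pieces of this procedure can be sketched in Python. The first attaches a relative uncertainty to error-free theoretical intensities; interpreting the "3% error" as a per-point sigma equal to 3% of the intensity is an assumption, since the text does not specify how it was applied. The second evaluates the standard real-space relation Rg² = ∫ r² P(r) dr / (2 ∫ P(r) dr), which links a P(r) curve to Rg. The function names are hypothetical and this is not the CRYSOL or GNOM implementation.

```python
import numpy as np


def add_relative_error(intensity, fraction=0.03):
    """Return a sigma column equal to `fraction` of each intensity.

    Assumption: the 3% error is a per-point relative uncertainty.
    """
    return fraction * np.abs(np.asarray(intensity, dtype=float))


def _trapezoid(y, x):
    """Trapezoidal integral of y(x) on a possibly nonuniform grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def rg_from_pr(r, p_r):
    """Real-space Rg from a pair-distance distribution P(r)."""
    r = np.asarray(r, dtype=float)
    p_r = np.asarray(p_r, dtype=float)
    second_moment = _trapezoid(r ** 2 * p_r, r)
    norm = _trapezoid(p_r, r)
    return float(np.sqrt(second_moment / (2.0 * norm)))


# Check against a uniform sphere of radius R, for which the analytical
# P(r) is known and Rg = sqrt(3/5) * R.
radius = 10.0
r_demo = np.linspace(0.0, 2.0 * radius, 2001)
p_demo = r_demo ** 2 * (
    1.0 - 3.0 * r_demo / (4.0 * radius) + r_demo ** 3 / (16.0 * radius ** 3)
)
rg_sphere = rg_from_pr(r_demo, p_demo)
sigma_demo = add_relative_error([100.0, 10.0])
```

The sphere is used only because its P(r) has a closed form; an extended, rod-like molecule produces a skewed P(r) with a long tail toward Dmax.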
SUPPLEMENTAL MATERIAL
Supplemental material is available for this article.
Authors: Jeremy E Wilusz; Courtney K JnBaptiste; Laura Y Lu; Claus-D Kuhn; Leemor Joshua-Tor; Phillip A Sharp Journal: Genes Dev Date: 2012-10-16 Impact factor: 11.361
Authors: Jonathan F Arambula; Sreenivasa Rao Ramisetty; Anne M Baranger; Steven C Zimmerman Journal: Proc Natl Acad Sci U S A Date: 2009-09-08 Impact factor: 11.205
Authors: Laura R Ganser; Janghyun Lee; Atul Rangadurai; Dawn K Merriman; Megan L Kelly; Aman D Kansal; Bharathwaj Sathyamoorthy; Hashim M Al-Hashimi Journal: Nat Struct Mol Biol Date: 2018-05-04 Impact factor: 15.369