Literature DB >> 26500461

Computational investigation of RNA CUG repeats responsible for myotonic dystrophy 1.

Ilyas Yildirim¹, Debayan Chakraborty, Matthew D Disney, David J Wales, George C Schatz.

Abstract

Despite the importance of the knowledge of molecular hydration entropy (ΔShyd) in chemical and biological processes, the exact calculation of ΔShyd is very difficult, because of the complexity in solute–water interactions. Although free-energy perturbation (FEP) methods have been employed quite widely in the literature, the poor convergent behavior of the van der Waals interaction term in the potential function limited the accuracy and robustness. In this study, we propose a new method for estimating ΔShyd by means of combining the FEP approach and the pan class="Disease">scaled particle theory (or information theory) to separately calculate the electrostatic solute–water interaction term (ΔSelec) and the hydrophobic contribution approximated by the cavity formation entropy (ΔScav), respectively. Decomposition of ΔShyd into ΔScav and ΔSelec terms is found to be very effective with a substantial accuracy enhancement in ΔShyd estimation, when compared to the conventional full FEP calculations. ΔScav appears to dominate over ΔSelec in magnitude, even in the case of polar solutes, implying that the major contribution to the entropic cost for hydration comes from the formation of a solvent-excluded volume. Our hybrid scaled particle theory and FEP method is thus found to enhance the accuracy of ΔShyd prediction by effectively complementing the conventional full FEP method.

Entities: Chemical Disease Gene Mutation

Mesh：

Substances：

Year: 2015 PMID： 26500461 PMCID： PMC4606397 DOI： 10.1021/acs.jctc.5b00728

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Repeat expansion disorders are caused by mutations in DNA where repeats in certain genes become expanded. Once the repeats are transcribed, mRNA (mRNA) folds into a hairpin with repeating internal loop motifs, which can be translated into toxic proteins or sequester proteins and cause disease.[1,2] The mechanisms by which these repeats expand are not fully known. One hypothesis is that formation of non-B-form conformations by the repetitive DNA sequences, such as hairpins, is the reason for the expansion during DNA replication and repair.[3−5] In the expansions of CAG repeats, which are typically found in the coding regions of mRNAs such as Huntingtin (pan class="Gene">HTT), androgen receptor (AR), spinocerebellar ataxia (SCA), and atrophin-1 (ATN1) genes, the transcripts are translated into toxic polyglutamine (polyQ)[6] proteins, resulting in Huntington’s disease (HD), Spinal and Bulbar Muscular Atrophy (SBMA), Spinocerebellar Ataxia Type 1 (SCA1), and Dentatorubral-pallidoluysian atrophy (DRPLA), respectively.[7−12] Furthermore, it has been shown recently that RNA CAG repeat expansions can sequester proteins and contribute to neurodegeneration.[13−15] Another genetic disorder, Friedreich’s ataxia (FRDA), is caused by expansion of GAA repeats.[9,16] A common heritable form of mental retardation, Fragile X-associated tremor ataxia syndrome (FXTAS), is caused by expansion of the CGG segment from 50 to 200 repeats in the 5′-untranslated region (UTR) of the fragile X mental retardation 1 (FMR1) gene.[7,16−23] Once the FMR1 gene is transcribed into mRNA, the expanded CGG repeats fold into a hairpin structure with repeating 1×1 GG internal loops that sequester and inactivate Sam68 protein. Myotonic dystrophy (DM) is another pan class="Disease">genetic disease that is the most common adult-onset form of muscular dystrophy. DM is a neuromuscular disorder that is characterized by muscle weakness and slow relaxation of muscles after contraction.[7,24] It is caused by expansion of CTG and CCTG repeats in DNA,[7,24] with CTG leading to myotonic dystrophy type 1 (DM1) and CCTG to type 2 (DM2). In both DM1 and DM2, once the trinucleotide CTG in the 3′-UTR of the dystrophia myotonica protein kinase (DMPK) gene and the tetranucleotide CCTG repeats in the zinc-finger 9 (ZNF9) gene are transcribed, the CUG and CCUG repeats fold into RNA hairpins with repeating 1×1 UU and 2×2 CU/UC internal loop motifs, respectively,[25] attracting muscleblind-like 1 (MBNL1)[26−29] or cytoplasmic multiprotein complexes[30,31] that cause, for example, muscle chloride channel dysfunction[32,33] and abnormal regulation of insulin receptors.[34] Because the expanded RNA repeats are in the noncoding regions the diseases are caused by RNA gain-of-function.[7,20,21] Even though the disease mechanisms described above are the most well established ones, expanded RNA repeats might play other roles in the cell. As an example, expanded repeats in UTRs are translated via a repeat associated non-ATG translation.[35] The crystal structures of model RNA molecules containing CUG repeats have been determined before by different groups using X-ray diffraction methods.[36−39] In addition, an NMR structure of a single copy of the pan class="Gene">DM1 motif was characterized that displayed an overall A-form structure.[40] Even though all these structures show 1×1 UU internal loops in anti–anti conformations, large amplitude motions in RNA internal loops could not be investigated with the current experimental methods. The conformations that RNA CUG repeats adopt upon protein or small molecule binding require a better understanding of these motions. Helical stacking ↔ unstacking and syn ↔ anti transformation of uridines in 1×1 UU base pairs are two such important large scale motions that have significance in small molecule–RNA binding. Various groups have designed and published small molecules and ligands targeting RNA CUG repeats,[41−46] but at present there is no experimental structure available that reports the atomistic details of a small molecule bound to RNA CUG repeats. Thus, the structural and thermodynamic properties of 1×1 UU base pairs in RNA CUG repeats are of significant interest, since they may provide insight into the conformations that these repeats adopt upon interaction with proteins or small molecules.[42,47] We recently analyzed the properties of 1×1 AA internal loops seen in RNA CAG repeat expansions. The crystal structure of a model RNA with three CAG repeats, r[5′UUGGGC(pan class="Mutation">CAG)3GUCC]2, was found to exhibit terminal 1×1 AA base pairs in syn–anti conformations with the middle 1×1 AA base pair anti–anti.[48] The terminal AA base pairs interact with dangling uridine bases, stabilizing the syn–anti conformational state. The results basically indicated that a dynamic base pair such as 1×1 AA internal loops could transform to another conformation upon interacting with a molecule. Thus, we utilized umbrella sampling MD simulations to predict the 2D free energy surface for 1×1 AA internal loops to investigate the potential stable conformations and transformation pathways.[48] In line with the experimental results, the 2D free energy surface for 1×1 AA internal loops displayed anti–anti and syn–anti conformations as global and local minima, respectively. Furthermore, several stable local minima states in both anti–anti and syn–anti conformations were found for 1×1 AA internal loops. The results indicated that the stable 1×1 RNA internal loop conformations could be exploited for small molecule binding. Indeed, we analyzed the binding mechanism of a substituted naphthyridine (C-2) to RNA 1×1 UU internal loops, which inhibits binding of MBNL1 to RNA CUG repeat expansions.[43] The results indicated that complex structural transformations are associated with binding.[43] Similar results were seen when we analyzed 6′-N-5-hexynoate kanamycin alkyne (K-alkyne) binding to 2×2 CU/UC internal loops in RNA CCUG repeats, which is a lead small molecule targeting RNA causing DM2.[49] Hence, atomistic details of the structure and dynamics of RNA internal loops as well as the transformation pathways extracted from the predicted free energy profiles can yield insights that might be used to understand and develop drugs that will target RNA repeat expansions. Although there are no structures reported for MBNL1 bound to CUG repeats, it is thought that the zinc finger 3 and 4 domains (Zn3/4) of MBNL1 have some functional roles in the binding process.[50] Thus, the present investigations could also shed light on this pathogenic interaction. In the present contribution, structural and thermodynamic properties of 1×1 UU RNA internal loops are predicted using regular and umbrella sampling MD simulations in explicit solvent with an AMBER force field, which includes a χ torsional revision for RNA.[51] We also exploited the discrete path sampling (pan class="Chemical">DPS) method to investigate the energy landscape of the 1×1 UU internal loop in RNA CUG repeats. Three systems were considered: 3×CUG3SYW (r[5′UUGGGC(CUG)3GUCC]2), 3×CUGinf (r[5′C(CUG)3G]2), and 1×CUG (r[5′CCGCUGCGG]2). 3×CUG3SYW and 3×CUGinf were used to predict the dynamic properties of 1×1 UU base pairs. In particular, 3×CUGinf was designed to simulate 1×1 UU RNA internal loops in an infinitely long linear RNA system, while 3×CUG3SYW is an already solved crystal structure with PDB accession code 3SYW.[38] 1×CUG was designed to predict the 2D free energy surfaces of 1×1 RNA UU internal loops using umbrella sampling MD calculations. Similar to our previous RNA CAG repeat study,[48] one of the reaction coordinates was chosen as the χ torsion (rotation around the glycosidic bond) in order to mimic the orientation of the uridine base with respect to sugar, while the other reaction coordinate was a pseudotorsion (θ) mimicking base flipping (base stacking ↔ unstacking). Two 2D potentials of mean force (PMF) surfaces, (θ1,χ) and (θ1,θ2), were calculated for the 1×1 UU base pairs. Our results indicate that 1×1 UU RNA internal loops are dynamic and favor the anti–anti conformation but have several local minima in the 2D PMF profiles displaying both anti–anti and syn–anti conformations. The 2D (θ1,χ) PMF exhibits three pathways for the syn–anti ↔ anti–anti transformation, one of which dominates in temperature jump MD simulations. These results were verified using DPS calculations, which showed that in the dominant kinetic pathway the uridine flips out via the major groove, followed by a rotation around χ to produce a syn → anti transformation. In line with previous studies, the 2D (θ1,θ2) PMF as well as regular MD simulations displayed 1×1 UU base pairs in 2, 1, and 0 anti–anti hydrogen-bond conformations. Similar to the 1×1 AA base pairs studied before,[48] a Na+ binding pocket was discovered near the 1×1 UU base pair, stabilizing the syn–anti conformation. Furthermore, MD simulations of 3×CUG3SYW showed 2×2 GC/CG regions flanked by 1×1 UU base pairs to have stable but distorted forms, which indicates flexibility. The present results reveal atomistic details of the structure and thermodynamics of 1×1 UU base pairs in RNA pan class="Chemical">CUG repeat expansions that are inaccessible with current experimental methods. This insight should help us to design better compounds targeting RNA CUG repeats in future work.

Methods

Preparation of Model Systems for MD Simulations

1×CUG (Figure A) was employed to predict the 2D free energy surfaces of 1×1 pan class="Chemical">UU base pairs using umbrella sampling calculations. 3×CUG3SYW (Figure B) was considered to model the structural properties of a crystal structure (PDB accession code 3SYW),[38] which has three consecutive CUG repeats. Finally, 3×CUGinf (Figure C) should provide insight into the dynamics of 1×1 UU base pairs in an infinitely long linear RNA system. The initial structure of 1×CUG was built from the already solved NMR structure (PDB accession code 2L8U).[40] A homology model of 3×CUGinf was built from the crystal structure data of 3SYW(38) with UU base pairs in either anti–anti or syn–anti conformations. Two initial structures were created for 3×CUGinf; one where all 1×1 UU base pairs were in anti–anti conformations (3×CUGinf/anti–anti) and another where the middle UU base pair (U6U17) was in syn–anti (3×CUGinf/syn–anti). This was done in order to investigate potential syn–anti ↔ anti–anti transformations. For the initial structure of 3×CUG3SYW, data from the crystal structure 3SYW(38) was used, including dangling uridine residues, which were utilized experimentally to crystallize the system. As shown in Figure B, the dangling uridine residues were kept in the structure in order to mimic the original system as far as possible. The Leap module of AMBER MD package[52] was used to neutralize all the systems with Na+ and/or Cl– ions[53] and to solvate with TIP3P water molecules.[54] In 3×CUGinf, 10 extra Cl– and Na+ ions were randomly included in the systems. The 1×CUG and 3×CUG3SYW were designed to have truncated octahedral boxes, while 3×CUGinf was designed to have a cubic box that created an infinitely long RNA system (Figure C). The 1×CUG, 3×CUGinf, and 3×CUG3SYW systems had 4025, 2265, and 10766 water molecules, respectively. Once equilibrated, these systems had 0.67, 0.21, and 0.18 mM Na+ concentrations, respectively. We also designed another 3×CUGinf system, which was neutralized with 20 Na+ ions, had the middle UU basepair in syn–anti conformation, and was solvated with 4333 water molecules to yield a Na+ concentration of 0.24 mM. Two sets of MD simulations for this system were run. The Amber force field[55] with revised χ[51] and α/γ[56] torsional parameters were used in both regular and umbrella sampling MD simulations, as these revised parameters have been shown to improve the predictions.[48,51,57−63]

Figure 1

Model RNA sequences and initial structures used in regular and umbrella sampling MD simulations. For illustrative purposes, the hydrogen atoms of the water molecules are not shown. The oxygen atoms of water molecules are represented with red dots. Blue and cyan spheres represent Na+ and Cl– ions, respectively. RNA structures are shown in the “new ribbons” format.[65] The opaque region of C represents the unit cell, while transparent regions represent the neighboring cells to illustrate the design of the infinitely long RNA system. Residue numbers described for each system are used in the analyses.

Model RNA sequences and initial structures used in regular and umbrella sampling MD simulations. For illustrative purposes, the hydrogen atoms of the pan class="Chemical">water molecules are not shown. The oxygen atoms of water molecules are represented with red dots. Blue and cyan spheres represent Na+ and Cl– ions, respectively. RNA structures are shown in the “new ribbons” format.[65] The opaque region of C represents the unit cell, while transparent regions represent the neighboring cells to illustrate the design of the infinitely long RNA system. Residue numbers described for each system are used in the analyses.

Molecular Dynamics Simulations

Each system was first minimized and equilibrated in two steps (see also Table S1).[48,57,58] After equilibration, constant pressure dynamics with isotropic positional scaling was used in all the production runs. The reference pressure was set to 1 atm with a pressure relaxation time of 2 ps. Bonds involving pan class="Chemical">hydrogen atoms were constrained with SHAKE.[64] In all the MD simulations an atom-based long-range hard cutoff of 8.0 Å was used. A total of 1 μs MD was run on 3×CUG3SYW. As described above, two independent MD calculations were run on 3×CUGinf, 3×CUGinf/anti–anti and 3×CUGinf/syn–anti, with total MD simulation times of 500 and 606 ns, respectively, with a 1 fs time step. These sets of MD simulations were run in order to characterize any syn–anti ↔ anti–anti transformations for the UU base pairs. The temperatures of all MD simulations were kept at 300 K. The PMEMD module in AMBER12[52] was used to run the MD simulations. Trajectory files were written at intervals of 5000 fs time steps. For all these calculations, over 138 K CPU hours were required.

Umbrella Sampling Simulations

The reaction coordinates shown in Figure , which we previously utilized to study RNA 1×1 AA base pairs,[48] were used to build the 2D PMF surfaces for RNA 1×1 UU internal loops. Two different 2D potential of mean force (PMF) surfaces, (θ1 vs χ) and (θ1 vs θ2), were calculated for 1×1 pan class="Chemical">UU base pairs using the model system 1×CUG (Figure A). Nucleic acid bases can rotate around glycosidic bonds (rotation around χ torsional angle), which are responsible for the syn and anti base orientations with respect to sugar (Figure A). While purine residues prefer both syn and anti orientations, pyrimidine residues mostly prefer anti. Hence, one of the reaction coordinates was chosen to be the χ torsional rotation. The second reaction coordinate was chosen to imitate base stacking ↔ unstacking, where the center-of-mass pseudodihedral angle (θ) was defined to mimic this motion (Figure B). In order to build each 2D PMF surface, 36 × 36 = 1296 umbrella sampling windows were simulated. Initial structures with different (θ1,χ) and (θ1,θ2) combinations were created for the model system 1×CUG (see Section S1 in the Supporting Information for the details).

Figure 2

Reaction coordinates utilized to build the 2D PMF surfaces for RNA 1×1 UU base pairs. In A, the torsional angle highlighted in red is χ (O4′–C1′–N1–C2) responsible for the syn ↔ anti transformation. In order to investigate the base stacking ↔ unstacking, pseudodihedral angles, θ1 and θ2, defined by the center-of-masses (COMs) of four atom groups in B were chosen as reaction coordinates.

Reaction coordinates utilized to build the 2D PMF surfaces for RNA 1×1 UU base pairs. In A, the torsional angle highlighted in red is χ (O4′–C1′–N1–C2) responsible for the pan class="Gene">syn ↔ anti transformation. In order to investigate the base stacking ↔ unstacking, pseudodihedral angles, θ1 and θ2, defined by the center-of-masses (COMs) of four atom groups in B were chosen as reaction coordinates.

Temperature Jump MD Simulations

Due to the relatively large energy barriers we did not see any syn ↔ anti transformations in the MD simulations of 3×pan class="Chemical">CUG3SYW and 3×CUGinf (except for one case, which had low Na+ concentration). Thus, we studied the structural properties of 1×1 UU base pairs with respect to temperature using 1×CUG. Two systems were considered: one where the 1×1 UU base pair is in an anti–anti conformation (χ = 200° and θ1 = 40°) and another where it is in a syn–anti conformation (χ = 40° and θ1 = 350°). For each case, 10 independent MD simulations were run where the temperature was increased from 300 to 400 K by 5 K for each 1 ns MD time interval. Once the temperature reached 400 K, another 19 ns MD was run in each case. No restraints were imposed on the system. MD simulations were carried out with the pmemd.MPI module of AMBER12.[52] A total of over 800 ns MD was run in these simulations. We should note that this particular calculation is somewhat similar to simulated annealing protocol and was done to force the system to overcome the energy barriers responsible of syn ↔ anti transformation. Because 400 K is higher than the boiling point, a full MD would not mean much.

Umbrella Sampling of RNA Mononucleosides

We utilized umbrella sampling MD simulations to predict the 1D PMF profiles of A, G, U, and pan class="Chemical">C mononucleosides along the χ torsional axis (Figure A). The procedure is similar to the umbrella sampling simulations described above, except that only the individual mononucleosides were solvated with 738 TIP3P water molecules[54] and simulated. χ torsional angles were incremented by 10° that created 36 umbrella sampling windows for each system. After minimization and 1 ns equilibration, each window was simulated for 5 ns. The χ torsions in the umbrella sampling MD simulations were restrained in a square bottom well with parabolic sides and 30 kcal/(mol rad2) force constants.

Discrete Path Sampling

The discrete path sampling[66,67] method (DPS) has previously been used to study the energy landpan class="Disease">scapes for a diverse range of atomic and molecular systems.[68−70] In particular, this approach has proved to be effective in exploring landscapes that feature broken ergodicity,[71] leading to multiple relaxation time scales. DPS is based on geometry optimization techniques and is efficient in sampling “rare event” dynamics, which are typically inaccessible in conventional unbiased simulation methods.[72,73] In the present work we have used DPS to explore the energy landscape of the RNA CUG repeats, to supplement our results from unbiased molecular dynamics and umbrella sampling simulations (see Section S2 in the Supporting Information for the details). DPS simulations were seeded from conformations corresponding to the major basins on the 2D PMF of the single pan class="Chemical">CUG repeat, choosing a diverse range of pseudotorsion and χ torsion values. These conformations were locally minimized using a modified version of the LBFGS algorithm,[74] to build an initial database of minima. Geometry optimizations were deemed to have converged when the root-mean-square gradient fell below 10 kcal mol. Connection attempts were made between these minima in a pairwise fashion, starting from the highest energy minimum in the database to the lowest. Once sufficient connections were found, a disconnectivity graph[75−77] analysis revealed that syn–anti and anti–anti states of the uridine base pairs were likely to emerge as competing funnels on the energy landscape. The stationary point databases were further expanded by refining the discrete paths between the syn–anti and anti–anti end points, using systematic applications of the SHORTCUT BARRIER and UNTRAP schemes, which have been described in detail in previous work[78] and are available within our PATHSAMPLE code.[79] DPS simulations were deemed to have converged sufficiently when the rate constants corresponding to the syn–anti ↔ anti–anti conformational transition were stable to within an order of magnitude. The final stationary point database consisted of 21927 minima and 25467 transition states. Free energies were estimated from the database by employing the harmonic superposition approximation.[80] Here, the molecular partition function is approximated as a sum over harmonic vibrational densities of states corresponding to the catchment basins of each minimum.[81,82]

Analysis

The Ptraj module of AMBER12[52] was used for dihedral, root-mean-square deviation (rmsd), and cluster analyses. The weighted histogram analysis method[83,84] (WHAM) was used to calculate the PMF surfaces using WHAM ver. 2.0.4 written by Alan Grossfield.[85] In the WHAM analysis, the last 1.5 ns data of each umbrella sampling simulation was used, and periodicity was turned on in both dimensions. X3DNA[86] was used to extract global structural properties of 3×CUG3SYW from the MD trajectory.

Calculation of Rate Constants

The rate constants corresponding to the conformational transitions between the syn–anti and anti–anti states were calculated from the stationary point database using a graph transformation method, as described in a previous work.[87] The estimates for the rate constants at 300 K were obtained using a self-consistent regrouping scheme,[88] which recursively lumps minima separated by free energy barriers below a threshold into one macrostate. This is an attractive approach, as the free energy surface then becomes a function of the experimental observation time, due to the lumping of minima separated by low barriers.[89] Furthermore, this method alleviates any bias arising due to the choice of original end points, as they are expanded into ensembles of structures that are assumed to be in local equilibrium on the time pan class="Disease">scale of the rate-determining step of interest.

Disconnectivity Graphs

The potential and free energy landscapes were visualized in the form of disconnectivity graphs.[75−77] The disconnectivity graph provides a powerful yet simple representation of the energy landpan class="Disease">scapes and is free of potential artifacts that can arise from low dimensional projections onto selected order parameters.[90,91] This representation retains all of the kinetic information in the database and thus provides a way to rigorously test the validity of the reaction coordinates employed in our 2D umbrella sampling simulations (see Section S3 in the Supporting Information for the details).

Results and Discussion

Detailed analyses of each system are described below. In summary, we found that the MD trajectories for 3×CUG3SYW and 3×pan class="Chemical">CUGinf show stable 1×1 UU base pairs both in anti–anti and syn–anti conformations. Except for one case (3×CUGinf with low Na+ concentration), no syn–anti ↔ anti–anti transformations are observed. This result was attributed to the high energy barriers observed in umbrella sampling and DPS calculations, which predicted three syn–anti ↔ anti–anti transformation pathways. Temperature jump MD simulations were performed, which revealed that one of the pathways predicted by umbrella sampling and DPS calculations dominates the syn–anti → anti–anti transformation. For illustration we use different colors to represent each stable 1×1 UU base pair conformation to highlight the transformations (Figure ).

Figure 3

Conformational states of 1×1 UU base pairs seen in MD trajectories of 3×CUG3SYW and 3×CUGinf. Color notation was used to represent each state to make visualization of rmsd plots (vide infra) easier. Black (A), red (B), and cyan (C) represent anti–anti conformations with 2, 1, and 0 hydrogen-bond. Green (D), blue (E), yellow (F), and orange (G) are used for syn–anti conformations. Note that, in anti–anti UU conformations, the system is symmetric. As a result, there are two conformations each in A and B representing the same state.

Conformational states of 1×1 UU base pairs seen in MD trajectories of 3×pan class="Chemical">CUG3SYW and 3×CUGinf. Color notation was used to represent each state to make visualization of rmsd plots (vide infra) easier. Black (A), red (B), and cyan (C) represent anti–anti conformations with 2, 1, and 0 hydrogen-bond. Green (D), blue (E), yellow (F), and orange (G) are used for syn–anti conformations. Note that, in anti–anti UU conformations, the system is symmetric. As a result, there are two conformations each in A and B representing the same state.

Structure of r(3×CUG3SYW)

The initial structure of 3×CUG3SYW solved by X-ray diffraction with 2.20 Å resolution is shown in Figure B.[38] The system contains three consecutive RNA pan class="Chemical">CUG repeats with two dangling uridines which were included experimentally to facilitate crystallization. We analyzed the global structural properties of 3×CUG3SYW and compared them to the published crystal structure. Furthermore, the conformational preference of each 1×1 UU base pair, as well as the 2×2 CG/GC base pairs flanked by two neighboring 1×1 UU base pairs (Figure ), along with the dynamics of dangling uridines, were analyzed.

Figure 4

Rmsd analysis of 3×CUG3SYW. See Figure B for residue numbering. Rmsd’s (in Å) of U8U33, G9C10/G31C32, U11U30, G12C13/G28C29, and U14U27 are plotted with respect to the stable conformations shown in Figures and 5. Note that all the UU base pairs are in anti–anti conformations with stable forms represented by black, red, and cyan (Figure A–C, and Movie S1). In the rmsd analysis of the 1×1 UU base pairs (U8U33, U11U30, and U14U27), conformational states with rmsd greater than 1 Å represent symmetry related versions of the states. As noted in Figure , UU base pairs in states colored black and red have two identical conformations representing 2 and 1 hydrogen-bond states, respectively. The plots were created in this form to display the transformations of 1×1 UU base pairs during the trajectory. Note also that the stable GC/CG conformation represented by magenta only forms when the flanking 1×1 UU base pairs are both in the 2 hydrogen-bond and opposite symmetry states (black) (see Movie S2).

Rmsd analysis of 3×CUG3SYW. See Figure B for residue numbering. Rmsd’s (in Å) of U8U33, G9C10/G31C32, U11U30, G12C13/G28C29, and U14U27 are plotted with respect to the stable conformations shown in Figures and 5. Note that all the pan class="Chemical">UU base pairs are in anti–anti conformations with stable forms represented by black, red, and cyan (Figure A–C, and Movie S1). In the rmsd analysis of the 1×1 UU base pairs (U8U33, U11U30, and U14U27), conformational states with rmsd greater than 1 Å represent symmetry related versions of the states. As noted in Figure , UU base pairs in states colored black and red have two identical conformations representing 2 and 1 hydrogen-bond states, respectively. The plots were created in this form to display the transformations of 1×1 UU base pairs during the trajectory. Note also that the stable GC/CG conformation represented by magenta only forms when the flanking 1×1 UU base pairs are both in the 2 hydrogen-bond and opposite symmetry states (black) (see Movie S2).

Figure 5

Stable 2×2 GC/CG conformations flanked by 1×1 UU base pairs. (A) GC base pairs in canonical Watson–Crick base pair forms. (B) One of the GC base pairs is distorted and loses a hydrogen-bond. In this form, the −NH2 group of guanine forms an attractive electrostatic interaction with the 2′-OH group of cytidine. (C) Both GC base pairs are distorted and lose one hydrogen-bond each. Both GC base pair conformations are in the distorted GC form described in B. The distortions happen only when the flanking 1×1 UU base pairs are both in the 2 hydrogen-bond but opposite symmetry states (see Movie S2).

Global Structural Properties of 3×CUG3SYW

The main difference between the 3SYW crystal structure and the 3×CUG3SYW structure is that the former is an almost frozen system while the latter is a system equilibrated at 300 K and thus includes dynamics. The minor (mgw) and major groove widths (Mgw) in the crystal structure of 3SYW are around 15 and 16 Å, respectively (Figure S1). Analysis of the MD trajectory shows that the mgw fluctuates around the crystal structure value (Figure S1), which is globally in an A-form conformation. On the other hand, the MD trajectory shows a Mgw that fluctuates around 20 Å. The expanded Mgw suggests that the inclusion of 1×1 pan class="Chemical">UU internal loops gives flexibility to the system. To understand the flexibility of each region in the RNA duplex, we analyzed the rmsd of 3×CUG3SYW with respect to the crystal structure. In the analysis, the sequence was broken down into 2 and 5 base pair segments so that we could investigate which regions in the RNA were responsible for the flexible behavior (Figure S2). The results show that the regions including UU base pairs are more flexible than the other regions in 3SYW (Figure S2). Globally, the central region, which includes 3 CUG repeats, is more flexible than the terminal regions, which includes Watson–Crick GC and Wobble GU base pairs (Figure S2). This result is expected because individual fluctuations of 1×1 UU internal loops will accumulate and yield global flexible regions as seen in 3SYW. It is noteworthy, however, that the overexpanded Mgw might be a force field artifact. Even though it is known that RNA internal loops introduce flexibility into the system, the overexpanded Mgw might be due to an improper description of base stacking and/or backbone torsions.

Conformational Preference of 1×1 UU Base Pairs

The crystal structure of 3SYW has three 1×1 UU base pairs in anti–anti conformations. Cluster analyses of the MD trajectory show three unique anti–anti conformations for the 1×1 pan class="Chemical">UU base pairs, in line with the previous experimental results,[40] which represent 2, 1, and 0 hydrogen-bond states, colored black, red, and cyan, respectively, in the rmsd plots (Figures and 4). During the 1 μs MD simulation, we observed anti–anti ↔ anti–anti transformations numerous times (Figure and Movie S1). Hydrogen-bond analyses indicate that the 1×1 UU base pairs populate the 2, 1, and 0 hydrogen-bond states with probabilities 43%, 41%, and 16%, respectively. Furthermore, in the regions where the 2×2 GC/CG base-pairs are in canonical Watson–Crick forms the black ↔ red transformations are very fast because the conformational states are very similar to each other (Figures A,B and 4). The 0 hydrogen-bond state (cyan) seems to act like an intermediate state and is less stable than to the 1 (red) and 2 (black) hydrogen-bond states. Hence, the dynamic behavior of the 1×1 UU base pairs yields flexible regions in the RNA structure described above (Figure S2).

Conformational States of 2×2 GC/CG Base-Pairs Flanked by UU Base-Pairs in 3SYW

The structure of 3×CUG3SYW (Figure B) is not as rigid as an infinitely long RNA. Thus, the effects of its flexibility will appear in weak regions of 3×pan class="Chemical">CUG3SYW. The 1×1 UU base pairs will be the first to acquire this flexibility, as noted above. The next weakest regions are 2×2 GC/CG base-pairs flanked by these UU base-pairs. Even though the GC base pairs in these regions are in Watson–Crick forms, they will not be as rigid as in regular RNA stem regions due to the flanking UU base pairs. We observed similar results when analyzing RNA CAG and CCUG repeat expansions.[48,49] Indeed, our analysis shows that 2×2 GC/CG regions flanked by UU base-pairs are flexible and can have a stable noncanonical conformation (Figures and 5C and Movie S2). In the rmsd analysis of G9C10/G31C32 and G12C13/G28C29 the magenta color represents this stable noncanonical conformation (Figure C), which persists for as long as 200 ns, as is apparent during 420 and 610 ns intervals in the MD simulation (magenta color in Figure ). Note, however, that this stable noncanonical GC/CG conformation only forms when the flanking 1×1 UU base pairs are both in stable 2 hydrogen-bond and opposite symmetry states (black) (Figure and Movie S2). In this noncanonical GC/CG form, each GC base pair loses two Watson–Crick hydrogen-bonds, while forming an attractive electrostatic interaction between the −NH2 group of guanine and the 2′-OH group of the ribose of its Watson–Crick partner (Figure ). The results indicate that if the flanking base pairs (in this case the 1×1 UU) transform to energetically favorable states they can distort the repeating 2×2 GC/CG regions (Movie S2). This particular phenomenon is one of the reasons why we observe an overexpanded Mgw in the MD trajectory (Figure S1). As described above, the particular distorted GC/CG conformation shown in Figure forms when the flanking UU base pairs are in 2 hydrogen-bond states (Figure ). The overexpression of the 2 hydrogen-bond states might be a force field artifact, which could indirectly cause the overexpanded Mgw we observed (Figure S1). Therefore, caution is required when analyzing the results, even though we are using an improved RNA force field. Stable 2×2 GC/CG conformations flanked by 1×1 UU base pairs. (A) GC base pairs in canonical Watson–Crick base pair forms. (B) One of the GC base pairs is distorted and loses a pan class="Chemical">hydrogen-bond. In this form, the −NH2 group of guanine forms an attractive electrostatic interaction with the 2′-OH group of cytidine. (C) Both GC base pairs are distorted and lose one hydrogen-bond each. Both GC base pair conformations are in the distorted GC form described in B. The distortions happen only when the flanking 1×1 UU base pairs are both in the 2 hydrogen-bond but opposite symmetry states (see Movie S2).

Structural Preference of Dangling Uridine Bases in 3SYW

As noted above, the RNA sequence shown in Figure B includes dangling uridines, which were used experimentally to help crystallize the system. We kept these pan class="Chemical">uridine ends and analyzed their structural preferences using 3×CUG3SYW. The initial conformations of the dangling uridine ends are shown in Figures B and 6A. At later times, however, they transform to a stable state where U2 and U21 are both unstacked, while U1 and U20 are stacked on top of the guanine of the terminal GC base pairs (G3C38 and G22C19) (Figure C). In this form, both U1 and U20 have flipped upside down sugars (Figure C), as seen previously in NMR and MD analyses of the single-stranded RNA structure GACC.[58] Crystal structures basically are frozen systems, while MD simulations include dynamics. Although the crystal structure of 3SYW has the dangling uridine ends being inserted into the RNA major grooves, this does not mean that they will remain there when the system is equilibrated at room temperature. Indeed, the MD simulation of 3×CUG3SYW shows the dangling uridine ends in very different conformations, where stacking interactions play an important role in the final conformations (Figure C). Once the dangling uridine ends transformed to this final stable conformation (Figure C) they stayed there for the rest of the MD simulation (see Supporting Information). However, we should note that this particular result might be a force field artifact caused by the improper description of the backbone, the nonbonded interactions, or solvent molecules. Chen and Garcia recently showed that the van der Waals parameters of adenine base atoms required revision,[92] and a similar revision might be required for uridine.

Figure 6

Conformational transformation of the dangling uridine bases (from the MD trajectory of 3×CUG3SYW). (A) Initial state where dangling uridines are bent toward the RNA major groove. (B) Dangling uridines stacked over the RNA terminal base pairs. (C) Final stable state where uridine (U1 and U20) is flipped upside down and stacked over the RNA terminal base pair, while the other uridine (U2 and U19) is unstacked. For residue numbering see Figure B.

Conformational transformation of the dangling uridine bases (from the MD trajectory of 3×pan class="Chemical">CUG3SYW). (A) Initial state where dangling uridines are bent toward the RNA major groove. (B) Dangling uridines stacked over the RNA terminal base pairs. (C) Final stable state where uridine (U1 and U20) is flipped upside down and stacked over the RNA terminal base pair, while the other uridine (U2 and U19) is unstacked. For residue numbering see Figure B.

Structure of r(3×CUGinf)

In DM1, expanded repeats can have sizes between 50 and 10000.[9] These RNA pan class="Chemical">CUG repeats adopt hairpin structures with very long nonhairpin regions.[93] Hence, the system in Figure C was designed specifically to study the behavior of 1×1 UU base pairs in an infinitely long linear RNA system with continuous CUG repeats. Because a full turn of an A-form RNA duplex requires ∼11 basepairs, we designed the system shown in Figure C, such that the RNA duplex is along the z-axis, has 11 basepairs, and stacks perfectly with the neighboring RNA duplexes when periodic boundary conditions are turned on. No restraints (either positional or torsional) were imposed on the RNA duplex; stacking interactions between the neighboring GC base pairs keep the system in a linear form automatically. This trick, which we previously applied to RNA CCUG repeats to investigate the dynamic properties of CU/UC internal loops,[49] allows us to design an infinitely long linear RNA duplex (see Movie S3). Although r(3×CUGinf) is not a perfect RNA CUG repeat expansion, it is rather a linear RNA duplex having three continuous CUG repeats and then a GGCC/CCGG tetramer region (see Figure C). If we leave off the “end-fraying” effects in the finite duplexes, this does give the same structural properties of finite duplexes. Ultimately the periodic trick will break down when the length of the structure is longer than the persistence length. Nevertheless, the designed system allowed us to “implicitly” force the RNA duplex to stay in a linear form due to the stacking of the neighboring RNA duplexes and, hence, study the behavior of 1×1 UU base pairs in a linear RNA geometry (see Supporting Information). Two MD simulations were run; one where the initial conformations of all the UU base pairs were anti–anti (3×CUGinf/anti–anti) and another where the initial conformations of U3U20 and U9U14 were anti–anti and U6U17 was syn–anti (3×CUGinf/syn–anti). We further completed two independent MD simulations on 3×CUGinf/syn–anti, which had low Na+ concentration (0.24 mM).

3×CUGinf/anti–anti

A 500 ns MD simulation was run for this system. The initial conformations of all the UU base pairs were anti–anti and stayed in this form during the MD simulation (Figure ). Similar to the results seen for 3×pan class="Chemical">CUG3SYW, three 1×1 UU base pair conformations are observed in the MD trajectory representing 2, 1, and 0 hydrogen-bond states (Figures and 7). Even though no anti–anti → syn–anti transformations are seen, numerous anti–anti ↔ anti–anti transformations are observed (Figure ). A hydrogen-bond analysis indicates that the 1×1 UU base pairs in 3×CUGinf/anti–anti populate states with 2, 1, and 0 hydrogen-bonds with probabilities 33%, 60%, and 7%, respectively. The reason for the increase in the 1 hydrogen-bond state compared to 3×CUG3SYW is due to the rigidity of the 2×2 GC/CG regions flanked by 1×1 UU base-pairs in 3×CUGinf. Structurally, the 2 (black) and 1 (red) hydrogen-bond states are alike, while the 0 (cyan) hydrogen-bond state acts like an intermediate (Figure A–C). Thus, the black ↔ red transformations will be fast if no large distortions occur in the 2×2 GC/CG regions. In 3×CUG3SYW, because 2×2 GC/CG regions are flexible the 1×1 UU base pairs have a higher probability of transforming to the stable 2 hydrogen-bond states. On the other hand, analyses showed that the conformations of the 2×2 GC/CG regions in 3×CUGinf/anti–anti were always in canonical Watson–Crick GC base pairs with 3 hydrogen-bonds. The results indicate that the 2×2 GC/CG regions in 3×CUGinf/anti–anti are structurally more rigid than the ones in 3×CUG3SYW. As a result, it is harder for the 1×1 UU base-pairs to transform to a stable 2 hydrogen-bond state without distorting the 2×2 GC/CG regions in 3×CUGinf/anti–anti.

Figure 7

RMSD analysis of 1×1 UU base pairs seen in the 3×CUGinf/anti–anti MD simulation (See Figure C for residue numbering). The initial structures of all the UU base pairs were in anti–anti. The color scheme described in Figure was used to display the conformational transformations. Note that black and red conformations (Figure ) represent 2- and 1-hydrogen-bond states, which can be described by two symmetric orientations (Figure A,B). Rmsd states greater than 1 Å represent the symmetry related orientations.

RMSD analysis of 1×1 UU base pairs seen in the 3×pan class="Chemical">CUGinf/anti–anti MD simulation (See Figure C for residue numbering). The initial structures of all the UU base pairs were in anti–anti. The color scheme described in Figure was used to display the conformational transformations. Note that black and red conformations (Figure ) represent 2- and 1-hydrogen-bond states, which can be described by two symmetric orientations (Figure A,B). Rmsd states greater than 1 Å represent the symmetry related orientations.

3×CUGinf/syn–anti

An over 600 ns MD simulation was run for this system. As described above, the initial conformation was designed to have the middle UU base pair (pan class="Chemical">U6U17) in a syn–anti conformation and the terminal UU base pairs (U3U20 and U9U14) in anti–anti. This design enables us to investigate potential syn–anti → anti–anti transformation pathways. Figure shows the rmsd and Na+ binding analyses for this system. Except for one case, which had low Na+ concentration, no anti–anti ↔ syn–anti transformations were observed. Similar to the 3×CUGinf/anti–anti results, numerous anti–anti ↔ anti–anti transformations were seen for U3U20 and U9U14, where 2, 1, and 0 hydrogen-bond states were occupied with probabilities 47%, 48%, and 5%, respectively. However, U6U17, which stayed in the syn–anti conformation, exhibited only 1 and 0 hydrogen-bond states with probabilities 93% and 7%, respectively. The green and blue conformations shown in Figure display 1 hydrogen-bond state when the 1×1 UU base pair has a syn–anti conformation. Even though no syn–anti → anti–anti transformation was observed for U6U17 in this system we see that there are at least four times where U6, which is in the syn conformation, unstacked from the helical axis, colored orange in the rmsd plot (Figures G and 8). In this orange state (Figure G), U6 is unstacked from the helical axis via the minor groove but does not complete the syn → anti transformation (Movie S4). One reason for not observing the syn → anti transformation is the Na+ binding pocket seen near U6 (Figure ). The transformation of U6 from a stacked state (green, blue, and yellow colored states displayed in Figure ) to an unstacked state (orange) only happens when the Na+ binding pocket is not occupied by a Na+ ion (Movie S4). However, this state was short-lived because Na+ ions were found to rapidly occupy the binding pocket even after U6 tried to unstack and then stabilize the syn–anti UU conformation by bringing back U6 to the stacked conformation (see MD simulation times ∼205, ∼ 360, ∼ 420, and ∼545 ns in Figure ). As a result, no syn → anti transformation was observed.

Figure 8

Rmsd of 1×1 UU base pairs and Na+ binding to the pocket seen in 3×CUGinf/syn–anti MD simulations (see Figure C for residue numbering). The initial structures of the terminal UU base pairs, U3U20 and U9U14, were anti–anti, while U6U17 was syn–anti. The color scheme described in Figure was used to display conformational transformations of the UU base pairs. See Figure caption for details of the rmsd analysis. A Na+ binding pocket was observed near U6 when it was in the syn conformation (Figure ). In the Na+ binding analysis different colors represent different Na+ ions in the system where the distance between each ion and the center-of-mass (COM) of the three atoms shown in Figure (O2P and O2 of U6, and N7 of G7) were calculated. Note that when the binding pocket was not occupied by a Na+ ion, U6 tried to unstack from the helical axis and transformed to the orange state (Figure G). However, this state was short-lived, and no syn ↔ anti transformation was observed (Movie S4).

Figure 9

Na+ binding pocket was observed near U6 when it was in the syn orientation. When occupied by a Na+ ion, there are three attractive electrostatic interactions. As shown in Figure , these interactions stabilize the syn–anti UU orientation. When no Na+ ion was present in the pocket, U6 unstacked via the minor groove (Movie S4). A full syn → anti transformation, however, was not observed, due to the rapid occupation of the pocket by Na+ ions. This can be seen in the rmsd analyses of U6U17 (Figure ) where the orange colored state has a short lifetime.

Rmsd of 1×1 UU base pairs and Na+ binding to the pocket seen in 3×pan class="Chemical">CUGinf/syn–anti MD simulations (see Figure C for residue numbering). The initial structures of the terminal UU base pairs, U3U20 and U9U14, were anti–anti, while U6U17 was syn–anti. The color scheme described in Figure was used to display conformational transformations of the UU base pairs. See Figure caption for details of the rmsd analysis. A Na+ binding pocket was observed near U6 when it was in the syn conformation (Figure ). In the Na+ binding analysis different colors represent different Na+ ions in the system where the distance between each ion and the center-of-mass (COM) of the three atoms shown in Figure (O2P and O2 of U6, and N7 of G7) were calculated. Note that when the binding pocket was not occupied by a Na+ ion, U6 tried to unstack from the helical axis and transformed to the orange state (Figure G). However, this state was short-lived, and no syn ↔ anti transformation was observed (Movie S4). Na+ binding pocket was observed near U6 when it was in the pan class="Gene">syn orientation. When occupied by a Na+ ion, there are three attractive electrostatic interactions. As shown in Figure , these interactions stabilize the syn–anti UU orientation. When no Na+ ion was present in the pocket, U6 unstacked via the minor groove (Movie S4). A full syn → anti transformation, however, was not observed, due to the rapid occupation of the pocket by Na+ ions. This can be seen in the rmsd analyses of U6U17 (Figure ) where the orange colored state has a short lifetime. In order to investigate further the role of Na+ ions, we designed another 3×CUGinf system, which had a low Na+ ion concentration. We did two independent MD simulations on this new system. In the first set, we observed a pan class="Gene">syn → anti transformation within 60 ns (Figure S3). This transformation happened right after the loss of a Na+ ion from the binding pocket (Figure S3). In the second set, however, we did not observe any syn → anti transformation within 200 ns (Figure S4). During the MD simulation of the second set, the Na+ binding pocket was occupied by different Na+ ions almost all the time (Figure S4). The results indicate that the syn–anti UU conformation is intrinsically stable and is further stabilized by Na+ ions through the Na+ binding pocket.

Umbrella Sampling Results for 1×1 UU Base Pairs

The MD results for 3×CUG3SYW and 3×CUGinf have provided details of the dynamics of each system. Except for one case (3×CUGinf with low Na+ concentration), we did not observe any syn–anti ↔ anti–anti transformation for 1×1 UU base pairs. To further explore this result, we utilized umbrella sampling MD simulations to build two 2D PMF surfaces, (θ1,χ) and (θ1,θ2) (variables defined in Figure ), to extract the transformation pathways for 1×1 UU base pairs as well as other potentially stable UU conformations. Note, however, that the calculated 2D PMF surfaces are approximations to the true multidimensional energy landscapes. For example, in the analysis we assumed that the flanking GC basepairs will not be distorted during base stacking ↔ unstacking. It is possible that the flanking GC basepairs might stack on top of each other once the uridines are fully unstacked from the helical axis. Such phenomena were omitted from the analysis. The choice of pseudodihedral shown in Figure B was made after careful analysis, where we tried other reaction coordinates to mimic base stacking ↔ unstacking. The use of pseudodihedrals to study base flipping in nucleic acids was previously done by us and others.[48,94−96]

2D (θ1,χ) PMF Surface

Figure shows the 2D (θ1,χ) PMF surface predicted by umbrella sampling calculations for 1×1 UU base pairs. There is excellent overlap of the (θ1,χ) distributions calculated for each umbrella sampling run (Figures S5 and S6). Three pathways were found for pan class="Gene">syn–anti ↔ anti–anti transformations: P1, where uridine unstacks via the minor groove in order to make the syn ↔ anti transformation, P2, where uridine follows a direct transformation from syn ↔ anti, while stacked within the RNA helical axis, and P3, where uridine unstacks via the major groove to perform the syn ↔ anti transformation. These pathways are similar to the ones predicted for RNA CAG repeats, but details of the free energy landscape (Figure S7) are different.[48] Energy barriers in the 2D (θ1,χ) PMF surface predicted for 1×1 AA base pairs are roughly 2 to 4 kcal/mol lower than for the 1×1 UU base pairs considered in Figure (see also Figure S7). This result is expected because adenosine mononucleoside is known to prefer ∼40% syn and ∼60% anti, while cytidine is known to prefer over 90% anti,[51] which results in different energy barriers (Figure S7).

Figure 10

2D (θ1,χ) PMF surface predicted for 1×1 UU base pairs by umbrella sampling calculations. The results include 11 minima, where (g) is the global minimum conformation (anti–anti). Three transformation pathways for syn–anti ↔ anti–anti are predicted. Energy barriers, such as the ones seen in b ↔ e and j ↔ g transformations, might be the reason why we could not see any syn–anti ↔ anti–anti transformation in regular MD simulations at 300 K. See Figure S7 for a comparison with the results previously obtained for 1×1 AA base pairs.[48]

2D (θ1,χ) PMF surface predicted for 1×1 UU base pairs by umbrella sampling calculations. The results include 11 minima, where (g) is the global minimum conformation (anti–anti). Three transformation pathways for pan class="Gene">syn–anti ↔ anti–anti are predicted. Energy barriers, such as the ones seen in b ↔ e and j ↔ g transformations, might be the reason why we could not see any syn–anti ↔ anti–anti transformation in regular MD simulations at 300 K. See Figure S7 for a comparison with the results previously obtained for 1×1 AA base pairs.[48]

2D (θ1,θ2) PMF Surface

Figure shows the 2D (θ1,θ2) PMF surface predicted by umbrella sampling calculations for 1×1 UU base pairs (for details of the reaction coordinates see Figure B). The overlap of the (θ1,θ2) distributions calculated for each umbrella sampling run is shown in Figure S8. This particular analysis was performed to investigate the correlation of stacking ↔ unstacking of each base in a 1×1 pan class="Chemical">UU base pair. Note, however, that the initial conformations of the 1×1 UU base pairs are anti–anti. Thus, the effect of syn ↔ anti transformations is omitted in this analysis. First we note that the 2D PMF surface is symmetric, which is expected for 1×1 UU base pairs in the RNA CUG (Figure ). Second, we find that unstacking of each uridine is negatively correlated. Hence, when one of the uridines starts to unstack the other uridine in the 1×1 UU base pair tends to stay in the helical axis. This result is illustrated in Figure with double-ended arrows, here emphasizing the channels directing the stacking and unstacking for each uridine. Third, the results indicate the lowest free energy states describing the different stacked 1×1 UU base pair conformations. States A and B in Figure represent a mix of 2 and 1 hydrogen-bond conformations, while state C represents a mix of 1 and 0 hydrogen-bond. As described before, the conformations of 2 and 1 hydrogen-bond UU base pairs are almost identical and, therefore, overlap in Figure . A detailed hydrogen-bond analysis is displayed in Figure S9. In the 2D (θ1,θ2) PMF surface, regions around (θ1 = 40, θ2 = 70) and (θ1 = 70, θ2 = 40) represent 2 hydrogen-bond conformations, while the other regions of states A and B shown in Figure represent 1 hydrogen-bond conformations (Figure S9). The A ↔ B, A ↔ C, and B ↔ C transformations displayed in Figure were also observed in the MD simulations of 3×CUG3SYW and 3×CUGinf (Figures and 7).

Figure 11

2D (θ1,θ2) PMF surface predicted for 1×1 UU base pairs by umbrella sampling calculations. Double-ended arrows indicate the channels directing the stacking and unstacking of uridine in a 1×1 UU base pair. The lowest free energy states describe different stacked 1×1 UU base pair conformations. A and B represent a mix of 2 and 1 hydrogen-bond states, while C represents a mix of 1 and 0 hydrogen-bond states (see Figure S9 for hydrogen-bond analyses).

2D (θ1,θ2) PMF surface predicted for 1×1 UU base pairs by umbrella sampling calculations. Double-ended arrows indicate the channels directing the stacking and unstacking of pan class="Chemical">uridine in a 1×1 UU base pair. The lowest free energy states describe different stacked 1×1 UU base pair conformations. A and B represent a mix of 2 and 1 hydrogen-bond states, while C represents a mix of 1 and 0 hydrogen-bond states (see Figure S9 for hydrogen-bond analyses).

Comparison of Energy Barriers and Base Orientations in RNA Mononucleosides

As described above, the syn and anti are two important RNA conformations. Even though the 2D (θ1,χ) PMF profile displayed high energy barriers, we also investigated individual RNA pan class="Chemical">mononucleosides of A, G, U, and C, to see if there are intrinsic differences between purines and pyrimidines. Umbrella sampling calculations were utilized to create the 1D PMF profiles for each RNA residue along the χ torsional axis (Figure S10). The results indicate that pyrimidines mostly prefer anti, while purines can have significant populations in syn (Figure S10). In A and G, ΔΔGsyn→anti (ΔGanti – ΔGsyn) differences are approximately 1.40 and 1.21 kcal/mol, respectively, while in U and C they are −0.39 and −0.21 kcal/mol, respectively (Figure S10). The percentage of anti conformations extracted from ΔΔGsyn→anti for A, G, U, and C at 300 K are 9%, 12%, 66%, and 58%, respectively, in line with previous results.[51] Furthermore, the energy barriers seen in purines are lower than the ones seen in pyrimidines (Figure S10). In 1D PMF profiles, there are two barriers for the syn ↔ anti transformation: one around χ ∼ 120° (PB) and the other around χ ∼ 360° (PA) (Figure S10). While the energy barrier around χ ∼ 120° is only slightly larger in pyrimidines, the barrier around χ ∼ 360° is over 2 kcal/mol higher than for purines (Figure S10). The results indicate that purines in free mononucleoside forms would prefer the syn orientation over anti and will readily sample the syn ↔ anti transformation via PA and PB due to the low energy barriers, while pyrimidines would prefer anti over syn in free mononucleoside forms and will follow PB to perform the syn ↔ anti transformation (Figure S10).

Results of Temperature Jump MD Simulations

Even though the umbrella sampling calculations predicted three transformation pathways for syn–anti ↔ anti–anti, we did not observe any such transformation in the 600 ns MD simulation of 3×pan class="Chemical">CUGinf/syn–anti (except when Na+ concentration was low). An intermediate state represented in orange (Figure G), corresponding to the potential minimum state b shown in Figure , was observed in the MD trajectory of 3×CUGinf/syn–anti (Figure ) where U6 (in syn conformation) unstacked from the helical axis. However, the b → e transformation through P1 was not observed (Figure ) even though this pathway was followed by 1×1 AA base pairs in the syn–anti → anti–anti transformation.[48] One reason for not seeing the full transformation is due to the high energy barriers observed in the free energy profile of the 1×1 UU base pair (Figure ), which were increased by the Na+ binding pocket near UU stabilizing the syn–anti UU conformation (Figures and 9). We therefore utilized 1×CUG (Figure A) and increased the temperature of the system slowly from 300 to 400 K to study the syn–anti ↔ anti–anti transformations in 1×1 UU base pairs. Ten independent MD simulations starting with syn–anti and anti–anti 1×1 UU base pair conformations were run. Six out of 10 of the syn–anti systems ended up with anti–anti UU conformations, while no anti–anti → syn–anti was observed. The syn–anti → anti–anti transformations observed in the temperature jump MD simulations occurred via P1 and P3, which were followed 1 and 5 times, respectively (for definitions of pathways see Figure ). Conformational states observed in the temperature jump MD simulations and the pathways followed during the syn–anti → anti–anti transformations are shown in Figure S11. Pathways P1 and P3 include unstacking via minor and major grooves, respectively. In the temperature jump MD simulations, even though numerous i ↔ b transformations were observed, only one of them overcame the energy barrier and followed the b ↔ e transformation (Figure ). These results are different from the corresponding RNA CAG results, where the favorable syn–anti ↔ anti–anti pathway is via P1 in 1×1 AA base pairs (Figure S7).[48] Indeed, the relative energy barriers of b ↔ e and j ↔ g transformations are 2.4 and 6.0 kcal/mol in the 2D (θ1,χ) PMF surface for 1×1 AA base pairs, while the same barriers for 1×1 UU base pairs are 4.4 and 3.8 kcal/mol, respectively (Figure S7). As a result, P1 and P3 are the favored syn–anti ↔ anti–anti transformation pathways for 1×1 AA and UU base pairs, respectively, due to the associated energy barriers.

Energy Landscape of 1×1 UU Base Pairs (Disconnectivity Graphs)

The potential and free energy disconnectivity graphs for the RNA system containing one CUG repeat are shown in Figures and S12, respectively. The free energy disconnectivity graph was computed at 300 K using a regrouping threshold of 2 kcal/mol.[76,97]

Figure 12

Potential energy disconnectivity graph for RNA CUG repeat. The red and blue branches lead to minima with the UU base pair in syn–anti and anti–anti conformations, respectively. The green and pink branches lead to minima, in which the uridine is unstacked via minor and major grooves, respectively. Some representative structures corresponding to different conformational states are also shown.

Potential energy disconnectivity graph for RNA CUG repeat. The red and blue branches lead to minima with the pan class="Chemical">UU base pair in syn–anti and anti–anti conformations, respectively. The green and pink branches lead to minima, in which the uridine is unstacked via minor and major grooves, respectively. Some representative structures corresponding to different conformational states are also shown. To make a direct comparison with the 2D PMF obtained from umbrella sampling calculations, the graphs were colored according to the pseudotorsion and χ torsion angles. The disconnectivity graph in Figure clearly illustrates the double-funnel character of the underlying potential energy landscape. The anti–anti and the pan class="Gene">syn–anti states of the UU base pair corresponding to the two major funnels are colored blue and red, respectively. This observation is in agreement with the 2D PMF obtained from umbrella sampling calculations, where the two major free energy basins correspond to the anti–anti and syn–anti conformational states. The two funnels are separated by high energy barriers, leading to two competing morphologies. To quantify the barriers separating the two conformational states at 300 K, we have estimated the rate constants for the syn–anti ↔ anti–anti transformations using the graph transformation[87] employing a regrouping threshold[88] of 2 kcal/mol. This choice is appropriate, as similar values were obtained in terms of rate constants and disconnectivity graphs for a range of values around this selection. The rate constant for the conformational transition from syn–anti to anti–anti is estimated as 17240 s. The reverse transformation (i.e., anti–anti → syn–anti) is at least 2 orders of magnitude slower, with an estimated rate constant of 174 s. These rate constants approximately correspond to time scales of 58 μs and 6 ms, respectively. The presence of such competing morphologies separated by high barriers is normally associated with multiple relaxation time scales and features in the heat capacity profile.[71] The minima corresponding to the flipped-out conformations of the UU base pair are colored green and purple for unstacking through the minor and major grooves, respectively. These conformations do not constitute any major funnel on the energy landpan class="Disease">scape and are distributed over a wide energy range in the disconnectivity graph. The free energy disconnectivity graph at 300 K (Figure S12) displays essentially the same features as the potential energy landscape, with two major funnels corresponding to the syn–anti and anti–anti states. In general, we find that the selected order parameters distinguish the two major funnels quite well. Nonetheless, the intermixing of colors in certain regions of the graph suggests that structural order parameters alone are not sufficient to correctly represent the complexities of the underlying landscape.[91,98] Hence, we use the recursive regrouping scheme instead,[88] which is based on kinetic criteria, to estimate the rate constants.

Pathways between syn–anti and anti–anti Conformational States in UU Base Pairs

The kinetically relevant pathways between the syn–anti and anti–anti conformational states of the pan class="Chemical">uridine base pairs were extracted from the stationary point databases using the recursive enumeration algorithm,[88] which is based on kinetic criteria to estimate the rate constants.[99,100] The diverse range of pathways that exist between the two conformational states can be classified in terms of three distinct mechanisms: (i) Pathway P1 (Figure ), in which the uridine flips out via minor groove (Figure S13A) prior to the rotation of the uracil base around its glycosidic bond; (ii) Pathway P2 (Figure ), in which the uridine transforms from syn to anti without unstacking significantly from the helical axis (Figure S13B); and (iii) Pathway P3 (Figure ), in which the uridine flips out via the major groove (Figure S13C) followed by rotation of the uracil base around its glycosidic bond resulting a transformation from syn to anti. For pathways P1 and P3, unstacking of pan class="Chemical">uridine outside the helical axis (via minor and major grooves, respectively) precedes the rotation of the uracil base around its glycosidic bond that causes the syn ↔ anti transformation. These transitions are effectively the rate-determining steps in the two mechanisms. For pathway P2, the rotation around the glycosidic bond resulting in the flip of the χ torsion from syn to anti is associated with the surmounting of a transition state and corresponds to the rate-determining step. Although pathways P1 and P3 have similar path lengths, P3 is associated with relatively lower barriers and emerges as the “fastest pathway” in our kinetic transition network. Pathway P2 is longer than paths P1 and P3 and makes a smaller contribution. Thus, the properties of our kinetic transition network, in particular the existence of three different pathways from syn to anti conformational states as well as the mechanistic details of the transformation, are consistent with the predictions of the 2D (θ1,χ) PMF profile obtained from umbrella sampling.

Targeting RNA CUG Repeats with Small Molecules

In DM1, inhibition of RNA-pan class="Gene">MBNL1 binding through small molecules, which will target 1×1 UU base pairs, could ameliorate the disease. Therefore, understanding the structural and thermodynamic properties of RNA 1×1 UU internal loops could help us discover the binding mechanism of small molecules to RNA internal loops and thus the design rules for drugs, which will effectively target these RNA structures. Previously, we studied the binding mechanism of a substituted naphthyridine (C-2) that targets RNA CUG repeats.[43] The results indicated complex structural transformations of the RNA internal loop in the small molecule binding process. C-2 had a total 16 stable C-2/CUG binding modes. The lowest two binding modes displayed 1×1 UU base pair in conformations near (θ1 = −55, χ = 200) and (θ1 = −55, θ2 = −55). Indeed, (θ1 = −55, χ = 200) represents one of the local minima, “e” in Figure (also see Figure S7). Furthermore, (θ1 = −55, θ2 = −55) is one of the local minima in Figure . In other words, upon C-2 binding to RNA CUG, 1×1 UU base pair was transformed from global minimum g (anti–anti) to local minimum e (anti–anti) (Figure ). Our previous results combined with the current work suggest that small molecules can exploit the stable local minima states of 1×1 UU base pairs if the final bound state is thermodynamically favorable.

Insights into the Binding of Proteins to CUG Repeats

It is known that MBNL1 binds to RNA pan class="Chemical">CUG repeats, but the details are uncertain.[27,101−103] Patel and co-workers solved the crystal structure of the MBNL1 tandem zinc finger 3 and 4 domains (Zn3/4) interacting with a single-stranded RNA, r(CGCUGU), which shows guanosine residues interacting with Zn3/4.[50] Studies have suggested that the binding to CUG hairpins occurs upon a conformational change in the structure that melts the hairpin.[28,50,104] Based on the results of 3×CUG3SYW, we discovered that tandem GC/CG base pairs flanked by 1×1 UU internal loops are less stable than when they are located in RNA stem regions. This result might give insight into how MBNL1 binds to RNA CUG repeat expansions. The distortions seen in tandem GC/CG regions flanked by 1×1 UU base pairs might enhance the binding of MBNL1 to RNA CUG repeat expansions. These studies also suggest that small molecules that stabilize RNA repeats could be effective modulators of cellular function.[104]

RNA CUG Loops in the Literature

Using the CoSSMos database[105] developed by Znosko and co-workers we found 73 RNA structures that have CUG motifs with 1×1 pan class="Chemical">UU loops (Table S2). The biological systems that have the RNA CUG motifs include RNA, ribosomes, zinc finger/RNA, transcription/RNA, and ribosome/antibiotic complexes (Table S2). Importantly, there are many RNAs that have single copies of the CUG motif, which might be targeted through binding to multiple motifs in a CUG hairpin simultaneously[38,106] or using small molecules that cooperatively bind to repeats.[43] The RNA CUG motifs in these structures either interact with Mg2+ ions or have tertiary interactions. Note that all the 1×1 UU base pairs in these CUG motifs have anti–anti conformations. Using an in-house code, we extracted all the RNA CUG motifs from the structures and calculated the (θ1,χ) and (θ1,θ2) values for each 1×1 pan class="Chemical">UU base pair. The results were then plotted on top of the MD results, which showed that the regions sampled by MD simulations were in line with the conformations seen in the literature (Figure S14). Basically, the 1×1 UU base pairs in these structures have 2, 1, or 0 hydrogen-bonds and prefer anti–anti. This observation suggests that the interaction of ions or tertiary interactions does not disturb the base orientations of uridines with respect to sugar, so the 1×1 UU base pairs prefer to stay in anti–anti. Furthermore, our binding studies of compound C-2 to RNA CUG repeats included over 25 μs MD simulations where 1×1 UU base pair stayed in anti–anti almost all the time.[43] These results are in line with the umbrella sampling calculations, where the free energy difference of syn–anti to anti–anti (ΔG – ΔG) is over 5 kcal/mol for 1×1 UU base pairs (Figure ), while no significant difference was seen in the umbrella sampling calculations of RNA CAG repeats (Figure S7).[48]

Summary and Conclusions

Trinucleotide and pan class="Chemical">tetranucleotide repeat expansions caused by mutations in certain genes lead to neuromuscular disorders. A promising therapeutic strategy is to target the transcribed RNA with small molecules. These RNA transcripts are known to form stable hairpin structures with 1×1 or 2×2 internal loops regularly placed in the stem regions. However, a thorough understanding of the atomistic details of RNA hairpins formed by expanded repeats is a must for designing rules and, therefore, better compounds, which will target RNA repeat expansions at nanomolar levels. In the present contribution we utilized regular and umbrella sampling MD simulations of model RNA CUG systems and explained the structural and thermodynamic properties of 1×1 pan class="Chemical">UU internal loops. The results indicate that RNA 1×1 UU internal loops are dynamic and can form more than one stable conformation where the global minimum conformation of the 1×1 UU base pair is anti–anti. Our MD results indicate that the 1×1 UU base pairs have 2, 1, and 0 hydrogen-bond states, which are in line with the 73 NMR/X-RAY structures we analyzed, as well as with previous studies.[40] Moreover, the global picture of the 2D (θ1,χ) PMF surface for 1×1 UU base pairs has the same topology as one we previously determined for 1×1 AA base pairs, but with different energy barriers and free energy differences.[48] While the syn–anti AA conformation is a possible state for RNA CAG repeats upon small molecule binding it is not as stable as anti–anti in RNA CUG repeats. This conclusion follows from umbrella sampling studies where the anti–anti UU conformation is over 5 kcal/mol more favorable than syn–anti. The result is also supported by experimental studies where no syn–anti UU conformations were seen in the reported structures with RNA CUG motifs. Additionally, the lowest free energy binding state of a compound to a model RNA CUG system exhibited a conformation in line with the present results described.[43] Thus, the local minima conformations discovered for 1×1 UU base pairs by the umbrella sampling calculations could be important targets for small molecule binding. Our studies also showed that a Na+ binding pocket similar to the one detected in RNA CAG repeats exists for RNA CUG repeats, and it stabilizes the syn–anti UU conformation. This pocket includes three attractive electrostatic interactions between Na+ ions and RNA backbone/bases, so it could be used in drug design where electropositive groups such as −NH2 and −NH are incorporated in small molecules to utilize this binding pocket for a higher binding affinity. Hence, our results should help explain the mechanism behind small molecule driven structural change in RNA CUG repeat expansions. Our results also indicate that there is an intrinsic difference between RNA CAG and CUG repeats, where pan class="Chemical">purines prefer both syn and anti, while pyrimidines predominantly prefer anti. Furthermore, purine mononucleosides have lower energy barriers compared to pyrimidines. As a result, the differences we see in the 2D PMF surfaces and MD calculations for RNA CAG and CUG repeats can be traced to intrinsic properties in the torsion dependence of the force field. These results are, therefore, dependent on improvements implemented recently in the RNA force field. It was shown that the χ torsional parameters for the AMBER force field needed revision to make physically meaningful computational predictions including structural and thermodynamic properties of RNA mononucleosides, single-stranded RNA, RNA structures with modified residues, and RNA hairpins.[51,57,58,60,107,108] Using these revised RNA force fields, other RNA repeat expansions such as CCUG, CCG, and CGG will be studied and compared to RNA CAG and CUG repeats. These studies should produce insight that can be used in understanding the mechanisms of different RNA repeat expansions and designing drugs that will target them.

83 in total

1. Computer Folding of RNA Tetraloops? Are We There Yet?

Authors: Petra Kührová; Pavel Banáš; Robert B Best; Jiří Šponer; Michal Otyepka
Journal: J Chem Theory Comput Date: 2013-03-07 Impact factor: 6.006

2. Polyglutamine domains are substrates of tissue transglutaminase: does transglutaminase play a role in expanded CAG/poly-Q neurodegenerative diseases?

Authors: A J Cooper; K F Sheu; J R Burke; O Onodera; W J Strittmatter; A D Roses; J P Blass
Journal: J Neurochem Date: 1997-07 Impact factor: 5.372

Review 3. High-resolution reversible folding of hyperstable RNA tetraloops using molecular dynamics simulations.

Authors: Alan A Chen; Angel E García
Journal: Proc Natl Acad Sci U S A Date: 2013-09-16 Impact factor: 11.205

Review 4. Milestones in ataxia.

Authors: Thomas Klockgether; Henry Paulson
Journal: Mov Disord Date: 2011-05 Impact factor: 10.338

Review 5. Triplet repeats, RNA secondary structure and toxic gain-of-function models for pathogenesis.

Authors: R Galvão; L Mendes-Soares; J Câmara; I Jaco; M Carmo-Fonseca
Journal: Brain Res Bull Date: 2001 Oct-Nov 1 Impact factor: 4.077

6. Structure of even/odd trinucleotide repeat sequences modulates persistence of non-B conformations and conversion to duplex.

Authors: Amalia Avila Figueroa; Douglas Cattie; Sarah Delaney
Journal: Biochemistry Date: 2011-05-04 Impact factor: 3.162

7. The (CGG)n repeat element within the 5' untranslated region of the FMR1 message provides both positive and negative cis effects on in vivo translation of a downstream reporter.

Authors: Li-Sheng Chen; Flora Tassone; Parminder Sahota; Paul J Hagerman
Journal: Hum Mol Genet Date: 2003-09-30 Impact factor: 6.150

Review 8. Pathogenic mechanisms of myotonic dystrophy.

Authors: Johanna E Lee; Thomas A Cooper
Journal: Biochem Soc Trans Date: 2009-12 Impact factor: 5.407

9. RNA structure of trinucleotide repeats associated with human neurological diseases.

Authors: Krzysztof Sobczak; Mateusz de Mezer; Gracjan Michlewski; Jacek Krol; Wlodzimierz J Krzyzosiak
Journal: Nucleic Acids Res Date: 2003-10-01 Impact factor: 16.971

10. Conformational preferences of modified uridines: comparison of AMBER derived force fields.

Authors: Indrajit Deb; Joanna Sarzynska; Lennart Nilsson; Ansuman Lahiri
Journal: J Chem Inf Model Date: 2014-04-03 Impact factor: 4.956

8 in total

1. Computational Investigation of RNA A-Bulges Related to the Microtubule-Associated Protein Tau Causing Frontotemporal Dementia and Parkinsonism.

Authors: David J Wales; Matthew D Disney; Ilyas Yildirim
Journal: J Phys Chem B Date: 2019-01-02 Impact factor: 2.991

2. Structure and Dynamics of RNA Repeat Expansions That Cause Huntington's Disease and Myotonic Dystrophy Type 1.

Authors: Jonathan L Chen; Damian M VanEtten; Matthew A Fountain; Ilyas Yildirim; Matthew D Disney
Journal: Biochemistry Date: 2017-06-29 Impact factor: 3.162

Review 3. RNA Structural Dynamics As Captured by Molecular Simulations: A Comprehensive Overview.

Authors: Jiří Šponer; Giovanni Bussi; Miroslav Krepl; Pavel Banáš; Sandro Bottaro; Richard A Cunha; Alejandro Gil-Ley; Giovanni Pinamonti; Simón Poblete; Petr Jurečka; Nils G Walter; Michal Otyepka
Journal: Chem Rev Date: 2018-01-03 Impact factor: 60.622

Review 4. RNA toxicity and foci formation in microsatellite expansion diseases.

Authors: Nan Zhang; Tetsuo Ashizawa
Journal: Curr Opin Genet Dev Date: 2017-02-14 Impact factor: 5.578

5. Role of salt-bridging interactions in recognition of viral RNA by arginine-rich peptides.

Authors: Lev Levintov; Harish Vashisth
Journal: Biophys J Date: 2021-10-26 Impact factor: 4.033

6. In silico discovery of substituted pyrido[2,3-d]pyrimidines and pentamidine-like compounds with biological activity in myotonic dystrophy models.

Authors: Àlex L González; Piotr Konieczny; Beatriz Llamusi; Estefanía Delgado-Pinar; José I Borrell; Jordi Teixidó; Enrique García-España; Manuel Pérez-Alonso; Roger Estrada-Tejedor; Rubén Artero
Journal: PLoS One Date: 2017-06-05 Impact factor: 3.240

7. Computational Investigation of Bending Properties of RNA AUUCU, CCUG, CAG, and CUG Repeat Expansions Associated With Neuromuscular Disorders.

Authors: Amirhossein Taghavi; Ilyas Yildirim
Journal: Front Mol Biosci Date: 2022-04-11

Review 8. Molecular conformations and dynamics of nucleotide repeats associated with neurodegenerative diseases: double helices and CAG hairpin loops.

Authors: Feng Pan; Yuan Zhang; Pengning Xu; Viet Hoang Man; Christopher Roland; Keith Weninger; Celeste Sagui
Journal: Comput Struct Biotechnol J Date: 2021-04-26 Impact factor: 7.271

8 in total