Despite the importance of the knowledge of molecular hydration entropy (ΔShyd) in chemical and biological processes, the exact calculation of ΔShyd is very difficult, because of the complexity in solute–water interactions. Although free-energy perturbation (FEP) methods have been employed quite widely in the literature, the poor convergent behavior of the van der Waals interaction term in the potential function limited the accuracy and robustness. In this study, we propose a new method for estimating ΔShyd by means of combining the FEP approach and the scaled particle theory (or information theory) to separately calculate the electrostatic solute–water interaction term (ΔSelec) and the hydrophobic contribution approximated by the cavity formation entropy (ΔScav), respectively. Decomposition of ΔShyd into ΔScav and ΔSelec terms is found to be very effective with a substantial accuracy enhancement in ΔShyd estimation, when compared to the conventional full FEP calculations. ΔScav appears to dominate over ΔSelec in magnitude, even in the case of polar solutes, implying that the major contribution to the entropic cost for hydration comes from the formation of a solvent-excluded volume. Our hybrid scaled particle theory and FEP method is thus found to enhance the accuracy of ΔShyd prediction by effectively complementing the conventional full FEP method.
Despite the importance of the knowledge of molecular hydration entropy (ΔShyd) in chemical and biological processes, the exact calculation of ΔShyd is very difficult, because of the complexity in solute–water interactions. Although free-energy perturbation (FEP) methods have been employed quite widely in the literature, the poor convergent behavior of the van der Waals interaction term in the potential function limited the accuracy and robustness. In this study, we propose a new method for estimating ΔShyd by means of combining the FEP approach and the pan class="Disease">scaled particle theory (or information theory) to separately calculate the electrostatic solute–water interaction term (ΔSelec) and the hydrophobic contribution approximated by the cavity formation entropy (ΔScav), respectively. Decomposition of ΔShyd into ΔScav and ΔSelec terms is found to be very effective with a substantial accuracy enhancement in ΔShyd estimation, when compared to the conventional full FEP calculations. ΔScav appears to dominate over ΔSelec in magnitude, even in the case of polar solutes, implying that the major contribution to the entropic cost for hydration comes from the formation of a solvent-excluded volume. Our hybrid scaled particle theory and FEP method is thus found to enhance the accuracy of ΔShyd prediction by effectively complementing the conventional full FEP method.
Repeat
expansion disorders are caused by mutations in DNA where
repeats in certain genes become expanded. Once the repeats are transcribed,
mRNA (mRNA) folds into a hairpin with repeating internal loop motifs,
which can be translated into toxic proteins or sequester proteins
and cause disease.[1,2] The mechanisms by which these
repeats expand are not fully known. One hypothesis is that formation
of non-B-form conformations by the repetitive DNA sequences, such
as hairpins, is the reason for the expansion during DNA replication
and repair.[3−5] In the expansions of CAG repeats, which are typically
found in the coding regions of mRNAs such as Huntingtin (pan class="Gene">HTT), androgen
receptor (AR), spinocerebellar ataxia (SCA), and atrophin-1 (ATN1)
genes, the transcripts are translated into toxic polyglutamine (polyQ)[6] proteins, resulting in Huntington’s disease
(HD), Spinal and Bulbar Muscular Atrophy (SBMA), Spinocerebellar Ataxia
Type 1 (SCA1), and Dentatorubral-pallidoluysian atrophy (DRPLA), respectively.[7−12] Furthermore, it has been shown recently that RNA CAG repeat expansions can sequester proteins and contribute to neurodegeneration.[13−15] Another genetic disorder, Friedreich’s ataxia (FRDA), is
caused by expansion of GAA repeats.[9,16] A common heritable
form of mental retardation, Fragile X-associated tremor ataxia syndrome
(FXTAS), is caused by expansion of the CGG segment from 50 to 200
repeats in the 5′-untranslated region (UTR) of the fragile
X mental retardation 1 (FMR1) gene.[7,16−23] Once the FMR1 gene is transcribed into mRNA, the expanded CGG repeats fold into a hairpin structure with repeating
1×1 GG internal loops that sequester and inactivate Sam68 protein.
Myotonic dystrophy (DM) is another pan class="Disease">genetic disease that is the
most common adult-onset form of muscular dystrophy. DM is a neuromuscular
disorder that is characterized by muscle weakness and slow relaxation
of muscles after contraction.[7,24] It is caused by expansion
of CTG and CCTG repeats in DNA,[7,24] with CTG leading to
myotonic dystrophy type 1 (DM1) and CCTG to type 2 (DM2). In both
DM1 and DM2, once the trinucleotideCTG in the 3′-UTR of the
dystrophia myotonica protein kinase (DMPK) gene and
the tetranucleotideCCTG repeats in the zinc-finger 9 (ZNF9) gene are transcribed, the CUG and CCUG repeats fold into RNA hairpins with repeating 1×1
UU and 2×2 CU/UC internal loop motifs, respectively,[25] attracting muscleblind-like 1 (MBNL1)[26−29] or cytoplasmic multiprotein complexes[30,31] that cause,
for example, muscle chloride channel dysfunction[32,33] and abnormal regulation of insulin receptors.[34] Because the expanded RNA repeats are in the noncoding regions
the diseases are caused by RNA gain-of-function.[7,20,21] Even though the disease mechanisms described
above are the most well established ones, expanded RNA repeats might
play other roles in the cell. As an example, expanded repeats in UTRs
are translated via a repeat associated non-ATG translation.[35]
The crystal structures of model RNA molecules
containing CUG repeats have been determined
before by different groups
using X-ray diffraction methods.[36−39] In addition, an NMR structure
of a single copy of the pan class="Gene">DM1 motif was characterized that displayed
an overall A-form structure.[40] Even though
all these structures show 1×1 UU internal loops in anti–anti conformations, large amplitude motions
in RNA internal loops could not be investigated with the current experimental
methods. The conformations that RNA CUG repeats
adopt upon protein or small molecule binding require a better understanding
of these motions. Helical stacking ↔ unstacking and syn ↔ anti transformation of uridines
in 1×1 UU base pairs are two such important large scale motions
that have significance in small molecule–RNA binding. Various
groups have designed and published small molecules and ligands targeting
RNA CUG repeats,[41−46] but at present there is no experimental structure available that
reports the atomistic details of a small molecule bound to RNA CUG repeats. Thus, the structural and thermodynamic properties
of 1×1 UU base pairs in RNA CUG repeats
are of significant interest, since they may provide insight into the
conformations that these repeats adopt upon interaction with proteins
or small molecules.[42,47]
We recently analyzed the
properties of 1×1 AA internal loops
seen in RNA CAG repeat expansions. The crystal
structure of a model RNA with three CAG repeats,
r[5′UUGGGC(pan class="Mutation">CAG)3GUCC]2, was found to exhibit terminal 1×1
AA base pairs in syn–anti conformations with the middle 1×1 AA base pair anti–anti.[48] The terminal
AA base pairs interact with dangling uridine bases, stabilizing the syn–anti conformational state. The
results basically indicated that a dynamic base pair such as 1×1
AA internal loops could transform to another conformation upon interacting
with a molecule. Thus, we utilized umbrella sampling MD simulations
to predict the 2D free energy surface for 1×1 AA internal loops
to investigate the potential stable conformations and transformation
pathways.[48] In line with the experimental
results, the 2D free energy surface for 1×1 AA internal loops
displayed anti–anti and syn–anti conformations as global
and local minima, respectively. Furthermore, several stable local
minima states in both anti–anti and syn–anti conformations
were found for 1×1 AA internal loops. The results indicated that
the stable 1×1 RNA internal loop conformations could be exploited
for small molecule binding. Indeed, we analyzed the binding mechanism
of a substituted naphthyridine (C-2) to RNA 1×1 UU internal loops,
which inhibits binding of MBNL1 to RNA CUG
repeat expansions.[43] The results indicated
that complex structural transformations are associated with binding.[43] Similar results were seen when we analyzed 6′-N-5-hexynoate
kanamycinalkyne (K-alkyne) binding to 2×2 CU/UC internal loops
in RNA CCUG repeats, which is a lead small
molecule targeting RNA causing DM2.[49] Hence,
atomistic details of the structure and dynamics of RNA internal loops
as well as the transformation pathways extracted from the predicted
free energy profiles can yield insights that might be used to understand
and develop drugs that will target RNA repeat expansions. Although
there are no structures reported for MBNL1 bound to CUG repeats, it is thought that the zinc finger 3 and 4 domains (Zn3/4)
of MBNL1 have some functional roles in the binding process.[50] Thus, the present investigations could also
shed light on this pathogenic interaction.
In the present contribution,
structural and thermodynamic properties
of 1×1 UU RNA internal loops are predicted using regular and
umbrella sampling MD simulations in explicit solvent with an AMBER
force field, which includes a χ torsional revision for RNA.[51] We also exploited the discrete path sampling
(pan class="Chemical">DPS) method to investigate the energy landscape of the 1×1 UU
internal loop in RNA CUG repeats. Three systems
were considered: 3×CUG3SYW (r[5′UUGGGC(CUG)3GUCC]2), 3×CUGinf (r[5′C(CUG)3G]2), and 1×CUG (r[5′CCGCUGCGG]2). 3×CUG3SYW and 3×CUGinf were used to predict the dynamic properties of 1×1
UU base pairs. In particular, 3×CUGinf was designed
to simulate 1×1 UU RNA internal loops in an infinitely long linear
RNA system, while 3×CUG3SYW is an already solved crystal
structure with PDB accession code 3SYW.[38] 1×CUG
was designed to predict the 2D free energy surfaces of 1×1 RNA
UU internal loops using umbrella sampling MD calculations. Similar
to our previous RNA CAG repeat study,[48] one of the reaction coordinates was chosen as
the χ torsion (rotation around the glycosidic bond) in order
to mimic the orientation of the uridine base with respect to sugar,
while the other reaction coordinate was a pseudotorsion (θ)
mimicking base flipping (base stacking ↔ unstacking). Two 2D
potentials of mean force (PMF) surfaces, (θ1,χ)
and (θ1,θ2), were calculated for
the 1×1 UU base pairs. Our results indicate that 1×1 UU
RNA internal loops are dynamic and favor the anti–anti conformation but have several local
minima in the 2D PMF profiles displaying both anti–anti and syn–anti conformations. The 2D (θ1,χ)
PMF exhibits three pathways for the syn–anti ↔ anti–anti transformation, one of which dominates in temperature jump MD simulations.
These results were verified using DPS calculations, which showed that
in the dominant kinetic pathway the uridine flips out via the major
groove, followed by a rotation around χ to produce a syn → anti transformation. In line
with previous studies, the 2D (θ1,θ2) PMF as well as regular MD simulations displayed 1×1 UU base
pairs in 2, 1, and 0 anti–anti hydrogen-bond conformations. Similar to the 1×1 AA base pairs
studied before,[48] a Na+ binding
pocket was discovered near the 1×1 UU base pair, stabilizing
the syn–anti conformation.
Furthermore, MD simulations of 3×CUG3SYW showed 2×2
GC/CG regions flanked by 1×1 UU base pairs to have stable but
distorted forms, which indicates flexibility.
The present results
reveal atomistic details of the structure and
thermodynamics of 1×1 UU basepairs in RNA pan class="Chemical">CUG repeat expansions that are inaccessible with current experimental
methods. This insight should help us to design better compounds targeting
RNA CUG repeats in future work.
Methods
Preparation
of Model Systems for MD Simulations
1×CUG
(Figure A) was employed
to predict the 2D free energy surfaces of 1×1 pan class="Chemical">UU base pairs using
umbrella sampling calculations. 3×CUG3SYW (Figure B) was considered
to model the structural properties of a crystal structure (PDB accession
code 3SYW),[38] which has three consecutive CUG repeats. Finally, 3×CUGinf (Figure C) should provide insight into the dynamics
of 1×1 UU base pairs in an infinitely long linear RNA system.
The initial structure of 1×CUG was built from the already solved
NMR structure (PDB accession code 2L8U).[40] A homology
model of 3×CUGinf was built from the crystal structure
data of 3SYW(38) with UU base pairs in either anti–anti or syn–anti conformations. Two initial structures
were created for 3×CUGinf; one where all 1×1
UU base pairs were in anti–anti conformations (3×CUGinf/anti–anti) and another where the middle UU base pair (U6U17) was in syn–anti (3×CUGinf/syn–anti). This was done in order to investigate potential syn–anti ↔ anti–anti transformations. For the initial structure of 3×CUG3SYW, data from the crystal structure 3SYW(38) was used, including dangling uridine residues, which were
utilized experimentally to crystallize the system. As shown in Figure B, the dangling uridine
residues were kept in the structure in order to mimic the original
system as far as possible. The Leap module of AMBER MD package[52] was used to neutralize all the systems with
Na+ and/or Cl– ions[53] and to solvate with TIP3P water molecules.[54] In 3×CUGinf, 10 extra Cl– and Na+ ions were randomly included in the systems. The
1×CUG and 3×CUG3SYW were designed to have truncated
octahedral boxes, while 3×CUGinf was designed to have
a cubic box that created an infinitely long RNA system (Figure C). The 1×CUG, 3×CUGinf, and 3×CUG3SYW systems had 4025, 2265,
and 10766 water molecules, respectively. Once equilibrated, these
systems had 0.67, 0.21, and 0.18 mM Na+ concentrations,
respectively. We also designed another 3×CUGinf system,
which was neutralized with 20 Na+ ions, had the middle
UU basepair in syn–anti conformation,
and was solvated with 4333 water molecules to yield a Na+ concentration of 0.24 mM. Two sets of MD simulations for this system
were run. The Amber force field[55] with
revised χ[51] and α/γ[56] torsional parameters were used in both regular
and umbrella sampling MD simulations, as these revised parameters
have been shown to improve the predictions.[48,51,57−63]
Figure 1
Model
RNA sequences and initial structures used in regular and
umbrella sampling MD simulations. For illustrative purposes, the hydrogen
atoms of the water molecules are not shown. The oxygen atoms of water
molecules are represented with red dots. Blue and cyan spheres represent
Na+ and Cl– ions, respectively. RNA structures
are shown in the “new ribbons” format.[65] The opaque region of C represents the unit cell, while
transparent regions represent the neighboring cells to illustrate
the design of the infinitely long RNA system. Residue numbers described
for each system are used in the analyses.
Model
RNA sequences and initial structures used in regular and
umbrella sampling MD simulations. For illustrative purposes, the hydrogen
atoms of the pan class="Chemical">water molecules are not shown. The oxygen atoms of water
molecules are represented with red dots. Blue and cyan spheres represent
Na+ and Cl– ions, respectively. RNA structures
are shown in the “new ribbons” format.[65] The opaque region of C represents the unit cell, while
transparent regions represent the neighboring cells to illustrate
the design of the infinitely long RNA system. Residue numbers described
for each system are used in the analyses.
Molecular Dynamics Simulations
Each system was first
minimized and equilibrated in two steps (see also Table S1).[48,57,58] After equilibration, constant pressure dynamics with isotropic positional
scaling was used in all the production runs. The reference pressure
was set to 1 atm with a pressure relaxation time of 2 ps. Bonds involving
pan class="Chemical">hydrogen atoms were constrained with SHAKE.[64] In all the MD simulations an atom-based long-range hard cutoff of
8.0 Å was used. A total of 1 μs MD was run on 3×CUG3SYW. As described above, two independent MD calculations were
run on 3×CUGinf, 3×CUGinf/anti–anti and 3×CUGinf/syn–anti, with total MD
simulation times of 500 and 606 ns, respectively, with a 1 fs time
step. These sets of MD simulations were run in order to characterize
any syn–anti ↔ anti–anti transformations for the
UU base pairs. The temperatures of all MD simulations were kept at
300 K. The PMEMD module in AMBER12[52] was
used to run the MD simulations. Trajectory files were written at intervals
of 5000 fs time steps. For all these calculations, over 138 K CPU
hours were required.
Umbrella Sampling Simulations
The
reaction coordinates
shown in Figure ,
which we previously utilized to study RNA 1×1 AA base pairs,[48] were used to build the 2D PMF surfaces for RNA
1×1 UU internal loops. Two different 2D potential of mean force
(PMF) surfaces, (θ1 vs χ) and (θ1 vs θ2), were calculated for 1×1 pan class="Chemical">UU
base pairs using the model system 1×CUG (Figure A). Nucleic acid bases can rotate around
glycosidic bonds (rotation around χ torsional angle), which
are responsible for the syn and anti base orientations with respect to sugar (Figure A). While purine residues prefer both syn and anti orientations, pyrimidine residues
mostly prefer anti. Hence, one of the reaction coordinates
was chosen to be the χ torsional rotation. The second reaction
coordinate was chosen to imitate base stacking ↔ unstacking,
where the center-of-mass pseudodihedral angle (θ) was defined
to mimic this motion (Figure B). In order to build each 2D PMF surface, 36 × 36 =
1296 umbrella sampling windows were simulated. Initial structures
with different (θ1,χ) and (θ1,θ2) combinations were created for the model system
1×CUG (see Section S1 in the Supporting Information for the details).
Figure 2
Reaction coordinates utilized to build the 2D PMF surfaces
for
RNA 1×1 UU base pairs. In A, the torsional angle highlighted
in red is χ (O4′–C1′–N1–C2)
responsible for the syn ↔ anti transformation. In order to investigate the base stacking ↔
unstacking, pseudodihedral angles, θ1 and θ2, defined by the center-of-masses (COMs) of four atom groups
in B were chosen as reaction coordinates.
Reaction coordinates utilized to build the 2D PMF surfaces
for
RNA 1×1 UU basepairs. In A, the torsional angle highlighted
in red is χ (O4′–C1′–N1–C2)
responsible for the pan class="Gene">syn ↔ anti transformation. In order to investigate the base stacking ↔
unstacking, pseudodihedral angles, θ1 and θ2, defined by the center-of-masses (COMs) of four atom groups
in B were chosen as reaction coordinates.
Temperature Jump MD Simulations
Due to the relatively
large energy barriers we did not see any syn ↔ anti transformations in the MD simulations of 3×pan class="Chemical">CUG3SYW and 3×CUGinf (except for one case, which
had low Na+ concentration). Thus, we studied the structural
properties of 1×1 UU base pairs with respect to temperature using
1×CUG. Two systems were considered: one where the 1×1 UU
base pair is in an anti–anti conformation (χ = 200° and θ1 = 40°)
and another where it is in a syn–anti conformation (χ = 40° and θ1 = 350°).
For each case, 10 independent MD simulations were run where the temperature
was increased from 300 to 400 K by 5 K for each 1 ns MD time interval.
Once the temperature reached 400 K, another 19 ns MD was run in each
case. No restraints were imposed on the system. MD simulations were
carried out with the pmemd.MPI module of AMBER12.[52] A total of over 800 ns MD was run in these simulations.
We should note that this particular calculation is somewhat similar
to simulated annealing protocol and was done to force the system to
overcome the energy barriers responsible of syn ↔ anti transformation. Because 400 K is higher than the boiling
point, a full MD would not mean much.
Umbrella Sampling of RNA
Mononucleosides
We utilized
umbrella sampling MD simulations to predict the 1D PMF profiles of
A, G, U, and pan class="Chemical">C mononucleosides along the χ torsional axis (Figure A). The procedure
is similar to the umbrella sampling simulations described above, except
that only the individual mononucleosides were solvated with 738 TIP3P
water molecules[54] and simulated. χ
torsional angles were incremented by 10° that created 36 umbrella
sampling windows for each system. After minimization and 1 ns equilibration,
each window was simulated for 5 ns. The χ torsions in the umbrella
sampling MD simulations were restrained in a square bottom well with
parabolic sides and 30 kcal/(mol rad2) force constants.
Discrete Path Sampling
The discrete path sampling[66,67] method (DPS) has previously been used to study the energy landpan class="Disease">scapes
for a diverse range of atomic and molecular systems.[68−70] In particular, this approach has proved to be effective in exploring
landscapes that feature broken ergodicity,[71] leading to multiple relaxation time scales. DPS is based on geometry
optimization techniques and is efficient in sampling “rare
event” dynamics, which are typically inaccessible in conventional
unbiased simulation methods.[72,73] In the present work
we have used DPS to explore the energy landscape of the RNA CUG repeats, to supplement our results from unbiased molecular
dynamics and umbrella sampling simulations (see Section S2 in the Supporting Information for the details).
DPS simulations were seeded from conformations corresponding to the
major basins on the 2D PMF of the single pan class="Chemical">CUG repeat, choosing a diverse range of pseudotorsion and χ torsion
values. These conformations were locally minimized using a modified
version of the LBFGS algorithm,[74] to build
an initial database of minima. Geometry optimizations were deemed
to have converged when the root-mean-square gradient fell below 10 kcal mol. Connection attempts were made between these minima in a
pairwise fashion, starting from the highest energy minimum in the
database to the lowest. Once sufficient connections were found, a
disconnectivity graph[75−77] analysis revealed that syn–anti and anti–anti states of the uridine base pairs were likely to emerge as competing
funnels on the energy landscape. The stationary point databases were
further expanded by refining the discrete paths between the syn–anti and anti–anti end points, using systematic applications
of the SHORTCUT BARRIER and UNTRAP schemes, which have been described
in detail in previous work[78] and are available
within our PATHSAMPLE code.[79] DPS simulations
were deemed to have converged sufficiently when the rate constants
corresponding to the syn–anti ↔ anti–anti conformational
transition were stable to within an order of magnitude. The final
stationary point database consisted of 21927 minima and 25467 transition
states. Free energies were estimated from the database by employing
the harmonic superposition approximation.[80] Here, the molecular partition function is approximated as a sum
over harmonic vibrational densities of states corresponding to the
catchment basins of each minimum.[81,82]
Analysis
The Ptraj module of AMBER12[52] was used
for dihedral, root-mean-square deviation (rmsd),
and cluster analyses. The weighted histogram analysis method[83,84] (WHAM) was used to calculate the PMF surfaces using WHAM ver. 2.0.4
written by Alan Grossfield.[85] In the WHAM
analysis, the last 1.5 ns data of each umbrella sampling simulation
was used, and periodicity was turned on in both dimensions. X3DNA[86] was used to extract global structural properties
of 3×CUG3SYW from the MD trajectory.
Calculation
of Rate Constants
The rate constants corresponding
to the conformational transitions between the syn–anti and anti–anti states were calculated from the stationary point database
using a graph transformation method, as described in a previous work.[87] The estimates for the rate constants at 300
K were obtained using a self-consistent regrouping scheme,[88] which recursively lumps minima separated by
free energy barriers below a threshold into one macrostate. This is
an attractive approach, as the free energy surface then becomes a
function of the experimental observation time, due to the lumping
of minima separated by low barriers.[89] Furthermore,
this method alleviates any bias arising due to the choice of original
end points, as they are expanded into ensembles of structures that
are assumed to be in local equilibrium on the time pan class="Disease">scale of the rate-determining
step of interest.
Disconnectivity Graphs
The potential
and free energy
landscapes were visualized in the form of disconnectivity graphs.[75−77] The disconnectivity graph provides a powerful yet simple representation
of the energy landpan class="Disease">scapes and is free of potential artifacts that can
arise from low dimensional projections onto selected order parameters.[90,91] This representation retains all of the kinetic information in the
database and thus provides a way to rigorously test the validity of
the reaction coordinates employed in our 2D umbrella sampling simulations
(see Section S3 in the Supporting Information for the details).
Results and Discussion
Detailed analyses of each system are described below. In summary,
we found that the MD trajectories for 3×CUG3SYW and
3×pan class="Chemical">CUGinf show stable 1×1 UU base pairs both in anti–anti and syn–anti conformations. Except for one case
(3×CUGinf with low Na+ concentration),
no syn–anti ↔ anti–anti transformations are observed.
This result was attributed to the high energy barriers observed in
umbrella sampling and DPS calculations, which predicted three syn–anti ↔ anti–anti transformation pathways. Temperature
jump MD simulations were performed, which revealed that one of the
pathways predicted by umbrella sampling and DPS calculations dominates
the syn–anti → anti–anti transformation. For illustration
we use different colors to represent each stable 1×1 UU base
pair conformation to highlight the transformations (Figure ).
Figure 3
Conformational states
of 1×1 UU base pairs seen in MD trajectories
of 3×CUG3SYW and 3×CUGinf. Color notation
was used to represent each state to make visualization of rmsd plots
(vide infra) easier. Black (A), red (B), and cyan (C) represent anti–anti conformations with 2,
1, and 0 hydrogen-bond. Green (D), blue (E), yellow (F), and orange
(G) are used for syn–anti conformations. Note that, in anti–anti UU conformations, the system is symmetric. As a result,
there are two conformations each in A and B representing the same
state.
Conformational states
of 1×1 UU basepairs seen in MD trajectories
of 3×pan class="Chemical">CUG3SYW and 3×CUGinf. Color notation
was used to represent each state to make visualization of rmsd plots
(vide infra) easier. Black (A), red (B), and cyan (C) represent anti–anti conformations with 2,
1, and 0 hydrogen-bond. Green (D), blue (E), yellow (F), and orange
(G) are used for syn–anti conformations. Note that, in anti–anti UU conformations, the system is symmetric. As a result,
there are two conformations each in A and B representing the same
state.
Structure of r(3×CUG3SYW)
The initial
structure of 3×CUG3SYW solved by X-ray diffraction
with 2.20 Å resolution is shown in Figure B.[38] The system
contains three consecutive RNA pan class="Chemical">CUG repeats
with two dangling uridines which were included experimentally to facilitate
crystallization. We analyzed the global structural properties of 3×CUG3SYW and compared them to the published crystal structure.
Furthermore, the conformational preference of each 1×1 UU base
pair, as well as the 2×2 CG/GC base pairs flanked by two neighboring
1×1 UU base pairs (Figure ), along with the dynamics of dangling uridines, were analyzed.
Figure 4
Rmsd analysis of 3×CUG3SYW.
See Figure B for residue
numbering. Rmsd’s (in Å) of U8U33, G9C10/G31C32, U11U30, G12C13/G28C29, and U14U27 are plotted
with respect to the stable conformations shown in Figures and 5. Note that all the UU base pairs are in anti–anti conformations with stable forms represented by black,
red, and cyan (Figure A–C, and Movie S1). In the rmsd
analysis of the 1×1 UU base pairs (U8U33, U11U30, and U14U27),
conformational states with rmsd greater than 1 Å represent symmetry
related versions of the states. As noted in Figure , UU base pairs in states colored black and
red have two identical conformations representing 2 and 1 hydrogen-bond
states, respectively. The plots were created in this form to display
the transformations of 1×1 UU base pairs during the trajectory.
Note also that the stable GC/CG conformation represented by magenta
only forms when the flanking 1×1 UU base pairs are both in the
2 hydrogen-bond and opposite symmetry states (black) (see Movie S2).
Rmsd analysis of 3×CUG3SYW.
See Figure B for residue
numbering. Rmsd’s (in Å) of U8U33, G9C10/G31C32, U11U30, G12C13/G28C29, and U14U27 are plotted
with respect to the stable conformations shown in Figures and 5. Note that all the pan class="Chemical">UU base pairs are in anti–anti conformations with stable forms represented by black,
red, and cyan (Figure A–C, and Movie S1). In the rmsd
analysis of the 1×1 UU base pairs (U8U33, U11U30, and U14U27),
conformational states with rmsd greater than 1 Å represent symmetry
related versions of the states. As noted in Figure , UU base pairs in states colored black and
red have two identical conformations representing 2 and 1 hydrogen-bond
states, respectively. The plots were created in this form to display
the transformations of 1×1 UU base pairs during the trajectory.
Note also that the stable GC/CG conformation represented by magenta
only forms when the flanking 1×1 UU base pairs are both in the
2 hydrogen-bond and opposite symmetry states (black) (see Movie S2).
Figure 5
Stable 2×2 GC/CG
conformations flanked by 1×1 UU base
pairs. (A) GC base pairs in canonical Watson–Crick base pair
forms. (B) One of the GC base pairs is distorted and loses a hydrogen-bond.
In this form, the −NH2 group of guanine forms an
attractive electrostatic interaction with the 2′-OH group of
cytidine. (C) Both GC base pairs are distorted and lose one hydrogen-bond
each. Both GC base pair conformations are in the distorted GC form
described in B. The distortions happen only when the flanking 1×1
UU base pairs are both in the 2 hydrogen-bond but opposite symmetry
states (see Movie S2).
Global Structural Properties of 3×CUG3SYW
The main difference between the 3SYW crystal structure and the 3×CUG3SYW structure is that the former
is an almost frozen system while the latter is a system equilibrated
at 300 K and thus includes dynamics. The minor (mgw) and major groove
widths (Mgw) in the crystal structure of 3SYW are around 15 and 16 Å, respectively
(Figure S1). Analysis of the MD trajectory
shows that the mgw fluctuates around the crystal structure value (Figure S1), which is globally in an A-form conformation.
On the other hand, the MD trajectory shows a Mgw that fluctuates around
20 Å. The expanded Mgw suggests that the inclusion of 1×1
pan class="Chemical">UU internal loops gives flexibility to the system. To understand the
flexibility of each region in the RNA duplex, we analyzed the rmsd
of 3×CUG3SYW with respect
to the crystal structure. In the analysis, the sequence was broken
down into 2 and 5 base pair segments so that we could investigate
which regions in the RNA were responsible for the flexible behavior
(Figure S2). The results show that the
regions including UU base pairs are more flexible than the other regions
in 3SYW (Figure S2). Globally, the central region, which
includes 3 CUG repeats, is more flexible than
the terminal regions, which includes Watson–Crick GC and Wobble
GU base pairs (Figure S2). This result
is expected because individual fluctuations of 1×1 UU internal
loops will accumulate and yield global flexible regions as seen in 3SYW. It is noteworthy,
however, that the overexpanded Mgw might be a force field artifact.
Even though it is known that RNA internal loops introduce flexibility
into the system, the overexpanded Mgw might be due to an improper
description of base stacking and/or backbone torsions.
Conformational
Preference of 1×1 UU Base Pairs
The crystal structure
of 3SYW has
three 1×1 UU basepairs in anti–anti conformations. Cluster analyses of
the MD trajectory show three unique anti–anti conformations for the 1×1 pan class="Chemical">UU base pairs, in line
with the previous experimental results,[40] which represent 2, 1, and 0 hydrogen-bond states, colored black,
red, and cyan, respectively, in the rmsd plots (Figures and 4). During the
1 μs MD simulation, we observed anti–anti ↔ anti–anti transformations numerous times (Figure and Movie S1).
Hydrogen-bond analyses indicate that the 1×1 UU base pairs populate
the 2, 1, and 0 hydrogen-bond states with probabilities 43%, 41%,
and 16%, respectively. Furthermore, in the regions where the 2×2
GC/CG base-pairs are in canonical Watson–Crick forms the black
↔ red transformations are very fast because the conformational
states are very similar to each other (Figures A,B and 4). The 0
hydrogen-bond state (cyan) seems to act like an intermediate state
and is less stable than to the 1 (red) and 2 (black) hydrogen-bond
states. Hence, the dynamic behavior of the 1×1 UU base pairs
yields flexible regions in the RNA structure described above (Figure S2).
Conformational States of
2×2 GC/CG Base-Pairs Flanked by
UU Base-Pairs in 3SYW
The structure of 3×CUG3SYW (Figure B) is not as rigid as an infinitely
long RNA. Thus, the effects of its flexibility will appear in weak
regions of 3×pan class="Chemical">CUG3SYW. The 1×1 UU base pairs will
be the first to acquire this flexibility, as noted above. The next
weakest regions are 2×2 GC/CG base-pairs flanked by these UU
base-pairs. Even though the GC base pairs in these regions are in
Watson–Crick forms, they will not be as rigid as in regular
RNA stem regions due to the flanking UU base pairs. We observed similar
results when analyzing RNA CAG and CCUG repeat expansions.[48,49] Indeed, our
analysis shows that 2×2 GC/CG regions flanked by UU base-pairs
are flexible and can have a stable noncanonical conformation (Figures and 5C and Movie S2). In the rmsd analysis
of G9C10/G31C32 and G12C13/G28C29 the magenta color
represents this stable noncanonical conformation (Figure C), which persists for as long
as 200 ns, as is apparent during 420 and 610 ns intervals in the MD
simulation (magenta color in Figure ). Note, however, that this stable noncanonical GC/CG
conformation only forms when the flanking 1×1 UU base pairs are
both in stable 2 hydrogen-bond and opposite symmetry states (black)
(Figure and Movie S2). In this noncanonical GC/CG form, each
GC base pair loses two Watson–Crick hydrogen-bonds, while forming
an attractive electrostatic interaction between the −NH2 group of guanine and the 2′-OH group of the ribose
of its Watson–Crick partner (Figure ). The results indicate that if the flanking
base pairs (in this case the 1×1 UU) transform to energetically
favorable states they can distort the repeating 2×2 GC/CG regions
(Movie S2). This particular phenomenon
is one of the reasons why we observe an overexpanded Mgw in the MD
trajectory (Figure S1). As described above,
the particular distorted GC/CG conformation shown in Figure forms when the flanking UU
base pairs are in 2 hydrogen-bond states (Figure ). The overexpression of the 2 hydrogen-bond
states might be a force field artifact, which could indirectly cause
the overexpanded Mgw we observed (Figure S1). Therefore, caution is required when analyzing the results, even
though we are using an improved RNA force field.
Stable 2×2 GC/CG
conformations flanked by 1×1 UU basepairs. (A) GC base pairs in canonical Watson–Crick base pair
forms. (B) One of the GC base pairs is distorted and loses a pan class="Chemical">hydrogen-bond.
In this form, the −NH2 group of guanine forms an
attractive electrostatic interaction with the 2′-OH group of
cytidine. (C) Both GC base pairs are distorted and lose one hydrogen-bond
each. Both GC base pair conformations are in the distorted GC form
described in B. The distortions happen only when the flanking 1×1
UU base pairs are both in the 2 hydrogen-bond but opposite symmetry
states (see Movie S2).
Structural Preference of Dangling Uridine Bases in 3SYW
As noted
above, the RNA sequence shown in Figure B includes dangling uridines, which were
used experimentally to help crystallize the system. We kept these
pan class="Chemical">uridine ends and analyzed their structural preferences using 3×CUG3SYW. The initial conformations of the dangling uridine ends
are shown in Figures B and 6A. At later times, however, they transform
to a stable state where U2 and U21 are both
unstacked, while U1 and U20 are stacked on top
of the guanine of the terminal GC base pairs (G3C38 and G22C19) (Figure C). In this form, both U1 and
U20 have flipped upside down sugars (Figure C), as seen previously in NMR and MD analyses
of the single-stranded RNA structure GACC.[58] Crystal structures basically are frozen systems, while MD simulations
include dynamics. Although the crystal structure of 3SYW has the dangling
uridine ends being inserted into the RNA major grooves, this does
not mean that they will remain there when the system is equilibrated
at room temperature. Indeed, the MD simulation of 3×CUG3SYW shows the dangling uridine ends in very different conformations,
where stacking interactions play an important role in the final conformations
(Figure C). Once the
dangling uridine ends transformed to this final stable conformation
(Figure C) they stayed
there for the rest of the MD simulation (see Supporting Information). However, we should note that this particular
result might be a force field artifact caused by the improper description
of the backbone, the nonbonded interactions, or solvent molecules.
Chen and Garcia recently showed that the van der Waals parameters
of adenine base atoms required revision,[92] and a similar revision might be required for uridine.
Figure 6
Conformational
transformation of the dangling uridine bases (from
the MD trajectory of 3×CUG3SYW). (A) Initial state
where dangling uridines are bent toward the RNA major groove. (B)
Dangling uridines stacked over the RNA terminal base pairs. (C) Final
stable state where uridine (U1 and U20) is flipped
upside down and stacked over the RNA terminal base pair, while the
other uridine (U2 and U19) is unstacked. For
residue numbering see Figure B.
Conformational
transformation of the dangling uridine bases (from
the MD trajectory of 3×pan class="Chemical">CUG3SYW). (A) Initial state
where dangling uridines are bent toward the RNA major groove. (B)
Dangling uridines stacked over the RNA terminal base pairs. (C) Final
stable state where uridine (U1 and U20) is flipped
upside down and stacked over the RNA terminal base pair, while the
other uridine (U2 and U19) is unstacked. For
residue numbering see Figure B.
Structure of r(3×CUGinf)
In DM1, expanded
repeats can have sizes between 50 and 10000.[9] These RNA pan class="Chemical">CUG repeats adopt hairpin structures
with very long nonhairpin regions.[93] Hence,
the system in Figure C was designed specifically to study the behavior of 1×1 UU
base pairs in an infinitely long linear RNA system with continuous
CUG repeats. Because a full turn of an A-form
RNA duplex requires ∼11 basepairs, we designed the system shown
in Figure C, such
that the RNA duplex is along the z-axis, has 11 basepairs,
and stacks perfectly with the neighboring RNA duplexes when periodic
boundary conditions are turned on. No restraints (either positional
or torsional) were imposed on the RNA duplex; stacking interactions
between the neighboring GC base pairs keep the system in a linear
form automatically. This trick, which we previously applied to RNA
CCUG repeats to investigate the dynamic properties
of CU/UC internal loops,[49] allows us to
design an infinitely long linear RNA duplex (see Movie S3). Although r(3×CUGinf) is not a perfect
RNA CUG repeat expansion, it is rather a linear
RNA duplex having three continuous CUG repeats
and then a GGCC/CCGG tetramer region (see Figure C). If we leave off the “end-fraying”
effects in the finite duplexes, this does give the same structural
properties of finite duplexes. Ultimately the periodic trick will
break down when the length of the structure is longer than the persistence
length. Nevertheless, the designed system allowed us to “implicitly”
force the RNA duplex to stay in a linear form due to the stacking
of the neighboring RNA duplexes and, hence, study the behavior of
1×1 UU base pairs in a linear RNA geometry (see Supporting Information). Two MD simulations were run; one
where the initial conformations of all the UU base pairs were anti–anti (3×CUGinf/anti–anti) and another
where the initial conformations of U3U20 and
U9U14 were anti–anti and U6U17 was syn–anti (3×CUGinf/syn–anti). We further completed two independent
MD simulations on 3×CUGinf/syn–anti, which had low Na+ concentration (0.24 mM).
3×CUGinf/anti–anti
A 500 ns MD simulation was run for this system.
The initial conformations of all the UU basepairs were anti–anti and stayed in this form during the
MD simulation (Figure ). Similar to the results seen for 3×pan class="Chemical">CUG3SYW, three
1×1 UU base pair conformations are observed in the MD trajectory
representing 2, 1, and 0 hydrogen-bond states (Figures and 7). Even though
no anti–anti → syn–anti transformations are seen,
numerous anti–anti ↔ anti–anti transformations are observed
(Figure ). A hydrogen-bond
analysis indicates that the 1×1 UU base pairs in 3×CUGinf/anti–anti populate
states with 2, 1, and 0 hydrogen-bonds with probabilities 33%, 60%,
and 7%, respectively. The reason for the increase in the 1 hydrogen-bond
state compared to 3×CUG3SYW is due to the rigidity
of the 2×2 GC/CG regions flanked by 1×1 UU base-pairs in
3×CUGinf. Structurally, the 2 (black) and 1 (red)
hydrogen-bond states are alike, while the 0 (cyan) hydrogen-bond state
acts like an intermediate (Figure A–C). Thus, the black ↔ red transformations
will be fast if no large distortions occur in the 2×2 GC/CG regions.
In 3×CUG3SYW, because 2×2 GC/CG regions are flexible
the 1×1 UU base pairs have a higher probability of transforming
to the stable 2 hydrogen-bond states. On the other hand, analyses
showed that the conformations of the 2×2 GC/CG regions in 3×CUGinf/anti–anti were
always in canonical Watson–Crick GC base pairs with 3 hydrogen-bonds.
The results indicate that the 2×2 GC/CG regions in 3×CUGinf/anti–anti are
structurally more rigid than the ones in 3×CUG3SYW. As a result, it is harder for the 1×1 UU base-pairs to transform
to a stable 2 hydrogen-bond state without distorting the 2×2
GC/CG regions in 3×CUGinf/anti–anti.
Figure 7
RMSD analysis of 1×1 UU base pairs seen in the 3×CUGinf/anti–anti MD simulation
(See Figure C for
residue numbering). The initial structures of all the UU base pairs
were in anti–anti. The color
scheme described in Figure was used to display the conformational transformations. Note
that black and red conformations (Figure ) represent 2- and 1-hydrogen-bond states,
which can be described by two symmetric orientations (Figure A,B). Rmsd states greater than
1 Å represent the symmetry related orientations.
RMSD analysis of 1×1 UU basepairs seen in the 3×pan class="Chemical">CUGinf/anti–anti MD simulation
(See Figure C for
residue numbering). The initial structures of all the UU base pairs
were in anti–anti. The color
scheme described in Figure was used to display the conformational transformations. Note
that black and red conformations (Figure ) represent 2- and 1-hydrogen-bond states,
which can be described by two symmetric orientations (Figure A,B). Rmsd states greater than
1 Å represent the symmetry related orientations.
3×CUGinf/syn–anti
An over 600 ns MD simulation
was run for this
system. As described above, the initial conformation was designed
to have the middle UU basepair (pan class="Chemical">U6U17) in a syn–anti conformation and the terminal
UU base pairs (U3U20 and U9U14) in anti–anti.
This design enables us to investigate potential syn–anti → anti–anti transformation pathways. Figure shows the rmsd and Na+ binding
analyses for this system. Except for one case, which had low Na+ concentration, no anti–anti ↔ syn–anti transformations
were observed. Similar to the 3×CUGinf/anti–anti results, numerous anti–anti ↔ anti–anti transformations were seen for U3U20 and U9U14, where 2, 1, and 0 hydrogen-bond
states were occupied with probabilities 47%, 48%, and 5%, respectively.
However, U6U17, which stayed in the syn–anti conformation, exhibited
only 1 and 0 hydrogen-bond states with probabilities 93% and 7%, respectively.
The green and blue conformations shown in Figure display 1 hydrogen-bond state when the 1×1
UU base pair has a syn–anti conformation. Even though no syn–anti → anti–anti transformation was observed for U6U17 in this
system we see that there are at least four times where U6, which is in the syn conformation, unstacked from
the helical axis, colored orange in the rmsd plot (Figures G and 8). In this orange state (Figure G), U6 is unstacked from the helical axis
via the minor groove but does not complete the syn → anti transformation (Movie S4). One reason for not observing the syn → anti transformation is the Na+ binding pocket seen near U6 (Figure ). The transformation of U6 from
a stacked state (green, blue, and yellow colored states displayed
in Figure ) to an
unstacked state (orange) only happens when the Na+ binding
pocket is not occupied by a Na+ ion (Movie S4). However, this state was short-lived because Na+ ions were found to rapidly occupy the binding pocket even
after U6 tried to unstack and then stabilize the syn–anti UU conformation by bringing
back U6 to the stacked conformation (see MD simulation
times ∼205, ∼ 360, ∼ 420, and ∼545 ns
in Figure ). As a
result, no syn → anti transformation
was observed.
Figure 8
Rmsd of 1×1 UU base pairs and Na+ binding
to the
pocket seen in 3×CUGinf/syn–anti MD simulations (see Figure C for residue numbering). The initial structures
of the terminal UU base pairs, U3U20 and U9U14, were anti–anti, while U6U17 was syn–anti. The color scheme described in Figure was used to display
conformational transformations of the UU base pairs. See Figure caption for details
of the rmsd analysis. A Na+ binding pocket was observed
near U6 when it was in the syn conformation
(Figure ). In the
Na+ binding analysis different colors represent different
Na+ ions in the system where the distance between each
ion and the center-of-mass (COM) of the three atoms shown in Figure (O2P
and O2 of U6, and N7 of G7) were calculated. Note that when the binding pocket was not occupied
by a Na+ ion, U6 tried to unstack from the helical
axis and transformed to the orange state (Figure G). However, this state was short-lived,
and no syn ↔ anti transformation
was observed (Movie S4).
Figure 9
Na+ binding pocket was observed near U6 when
it was in the syn orientation. When occupied by a
Na+ ion, there are three attractive electrostatic interactions.
As shown in Figure , these interactions stabilize the syn–anti UU orientation. When no Na+ ion was present
in the pocket, U6 unstacked via the minor groove (Movie S4). A full syn → anti transformation, however, was not observed, due to the
rapid occupation of the pocket by Na+ ions. This can be
seen in the rmsd analyses of U6U17 (Figure ) where the orange
colored state has a short lifetime.
Rmsd of 1×1 UU basepairs and Na+ binding
to the
pocket seen in 3×pan class="Chemical">CUGinf/syn–anti MD simulations (see Figure C for residue numbering). The initial structures
of the terminal UU base pairs, U3U20 and U9U14, were anti–anti, while U6U17 was syn–anti. The color scheme described in Figure was used to display
conformational transformations of the UU base pairs. See Figure caption for details
of the rmsd analysis. A Na+ binding pocket was observed
near U6 when it was in the syn conformation
(Figure ). In the
Na+ binding analysis different colors represent different
Na+ ions in the system where the distance between each
ion and the center-of-mass (COM) of the three atoms shown in Figure (O2P
and O2 of U6, and N7 of G7) were calculated. Note that when the binding pocket was not occupied
by a Na+ ion, U6 tried to unstack from the helical
axis and transformed to the orange state (Figure G). However, this state was short-lived,
and no syn ↔ anti transformation
was observed (Movie S4).
Na+ binding pocket was observed near U6 when
it was in the pan class="Gene">syn orientation. When occupied by a
Na+ ion, there are three attractive electrostatic interactions.
As shown in Figure , these interactions stabilize the syn–anti UU orientation. When no Na+ ion was present
in the pocket, U6 unstacked via the minor groove (Movie S4). A full syn → anti transformation, however, was not observed, due to the
rapid occupation of the pocket by Na+ ions. This can be
seen in the rmsd analyses of U6U17 (Figure ) where the orange
colored state has a short lifetime.
In order to investigate further the role of Na+ ions,
we designed another 3×CUGinf system, which had a low
Na+ ion concentration. We did two independent MD simulations
on this new system. In the first set, we observed a pan class="Gene">syn → anti transformation within 60 ns (Figure S3). This transformation happened right
after the loss of a Na+ ion from the binding pocket (Figure S3). In the second set, however, we did
not observe any syn → anti transformation within 200 ns (Figure S4). During the MD simulation of the second set, the Na+ binding pocket was occupied by different Na+ ions almost
all the time (Figure S4). The results indicate
that the syn–anti UU conformation
is intrinsically stable and is further stabilized by Na+ ions through the Na+ binding pocket.
Umbrella Sampling
Results for 1×1 UU Base Pairs
The MD results for 3×CUG3SYW and 3×CUGinf have provided details of
the dynamics of each system. Except for
one case (3×CUGinf with low Na+ concentration),
we did not observe any syn–anti ↔ anti–anti transformation
for 1×1 UU base pairs. To further explore this result, we utilized
umbrella sampling MD simulations to build two 2D PMF surfaces, (θ1,χ) and (θ1,θ2) (variables
defined in Figure ), to extract the transformation pathways for 1×1 UU base pairs
as well as other potentially stable UU conformations. Note, however,
that the calculated 2D PMF surfaces are approximations to the true
multidimensional energy landscapes. For example, in the analysis we
assumed that the flanking GC basepairs will not be distorted during
base stacking ↔ unstacking. It is possible that the flanking
GC basepairs might stack on top of each other once the uridines are
fully unstacked from the helical axis. Such phenomena were omitted
from the analysis. The choice of pseudodihedral shown in Figure B was made after
careful analysis, where we tried other reaction coordinates to mimic
base stacking ↔ unstacking. The use of pseudodihedrals to study
base flipping in nucleic acids was previously done by us and others.[48,94−96]
2D (θ1,χ) PMF Surface
Figure shows
the 2D (θ1,χ) PMF surface predicted by umbrella
sampling calculations
for 1×1 UU basepairs. There is excellent overlap of the (θ1,χ) distributions calculated for each umbrella sampling
run (Figures S5 and S6). Three pathways
were found for pan class="Gene">syn–anti ↔ anti–anti transformations: P1, where
uridine unstacks via the minor groove in order to make the syn ↔ anti transformation, P2, where
uridine follows a direct transformation from syn ↔ anti, while stacked within the RNA helical axis, and P3,
where uridine unstacks via the major groove to perform the syn ↔ anti transformation. These
pathways are similar to the ones predicted for RNA CAG repeats, but details of the free energy landscape (Figure S7) are different.[48] Energy barriers in the 2D (θ1,χ)
PMF surface predicted for 1×1 AA base pairs are roughly 2 to
4 kcal/mol lower than for the 1×1 UU base pairs considered in Figure (see also Figure S7). This result is expected because adenosinemononucleoside is known to prefer ∼40% syn and ∼60% anti, while cytidine is known to
prefer over 90% anti,[51] which results in different energy barriers (Figure S7).
Figure 10
2D (θ1,χ) PMF surface predicted
for 1×1
UU base pairs by umbrella sampling calculations. The results include
11 minima, where (g) is the global minimum conformation (anti–anti). Three transformation pathways for syn–anti ↔ anti–anti are predicted. Energy barriers, such
as the ones seen in b ↔ e and j ↔ g transformations,
might be the reason why we could not see any syn–anti ↔ anti–anti transformation in regular MD simulations at 300 K. See Figure S7 for a comparison with the results previously
obtained for 1×1 AA base pairs.[48]
2D (θ1,χ) PMF surface predicted
for 1×1
UU basepairs by umbrella sampling calculations. The results include
11 minima, where (g) is the global minimum conformation (anti–anti). Three transformation pathways for pan class="Gene">syn–anti ↔ anti–anti are predicted. Energy barriers, such
as the ones seen in b ↔ e and j ↔ g transformations,
might be the reason why we could not see any syn–anti ↔ anti–anti transformation in regular MD simulations at 300 K. See Figure S7 for a comparison with the results previously
obtained for 1×1 AA base pairs.[48]
2D (θ1,θ2) PMF Surface
Figure shows the
2D (θ1,θ2) PMF surface predicted
by umbrella sampling calculations for 1×1 UU basepairs (for
details of the reaction coordinates see Figure B). The overlap of the (θ1,θ2) distributions calculated for each umbrella
sampling run is shown in Figure S8. This
particular analysis was performed to investigate the correlation of
stacking ↔ unstacking of each base in a 1×1 pan class="Chemical">UU base pair.
Note, however, that the initial conformations of the 1×1 UU base
pairs are anti–anti. Thus,
the effect of syn ↔ anti transformations
is omitted in this analysis. First we note that the 2D PMF surface
is symmetric, which is expected for 1×1 UU base pairs in the
RNA CUG (Figure ). Second, we find that unstacking of each
uridine is negatively correlated. Hence, when one of the uridines
starts to unstack the other uridine in the 1×1 UU base pair tends
to stay in the helical axis. This result is illustrated in Figure with double-ended
arrows, here emphasizing the channels directing the stacking and unstacking
for each uridine. Third, the results indicate the lowest free energy
states describing the different stacked 1×1 UU base pair conformations.
States A and B in Figure represent a mix of 2 and 1 hydrogen-bond conformations, while
state C represents a mix of 1 and 0 hydrogen-bond. As described before,
the conformations of 2 and 1 hydrogen-bond UU base pairs are almost
identical and, therefore, overlap in Figure . A detailed hydrogen-bond analysis is displayed
in Figure S9. In the 2D (θ1,θ2) PMF surface, regions around (θ1 = 40, θ2 = 70) and (θ1 = 70, θ2 = 40) represent 2 hydrogen-bond conformations, while the
other regions of states A and B shown in Figure represent 1 hydrogen-bond conformations
(Figure S9). The A ↔ B, A ↔
C, and B ↔ C transformations displayed in Figure were also observed in the
MD simulations of 3×CUG3SYW and 3×CUGinf (Figures and 7).
Figure 11
2D (θ1,θ2) PMF surface
predicted
for 1×1 UU base pairs by umbrella sampling calculations. Double-ended
arrows indicate the channels directing the stacking and unstacking
of uridine in a 1×1 UU base pair. The lowest free energy states
describe different stacked 1×1 UU base pair conformations. A
and B represent a mix of 2 and 1 hydrogen-bond states, while C represents
a mix of 1 and 0 hydrogen-bond states (see Figure S9 for hydrogen-bond analyses).
2D (θ1,θ2) PMF surface
predicted
for 1×1 UU basepairs by umbrella sampling calculations. Double-ended
arrows indicate the channels directing the stacking and unstacking
of pan class="Chemical">uridine in a 1×1 UU base pair. The lowest free energy states
describe different stacked 1×1 UU base pair conformations. A
and B represent a mix of 2 and 1 hydrogen-bond states, while C represents
a mix of 1 and 0 hydrogen-bond states (see Figure S9 for hydrogen-bond analyses).
Comparison of Energy Barriers and Base Orientations in RNA Mononucleosides
As described above, the syn and anti are two important RNA conformations. Even though the 2D (θ1,χ) PMF profile displayed high energy barriers, we also
investigated individual RNA pan class="Chemical">mononucleosides of A, G, U, and C, to
see if there are intrinsic differences between purines and pyrimidines.
Umbrella sampling calculations were utilized to create the 1D PMF
profiles for each RNA residue along the χ torsional axis (Figure S10). The results indicate that pyrimidines
mostly prefer anti, while purines can have significant
populations in syn (Figure S10). In A and G, ΔΔGsyn→anti (ΔGanti – ΔGsyn) differences are approximately 1.40 and
1.21 kcal/mol, respectively, while in U and C they are −0.39
and −0.21 kcal/mol, respectively (Figure S10). The percentage of anti conformations
extracted from ΔΔGsyn→anti for A, G, U, and C at 300 K are 9%, 12%, 66%, and 58%, respectively,
in line with previous results.[51] Furthermore,
the energy barriers seen in purines are lower than the ones seen in
pyrimidines (Figure S10). In 1D PMF profiles,
there are two barriers for the syn ↔ anti transformation: one around χ ∼ 120°
(PB) and the other around χ ∼ 360° (PA) (Figure S10). While the energy
barrier around χ ∼ 120° is only slightly larger
in pyrimidines, the barrier around χ ∼ 360° is over
2 kcal/mol higher than for purines (Figure S10). The results indicate that purines in free mononucleoside forms
would prefer the syn orientation over anti and will readily sample the syn ↔ anti transformation via PA and PB due
to the low energy barriers, while pyrimidines would prefer anti over syn in free mononucleoside forms
and will follow PB to perform the syn ↔ anti transformation (Figure S10).
Results of Temperature Jump MD Simulations
Even though
the umbrella sampling calculations predicted three transformation
pathways for syn–anti ↔ anti–anti, we did not observe any
such transformation in the 600 ns MD simulation of 3×pan class="Chemical">CUGinf/syn–anti (except
when Na+ concentration was low). An intermediate state
represented in orange (Figure G), corresponding to the potential minimum state b shown in Figure , was observed
in the MD trajectory of 3×CUGinf/syn–anti (Figure ) where U6 (in syn conformation)
unstacked from the helical axis. However, the b → e transformation
through P1 was not observed (Figure ) even though this pathway was followed
by 1×1 AA base pairs in the syn–anti → anti–anti transformation.[48] One reason for not
seeing the full transformation is due to the high energy barriers
observed in the free energy profile of the 1×1 UU base pair (Figure ), which were increased
by the Na+ binding pocket near UU stabilizing the syn–anti UU conformation (Figures and 9). We therefore utilized 1×CUG (Figure A) and increased the temperature of the system
slowly from 300 to 400 K to study the syn–anti ↔ anti–anti transformations in 1×1 UU base pairs. Ten independent MD simulations
starting with syn–anti and anti–anti 1×1 UU base pair
conformations were run. Six out of 10 of the syn–anti systems ended up with anti–anti UU conformations, while no anti–anti → syn–anti was observed. The syn–anti → anti–anti transformations
observed in the temperature jump MD simulations occurred via P1 and P3, which were followed 1 and 5 times, respectively
(for definitions of pathways see Figure ). Conformational states observed in the
temperature jump MD simulations and the pathways followed during the syn–anti → anti–anti transformations are shown in Figure S11. Pathways P1 and P3 include unstacking via minor and major grooves, respectively.
In the temperature jump MD simulations, even though numerous i ↔
b transformations were observed, only one of them overcame the energy
barrier and followed the b ↔ e transformation (Figure ). These results are different
from the corresponding RNA CAG results, where
the favorable syn–anti ↔ anti–anti pathway is via P1 in 1×1 AA base pairs (Figure S7).[48] Indeed, the relative energy barriers of b ↔
e and j ↔ g transformations are 2.4 and 6.0 kcal/mol in the
2D (θ1,χ) PMF surface for 1×1 AA base
pairs, while the same barriers for 1×1 UU base pairs are 4.4
and 3.8 kcal/mol, respectively (Figure S7). As a result, P1 and P3 are the favored syn–anti ↔ anti–anti transformation pathways for 1×1
AA and UU base pairs, respectively, due to the associated energy barriers.
Energy Landscape of 1×1 UU Base Pairs (Disconnectivity
Graphs)
The potential and free energy disconnectivity graphs
for the RNA system containing one CUG repeat
are shown in Figures and S12, respectively. The free energy
disconnectivity graph was computed at 300 K using a regrouping threshold
of 2 kcal/mol.[76,97]
Figure 12
Potential energy disconnectivity graph
for RNA CUG repeat. The red and blue branches
lead to minima with the UU base
pair in syn–anti and anti–anti conformations, respectively.
The green and pink branches lead to minima, in which the uridine is
unstacked via minor and major grooves, respectively. Some representative
structures corresponding to different conformational states are also
shown.
Potential energy disconnectivity graph
for RNA CUG repeat. The red and blue branches
lead to minima with the pan class="Chemical">UU base
pair in syn–anti and anti–anti conformations, respectively.
The green and pink branches lead to minima, in which the uridine is
unstacked via minor and major grooves, respectively. Some representative
structures corresponding to different conformational states are also
shown.
To make a direct comparison with
the 2D PMF obtained from umbrella
sampling calculations, the graphs were colored according to the pseudotorsion
and χ torsion angles. The disconnectivity graph in Figure clearly illustrates
the double-funnel character of the underlying potential energy landscape.
The anti–anti and the pan class="Gene">syn–anti states of the UU base pair
corresponding to the two major funnels are colored blue and red, respectively.
This observation is in agreement with the 2D PMF obtained from umbrella
sampling calculations, where the two major free energy basins correspond
to the anti–anti and syn–anti conformational states.
The two funnels are separated by high energy barriers, leading to
two competing morphologies. To quantify the barriers separating the
two conformational states at 300 K, we have estimated the rate constants
for the syn–anti ↔ anti–anti transformations using
the graph transformation[87] employing a
regrouping threshold[88] of 2 kcal/mol. This
choice is appropriate, as similar values were obtained in terms of
rate constants and disconnectivity graphs for a range of values around
this selection. The rate constant for the conformational transition
from syn–anti to anti–anti is estimated as 17240
s. The reverse transformation
(i.e., anti–anti → syn–anti) is at least 2 orders of
magnitude slower, with an estimated rate constant of 174 s. These rate constants approximately correspond
to time scales of 58 μs and 6 ms, respectively. The presence
of such competing morphologies separated by high barriers is normally
associated with multiple relaxation time scales and features in the
heat capacity profile.[71]
The minima
corresponding to the flipped-out conformations of the
UU basepair are colored green and purple for unstacking through the
minor and major grooves, respectively. These conformations do not
constitute any major funnel on the energy landpan class="Disease">scape and are distributed
over a wide energy range in the disconnectivity graph. The free energy
disconnectivity graph at 300 K (Figure S12) displays essentially the same features as the potential energy
landscape, with two major funnels corresponding to the syn–anti and anti–anti states.
In general, we find that the selected
order parameters distinguish
the two major funnels quite well. Nonetheless, the intermixing of
colors in certain regions of the graph suggests that structural order
parameters alone are not sufficient to correctly represent the complexities
of the underlying landscape.[91,98] Hence, we use the recursive
regrouping scheme instead,[88] which is based
on kinetic criteria, to estimate the rate constants.
Pathways between syn–anti and anti–anti Conformational
States in UU Base Pairs
The kinetically relevant pathways
between the syn–anti and anti–anti conformational states
of the pan class="Chemical">uridine base pairs were extracted from the stationary point
databases using the recursive enumeration algorithm,[88] which is based on kinetic criteria to estimate the rate
constants.[99,100] The diverse range of pathways
that exist between the two conformational states can be classified
in terms of three distinct mechanisms: (i) Pathway P1 (Figure ), in which the uridine flips
out via minor groove (Figure S13A) prior
to the rotation of the uracil base around its glycosidic bond; (ii)
Pathway P2 (Figure ), in which the uridine transforms from syn to anti without unstacking significantly from the helical axis
(Figure S13B); and (iii) Pathway P3 (Figure ), in which the
uridine flips out via the major groove (Figure S13C) followed by rotation of the uracil base around its glycosidic
bond resulting a transformation from syn to anti.
For pathways P1 and P3, unstacking of pan class="Chemical">uridine
outside the helical axis (via minor and major grooves, respectively)
precedes the rotation of the uracil base around its glycosidic bond
that causes the syn ↔ anti transformation. These transitions are effectively the rate-determining
steps in the two mechanisms. For pathway P2, the rotation around the
glycosidic bond resulting in the flip of the χ torsion from syn to anti is associated with the surmounting
of a transition state and corresponds to the rate-determining step.
Although pathways P1 and P3 have similar path lengths, P3 is associated
with relatively lower barriers and emerges as the “fastest
pathway” in our kinetic transition network. Pathway P2 is longer
than paths P1 and P3 and makes a smaller contribution. Thus, the properties
of our kinetic transition network, in particular the existence of
three different pathways from syn to anti conformational states as well as the mechanistic details of the
transformation, are consistent with the predictions of the 2D (θ1,χ) PMF profile obtained from umbrella sampling.
Targeting
RNA CUG Repeats with Small
Molecules
In DM1, inhibition of RNA-pan class="Gene">MBNL1 binding through
small molecules, which will target 1×1 UU base pairs, could ameliorate
the disease. Therefore, understanding the structural and thermodynamic
properties of RNA 1×1 UU internal loops could help us discover
the binding mechanism of small molecules to RNA internal loops and
thus the design rules for drugs, which will effectively target these
RNA structures. Previously, we studied the binding mechanism of a
substituted naphthyridine (C-2) that targets RNA CUG repeats.[43] The results indicated complex
structural transformations of the RNA internal loop in the small molecule
binding process. C-2 had a total 16 stable C-2/CUG binding modes. The lowest two binding modes displayed 1×1
UU base pair in conformations near (θ1 = −55,
χ = 200) and (θ1 = −55, θ2 = −55). Indeed, (θ1 = −55,
χ = 200) represents one of the local minima, “e”
in Figure (also
see Figure S7). Furthermore, (θ1 = −55, θ2 = −55) is one of
the local minima in Figure . In other words, upon C-2 binding to RNA CUG, 1×1 UU base pair was transformed from global minimum g (anti–anti) to local minimum e (anti–anti) (Figure ). Our previous results combined with the
current work suggest that small molecules can exploit the stable local
minima states of 1×1 UU base pairs if the final bound state is
thermodynamically favorable.
Insights into the Binding of Proteins to
CUG Repeats
It is known that MBNL1
binds to RNA pan class="Chemical">CUG repeats, but the details are
uncertain.[27,101−103] Patel and co-workers solved the crystal
structure of the MBNL1 tandem zinc finger 3 and 4 domains (Zn3/4)
interacting with a single-stranded RNA, r(CGCUGU), which shows guanosine
residues interacting with Zn3/4.[50] Studies
have suggested that the binding to CUG hairpins
occurs upon a conformational change in the structure that melts the
hairpin.[28,50,104] Based on
the results of 3×CUG3SYW, we discovered that tandem
GC/CG base pairs flanked by 1×1 UU internal loops are less stable
than when they are located in RNA stem regions. This result might
give insight into how MBNL1 binds to RNA CUG repeat expansions. The distortions seen in tandem GC/CG regions
flanked by 1×1 UU base pairs might enhance the binding of MBNL1
to RNA CUG repeat expansions. These studies
also suggest that small molecules that stabilize RNA repeats could
be effective modulators of cellular function.[104]
RNA CUG Loops in the
Literature
Using the CoSSMos database[105] developed
by Znosko and co-workers we found 73 RNA structures that have CUG motifs with 1×1 pan class="Chemical">UU loops (Table S2). The biological systems that have the RNA CUG motifs include RNA, ribosomes, zinc finger/RNA, transcription/RNA,
and ribosome/antibiotic complexes (Table S2). Importantly, there are many RNAs that have single copies of the
CUG motif, which might be targeted through
binding to multiple motifs in a CUG hairpin
simultaneously[38,106] or using small molecules that
cooperatively bind to repeats.[43] The RNA
CUG motifs in these structures either interact
with Mg2+ ions or have tertiary interactions. Note that
all the 1×1 UU base pairs in these CUG
motifs have anti–anti conformations.
Using an in-house code, we extracted all the RNA CUG motifs from the structures and calculated the (θ1,χ) and (θ1,θ2) values for
each 1×1 pan class="Chemical">UU base pair. The results were then plotted on top of
the MD results, which showed that the regions sampled by MD simulations
were in line with the conformations seen in the literature (Figure S14). Basically, the 1×1 UU base
pairs in these structures have 2, 1, or 0 hydrogen-bonds and prefer anti–anti. This observation suggests
that the interaction of ions or tertiary interactions does not disturb
the base orientations of uridines with respect to sugar, so the 1×1
UU base pairs prefer to stay in anti–anti. Furthermore, our binding studies of compound C-2 to
RNA CUG repeats included over 25 μs MD
simulations where 1×1 UU base pair stayed in anti–anti almost all the time.[43] These results are in line with the umbrella sampling calculations,
where the free energy difference of syn–anti to anti–anti (ΔG – ΔG) is over 5 kcal/mol
for 1×1 UU base pairs (Figure ), while no significant difference was seen in the
umbrella sampling calculations of RNA CAG repeats
(Figure S7).[48]
Summary and Conclusions
Trinucleotide and pan class="Chemical">tetranucleotide
repeat expansions caused by mutations
in certain genes lead to neuromuscular disorders. A promising therapeutic
strategy is to target the transcribed RNA with small molecules. These
RNA transcripts are known to form stable hairpin structures with 1×1
or 2×2 internal loops regularly placed in the stem regions. However,
a thorough understanding of the atomistic details of RNA hairpins
formed by expanded repeats is a must for designing rules and, therefore,
better compounds, which will target RNA repeat expansions at nanomolar
levels.
In the present contribution we utilized regular and
umbrella sampling
MD simulations of model RNA CUG systems and
explained the structural and thermodynamic properties of 1×1
pan class="Chemical">UU internal loops. The results indicate that RNA 1×1 UU internal
loops are dynamic and can form more than one stable conformation where
the global minimum conformation of the 1×1 UU base pair is anti–anti. Our MD results indicate
that the 1×1 UU base pairs have 2, 1, and 0 hydrogen-bond states,
which are in line with the 73 NMR/X-RAY structures we analyzed, as
well as with previous studies.[40] Moreover,
the global picture of the 2D (θ1,χ) PMF surface
for 1×1 UU base pairs has the same topology as one we previously
determined for 1×1 AA base pairs, but with different energy barriers
and free energy differences.[48] While the syn–anti AA conformation is a possible
state for RNA CAG repeats upon small molecule
binding it is not as stable as anti–anti in RNA CUG repeats. This conclusion
follows from umbrella sampling studies where the anti–anti UU conformation is over 5 kcal/mol
more favorable than syn–anti. The result is also supported by experimental studies where no syn–anti UU conformations were seen
in the reported structures with RNA CUG motifs.
Additionally, the lowest free energy binding state of a compound to
a model RNA CUG system exhibited a conformation
in line with the present results described.[43] Thus, the local minima conformations discovered for 1×1 UU
base pairs by the umbrella sampling calculations could be important
targets for small molecule binding. Our studies also showed that a
Na+ binding pocket similar to the one detected in RNA CAG repeats exists for RNA CUG
repeats, and it stabilizes the syn–anti UU conformation. This pocket includes three attractive
electrostatic interactions between Na+ ions and RNA backbone/bases,
so it could be used in drug design where electropositive groups such
as −NH2 and −NH are incorporated in small
molecules to utilize this binding pocket for a higher binding affinity.
Hence, our results should help explain the mechanism behind small
molecule driven structural change in RNA CUG repeat expansions.
Our results also indicate that there is
an intrinsic difference
between RNA CAG and CUG repeats, where pan class="Chemical">purines prefer both syn and anti, while pyrimidines predominantly prefer anti. Furthermore, purine mononucleosides have lower energy barriers
compared to pyrimidines. As a result, the differences we see in the
2D PMF surfaces and MD calculations for RNA CAG and CUG repeats can be traced to intrinsic
properties in the torsion dependence of the force field. These results
are, therefore, dependent on improvements implemented recently in
the RNA force field. It was shown that the χ torsional parameters
for the AMBER force field needed revision to make physically meaningful
computational predictions including structural and thermodynamic properties
of RNA mononucleosides, single-stranded RNA, RNA structures with modified
residues, and RNA hairpins.[51,57,58,60,107,108] Using these revised RNA force
fields, other RNA repeat expansions such as CCUG, CCG, and CGG will
be studied and compared to RNA CAG and CUG repeats. These studies should produce insight that
can be used in understanding the mechanisms of different RNA repeat
expansions and designing drugs that will target them.
Authors: Jonathan L Chen; Damian M VanEtten; Matthew A Fountain; Ilyas Yildirim; Matthew D Disney Journal: Biochemistry Date: 2017-06-29 Impact factor: 3.162
Authors: Jiří Šponer; Giovanni Bussi; Miroslav Krepl; Pavel Banáš; Sandro Bottaro; Richard A Cunha; Alejandro Gil-Ley; Giovanni Pinamonti; Simón Poblete; Petr Jurečka; Nils G Walter; Michal Otyepka Journal: Chem Rev Date: 2018-01-03 Impact factor: 60.622
Authors: Àlex L González; Piotr Konieczny; Beatriz Llamusi; Estefanía Delgado-Pinar; José I Borrell; Jordi Teixidó; Enrique García-España; Manuel Pérez-Alonso; Roger Estrada-Tejedor; Rubén Artero Journal: PLoS One Date: 2017-06-05 Impact factor: 3.240