Complex RNA structures are constructed from helical segments connected by flexible loops that move spontaneously and in response to binding of small molecule ligands and proteins. Understanding the conformational variability of RNA requires the characterization of the coupled time evolution of interconnected flexible domains. To elucidate the collective molecular motions and explore the conformational landscape of the HIV-1 TAR RNA, we describe a new methodology that utilizes energy-minimized structures generated by the program "Fragment Assembly of RNA with Full-Atom Refinement (FARFAR)". We apply structural filters in the form of experimental residual dipolar couplings (RDCs) to select a subset of discrete energy-minimized conformers and carry out principal component analyses (PCA) to corroborate the choice of the filtered subset. We use this subset of structures to calculate solution T1 and T(1ρ) relaxation times for (13)C spins in multiple residues in different domains of the molecule using two simulation protocols that we previously published. We match the experimental T1 times to within 2% and the T(1ρ) times to within less than 10% for helical residues. These results introduce a protocol to construct viable dynamic trajectories for RNA molecules that accord well with experimental NMR data and support the notion that the motions of the helical portions of this small RNA can be described by a relatively small number of discrete conformations exchanging over time scales longer than 1 μs.
Complex RNA structures are constructed from helical segments connected by flexible loops that move spontaneously and in response to binding of small molecule ligands and proteins. Understanding the conformational variability of RNA requires the characterization of the coupled time evolution of interconnected flexible domains. To elucidate the collective molecular motions and explore the conformational landscape of the HIV-1 TAR RNA, we describe a new methodology that utilizes energy-minimized structures generated by the program "Fragment Assembly of RNA with Full-Atom Refinement (FARFAR)". We apply structural filters in the form of experimental residual dipolar couplings (RDCs) to select a subset of discrete energy-minimized conformers and carry out principal component analyses (PCA) to corroborate the choice of the filtered subset. We use this subset of structures to calculate solution T1 and T(1ρ) relaxation times for (13)C spins in multiple residues in different domains of the molecule using two simulation protocols that we previously published. We match the experimental T1 times to within 2% and the T(1ρ) times to within less than 10% for helical residues. These results introduce a protocol to construct viable dynamic trajectories for RNA molecules that accord well with experimental NMR data and support the notion that the motions of the helical portions of this small RNA can be described by a relatively small number of discrete conformations exchanging over time scales longer than 1 μs.
It is now widely recognized
that many RNA molecules are predisposed
to forming complexes with proteins by fluctuating spontaneously through
an ensemble of structural states. This dynamic mode of RNA–protein
recognition is referred to as “conformational capture”.
A description of the physical principles involved in forming RNA–protein
complexes via conformational capture requires complete description
of the dynamics of these structurally labile RNA molecules, including
a characterization of long- and short-lived conformational states
sampled by the RNA. However, experimental characterization of transitory
states is complicated by the fact that the rate of transition may
be too fast to allow for a comprehensive catalogue of all states.[1] Thus, although progress has been made recently
in experimentally isolating the partially folded states of proteins,[2,3] the complete elucidation of protein or nucleic acid dynamics requires
analytical and computational modeling to complement experimental observations.
A common approach toward this goal is to fit a limited set of model
parameters to experimental data that are sensitive to dynamics, such
as NMR relaxation rates, line shapes, or residual dipolar couplings.[4−7] This procedure involves guessing-and-checking, using physical constraints
on possible motions of the labeled residue(s) to guide the model-building
process. However, this semianalytic approach becomes complicated when
further degrees of freedom and new free parameters are required by
the model to fit the data adequately.A purely computational
approach to the description of molecular
states requires an accurate potential energy function (PEF), followed
by either molecular dynamics (MD) or energy-minimization calculations.
Molecular dynamics simulations are able to generate dynamic trajectories
of the molecule and, in principle, explore molecular parameter space
if sufficient numbers of trajectories are available; examples are
found using the AMBER[8−12] and CHARMM packages[13−15] and others, such as Lindorff-Larsen et al.[16] However, the extrapolation of dynamics in nucleic
acids to time scales of the order of microseconds or longer, where
many conformational changes are expected to take place, has only recently
begun to be explored.[17,18]Energy-minimization techniques
also rely on a well-validated energy
function but involve the subsequent alteration of the relative conformations
of parts of the molecule in an iterative manner to find the global
energy minima.[19−23] It is then possible to generate multiple sample structures more
easily than for a complete MD calculation. In the current manuscript,
we utilize structures generated by energy-minimization techniques
as a complement to MD-based analyses of dynamics. Use of energy-minimized
structures is facilitated by the availability of structures on the
Rosetta server.[24] Other servers such as
the MC-SYM/MC-FOLD pipeline[25] are also
available that allow the user to obtain all-atom RNA models.We use the HIV-1 TAR (trans-activation response) RNA molecule as
a model for a dynamics simulation based on a set of 500 low energy
models generated using the program FARFAR[22] for the 29-nucleotide apical section of the RNA[26] (Figure 1). The TAR RNA binds the
viral regulatory protein Tat,[27] a critical
transcription elongation factor essential for viral replication. The
Tat binding site surrounds the single-stranded, trinucleotide (UCU)
bulge[27,28] and is contained within the 29-nucleotide
construct. The bulge region interlinking the two helical stems is
the primary binding site for the Tat protein[28,29] and exhibits considerable flexibility[4,30−32] allowing for the two helical regions to adopt a wide range of relative
orientations.[4,33] Protein binding is believed to
occur by “conformational capture”, where the free TAR
RNA exchanges between multiple conformers, one or more of which are
amenable to Tat binding.[34] TAR RNA also
provides a common RNA structural motif, where two helices are connected
by a single-stranded bulge on one end and a backbone “hinge”
at the other. Other RNA’s exhibit similar structures, such
as K-turns,[35] or the HIV-1 RRE (rev-response
element).[36,37] It is therefore worthwhile to characterize
the dynamics of such fundamental motifs, especially to characterize
“large-scale” motions (as opposed to more localized
motions) of one helical domain relative to another.
Figure 1
29-nucleotide apical
stem-loop of HIV-1 TAR RNA: (A) secondary
structure; (B) sample tertiary structure. The different submotifs
and the component residues used in the analyses are color-coded: upper
helix (turquoise), lower helix (red), and the single-stranded bulge
(green).
29-nucleotide apical
stem-loop of HIV-1 TAR RNA: (A) secondary
structure; (B) sample tertiary structure. The different submotifs
and the component residues used in the analyses are color-coded: upper
helix (turquoise), lower helix (red), and the single-stranded bulge
(green).Earlier work in our group used
solid-state NMR to identify intermediate
rate motions for the residue U38 in the upper helix.[4] We more recently characterized the motions of U40 and U42
in the lower helix (Wei Huang, unpublished data). These data indicate
that the two helices move relative to each other at slow rates (∼105 to 106 s–1) relative to the
rotational diffusion rate of the entire molecule (∼108 s–1). These results motivated us to look at the
distributions of the orientations of the upper helix relative to the
lower helix among the set of lowest energy structures, as characterized
by the Euler angles of an upper helix-attached reference frame relative
to a different frame attached to the lower helix. Reduction to a set
of three angles characterizes many features of the dynamical trajectory
of this particular structural motif, while simplifying a very large-dimensional
problem to a set of three essential coordinates. Local base librations,
i.e., rotations of the bases around a base normal (representing vibrations
of the base around the equilibrium base-paired orientation) or rotations
of the base around the glycosidic bond (for the single stranded bases),
are included in the simulations as well and provide atomistic detail
regarding the motions of individual residues.Here we extend
our prior studies of nonrigid rotation and helix
reorientation in TAR RNA to a full description of the concerted dynamics
of the upper and lower helices and the bulged loop. Our approach describes
a dynamic trajectory based on an ensemble of energy minimized structures
with the Rosetta program FARFAR. Because FARFAR does not provide a
Boltzmann-weighted distribution of states (and thus does not provide
the entropy and free energy) our calculations rely on fitting phenomenological
parameters to the data. To reduce the complexity of the problem, a
predominant set of conformers is selected that describes its dynamic
conformational ensemble. We used experimental residual dipolar couplings
(RDCs) obtained from partial alignment of the TAR RNA by one alignment
medium to achieve this selection, and other groups have recently used
similar filtering techniques.[18,38,39] A summary of some of these techniques has been published recently.[40] In addition to filtering out a set of long-lived
conformers along the stochastic trajectory, we acquire information
regarding their relative populations. We complement the RDC filter
with principal component analyses (PCAs) based on a judiciously chosen
set of backbone torsion angles, as done to establish conformational
clustering and free energy landscapes[41] of RNAs[12] and proteins,[39,42] and a PCA can serve as a useful aid in setting up dynamics calculations.Finally,
we use the outcome of that analysis to calculate solution
spin–lattice relaxation times T1 and the rotating-frame
spin–lattice relaxation time T1ρ for 13C spins in nucleotides located in the upper and lower helices
and in the bulge. The protocol would allow extensions to other observables
as well, but the current work focuses on these to provide a clear
proof of method. We direct the interested reader to more extensive
work on other NMR relaxation parameters and their simulations.[30,43] We use two methods to perform these calculations. The first approach
(slow exchange or “SE” model) assumes an effectively
infinite time scale of exchange (i.e., the conformational exchange
is much slower than any other relevant motional time scale). The second
approach (general rate or “GR” model) allows for an
arbitrary time scale of exchange (including one that overlaps with
the rotational diffusion time scale). By combining complementary experimental
and analytical techniques into a single framework, we have been able
to construct a viable dynamic trajectory for the TAR RNA. We provide
a flowchart of the methodology in Figure 2.
Figure 2
Protocol
used in this manuscript for the simulation of domain-dynamics
in a molecule.
Protocol
used in this manuscript for the simulation of domain-dynamics
in a molecule.
Theoretical
and Computational Methods
Solid-State NMR Models for
the Lower Helix of
TAR
We present here the solid-state NMR (ssNMR)-derived models
that served as motivations for the solution relaxation simulations.
ssNMR studies of the dynamics of the uridine bases U38, U23, and U25
in TAR were carried out using samples selectively deuterated at the
5- and 6-carbon base sites.[4,44] U38 was chosen to represent
the dynamics of the upper helical stem, whereas U23 and U25 are of
interest on account of their positions in the single-stranded bulge.
More recently, 5,6-2H labels were introduced at the U40
and U42 positions in TAR, in correspondence with the lower helix (U42)
and an unstable base pair that closes the bulge region (U40) (Wei
Huang, unpublished data). The data included line shapes as well as
T1Z and T1Q relaxation times collected on samples
hydrated in all cases to 16 water molecules per nucleotide to reproduce
conditions where motions were shown to be solution-like.[4,44] Each of the upper helical and single-stranded sites investigated
had characteristically distinct spectral features. Within the lower
helix, U40 and U42 showed results similar to each other, but distinct
from the upper helix or single-stranded sites. Motional models generated
to fit the data included a slower base motion consisting of jump between
two equally populated sites, superposed on a faster motion occurring
around the normal to the base-plane for the helical residues and around
the glycosidic bond for the bulge residues. The analyses of the spectra
recorded for the upper and lower helical sites were done independently
resulting in two different sets of parameters.The U38 base
was modeled as undergoing a two-site ±4° jump process around
the base-normal at a rate of 2 × 108 s–1 in addition to a two-equivalent-site conformational exchange process,
where the upper helix underwent a 9° bend and a 15° twist
at a rate of ∼106 s–1 relative
to a crystal-fixed frame.[4] A similar model
could fit the data for the lower helix bases U40 and U42 as well,
resulting in a 0–18° bending motion and a 18–25°
twisting motion of the lower helix, but at slower rates on the order
of 105 s–1. By analyzing relaxation data,
we observed small amplitude (±6° to ±9°) local
motions of the base at a rate of ∼108 s–1 for both U40 and U42. The U23 data were fit by significantly different
models, involving a local two-site jump about the glycosidic bond
of ±11° at a rate of ∼1010 s–1, and a 24° hop of the base at a rate of ∼108 s–1, whereas U25 was modeled as experiencing a
30° jump at 6 × 107 s–1 in
addition to a much slower twisting of the base (6 × 105 s–1) with a large amplitude of ±40°.We utilized the same two-site jump models for the local base motions
of both the helical and bulge residues, under the rationale that solid-state
sample conditions should be able to replicate solution conditions
at least in the small-amplitude local motions at the hydration conditions
of our studies. We also used the time scale of the local base jumps
as a starting point for simulations of the solution conditions. Finally,
we used the observation that conformational exchange motions in solution
conditions occur on a time scale similar to that in solid-state samples.
This is reflected in our use of the “slow exchange”
formalism for part of the relaxation simulations, which assumes an
exchange process much slower than all other motional rates. We did,
however, also consider the more general case of an arbitrary rate
of conformational exchange.
Solution-State Simulations
Structures
were generated using the program FARFAR.[22] The torsion angles that were altered in the generation of the structures
used here were those of the residues in the bulge and the bulge-adjacent
base pairs of the two helical domains. The 500 lowest energy structures
represented a distribution in energy of about 13 Rosetta units, where
1 Rosetta unit is approximately 1 kBT (Rhiju Das, private communication).To reduce the
dimensionality of the problem, we assumed that the simple motif of
a helix–bulge–helix can be represented by a reduced
set of relative helical orientation parameters. A set of three Euler
angles transforming between the two helices is taken to be sufficient
to discriminate between helical conformations (Figure 3) if the helices behave as rigid units.[33,45]
Figure 3
Relation
between the Euler angles determined from the atomic coordinates
and the associated domain motions.
Relation
between the Euler angles determined from the atomic coordinates
and the associated domain motions.We have used experimental residual dipolar couplings (RDCs)
for
several bonds along the molecule. Because RDCs are long-time weighted
averages over all conformational states of the molecule, they potentially
provide means of extracting both the best-fit conformers and their
relative populations. The structures that were selected by this RDC-filtering
process, along with the best-fit populations, were subsequently used
to simulate the solution relaxation times.We incorporated the
local base motions obtained by fitting the
ssNMR data into the RDC-selected structures and calculated the solution
relaxation times using previously published methods.[6,7] When a small amount of variation for the local amplitudes and rates
of motion, and for the rates of conformational exchange between the
selected structures, was allowed, the solution relaxation times for
the 17 residues in the molecule could be calculated (7 in the lower
helix, 3 in the bulge, and 7 in the upper helix) with this method.
Helical Parametrization of Structures
The relative
orientations of the upper and lower helices were quantified
by considering the orientation of the normal vector of the U38 base
relative to a frame defined by a z-axis aligned with
the lower helical axis. The method of evaluating this relative orientation
and the subsequent binning of structures within this scheme is as
follows:Define
the upper and lower helical
axes for all structures using the program 3DNA.[46] The upper helical axis is taken to be the average local
helical axis of the A27-U38::G28-C37 and G28-C37::C29-G36 dinucleotide
steps, where the base pairs flanking the bulge were excluded due to
possible distortions from ideal A-form helical structure. The lower
helical axis is calculated similarly as the average over the C19-G43::A20-U42
and A20-U42::G21-C41 dinucleotide steps.Define the lower helix coordinate
frame (LHF) by choosing the lower helical axis (calculated above)
as the z-axis and the perpendicular from the z-axis to the G43 C8 atom as the y-axis.
This choice of the y-axis was made as the 500 structures
did not differ in the orientations of the first few base pairs (including
C19-G43), so the y-axis would be the same across
all structures. However, the specific choice of the G43 C8 atom for
this purpose was arbitrary.Calculate the α angles for each
structure, defined as the angle between the projection of the upper
helical axis onto the xy-plane of the LHF and the x-axis of the LHF.Calculate the β angles, defined
as the angle between the upper helical axis and lower helical axis.The γ angle is then
defined
by the orientation of the normal of U38 base (the vector perpendicular
to the C4–C2 and C6–C2 bonds) about the upper helical
axis. Extracting this information from the structures requires first
removing the α and β dependence by rotating the original
U38 base-normal vector v⃗U38norm about the fixed LHF axes as follows: v⃗U38norm′ = R⃡(−β)R⃡(−α)v⃗U38norm. The resultant vectors are distributed
around the LHF z-axis as a function of their γ
angles. The Euler angles described above are related to the domain
motions as shown in Figure 1.Bin the 500 structures as a function
of the Euler angle set {α, β, γ}. The bins were
chosen in 10° increments for α (0° ≤ α
≤ 360°) and β (0° ≤ β ≤
180°) angles, resulting in 36 and 18 bins, respectively. Instead
of binning the γ angle in terms of degree increments, we fixed
the number of bins to 5 because of an observed correlation between
the α and γ angles, which results in a shift in the values
of γ for every α bin, as well as the restriction of the
γ values to only a portion of the full phase space. Thus, trying
to bin all possible γ angles would have unnecessarily increased
the computational time.
RDC Constraints
To select a subset
of structures for multisite jump simulations from a starting set of
500 energy-minimized structures generated by FARFAR, we use experimental
RDCs[47] in conjunction with the predictive
program PALES.[48] We only consider the data
from a PEG/Hexanol mixture[49] because this
uncharged medium aligns the charged RNA only through steric hindrance
of the overall rotation. This situation does not require a detailed
characterization of the RNA charge density, as would be required for
the simulation of RDCs measured in charged alignment media. Extension
to charged media data may be considered in the future.The purely
steric version of PALES was used to calculate the eigenvalues and
eigenvectors of the Saupe alignment tensor of the molecule by sampling
multiple allowed orientations in the presence of a flat (infinitely
large) obstruction. The alignment tensor eigenvalues are then used
in conjunction with directional information for a particular bond
relative to the alignment tensor principal axis frame (PAS). For the
purposes of simulating the experimental RDCs, the bond orientations
are considered to be static within each conformer. The only motions
to be considered are then the exchanges between distinct conformers.
To include dynamic sampling of multiple conformations, we use the
weighted average of the RDCs of each conformer RDCISTotal = ∑pRDCIS for the RDC of a bond between spins I and S, with
the population of each conformer i being represented by p.The residues considered for
this study were C19, A20, G21, A22,
U40, C41, U42, and G43 from the lower helical stem, U23, C24, and
U25 from the bulge, and G26, A27, G28, C29, G36, C37, U38, and C39
from the upper helical stem. The bond types included the C6–H6
bond (pyrimidines), the C8–H8 bond (purines), and the C5–C6
bond (pyrimidines) from the bases, the C1′–N1 (pyrimidines)
and the C1′–N9 (purines) glycosidic bonds, the C1′–H1′
and C4′–H4′ bonds from the furanose rings, and
the C5′–H5′ and C5′–H5″
bonds from the backbone. RDCs for some bond types were not available
for some of the residues. Furthermore, the RDCs for the bulge residues
were considered separately, and not in the same simulations as the
helical residues. This is because the bulge is likely to be significantly
more mobile than the helices and will sample a significantly larger
number of configurations, requiring a different set of simulation
conditions.Effective bond lengths of the species in question
are required
in the final PALES calculation. The values we chose for the bond lengths
are the aliphatic C–H bond length, rCH,Aliph = 1.1 Å,[50,51] the aromatic C–H bond
length, rCH,Arom = 1.09 Å (average
of Ying et al.[51] and Allen et al.[52]), the aromatic C–C bond length, rCC,Arom = 1.4 Å,[53] the glycosidic C–N bond length for the cytosines, rCN,GlycCyt = 1.47 Å,[53] and the glycosidic C–N bond lengths for the remaining
base types, rCN,GlycRest = 1.48 Å.[53]To gain a geometric perspective on the
best-fit structures from
the RDC comparison, we binned the simulated couplings for each structure
according to the {α, β, γ} angles determined for
the 500 structures. The RDCs within each bin were then uniformly averaged
to produce a single bin-RDC for each bond type and residue, rather
than retaining the values for individual structures. This was done
because, although the structures within each bin have similar helical
conformations, they may differ in the orientations of bonds of certain
residues relative to the large-scale conformations. By averaging over
these variations in the bonds, we effectively included in the simulations
the small amplitude thermal fluctuations of the atomic bond orientations.To select the best-fit set of structures, we started with an arbitrary
initial set of N structures and allowed the choice
of the (N + 1)th angular bin to float while attempting
to optimize the total χ2 as well as the Pearson’s
correlation coefficient between the simulated and experimental RDCs.
In addition, we varied the relative weights of the (N + 1) bins. Once the best-fit parameters were obtained for this first
iteration, the (N + 1)th bin so obtained replaced
the lowest probability structure from the remaining N, and the search was repeated for a second iteration and beyond.
To make the final choice of N, we started with N = 2 and after minimizing the χ2 for a
choice of two structures, we added a third and repeated the process.
Proceeding thus, we found that N = 5 structures (i.e.,
5 bins which also happened to have only one structure in each of them)
were sufficient to produce the best-fit to the RDC data, with no improvement
after the fifth iteration. In addition, as a separate, independent
check of the results of the previous procedure and to allow for the
variation of the number of bins more easily, we generated a Markov-chain
Monte Carlo (MCMC) simulation to search through the bins and populations
for a best-fit to the RDCs, with the potential for varying the number
of bins. The number of parameters for an N-conformer
search was 2N – 1 (N bin
choices + N – 1 probabilities to be floated).
The Markov-chain Monte Carlo method did not yield better results than
the iterative technique.
PCA of the Torsional Degrees
of Freedom
To determine the number of exchanging conformers
required to describe
the dynamics in TAR RNA, and to corroborate the results of our RDC
filter, we performed a principal component analysis (PCA), following
procedures applied to molecular ensembles of proteins,[39] and RNAs.[12,41,42] The covariance matrix σ = (q – q)(q – q)
was calculated as described by Mu et al.,[42] i.e., by proposing the following variable set {q2}:where φ is the jth torsion angle of interest,
and Ntorsion is the number of torsion
angles used in the analysis. The use of the cosine and sine functions
removes complications associated with the periodicity of the torsion
angles by helping to uniquely identify particular values of the angles.The covariance matrix of these variables is calculated with averages
over the full ensemble of structures and is subsequently diagonalized.
The eigenvalues (and associated eigenvectors) are arranged in descending
order, with the highest values representing modes with the largest
contributions to the structural scatter. In our results, we have found
that 2 or 3 modes contain a majority (∼70%) of the total variance
in the data. In addition, histograms of the projections of the ensembles
for each eigenvalue are examined for Gaussianity. As discussed in
the above references,[12,41] a mode whose histogram consists
of a single Gaussian-like peak only represents continuous fluctuations
about a central structure, whereas multimodal distributions describe
discrete conformations separated by free-energy barriers. Thus, from
the perspective of assessing the conformational transitions of the
molecule, only modes with multimodal distributions are considered
relevant. This adds another layer of information regarding the molecular
dynamics. Our results for PCAs with different choices of torsion angle
sets (as described in the following) indicate that the first 2 or
3 modes were non-unimodal in distribution and therefore of primary
significance in describing conformational exchanges.We performed
several PCAs, each with a different choice of coordinates.
The first set included all torsional “suites”[54] along the bulge and hinge region (PCA method
1). The sugar-to-sugar suite as defined in the above reference consists
of the set of 6 backbone torsion angles {ε, ζ, α,
β, γ, δ} as well as the glycosidic torsion angle
χ. In our study, we focused attention only on the 6 backbone
torsion angles. The residues included were A22, U23, C24, U25, G26,
C39, and U40. These were meant to encompass the conformationally relevant
part of the molecule under the assumption of relatively rigid helices.
The number of torsion angles considered was therefore 42 (7 bases
×6 torsion angles) resulting in an 84 dimensional PCA (because
the cosine and sine functions of the angles are treated as separate
variables). However, the PCA of this set resulted in several modes
which contributed substantially, with no clear clusters in any of
the largest modes.The inclusion of the single-stranded bulge
region may have caused
the lack of structure in the PCA projections calculated in this first
method, because single-stranded regions are significantly more flexible
and may add a level of disorder to the torsional distribution. To
isolate the interhelical motions, we evaluated the PCA of only those
torsional “suites” that extend from U38 to C39 and from
C39 to U40 (PCA method 2). These suites include only the hinge region
of the molecule, and resulted in a 24-dimensional PCA.To check
the robustness of the clustering results obtained from
method 2, we then carried out a series of PCAs by successively including
an additional sugar-to-sugar backbone suite: U38 through C41 (PCA
method 3), U38 through U42 (PCA method 4), C37 through U42 (PCA method
5), and C37 through U40 (PCA method 6). The results of these analyses
and the relation to the results from the RDC fit will be discussed
in the Results below.
Relaxation Time Simulations
The evaluation
of the solution relaxation times is based on two previously explored
techniques.[6,7,55] Both methods
involve the calculation of the two-time correlation function for the
orientation of an atomic bond located within a nonrigid Brownian rotator
relative to a fixed laboratory frame. The evaluation proceeds by introducing
an intervening reference frame associated with the principal axis
system (PAS) for the rotational diffusion tensor, which is time dependent
as a result of exchange between different structural conformers. The
correlation function then becomes dependent on Wigner rotation matrices
whose arguments are the Euler angles that orient the rotational diffusion
tensor of the molecule relative to the laboratory frame. In turn,
the Fokker–Planck equation in three-dimensions allows the evaluation
of the transition probabilities from one set of Euler angle orientations
to another. The form of the Fokker–Planck equation, which accounts
for coupling between rotational diffusion and conformational changes,
iswhere P(Ω⃗,β,t|Ω⃗0,α) is the probability
that the molecule will transition from a diffusion tensor-to-laboratory
frame orientation of Ω⃗0 at time 0 and in
a conformational state α to an orientation of Ω⃗
and a conformational state β at time t, given
diffusion tensor elements of Dβ in the
conformational state β and an exchange rate of R between conformational
states β and γ. This equation can also formally cover
the case of continuous transitions to new conformational states by
allowing the number of conformers to be infinite. However, in all
cases considered in the original references, and here as well, only
discrete jumps will be considered.If the exchange rate is much
slower than the rate of diffusion and all other time scales of exchange
processes, yet faster than the relaxation rates themselves, the rotational
diffusion problem is effectively decoupled from conformational exchange.
Under these conditions, the SE model applies, and it is possible to
calculate the relaxation rates for the molecule as the weighted-average
over the relaxation rates for the individual conformational states
of the molecule:[6]Expressions for the
NOEs can also be derived by appropriate weighting
of the NOEs of individual conformational states. The procedure to
calculate the relaxation times for each conformer are provided in
the original reference.Solution of eq 2 for the case of a general
exchange rate (i.e., the GR model) between conformers involves considerably
more analytical and computational processing.[7,55] We
have solved this problem for the case in which the eigenvectors of
the diffusion tensors of the exchanging conformers are coincident
at the moment of exchange (see Ryabov et al. for general case[56]) and correlation functions have been published.[55] This general rate analysis also indicates that
the slow exchange regime occurs at time scales longer than about 1
μs.For the carbons considered in this analysis, the rotating
frame z-relaxation time T1ρ was
measured instead of T2.[4,30] Under
the application of a weak spin-lock field and the assumption of Lorentzian
spectral densities, T1ρ contains
the same spectral density information as T2.[57] Therefore, for the purposes of this
paper we operate under this assumption and simulate the T2 relaxation times.The structure set and the corresponding
populations preselected
by the RDC filter are the basis for simulations of the relaxation
times using eq 2. Eigenvalues and eigenvectors
of the rotational diffusion tensor are first calculated using the
public-domain program HYDRONMR.[58] Then
orientations of the atomic bonds of the residues of interest are calculated
with respect to this axis system, i.e., the principal axis system
of the rotational diffusion tensor. The orientational parameters,
together with the diffusion tensor eigenvalues are input into the
two algorithms we have derived for simulations of the relaxation times
of nonrigidly rotating macromolecules: (a) the “slow-exchange”
formalism,[6] describing the case where the
conformational jumps occur at a rate much slower than the rate of
overall rotational diffusion of the molecule, and (b) the general
rate formalism,[7,55] where arbitrary rates of exchange
are allowed. The “slow-exchange” formalism, though merely
a limiting case of the general rate theory, has the advantage of being
significantly faster and easier to implement and so is considered
here.The residues considered were A20, G21, A22, U40, C41,
U42, and
G43 from the lower helical stem, U23, C24, and U25 from the bulge,
and G26, A27, G28, C29, G36, U38, and C39 from the upper helical stem.
In the current work, we have only simulated the motions of the bases
of these residues: the C6–H6 bonds for the pyrimidines and
the C8–H8 bonds for the purines.The parameters used
in the simulation include the atomic element
radius (AER),[58,59] the radius of the beads used
in the HYDRONMR calculation of the diffusion tensor, and the bond
lengths. The AER was chosen to be 2.3 Å and the bond lengths
for the carbon–hydrogen bonds for the aromatic bases were chosen
to be 1.1 Å, both choices having been justified in previously
published work.[6] The viscosity was chosen
to be 1.096 cP[60] to correspond to the conditions
of the solution experiments (99.9% D2O at 25 °C).We have also incorporated the two-site base motions inspired by
simulations of the solid-state NMR (ssNMR) data: the so-called “base
libration” occurs around a vector normal to the plane of the
base in the case of helical residues, whereas the two-site motion
is modeled to be around the glycosidic bond for the bulge residues.
We floated the values of the rates and amplitudes of these two-site
jumps relative to the ssNMR models, which were found to be on a time
scale much shorter than that of the conformational exchange. For the
slow exchange simulations, these internal, local motional rates and
amplitudes were the only free parameters, whereas in the general rate
simulations we floated the conformational exchange rates between the
states. The fitting procedures were carried out in a combination of
grid-searches and MCMC techniques.
Results
RDC Filter of the Structural Ensembles
Five structures
provided the best χ2 values to fit
the RDC data, as obtained by iteratively searching through the bins
and updating the choices of bins and relative populations. They are
shown in Figure 4, and key characteristics
are summarized in Table 1. For simplicity,
the structures will be referred to by their respective bend angles.
Thus, the highest population structure will be called the “45°
structure”, the second highest populate structure as the “61°
structure”, and so on.
Figure 4
Five structures obtained by the RDC-filtering
procedure to represent
the ensemble of Tar conformation that describes the experimental data,
together with their interhelical Euler angles and population percentages.
Table 1
Twist of Upper Helix
Relative to Lower
Helix (α), Interhelical Bend Angle (β), Twist of Upper
Helix around Its Own Helical Axis (γ) (Figure 3)
structure
α (deg)
β
(deg)
γ (deg)
% relative population
1
185
45
191
29
2
185
61
189
27
3
282
115
194
26
4
67
76
326
13
5
244
132
116
5
Five structures obtained by the RDC-filtering
procedure to represent
the ensemble of Tar conformation that describes the experimental data,
together with their interhelical Euler angles and population percentages.The χ2 for the best-fit
set of structures was
11 460 for a set of 48 RDCs, with 9 degrees of freedom (5 bin
choices and 4 probabilities), for a reduced χr2 = 302 (=11460/(48
– 9 – 1)), whereas the Pearson’s correlation
coefficient was 0.72. It is to be noted that the χ2 represent unweighted values; i.e., the discrepancies between the
experimental and simulation RDC values are not inversely weighted
with the error bars of the RDC values (which have not been calculated).
This assumption is equivalent to assuming that the error on all measurements
is 1 Hz, which is likely to be a considerable underestimation of the
true error bars. The comparison between experiment and simulation
is shown in Figure 5. In the figure, the RDCs
from different bond types are clustered together for each residue.
We further attempted an MCMC fit procedure with the possibility of
6, 7, or 8 structures but were unable to improve upon the fit. It
is possible that this fit may be improved by the continuation of the
MCMC procedure to a greater numbers of iterations, but the PCA analysis
reported in the following section provides further corroboration that
the model found by RDC fitting represents the conformational landscape
of the molecule well.
Figure 5
Comparison of the experimental RDCs (red triangles) with
the RDCs
generated by the best-fit simulation parameters (blue circles) for
the helical residues in HIV-1 TAR RNA. The residues associated with
the RDC values are labeled as well. The values shown include those
for backbone (C5′–C5′, C5′–H5″),
furanose (C1′–H1′, C4′–H4′),
glycosidic (C1′–N1 for pyrimidines, C1′–N9
for purines), and base (C5–C6 and C6–H6 for pyrimidines,
C8–H8 for purines) bonds.
Comparison of the experimental RDCs (red triangles) with
the RDCs
generated by the best-fit simulation parameters (blue circles) for
the helical residues in HIV-1 TAR RNA. The residues associated with
the RDC values are labeled as well. The values shown include those
for backbone (C5′–C5′, C5′–H5″),
furanose (C1′–H1′, C4′–H4′),
glycosidic (C1′–N1 for pyrimidines, C1′–N9
for purines), and base (C5–C6 and C6–H6 for pyrimidines,
C8–H8 for purines) bonds.Plotting the calculated RDCs against the experimental values
(Figure 6), we found that the trend, on average,
is toward
an underestimation of the RDCs by the simulations. The dashed blue
line in Figure 6 is the best-fit line to the
data and has a slope of 0.4 and a y-intercept of
−3.9. An underestimation of the RDCs may arise from using a
smaller degree of alignment in the simulations than in the actual
experimental situation. One possible source of this discrepancy may
be the current assumption of a simple steric model for the alignment
of the molecule by PEG/hexanol. Recent work[61] has proposed that there are subtleties in the alignment process,
including the possible contributions of complex alignment medium topology
and electrostatic alignment, that are not incorporated in simulations
using only the basic steric version of the PALES algorithm.
Figure 6
Plot of calculated
RDC values vs experimental RDC values for the
best-fit set of structures and populations. The dashed blue line is
a linear fit to the data, and the solid red line is the ideal case
where the calculated and experimental values match perfectly.
Plot of calculated
RDC values vs experimental RDC values for the
best-fit set of structures and populations. The dashed blue line is
a linear fit to the data, and the solid red line is the ideal case
where the calculated and experimental values match perfectly.This result is a cautionary statement
in the application of simple
steric models in the simulation of potentially complex alignment media.
We attempted to fit RDC data collected in glucopone/hexanol mixtures
and in Pf1filamentous bacteriophage media and found that the models
selected were different (two or three of the structures chosen were
the same compared to the PEG/hexanol model). One obvious reason for
this was the availability of RDC data for different bonds in the different
media. However, there is potentially a fundamental difference in the
alignment properties of the media as well. For example, the Pf1 Phage
medium is negatively charged[62] and, thus,
as may be the case with PEG/hexanol as well, the alignment has an
electrostatic component. This must be taken into account more carefully
in future analyses.To further discern the causes of the discrepancy
in the fit, we
looked at all helical RDC values that contributed to a deviation of
magnitude 10 or greater and found a set of 18 RDCs: 8 correspond to
C1′–H1′ furanose ring bonds, 2 to C4′–H4′
furanose ring bonds, and the remaining 6 to C6H6/C8H8 base bonds,
occurring in 12 helical residues in both the upper and lower stems
as well as among all four nucleotide types. Removal of these RDCs
gave χ2 = 802 and χr2 = 40. The large deviations observed
for furanose ring bonds indicate the existence of additional motions
localized to the furanose rings, as has been reported for DNA,[5] that have not been accounted for using the current
set of 500 structures. These motions involve an exchange between the
C2′-endo and C3′-endo conformations, and on the time
scale of the RDCs these motions may be averaged out to produce an
intermediate conformation. It is likely that the furanose ring samples
these conformations even for the residues stacked in a helical configuration.
A similar argument holds for those base bonds that show a large discrepancy
in RDC values: there may be additional vibrations in the base orientations
that are not adequately sampled by the 500 structures.We also
compared the five structures selected above to the set
of 20 lowest energy TAR RNA structures recently generated on the basis
of NOE data[47] that have not been constrained
by RDC data. Upon calculating helical axes and orientations in the
same manner as for the 500 FARFAR structures in this manuscript, we
find that 11 out of 20 of the structures, including 3 out of the 5
structures that best fit the NOE data, occur within the same 10°
bend angle bin as the highest population structure from the RDC fit
described above. Thus, we believe that our approach identifies a predominant
conformation set.To test the robustness of the search algorithm,
we performed two
simulations of fitting a reduced data set upon the random removal
of (a) 10 RDCs and (b) 15 RDCs (different RDCs were deleted in each
of these two cases). Removing the first set of 10 resulted in the
selection of the same 5 conformers as from the full set, along with
a sixth new structure with a population of 4%. The populations of
the 5 full-set best-fit structures were slightly different (maximum
change of 12%). Removal of 15 RDCs reproduced 4 out of the 5 full-set
best-fit structures, and two new conformers with populations of 7%
and 5%. The maximum change in population among the 4 best-fit structures
was 7% in this case. These results indicate that the choice of structures
is robust to a reduction in the size of the experimental data set.Jumps between the five conformers shown in Figure 4 require a combination of bending and twisting about either
the lower or the upper helices. Among the entire set of energy-minimized
structures, there was an observed correlation between the α
and γ angles, as has been reported previously.[45,63,64] These correlations may be reflected
even in the jumps among this set of five structures, representing
a free energy landscape where the exchanges between minima involve
coupled shifts in Euler angle values.
PCA Clusters
We carried out a series
of PCAs to (a) identify the choice of torsion angles that best captures
interhelical motions, (b) corroborate the RDC-filtered set by overlaying
the five chosen structures on the clusters obtained from the PCA of
choice, and (c) identify jump matrix elements for the exchanges between
the five RDC-filtered structures in the dynamics calculation.Given the lack of clear clustering results from PCA method 1, which
incorporates all backbone torsion angles in the bulge and hinge, we
examined whether torsion angles on one side of the helical joint would
suffice to describe the interhelical reorientation (PCA method 2,
incorporating the backbone torsion angle suites between U38 and U40).
When this was done, only the first two principal components (PCs)
contributed significantly, accounting for 75% of the fluctuations
in the molecule (Figure 7). Furthermore, these
two PCs were the only ones with a multimodal probability distribution
across the 500 structures. This non-single-Gaussian distribution signals
the presence of conformational clusters in energy minima separated
by significant free energy barriers.[12,41] The map of
these two PCs is shown in Figure 7A and shows
the presence of three to four major conformational clusters. The large
red dots superimposed on the 2D plot correspond to the five structures
selected from the RDC filtering procedure. Parts B and C of Figure 7 show the population distribution histograms for
PC 1 and PC 2, respectively. Again, the positions of the five structures
from the RDC filter are indicated by red arrows.
Figure 7
Principal component analysis
of the torsion angle “suites”
along the hinge (residues U38 through U40). (A) Principal components
1 and 2 for the 500 structures, represented by blue dots, with the
positions of the five best-fit structures marked explicitly by red
dots, and further encircled for clarity. (B) Histogram of the first
principal component (corresponding to the largest eigenvalue of the
covariance matrix), with the positions of the five best-fit structures
marked explicitly by red arrows. (C) Histogram of the second principal
component (corresponding to the next-to-largest eigenvalue of the
covariance matrix), with the positions of the five best-fit structures
marked explicitly by red arrows.
Principal component analysis
of the torsion angle “suites”
along the hinge (residues U38 through U40). (A) Principal components
1 and 2 for the 500 structures, represented by blue dots, with the
positions of the five best-fit structures marked explicitly by red
dots, and further encircled for clarity. (B) Histogram of the first
principal component (corresponding to the largest eigenvalue of the
covariance matrix), with the positions of the five best-fit structures
marked explicitly by red arrows. (C) Histogram of the second principal
component (corresponding to the next-to-largest eigenvalue of the
covariance matrix), with the positions of the five best-fit structures
marked explicitly by red arrows.Three points are noteworthy. First, the five structures chosen
as the best-fit set match up well with the main conformational clusters
obtained from this PCA, suggesting that our RDC-filtered set has captured
the relevant information about the major conformational clusters of
the molecule. Second, structures with similar interhelical bend angles
have similar values of each of the principal components. The principal
component values of the two structures with bend angles of 115°
and 132° occur in close proximity to each other. The same is
true of the pair of structures with bend angles of 61° and 76°
(it is true, however, that the 45° structure does not differ
significantly from the 61° structure; the PCA suggests that there
is a free energy barrier that separates even these two neighboring
structures). This is important because other PCA methods (described
below) separate structures with similar interhelical orientations,
possibly due to the presence of additional degrees of freedom that
do not contribute significantly to interhelical reorientation. Finally,
the sums of the probabilities of the best-fit structures within each
cluster are similar to each other: the 45° bend structure has
a population of 29%, the 61° and 76° structures have a joint
population of 40%, and the 115° and 132° structures have
a joint population of 31%. Because the histogram heights do not correlate
well with the nearly uniform probability distribution, we fit the
jump rates numerically, as described below.Subsequent attempts
at testing the robustness of our cluster analysis
yielded further interesting results. PCA method 3 (U38 through C41)
showed a marked difference in clustering of the structures. Three
PCs contributed significantly, yielding about 70% of the total fluctuations,
with all three now being multimodal. However, the five structures
do not all seem to fall within major clusters. Moreover, the 61°
and 76° structures no longer fall within the same cluster, nor
do the structures with bend angles of 115° and 132°. This
occurs because of the intervention of the torsion angles associated
with U40. When the helical parameters of the 500 structures were searched,
about 37% of the structures U40 did not form a canonical Watson–Crick
pairing with A22, indicating considerable conformational variability
of this residue, at least within the physical picture generated by
the energy-minimization ensemble. To test this hypothesis, we extended
the PCA up to U42 (PCA method 4) on the lower stem and up to C37 on
the upper stem (PCA method 5). Both these methods gave very similar
clustering to method 3, with three significant PCs. We interpret this
observation as confirmation that, after the inclusion of U40, the
remaining residues behave fairly rigidly and do not change the results
of the PCA. As a final confirmation of this conclusion, we carried
out a PCA including only torsion angles from C37 to U40 (PCA method
6). The clustering results for this PCA proved to be the same as for
PCA method 2.Given the change in clustering associated with
the inclusion of
the U40 degrees of freedom, we utilized the results of PCA method
2 to set up the rate matrix for the relaxation time simulations. In
general, for N conformers, the number of combinations of pairwise
rate constants that need to be fit is C2. Given the clustering suggested by
the PCA, we reduce the fit problem from 10 (=5C2 for the five RDC filter structures) to five parameters
in the following manner. For pairs of structures that occur within
the same cluster in the two PC distributions, we allow for only one
distinct exchange rate between both members of the pair in that cluster
and any structure in another cluster. In addition, there is also one
intracluster exchange rate for each cluster. Thus, the rates used
in the fitting process were for the following exchanges:These exchange processes are
shown graphically in Figure 8. The assumption
in this parameter reduction is
that all the structures within a cluster are separated from other
clusters by similar free energy barriers.
Figure 8
Exchange between clusters
inferred from the PCA, together with
the five rates used in the jump matrix for the relaxation time simulations.
Exchange between clusters
inferred from the PCA, together with
the five rates used in the jump matrix for the relaxation time simulations.
Calculation
of Solution Relaxation Times
The RDC-filtered conformer set
of five structures, together with
the relative probabilities, were used to calculate the T1, T2 and NOE values. We used
two different approaches to calculate the C6–H6 (pyrimidine)
and C8–H8 (purine) relaxation times: (a) the slow exchange
method, where the assumption is that the exchanges occur at an infinitely
slow rate (compared to the rotational correlation time), and (b) the
general rate method, where we fit the relaxation times by allowing
the exchange rates to vary arbitrarily.
Slow Exchange
Method
The general rate
analysis has shown[7] that the slow exchange
regime in TAR RNA effectively occurs for time scales longer than about
1 μs. Thus, the results of this subsection assume conformational
exchanges occur on a scale longer than 1 μs. As a starting point
for the fitting process, we use rates and jump amplitudes for the
base librations close to those obtained to fit the solid-state NMR
data of the uridines,[4,6,7] and
changed both the rates and amplitudes in small increments to improve
on the χ2. Using the fit to the T1 values
as a benchmark, we found that it suffices to fit only two base-libration
rates, one for the upper helix residues and one for the lower helix
residues. This simulation model was inspired by the results of the
two solid-state NMR analyses. The exceptions, however, are the parameters
for U40 and U42 in the lower helix. We obtain a significantly better
fit by using the rates and jump amplitudes for the upper helix for
these two residues, indicating that these residues are more similar
in local base motion to the upper helix. We found that there were
several nearly degenerate minima or best-fit “windows”
in the χ2 plot for the local motion parameters, with
the upper and lower helical parameters behaving independently. The
rates in these “windows” vary between ∼107 s–1 and ∼109 s–1, whereas the amplitudes are less than 20°.We did attempt
simulations of additional models of base-libration such as a treatment
of the rates of purines and pyrimidines independently, and the assumption
of a constant rate across the entire molecule. The model described
above was the best among the three considered. There is always the
possibility that more complex models of base libration rates may fit
the data better; for example, we may treat the libration rate of each
individual residue as an independent parameter, or choose the purines
and pyrimidines in the lower helix as independent from the purines
and pyrimidines, respectively. However, this would increase the number
of free parameters in the problem, and we chose the above model as
a balance between an arbitrary increase in free parameters and an
attempt at a physically realistic representation.The following
representative values of the local motion parameters
simulate the relaxation times well:upper helix, U40 and U42 base-libration
rate = 4.6 × 107 s–1upper helix, U40 and U42 base-libration
jump amplitude = 13.7°lower helix (without U40 and U42)
base-libration rate = 6.6 × 108 s–1lower helix (without
U40 and U42)
base-libration jump amplitude = 9.8°The match between the experimental and simulation relaxation T1 and T2 values
is shown in Figure 9. For quantitative comparison,
we calculated the root-mean-square deviation (RMSD) across the 14
helical T1 values to obtain an RMSD of
5.3 ms. The corresponding RMSD for T2 comparisons
was 1.5 ms.
Figure 9
Relaxation time simulations for the C6–H6 (pyrimidine) and
C8–H8 (purine) bonds using the slow exchange method, and comparisons
of residuals (discrepancies relative to experimental values) to statistical
error bars: (A) T1 simulations (blue circles) compared
to the experimental T1 values (red triangles); (B) T2 simulations (blue circles) compared to the experimental T2 values (red triangles); (C) Difference between simulation
T1 and experimental T1 values, together with
the statistical error bars on experimental data (red dashed lines)
at ±3.2 ms; and (C) Difference between simulation T2 and experimental T2 values, together with the statistical
error bars on experimental data (red dashed lines) at ±0.5 ms.
Relaxation time simulations for the C6–H6 (pyrimidine) and
C8–H8 (purine) bonds using the slow exchange method, and comparisons
of residuals (discrepancies relative to experimental values) to statistical
error bars: (A) T1 simulations (blue circles) compared
to the experimental T1 values (red triangles); (B) T2 simulations (blue circles) compared to the experimental T2 values (red triangles); (C) Difference between simulation
T1 and experimental T1 values, together with
the statistical error bars on experimental data (red dashed lines)
at ±3.2 ms; and (C) Difference between simulation T2 and experimental T2 values, together with the statistical
error bars on experimental data (red dashed lines) at ±0.5 ms.The error bars shown in Figure 9 describe
the statistical error in the measurements. However, the potential
systematic error, not quantified by the authors,[30] is larger. Yet, even at this level of uncertainty there
is a consistent difference between the best-fit local jump amplitudes
for the upper and lower helices, with the lower helix amplitudes being
larger on average (this is not reflected in the parameter set shown
above but is true for the best-fit windows in general).Thus,
the solution relaxation times can be fit to an RMSD close
to the statistical error in the experiments by assuming a slow exchange
rate (i.e., slower than 106 s–1) between
the five structural configurations of TAR RNA shown in Figure 4 with populations determined by RDC filtering and
further corroborated by principal component analysis. The model assumed
base-libration rates that range between 107 and 109 s–1, and libration-rate-dependent jump
amplitudes less than about 20°. The solid-state NMR parameters
for the local base-libration parameters fall within the windows described
above, consistent with the notion that solid-state experiments are
able to capture the solution-state local motions accurately.
General Rate Method
Although the slow
exchange method provides an approximate scale for exchange processes,
the general rate method could further resolve the values of the exchange
rates between conformers. For ease of comparison, we used the local
base motion rates and jump amplitudes from the slow-exchange fit,
whose results were shown in Figure 9. Although
several combinations of 5 PCA-inspired rates were found that could
be fit to the 14 helical T1 and 14 helicalT2 values with comparable RMSD’s, in
all cases the inter- and intracluster exchange rates were on the order
of 104–105 s–1, which
confirms the validity of the slow exchange approximation. For example,
the set of inter- and intracluster exchange rates in Table 2 yield the base relaxation times shown in Figure 10 with an RMSD for T1 values of 5.2 ms and an RMSD for T2 values
of 1.4 ms. Interestingly, there is no clear distinction between the
intercluster and intracluster rates, as may be expected from a significant
difference in the free energy barriers.
Table 2
Intercluster
(k12, k13, k23) and Intracluster
(k2, k3) Exchange
Rates As Defined in Figure 8 Used To Produce
Relaxation Time Simulations in
Figure 10a
k23
k12
k23
k3
k2
1.2 × 104 s–1
4.0 × 104 s–1
6.1 × 104 s–1
3.4 × 104 s–1
4.6 × 105 s–1
Local
base motion parameters
are the same as for Figure 9.
Figure 10
Relaxation time simulations for the C6–H6
(pyrimidine) and
C8–H8 (purine) bonds using the general rate method with inter-
and intracluster exchange rates from Table 2, and using the same local base motion parameters as for Figure 9: (A) T1 simulations
(blue circles) compared to the experimental T1 values (red triangles); (B) T2 simulations (blue circles) compared to the experimental T2 values (red triangles); (C) difference between
simulation T1 and experimental T1 values, together with the statistical error
bars on experimental data (red dashed lines) at ±3.2 ms; (C)
difference between simulated and experimental T2 values, together with the statistical error bars on experimental
data (red dashed lines) at ±0.5 ms.
Local
base motion parameters
are the same as for Figure 9.Relaxation time simulations for the C6–H6
(pyrimidine) and
C8–H8 (purine) bonds using the general rate method with inter-
and intracluster exchange rates from Table 2, and using the same local base motion parameters as for Figure 9: (A) T1 simulations
(blue circles) compared to the experimental T1 values (red triangles); (B) T2 simulations (blue circles) compared to the experimental T2 values (red triangles); (C) difference between
simulation T1 and experimental T1 values, together with the statistical error
bars on experimental data (red dashed lines) at ±3.2 ms; (C)
difference between simulated and experimental T2 values, together with the statistical error bars on experimental
data (red dashed lines) at ±0.5 ms.To summarize, if we fix the local motion parameters to the
best-fit
set obtained using the slow-exchange method but float the conformational
exchange rate parameters, we obtain a slight improvement in the quality
of the fit. Best fits to experimental relaxations using general rate
theory are achieved with inter- and intracluster exchange rates on
the order of 104–105 s–1, thus justifying the slow exchange approximation of the prior section.
We did not float both local motion parameters and conformational exchange
parameters simultaneously, but it is reasonable to assume that the
results will be similar, especially when we constrain the scale of
the local base motion parameters to those observed under solid-state
conditions.
Bulge Residues
In addition to helical
residues, we also simulated the relaxation times for the bulge residues
using the slow exchange and general rate algorithms. In Table 3, we present relaxation time simulations assuming
exchange between the conformations shown in Figure 4 using the slow exchange and general rate algorithms. The
base rotation rate is selected as 5 × 108 s–1 and the amplitude as 15° (to yield good matches to the T1 values). For the general rate simulations,
inter- and intracluster exchange rates shown in Table 2 were assumed.
Table 3
Relaxation Times
for the Bulge C6–H6
(Pyrimidine) and C8–H8 (Purine) Bonds Simulated Using Both
the Slow Exchange and General Rate Methods
residue
experimental T1 (in
ms)
experimental T2 (in
ms)
slow exchange T1 (in ms)
slow exchange T2 (in ms)
general
rate T1 (in
ms)
general rate T2 (in
ms)
U23
328.3
29.1
322.5
24.3
321.9
24.3
C24
320.8
35.2
321.5
24.4
320.8
24.4
U25
no experimental
values: assumed similar to C24
312.0
25.3
311.5
25.4
We obtain simulated T1 values that
are within 2% of experimental data, but simulated and experimental T2 values for U23 deviate by about 16%; for C24,
the relative deviation is even greater. This is likely due to the
fact that the T2 relaxation time has a
spectral density component (the J(0) term) that makes
this observable sensitive to motions much slower than the Larmor frequencies
of carbon-13 and protons (which are on the order of nanoseconds).
The fact that we have been unable to capture the T2 values may indicate that there are additional slower
motions of these relatively underconstrained residues that are missing
from the conformer set we have selected.Although we have been
able to successfully match most of the solution
relaxation times to almost within the statistical error bars using
the RDC-filtered conformer set, we must address the basic question,
are the relaxation times sensitive to motions occurring at rates on
the order of microseconds or slower? Though the T1 time has no spectral density dependence slower than
a time scale of a nanosecond, the T2 are
determined by slower motions as well and their expressions contain
a dependence on the J(0) spectral density. More importantly,
the slow exchange and general rate methods depend on the fact that
the time scales of rotational diffusion of many molecules (including
the TAR RNA considered here) overlap with the time scales to which
both T1 and T2 are sensitive. Thus, two different conformers, with slightly different
diffusion tensors will have different characteristic relaxation times
when calculated separately. Even the slow exchange averaging process
will result in a unique linear combination of relaxation rates that
becomes discernible when enough data points are compared. The general
rate theory loses some sensitivity to dynamics for rates much slower
than a microsecond, but our least-squares simulations for relaxation
times have shown that there is still discernible information to be
gained at these time scales.
Discussion
In this manuscript, we introduce a methodology based on energy-minimized
structures that ties together structural and dynamic data, as well
as solid-state and solution NMR, to build a dynamic trajectory for
the HIV-1 TAR RNA to atomic-level detail. The protocol uses (a) solid-state
NMR data to acquire information about local motions of the bases,
(b) solution NMR RDC data to identify conformational states and their
relative populations, (c) PCA analyses to identify degrees of freedom
relevant to the overall reconfiguration of the molecule, as well as
to corroborate the clustering of structures and choose parameters
for the dynamics analysis, and (d) solution NMR relaxation time simulation
techniques previously developed to simulate experimental data and
fit the jump rates between the molecular conformers. The experimental
data utilized covers a wide spectrum of motional time scales, from
the picosecond scale for solution relaxation times and micro- to nanosecond
times derived from solid-state NMR line shapes to the submillisecond
time scale investigated with RDCs. The method reproduces relaxation
data at multiple helical residues within the molecule using only five
structures out of a set of just 500 possible conformers.The
advantage of this approach lies in the coverage of multiple
time scales, including long time scales that are difficult to sample
with MD, and the ease with which energy-minimized conformers may be
obtained for small-to-midsized molecules using public-access software
like the Rosetta suite of programs.[65] The
use of these structures inverts the problem of dynamics relative to
MD methods: instead of starting with an initial structure and running
the clock forward from t = 0, we filter out long-lived
structures using experimental data and interconnect them in a stochastic
trajectory. Thus, the results from such a technique would prove valuable
to corroborate MD-based simulations, and the protocol could provide
a less computationally intensive alternative to extended molecular
dynamic simulations.We have been able to simulate relaxation
times of most of the helical
residues in the molecule with limited conformational sampling from
an already small set of structures. The success in matching the experimental
data indicates that, to gain an understanding of the gross motional
properties of RNA, it is sufficient to sample the limited phase space
of the particular structural motifs that constitute the molecule.
This protocol was designed to derive a geometric picture of interhelical
reorientation based on the limited conformational space available
to the torsion angles in and around the bulge and hinge regions. The
assumption that the helices are rigid and the requirement to close
the loop formed by the single stranded bulge, adjacent helical base-pairs
and the hinge backbone should restrict conformational possibilities
for the entire molecule. Steric hindrances and limitations on the
“stretchability” of the single-stranded region would
further impose constraints on molecular reorientation.[45,63,64] In practice, we found that parameter
space for the reorientation of one helix relative to the other was
expanded by the possibility of one of the bulge adjacent base pairs
opening up. Among the energy-minimized structures, a significant number
had a missing A22-U40 base pair. The U40 base often forms a stabilizing
interaction with U25 instead and, among the full set of 500 structures,
sometimes with U23 and C24 as well. We allowed these possibilities
to occur in our sample set to reflect fluctuations in the residue
orientations, as well as the impacts of these fluctuations on overall
molecular conformation. Thus, we believe that models generated using
well-vetted potential energy functions can identify sites where new
intramolecular bonding and conformational variability might occur.Furthermore, we made a conscious choice to characterize the structural
bins by their Euler angles. This choice of parameters has been made
previously[33,66] to enhance reproducibility and
comparability to other analyses of the molecule. Such a parametrization
represents the core ingredient of most molecular analyses: reduction
of the dimensions of the problem to render it tractable. Molecular
studies often aim to distill out a few degrees of freedom that are
implicated in determining either the structure or dynamics of the
system, and many different techniques (Ramachandran plots,[67] phenomenological models,[4,5] PCA
analyses[12,42]) are directed toward identifying a minimal
set of relevant coordinates.
Assumptions in the Protocol
The methodology
relies on the assumption that energy-minimized structures sufficiently
populate the available conformational space, i.e., on the assumption
of ergodicity. If ergodic behavior holds, then a sufficiently representative
characterization of the energy landscape of the phase space will allow
a calculation of the requisite time averages of observables. In the
case of the TAR RNA, the structural motif (helix–bulge–helix)
is fairly simple, and it is possible to cover a large region of the
interhelical orientation space with a relatively small number of structures.
For more complex structural motifs, it would be necessary to generate
a sample set that covers both the space of molecular reorientations
and the range of conformations of local residues relative to a fixed
large-scale molecular orientation. Even in the current work, it is
possible that we have under-sampled the full range of conformations
available to the bulged loop. A richer sample of both interhelical
orientations and orientations of bulge residues relative to particular
interhelical orientations may improve the RDC fit.A second
assumption is that the slow exchange (SE) and general rate (GR) methods
assume coincidence of the diffusion tensors at the moment of exchange.
This is not a significant problem for conformers that are not significantly
different, but it could pose problems for conformers that are widely
separated in conformational space. We have not attempted to quantify
the actual deviation in relaxation times on account of this assumption.
This issue could be addressed in combination with molecular dynamics
simulations.Finally, it bears mentioning that there is a 2-fold
degeneracy
in the choice of the unit eigenvectors of the rotational diffusion
tensors, with the negative of a given choice of unit vector being
acceptable as the eigenvector as well. The choice of eigenvectors
does not change the results for the slow exchange formalism (the expressions
are invariant to such changes) but does impact the general rate theory
expressions. For example, keeping the z-axis the
same, the two choices of a right-handed coordinate system vary by
180° and would artificially introduce such an extra jump into
the calculations. The means of consistently dealing with such a jump
is to track the diffusion tensors as a function of changes in the
shape of the molecule, either visually or geometrically and ensure
that there is no additional change in the diffusion tensor orientations
due to axis inversions. In our particular case, we tested our calculations
by artificially inverting the orientation of the diffusion tensors
of some of the structures and found very small changes in the relaxation
times as a result: a change of at most ∼0.25 ms in T1 and at most ∼0.02 ms in T2. These changes would not significantly impact the conclusions
of this manuscript and a detailed analysis is omitted here.
Comparison with Previous Results
We
examined the five structures selected from the RDC-filter in light
of unbound TAR RNA structures generated using NOE constraints (but
not RDC constraints) by one of the authors.[47] These 20 energy-minimized structures, termed the TAR2013 series,
have β angles in the range 38–59°, but the five
lowest energy structures cover a smaller range of 45–53°.
This range corresponds well to the bend angle of highest population
structure we have obtained (i.e., β = 45°). In fact, the
lowest energy TAR2013 structure has an α value of 202°
and a γ value of 225°, similar to the “45°
structure” in our current analysis.We previously published
two studies where the new methods developed to simulate relaxation
times[6,7,55] were applied
to U38 relaxation data and used to select regions of interhelical
motional parameter space that fit the data. The models consisted of
two-site jumps between the lowest energy structure of 1ANR and structures
artificially modified from that structure to reflect changes in interhelical
orientation. The closest approximation to a two-site jump in this
manuscript is found by considering only exchanges between pairs of
conformers within the three most populated structures (the 45°
structure, the 61° structure, and the 115° structure, which
are almost equal in population). The exchange between the 45°
and the 61° structures involves a bend angle modification of
16°, and a twist angle about the upper helix of 2°. Cross-checking
this parameter set against the results of applying the general rate
theory,[7,55] we find that the U38 data was fit by a two-site
jump model with a twist of 0° and bend angle between 5°
and 12° (among other possible models). Thus, the 45° and
61° structures fit the profiles of two structures selected previously
on the basis of the U38 data alone. Exchanges between either of these
structures and the 115° structure, however, are of a magnitude
not simulated in the previous studies.We can also make a few
basic comparisons to the results of Dayie
et al.,[43] even though the authors consider
the HIV-2 TAR RNA, as both molecules consist of a helix–bulge–helix
motif. Given our focus on the C6 and C8 atoms in the current work,
we first observe a similarity in the fact that in the relaxation data
used in this work (from Bardaro et al.[30]) the helical residues seem to have similar 13C (C6 and
C8) T1 relaxation times, a fact observed
in the work of Dayie et al. as well. However, the U2313C T1ρ is nearly half the value
of the corresponding U25 time in Dayie et al., whereas Bardaro et
al. report values much closer in magnitude. Also, the magnitudes of
the T1 and T1ρ relaxation times in the two papers are different. We mention these
facts to bring up three relevant considerations in interpreting the
results of the two different sets of experiments: (a) the obvious
difference made by the presence of only two residues in the bulge
in HIV-2 RNA versus three in the HIV-1 RNA; (b) the fact that the
relaxation times of Dayie et al. include a component from the C5–C6
dipolar coupling for pyrimidines, whereas this coupling is explicitly
suppressed in Bardaro et al. (see Shajani and Varani[68]); (c) the use of a model-free analysis in Dayie et al.
versus the SE and GR methods used herein. Notwithstanding these caveats,
the common results we can extract are that both papers observe significant
flexibility in the U23 and U25 residues and rigid, slow motions in
the helices.
Comparison with Extended
MD Simulations
To examine the extent to which our approach
matches results obtained
by the MD approach, we compare our results to those of Salmon et al.,[18] where the authors describe the selection of
conformers from 8.2 μs MD simulations of the TAR RNA using Pf1
phage-aligned RDC data sets. It was reported that the best fit to
RDC data is obtained with a set of 20 conformations selected from
the full MD ensemble. Though it is not possible to compare the absolute
values of the interhelical bend and twist angles due to differing
methods of characterizing the helices and their relative orientations,
we can compare the spans of the angles reported for their ensemble
to those in ours. The bend angles in the Salmon et al. ensemble of
20 span 88° (from 3° to 91°) whereas those in our ensemble
span 87°, the rotations of the upper helix about the lower helical
axis span 191° in their ensemble whereas those in our set span
215°, and the rotations of the upper helix about its own symmetry
axis span 224° in their ensemble and 210° in ours. Thus
the span of angles obtained by the two approaches are in excellent
agreement. Moreover, the full, prefilter ensembles in both papers
show correlations between the α and γ twist angles. A
more fine-grained comparison relates to the behavior of individual
residues. We have already mentioned that the A22-U40 base pair is
often found to be open among the full set of 500 structures. We also
find that, among our five RDC-filtered structures, four (the 45°,
76°, 115°, and 132° structures) lack a A22-U40 base
pair. However, the G26-C39 base pair is maintained in all five of
these structures. Salmon et al. find a similar asymmetry between the
A22-U40 base pair and the G26-C39 base pairs in their RDC-selected
ensemble, with the former adopting a broader conformational distribution
and the latter being in an A-form helix-like conformation.Differences
are nonetheless observed with regards to the bulge conformation. Salmon
et al. report the occurrence of three clusters within their RDC-selected
ensemble: a 66% population cluster with A22 stacked on U23, a 19%
population cluster with U23 flipped out, and a 15% cluster with paired
U25-U40 and unpaired U23 and C24. With regard to the third cluster,
nearly 30% of the 500 energy-minimized structures used in our analysis
show a U25-U40 pair, with three of the RDC-filtered structures (the
45°, 76°, and 132° structures) included in this list
as well. The 115° structure simply lacks the A22-U40 pair and
does not have any alternative pairings of either residue. Salmon et
al. stated that the U25-U40 pair is predicted to be the second most
energetically favorable bulge conformation in MC-fold. A visual inspection
of our structures shows the following behavior for the bulge:The 45° structure
has U23 flipped
into the interhelical space but not stacked, C24 is flipped in but
not stacked and U25 is paired with U40.The 61° structure has U23 stacked
on A22 and has C24 and U25 flipped out.The 76° structure has U23 flipped
out, C24 stacked with U25, and U25 paired with U40.The 115° structure has U23 flipped
out, C24 flipped in and stacked close to U25, and U25 flipped in and
stacked only with C24.The 132° structure has U23 and
C24 flipped in but not stacked and has U25 paired with U40.Thus, only one of the structures in our
ensemble (the 61°
structure with a 27% population) has a significant A22-U23 stacking
interaction and two have U23 flipped out (39% total population), a
clear deviation from the results of Salmon et al., suggesting that
the conformational variability of this region is more than can be
captured by a small number of sampled structures. A solution to this
problem is to generate energy-minimized structures where the interhelical
orientation is fixed (or nearly so) and the bulge flexibility is evaluated
under the constraint of fixed end points. Such an analysis would establish
the inherent conformational flexibility of the bulge.
Confirming Hydrodynamics Calculations
To cross-check the
results of the program HYDRONMR, we recalculated
the diffusion tensors using the program BEST,[69] which tessellates the solvent-accessible surface of the molecule
and calculates the various diffusion properties using a finite element
analysis. The molecule was uniformly hydrated to a hydration shell
thickness of 1.1 Å.[70] The eigenvalues
of the rotational diffusion tensors (in ascending order) of the two
programs are compared in Table 4.
Table 4
Comparison of Rotational Diffusion
Tensor Eigenvalues As Calculated by HYDRONMR and BEST
structure
Drx1 (BEST) (107 s–1)
Dr1 (HYDRONMR) (107 s–1)
Dr2 (BEST) (107 s–1)
Dr2 (HYDRONMR) (107 s–1)
Dr3 (BEST) (107 s–1)
Dr3 (HYDRONMR) (107 s–1)
45°
1.764
1.826
1.807
1.873
3.523
3.805
61°
1.549
1.617
1.591
1.658
3.176
3.428
76°
2.460
2.503
2.506
2.561
3.323
3.426
115°
2.119
2.152
2.134
2.175
3.415
3.553
132°
1.783
1.789
1.807
1.809
3.310
3.407
We find that the diffusion
tensor eigenvalues as found by BEST
were uniformly smaller than those found by HYDRONMR, indicating that
HYDRONMR underestimated the hydration effect relative to BEST. The
unit eigenvectors were very similar between the two programs. We subsequently
calculated the relaxation times using the slow exchange formalism
and found a T1 shift of at most 18 ms
and a T2 shift of at most 1.2 ms, corresponding
to a shift of about 4% of the experimental values for both relaxation
times. This may modify the choice of parameters described above, but
we believe that the impact will not be substantial.
Conclusions
We have carried out a characterization of the
essential dynamics
of the TAR RNA molecule using techniques with time scale sensitivities
ranging from subnanosecond (solid-state and solution relaxation times)
to millisecond (RDCs). We have been able to capture the long-time
scale behavior of the conformational exchange processes that characterize
this molecule and fit experimental relaxation times very well, with
exchanges between discrete conformers occurring at time scales longer
than 1 μs. The similarities of results of this method with those
of extended MD simulations provide independent corroboration of our
conformational analysis. Further computational explorations and sample-size
increases will enhance the results obtained by this methodology.
Authors: John R Horton; Gary Ratner; Nilesh K Banavali; Niu Huang; Yongseok Choi; Martin A Maier; Victor E Marquez; Alexander D MacKerell; Xiaodong Cheng Journal: Nucleic Acids Res Date: 2004-07-23 Impact factor: 16.971
Authors: Kresten Lindorff-Larsen; Robert B Best; Mark A Depristo; Christopher M Dobson; Michele Vendruscolo Journal: Nature Date: 2005-01-13 Impact factor: 49.962
Authors: Maximillian H Bailor; Anthony M Mustoe; Charles L Brooks; Hashim M Al-Hashimi Journal: Curr Opin Struct Biol Date: 2011-04-14 Impact factor: 6.809
Authors: Prashant S Emani; Gregory L Olsen; Dorothy C Echodu; Gabriele Varani; Gary P Drobny Journal: J Phys Chem B Date: 2010-11-10 Impact factor: 2.991