RNA is central to the proper function of cellular processes important for life on earth and implicated in various medical dysfunctions. Yet, RNA structural biology lags significantly behind that of proteins, limiting mechanistic understanding of RNA chemical biology. Fortunately, solution NMR spectroscopy can probe the structural dynamics of RNA in solution at atomic resolution, opening the door to their functional understanding. However, NMR analysis of RNA, with only four unique ribonucleotide building blocks, suffers from spectral crowding and broad linewidths, especially as RNAs grow in size. One effective strategy to overcome these challenges is to introduce NMR-active stable isotopes into RNA. However, traditional uniform labeling methods introduce scalar and dipolar couplings that complicate the implementation and analysis of NMR measurements. This challenge can be circumvented with selective isotope labeling. In this review, we outline the development of labeling technologies and their application to study biologically relevant RNAs and their complexes ranging in size from 5 to 300 kDa by NMR spectroscopy.
RNA
is central to medicine, chemical and structural biology, and
basic research. For more than a half-century, it has been known that
the code of life is imprinted in DNA sequences, following the so-called
“sequence hypothesis”, usually wrongly labeled as the
“central dogma” in the popular parlance.[1] In the last several decades, it has become increasingly
clear that the functions of cells are also transacted by DNA’s
lesser-known relative, RNA.[2] Indeed, the
varied roles that RNAs play in both normal and dysfunctional cells
have motivated RNA-based therapeutic development, as highlighted by
the recent SARS-CoV-2 mRNA vaccines.[3−9] Additionally, RNAs are central to the workings of molecular nanomachines
such as the ribosome[10−12] and the spliceosome[13−15] to name a few. Moreover,
thanks to the advent of genomic sequencing efforts, we now understand
that the amount of RNA sequence transcribed in humans exceeds the
number of protein sequences translated by at least 50-fold (Figure A).[16] Paradoxically, RNA-only structures account for fewer
than 1% of the entries deposited in the Protein Data Bank (PDB), whereas
protein-only structures account for a staggering 87% (Figure B). This paucity undercuts current understanding
of RNA structure–function relationships.
Figure 1
(A) Percentage
of protein coding and nonprotein coding genomic
material in selected genomes.[16] Organismal
complexity increases with RNA coding but decreases with protein coding
capacity as a percentage of the DNA genomic output. (B) Percentage
of RNA-only and protein-only structures deposited in the PDB. Given
that this analysis excluded DNA-only structures and structures of
protein–DNA/RNA complexes, the percentages do not sum to 100%.
(C) Percentage of RNA-only and protein-only structures deposited in
the Nucleic Acid Database (NDB) and PDB, sorted by structure determination
technique. Given that this analysis is self-contained within categories,
the percentages sum to 100%. NMR accounts for a larger fraction of
RNA structures as compared to proteins. PDB and NDB statistics were
accessed from https://www.rcsb.org/ and http://ndbserver.rutgers.edu/ in January 2022.
Nuclear magnetic
resonance (NMR) spectroscopy accounts for ∼35%
of the RNA structures deposited in the PDB and ∼7% of the protein
structures, making it competitive with other biophysical tools such
as X-ray crystallography and more recently cryo-electron microscopy
(cryo-EM) (Figure C).[17] Moreover, NMR spectroscopy provides
high-resolution structural dynamic information in solution, rendering
it an ideal tool to study RNA and its interactions with macromolecules
or small drug-like compounds or both.[18−25] However, unlike proteins, which are made up of 20 unique amino acid
building blocks, RNAs are composed of only four aromatic nucleotides
[i.e., adenosine (Ade or A), guanosine (Gua or G), cytidine (Cyt or
C), and uridine (Uri or U)] that resonate over a very narrow chemical
shift region. This poor chemical shift dispersion is further exacerbated
with increasing RNA size. To overcome these limitations, novel isotope
labeling strategies that incorporate atom-specific labels (e.g., uridine 13C6) or expand the number of NMR probes beyond the traditional 1H–15N and 1H–13C spin pairs (e.g., 13C–19F) have been
developed.
In this review, we will outline the development of
isotope labeling
technologies for RNA NMR and some of the exciting new applications
enabled by these labels to study small-to-large RNAs. Specifically,
we will begin by detailing the benefits afforded by each common NMR-active
isotope (Section ).
Next, we outline the various technologies that incorporate such labels
into RNA building blocks and eventually into RNA (Section ). This discussion will center
around chemo-enzymatic labeling, a method that our group has extensively
developed over nearly the past decade. Next, we will examine how these
labels benefit dynamics measurements (Section ) and can be leveraged to study interactions
involving large RNA systems (Section ). Finally, to conclude, we will comment on how isotope
labeling can advance the field of RNA chemical and structural biology
(Section ).
Stable Isotopes in NMR Spectroscopy
Frederick Soddy
is credited with coining the word “isotope”
from the Greek isos (ἴσος) and topos (τόπος)
meaning “same place”,[26] with
the idea that stable isotopes are chemical elements that occupy the
same position in the periodic table but differ in mass due to a different
number of neutrons within the atomic nucleus. Stable isotopes have
been used in a wide range of applications in industry, academia, and
medicine.[26] In particular, stable isotopes
have significantly impacted methods such as NMR and mass spectrometry
(MS). For this work, we will focus on how these probes impact RNA
NMR spectroscopy, with special emphasis on proton (hydrogen-1 or 1H), deuterium (hydrogen-2 or 2H), carbon-13 (13C), nitrogen-15 (15N), fluorine-19 (19F), and phosphorus-31 (31P) (Table ).
Table 1
Stable Isotopes Relevant to RNA NMR Spectroscopy[27,28]
isotope   natural abundance (%)   γ (rad Hz T^-1)    spin
1H        99.99                   26.752 × 10^7      1/2
2H        0.01                    4.107 × 10^7       1
12C       98.90                   NMR inactive       NMR inactive
13C       1.10                    6.728 × 10^7       1/2
14N       99.63                   1.934 × 10^7       1
15N       0.37                    –2.713 × 10^7      1/2
19F       100.00                  25.181 × 10^7      1/2
31P       100.00                  10.839 × 10^7      1/2
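As a rough illustration of how the gyromagnetic ratios in Table 1 translate into resonance frequencies, the sketch below evaluates the Larmor relation ν = |γ|B0/2π for each NMR-active isotope. The 14.1 T field is an assumed example value (roughly a 600 MHz proton spectrometer), and the names `GAMMA` and `larmor_mhz` are illustrative rather than taken from any software package.

```python
import math

# Gyromagnetic ratios from Table 1 (rad s^-1 T^-1).
GAMMA = {
    "1H": 26.752e7,
    "2H": 4.107e7,
    "13C": 6.728e7,
    "15N": -2.713e7,
    "19F": 25.181e7,
    "31P": 10.839e7,
}


def larmor_mhz(isotope: str, b0_tesla: float) -> float:
    """Magnitude of the Larmor frequency in MHz: |gamma| * B0 / (2 * pi)."""
    return abs(GAMMA[isotope]) * b0_tesla / (2.0 * math.pi) / 1e6


B0_TESLA = 14.1  # assumed example field strength, not a value from the text

for isotope in GAMMA:
    print(f"{isotope:>3}: {larmor_mhz(isotope, B0_TESLA):6.1f} MHz")
```

At this assumed field the proton resonates near 600 MHz, while the low-γ 15N nucleus resonates near 61 MHz, which is one way to see why its detection sensitivity is comparatively poor.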
Proton Isotope
The proton isotope
has high natural abundance (∼100%) and the highest sensitivity
among stable NMR-active nuclei (Table ). Therefore, homonuclear two-dimensional
(2D) 1H–1H NMR methods were attractive
in the early days of NMR analysis. However, the very limited resolution
of ribose and aromatic nucleobase resonances in the RNA 1H spectra restricted such studies to small RNAs (<5 kDa). Within
the ribose, all protons with the exception of H1′ (i.e., H2′,
H3′, H4′, H5′, and H5′′) are clustered
within a narrow ∼0.6–0.8 ppm range (Figure A).[29] Within the nucleobase, the chemical shift distribution of all protons
is limited to 1 ppm or less, except for imino protons with a dispersion
of ∼4 ppm (Figure A).[30,31] Taken together, the distribution
of proton resonances leads to severe chemical shift overlap that worsens
as RNAs grow in size due to increased line broadening (Figure B). This, in part, explains
the paucity of NMR structures of large RNAs (e.g., > 60 nt) (Figure C).
Figure 2
(A) 1H NMR
spectrum of a 61 nt RNA emphasizes the narrow
chemical shift dispersion of RNA protons. Here, bp and nc refer to
canonical Watson–Crick base pairs and noncanonical base pairs,
respectively. A schematic of RNA ribose and nucleobase structures
and numbering is shown above the spectrum. (B) Nucleobase region
of 1H NMR spectra for RNAs of increasing size. Both signal
overlap and broad linewidths worsen as RNAs grow in size. In fact,
for the best visual representation, the signals corresponding to the
61 and 232 nt RNAs were increased to display them on a similar scale
to that of the 14 nt RNA. (C) Histogram of RNA NMR structures in the
NDB, sorted by RNA size (in nt, bin = 10 nt). Given the challenges
faced by RNA NMR, there are only 23 NMR structures corresponding to
RNAs > 60 nt. NDB statistics were accessed from http://ndbserver.rutgers.edu/ in January 2022.
Heteronuclear 15N and 13C Isotopes
Unlike protons, with a chemical shift span of
2–15 ppm, 15N and 13C nuclei in nucleic
acids have larger chemical shift distributions among the various atomic
sites. For example, 13C nuclei in RNA have chemical shifts
from 61 (C5′-ribose) to 170 (nonprotonated pyrimidine nucleobase
C4) ppm, and 15N nuclei from 70 (amino nitrogen) to 240
(nonprotonated purine nucleobase N7) ppm.[29−31] Introduction
of the 15N isotope (0.37%) into RNA nucleobases circumvents
the extensive line broadening arising from the electric quadrupole
moment of the naturally abundant 14N isotope (99.63%) (Table ). Incorporation of
the 15N isotope has several additional advantages. As a
spin 1/2 nucleus with low gyromagnetic ratio (γ) (Table ), the 15N isotope
provides very narrow spectral lines. Nitrogen atoms, like protons
and carbon, are distributed in nucleic acid major and minor grooves,
and both grooves serve as important sites for metal, drug, or macromolecule
interactions. Given the wider chemical shift dispersion of 15N relative to 1H and its narrower linewidths
relative to 13C and 1H, 15N is better
suited to monitor those grooves, especially in larger RNAs. However,
nitrogen’s low γ is also an “Achilles heel”.
In the absence of appropriate NMR cryogenic probes and the availability
of high magnetic fields, detecting low-γ nuclei such as 15N has been very unattractive. Increasing the availability
of such probes is expected to reverse this trend. Nevertheless, these
considerations suggest that the shortcomings of proton NMR can be
overcome by heteronuclear NMR methods.[32]
Beginning in the 1980s, several groups introduced 15N, 2H, and 13C labels to facilitate NMR studies
of RNAs and proteins.[33−42] Depending on the scientific question, these labels were introduced
uniformly or selectively using bacteria in vivo or
enzyme catalyzed synthesis in vitro. Selective enrichment
was achieved by growing auxotrophs on obligate chemically synthesized
compounds. 13C-labeling of bacterial tRNAs[33−35] and 15N-labeling of tRNA and 5S rRNA enabled various
atomic sites in these RNAs to be monitored by NMR. Uniform 15N-labeling was also applied to 5S rRNA in vivo.[36−39] To extend this labeling to additional RNAs, several research groups
developed in vitro methods to convert ribonucleoside
5′-monophosphates isolated from bacteria grown on 15N-, 2H-, and 13C-sources into the corresponding
triphosphates for in vitro transcription.[43−47] These uniform 15N- and 13C-labeling technologies
did extend the use of NMR to medium-sized RNAs (MW < 20 kDa). However,
two perennial challenges of low signal-to-noise and decreased spectral
resolution remained. The latter problem arises from the reintroduction
of spectral overlap along the heteronuclear dimension as the RNA grows
in size, and the former arises from increased relaxation that results
from the slower overall tumbling of large biomolecules. The next section
will describe recent labeling methods to overcome both problems.
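To put the chemical shift ranges quoted in this section on a common footing, the following sketch converts them from ppm to Hz using the relation Δν (Hz) = Δδ (ppm) × ν0 (MHz). The 600 MHz proton frequency is an assumed example, the γ ratios come from Table 1, and the names `SHIFT_RANGE_PPM` and `dispersion_hz` are illustrative only.

```python
# Chemical shift ranges quoted in the text (ppm).
SHIFT_RANGE_PPM = {
    "1H": (2.0, 15.0),
    "13C": (61.0, 170.0),
    "15N": (70.0, 240.0),
}

# |gamma| relative to 1H, computed from the Table 1 values.
GAMMA_RATIO = {
    "1H": 1.0,
    "13C": 6.728 / 26.752,
    "15N": 2.713 / 26.752,
}


def dispersion_hz(isotope: str, proton_mhz: float) -> float:
    """Width of the quoted shift range in Hz at a given proton frequency."""
    low_ppm, high_ppm = SHIFT_RANGE_PPM[isotope]
    larmor_mhz = proton_mhz * GAMMA_RATIO[isotope]
    return (high_ppm - low_ppm) * larmor_mhz  # ppm x MHz = Hz


PROTON_MHZ = 600.0  # assumed spectrometer frequency, not a value from the text

for isotope in SHIFT_RANGE_PPM:
    print(f"{isotope:>3}: {dispersion_hz(isotope, PROTON_MHZ):8.0f} Hz")
```

Dispersion in Hz is only part of the picture, because resolution also depends on linewidths, which this sketch does not attempt to model.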
Deuteration in Context of Heteronuclear 15N and 13C Isotopes
Deuteration (i.e.,
replacement of protons with deuterons) simplifies the multiplicity
of spin–spin interactions, eliminates nonessential resonance
lines, reduces spectral crowding, helps to identify coupling patterns,
and improves the precision with which coupling constants are calculated.[48] Given the smaller γ of the deuterium spin
relative to the proton (γD ≈ γH/6.5) (Table ), the
relaxation rates for deuterated nuclei are scaled down to
∼2% [(γD/γH)^2 ≈ 0.02]. By eliminating competing relaxation pathways of
dipolar coupled protons, deuteration suppresses spin diffusion within
a relaxation network, leading to smaller linewidths and higher signal-to-noise
for the remaining protons and directly attached 13C and 15N nuclei.[47−49] Given these advantages, 2H-labeling has
played an important role in probing the structure, dynamics, and interactions
of large RNAs by NMR.[17,50−55]
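The γ-based scaling quoted for deuteration can be checked directly from the Table 1 values. The short sketch below computes γH/γD and (γD/γH)^2; it covers only the γ-squared term mentioned in the text, and other contributions to dipolar relaxation (for example spin quantum number, internuclear distance, and correlation time) are deliberately left out. The function name `gamma_squared_scaling` is illustrative.

```python
# Gyromagnetic ratios from Table 1 (rad s^-1 T^-1).
GAMMA_1H = 26.752e7
GAMMA_2H = 4.107e7


def gamma_squared_scaling(gamma_new: float, gamma_old: float) -> float:
    """Return (gamma_new / gamma_old)**2, the gamma-dependent scaling factor."""
    return (gamma_new / gamma_old) ** 2


ratio = GAMMA_1H / GAMMA_2H
scaling = gamma_squared_scaling(GAMMA_2H, GAMMA_1H)

print(f"gamma_H / gamma_D     = {ratio:.2f}")
print(f"(gamma_D / gamma_H)^2 = {scaling:.3f}")
```

Both numbers are consistent with the approximate values of 6.5 and 0.02 given above.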
Fluorination
in Context of 15N, 2H, and 13C Isotopes
In addition to 2H, magnetically active nuclei such as 19F have
valuable spectroscopic properties that confer clear advantages in
the study of macromolecular structure and conformational changes.[56] These benefits include the 100% natural abundance
of 19F (Table ), a comparably large γ (94% of 1H) (Table ), and a superior
chemical shift dispersion that is ∼6-fold that of 1H.[18,57] Furthermore, 19F is sensitive
to changes in its local chemical environment, making it a useful probe
of conformational changes.[18,56,57] In addition, fluorine has a van der Waals radius (1.35 Å) slightly larger
than that of hydrogen (1.20 Å) but smaller than that
of a methyl group (2.00 Å). The 19F nucleus is therefore
expected to substitute for either group without serious structural
perturbations,[58] making it a valuable tool
for the in vitro study of medically important RNAs.[59] Finally, 19F is virtually absent
in biological systems and therefore gives 19F NMR the bioorthogonal
advantage of background-free drug screening.[60] Taken together, 19F is an attractive probe for studying
RNAs in solution. Details of new technologies developed to incorporate 19F into nucleobases will be presented in Section , and its utility to expand
NMR studies to larger RNAs will be discussed in Section .
Preparation
of 15N, 2H, 13C, and 19F Isotope-Labeled RNA
A number of companies (Cassia LLC,
Cambridge Isotope Laboratories
(CIL), INNotope, Sigma-Aldrich, and Silantes) offer isotope-labeled
RNA building blocks with uniform and selective labeling. However,
most comprehensive labels are made by academic laboratories using
biochemical, biomass, chemical, and chemo-enzymatic approaches, as
reviewed in the past.[61−67] In this section, we outline promising developments in the chemical
synthesis of isotope-labeled purine [i.e., adenine (Ade or A) and
guanine (Gua or G)] and pyrimidine [i.e., cytosine (Cyt or C) and
uracil (Ura or U) (in RNA) or thymine (Thy or T) (in DNA)] nucleobases
and their incorporation into RNA. The main approaches to obtain isotope-labeled
RNA are enzymatic or solid-phase chemical synthesis. The enzymatic
approach involves DNA template-directed T7 RNA polymerase-based in vitro transcription using ribonucleoside 5′-triphosphates
(rNTPs).[44,45,68−76] The alternative method is chemical solid-phase synthesis using RNA
phosphoramidites (amidites).[77−80] Both methods can use unlabeled and isotope-labeled
building blocks (rNTPs and amidites) to generate versatile RNA labeling
patterns, as recently reviewed.[61,62,66,67]
Chemical
Synthesis of Nucleobases
In this section, we give a general
overview of the chemical synthetic
methods to label RNA nucleobases at specific positions with 15N, 2H, 13C, and 19F isotopes. These
nucleobases can then serve as the building blocks for the synthesis
of the rNTPs or amidites that enable the eventual enzymatic or chemical
production of labeled RNAs of defined sequence and length.
Specific 13C Labeling
Pyrimidine
Synthesis with 15N, 2H, 13C, and 19F Labels
The uracil nucleobase is easily assembled
using a method initially
devised by Roberts and Poulter,[81] later
streamlined by SantaLucia and Tinoco and co-workers,[71] and further improved by Kreutz and co-workers.[82] In the original synthetic eight-step pathway
described by Roberts and Poulter, the 13C label can be
placed in any position of the six-membered ring simply by changing
the 13C-source.[81] SantaLucia
and Tinoco and co-workers streamlined this to a three-step reaction
scheme to make 13C-labeled cyanoacetyl urea from inexpensive
commercially available 13C-labeled precursors.[71] A slightly modified approach from Kreutz and
co-workers uses bromoacetic rather than chloroacetic acid. Bromoacetic
acid is the preferred starting material due to the lower costs and
better handling of the cyanide reagent.[74,82] Other methods
with fewer steps exist such as condensation of malic or propiolic
acid and urea.[83,84] Even though these are straightforward
two-step reactions, execution is not as convenient or cost-effective.
Using the Poulter-SantaLucia-Kreutz approach,[71,74,81,82] [1-13C]- and [2-13C]-bromoacetic acid selectively incorporate 13C at uracil C4 and C5, respectively. Use of 13C-urea, on the other hand, delivers 13C at the C2 site,
and that of 13C-potassium cyanide (13C-KCN)
labels the C6 site. Finally, 15N-urea installs 15N at N1 and N3. All uracil ring positions can therefore
be labeled in good yields, and these reactions can be easily scaled
to gram quantities.[74,82] An example of a synthetic scheme
using the Poulter-SantaLucia-Kreutz approach[71,74,81,82] is shown for
uracil C6 labeling (Scheme ).[82,85] In brief, bromoacetic acid 1 reacts with 13C-KCN and sodium carbonate (Na2CO3) in a Kolbe nitrile reaction to form 2-[cyano-13C]acetic acid 2. Treatment of 2 with urea in the presence of acetic anhydride (Ac2O)
then yields a urea intermediate 3 that can be readily
converted to [6-13C]-uracil 4 using a palladium
catalyst (e.g., Pd/BaSO4) under hydrogen atmosphere (H2). Given that pyrimidine H5/H6 protons have three-bond scalar
coupling (J ≈ 8 Hz[29])
and strong dipolar coupling (H5–H6 distance of 2 Å) that
complicate NMR experiments, selective and quantitative deuteration
can be achieved by reacting 4 with triethylamine (TEA)
to form the desired [6-13C, 5-2H]-uracil 5.[85] Taken together, 5 was synthesized in four steps with 63% overall yield (Scheme ).[82,85]
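The precursor-to-site assignments described above for the Poulter-SantaLucia-Kreutz approach can be captured as a small lookup table. The sketch below is only a bookkeeping aid: it treats each labeled reagent as independent and does not model combined reagents (for example a doubly labeled urea) or reaction yields. The names `PRECURSOR_TO_SITES` and `precursors_for` are hypothetical.

```python
# Labeled precursor -> uracil ring positions it labels, as listed in the text.
PRECURSOR_TO_SITES = {
    "[1-13C]-bromoacetic acid": {"C4"},
    "[2-13C]-bromoacetic acid": {"C5"},
    "13C-urea": {"C2"},
    "13C-KCN": {"C6"},
    "15N-urea": {"N1", "N3"},
}


def precursors_for(target_sites):
    """Return (precursors, extra_sites) for the requested uracil positions.

    extra_sites lists positions that become labeled as a side effect,
    e.g. N3 when only N1 is requested, because 15N-urea labels both.
    """
    targets = set(target_sites)
    chosen = sorted(
        name for name, sites in PRECURSOR_TO_SITES.items() if sites & targets
    )
    covered = set().union(*(PRECURSOR_TO_SITES[name] for name in chosen))
    missing = targets - covered
    if missing:
        raise ValueError(f"no listed precursor labels: {sorted(missing)}")
    return chosen, sorted(covered - targets)


print(precursors_for({"C6"}))
print(precursors_for({"C5", "N1"}))
```

Requesting C5 and N1, for instance, selects [2-13C]-bromoacetic acid and 15N-urea and reports N3 as an additional labeled site.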
Scheme 1
Synthetic Route to [6-13C, 5-2H]-Uracil[82,85]
Given the valuable spectroscopic
properties of 19F (Section ), uracil can
be fluorinated with the commercially available Selectfluor, as recently
reported.[18,57,86] This synthetic
scheme is similar to that described for uracil C6 labeling (Scheme ),[82,85] except using [2-13C]-bromoacetic acid 6 as
starting material. Kolbe nitrile reaction of 6 forms
an intermediate 7 that reacts with 15N-urea
and Ac2O to yield 8. Addition of Pd/BaSO4 in H2 to 8 then forms [5-13C, 1,3-15N2]-uracil 9, which can
then be fluorinated with Selectfluor to yield [5-13C, 5-19F, 1,3-15N2]-uracil (5FU) 10. Again, selective and quantitative deuteration of H6, achieved
by heating 5FU 10 in sodium
deuteroxide (NaOD) to form [6-2H]-5FU 11, removes the coupling (J ≈ 7.1 Hz[88])
that complicates NMR spectra.[18,86,87] In summary, 11 was
synthesized in five steps with a total yield of 38% (Scheme ).[18,57,82,85,86]
Scheme 2
Synthetic Route to [6-2H]-5FU[18,57,82,85,86]
Finally, thymine C6 can be selectively labeled with a three-step
synthesis in a manner similar to uracil labeling (Schemes and 2).[18,57,82,85,86] In brief, bromopropionic
acid 12 is used in a Kolbe nitrile reaction followed
by addition of urea and Ac2O to form intermediates 13 and 14.[89,90] Then reaction of 14 with Pd/BaSO4 in H2 forms the desired
[6-13C]-thymine 15 in 45% overall yield (Scheme ).[89]
Scheme 3
Synthetic Route to [6-13C]-Thymine[89]
Purine Synthesis with C8 Specific Labeling
As with pyrimidines, purine nucleobases can be selectively labeled
with 13C and 15N isotopes using commercially
available precursor compounds. In the early 1990s, SantaLucia and
Tinoco and co-workers described an effective purine synthesis using 13C-formic acid to label purine C8.[71] More recently, Kreutz and co-workers streamlined and improved the
efficiency of such labeling in one-step reactions.[75,85,91] Here, the condensation of 13C-formic
acid 16 with morpholine forms morpholinium formate intermediate
that immediately reacts with either 4,5,6-triaminopyrimidine 17 to yield [8-13C]-adenine 18 (Scheme ) or 2,5,6-triaminopyrimidin-4-ol
sulfate 19 to form [8-13C]-guanine 20 (Scheme ) with 64%
and 94% yield, respectively.[75,85]
Scheme 4
Synthetic Route to
[8-13C]-Adenine[75,85]
Scheme 5
Synthetic Route to [8-13C]-Guanine[75,85]
Purine
Synthesis with C2 Specific Labeling
As with purine C8 labeling,
adenine C2 can be readily labeled.
Labeling C2 is attractive because its chemical shift can monitor protonation
at adenine N1,[92] which cannot be achieved
with 15N NMR experiments due to severe line broadening.[92−94] Unlike the environments of single-stranded RNA, those in structured
RNAs can shift the pKa values of protonated
adenosine or cytidines significantly toward neutrality, serving both
catalytic and structural functions in RNA enzymes.[94−97] The 13C isotope can
be incorporated at the purine C2 site starting with 5-aminoimidazole-4-carboxamide
(AICA) and ethylsodium 13C-xanthate to form [2-13C]-hypoxanthine, [2-13C]-adenine, or [2-13C]-guanine.[98]
A preferred alternative for purine C2
labeling uses the method of Battaglia and Ouwerkerk and co-workers,
wherein sodium ethoxide (C2H5ONa) mediates cyclization
of ethyl cyanoacetate 21 with 13C-thiourea 22 to give [2-13C]-6-amino-2-thiouracil 23.[99,100] Unlabeled sodium nitrite (NaNO2) is then used for nitrosylation (the 15N-labeled form
can also be used to introduce a second isotope label) to form 24. Then sodium dithionite (Na2S2O4) mediates the reduction of the nitroso group to yield 25 followed by desulfurization over Raney-Nickel to form the
diaminopyrimidine 26.[101] Treatment of the product with sulfuric (H2SO4) and formic (HCOOH) acids yields [2-13C]-hypoxanthine 27.[102] Subsequent reaction with
phosphorus oxychloride (POCl3) and N,N-dimethylaniline (N,N-DMA) yields [2-13C]-6-chloropurine 28.[103] In the final step, reaction with methanolic
NH3 in a microwave reactor yields the desired [2-13C]-adenine 29 (Scheme ).[100] Alternative purine
synthesis pathways have been devised to enable specific labeling of
adenine C2 or any purine nitrogen position.[98,100,102,104−108] We recently synthesized [7-15N]-labeled 29 through intermediates 21–23 and 15N-labeled intermediates 24–28 using the Battaglia-Ouwerkerk approach[99,100] and demonstrated its utility in NMR analysis of RNA structure and
dynamics (Scheme ).[104]
Scheme 6
Synthetic Route to [2-13C]-Adenine
Adapted with permission from
Dayie and co-workers. Copyright 2020 Springer Nature.[104] Adenine can be labeled at N7 by using 15N-labeled sodium nitrite in the second chemical step.
Specific 15N Labeling
Several approaches have been reported for the
synthesis of atom-specific 15N-labeled nucleobases and
nucleosides as well as their incorporation
into the corresponding rNTPs and amidites for RNA synthesis.[98,100−102,104−114] Here, we highlight those methods that allow streamlined 15N-labeled nucleobase synthesis in high yield. These labeling patterns
permit direct monitoring of Watson–Crick base pairs or analysis
of interconverting duplex, triplex, and quadruplex structures by multidimensional
NMR.[110−113]
Pyrimidine N1, N3, and N4 Labeling
As described above, using the Poulter-SantaLucia-Kreutz approach,[71,74,81,82] 15N-urea delivers 15N at uracil N1 and N3
sites. Cytosine labeling, on the other hand, occurs through uracil,
given that the corresponding CTP can be built directly from enzymatic
conversion (with ammonium chloride, NH4Cl) from UTP[74,115] or by chemical synthesis from a transiently protected uridine amidite.[85] In this way, all uracil isotope labeling patterns
will be retained in CTP and cytidine amidites. Moreover, additional 15N-labeling of the cytidine N4 amino group can be achieved
using 15NH4Cl in the enzymatic[74] or chemical[85] reaction, as will
be described in Sections and 3.3.
Purine
N1, N3, N7, and N9 Labeling
N1-labeled adenine is synthesized
in two steps.[101] Here, commercially
available 5-aminoimidazole-4-carbonitrile 30 reacts with
diethoxymethyl acetate (DEMA) to yield intermediate 31. Subsequent reaction of 31 with aqueous ammonia
(NH3) readily forms the desired product [1-15N]-adenine 32 with a total yield of 60% (Scheme ).[101]
Scheme 7
Synthetic Route to [1-15N]-Adenine[101]
Adenine labeled at N3, on the
other hand, can be synthesized in
six steps.[108] In brief, commercially available
4-imidazolecarboxylic acid 33 is nitrated with
ammonium nitrate (NH415NO3) to afford
5-[nitro-15N]1H-imidazole-4-carboxylic acid 34. Activation of 34 with 1,1′-carbonyldiimidazole
(CDI) in dimethylformamide (DMF) and excess NH3 forms
carboxamide 35. Importantly, addition of 15NH4Cl in this step can also introduce a 15N
label at the N1 site, permitting the eventual production of [1,3-15N2]-adenine.[108] Catalytic
reduction of 35 affords [5-15N]-AICA 36. Ring closure of 36 with triethyl orthoformate
(HC(OC2H5)3) gives a hypoxanthine
intermediate 37, which readily forms [3-15N]-6-chloropurine 38 upon chlorination with POCl3 and N,N-DMA. Finally, ammonolysis
with ammonium hydroxide (NH4OH) yields the desired [3-15N]-adenine 39 with ∼47% total yield (Scheme ).[108]
Scheme 8
Synthetic Route to [3-15N]-Adenine[108]
Adenine N1 and its amino group
can also be labeled by 15NH4Cl and 15NH4OH in the second and final chemical steps, respectively.
In addition, purine N7 labeling is readily achieved
and has been
widely adapted.[99,100,102,104,106,111] For example, synthesis of [7-15N]-guanine is achieved in three steps. Nitrosylation of commercially
available 2,6-diaminopyrimidin-4-ol 40 by Na15NO2 yields 2,6-diamino-5-[nitroso-15N]pyrimidin-4-ol 41. Reduction of 41 with
sodium dithionite followed by acidification by H2SO4 forms 2,6-diamino-5-[amino-15N]pyrimidin-4-ol 42. In the final step, reflux with formamide (HCONH2) followed by HCOOH provides the desired [7-15N]-guanine 43 with a total yield of 65%[500] (Scheme ).
Scheme 9
Synthetic
Route to [7-15N]-Guanine
Dayie and co-workers.[500]
Several direct routes
to 15N-labeled adenine initiate
from commercially available aminopyrimidines.[102,106] However, Micura and Kreutz and co-workers[111] employed a sodium ethoxide mediated cyclization of 21 with 44 to form 6-amino-2-thiouracil 45.[116] Subsequent nitrosylation of 45 installs the 15N label using Na15NO2 to yield the nitroso-containing 46.[102] A sodium dithionite mediated reduction of the
nitroso group forms 47 and desulfurization over Raney-Nickel
affords 48.[102] Subsequent
treatment with H2SO4 and HCOOH yields hypoxanthine 49,[102] which was then reacted with
POCl3 and N,N-DMA to
give [7-15N]-6-chloropurine 50.[103] In the final step, reaction with methanolic
NH3 in a microwave reactor gives the desired [7-15N]-adenine 51 with a total yield of 18% (Scheme ).[100,104,106,111] As mentioned above, we recently showcased the same synthetic scheme
while also incorporating selective 13C2 labeling.[104]
Scheme 10
Synthetic Route to [7-15N]-Adenine
Adapted with permission from
Dayie and co-workers. Copyright 2020 Springer Nature.[104] Adenine C2 can also be labeled if 13C-labeled thiourea is used as the starting material.
Finally, in the synthesis of N9-labeled adenine, 5-amino-4,6-dichloropyrimidine 52 is converted to a [9-15N]-6-chloropurine 53 using aqueous 15NH3 and DEMA.[117] Then a reaction with aqueous NH3 yields the desired [9-15N]-adenine 54. This
simple three-step reaction proceeds with an overall yield of 79% (Scheme ).[117]
Scheme 11
Synthetic Route to [9-15N]-Adenine[117]
Nucleobase Labels: Summary and Outlook
As described in Sections and 3.1.2, and shown in Schemes –11, a wide range of isotope-labeled nucleobases (Table ) are now available
to the scientific community. Of all synthetic procedures, purine C8
sites are most readily labeled in one chemical step in a single day
and with high yield (64–94%) (Table ). Conversely, adenine N3 is the least readily
labeled, taking 11 days (Figure ). Adenine C2 and N7 have the lowest overall yields
of 18% (Table ). In
future work, it would be advantageous to focus on improving yields
and reducing the number of chemical steps. Nevertheless, these RNA
labeling patterns are commonly chosen based on the experimental information
required and less often dictated by the relative time and yield of
the building blocks.
Table 2
Summary of All Nucleobase Labels As Outlined in Schemes –11

| nucleobase label | time (days)^a | chemical steps^b | yield (%) | ref |
| --- | --- | --- | --- | --- |
| [8-13C]-adenine | 1 | 1 | 64 | (75), (85) |
| [8-13C]-guanine | 1 | 1 | 94 | (75), (85) |
| [2-13C]-adenine^c | 2.5 | 7 (1) | 18 | (104) |
| [1-15N]-adenine | 2.5 | 2 (1) | 60 | (101) |
| [3-15N]-adenine | 11 | 6 (2) | 47 | (108) |
| [7-15N]-adenine^c | 2.5 | 7 (1) | 18 | (104) |
| [7-15N]-guanine^d | 1.5 | 3 | 65 | |
| [9-15N]-adenine | 5.5 | 3 (3) | 79 | (117) |
| [6-13C, 5-2H]-uracil | 7 | 4 | 63 | (82), (85) |
| [5-13C, 5-19F, 6-2H]-uracil | 8 | 5 | 38 | (18), (57), (82), (85), (86) |
| [6-13C]-thymine | 2.5 | 3 | 45 | (89) |

^a Total reaction time was based on the time required for all chemical steps. In addition, 16 h were added for any explicit mention of overnight procedures, and 24 h were added for any chromatographic purifications.
^b Number in parentheses represents the number of chromatographic purification steps.
^c All data for [2-13C]-adenine and [7-15N]-adenine labeling came from the same doubly labeled [2-13C, 7-15N]-adenine labeling scheme.[104]
^d This synthetic procedure is from Dayie and co-workers.[500]
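The comparisons drawn from Table 2 in the summary above can be reproduced with a few lines of Python. The following is only a minimal sketch: the rows are transcribed from Table 2, while the tuple layout and the helper names (`slowest`, `lowest_yield`, `single_step`) are illustrative assumptions rather than part of any published workflow.

```python
# Rows transcribed from Table 2: (label, time in days, chemical steps, yield in %).
# The tuple layout and helper names below are illustrative assumptions.
TABLE_2 = [
    ("[8-13C]-adenine", 1.0, 1, 64),
    ("[8-13C]-guanine", 1.0, 1, 94),
    ("[2-13C]-adenine", 2.5, 7, 18),
    ("[1-15N]-adenine", 2.5, 2, 60),
    ("[3-15N]-adenine", 11.0, 6, 47),
    ("[7-15N]-adenine", 2.5, 7, 18),
    ("[7-15N]-guanine", 1.5, 3, 65),
    ("[9-15N]-adenine", 5.5, 3, 79),
    ("[6-13C, 5-2H]-uracil", 7.0, 4, 63),
    ("[5-13C, 5-19F, 6-2H]-uracil", 8.0, 5, 38),
    ("[6-13C]-thymine", 2.5, 3, 45),
]


def slowest(rows):
    """Return the row with the longest total reaction time."""
    return max(rows, key=lambda row: row[1])


def lowest_yield(rows):
    """Return the labels that share the lowest overall yield."""
    floor = min(row[3] for row in rows)
    return [row[0] for row in rows if row[3] == floor]


def single_step(rows):
    """Return the labels that are made in one chemical step."""
    return [row[0] for row in rows if row[2] == 1]


if __name__ == "__main__":
    print("slowest:", slowest(TABLE_2)[0])
    print("lowest yield:", lowest_yield(TABLE_2))
    print("one-step labels:", single_step(TABLE_2))
```

Run as a script, this lists the purine C8 labels as the only one-step syntheses, [3-15N]-adenine as the slowest, and the C2 and N7 adenine labels as the lowest yielding, in line with the text.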
Chemo-enzymatic Labeling
With chemically synthesized isotope-labeled nucleobases in hand, this section outlines the various enzymatic methods that can be used to build them into isotope-labeled rNTPs (and dNTPs). Alternatively, labeled rNTPs can be obtained from Escherichia coli (45,118−120) or Methylophilus methylotrophus (44) grown on 13C- or 15N-enriched media, as reviewed elsewhere.[62,66]
Enzymatic Coupling of Nucleobase and Ribose Sources
The first enzymatic approach to prepare isotope-labeled rNTPs was the Gilles-Schramm-Williamson pentose phosphate pathway method,[65,121−124] which uses isotope-labeled d-glucoses as the precursor and requires 18 enzymes (Table ) and several coenzymes. This method is appealing for uniform ribose labeling using commercially available uniformly 13C- or 2H-labeled d-glucoses.
Table 3
Enzymes of Glycolysis, Pentose Phosphate, and Nucleotide Biosynthesis and Salvage Pathway for rNTP Synthesis

| enzyme^a | abbreviation | EC number | source |
| --- | --- | --- | --- |
| Gilles-Schramm-Williamson and Co-workers[65,121−124] | | | |
| Hexokinase | HXK | 2.7.1.1 | Baker’s yeast |
| Glucose-6-phosphate isomerase | PGI1 | 5.3.1.9 | Baker’s yeast |
| Glucose-6-phosphate dehydrogenase | ZWF | 1.1.1.49 | L. mesenteroides |
| Phosphogluconate dehydrogenase | GND | 1.1.1.44 | Torula yeast |
| Ribose-5-phosphate isomerase | RPI1 | 5.3.1.6 | Spinach |
| Phosphoribosylpyrophosphate synthetase | PRPPS | 2.7.6.1 | E. coli |
| Adenine phosphoribosyltransferase | APRT | 2.4.2.7 | JM109/pTTA6 |
| Uracil phosphoribosyltransferase | UPRT | 2.4.2.9 | JM109/pTTU2 |
| Xanthine-guanine phosphoribosyltransferase | XGPRT | 2.4.2.22 | JM109/pTTG2 |
| Nucleoside-monophosphate kinase | NMPK | 2.7.4.4 | Bovine liver |
| Myokinase (Adenylate kinase) | MK | 2.7.4.3 | Rabbit muscle |
| Guanylate kinase | GK | 2.7.4.8 | Porcine brain |
| 3-Phosphoglycerate mutase | YIBO | 5.4.2.1 | Rabbit muscle |
| Enolase | ENO | 4.2.1.11 | Baker’s yeast |
| Pyruvate kinase | PYKF | 2.7.1.40 | Rabbit muscle |
| Glutamate dehydrogenase (NAD(P)+) | GLUD | 1.4.1.3 | Bovine liver |
| CTP synthase | CTPS | 6.3.4.2 | JM109/pMW5 |
| l-Lactate dehydrogenase | LDH | 1.1.1.27 | Rabbit muscle |
| Dayie and Co-workers[74,75,128] | | | |
| Ribokinase | RK | 2.7.1.15 | E. coli |
| Creatine kinase | CK | 2.7.3.2 | Chicken muscle |
| UMP kinase | UMPK | 2.7.4.22 | E. coli |
| Serianni and Co-workers[129] | | | |
| Purine nucleoside phosphorylase | PNPase | 2.4.2.1 | E. coli |
| Xanthine oxidase | XO | 1.1.3.22 | Buttermilk |
| Catalase | CT | 1.11.1.6 | Bovine liver |
| Uridine phosphorylase | UPase | 2.4.2.3 | E. coli |

^a Given that there is overlap in the enzymes used in the methods of Schramm-Williamson and co-workers[65,121−124] and Dayie and co-workers,[74,75,128] only the unique enzymes are listed for the latter. All enzymes are commercially available except APRT, UPRT, XGPRT, CTPS, and RK.[128] These are currently only available in a few academic laboratories. The corresponding plasmids are expected to be made available at Addgene.
In brief, hexokinase (HXK) (EC 2.7.1.1) phosphorylates 13C-labeled d-glucose 55 at its O6 position to yield glucose-6-phosphate 56. Then glucose-6-phosphate dehydrogenase (ZWF) (EC 1.1.1.49) oxidizes 56 to 6-phosphogluconate 57, and phosphogluconate dehydrogenase (GND) (EC 1.1.1.44) further oxidizes 57 to 58. Next, ribose-5-phosphate isomerase (RPI1) (EC 5.3.1.6) isomerizes 58 to ribose-5-phosphate 59. Following isomerization, phosphoribosylpyrophosphate synthetase (PRPPS) (EC 2.7.6.1) pyrophosphorylates 59 at its O1′ site to yield 60. Then, adenine (APRT) (EC 2.4.2.7), guanine (XGPRT) (EC 2.4.2.22), or uracil (UPRT) (EC 2.4.2.9) phosphoribosyltransferases facilitate the nucleophilic attack of the adenine or guanine N9 or uracil N1 on the C1′ of 60 to yield 5′-monophosphates 61–63, respectively. Adenylate (MK) (EC 2.7.4.3), guanylate (GK) (EC 2.7.4.8), or nucleoside monophosphate (NMPK) (EC 2.7.4.4) kinases phosphorylate 61–63 to form the 5′-diphosphates 64–66, respectively. Pyruvate kinase (PYKF) (EC 2.7.1.40) then catalyzes the final phosphorylation to form the 5′-triphosphates 67–69 (Scheme ).[65,121−124] Finally, UTP 69 can be converted to CTP 70 by CTP synthase (CTPS) (EC 6.3.4.2) (Scheme ).[65,121−124] Importantly, 15N-labeling of the cytidine amino group can be achieved by using 15NH3 in this final step (Scheme ).[65,121−124]
Scheme 12
Enzymatic Synthesis of Isotope-Labeled rNTPs from d-Glucose Sources[65,121−124]
Moreover, Williamson and Hennig and co-workers demonstrated that the Gilles-Schramm-Williamson method[65,121−124] is compatible with 19F-labeled nucleobases[58,125,126] by synthesizing [2-19F]-ATP,[126] [5-19F]-UTP,[125] and [5-19F]-CTP.[125] However, d-ribose is a more cost-effective labeled precursor than d-glucose for the selective 13C- or 2H-ribose labeling of rNTPs.[127]

On the basis of earlier work by Whitesides and co-workers,[130−132] our group truncated the relatively complex Gilles-Schramm-Williamson method[65,121−124] to use 10 enzymes instead of 18, and two cofactor regeneration systems (dATP and creatine phosphate) (Table ). This chemo-enzymatic labeling[74,75,128] is a versatile technology that couples nucleobase to ribose and then phosphorylates the product to the rNTP in a one-pot enzymatic reaction.[74,75,128] The nucleobase and ribose building blocks can be unlabeled, isotope-labeled, chemically synthesized, or commercially available. This method therefore permits a diverse set of labeling patterns. Moreover, this approach has many advantages over previously reported de novo (72,73) or chemical[133−137] synthesis methods, including fewer enzymes, fewer synthetic steps, and greater yields. This method affords the facile coupling of chemically synthesized uniformly 15N- and 13C/15N-labeled uracil (Scheme )[82,85] to commercially available unlabeled d-ribose and 13C-labeled d-ribose. The resulting uniformly 15N-labeled and uniformly 13C/15N-labeled UTP provided 338- and 14-fold savings over the commercially available material from CIL, respectively. However, the main advantage of chemo-enzymatic synthesis is the ability to generate noncommercially available atom-specific labeling patterns.

We showcased the power of this method with the synthesis of [1′,5′,6-13C3, 1,3-15N2]-pyrimidine rNTPs using six enzymes (Table ).[74] We also used this method to synthesize [1′,8-13C2]-, [2′,8-13C2]-, or [1′,5′,8-13C3]-ATPs and -GTPs with five enzymes (Table ).[75] First, 13C-labeled d-ribose 71 is phosphorylated at its O5 position by ribokinase (RK) (EC 2.7.1.15) to yield ribose-5-phosphate 72, followed by pyrophosphorylation at the O1 site by PRPPS to afford 73. Then APRT, XGPRT, or UPRT catalyzes the nucleophilic attack of the adenine or guanine N9 or uracil N1 on the C1′ of 73 to yield 5′-monophosphates 74–76, respectively. Phosphorylation of 74–76 by MK, GK, or UMP kinase (UMPK) (EC 2.7.4.22) forms the 5′-diphosphates 77–79, respectively. Creatine kinase (CK) (EC 2.7.3.2) then facilitates the final phosphorylation to afford the 5′-triphosphates 80–82 (Scheme ).[74,75,128] Similar to the Gilles-Schramm-Williamson method,[65,121−124] a final 15N label can be introduced at the CTP 83 amino group if 15NH4Cl is used alongside CTPS in the final enzymatic step (Scheme ).[74,75,128] These atom-specifically labeled rNTPs can then be used with in vitro transcription to make RNAs with no inherent size limit. Importantly, these labeling patterns reduced spectral crowding, increased signal-to-noise ratios, facilitated direct carbon detection experiments, and eliminated 13C–13C scalar and dipolar couplings.[63,74,75,86,104]
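The d-ribose route described above can be written down as a small lookup structure, which makes the enzyme bookkeeping explicit. This is a minimal sketch under stated assumptions: the enzyme abbreviations come from the text and Table 3, but the grouping into shared and base-specific steps and the function names (`enzymes_for`, `all_enzymes`) are our own reading of the description, not a published protocol. Cofactor-regeneration details are ignored.

```python
# Enzyme abbreviations follow the text; the grouping below is an assumption
# made for illustration, not a definitive account of the one-pot reaction.
COMMON_PREFIX = ["RK", "PRPPS"]  # d-ribose 71 -> 72 -> 73
FINAL_KINASE = "CK"              # 5'-diphosphate -> 5'-triphosphate

# Base-specific phosphoribosyltransferase and monophosphate kinase.
BASE_SPECIFIC = {
    "ATP": ("APRT", "MK"),
    "GTP": ("XGPRT", "GK"),
    "UTP": ("UPRT", "UMPK"),
}


def enzymes_for(rntp):
    """Return the ordered enzyme list for one rNTP in the d-ribose route."""
    if rntp == "CTP":
        # CTP is made from UTP by CTP synthase in a final step.
        return enzymes_for("UTP") + ["CTPS"]
    transferase, nmp_kinase = BASE_SPECIFIC[rntp]
    return COMMON_PREFIX + [transferase, nmp_kinase, FINAL_KINASE]


def all_enzymes():
    """Return the sorted set of enzymes needed to make all four rNTPs."""
    names = set()
    for rntp in ("ATP", "GTP", "UTP", "CTP"):
        names.update(enzymes_for(rntp))
    return sorted(names)


if __name__ == "__main__":
    for rntp in ("ATP", "GTP", "UTP", "CTP"):
        print(rntp, "->", " -> ".join(enzymes_for(rntp)))
    print("distinct enzymes:", len(all_enzymes()))
```

Under this reading, each purine rNTP passes through five enzymes, CTP through six, and the four rNTPs together draw on ten distinct enzymes, which appears consistent with the counts quoted in the text.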
Scheme 13
Enzymatic Synthesis of Isotope-Labeled rNTPs from d-Ribose Sources[74,75,128]
As with the Gilles-Schramm-Williamson method,[65,121−124] the approach developed by Dayie and co-workers[74,75,128] is also compatible with 19F-labeled nucleobases (e.g., [2-19F]-adenine and [5-19F]-uracil[18,86]). It is worth noting that Serianni and co-workers have also developed a complementary approach to enzymatically couple nucleobase and ribose sources using four enzymes (Table ).[129] Their method uses hypoxanthine 84 and 1-O-acetyl-2,3,5-tri-O-benzoyl-α-d-ribofuranoside (ATBR) 85 in a Vorbrüggen reaction (detailed in Scheme ) to yield inosine 86. Then purine nucleoside phosphorylase (PNPase) (EC 2.4.2.1) replaces the hypoxanthine moiety at the C1′ position of 86 with a phosphate group to give α-d-ribofuranosyl-1-phosphate sodium salt (αR1P) 87 (Table ).[129] Then 87 is glycosylated by PNPase with adenine or guanine, or by UPase (EC 2.4.2.3) with uracil, to form nucleosides 88–90, respectively (Scheme ).[129] Products 88–90 can then be converted to the desired rNTP or amidite with further enzymatic or chemical synthesis.
Scheme 14
Enzymatic Synthesis of Isotope-Labeled Nucleosides from Inosine[129]
Scheme 15
Synthetic Route to [6-13C, 5-2H]-Uridine 2′-O-TOM Amidite[85]
Enzymatic Methods for Position-Specific Labeling
While these chemo-enzymatic methods enable straightforward atom-specific labeling, they rely solely on DNA template-directed T7 RNA polymerase-based in vitro transcription and are therefore unable to incorporate these labels position-specifically (e.g., at nucleotide 5 only). Fortunately, there are two alternative enzymatic methods capable of such position-specific labeling, both of which are compatible with the isotope-labeled rNTP building blocks described above. Wang and co-workers developed a hybrid solid–liquid phase transcription technique, known as position-selective labeling of RNA (PLOR), that employs an automated robotic platform.[138] In PLOR, the DNA template is attached to beads and RNA synthesis is initiated by the addition of T7 RNA polymerase and a mixture of three of the four rNTP building blocks (e.g., ATP, GTP, and CTP). The beads are then washed and a new rNTP mixture is added, this time containing the previously omitted building block. Thus, PLOR can incorporate any isotope-labeled rNTP (e.g., [6-13C, 5-2H]-UTP) position-specifically, assuming the desired labeling site (e.g., uridine 10) does not coincide with a stretch of identical nucleotides (e.g., UUU). While isotope labeling by PLOR has aided NMR studies of RNA,[138−140] its widespread use is still limited by the specialized equipment it requires and its laborious nature.

Schwalbe and co-workers developed an alternative chemo-enzymatic approach for position-specific labeling.[141] Importantly, this method uses standard laboratory equipment and the commercially available enzymes T4 RNA ligase 1 (EC 6.5.1.3), recombinant shrimp alkaline phosphatase (rSAP) (EC 3.1.3.1), and T4 RNA ligase 2 (EC 6.5.1.3), making it more accessible than PLOR. In their method, a modified nucleoside 3′,5′-bisphosphate is incorporated at the 3′-end of an RNA fragment by T4 RNA ligase 1, followed by dephosphorylation by rSAP and DNA-splinted ligation by T4 RNA ligase 2. This technique has been used to introduce modified nucleosides (i.e., photocaged, photoswitchable, and isotope-labeled) into RNAs up to 392 nts. While this method holds great promise for NMR applications, the low yields of the bisphosphorylation (6–22%) and ligation (9–49%) reactions are a major drawback.[141] More recent efforts by Schwalbe and co-workers to improve this technology include the addition of magnetic streptavidin beads as a solid support and 5′-biotinylated RNA.[142]
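The sequence constraint noted for PLOR, that the labeling site should not sit within a stretch of identical nucleotides, is easy to screen for before planning an experiment. The sketch below is a simplified illustration only: the function name `in_homopolymer_run` and the example sequence are hypothetical, and the check covers just this one constraint, not the full design of pause and restart cycles.

```python
def in_homopolymer_run(sequence, position):
    """Return True if the 1-based position has an identical neighbor.

    A True result flags a site that coincides with a stretch of identical
    nucleotides (e.g., UUU), which the text identifies as problematic for
    position-specific labeling by PLOR.
    """
    seq = sequence.upper()
    if not 1 <= position <= len(seq):
        raise ValueError("position outside the sequence")
    i = position - 1
    left_same = i > 0 and seq[i - 1] == seq[i]
    right_same = i < len(seq) - 1 and seq[i + 1] == seq[i]
    return left_same or right_same


# Hypothetical 18-nt sequence used only for illustration.
EXAMPLE_RNA = "GGACAUGCAUCGUUUAGC"

if __name__ == "__main__":
    for site in (10, 14):
        flagged = in_homopolymer_run(EXAMPLE_RNA, site)
        print(f"{EXAMPLE_RNA[site - 1]}{site}: in identical stretch = {flagged}")
```

In this made-up sequence, uridine 10 is isolated and passes the check, whereas uridine 14 lies inside a UUU stretch and is flagged.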
rNTP Labels: Summary and Outlook
As described in Section and shown in Scheme , the chemo-enzymatic labeling method developed by Dayie and co-workers[74,75,128] permits the synthesis of a versatile assortment of rNTPs with atom-specific isotope labels (Table ). While there are other enzymatic methods to generate both atom-specific (e.g., the Gilles-Schramm-Williamson[65,121−124] or Serianni[129] methods shown in Schemes and 14, respectively) and position-specific (e.g., PLOR[138] and the Schwalbe method[141,142]) labels, no other technique offers the versatility and simplicity afforded by the Dayie method. Our one-pot chemo-enzymatic approach can produce isotope-labeled purine and pyrimidine rNTPs in a few days and with high yield (75–95%) (Table ). The main disadvantage of this method is the need to express and purify five noncommercial enzymes in-house (Table ). However, providing these plasmids to Addgene will make our method widely accessible to the field.
Table 4
Summary of rNTP Labels Made from Chemo-enzymatic Synthesis[74,75,128]

| rNTP label^a | time (days)^b | enzymatic steps^c | yield (%) | ref |
| --- | --- | --- | --- | --- |
| [8-13C]-ATP | 1.5 | 1 (1) | 90 | (75) |
| [8-13C]-GTP | 1.5 | 1 (1) | 75 | (75) |
| [1′,5′,6-13C3, 1,3-15N2]-CTP | 3 | 3 (2) | 95 | (74) |
| [1′,5′,6-13C3, 1,3-15N2]-UTP | 2.5 | 2 (2) | 90 | (74) |

^a [8-13C]-adenine and -guanine were coupled to [1-13C]-, [2-13C]-, or [1,5-13C2]-d-ribose to generate a variety of ATPs and GTPs.[75] The [6-13C, 1,3-15N2]-uracil and -cytosine nucleobases, on the other hand, were coupled to [1′,5′-13C2]-d-ribose only.[74] Nevertheless, the reported times, enzymatic steps, and yields are representative of all ATP, GTP, CTP, and UTP reactions made with this method.
^b Total reaction time was based on the time required for all chemical steps. In addition, 24 h were added for any chromatographic purification.
^c Number in parentheses represents the number of chromatographic purification steps. Since the time of our original publication,[74] pyrimidine rNTP synthesis now only requires one chromatographic purification.[18,86]
Synthesis of Labeled RNA Phosphoramidites
While the enzymatic production of RNA with isotope-labeled rNTPs[44,45,69−75] is the most widely used approach to obtain labeled RNA, an attractive alternative is to use isotope-labeled amidites and solid-phase synthesis. Like PLOR introduced by Wang and co-workers[138] and the chemo-enzymatic approach developed by Schwalbe and co-workers,[141,142] the amidite method offers the advantage of position-specific RNA labeling. However, even though amidite labeling is currently the most effective and widely used method for position-specific labeling, its utility for NMR studies is limited to RNAs of up to ≈60 nt.
15N and 13C Labeling
The Kreutz and Micura groups have used isotope-labeled nucleobases to prepare 2′-O-tert-butyldimethylsilyl (tBDMS) and 2′-O-[(triisopropylsilyl)oxy]methyl (TOM) phosphoramidites for NMR studies,[57,82,85,89,110,111,143,144] as recently reviewed.[61] A representative example of [6-13C, 5-2H]-pyrimidine 2′-O-TOM amidite syntheses is shown in Schemes and 16.[85] In brief, [6-13C, 5-2H]-uracil 5 is coupled to ATBR under Vorbrüggen conditions[137] to give the 2′,3′,5′-O-benzoyl (Bz)-protected 91, which is then fully deprotected to nucleoside 92 after treatment with methylamine (CH3NH2) in ethanol (C2H5OH). Addition of 4,4′-dimethoxytrityl chloride (DMT-Cl) and TOM-Cl protects the 5′- and 2′-hydroxyl (OH) groups to form 93 and 94, respectively. Finally, phosphitylation of the 3′-OH of 94 with 2-cyanoethyl N,N-diisopropylchlorophosphoramidite (CEP-Cl) and N,N-diisopropylethylamine (DiPEA) yields the desired [6-13C, 5-2H]-uridine 2′-O-TOM amidite 95 in five steps with a 22% total yield (Scheme ).[85]
Scheme 16
Synthetic Route to [6-13C, 5-2H]-N4–Ac-Cytidine 2′-O-TOM Amidite[85]
The corresponding cytidine derivative is obtained from 94 in four additional steps (Scheme ).[85] First, the 3′-OH of 94 is transiently acetylated with Ac2O to afford 96. Then treatment with 2,4,6-triisopropylbenzenesulfonyl chloride (TiBSC) and triethylamine (TEA) yields the 5′-O-DMT-2′-O-TOM cytidine 97, which is immediately N4-acetylated (Ac) with Ac2O to form 98. Finally, 3′-OH phosphitylation yields the desired [6-13C, 5-2H]-N4-Ac-cytidine 2′-O-TOM amidite 99. Starting from uracil 5, this cytidine synthesis has an overall yield of 14% (Scheme ).[85]

In contrast to pyrimidines, the starting purine is protected before beginning the nucleosidation reaction. Representative examples of [8-13C]-purine 2′-O-TOM amidite syntheses are shown in Schemes and 18.[85] Starting with [8-13C]-adenine 18, N6-Bz-protected adenine 100 is formed with a yield of 86%. A subsequent Vorbrüggen reaction[137] gives the 2′,3′,5′-O-Bz-protected 101, which is readily 2′,3′,5′-O-deprotected to nucleoside 102 after treatment with sodium hydroxide (NaOH) in pyridine and C2H5OH. Then, 5′-OH tritylation, 2′-OH TOM protection, and 3′-OH phosphitylation yield 103, 104, and 105, respectively. Taken together, [8-13C]-N6-Bz-adenosine 2′-O-TOM amidite 105 was synthesized with 17% total yield (Scheme ).[85]
Scheme 17
Synthetic Route to [8-13C]-N6-Bz-Adenosine 2′-O-TOM Amidite[85]
Scheme 18
Synthetic Route to [8-13C]-N2-iBu-Guanosine 2′-O-TOM Amidite[85]
Guanosine synthesis, on the other hand, proceeds from an N2-isobutyryl (iBu)-protected guanine 106 made from [8-13C]-guanine 20 with a yield of 77%. From there, however, the synthesis proceeds as with adenine. That is, 106 is reacted under Vorbrüggen conditions[137] to form 107, which is then 2′,3′,5′-O-deprotected to nucleoside 108. Again, 5′-OH tritylation, 2′-OH TOM protection, and 3′-OH phosphitylation yield 109, 110, and 111, respectively. In summary, [8-13C]-N2-iBu-guanosine 2′-O-TOM amidite 111 was synthesized with an overall yield of 18% (Scheme ).[85] Importantly, Schemes −18 can be adapted to prepare 2′-O-tBDMS amidites simply by altering the 2′-OH protection reaction steps.

However, these 2′-O-tBDMS or 2′-O-TOM amidites are not suitable for producing RNAs > 60 nts. Instead, amidites with 2-cyanoethoxymethyl (CEM) as the 2′-OH protecting group[145,146] are used because of their increased coupling efficiency, which rivals that in DNA synthesis.[80] Using a protocol developed by Yano and co-workers,[145,146] Kreutz and co-workers prepared [6-13C, 5-2H]-pyrimidine, [8-13C]-purine, and the modified [1,3-15N2]-dihydrouridine and [2,8-13C2]-inosine 2′-O-CEM amidites.[91] While the higher coupling efficiency makes the CEM amidite method attractive, it has not gained widespread use because neither unlabeled nor isotope-labeled CEM amidites are commercially available.
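The link between coupling efficiency and the practical size limit of solid-phase synthesis can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a uniform stepwise coupling efficiency and that an n-nt RNA requires n − 1 couplings; both are simplifying assumptions, and the efficiencies used (98% and 99.5%) are hypothetical values chosen for illustration, not figures reported for any particular 2′-OH protecting group.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Estimate the fraction of full-length product after solid-phase synthesis.

    Assumes every coupling proceeds with the same efficiency and that an
    RNA of length_nt nucleotides requires length_nt - 1 couplings.
    """
    if length_nt < 1:
        raise ValueError("length must be at least 1 nt")
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    # Hypothetical efficiencies, for illustration only.
    for efficiency in (0.98, 0.995):
        for length in (30, 60, 100):
            fraction = full_length_fraction(length, efficiency)
            print(f"{efficiency:.3f} per step, {length:3d} nt: "
                  f"{100 * fraction:5.1f}% full length")
```

Under these assumptions, a 60-nt RNA made at 98% per-step efficiency yields roughly 30% full-length product, whereas 99.5% per step gives roughly 74%. Small gains in coupling efficiency therefore translate into large gains for longer RNAs.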
19F Labeling and Post-transcriptional Modifications
Another benefit of labeling with amidites is the position-specific incorporation of modified building blocks. Indeed, many epigenetic and post-transcriptional modifications modulate the structure, dynamics, and folding of RNAs, and NMR is providing new insights into their functions.[147] These studies have been greatly aided by the synthesis of 13C- or 15N-labeled amidites bearing modifications such as uridine 5-oxyacetic acid (cmo5U)[148] and N6-methyladenine (m6A).[149,150] In collaboration with the Al-Hashimi group, Kreutz and co-workers synthesized a 15N-labeled cmo5U amidite.[148] Their synthetic route begins from bromoacetic acid 1 and proceeds through intermediates 112 and 113 to assemble [1,3-15N2]-uracil 114, as in Schemes (82,85) and 2.[18,57,82,86] Then 114 was coupled to ATBR under Vorbrüggen conditions, 2′,3′,5′-O-deprotected, and hydroxylated at the C5 position to yield 115, 116, and 117, respectively. Addition of para-toluene sulfonic acid (pTSA) and dimethoxypropane ((CH3)2C(OCH3)2) then formed the 2′,3′,5′-O-protected nucleoside 118. Reacting 118 with ethyl-2-iodo acetate in C2H5OH and NaOH transformed the 5-OH into an ethylcarboxymethoxy group while also deprotecting the 5′-OH to afford 119. After transient 2′,3′-O-deprotection of 119 to form 120, the 3′- and 5′-OH were immediately protected along with 2′-O-tBDMS protection to yield 121 by adding di-tert-butylsilyl bis(trifluoromethanesulfonate) (DtBS) and tBDMS-Cl. Addition of pyridine and CH3OH to 121 formed 122, and subsequent treatment with nitrophenyl ethanol (NPE), 4-dimethylaminopyridine (DMAP), and N-ethyl-N′-(3-dimethylaminopropyl)carbodiimide (EDC) constructed the NPE-protected cmo5 group to yield 123. Reaction of 123 with hydrogen fluoride (HF) afforded the 3′,5′-O-deprotected 124, which was then 5′-O-tritylated to yield 125. Finally, phosphitylation of the 3′-OH of 125 with 2-cyanoethyl N,N,N′,N′-tetraisopropylphosphorodiamidite (TiPCEP) yielded 126 (Scheme ).[148] Taken together, [1,3-15N2]-cmo5U 2′-O-tBDMS amidite 126 was synthesized in 15 steps with a 1% total yield (Scheme ).[148]
Scheme 19
Synthetic Route to [1,3-15N2]-cmo5-Uridine 2′-O-tBDMS Amidite[148]
Another example from the Al-Hashimi and Kreutz groups showcases the synthesis of a 13C-labeled m6A amidite.[149] Their synthetic route begins with ethyl cyanoacetate 21 and 13C-thiourea 22 and proceeds through intermediates 23–25 to assemble [2-13C]-5,6-diamino-4-pyrimidinone 26, as in Scheme .[104] In contrast to Scheme , however, H13COOH was used with H2SO4 to introduce a second 13C label and form [2,8-13C2]-hypoxanthine 127. Then the familiar Vorbrüggen reaction of 127 with ATBR yields the 2′,3′,5′-O-Bz-protected 128, followed by addition of sulfuryl chloride (SO2Cl2) to yield 6-chloropurine nucleoside 129. Sequential addition of CH3NH2 in C2H5OH and then H2O affords the m6A nucleoside 130. Again, the synthetic route ends with 2′-O-tBDMS protection, 5′-O-tritylation, and 3′-O-phosphitylation to yield 131, 132, and 133, respectively (Scheme ).[149] In summary, [2,8-13C2]-N6-methyladenosine 2′-O-tBDMS amidite 133 was synthesized in 11 steps with an overall yield of 4% (Scheme ).[149]
Scheme 20
Synthetic Route to [2,8-13C2]-N6-Methyladenosine 2′-O-tBDMS Amidite[149]
Commercially, INNotope has 13C-labeled N1-methyladenine, m6A, and N3-methylcytidine 2′-O-tBDMS amidites available. Finally, [1,3-15N2]-pseudouridine (Ψ) amidites can be made from 15N-labeled uracil in 11 steps with a 6% total yield.[151]

Additionally, building on the work shown in Scheme ,[18,57,82,85,86] Kreutz and co-workers showcased new methods to incorporate 19F–13C into the pyrimidine nucleobase of amidites.[18,57,86] Starting from [6-13C]-uracil 4, fluorination is achieved with Selectfluor to yield 5FU 134, as in Scheme .[18,57,82,85,86] The remaining chemical steps are similar to those of other 2′-O-tBDMS amidite syntheses (Schemes [148] and 20 (149)). That is, 134 is coupled to ATBR under Vorbrüggen conditions, 2′,3′,5′-O-deprotected, and then 3′,5′-O-protected and 2′-O-tBDMS protected to yield 135, 136, and 137, respectively. Finally, 137 is 5′-O-tritylated and 3′-O-phosphitylated to yield 138 and 139, respectively (Scheme ).[57] Taken together, [5-13C, 5-19F]-uridine 2′-O-tBDMS amidite 139 was synthesized in six steps with an 8% total yield (Scheme ).[57] The corresponding cytidine derivative is obtained from 137 through intermediates 140–142 to afford the desired 143 (Scheme ),[57] as in Scheme .[85] In summary, [5-13C, 5-19F]-N4-Ac-cytidine 2′-O-tBDMS amidite 143 was synthesized in eight steps with an overall yield of 4% (Scheme ).[57] These labeling topologies not only capitalize on the beneficial spectroscopic properties of the 19F nucleus (Section ) but also open the door to NMR studies of large RNAs, as will be discussed in greater detail in Section .
Scheme 21
Synthetic Route to [5-13C, 5-19F]-Uridine 2′-O-tBDMS Amidite[57]
Scheme 22
Synthetic Route to [5-13C, 5-19F]-Cytidine 2′-O-tBDMS Amidite[57]
Synergy between Phosphoramidites and Chemo-enzymatic Labeling
In principle, any nucleobase labeling scheme described in Section can be coupled to any commercially available 13C- or 2H-labeled d-ribose (from Omicron Biochemicals or CIL) with the chemo-enzymatic method (Section ) and built into an amidite with a variety of 2′-OH protecting groups (Section ). Indeed, our group recently made [1′,8-13C2]-N6-Bz-adenosine 2′-O-tBDMS[144] and [1′,6-13C2, 5-2H]-uridine 2′-O-CEM[152] amidites via chemo-enzymatic synthesis, dephosphorylation with rSAP, and chemical synthesis. These amidites can then be used to make RNA via solid-phase synthesis. Given that the Kreutz and Micura groups have implemented a wide variety of atom-specific labeling schemes into the nucleobase of RNAs,[57,61,82,85,89,110,111,143,144] this hybrid approach is only needed if ribose labeling is desired in a position-specific manner. However, INNotope and Silantes have [1′,2,8-13C3]-N6-Ac-adenosine, [1′,8-13C2]-adenosine, [1′,8-13C2]-N2-Ac-guanosine, [1′,6-13C2, 5-2H]-uridine, and [1′,6-13C2, 5-2H]-N4-Ac-cytidine 2′-O-tBDMS amidites available.
Phosphoramidite Labels: Summary and Outlook
As described in Sections and 3.3.2, and shown in Schemes –22, a wide range of isotope-labeled amidites (Table ) are becoming available to the scientific community. For all synthetic protocols, pyrimidine C6/C5 and purine C8 sites are most readily labeled. The production of these 2′-O-TOM amidites is streamlined[85] and proceeds quickly (∼1 week) and with adequate yields (14–22%) (Table ). The introduction of 19F labels and post-transcriptional modifications, on the other hand, increases the time of synthesis (i.e., up to 10 days) and reduces the overall reaction yields (i.e., as low as 1%) (Table ). Nevertheless, the benefits afforded by the position-specific incorporation of these labels into RNA more than offset these shortcomings. As with nucleobase labeling, researchers are typically motivated by the scientific question they are pursuing rather than the relative yields of each labeling reaction. Still, improvements in reaction yields and reduction in chemical steps would be advantageous for future work.
Table 5
Summary of All RNA Phosphoramidite Labels As Outlined in Schemes –22

| RNA phosphoramidite label^a | time (days)^b | chemical steps^c | yield (%) | ref |
| --- | --- | --- | --- | --- |
| [8-13C]-N6-Bz-adenosine (TOM) | 4.5 | 5 (4) | 17 | (85) |
| [2,8-13C2]-N6-methyladenosine (tBDMS) | 8 | 11 (5) | 4 | (149) |
| [8-13C]-N2-Ac-guanosine (TOM) | 5 | 5 (4) | 18 | (85) |
| [6-13C, 5-2H]-N4-Ac-cytidine (TOM) | 8 | 8 (6) | 14 | (85) |
| [5-13C, 5-19F]-N4-Ac-cytidine (tBDMS) | 10 | 8 (6) | 4 | (57) |
| [6-13C, 5-2H]-uridine (TOM) | 4 | 5 (3) | 22 | (85) |
| [5-13C, 5-19F]-uridine (tBDMS) | 7.5 | 6 (4) | 8 | (57) |
| [1,3-15N2]-cmo5-uridine (tBDMS) | 8 | 15 (3) | 1 | (148) |

^a The 2′-OH protecting groups are listed in the parentheses.
^b Total reaction time was based on the time required for all chemical steps. In addition, 16 h were added for any explicit mention of overnight procedures and 24 h were added for any chromatographic purifications.
^c Reactions for amidites harboring post-transcriptional modifications begin with isotope-labeled precursors whereas reactions for unmodified amidites begin with isotope-labeled protected nucleobase. Also, the number in parentheses represents the number of chromatographic purification steps.
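One way to compare routes of very different length is to convert each overall yield into an average per-step yield, that is, the geometric mean of the step yields. The sketch below is illustrative only: the overall yields and step counts are those quoted in the text for three of the amidites, the function name `mean_step_yield` is an assumption, and the comparison is approximate because, as the table footnote points out, the step counts for modified and unmodified amidites start from different materials.

```python
def mean_step_yield(overall_yield_percent, steps):
    """Return the geometric-mean yield per step, in percent.

    If every step had this yield, the product over all steps would equal
    the reported overall yield.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if not 0.0 < overall_yield_percent <= 100.0:
        raise ValueError("overall yield must be in (0, 100]")
    return 100.0 * (overall_yield_percent / 100.0) ** (1.0 / steps)


# (label, overall yield in %, chemical steps), as quoted in the text.
ROUTES = [
    ("[6-13C, 5-2H]-uridine (TOM)", 22, 5),
    ("[2,8-13C2]-N6-methyladenosine (tBDMS)", 4, 11),
    ("[1,3-15N2]-cmo5-uridine (tBDMS)", 1, 15),
]

if __name__ == "__main__":
    for label, overall, steps in ROUTES:
        print(f"{label}: {mean_step_yield(overall, steps):.1f}% per step "
              f"over {steps} steps")
```

On this measure the three routes come out close to one another, at roughly 74% to 75% per step, which suggests that the low overall yields of the modified amidites mainly reflect the number of steps rather than unusually poor individual reactions.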
Current State of RNA Labeling:
Where We Are
and Where We Are Headed
Despite the synergy between the synthesis
of nucleobases (Section ), rNTPs (Section ), and amidites (Section ), and their contribution to RNA labeling for applications
with solution NMR spectroscopy, a number of insurmountable limitations
remain for RNAs prepared enzymatically (using, e.g., T7 RNA polymerase)
and chemically (i.e., solid-phase synthesis). The former is incapable
of position-specific labeling and the latter is size limited, even
though both methods can install isolated 1H–13C spin pairs into RNA that remove the 13C–13C scalar and dipolar couplings that are normally present
in uniformly labeled RNA, as will be detailed in Section .Again, unlike DNA
template-directed in vitro transcription, a tremendous
advantage to the field is that amidite labeling and solid-phase synthesis
can provide direct read-outs of the biophysical consequences of post-transcriptional
modifications. This will be discussed in greater detail in Section . However,
despite this strength, the “size problem” of solid-phase
synthesis limits the production of RNAs to ∼60 nt, beyond which
it is exceedingly difficult to prepare RNA in high yield and sufficient
purity for NMR studies. Even though the 2′-O-CEM[91,145,146] protecting
group initially held promise for synthesizing larger RNAs, it has
not gained widespread use. Conversely, while much larger RNAs can
be transcribed enzymatically, larger RNAs always carry with them more
extensive signal overlap and broader linewidths. These complications
make NMR analysis of RNAs > 60 nt extremely difficult, even when
atom-specific
labeling is used. However, introducing 13C–19F spin pairs into RNA,[18,57,86] leveraging the spectral properties of the 15N nuclei,[53,153] or combining selective deuteration with 1H NMR[17,53−55] all hold promise to lessen the burden imposed by
overlap and broad lines. This will be discussed in detail in Section .It is clear
that elucidating the structure, interactions, and dynamics
of large RNAs and their complexes (e.g., those implicated in viral
transcription, splicing, nuclear export, translation, packaging, and
particle assembly) requires developing breakthrough technologies and
new experimental strategies to solve the structures of such large
RNAs rapidly and accurately. While the advances in the synthesis of
atom-specific isotope-labeled rNTPs and amidites are essential first
steps in this direction, the ability to incorporate these labels position-specifically
will be a game changer for RNA structural and chemical biology. Overnight,
it would transform our ability to perform position-specific readouts in vitro and in vivo. Moreover, it would
enable scientists to peer directly into the active site of RNA enzymes,
visualize the binding pockets of RNA–drug complexes, and exquisitely
map out the interfaces of RNA–protein, RNA–RNA, or RNA–DNA–RNA
hybrids. At least that is the dream. While we await these technological
advances, the availability of these isotope-labeled RNA building blocks
with diverse labeling topologies (Figure ) still bodes well for addressing the structural
dynamics of RNAs with NMR spectroscopy as well as MS or small-angle
neutron/X-ray scattering. The remaining sections highlight how
the labels described in Section can be exploited to study RNA structure, interactions,
and dynamics by NMR spectroscopy.
Figure 3
List of possible atom-specifically isotope-labeled
nucleobase and
ribose labeling patterns. These can be coupled to form rNTPs via chemo-enzymatic
synthesis but also converted into amidites with further chemical synthesis.
Nucleobase labeling patterns (unmodified and modified) are based on
the synthetic schemes described in Sections and 3.3. These
need not be mutually exclusive, and some labeled sites can be incorporated
simultaneously. Labeled ribose, on the other hand, is available from
commercial sources (Omicron Biochemicals and CIL).
NMR Probes of Macromolecular Dynamics
More than 45 years ago, early investigations of RNA
dynamics were limited to the study of bacterial tRNAs using one-dimensional
(1D) NMR methods.[154] More than a decade
later, development of 1D and 2D heteronuclear polarization transfer
schemes to measure heteronuclear relaxation rates[155−157] uniquely positioned solution NMR spectroscopy to probe protein[158−161] and RNA[162−166] dynamics. With multidimensional NMR spectroscopy, we can measure
the dynamics of ribose, nucleobase, and phosphorus nuclei distributed
along the entire RNA structure.[167−171] In particular, we can characterize motions that
range from picoseconds to seconds and visualize conformers that are
transient and sparsely populated (Figure ). For these low-populated states, we can
extract chemical shifts (structure), rates (kinetics), and populations
(thermodynamics) under various physiological conditions of temperature,
salt, pH, and cellular environment. Finally, we can examine how the
cellular milieu modulates the structure, dynamics, and interactions
of RNA in real time.
Figure 4
Dynamic processes in RNA and corresponding NMR methods
and RNA
nuclei that can be used to characterize such motions. The highlighted 15N and 13C sites have been used extensively in
NMR spin relaxation and relaxation dispersion experiments,[147] whereas 31P[169,170] and 2H[171] sites are probed
less frequently. Alternative time charts can be found elsewhere.[147,168,172,173]
Probing Fast Motions with Uniform and Selective Labels
On the picosecond-to-nanosecond (ps-ns)
time scale, spin relaxation provides information about the amplitude
and time scale of the motions of bond vectors (e.g., 15N–1H, 13C–1H, 13C–19F, 1H–1H) as they reorient relative to the external applied magnetic field
(Figure ).[168,174−176] Longitudinal relaxation describes the return
to the equilibrium distribution of spins along the z-axis, with a characteristic exponential time constant T1 (or rate constant R1 = 1/T1). Transverse relaxation, on the other hand,
describes the decay of magnetization in the transverse xy-plane, with a characteristic decay time constant T2 (or rate constant R2 = 1/T2). Larger R2 values
produce broader peaks and lower peak heights in an NMR experiment.
The linewidth, defined as full-width at half-height (given in Hz),
is Δν1/2 = R2/π.
The heteronuclear Overhauser effect (hNOE) measures the enhancement
of the heteroatom magnetization that arises from saturating the proton
magnetization, and is mediated by their dipolar interaction.
For an isolated pair of spin-1/2 nuclei S and I (here, S is 15N, 13C, 31P, 19F; and I
is 1H), R1, R2, and the hNOE of nucleus S are related to the rotational
diffusion tensor of the molecule according to well-known relations:[177,178]
$$R_1 = \frac{d^2}{4}\left[J(\omega_I - \omega_S) + 3J(\omega_S) + 6J(\omega_I + \omega_S)\right] + c^2 J(\omega_S)$$
$$R_2 = \frac{d^2}{8}\left[4J(0) + J(\omega_I - \omega_S) + 3J(\omega_S) + 6J(\omega_I) + 6J(\omega_I + \omega_S)\right] + \frac{c^2}{6}\left[4J(0) + 3J(\omega_S)\right] + R_{ex}$$
$$\mathrm{hNOE} = 1 + \frac{d^2}{4R_1}\,\frac{\gamma_I}{\gamma_S}\left[6J(\omega_I + \omega_S) - J(\omega_I - \omega_S)\right]$$
where $d = \mu_0 h \gamma_I \gamma_S/(8\pi^2 r_{SI}^3)$, $c^2 = \omega_S^2(\sigma_x^2 + \sigma_y^2 - \sigma_x\sigma_y)/3$, σx = σ33 – σ11, σy =
σ33 – σ22, σ11, σ22, and σ33 are the principal
components of
the chemical shielding anisotropy (CSA) tensor,[179,180] J(ω) is a spectral density function, which
is assumed to be a Lorentzian (e.g., simplest form is $J(\omega) = \frac{2}{5}\,\tau_C/(1 + \omega^2\tau_C^2)$), μ0 is the permeability of free space, γi and ωi are the gyromagnetic ratio and Larmor frequency of spin i, rSI is the distance between spins I
and S, h is Planck’s constant, and Rex is the exchange contribution to R2 due to slow (i.e., microsecond-to-millisecond, μs-ms)
motions. The raw data represented by the three relaxation parameters
(R1, R2, and hNOE) reveal the nucleotide level variation of the dynamic
motions encoded in the RNA primary sequence. Additional motional variables
such as the overall correlation time (τC) and generalized
order parameter (S) can be fit within a Model-Free formalism[181,182] to describe fast (i.e., ps-ns) motions. However, for reasons enumerated
below, this becomes problematic for large uniformly labeled RNAs.[183]
The RNA motions reported by R1, R2, and hNOE are easily
probed by 13C[162−165,183−188] and 15N[162,166,189] nuclei. 15N sites are present in the four nucleobases
at the following sites: adenosine (Ade)-H2-N1, Ade-H2-N3, Ade-H8-N7,
and Ade-H8-N9, guanosine (Gua)-H1-N1, Gua-H8-N7, and Gua-H8-N9, uridine
(Uri)-H3-N3, and Uri-H6-N1, and cytidine (Cyt)-H6-N1 (Figures and 4). These are suitable reporters of hydrogen-bonding and non-hydrogen-bond
dynamics that occur in base-paired and nonbase-paired regions. However,
solvent exposed imino regions are usually broadened beyond detection.
Nonprotonated nitrogen sites such as Ade-N1 and Ade-N3, purine (Pur)-N7
and Pur-N9, and pyrimidine (Pyr)-N1 remain underutilized. The limited
availability of directly protonated imino nitrogen probes has made
protonated carbons an attractive alternative for probing RNA relaxation.
These sites are found in both the ribose (C1′–C5′)
and nucleobase (Ade-C2, Pur-C8, Pyr-C5, and Pyr-C6) moieties (Figures and 4).
Despite the greater number of detectable 13C nuclei
in RNA, complications arise for measurements and analysis of 13C relaxation. First, the carbon sites are linked by intricate
multibond couplings (i.e., to 15N, 13C, and 1H nuclei) that are proximally positioned within 3 Å or
less. Therefore, 13C spins do not approximate an isolated
two-spin system. In uniformly labeled samples, these extensive dipolar
couplings complicate 13C R1 rate measurements
and analysis[119,120,184,185,187,188,190−194] in biopolymers of large size (τC > 7 ns). Given
this fact, our group has developed pulse schemes (based on the isolated 1H–15N backbone amide spin pair in proteins[195]) to leverage the isolated 1H–13C spin pairs afforded by our atom-specifically labeled RNA
samples (Figure ).
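To make the dependence of R1, R2, hNOE, and linewidth on molecular size concrete, the following is a minimal Python sketch for an isolated 13C–1H spin pair. It uses the standard dipolar plus CSA expressions with a single-Lorentzian spectral density, J(ω) = (2/5)τC/(1 + ω²τC²). Several inputs are assumptions made only for illustration and are not taken from the text: an axially symmetric CSA (σx = σy = Δσ), a C–H distance of 1.09 Å, Δσ = 150 ppm, and the function names themselves.

```python
import math

MU0 = 4e-7 * math.pi        # vacuum permeability (T m/A), classical value
H_PLANCK = 6.62607015e-34   # Planck's constant (J s)
GAMMA_1H = 2.6752218744e8   # 1H gyromagnetic ratio (rad s^-1 T^-1)
GAMMA_13C = 6.728284e7      # 13C gyromagnetic ratio (rad s^-1 T^-1)


def spectral_density(omega, tau_c):
    """Single-Lorentzian J(w) = (2/5) tau_c / (1 + w^2 tau_c^2)."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)


def relaxation_rates(b0, tau_c, r_si=1.09e-10, delta_sigma=150e-6, r_ex=0.0,
                     gamma_i=GAMMA_1H, gamma_s=GAMMA_13C):
    """Return (R1, R2, hNOE, linewidth_Hz) of spin S for an isolated I-S pair.

    b0 is the static field in tesla and tau_c the correlation time in s.
    r_si (m) and delta_sigma (dimensionless, axially symmetric CSA) are
    illustrative assumptions, not values from the review.
    """
    omega_i = gamma_i * b0
    omega_s = gamma_s * b0
    d2 = (MU0 * H_PLANCK * gamma_i * gamma_s / (8 * math.pi ** 2 * r_si ** 3)) ** 2
    c2 = (omega_s * delta_sigma) ** 2 / 3.0

    def j(omega):
        return spectral_density(omega, tau_c)

    r1 = d2 / 4 * (j(omega_i - omega_s) + 3 * j(omega_s) + 6 * j(omega_i + omega_s)) \
        + c2 * j(omega_s)
    r2 = d2 / 8 * (4 * j(0.0) + j(omega_i - omega_s) + 3 * j(omega_s)
                   + 6 * j(omega_i) + 6 * j(omega_i + omega_s)) \
        + c2 / 6 * (4 * j(0.0) + 3 * j(omega_s)) + r_ex
    hnoe = 1 + d2 / (4 * r1) * (gamma_i / gamma_s) \
        * (6 * j(omega_i + omega_s) - j(omega_i - omega_s))
    return r1, r2, hnoe, r2 / math.pi   # linewidth = R2 / pi


if __name__ == "__main__":
    for tau_c in (2e-9, 5e-9, 10e-9, 20e-9):
        r1, r2, hnoe, lw = relaxation_rates(18.8, tau_c)   # 18.8 T ~ 800 MHz 1H
        print(f"tau_c={tau_c * 1e9:4.0f} ns  R1={r1:6.2f}  R2={r2:7.2f}  "
              f"hNOE={hnoe:5.2f}  linewidth={lw:6.2f} Hz")
```

Under these assumptions, R2 (and hence the linewidth) grows with τC while R1 falls, which is the size-dependent broadening referred to throughout this section. The sketch deliberately ignores the additional 13C–13C and 13C–15N dipolar contributions present in uniformly labeled samples.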
Figure 5
Pulse
scheme for transverse relaxation optimized spectroscopy (TROSY)-detected
experiments for measuring (A) rotating-frame (R1ρ) (from which R2 can be
calculated[185,195]) and (B) 13C R1 rates in selectively labeled RNA, adapted
from previous reports.[195] Quadrature detection
and sensitivity-enhanced/gradient-selection is implemented using the
Rance-Kay[196,197] echo/antiecho scheme with the
polarity of G1 inverted and phase Φ4 and
Φ5 incremented 180° for each second FID of the
quadrature pair.
Theoretical simulations
of R1 rates for Pyr-C5 and Pyr-C6,
ribose C1′, Ade-C2, and Pur-C8 in uniformly and selectively
labeled RNAs suggest that the various 1H–13C, 13C–13C, and 13C–15N dipolar couplings (Figure A) present in uniformly labeled samples lead to overestimated R1 rates (Figure B). Moreover, this discrepancy, measured by the R1 difference (where R1 difference = 100 × (R1,uni – R1,sel)/R1,uni),
increases with higher molecular weights and magnetic field strengths
(Figure B). Experimental
measurements with our customized pulse sequence (for selectively labeled
RNA) (Figure ) and
those of others[185] (for uniformly labeled
RNA), corroborated our simulations, suggesting that these discrepancies
in R1 cannot be wholly ignored, even for
fairly isolated Ade-C2 and Pur-C8 sites.[187,188] Taken together, the contribution of 13C–13C dipolar interactions needs to be explicitly taken into consideration
in data analysis of uniformly labeled RNA. Spin relaxation measurements
on uniformly labeled RNA from Al-Hashimi and co-workers[185] demonstrate that this is not an insurmountable
hurdle. Nevertheless, the focus of our discussion on RNA dynamics
will center on slower conformational exchange motions, which will
be discussed in Section .
Figure 6
Dipolar couplings complicate dynamics measurements in uniformly
labeled RNA. (A) Nucleobase and ribose structures shown to highlight
dipolar coupling networks to nuclei of interest (i.e., Ade-C2 and
Ade-C8, Uri-C6, and ribose C1′). Distances are shown in units
of Å. (B) Simulated R1 rates and R1 difference (defined as above) for the nuclei
highlighted in panel A. R1 simulations
were carried out for 800 MHz field and R1 difference simulations were run at multiple magnetic fields. All
simulations were carried out at various τC values,
and additional details can be found in the original works.[187,188]
Probing Slow Motions with Uniform and Selective Labels: Relaxation Dispersion and Saturation Transfer Methods
Spin-1/2 nuclei with a positive gyromagnetic ratio either align parallel
(α, high-populated, favorable energetic state) to the static
NMR magnetic field (B0) or antiparallel
(β, low-populated, unfavorable state). The net bulk magnetization,
oriented parallel to B0, can be realigned
with radiofrequency (RF) pulses along a direction perpendicular to B0. The magnetization then precesses about B0 at a resonant Larmor frequency (ω) characteristic
of the nucleus. When Fourier transformed, this detectable oscillating
time-domain signal yields a frequency-domain NMR spectrum with signals
at characteristic frequencies for each nucleus. When referenced against
a standard frequency (e.g., sodium-3-(trimethylsilyl)-1-propanesulfonate
(DSS) for 1H), we obtain a field-independent chemical shift
that is directly proportional to the energy difference between the
α and β states.
For RNA exchanging between two states
A and B, the chemical shift difference (Δω) between the
two states and the exchange rate constant [kex, sum of the forward (kAB) and
reverse (kBA) rate constants] or the exchange
lifetime (τex = 1/kex) determine if two distinct NMR peaks are observed and what signal
intensity and linewidth are obtained for a given nucleus.[198,199] In the slow exchange regime (kex ≪ Δω), two distinct peaks are detected at
the chemical shifts of the individual states, and the peak intensities
are proportional to the populations of each state. In the fast exchange
regime, kex is much larger than Δω,
and therefore, a single peak is observed at the population-weighted
average chemical shift. In the intermediate exchange regime, which,
as its name implies, lies between the fast and slow time scales, kex ≈ Δω.
Regardless
of the exchange regime, if chemical exchange is present, R2 increases by Rex, which
depends on kex and Δω
and can therefore be modulated by magnetic field strength.[198−202] Dynamics on the intermediate and slow time scales (i.e., μs-ms)
can be characterized with relaxation dispersion (RD) using R1ρ,[202,203] Carr–Purcell–Meiboom–Gill
(CPMG),[204−206] or chemical exchange saturation transfer
(CEST)[207] experiments (Figure ). Moreover, even processes
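slower than seconds can be studied with real-time NMR (Figure ).[208]
For two-site exchange, a general expression for the R2 rate constant (RCPMG(τcp)) for state A (where pA > pB), that encompasses all
conformational exchange
time scales, is given by the Carver-Richards equation:[198,209]
$$R_{CPMG}(\tau_{cp}) = \frac{1}{2}\left(R_{2A} + R_{2B} + k_{ex} - \frac{1}{\tau_{cp}}\cosh^{-1}\left[D_+\cosh(\eta_+) - D_-\cos(\eta_-)\right]\right)$$
$$D_\pm = \frac{1}{2}\left[\pm 1 + \frac{\psi + 2\Delta\omega^2}{\sqrt{\psi^2 + \zeta^2}}\right], \qquad \eta_\pm = \frac{\tau_{cp}}{\sqrt{2}}\left[\pm\psi + \sqrt{\psi^2 + \zeta^2}\right]^{1/2}$$
$$\psi = (R_{2A} - R_{2B} - p_A k_{ex} + p_B k_{ex})^2 - \Delta\omega^2 + 4 p_A p_B k_{ex}^2, \qquad \zeta = 2\Delta\omega\,(R_{2A} - R_{2B} - p_A k_{ex} + p_B k_{ex})$$
where R2A/B and pA/B are the R2 rate
and relative
populations of the A/B state, respectively, and τcp is the delay between successive 180° pulses in the CPMG train. A main disadvantage of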
the CPMG experiment is that only the magnitude (and not the sign)
of Δω is obtained. Still, this disadvantage of the CPMG
experiment is offset by the relative ease of its implementation and
data analysis. That is, conformational exchange is easily detected
by a nonflat CPMG curve when plotting R2,eff versus νCPMG (Figure A). Nonexchanging nuclei, on the other hand,
have no dependence of R2,eff on νCPMG and therefore appear as flat curves (Figure A).
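The contrast between a flat and a nonflat CPMG curve can be sketched numerically. The Python snippet below evaluates the Carver-Richards expression in its commonly used closed form, assuming νCPMG = 1/(2τcp) with τcp the spacing between 180° pulses. The exchange parameters mirror those quoted in the Figure 7A caption (kex = 794 s–1, pB = 8.7%, Δω/2π = 228 Hz). The intrinsic rates R2A = R2B = 10 s–1 and the function name are assumptions made here for illustration.

```python
import math


def carver_richards_r2eff(nu_cpmg, k_ex, p_b, delta_omega, r2a=10.0, r2b=10.0):
    """Effective R2 (s^-1) for two-site exchange under a CPMG pulse train.

    nu_cpmg      CPMG frequency in Hz, taken here as 1 / (2 * tau_cp)
    k_ex         exchange rate k_AB + k_BA (s^-1)
    p_b          population of the minor state B
    delta_omega  chemical shift difference between A and B (rad s^-1)
    r2a, r2b     intrinsic transverse rates (s^-1); illustrative defaults
    """
    p_a = 1.0 - p_b
    tau_cp = 1.0 / (2.0 * nu_cpmg)
    g = r2a - r2b - p_a * k_ex + p_b * k_ex
    psi = g ** 2 - delta_omega ** 2 + 4.0 * p_a * p_b * k_ex ** 2
    zeta = 2.0 * delta_omega * g
    root = math.sqrt(psi ** 2 + zeta ** 2)
    d_plus = 0.5 * (1.0 + (psi + 2.0 * delta_omega ** 2) / root)
    d_minus = 0.5 * (-1.0 + (psi + 2.0 * delta_omega ** 2) / root)
    # max(...) guards against tiny negative values from floating-point rounding
    eta_plus = tau_cp / math.sqrt(2.0) * math.sqrt(max(0.0, psi + root))
    eta_minus = tau_cp / math.sqrt(2.0) * math.sqrt(max(0.0, -psi + root))
    arg = d_plus * math.cosh(eta_plus) - d_minus * math.cos(eta_minus)
    return 0.5 * (r2a + r2b + k_ex - math.acosh(max(1.0, arg)) / tau_cp)


if __name__ == "__main__":
    dw = 2.0 * math.pi * 228.0   # 228 Hz converted to rad/s
    for nu in (50, 100, 200, 400, 800, 1000):
        exchanging = carver_richards_r2eff(nu, 794.0, 0.087, dw)
        static = carver_richards_r2eff(nu, 794.0, 0.087, 0.0)
        print(f"nu_CPMG={nu:5d} Hz  R2eff(exchange)={exchanging:6.2f}  "
              f"R2eff(no exchange)={static:6.2f}")
```

Because Δω enters only through Δω² (in ψ) and through ζ², which appears solely as ζ², the computed curve is identical for +Δω and −Δω. This is the reason CPMG data report the magnitude but not the sign of Δω.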
Figure 7
Simulated NMR RD experiments.
(A) CPMG curves for two nuclei: one
in exchange (in red, Rex > 0) using
the
parameters kex = 794 s–1, pB = 8.7%, and Δω = 228
Hz (150 MHz 13C-Larmor frequency) and one without (in black, Rex = 0, or Δω = 0, or both), based
on published data.[210] (B) CEST profile
for a given nucleus showing evidence of two states A and B. Calculations
assumed kex = 121 s–1, pB = 10.8%, γ(1H)B0/2π = 600 MHz, Δω = −4 ppm, R1A = R1B, T = 0.3 s, and the B1 fields
specified on the figure. (C) R1ρ profile for a given nucleus showing evidence of two states A and
B. Calculations used the same parameters as in (B) but with different
B1 fields, which are again specified on the figure. As
seen in the CEST and R1ρ profiles,
at higher B1 fields, linewidths broaden to the point where
state B becomes increasingly difficult to detect. CEST and R1ρ profiles are based on published data.[211]
R1ρ and CEST experiments provide
more robust information regarding the chemical shifts of state B.
For a two-site model, Δω, kex, and pB can be extracted from CEST profiles
using the Bloch-McConnell 7 × 7 matrix (including the equilibrium
magnetization terms).[212−214] By combining all data sets, global kex and pB values
can be fit numerically for all the CEST profiles, plotted as I/I0 versus spin-lock offset
(in Hz) (Figure B).
The 7 × 7 two-site Bloch-McConnell equation is derived from the
relaxation matrix and the kinetic rate matrix for an exchanging two-site
system:[207,211,213,214]
where R1A/B, ωA/B, and ω1 are the R1 rate of the A/B state, the offset of the B1 spin-lock
field from the peaks in the A/B state (in rad s–1), and the B1 field strength (in rad s–1), respectively. The evolution of magnetization for the peak in state
A during the CEST spinlock period is given by
Similarly, under the R1ρ model for two-site exchange, the R1ρ value for state A magnetization is given by[215]
and where Ω = ωrf –
Ωobs is the difference between the resonance frequency
of the observed nucleus (Ωobs) and the spinlock transmitter
frequency (ωrf). For R1ρ experiments, conformational exchange can be detected by plotting R2,eff versus Ω/2π (Figure C). The expressions for CEST
and R1ρ (eqs –16) provide
insight into the parameters that are important for acquiring useful
data. For example, higher B1 fields decrease chemical shift
resolution between states and also broaden linewidths (Figure B,C).
While almost all
RD studies involve two-site systems, expressions
for CPMG, R1ρ, and CEST models for
characterizing N-site exchange have been described by Arthur Palmer
III and co-workers.[198] Indeed, work from
Al-Hashimi and co-workers on Watson–Crick mismatches and base
pair reshuffling in RNA features R1ρ and CEST data that describe three-site exchange.[216]
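As an illustration of how an off-resonance R1ρ profile reveals a minor state, the sketch below uses a widely quoted closed-form approximation for two-site exchange with pA ≫ pB (the Trott–Palmer form). This is an assumption chosen for convenience and may differ from the expression the review cites. The offset axis is defined as the spin-lock carrier minus the population-averaged resonance, and the default R1 and R2 values and all function and parameter names are hypothetical.

```python
import math


def r1rho_two_site(offset_hz, spinlock_hz, k_ex, p_b, delta_omega_hz,
                   r1=2.0, r2=20.0):
    """Return (R1rho, R2eff) in s^-1 for two-site exchange with p_a >> p_b.

    offset_hz       (carrier - population-averaged resonance) / 2 pi, in Hz
    spinlock_hz     spin-lock field strength omega_1 / 2 pi, in Hz
    delta_omega_hz  (omega_B - omega_A) / 2 pi, in Hz
    r1, r2          intrinsic rates (s^-1); illustrative defaults
    """
    p_a = 1.0 - p_b
    omega_1 = 2.0 * math.pi * spinlock_hz
    d_omega = 2.0 * math.pi * delta_omega_hz
    # Resonance offsets measured from the spin-lock carrier (rad/s)
    avg_off = -2.0 * math.pi * offset_hz
    a_off = avg_off - p_b * d_omega
    b_off = avg_off + p_a * d_omega
    w_a2 = a_off ** 2 + omega_1 ** 2
    w_b2 = b_off ** 2 + omega_1 ** 2
    w_e2 = avg_off ** 2 + omega_1 ** 2
    theta = math.atan2(omega_1, avg_off)   # tilt angle of the effective field
    sin2 = math.sin(theta) ** 2
    cos2 = math.cos(theta) ** 2
    r_ex = p_a * p_b * d_omega ** 2 * k_ex / (w_a2 * w_b2 / w_e2 + k_ex ** 2)
    r1rho = r1 * cos2 + r2 * sin2 + sin2 * r_ex
    # R2eff = (R1rho - R1 cos^2(theta)) / sin^2(theta) = R2 + Rex
    return r1rho, r2 + r_ex


if __name__ == "__main__":
    for b1 in (50.0, 100.0, 400.0):
        profile = [(off, r1rho_two_site(off, b1, 121.0, 0.05, 600.0)[1])
                   for off in range(-1000, 1001, 5)]
        peak_off, peak_val = max(profile, key=lambda item: item[1])
        print(f"B1={b1:5.0f} Hz  R2eff peaks at {peak_off} Hz ({peak_val:.1f} s^-1)")
```

In this model the R2,eff maximum sits close to the offset at which the carrier is on resonance with state B, and raising the spin-lock field flattens and broadens that feature, in line with the trend described for Figure 7C.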
Slow Motions: Are Selective Labels Needed?
As with spin relaxation, the scalar and dipolar couplings present
in uniformly labeled samples can lead to complications in RD and CEST
experiments. As we have discussed elsewhere,[75] numerous spectroscopic solutions have been proposed to circumvent
the problems that arise from 13C–13C
couplings that exist in uniformly labeled RNA. These advances include
constant time evolution,[217−220] adiabatic band selective decoupling,[221−223] and selective cross-polarization with weak RF fields.[224−226] These solutions have benefited RD and CEST experiments to varying
degrees in RNA. Specifically, 13C–13C
scalar couplings (e.g., C1′–C2′ or C5–C6)
complicate CPMG experiments[227,228] to a much larger degree
than both CEST and R1ρ. However,
these couplings still pose a problem for CEST[229,230] and R1ρ,[211] and oscillations are sometimes observed in the decay profiles of
C1′ and C6 nuclei. Moreover, as with spin relaxation, these
couplings must be explicitly taken into consideration in data analysis.
The number of coupled homogeneous differential equations (n) is equal to (2 × 4^m)
– 1, where m is the number of weakly coupled
nuclear spins in an m-spin system. Therefore, for
1-, 2-, and 3-spin systems, n = 7, 31, and 127, respectively.[213,214,230] This transforms the CEST matrix
(eq ) from 7 ×
7 to 31 × 31 for 13C–13C scalar
coupled spin pairs found in the nucleobase and ribose moieties. Atom-specific
labeling (Section ), on the other hand, circumvents this problem entirely, and dramatically
simplifies NMR spectra, especially when incorporated position-specifically
via solid-phase synthesis (Section ). However, a drawback for selective labels is the
obvious reduction of probe sites.
Nevertheless, using both selective
and uniformly labeled RNA, CEST and R1ρ experiments have now been applied to the protonated nucleobase (Pyr-C5
and Pyr-C6, Pur-C8, and Ade-C2) and ribose (C1′-C5′)
carbons, the nucleobase imino (Gua-N1 and Thy/Uri-N3) and amino (Gua-N2)
nitrogen, nucleobase (Uri-H3, Gua-H1, Ade-H2, Pur-H8, Pyr-H5, and
Pyr-H6) and ribose H1′ protons, as well as nonprotonated (Gua-N7,
Ade-N1, and Pur-N7) and amino (Cyt-N4) nitrogen sites (Figure ).[75,147,167,211,231−233] In practice, CPMG experiments are implemented solely on selectively
labeled RNA, mainly by our group[75,144,227] and the Kreutz group,[82,85,89,210,234] though not exclusively.[193] CEST and R1ρ, on the other hand, have been used
to great success with uniformly labeled RNA by the Al-Hashimi,[31,147,167,235−245] Petzold,[246,247] and Zhang[211,231,248−250] groups. Moreover, Petzold and co-workers have developed a SELective
Optimized Proton Experiment (SELOPE) approach[251] that can be implemented with R1ρ and CEST[252] experiments using unlabeled
samples. The rest of this section will highlight recent examples of
RD experiments on labeled (selectively and uniformly) and unlabeled
RNA.
Examples of Relaxation Dispersion Experiments in Selectively Labeled RNA
As highlighted above, implementation
of RD experiments on selectively labeled RNA circumvents all complications
from strong 13C–13C scalar couplings
and permits straightforward data analysis. The following sections
will be devoted to showcasing examples of CPMG, CEST, and R1ρ experiments performed on selectively
labeled RNAs. Specifically, we will highlight recent work from our
group[227,233] using isotope-labeled rNTPs and from Kreutz
and Al-Hashimi and co-workers[150] using
isotope-labeled amidites with post-transcriptional modifications.
CPMG in Atom-Specifically Labeled RNA
Until recently,
CPMG experiments to measure the chemical shifts
of nucleobase methine 1H and ribose methylene C5′(H2) in a low populated, transient state (i.e., state B) were
not available. This gap existed, in part, due to complications from 13C–13C scalar couplings. To fill this knowledge
gap, our group adapted single-quantum (SQ) 1H CPMG experiments
previously designed for methyl groups in protein side-chains[253,254] to obtain CPMG data for the selectively labeled ([2′,8-13C2]-ATP, [1′,6-13C2]-CTP, [1′,8-13C2]-GTP, and [2′,6-13C2]-UTP) bacterial A-site RNA (Figures A,B).[227]
Figure 8
(A) Pulse scheme for SQ 1H CPMG experiment for selectively
labeled RNA,[227] adapted from previous reports.[253,254] (B) Secondary structure of the 27 nt bacterial A-site RNA with all
nucleotides harboring isotope labels shown bolded in orange. Nucleotides
that were found to be in exchange are circled. Exchange parameters
were extracted from a global fit of the CPMG data (i.e., G19-C8 and
A21-C8). (C) Pulse scheme for methylene (CH2) 1H–13C TROSY-detected CPMG experiment for selectively
labeled RNA,[227] adapted from previous reports.[256] (D) Secondary structure of the 29 nt iron-responsive
element (IRE) RNA with isotope labels and nucleotides in exchange
presented as in panel B. Exchange parameters were extracted from a
global fit of the CPMG data (i.e., C18–C5′, C18–C1′,
and C18–C6) and likely refer to a structural rearrangement
in the IRE triloop. Orange circles and D refer to 13C and 2H nuclei, respectively. Additional details can be found in
the original work.[227]
The SQ 1H CPMG experiment was amenable to Pur-H8 sites,
detecting exchange in G19 and A21. The extracted exchange rate (kex = 4000 ± 100 s–1)
from a global fit was consistent with that determined from a standard 1H–13C TROSY CPMG experiment (kex = 3000 ± 800 s–1), demonstrating
that these new experiments are feasible for RNA (Figure B).[227] Moreover, these data agree with R1ρ measurements on uniformly labeled RNA from Al-Hashimi and co-workers,
which suggests that each measurement, using various methods and labeling
techniques, is picking up fundamental motions within this RNA.[255] In addition, these SQ experiments could provide
important data on 1H chemical shifts, which are currently
lacking, such as ribose H1′ and Pyr-H6. In the latter case,
however, the presence of Pyr-H5 can cause dispersive CPMG patterns
for the H6 site.[227] Fortunately, Pyr-H5
deuteration is easily achieved (Scheme ),[85] and therefore, this
experiment can be readily implemented to obtain data for Pyr-H6 sites.
Our group also designed a methylene (CH2) 1H–13C TROSY-detected CPMG pulse sequence (Figure C)[227,256] to leverage the isolated 13C spin at the ribose C5′ position (Figure ) afforded by our chemo-enzymatic
labeling (Sections and 3.2).[74,75,128] This new CPMG experiment was implemented using the
selectively labeled ([1′,5′,6-13C3, 5-2H]-CTP) iron-responsive element (IRE) RNA and detected
exchange in C18–C5′ (Figure D).[227] These data
were then globally fit with additional CPMG data from other nuclei
to obtain chemical shift (Δω = 2.5 ± 0.2 ppm), population
(pB = 1.7 ± 0.2%), and exchange rate
(kex = 3600 ± 300 s–1) information that suggests a significant structural rearrangement
in the IRE triloop (Figure D).[227]
CEST in Atom- and Position-Specifically Labeled RNA
In addition to benefiting
CPMG experiments, selective labels can also be used to simplify CEST experiments.
Specifically, our group combined enzymatic ligation, chemo-enzymatic
labeling, and newly developed CEST experiments (Figure A) to study the conformational equilibria
of the SAM-II riboswitch in the apo (ligand-free) state.[233] To understand the formation of the SAM metabolite-binding
pocket, a SAM-II RNA was constructed via DNA splinted ligation with
T4 DNA ligase (EC 6.5.1.1) of two RNA fragments: an unlabeled 31 nt
acceptor fragment and a [1′,6-13C2, 5-2H]-CTP labeled 21 nt donor fragment. This strategy enabled
position-specific labeling, given that there was only one cytidine
(C43) in the donor sequence and therefore permitted direct monitoring
of the G22–C43 base pair interaction in the SAM binding pocket.
Moreover, the isolated spin pair labeling topology enabled the design
of a 1H CEST experiment, and simplified setup and analysis
of 1H and 13C CEST experiments without complications
from 13C–13C couplings to Cyt-C1′
and Cyt-C6 sites.[229]
Figure 9
(A) Pulse scheme for 13C and 1H CEST experiments
with temperature compensation (TC) and 1H decoupling (1H Dec) for selectively labeled RNA,[233] adapted from previous reports.[211,213] (B) Secondary
structure of the 52 nt SAM-II riboswitch RNA. C43 position-specific
labeling is shown bolded in orange and circled to indicate that it
was the subject of CEST experiments. Exchange parameters were extracted
from a global fit of the CEST data (i.e., C43–C1′ and
C43–C6) and reveal a transition from an open to a closed conformation
that resembles the SAM-bound form. Orange circles and D refer to 13C and 2H nuclei, respectively. Additional details
can be found in the original work.[233]
To leverage the labeling scheme, our group designed
a new 13C CEST experiment based on previous pulse schemes[211,213] and used it on the apo SAM-II riboswitch (Figure A). The CEST profiles of C43–C1′
and C43–C6 indicated two states of the free SAM-II riboswitch:
one that matched the resonance of the ligand-free, highly populated
conformation (i.e., state A) and another that matched the ligand-bound,
transient conformation (i.e., state B) (Figure B).[233] We then
used our new 1H CEST experiment (Figure A) to indirectly obtain the C43–H1′
chemical shifts of states A and B.[233] In
agreement with the 13C data, the 1H chemical
shift of state B matched the ligand-bound SAM-II (Figure B).[233] Taken together, these results suggest that the apo SAM-II exists
in a dynamic equilibrium (kex = 36 ±
3 s–1) between an open (highly populated, pA = 90.5 ± 0.5%) and a partially closed
(transient, pB = 9.5 ± 0.5%) state
(Figure B).[233] Moreover, these results underscore the emerging
consensus that transient, low populated states likely enhance rapid
ligand recognition and therefore play a potentially ubiquitous role
in RNA recognition and signaling.
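The fitted exchange rate and populations can be converted into the individual rate constants and state lifetimes using the two-state relations kex = kAB + kBA, kAB = pB·kex, and kBA = pA·kex. The short Python sketch below does this for the apo SAM-II values quoted above (kex = 36 s–1, pB = 9.5%). The free-energy difference additionally requires a temperature, which is not stated in this section, so the 298.15 K default is purely a placeholder assumption, as are the function and key names.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1


def two_state_kinetics(k_ex, p_b, temperature_k=298.15):
    """Split k_ex into forward/reverse rates for A <-> B and report lifetimes.

    temperature_k is a placeholder assumption used only for the free energy.
    """
    p_a = 1.0 - p_b
    k_ab = p_b * k_ex          # A -> B rate constant (s^-1)
    k_ba = p_a * k_ex          # B -> A rate constant (s^-1)
    delta_g = -R_KCAL * temperature_k * math.log(p_b / p_a)   # G_B - G_A
    return {
        "k_AB": k_ab,
        "k_BA": k_ba,
        "lifetime_A_s": 1.0 / k_ab,
        "lifetime_B_s": 1.0 / k_ba,
        "delta_G_kcal_mol": delta_g,
    }


if __name__ == "__main__":
    result = two_state_kinetics(36.0, 0.095)
    for name, value in result.items():
        print(f"{name:>18s} = {value:.4g}")
```

With these inputs the minor, partially closed state forms at roughly 3.4 s–1 and relaxes back at roughly 32.6 s–1, corresponding to a state B lifetime of about 31 ms.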
R1ρ and CEST in Atom- and Position-Specifically Labeled RNA Harboring Post-transcriptional Modifications
Perhaps the greatest benefit of selective labeling
is the ability to monitor the structural dynamic consequences of epigenetic
and post-transcriptional modifications. Using labels created by Kreutz
and co-workers, the Al-Hashimi group has been at the forefront of
exploring how these modifications alter the dynamic ensembles of nucleic
acids.[148−150,232,257−260] One such example is m6A, an abundant
RNA post-transcriptional modification that modulates gene expression,[261−263] viral lifecycles,[264−270] and other biological phenomena.[271−274] Recent work from the Al-Hashimi
group demonstrated that m6A preferentially slows RNA duplex
annealing with minimal effect on the rate of duplex melting.[149] The effect of m6A on hybridization
kinetics stands in contrast to the effect of mismatches. Mismatches
also slow the rate of duplex annealing but dramatically increase the
rate of duplex melting.[275−277] Of critical importance, the
methylamino group of the m6A nucleobase can form two rotational
isomers that interconvert on the millisecond time scale[278,279] (Figure A). The
preferred syn isomer (i.e., high-populated, state
A) cannot form a canonical Watson–Crick base pair with uridine
due to a steric clash between the uridine keto group and the methylamino[278−280] and is therefore mismatch-like (Figure A). Instead, when base-paired with uridine,
the methylamino rotates into the anti isomer (i.e.,
transient, state B) to form a canonical Watson–Crick m6A:U base pair (Figure A).
Figure 10
(A) Equilibrium between syn:anti conformations of the m6A nucleobase and the types of
base pairing that each conformation can adopt.[278−280] (B) Secondary structure of the 9 and 18 nt ssRNA and dsRNA that
were position-specifically labeled with isotope-labeled m6A, as shown bolded in orange and circled to indicate that it was
the subject of RD and CEST experiments. RNA samples harboring m6A were either made with [2,8-13C2]-m6A (top) or [13CH3]-m6A (bottom)
labels to obtain 13C RD and CEST data for CH3 (methyl), C2, or C8 sites. (C) Schematic of the four-state CS-and-IF
kinetic model with rate constants shown from RD and CEST data collected
at 65 °C.[150] Orange circles refer
to 13C. Additional details can be found in the original
work.[150]
Kinetic mechanisms that involve binding and conformational change
can occur via pathways wherein the conformational change occurs prior
to (conformational selection, CS) or post (induced fit, IF) binding.
Al-Hashimi and co-workers employed their recently developed RD-based
and CEST experiments[31,147,167,236−245] to measure hybridization kinetics of single- and double-stranded
RNA (ssRNA and dsRNA, respectively) harboring atom- and position-specifically
labeled m6A probes (i.e., [2,8-13C2]-m6A or [13CH3]-m6A)
(Figure B) to determine
how m6A modulates hybridization.[150] In this way, they obtained direct readouts of the effects of the m6A isomers on Watson–Crick and mismatch-like hybridization.
They showed that m6A with the methylamino group in the anti conformation forms a Watson–Crick base pair
with uridine that transiently isomerizes on the millisecond time scale
to a singly hydrogen-bonded (pB ≈ 1%) mismatch-like
conformation, with the methylamino group in the syn conformation.[150] This rapid interconversion
between Watson–Crick and mismatch forms, combined with different syn:anti preferences in ssRNA and dsRNA
states, hints at how m6A slows duplex annealing without
affecting melting via two pathways in which isomerization occurs before
(CS) or after (IF) duplex annealing (Figure C).[150]
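To put the reported minor-state population on an energetic scale, a population from a two-state fit can be converted into a free-energy difference with ΔG = −RT ln(pB/pA). The short Python sketch below does this for pB ≈ 1%. It is an illustration only: the function name is hypothetical, a simple two-state equilibrium is assumed, and the temperature (65 °C) is taken from the figure caption for the kinetic model, which may not be the temperature at which pB was measured.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def population_to_free_energy(p_b: float, temp_celsius: float) -> float:
    """Free energy (kcal/mol) of minor state B relative to major state A.

    Assumes a two-state equilibrium A <-> B with pA = 1 - pB.
    """
    if not 0.0 < p_b < 1.0:
        raise ValueError("p_b must lie strictly between 0 and 1")
    temp_kelvin = temp_celsius + 273.15
    return -R_KCAL * temp_kelvin * math.log(p_b / (1.0 - p_b))


# pB ~ 1% from the text; 65 degC is an assumed temperature (see lead-in).
delta_g = population_to_free_energy(0.01, 65.0)
print(f"dG(B - A) = {delta_g:.2f} kcal/mol")
```

Under these assumptions, the singly hydrogen-bonded state lies roughly 3 kcal/mol above the Watson–Crick state, which is consistent with it being sparsely populated.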
Examples of Relaxation Dispersion Experiments without Selectively Labeled RNA
While RD experiments work
well with selective labels, selective labeling is not a prerequisite, as long as care
is taken either to minimize strong 13C–13C scalar couplings (i.e., by probing nuclei where these are minimal)
or to take them into consideration in data analysis. The following sections
will be devoted to showcasing examples of CEST and R1ρ experiments performed without selectively labeled RNA. We will highlight
recent work from the Zhang[249] and Petzold[246] groups using uniformly 13C/15N-labeled rNTPs and also new experiments from the Petzold[251] and Al-Hashimi[252] groups that require no labels at all.
CEST and R1ρ Experiments in Uniformly Labeled RNA
RNA dynamics can regulate biological
processes from transcription to translation. One such example is the Bacillus cereus fluoride riboswitch RNA (Figure A), which has been characterized
extensively by Zhang and co-workers.[249] Here, they showed that the riboswitch aptamer adopts a near-identical
solution structure[249] with (holo) and without
(apo) the fluoride ligand, in agreement with X-ray crystal structures
(Figure B).[281] Moreover, these states also undergo very similar
dynamic motions across a wide range of time scales, as determined
from 13C spin relaxation rates and residual dipolar couplings
(RDCs).[249] However, functional assays indicate
that transcription activation is fluoride-dependent and kinetically
driven.[281,282] What is more, mutational studies suggest
that a prefolded “holo-like” apo state lowers the kinetic
barrier for ligand binding, enabling efficient fluoride sensing to
activate transcription below or near the toxicity threshold. Until
recently, the mechanism by which this holo-like apo state achieves
the “transcription–off” state remained unknown.[249]
Figure 11
(A) Secondary structure of the 48 nt fluoride
riboswitch aptamer
RNA with domains labeled by color. (B) Solution NMR structure[249] of the apo aptamer (B. cereus) (PDB ID, 5KH8) (left) compared to crystal structures[281] of the apo (PDB ID, 4ENC) and holo (PDB ID, 3VRS) aptamers (T. petrophila). In solution, the aptamer adopts near-identical structures in the
apo and holo forms, in agreement with crystallography.[249,281] (C) Schematic of the equilibrium between the highly populated apo
state (i.e., State A) and the transient “holo-like”
conformation of the apo state (i.e., State B). Exchange parameters
were extracted from a global fit of the CEST data. The transient “holo-like”
conformation of the apo state (i.e., State B) occludes the formation
of a reverse Hoogsteen base pair in the highly populated conformation
of the apo state (i.e., State A) to signal transcription termination.
Additional details can be found in the original work.[249]
To shed light on this
mechanism, 13C CEST experiments
were implemented on uniformly 13C/15N-GTP- and
uniformly 13C/15N-ATP/UTP-labeled aptamer RNA.
For the holo state, CEST profiles consistently showed a single, highly
populated conformation (i.e., state A).[249] A subset of CEST profiles of the apo state, on the other hand, revealed
the presence of conformational exchange to a transient state (i.e.,
state B).[249] The nucleotides that undergo
chemical exchange were localized to the junction of P3, J13, J23,
and the 3′-tail, suggesting a concerted transition (Figure A,B). A global
fit of the CEST data determined the population (pB = 1.4 ± 0.1%) and lifetime (τB = 3.2 ± 0.3 ms) of the holo-like conformation of the apo state.
This fleeting process differentiates the apo and holo states. Rapid
transition to the holo-like conformation of the apo state, which unlocks
the highly conserved reverse Hoogsteen base pair located at the interface
between the aptamer domain and the expression platform, promotes strand
invasion and provides a path to transcription termination (Figure C).[249] Conversely, fluoride binding allosterically
suppresses access to the holo-like conformation of the apo state,
ensuring continued gene transcription.[249]
RNA can also regulate the initial steps of translational silencing.
This process begins when a mature miRNA binds to the human Argonaute 2
(Ago2) protein to form the RNA-induced silencing complex (RISC).[283] Here, translational silencing is predominantly
controlled by base pair complementarity between the “seed”
region of the miRNA and the target mRNA.[283−288] Interestingly, data from bioinformatics,[289] structural,[290] and mutational[291] studies all suggest that RNA dynamics within
the central bulge of the miRNA–mRNA duplex likely control mRNA
fate. To test this hypothesis, Petzold and co-workers used R1ρ experiments coupled with molecular
dynamics simulations to investigate the structural dynamics of the
interaction between miR-34a and its miRNA recognition element in the
3′-UTR of silent information regulator 1 mRNA (mSirt1) (Figure A).[246] Using these experiments, the authors detected
chemical exchange in nucleotides surrounding the central bulge of
the miR-34a–mSirt1 duplex (Figure A).[246] In this
structural rearrangement, the gG8:tC17 base pair (‘g’
refers to the guide miRNA and ‘t’ refers to the target
mRNA) interconverts from a highly populated (i.e., state A) to a transient
(i.e., state B) conformation. A global fit of the R1ρ data determined the exchange rate (kex = 1008 ± 12 s–1) and population
(pB = 0.9 ± 0.2%) of the unfavorable
state (Figure B),[246] and the chemical shift data[246] from 1H (Δω = −2.20 ±
0.02 ppm) and 15N (Δω = −3.8 ± 0.1
ppm) R1ρ experiments suggest formation
of a gG8:tU21 wobble pair (Figure B),[246] a motif seen in other
miRNAs.[292,293] Taken together, the miR-34a–mSirt1
binding site is in equilibrium between a highly populated 7-mer-A1
and a transient 8-mer-GU (Figure B).
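The exchange parameters quoted above are reported in two different forms: a population and lifetime (pB, τB) for the fluoride riboswitch and a population and exchange rate (pB, kex) for the miR-34a–mSirt1 duplex. The Python sketch below shows how the two forms relate under a simple two-state model A ⇌ B, assuming the common conventions kex = kAB + kBA, pB = kAB/kex, and τB = 1/kBA. The class and method names are hypothetical, uncertainties are ignored, and the cited works may use fitting models that differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoStateExchange:
    """Two-state exchange A <-> B (illustrative helper, not from the source)."""

    p_b: float   # fractional population of the minor state B
    k_ex: float  # exchange rate kAB + kBA, in s^-1

    @property
    def k_ab(self) -> float:
        """Forward rate constant A -> B (s^-1)."""
        return self.p_b * self.k_ex

    @property
    def k_ba(self) -> float:
        """Reverse rate constant B -> A (s^-1)."""
        return (1.0 - self.p_b) * self.k_ex

    @property
    def tau_b(self) -> float:
        """Lifetime of state B (s), taken as 1 / kBA."""
        return 1.0 / self.k_ba

    @classmethod
    def from_population_and_lifetime(cls, p_b: float, tau_b: float) -> "TwoStateExchange":
        """Build the model from pB and the lifetime of state B (s)."""
        k_ba = 1.0 / tau_b
        return cls(p_b=p_b, k_ex=k_ba / (1.0 - p_b))


# Values quoted in the text (central values only).
riboswitch = TwoStateExchange.from_population_and_lifetime(p_b=0.014, tau_b=3.2e-3)
mir34a = TwoStateExchange(p_b=0.009, k_ex=1008.0)

for name, model in (("fluoride riboswitch", riboswitch), ("miR-34a-mSirt1", mir34a)):
    print(f"{name}: kAB = {model.k_ab:.2f} s^-1, kBA = {model.k_ba:.1f} s^-1, "
          f"tau_B = {model.tau_b * 1e3:.2f} ms")
```

Expressed this way, both transient states form slowly (kAB of a few per second) and decay within a few milliseconds, which places them in the regime that CEST and R1ρ experiments are designed to detect.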
Figure 12
(A) Secondary structure of the miR-34a–mSirt1 duplex.[246] Nucleotides that were found to be in exchange
are circled. (B) Schematic of the equilibrium between the highly populated
7-mer-A1 and transient 8-mer-GU miR-34a–mSirt1 duplex. Exchange
parameters were extracted from a global fit of the R1ρ data (i.e., gG8-H1, gG8-N1, gG8-C8, tC17-C1′,
tA19-C8, tU20-C1′, tU21-C6, and tA22-C8). The boxed nucleotides
represent the critical switch from the gG8:tC17 to gG8:tU21 base pair.
(C) Replotted functional data[246] showing
the percentage of target repression for each miR-34a duplex. The transient
8-mer-GU reduces target mRNA levels ∼2-fold compared to the
highly populated 7-mer-A1. The 8-mer-GU duplex therefore represents
a “catalytically competent RISC”. Additional details
can be found in the original work.[246]
Next, Petzold and co-workers sought to investigate
the functional relevance of the transient 8-mer-GU state using a functional assay
and simulated complexes of human Ago2 with the 7-mer-A1 and 8-mer-GU miR-34a–mSirt1
duplexes. Interestingly, the switch to the 8-mer-GU state causes the seed and supplementary helices to stack coaxially and fit into Ago2, reminiscent
of an active state in prokaryotic Ago.[294,295] Moreover,
this state enhances repression of the target mRNA, revealing the importance
of this dynamic miRNA–mRNA structure (Figure C).
CEST and R1ρ Experiments in Unlabeled RNA
After highlighting RD experiments
in selectively and uniformly labeled RNA, we will conclude this section
with a brief description of two pulse schemes that permit R1ρ[251] and CEST[252] experiments in unlabeled RNA. In the first,
Petzold and co-workers developed a SELOPE homonuclear NMR method by
combining the selective excitation of specific groups of protons and
reduction of spectral crowding using coherence transfer among scalar
coupled protons. These coherence transfers take advantage of the uniform
homonuclear three-bond scalar couplings between H5 and H6 in pyrimidine
bases (∼8–10 Hz) or between H1′ and H2′ in riboses in the C2′-endo conformation
(∼8 Hz). Taken together,
SELOPE permits well-resolved 1D and 2D spectra of unlabeled RNA. To
demonstrate the utility of this method to probe RNA transient states,
Petzold and co-workers adapted the SELOPE pulse scheme to include
a spinlock (Figure A).[251] As proof-of-concept, this new 1H R1ρ SELOPE experiment
was used to detect chemical exchange in the central bulge region of
the GUG RNA (Figure B).[251] Importantly, this method enables
the use of lower spinlock strengths to measure slower exchange time
scales.[251]
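The link between spinlock strength and the accessible exchange time scale can be illustrated with the textbook fast-exchange, on-resonance expression for the exchange contribution to R1ρ, Rex = pA pB Δω² kex / (kex² + ω1²), where ω1 = 2πν1 is the spinlock field. The Python sketch below uses this approximation only to show the trend. It is not the analysis used in the cited work, the function names are hypothetical, and all numerical values are made up for illustration.

```python
import math


def rex_on_resonance(p_b: float, k_ex: float, delta_omega: float, spinlock_hz: float) -> float:
    """Exchange contribution to R1rho (s^-1), fast-exchange on-resonance approximation.

    p_b          minor-state population
    k_ex         exchange rate (s^-1)
    delta_omega  chemical shift difference (rad/s)
    spinlock_hz  spinlock field strength nu_1 (Hz)
    """
    p_a = 1.0 - p_b
    omega_1 = 2.0 * math.pi * spinlock_hz
    return p_a * p_b * delta_omega**2 * k_ex / (k_ex**2 + omega_1**2)


def half_dispersion_spinlock_hz(k_ex: float) -> float:
    """Spinlock strength (Hz) at which Rex drops to half its low-field value."""
    return k_ex / (2.0 * math.pi)


# Hypothetical parameters chosen only to illustrate the trend.
for k_ex in (500.0, 5000.0):
    nu_half = half_dispersion_spinlock_hz(k_ex)
    print(f"kex = {k_ex:6.0f} s^-1 -> Rex is half-quenched at nu_1 ~ {nu_half:.0f} Hz")
```

In this approximation the dispersion is centered where ω1 ≈ kex, so slower processes are only resolved if the spinlock can be applied at correspondingly lower strengths.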
Figure 13
(A) Pulse scheme for 1H R1ρ experiment on unlabeled
RNA using a SELOPE readout.[251] (B) Secondary
structure of the 25 nt GUG RNA.
Nucleotides that were found to be in exchange are circled, and representative
exchange parameters for U7–H6 (shaded green) are shown. (C)
Pulse scheme for 1H CEST experiment on unlabeled RNA again
with a SELOPE readout.[252] (D) Equilibrium
between Watson–Crick and Hoogsteen A:T and G:C base pairs is
depicted. Exchange rates and populations are shown based on previous
reports,[236] and reporter imino protons
are shaded red. Additional details can be found in the original works.[251,252]
Building on this work, Al-Hashimi
and co-workers introduced a high-power 1H CEST SELOPE experiment
to target imino protons (Figure C).[252] To showcase the utility
of this method, Watson–Crick
to Hoogsteen exchange of G:C and A:T base pairs in DNA was monitored
(Figure D).[252] Importantly, Al-Hashimi and co-workers showed
that short relaxation delays, which effectively minimize the NOE effects that complicate 1H RD experiments, could be used to characterize fast exchange
events.[213,252,296−301] Moreover, their approach also takes advantage of high-power RF fields
recently shown to extend the time scale sensitivity of CEST to include
faster exchange processes that were traditionally only detectable
by R1ρ.[252,302] While both of these exciting new advancements hold promise, they
are inherently limited to small RNAs. However, RNA biology is increasingly
moving toward larger and larger RNAs. This important topic will be
the focus of the next section.
Exploring Large Molecular Weight Nucleic Acids
Until now, most studies
of RNA dynamics have focused on relatively
small systems. However, RNA structural biology is increasingly moving
toward larger RNAs, especially as cryo-EM advances in resolution and
popularity.[303−305] Solution NMR spectroscopy, unlike X-ray
crystallography and cryo-EM, is the only biophysical technique capable
of probing nucleic acid conformational dynamics on a wide range of
time scales in a physiologically relevant environment. Moreover, four
technological advances have expanded the types of problems that NMR
can tackle in studies of molecular nanomachines on the order of 1
MDa: (1) commercial availability of high-field magnets, up to 1.2
GHz 1H Larmor frequency (28.2 T),[306] (2) specialized probes (e.g., cryo-probes) that minimize noise associated
with the NMR signals,[307] (3) new isotope
labeling technologies (described in Section ), and (4) the design of new NMR experiments
that are tailored to the isotope labeling used (described in Section ). Our final section
will describe how new labeling efforts can be leveraged to study large
RNAs by NMR.
Taking inspiration from protein labeling,[308] our group installed 19F directly
next to a 13C spin in UTP (Scheme )[18,86] and showed that, compared
to the 13C–1H spin pair, 13C–19F had better sensitivity, ∼6-times wider
chemical shift dispersion
and ∼2-times more favorable relaxation properties in 2D TROSY
experiments (Figure A,B).[18] Importantly, the high sensitivity
of the 19F nucleus enabled clear delineation of helical
and nonhelical regions as well as G:U wobble and Watson–Crick
base pairs (Figure C).[18,57] In parallel, the Kreutz group incorporated 13C–19F into both cytidine and uridine 2′-O-tBDMS amidites (Schemes and 22) to show the same effect
in RNAs made by solid-phase synthesis.[57] These findings suggest that structural insights are possible even
in the absence of complete resonance assignment, which is a substantial
bottleneck for large RNAs. Moreover, these labeling schemes can be
readily adapted to exploit 19F CEST and R1ρ experiments, which have been described for proteins
up to 360 kDa.[309−314]
Figure 14
(A) Simulated 13C R2 rates (linewidths) at
various magnetic fields in RNAs of various molecular weights (as measured
by τC) to compare the relative TROSY effects of 13C–1H or 13C–19F spin pairs, which are shown on the left. (B) Same simulated rates
as in A for each spin pair but only at the magnetic fields corresponding
to the narrowest linewidths (smallest R2) (600 and 950
MHz for 13C–19F or 13C–1H, respectively, as shown by the gray lines in panel A).[18] (C) Representative 19F–13C TROSY spectrum to highlight the dispersion of resonances
based on secondary structure (i.e., G:U wobble base pairs, nonhelical
nucleotides, and helical A:U base pairs). Spectral regions are colored
to match the respective uridines on the corresponding RNA. Additional
details can be found in the original works.[18,57]
An alternative approach to heteronuclear
correlation experiments
that include nuclei with large CSAs such as 13C and 19F, which broaden the lines of nearby protons, was recently
described by Bax and Summers and co-workers.[53] This approach capitalizes on the favorable relaxation properties
of 15N nuclei within RNA nucleobases. Here, they employed 1H–15N heteronuclear multiple quantum coherence
(HMQC) experiments to measure 15N R1ρ rates and RDCs in a large 232 nt (∼78 kDa)
RNA by selectively transferring magnetization from Ade-H2 to Ade-N1/N3
via the two-bond scalar coupling (J ≈ 15 Hz[29]) (Figure ). Extending this method in the same 232 nt RNA, Marchant
and Tjandra and co-workers measured pseudocontact shifts using the
two-bond scalar couplings of Ade-H8-N7 (J ≈ 11 Hz[29]) and Ade-H8-N9
(J ≈ 8 Hz) for
coherence transfer.[315] Importantly, both
experiments would benefit from atom-specific labeling. That is, selective 15N labeling of Ade-N1 or Ade-N3 (described in Section ) (Schemes and 8) would reduce crowding considerably and direct magnetization
transfer uniquely from Ade-H2 rather than splitting it between both
sites, as in uniformly 15N-labeled RNA (Figure ). In the same way, selective 15N labeling of Pur-N7 or Pur-N9 (described in Section ) (Schemes –11) would again reduce crowding and direct coherence
transfer uniquely from Pur-H8 (Figure ). Alternatively, selective pulses can be deployed
to achieve the same decrowding and directed transfer. These labeling
topologies could then be leveraged to probe two-bond 15N CEST in large RNAs, as recently described by Zhang and co-workers.[231]
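The practical cost of relying on small two-bond couplings can be seen from the transfer delays they require. For a single coherence-transfer step through a coupling J, the signal builds up as sin(πJΔ) while decaying as exp(−R2Δ), so the ideal delay 1/(2J) is shortened once relaxation is included. The Python sketch below evaluates this for the couplings quoted above (15, 11, and 8 Hz). The function names and the transverse relaxation rate are hypothetical, and a real out-and-back experiment contains more than one such period, so the numbers indicate a trend rather than a prediction.

```python
import math


def ideal_delay(j_hz: float) -> float:
    """Transfer delay (s) for full antiphase build-up without relaxation: 1/(2J)."""
    return 1.0 / (2.0 * j_hz)


def optimal_delay(j_hz: float, r2: float) -> float:
    """Delay (s) that maximizes sin(pi*J*t) * exp(-R2*t) for r2 > 0."""
    a = math.pi * j_hz
    return math.atan(a / r2) / a


def transfer_efficiency(j_hz: float, r2: float, delay: float) -> float:
    """Relative signal after one transfer step of the given length."""
    return math.sin(math.pi * j_hz * delay) * math.exp(-r2 * delay)


R2_ASSUMED = 30.0  # s^-1, hypothetical transverse relaxation rate for a large RNA

for label, j_hz in (("Ade H2-N1/N3", 15.0), ("Pur H8-N7", 11.0), ("Pur H8-N9", 8.0)):
    t_ideal = ideal_delay(j_hz)
    t_opt = optimal_delay(j_hz, R2_ASSUMED)
    eff = transfer_efficiency(j_hz, R2_ASSUMED, t_opt)
    print(f"{label}: 1/(2J) = {t_ideal * 1e3:.1f} ms, "
          f"optimal = {t_opt * 1e3:.1f} ms, efficiency = {eff:.2f}")
```

With any fixed relaxation rate, the smaller couplings give lower efficiency, which is one reason to avoid dividing the available magnetization between two acceptor nitrogens.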
Figure 15
Examples of possible routes for coherence transfer
via the two-bond
scalar couplings Ade–H2-N1 and Ade–H2-N3 (J ≈ 15 Hz[29]), Pur–H8-N7
(J ≈ 11 Hz[29]) and Pur–H8-N9
(J ≈ 8 Hz[29]) in uniformly and
selectively labeled RNA.
Our final example of
harnessing the versatility of the 15N nucleus is one that
exploits the narrower linewidths in 1H–15N TROSY experiments compared to their 1H–13C counterparts (Figure A). Here, Fürtig and Schwalbe and
co-workers investigated several reconstituted complexes between an
adenine-sensing riboswitch and the 30S ribosome by NMR spectroscopy.[153] In particular, they implemented the 1H–15N BEST-TROSY pulse scheme[316,317] to obtain high-quality spectra of a very large complex (>800 kDa)
(Figure B). Taken
together, Fürtig and Schwalbe and co-workers succeeded in illuminating
the dynamic network that links the riboswitch RNA regulator, adenine
ligand inducer, and ribosome protein S1 modulator during translation
initiation.[153]
Figure 16
(A) Simulated TROSY-detected R2 rates
(linewidths) for 13C and 15N nuclei at 800 MHz.
The 15N nucleus has significantly narrower linewidths (smaller R2) than 13C. (B) Structural
model of the >800 kDa complex of adenine-sensing riboswitch bound
to the 30S ribosomal complex (structural model built from PDB IDs 1Y26 and 5MLN).[153] Additional details can be found in the original work.[153]
Conclusion
In humans, RNA transcripts exceed the number of proteins decoded
by more than 50-fold, and yet the number of RNA structures remains
below 1%, preventing a detailed understanding of RNA function (Figure ). It is therefore
essential to characterize RNA structural dynamics and interactions
at atomic resolution to fill this critical knowledge gap. Over the
past two decades, NMR spectroscopy has assumed a central role in RNA
structure determination and probing dynamics on functionally relevant
time scales in solution. In this review, we have summarized some of
the many contributions of solution NMR studies to our knowledge of
RNA structure, dynamics, and interactions, as facilitated by isotope
labeling. We have presented a detailed overview of the prominent role
stable isotopes continue to play in NMR analysis of nucleic acids
(Section 2), how to synthesize these labels
and introduce them into RNA (Section 3), and
how these labels benefit NMR analysis. Of great interest, selective
isotope labeling alleviates spectral crowding and removes dipolar
and scalar couplings to simplify NMR dynamics measurements and data
interpretation (Section 4). Moreover, recent
advances in labeling open the door to study large RNA systems in a
manner previously thought impossible (Section 5). As new orthogonal technologies are developed to better characterize
the functional relevance of RNA, its structural dynamics will become
increasingly important for understanding the cellular basis of
RNA-based dysfunction that leads to various diseases. We anticipate
that several imminent breakthrough technologies, some described herein,
will enable NMR spectroscopy to continue to play a pivotal role in
shining light on the structure, dynamics, and function of the important
“dark matter of the genome”, RNA in vitro, in cellulo, and in vivo.