Jonathan L Chen1, Stanislav Bellaousov2, Jason D Tubbs1, Scott D Kennedy2, Michael J Lopez1, David H Mathews2,3, Douglas H Turner1,3. 1. Department of Chemistry, University of Rochester , Rochester, New York 14627, United States. 2. Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry , Rochester, New York 14642, United States. 3. Center for RNA Biology, University of Rochester , Rochester, New York 14642, United States.
Abstract
Knowledge of RNA structure is necessary to determine structure-function relationships and to facilitate design of potential therapeutics. RNA secondary structure prediction can be improved by applying constraints from nuclear magnetic resonance (NMR) experiments to a dynamic programming algorithm. Imino proton walks from NOESY spectra reveal double-stranded regions. Chemical shifts of protons in GH1, UH3, and UH5 of GU pairs, UH3, UH5, and AH2 of AU pairs, and GH1 of GC pairs were analyzed to identify constraints for the 5' to 3' directionality of base pairs in helices. The 5' to 3' directionality constraints were incorporated into an NMR-assisted prediction of secondary structure (NAPSS-CS) program. When it was tested on 18 structures, including nine pseudoknots, the sensitivity and positive predictive value were improved relative to those of three unrestrained programs. The prediction accuracy for the pseudoknots improved the most. The program also facilitates assignment of chemical shifts to individual nucleotides, a necessary step for determining three-dimensional structure.
Knowledge of RNA structure is necessary to determine structure-function relationships and to facilitate design of potential therapeutics. RNA secondary structure prediction can be improved by applying constraints from nuclear magnetic resonance (NMR) experiments to a dynamic programming algorithm. Imino proton walks from NOESY spectra reveal double-stranded regions. Chemical shifts of protons in GH1, UH3, and UH5 of GU pairs, UH3, UH5, and AH2 of AU pairs, and GH1 of GC pairs were analyzed to identify constraints for the 5' to 3' directionality of base pairs in helices. The 5' to 3' directionality constraints were incorporated into an NMR-assisted prediction of secondary structure (NAPSS-CS) program. When it was tested on 18 structures, including nine pseudoknots, the sensitivity and positive predictive value were improved relative to those of three unrestrained programs. The prediction accuracy for the pseudoknots improved the most. The program also facilitates assignment of chemical shifts to individual nucleotides, a necessary step for determining three-dimensional structure.
RNA is a central biomolecule involved in many cellular functions,
including synthesizing proteins, regulating gene expression, catalyzing
reactions, and storing genetic data in many viruses.[1] Therefore, knowledge of RNA structure can lead to the discovery
of causes and cures of many diseases. Once sequence is known, the
first step in determining RNA structure is defining the secondary
structure, i.e., base pairing. This level of structure is used for
many applications, including using nuclear magnetic resonance (NMR)
to determine 3D structure,[2−4] designing therapeutics,[5] and providing insights into functional mechanisms[6] and evolution.[7−10]Methods for predicting RNA secondary
structure include comparative
sequence analysis[7,11−13] and minimization
of free energy
predicted on the basis of thermodynamic parameters.[14−18] The latter method is ∼70% accurate when predicting
secondary
structure from a single sequence,[19,20] but prediction
methods that integrate sequence comparison typically give better than
85% average accuracy.[21−26]A pseudoknot is a type of RNA secondary structure in which
nucleotides
in a loop region pair with nucleotides outside of where the loop was
closed. Formally, a pseudoknot occurs if there are two base pairs
with indices i paired to j and k paired to l, with positions i < k < j < l. Base pairs forming pseudoknots are the most difficult to predict,
and most popular programs do not allow pseudoknots. Pseudoknots, however,
have important biological roles.[27−33] The lack of thermodynamic data for pseudoknots compounds the difficulty
of accurately predicting them by calculating free energy changes.[34,35]To improve the accuracy of predicting secondary structure,
techniques
that employ experimental data have been developed. Chemical mapping
constraints[20] and/or restraints[36−38] can be used to identify nucleotides
not in Watson–Crick base pairs. NMR constraints identify nucleotides
in Watson–Crick and GU pairs.[39]Crystallization of RNA is challenging.[40,41] Therefore,
many RNA 3D structures have been determined by NMR, which
also provides insight into dynamics. NMR-Assisted Prediction of Secondary
Structure (NAPSS) utilizes NMR-derived constraints in conjunction
with a thermodynamic folding algorithm to reduce the potential folding
space of an RNA molecule and provide some initial chemical shift assignments.[39] Imino proton 2D NOE spectroscopy (NOESY) is
used to identify helices of stacked canonical base pairs. The user
enters constraints from sequential imino proton walks into a program
that recursively searches for structures containing matching helical
walks. For the set of structures in which all constraints are matched,
the program refolds unconstrained regions and calculates the folding
free energy change of the entire structure. The putative structures
are ranked by predicted folding free energy change. The helical walk
constraints, however, do not provide any information about the 5′
to 3′ strand direction of the base pair stacks. That is, the
directionality of the set of stacks along the sequence is unknown.
Thus, NAPSS often identifies a large set of possible structures that
satisfy all of the stacking constraints.Chemical shifts of
protons in base pairs are affected by the identity
of neighboring bases.[42−44] Direction-dependent trends for
chemical shifts of base pair stacks were mostly identified from the
Biological Magnetic Resonance Data Bank (BMRB)[45] and by measuring spectra for some model duplexes with GU
pairs. Here, we report direction-dependent trends for GH1, UH3, and
UH5 of GU wobble pairs, AH2, UH3, and UH5 of AU pairs, and GH1 of
GC pairs. An expanded NAPSS algorithm, NAPSS-CS, is also provided
that uses these spectroscopic properties to constrain predictions
of RNA secondary structure (Figure ).
Figure 1
NAPSS-CS with direction-dependent constraints reduces
the folding
space of an RNA sequence. (a) A 2D NOESY spectrum of the Moloney murine
leukemia virus core encapsidation signal multibranch loop showing
imino proton walks.[63] (b) Applying triplet
constraints to folding of this sequence by NAPSS-CS reduces the number
of dot plot matches by 16. (c) Secondary structure of this sequence.
Colored base pairs correspond to helixces identified by imino proton
walks.
NAPSS-CS with direction-dependent constraints reduces
the folding
space of an RNA sequence. (a) A 2D NOESY spectrum of the Moloney murine
leukemia virus core encapsidation signal multibranch loop showing
imino proton walks.[63] (b) Applying triplet
constraints to folding of this sequence by NAPSS-CS reduces the number
of dot plot matches by 16. (c) Secondary structure of this sequence.
Colored base pairs correspond to helixces identified by imino proton
walks.While the NAPSS-CS algorithm is
intended for RNA secondary structure
prediction, the approach can be generalized to self-assembling polymers
in which structure allows NMR walks and thermodynamic rules are known.
This includes self-assembling DNA structures and nucleic acid mimics
with different backbones and “bases” for which chemical
modification reagents may not be available.[46−49] The algorithm may also facilitate
determination of
secondary structure when more than one structure is present and the
structures are in slow exchange on the NMR time scale.
Methods
Oligoribonucleotide
Preparation and NMR Spectroscopy
To further explore the dispersion
of resonances for GU pairs, whose
chemical shifts are underrepresented in the literature compared to
AU and GC pairs, NOESY and TOCSY spectra were measured for the following
sequences to determine GH1, UH3, UH5, and AH2 chemical shifts: r(AGGCUU)2, r(AGUCGAUU)2, r(CUGGCUAG)2, r(CAGUCGAUUG)2, r(CCGAAUUUGG)2, r(CGGAAUUUCG)2, r(CGGAUAUUCG)2, r(CGUGAUUACG)2, r(CUGGAUUCAG)2, r(GAGAGCUUUC)2, r(GAGGAUCUUC)2, and r(GUGAAUUUAC)2. Imino proton 1D spectra for these sequences
have been published.[50]All oligoribonucleotides
were purchased from Integrated DNA Technologies, Inc., except for
r(GUGAAUUUAC)2, which was synthesized by M. Serra (Allegheny
College, Meadville, PA) using standard phosphoramidite chemistry.
Oligoribonucleotides were dissolved in 300 μL of 80 mM NaCl,
18.8 mM NaH2PO4, 1.16 mM Na2HPO4, and 0.02 mM EDTA at pH 6.0. To provide a lock signal, 15
μL of D2O was then added. NMR spectra were acquired
with a Varian Inova 500 MHz spectrometer. A 35 ms mixing time was used
for 2D TOCSY spectra, and 100 and 400 ms mixing times were used for
2D 1H–1H NOESY spectra. For 2D spectra,
water signals were suppressed
with a WATERGATE-style pulse with flipback.[51,52] 2D spectra were processed with NMRPipe[53] and resonances assigned with SPARKY.[54]Proton chemical shifts were referenced to a temperature-dependent
water shift (eq ), where T is temperature in kelvin measured
at pH 5.5.[55] 2,2-Dimethylsilapentate-5-sulfonic
acid was used as an external reference standard for water.
Quantification
of Secondary Structure Accuracy
The
accuracy of prediction of secondary structures was quantified by sensitivity
and positive predictive value (PPV):A predicted pair, i to j, was considered correct if the accepted structure
contained a pair from i to j, i ± 1 to j, or i to j ± 1. This accounts for the fact that comparative
analysis cannot always resolve the exact pairing and that pairs can
be dynamic.[19]
RNA Preparation and NMR
Spectroscopy
A plasmid construct
containing the sequence for human accelerated region 1 (HAR1) used
by Beniaminov et al.[9] was provided by A.
Krol of the Institut de Biologie Moléculaire et Cellulaire.
The plasmid was amplified in and purified from Escherichia
coli competent cells using standard plasmid preparation protocols.
The purified plasmid was linearized overnight by SmaI restriction endonuclease (New England BioLabs) at 25 °C. RNA
was transcribed from the linearized plasmid with T7 High Yield RNA
Synthesis Kits (New England BioLabs), with 1 μg of plasmid per
20 μL reaction mixture. After transcription mixtures had been
incubated for 12 h at 37 °C, 5 μL of 100 mM EDTA was added
to stop reactions. A sample with 15N- and 13C-labeled rGTP and rUTP (Sigma-Aldrich) was synthesized with a similar
protocol.HAR1 samples were purified with FPLC.[56] FPLC fractions were concentrated and exchanged into NMR
buffer with Amicon Ultra-15 Centrifugal Filter Units (EMD Millipore).
NMR buffer consisted of 100 mM NaCl, 20 mM NaH2PO4/NaHPO4, and 0.05 mM Na2EDTA (pH 6.25). Each
sample was placed in a susceptibility-matched Shigemi NMR tube (Shigemi, Inc.). Final NMR samples were
300 μL, including 10 μL of D2O to provide a
lock signal. The final RNA concentration of each sample was approximately
0.2 mM.A pUC19 plasmid containing an insert for a 75 nt sequence
for the Bombyx mori R2 retrotransposon pseudoknot
was constructed
and amplified with standard plasmid purification protocols (see the Supporting Information for the plasmid construct).
This construct differs from the 74 nt construct previously studied
with NMR[39] by addition of the wild-type
3′ terminal C. The plasmid was linearized with NheI restriction
endonuclease at 37 °C prior to in vitro transcription.
Unlabeled and 13C- and 15N-labeled rGTP and
rUTP samples were transcribed, purified with polyacrylamide gel electrophoresis,
and extracted from gels with electroelution. Purified RNAs were exchanged
into NMR buffer (see ref (39)), concentrated with centrifugal filter units, and added
to a Shigemi NMR tube. D2O was added to provide a lock
signal.NMR spectra were recorded on a Varian Inova 600 MHz
NMR spectrometer.
NOESY spectra were acquired for the unlabeled HAR1 sample with mixing
times of 125 and 250 ms at 25 °C and 60 ms at 15 °C. A 15N–1H HSQC spectrum was acquired for the
labeled HAR1 sample
at 25 °C. Relevant acquisition parameters are listed in Table S1. NOESY spectra were acquired for the
unlabeled 75 nt B. mori R2 retrotransposon pseudoknot
with a mixing time of 200 ms at 25 °C. A 15N–1H HSQC spectrum was acquired for the labeled B. mori R2 retrotransposon pseudoknot sample at 25 °C.
Results
Imino
Proton Distances in PDB Helices
Adjacent canonical
base pairs and therefore potential helices can be identified from
NOEs between imino protons separated by <5 Å. A database of
imino proton distances between
doublets of canonical base pairs was assembled from 3D structures
in the Protein Data Bank (PDB).[57] Distances
were obtained from nonredundant
X-ray structures with a resolution no larger than 2 Å,[58] to which hydrogens were added with
the REDUCE program[59] and then averaged
for each doublet of canonical pairs. For doublets with at least one
GU pair, G/G, G/U, and/or U/U, imino proton distances between adjacent
base pairs were averaged separately, although the 5′G-5′U
and 3′G-3′U distances were averaged together for 5′GG3′/3′UU5′.
A separate set of imino proton distances for each doublet was obtained
from NMR structures. Imino distances between doublets adjacent to
loops, bulges, and helix termini were not considered because the terminal
or closing base pairs may be dynamic.Most average distances
from the X-ray and NMR structures agreed within 0.3 Å (Table ). In X-ray structures,
26 interbase average imino proton distances were <5 Å, thus
allowing “imino proton walks” to identify helices.[60] The exceptions were 5′UA/3′AU,
5′UG/3′AU, 5′UG/3′GU, and G to U in X-ray
structures of 5′CG/3′GU. Thus, helices with these
doublets may not be completely constrained by NMR data. In published
NMR spectra, however, an NOE between H3 of uracils of 5′UA/3′AU
was observed for a human R/G stem-loop[61] and a hairpin stem from Yersinia enterocolitica.[62] In contrast, a 5′UA/3′AU
imino NOE was not observed for the core encapsidation signal RNA from
Moloney MLV,[63] likely because of overlap
between the H3 signals. Similarly for 5′UG/3′AU, NMR
spectra of the human R/G stem-loop[61] and
the loop E region of Spinacia oleracia 5S rRNA[64] have an NOE between either imino proton of the
GU pair and UH3 of the AU pair, but these NOEs were not observed in
the spectrum of r(GUGAAUUUAC)2 reported here. For the 5′CG/3′GU
doublet, an NOE was observed between H3 of U and H1 of the 5′
G in NMR spectra of a mutant of an element of the 3′ UTR of
turnip crinkle virus[65] and a domain of
medaka telomerase CR4/5.[66] Sometimes spin
diffusion will produce imino–imino proton NOEs even if the
imino protons are separated by >5 Å.[60] Thus, complete helical walks are more
likely to be observed than expected for the distances in Table .
Table 1
Averages of Measured Distances between
Imino Protons of Adjacent Base Pairs in X-ray and NMR Structures Obtained
from the PDBa
base pair
doublet
sets of bases
average distance
in X-ray structures (Å)
occurrence
in X-ray structures
average distance
in NMR structures (Å)
occurrence
in NMR structures
5′AA3′
U-U
3.71
17
3.88
49
3′UU5′
5′AG3′
G×U
3.38
1
3.63
12
3′UU5′
U-U
3.85
1
4.20
10
5′AU3′
U×U
3.52
5
3.61
14
3′UA5′
5′AU3′
G-U
N/A
3.67
7
3′UG5′
U×U
3.77
7
5′CA3′
G-U
4.85
41
4.78
60
3′GU5′
5′CG3′
G×G
4.35
35
4.35
30
3′GC5′
5′CG3′
G×G
4.70
10
4.64
18
3′GU5′
G-U
5.48
10
4.84
18
5′CU3′
G×U
3.46
35
3.68
75
3′GA5′
5′CU3′
G-G
4.26
13
4.17
26
3′GG5′
G×U
3.57
13
3.93
26
5′GA3′
G×U
4.48
17
4.57
86
3′CU5′
5′GC3′
G×G
3.66
41
3.71
55
3′CG5′
5′GG3′
G-G
4.02
53
4.03
94
3′CC5′
5′GG3′
G-G
4.32
14
4.21
9
3′CU5′
G×U
4.63
14
4.64
9
5′GG3′
G-G
4.36
1
4.69
2
G×U
4.46
2
4.75
4
3′UU5′
U-U
3.39
1
3.94
2
5′GU3′
G-U
3.34
29
3.57
81
3′CA5′
5′GU3′
G×G
3.86
12
4.06
20
3′CG5′
G-U
3.47
12
3.69
20
5′GU3′
G×G
3.79
1
3.55
4
3′UG5′
G-U
3.75
2
3.65
8
U×U
3.83
1
3.78
4
5′UA3′
U×U
5.45
7
5.18
16
3′AU5′
5′UG3′
G-U
N/A
5.11
3
3′AU5′
U×U
5.54
3
5′UG3′
G×G
5.64
1
5.13
2
G-U
5.91
2
5.68
4
3′GU5′
U×U
5.44
1
5.45
2
5′UU3′
G×U
4.95
1
4.82
6
3′AG5′
U-U
3.64
1
3.83
6
-, Same strand
distance; ×, cross-strand
distance.
-, Same strand
distance; ×, cross-strand
distance.
Spectral Analysis of Duplexes
with GU Pairs
To expand
the database for GU pairs, exchangeable protons of short duplexes
were assigned from NOESY spectra and the nonexchangeable H5 and H6
protons of C and U were assigned
from TOCSY spectra (Tables S2–S13). CH5–H41/42 cross-peaks in NOESY spectra differentiated
C from U in the TOCSY spectra and initiated assignment of proton resonances.
Imino protons were identified as those of GU, GC, or AU pairs based
largely on their positions in the imino fingerprint regions, from
10 to 12 ppm for GH1 and UH3 with a strong GH1/UH3 cross-peak for
GU pairs, from 11 to 13.5 ppm for GH1 in GC pairs, and from 13 to
15 ppm for UH3 in AU pairs.[60]Aromatic
proton assignments were confirmed with sequential aromatic walks.
In A-form RNA, H1′ of a residue forms NOE contacts with its
own H6 or H8 base proton and that of its 3′ base, allowing
sequential assignments.[60] H2 of adenosine
has an intrastrand cross-peak to H1′ of its 3′ residue
and an interstrand cross-peak to H1′ of the residue 3′
of the base to which it is paired. While NMR spectra for all duplexes
used in this study were assigned with similar methods, an example
of the process for r(GAGGAUCUUC)2 is given below. Two others are given
in the Supporting Information.2D
NOESY spectra for r(GAGGAUCUUC)2 were obtained with mixing times of 100
and
400 ms at 5 and 20 °C, respectively. The G3 H1 and U8 H3 protons
were assigned
according to their positions in the imino region of the 100 ms spectrum
(Figure ). G4 H1 was
identified by its cross-peaks to U8 H3 and G3 H1, while G1 H1 displays
a weak resonance due to exchange with water. H3 of U6 was differentiated
from U9 by its cross-peak to G4 H1. H5 to H6 peaks corresponding to
C7 and C10 were differentiated by cross-peaks from the former’s
H41 and H42 to G4H1. Remaining aromatic proton and H1′ assignments
were made with the 400 ms spectrum as described above (Figure ). The H1′ assignments
of U6 and C7 were confirmed by NOE contacts to A5 H2, while those
of G3 and C10 were confirmed by NOE contacts to A2 H2.
Figure 2
2D imino (top) and NOESY
walk (bottom) region spectra of r(GAGGAUCUUC)2. The top spectrum was acquired with
a 100 ms mixing
time at 5 °C, and the bottom spectrum was acquired with a 400
ms mixing time at 25 °C. Both spectra were acquired with a WATERGATE
pulse to suppress water.
2D imino (top) and NOESY
walk (bottom) region spectra of r(GAGGAUCUUC)2. The top spectrum was acquired with
a 100 ms mixing
time at 5 °C, and the bottom spectrum was acquired with a 400
ms mixing time at 25 °C. Both spectra were acquired with a WATERGATE
pulse to suppress water.
Database
NOEs from imino proton resonances provide
identification of intrabase pair resonances for H5 of C and U and
H2 of A.[60] A database of such NMR chemical
shifts was assembled for
78 GU wobble pairs, 292 AU pairs, and 490 GC pairs flanked on both
sides by at least one canonical base pair and categorized according
to the identities and orientations of the flanking canonical pairs.
Most of the structures were 10–40 nt long and contained stem-loops,
while a few were completely double-stranded RNA. Literature data were
mostly obtained from the BMRB.[45]Chemical shifts determined for GH1, UH3, and UH5 protons of GU pairs,
AH2, UH3, and UH5 protons of AU pairs, and GH1 protons of GC pairs
on the basis of oligoribonucleotide spectra reported here were consistent
with existing data. The expanded database showed strong sequence-dependent
patterns, which were used to determine chemical shift ranges that
identify the 5′ to 3′ direction of base pair triplets
centered on GU, AU, and GC pairs (Table ).
Table 2
Chemical Shift Ranges
Used as Direction-Dependent
Constraints for Secondary Structure Predictiona
base pair
AH2/GH1 range
UH3 range
UH5 range
triplet
GU
11.30–11.90
11.60–12.80
5.00–5.75
+RGY
GU
9.70–10.35
11.10–12.00
5.50–6.00
+YGR
GU
10.60–11.20
11.45–12.30
5.20–5.50
+YGY
GU
11.30–11.90
11.60–12.80
–
+RGY
GU
9.70–10.35
11.10–12.00
–
+YGR
GU
10.35–10.60
11.10–12.30
5.20–6.00
–RGY
GU
9.70–10.35
12.00–12.30
5.20–6.00
–RGY
GU
9.70–10.35
11.10–12.00
5.20–5.50
–RGY
GU
10.60–11.20
11.10–11.45
5.20–6.00
–RGY
GU
10.60–11.20
11.45–12.30
5.50–6.00
–RGY
GU
9.70–10.35
12.00–12.30
–
–RGY/–YGY
GU
10.35–10.60
11.10–12.30
–
–RGY/–YGY
GU
10.60–10.80
11.10–12.30
–
–RGY
GU
10.80–11.20
11.10–11.45
–
–RGY
GU
10.80–11.30
11.45–12.30
–
–RGY/–YGR
AU
7.75–8.10
14.00–14.80
4.70–5.20
+RAY
AU
6.30–6.75
12.90–13.40
5.35–5.85
+YAR
AU
6.75–7.25
12.80–13.50
4.70–5.20
+YAY
AU
7.70–8.10
14.00–14.80
–
+RAY
AU
6.30–6.75
12.90–13.40
–
+YAR
AU
–
14.25–14.80
4.70–5.10
+RAY
AU
–
12.80–13.50
4.70–5.20
+YAY
AU
6.30–6.75
12.80–12.90
–
–RAY
AU
6.30–6.75
13.40–13.50
–
–RAY
AU
6.75–7.40
12.80–13.50
–
–RAY
AU
–
12.90–13.50
5.20–5.80
–RAY
GC
11.50–12.10
–
–
–YGY
GC
13.30–13.90
–
–
–YGR
For
each triplet type, R and
Y represent purines (G or A) and pyrimidines (C or U), respectively.
The algorithm incorporates “+” triplets into and excludes
“–” triplets from potential matches.
For
each triplet type, R and
Y represent purines (G or A) and pyrimidines (C or U), respectively.
The algorithm incorporates “+” triplets into and excludes
“–” triplets from potential matches.For GU pairs, a 2D plot of UH3 versus
GH1 chemical shifts (Figure ) revealed well-defined clusters for 5′RGY3′
and 5′YGR3′, but 5′RGR3′ and 5′YGY3′
regions overlap, where R and Y represent purines (G or A) and pyrimidines
(C or U), respectively. A 2D plot of UH5 versus GH1 chemical shifts
contained well-defined clusters corresponding
to 5′RGY3′, 5′YGR3′, and 5′YGY3′,
but not 5′RGR3′ due to overlap with the 5′YGR3′
region (Figure ).
Thus, if triplets have a central GU pair, then GH1, UH3, and UH5 chemical
shifts often reveal the direction of the helix (Figures and 4).
Figure 3
Chemical shift
patterns for GU pairs. UH3 vs GH1 (top), UH5 vs
UH3 (middle), and UH5 vs GH1 (bottom) colored according to triplet
type. For each triplet type, R is purine (A or G) and Y is pyrimidine
(C or U). Circled points represent chemical shifts not previously
reported in the literature. Colored boxes are chemical shift ranges
used as direction-dependent constraints for triplets with the same
color points. Apparent overlaps in these plots are resolved by including
a third chemical shift (Figure ).
Figure 4
Chemical shift patterns for GU pairs. UH5 vs
UH3 vs GH1 chemical
shifts colored according to triplet type.
Chemical shift
patterns for GU pairs. UH3 vs GH1 (top), UH5 vs
UH3 (middle), and UH5 vs GH1 (bottom) colored according to triplet
type. For each triplet type, R is purine (A or G) and Y is pyrimidine
(C or U). Circled points represent chemical shifts not previously
reported in the literature. Colored boxes are chemical shift ranges
used as direction-dependent constraints for triplets with the same
color points. Apparent overlaps in these plots are resolved by including
a third chemical shift (Figure ).Chemical shift patterns for GU pairs. UH5 vs
UH3 vs GH1 chemical
shifts colored according to triplet type.For AU pairs, 2D plots of AH2 versus UH3 chemical shifts
(Figure ) revealed
defined clusters corresponding to 5′RAY3′
and 5′YAR3′, but the 5′RAR3′ region overlaps
with 5′YAY3′. A 2D plot of UH3 versus UH5 chemical shifts
contained well-defined clusters corresponding
to 5′RAY3′ and 5′YAY3′, but the 5′RAR3′
region overlaps with 5′YAR3′. Thus, if triplets have
a central AU pair, then UH3, AH2, and UH5 chemical shifts often reveal
the direction of the helix (Figures and 6).
Figure 5
Chemical shift patterns
for AU pairs. AH2 vs UH3 (top), UH5 vs
UH3 (middle), and UH5 vs AH2 (bottom) colored according to triplet
type. For each triplet type, R is purine (A or G) and Y is pyrimidine
(C or U). Circled points represent chemical shifts not previously
reported in the literature. Colored boxes are chemical shift ranges
used as direction-dependent constraints for triplets with the same
color points. Apparent overlaps in these plots are resolved by including
a third chemical shift (Figure ).
Figure 6
Chemical shift patterns for AU pairs. UH5 vs
UH3 vs AH2 chemical
shifts colored according to triplet type.
Chemical shift patterns
for AU pairs. AH2 vs UH3 (top), UH5 vs
UH3 (middle), and UH5 vs AH2 (bottom) colored according to triplet
type. For each triplet type, R is purine (A or G) and Y is pyrimidine
(C or U). Circled points represent chemical shifts not previously
reported in the literature. Colored boxes are chemical shift ranges
used as direction-dependent constraints for triplets with the same
color points. Apparent overlaps in these plots are resolved by including
a third chemical shift (Figure ).Chemical shift patterns for AU pairs. UH5 vs
UH3 vs AH2 chemical
shifts colored according to triplet type.For GC pairs flanked by canonical base pairs,
a plot of the distribution of GH1 chemical shifts (Figure ) showed that no GC pairs in
the 5′RGY3′ direction have a GH1 chemical shift below
12.40 ppm, whereas no GC pairs in the 5′YGR3′ direction
have a chemical shift above 13.10 ppm. Thus, if triplets have a central
GC pair, then a GH1 chemical shift below 12.40 ppm eliminates the
5′RGY3′ direction and a shift above 13.10 ppm eliminates
the 5′YGR3′ direction.
Figure 7
Distribution of GH1 chemical shifts for
GC pairs colored according
to triplet type. For each triplet type, R is purine (A or G) and Y
is pyrimidine (C or U). No GH1 chemical shifts from 11.31 to 11.50
ppm were found for GC pairs.
Distribution of GH1 chemical shifts for
GC pairs colored according
to triplet type. For each triplet type, R is purine (A or G) and Y
is pyrimidine (C or U). No GH1 chemical shifts from 11.31 to 11.50
ppm were found for GC pairs.
NAPSS-CS Algorithm
The NAPSS-CS program identifies
helices from a sequence by applying NMR constraints to a dynamic programming
algorithm (Figure ). This algorithm generates a free energy dot plot for a given sequence
and recursively searches it for helices or sets of helices that fully
match helical walk constraints from NMR. As in the previous version,[39] NAPSS-CS considers imino walks across single-nucleotide
bulges and also coaxial stacks if the helices are not in separate
pseudoknots. In NAPSS-CS, experimental chemical shifts for GH1, UH3,
and UH5 in GU pairs, AH2, UH3, and UH5 in AU pairs, and GH1 in GC
pairs can be input, which may define the orientation of base pairs
in helices (Table ).
Figure 8
Flow diagram of the NAPSS-CS algorithm.
Flow diagram of the NAPSS-CS algorithm.To reduce the search space, NAPSS-CS looks for
the dot plot with the minimal number of pairs needed to find at least
one complete match to all the helices revealed by NMR. Starting with
the dot plot with pairs in the predicted most stable structure based
only on thermodynamics, the number of pairs in the dot plot is increased
by allowing base pairs found in structures generated at less favorable
free energy changes. The allowed free energy difference between the
predicted most favorable free energy change and that used as a cutoff
for finding new potential base pairs is gradually increased in 1%
increments until there is at least one match for all of the helices
revealed by NMR.Chemical exchange with water may prevent detection
of imino protons
of terminal base pairs and/or NOEs from those protons to imino protons
of adjacent base pairs from being identified in NMR spectra and included
in imino walk constraints. Therefore, the program extends matched
helices to canonical base pairs that can stack on the matched helices.
For every extended match set, NAPSS-CS forbids alternative pairings
for the nucleotides in the set, uses a dynamic programming algorithm
to fold the RNA into a secondary structure by free energy minimization,
and then re-forms the base pairs from the match set if they have not
already been re-formed by the dynamic programming algorithm. Because
nucleotides are not allowed in matched base pairs to base pair to
any nucleotide other than its matched partner, helices that are pseudoknotted
once the matched base pairs are added are allowed to form.[67] Similar to the ShapeKnots algorithm,[38] NAPSS-CS fills in single or tandem mismatches
and removes isolated pairs. The latter eliminates possible nonsensical
pseudoknots. NAPSS-CS then calculates the folding free energy for
every structure with the extended matches still intact. The free energy
change for nonpseudoknotted secondary structures is calculated with
the INN-HB model.[68,69] Free energy changes for structures
with pseudoknots are calculated
by adding three energy terms: (a) the free energy change of the structure
with the pseudoknot removed (maximizing the number of remaining base
pairs)[70] using the INN-HB model, (b) the
energy of the helix(es) that was removed when breaking the pseudoknot,
calculated with the INN-HB model (without intermolecular initiation
or any loop free energy changes), and (c) a pseudoknot energy penalty
introduced in the ShapeKnots algorithm.[38,71]If available,
additional constraints or restraints can be used
to further reduce the folding space. These include restraints from
SHAPE and other mapping experiments or pairing constraints that can
be derived, for example, from sequence comparison.
Application
of NAPSS-CS to HAR1 and B. mori R2 Retrotransposon
RNAs Illustrates the Method
HAR1 is
a rapidly evolving noncoding RNA discovered in the brains of chimpanzees
and humans.[8] Two secondary structures were
initially proposed
for HAR1, but NMR spectra[10] confirmed the
secondary structure proposed by Beniaminov et al.[9] (Figure ). To expand on published NMR spectra, new spectra were measured
for HAR1 (Figure and Figures S3–S5). NOESY and 15N–1H HSQC spectra (Figure and Figures S3 and S4)
acquired at 25 °C revealed 17 G and 8 U imino peaks, corresponding
to 15 GC pairs, 6 AU pairs, and 2 GU pairs. An additional AU and GU
pair was identified in a NOESY spectrum acquired at 15 °C (Figure S5). These data are consistent with seven
imino proton walks (Figures and 10 and Figures S3–S5). The imino proton chemical shift data from this
work (Table S14) agree well with those
reported by Ziegeler et al.[10] Imino proton
resonances not identified in this work that Ziegeler et al.[10] assigned only in fragments of the full-length
construct are for G1, G48, and U109. These resonances were likely
difficult to assign in the full-length construct because they exchange
with solvent protons in solvent-accessible environments: G1 and U109
are located at the end of helices, and G48 is in a GA pair. Imino
proton resonances that Ziegeler et al.[10] did not assign in the full-length construct but were identified
in this work are for G19, G30, G44, G49, G51, U79, U80, and G104 (Table S14). On the basis of the secondary structure
shown in Figure ,
P2 and P4 were missing expected imino signals and the two pairs assigned
to P4 could be observed only at a reduced temperature (15 °C)
and a reduced mixing time (60 ms). These observations suggest that
P2 and P4 may be unstable and/or
dynamic. Constraints used for HAR1 are listed in Table .
Figure 9
Secondary structure of
human HAR1. Colored base pairs correspond
to imino proton walks identified from NMR spectra (Table ). Residue numbering in the
structure reported in this work is shifted by three residues relative
to the consensus sequence.[9]
Figure 10
NMR spectra of human HAR1 showing imino proton walks for
helix
P1 (see Figure for
the secondary structure). The top spectrum is a 1H–1H NOESY spectrum acquired at 25 °C with a 125 ms mixing
time. Colored lines depict imino proton walks for helix P1 and correspond
to base pairs in Figure with the same colors. Colored dots represent base pair type (red
for GC and blue for AU). Sequences of dots for imino proton walks
are in the bottom right of the spectrum. Blue and green boxes around
sequences of dots correspond to blue and green base pairs in Figure . The bottom spectrum
is a 15N–1H HSQC spectrum acquired at
25 °C.
Table 3
NMR Constraints
Used To Predict the
Structure of HAR1a
Colors correspond to those in the
secondary structure of Figure and the imino proton walks shown in Figure . Chemical shifts are submitted to NAPSS-CS
as shown on the left, and base pairs predicted by NAPSS-CS are shown
on the right. Integers 5, 6, and 7 represent AU, GC, and GU pairs,
respectively. For GU pairs, numbers in parentheses are chemical shifts
of GH1, UH3, and UH5, respectively. For AU pairs, numbers in parentheses
are chemical shifts of AH2, UH3, and UH5, respectively. For GC pairs,
the first number in parentheses is the chemical shift of GH1. For
GU and AU pairs, zero means an unavailable chemical shift. For GC
pairs, the second two zeros are placeholders for absent chemical shift
constraints. Chemical shifts (Table S14) are from ref (10) and this work.
Secondary structure of
human HAR1. Colored base pairs correspond
to imino proton walks identified from NMR spectra (Table ). Residue numbering in the
structure reported in this work is shifted by three residues relative
to the consensus sequence.[9]NMR spectra of human HAR1 showing imino proton walks for
helix
P1 (see Figure for
the secondary structure). The top spectrum is a 1H–1H NOESY spectrum acquired at 25 °C with a 125 ms mixing
time. Colored lines depict imino proton walks for helix P1 and correspond
to base pairs in Figure with the same colors. Colored dots represent base pair type (red
for GC and blue for AU). Sequences of dots for imino proton walks
are in the bottom right of the spectrum. Blue and green boxes around
sequences of dots correspond to blue and green base pairs in Figure . The bottom spectrum
is a 15N–1H HSQC spectrum acquired at
25 °C.Colors correspond to those in the
secondary structure of Figure and the imino proton walks shown in Figure . Chemical shifts are submitted to NAPSS-CS
as shown on the left, and base pairs predicted by NAPSS-CS are shown
on the right. Integers 5, 6, and 7 represent AU, GC, and GU pairs,
respectively. For GU pairs, numbers in parentheses are chemical shifts
of GH1, UH3, and UH5, respectively. For AU pairs, numbers in parentheses
are chemical shifts of AH2, UH3, and UH5, respectively. For GC pairs,
the first number in parentheses is the chemical shift of GH1. For
GU and AU pairs, zero means an unavailable chemical shift. For GC
pairs, the second two zeros are placeholders for absent chemical shift
constraints. Chemical shifts (Table S14) are from ref (10) and this work.The 75 nt B. mori R2 retrotransposon pseudoknot
(Figure ) has NMR spectra that agree well with
those of a 74 nt construct (Figure ) originally
used to identify the same pseudoknot. NOESY and 15N–1H HSQC spectra (Figures S6–S9) acquired at 25 °C revealed imino resonances
corresponding
to 20 GC pairs, 4 AU pairs, and 1 GU pair. These data provided six
imino proton walks (Figure and Figures S6–S9). One
walk is new because G32 is able to pair with C75 in the 75-mer (Figure and Figure S8). Unlike in the spectra of the 74 nt
construct,[39] an NOE from G13 to G27 was
not observed in the spectrum of the 75 nt construct, possibly because
its sample concentration was lower than for the 74 nt construct. This
reduced a previous walk of 5 bp to two separate walks of 3 and 2 bp.
Constraints used for the 75-mer B. mori R2 pseudoknot
are provided in Table S17. While not used
as a constraint, an imino proton resonance at 12.31 ppm was assigned
to G9 in the 75 nt construct, consistent with formation of an imino
G9-A30 pair.
Figure 11
Pseudoknotted structures used to benchmark NAPSS-CS. Colored
base
pairs correspond to imino proton walks identified from literature
NMR spectra (Tables S16–S24).
Pseudoknotted structures used to benchmark NAPSS-CS. Colored
base
pairs correspond to imino proton walks identified from literature
NMR spectra (Tables S16–S24).
Accuracy of Structure Prediction
The NAPSS-CS algorithm
was tested on a total of 18 structures, nine of which contain pseudoknots
(Figures and 11 and Figure S10). NMR
constraints for these sequences are provided in Table and Tables S16–S32. The average sensitivity and PPV were calculated over all 18 structures.
A one-tailed, paired t test was performed to test
the significance of the improvement in sensitivities
and PPVs from constrained NAPSS-CS as compared to those from three
other algorithms without restraints (Table ). The null hypothesis (H0), stating that NAPSS-CS is not more accurate than an
alternative program, was tested. Constrained NAPSS-CS was found to
have a sensitivity (p < 0.05) significantly higher
than those of Fold,[20] ProbKnot,[72] and ShapeKnots,[38] and a PPV significantly higher than that of
ProbKnot (Table ).
Table 4
Prediction Accuracies (in percentage
of known base pairs)a
Saccharomyces cerevisiae group II intron Sc.ai5γ domain 1 κ–ζ MBL[96]
49
16
88
100
88
78
88
100
100
100
TRSV
adenine-dependent hairpin
ribozyme[97]
80
23
100
77
100
77
100
77
100
77
B. mori R2 retrotransposon
PK (74 nt)[39]
74
24
58
58
42
45
58
58
83
91
B. mori R2 retrotransposon
PK (75 nt)
75
25
56
58
40
45
56
58
88
79
E. coli tmRNA PK[87]
31
10
0
0
30
30
100
91
100
91
human HDV ribozyme
PK[88]
63
21
19
21
19
20
19
21
76
73
MMTV-modified
frameshifting
PK[90]
34
11
45
71
45
71
100
92
100
92
PEMV
RNA1 PK2
33
8
63
83
63
83
63
83
100
80
Streptococcus
pneumoniae preQ1-II riboswitch[89]
59
19
63
100
63
100
63
100
100
100
SRV-1
mutant frameshifting
PK[91]
41
12
50
75
83
100
100
100
100
100
SRV-1
wild-type frameshifting
PK[91]
41
12
0
0
0
0
0
0
100
100
average,
all structures
68
74
70
71
79
81
96
92
paired t test p values
0.001
0.017
0.001
0.003
0.009
0.046
Only NAPSS-CS
was constrained
by experimental data. Pseudoknotted structures are denoted by “PK”.
Only NAPSS-CS
was constrained
by experimental data. Pseudoknotted structures are denoted by “PK”.For structures with pseudoknots,
the sensitivity
and PPV for pseudoknotted base pairs were calculated separately with eqs and 3. Pseudoknotted base pairs are
those that form between single-stranded and loop bases and close the
loop involved in formation of those base pairs (see Methods). Pseudoknotted base pairs were predicted considerably
better with the constrained NAPSS-CS algorithm than with unrestrained
programs (Table ).
The average pseudoknot sensitivity and PPV for NAPSS-CS are 95 and
88%, respectively, compared to ≤33% for each of the other programs
when unrestrained. Six of the nine sequences were predicted to be
pseudoknotted only by constrained NAPSS-CS.
Table 5
Prediction
Accuracies (in percentage
of pseudoknotted base pairs) for Pseudoknotsa
ProbKnot[72]
ShapeKnots[38]
NAPSS-CS
structure
length (nt)
no.
of PK
base pairs in accepted structure
PK Sens
PK PPV
PK Sens
PK PPV
PK Sens
PK PPV
B. mori R2 retrotransposon
PK (74 nt)[39]
74
14
0
0
0
0
79
85
B. mori R2 retrotransposon
PK (75 nt)
75
15
0
0
0
0
87
77
E. coli tmRNA PK[87]
31
10
0
0
100
91
100
91
human HDV ribozyme
PK[88]
63
14
0
0
0
0
86
67
MMTV-modified frameshifting
PK[90]
34
11
0
0
100
92
100
92
PEMV RNA1 PK2
33
8
0
0
0
0
100
80
S. pneumoniae preQ1-II riboswitch[89]
59
15
0
0
0
0
100
100
SRV-1 mutant frameshifting
PK[91]
41
12
83
100
100
100
100
100
SRV-1 wild-type
frameshifting
PK[91]
41
12
0
0
0
0
100
100
average, all structures
9
11
33
31
95
88
Only NAPSS-CS was constrained by
experimental data. The pseudoknot sensitivity and PPV were not calculated
for structures predicted with Fold12 because it does not allow pseudoknots.
The pseudoknot sensitivity and PPV were calculated according to the
method of Hadjin et al.[38] If predicted
structures have pseudoknots, then the sensitivity and PPV are calculated
with eqs and 3 only for base pairs involved in the pseudoknots.
If the predicted structure contains no pseudoknot, then the sensitivity
and PPV are defined as 0%. For structures predicted with Fold, the
pseudoknot sensitivity and PPV are 0% because it does not allow prediction
of pseudoknots.
Only NAPSS-CS was constrained by
experimental data. The pseudoknot sensitivity and PPV were not calculated
for structures predicted with Fold12 because it does not allow pseudoknots.
The pseudoknot sensitivity and PPV were calculated according to the
method of Hadjin et al.[38] If predicted
structures have pseudoknots, then the sensitivity and PPV are calculated
with eqs and 3 only for base pairs involved in the pseudoknots.
If the predicted structure contains no pseudoknot, then the sensitivity
and PPV are defined as 0%. For structures predicted with Fold, the
pseudoknot sensitivity and PPV are 0% because it does not allow prediction
of pseudoknots.Both SHAPE[73] and NMR[39] data are
only available for
the 74 nt B. mori R2 retrotransposon pseudoknot.
When two strong and two moderate SHAPE hits[73] were used as restraints with reactivities of 0.8 and 0.4, respectively,
ShapeKnots predicted a structure with 58% sensitivity and PPV for
all base pairs, but with no pseudoknotted base pairs. In contrast,
NAPSS-CS with only NMR constraints predicted a structure with a sensitivity
and PPV of 83 and 91% of all base pairs (Table ) and 79 and 85% of pseudoknotted base pairs
(Table ), respectively.Aside from the 18 structures whose prediction accuracies are reported,
NAPSS-CS failed to give results for two additional structures (Figure S11). The Kluyveromyces lactis telomerase RNA pseudoknot[74] contains
a helical walk across a single base pair (C22-G36), which is not allowed
by the dynamic programming algorithm. The MLV recording signal pseudoknot[75] has a helical walk across coaxially stacked
pseudoknots, which is not supported by NAPSS-CS.[39] Aside from the two structures, little evidence of walks
across these arrangements of base pairs exists in the literature.[39] For that reason, NAPSS-CS is designed not to
support these structures.
Discussion
High-throughput
sequencing methods reveal many new RNAs with unknown
structure.[76−80] Determination of secondary structure usually involves sequence comparison[7,11−13] and/or a combination of chemical
mapping and free energy minimization.[20,36,38,81,82] Chemical mapping limits folding space by revealing nucleotides that
are not in Watson-Crick base pairs. In contrast, NMR Assisted Prediction
of Secondary Structure limits folding space by revealing nucleotides
in canonical base pairs.[39] The results
presented here show that adding chemical shifts for imino, UH5, and
AH2 protons can further reduce folding space by indicating the 5′
to 3′ direction of base pairs in helices.Pseudoknots
are an important motif because they usually indicate
functional significance. Pseudoknots, however, are particularly difficult
to prove, partially because of the limited knowledge of the sequence
dependence of thermodynamics.[34,35] The results in Figure and Table show that constraints from
NMR dramatically improve evidence for and predictions of pseudoknots.
Similar improvement has been shown for 13 sequences when ShapeKnots
is restrained with SHAPE data.[38]
Figure 12
Average accuracy
of prediction of RNA secondary structure with
NAPSS-CS constrained by NMR data compared with Fold,[20] ProbKnot,[72] and ShapeKnots[38] when the last three are not restrained by experimental
data. Accuracy was measured as sensitivity and positive predictive
value (PPV). On the left are shown the average sensitivity and PPV
for canonical base pairs of structures in a database of
nine nonpseudoknotted and nine pseudoknotted structures (Table ). On the right
are shown the average sensitivity and PPV for predicting the presence
of a pseudoknot (Table ). Pseudoknot sensitivity and PPV were not calculated
for structures predicted with Fold because it does not allow pseudoknots.
Average accuracy
of prediction of RNA secondary structure with
NAPSS-CS constrained by NMR data compared with Fold,[20] ProbKnot,[72] and ShapeKnots[38] when the last three are not restrained by experimental
data. Accuracy was measured as sensitivity and positive predictive
value (PPV). On the left are shown the average sensitivity and PPV
for canonical base pairs of structures in a database of
nine nonpseudoknotted and nine pseudoknotted structures (Table ). On the right
are shown the average sensitivity and PPV for predicting the presence
of a pseudoknot (Table ). Pseudoknot sensitivity and PPV were not calculated
for structures predicted with Fold because it does not allow pseudoknots.An NMR and biochemical study of
an adenine riboswitch revealed
that temperature-independent function required two slowly exchanging
secondary structures when ligand was not bound.[6]In vivo chemical mapping of Arabidopsis
thaliana seedlings suggests that stress-expressed RNAs may
have multiple structures.[80] Even an RNA
duplex of 22 nucleotides has been shown by NMR to have two slowly
exchanging secondary structures.[83,84] When secondary
structures are slowly exchanging so that imino cross-peaks are observed
for the different conformations, the NAPSS-CS algorithm may facilitate
determination of individual secondary structures.Compared to
proteins, relatively few 3D structures are known for
RNA despite the importance of 3D structure for deducing structure–function
relationships. In addition to providing base pairing information,
NAPSS-CS provides assignments for many imino proton, UH5, and AH2
resonances for bases in canonical base pairs. Assignment of UH5 also
allows rapid assignment of UH6.[60] These
assignments then permit assignment of other nonexchangeable resonances
and thus for many assignments required to determine 3D structure.The approach presented here should become more important as the
RNA database of NMR spectra, structures, and dynamics grows. For example,
additional patterns for chemical shifts and NOE connections may be
found and incorporated into the NAPSS-CS algorithm to improve accuracy.
Conversely, local environments may produce exceptions to the constraints
currently used. A recent apparent exception is the U5 primer binding
site in Moloney MLV,[85] where a 5′RGY3′
triplet is flanked by a GU pair on both sides. As more data become
available, NMR restraints rather than constraints could be used with
chemical mapping data to further improve the overall prediction accuracy.[38] The existing database for pseudoknots, however,
is too small to train well the parameters required for an approach
using restraints.Because RNAs are often difficult to crystallize
and packing interactions
can affect structure,[86] it is likely that
NMR will continue to be an important method for determining RNA structure.
The approach can also be applied to other self-assembling polymers
that would generate NOESY walks and have known thermodynamics. This
includes DNA and a wide variety of nucleic acid mimics, many of which
cannot be interrogated by all chemical mapping methods because backbones
and/or base equivalents may not be natural.[46−49]
Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971
Authors: Xiaobing Zuo; Jinbu Wang; Ping Yu; Dan Eyler; Huan Xu; Mary R Starich; David M Tiede; Anne E Simon; Wojciech Kasprzak; Charles D Schwieters; Bruce A Shapiro; Yun-Xing Wang Journal: Proc Natl Acad Sci U S A Date: 2010-01-07 Impact factor: 11.205
Authors: Darian D Cash; Osnat Cohen-Zontag; Nak-Kyoon Kim; Kinneret Shefer; Yogev Brown; Nikolai B Ulyanov; Yehuda Tzfati; Juli Feigon Journal: Proc Natl Acad Sci U S A Date: 2013-06-17 Impact factor: 11.205
Authors: Stephan H Bernhart; Ivo L Hofacker; Sebastian Will; Andreas R Gruber; Peter F Stadler Journal: BMC Bioinformatics Date: 2008-11-11 Impact factor: 3.169
Authors: Jonathan L Chen; Damian M VanEtten; Matthew A Fountain; Ilyas Yildirim; Matthew D Disney Journal: Biochemistry Date: 2017-06-29 Impact factor: 3.162
Authors: Kyle D Berger; Scott D Kennedy; Susan J Schroeder; Brent M Znosko; Hongying Sun; David H Mathews; Douglas H Turner Journal: Biochemistry Date: 2018-03-23 Impact factor: 3.162