
Chemical shifts-based similarity restraints improve accuracy of RNA structures determined via NMR.

Abstract

Determination of structure of RNA via NMR is complicated in large part by the lack of a precise parameterization linking the observed chemical shifts to the underlying geometric parameters. In contrast to proteins, where numerous high-resolution crystal structures serve as coordinate templates for this mapping, such models are rarely available for smaller oligonucleotides accessible via NMR, or they exhibit crystal packing and counter-ion binding artifacts that prevent their use for the chemical shifts analysis. On the other hand, NMR-determined structures of RNA often are not solved at the density of restraints required to precisely define the variable degrees of freedom. In this study we sidestep the problems of direct parameterization of the RNA chemical shifts/structure relationship and examine the effects of imposing local fragmental coordinate similarity restraints based on similarities of the experimental secondary ribose 13C/1H chemical shifts instead. The effect of such chemical shift similarity (CSS) restraints on the structural accuracy is assessed via residual dipolar coupling (RDC)-based cross-validation. Improvements in the coordinate accuracy are observed for all of the six RNA constructs considered here as test cases, which argues for routine inclusion of these terms during NMR-based oligonucleotide structure determination. Such accuracy improvements are expected to facilitate derivation of the chemical shift/structure relationships for RNA.

Entities: Chemical

Keywords: RNA structure; accuracy; chemical shifts; cross-validation

Year: 2020 PMID: 32917774 PMCID: PMC7668244 DOI: 10.1261/rna.074617.119

Source DB: PubMed Journal: RNA ISSN: 1355-8382 Impact factor: 4.942

INTRODUCTION

Biomolecular NMR plays an important role in structural studies of oligonucleotides, accounting for ∼40% of RNA-only models in the Protein Data Bank (PDB; Berman et al. 2000). However, determination of RNA structure via NMR presents a number of challenges compared to proteins. One of the major contributors is the large number of variable torsion angles (α, β, γ, δ, ε, ζ, χ) per nucleotide compared to the (ϕ, ψ) pair for each amino acid. Another is that commonly used interproton distance restraints are less effective for the often elongated oligonucleotide geometries. In protein NMR, structure definition challenges were ameliorated via tight restraints on backbone dihedral angles derived from the experimental chemical shifts, in combination with precise orientational restraints such as residual dipolar couplings (RDCs; Tolman et al. 1995; Tjandra and Bax 1997). This approach led to an increase in coordinate accuracy while also extending applicability to larger proteins (Raman et al. 2010). The caveat of the procedure is the necessity of establishing a precise relationship between the experimental chemical shifts and three-dimensional coordinates. This challenge has been addressed via knowledge-based analyses of chemical shifts from proximal sites, taking into account both structural and primary sequence similarities, relying on extensive databases of high-resolution protein crystal structures and curated complete sets of 1H/13C/15N chemical shifts. Modern software tools are capable of backbone torsion angle prediction with precision and accuracy approaching 10° (Berjanski et al. 2006; Cheung et al. 2010; Shen and Bax 2013; Hafsa et al. 2015), with such restraints now nearly universal for NMR protein structure determination. As chemical shifts represent one of the most readily accessible and precisely defined NMR observables, similar developments would be advantageous for oligonucleotides.
Coordinate-based prediction of the RNA/DNA 1H, 13C and 15N chemical shifts is indeed achievable via quantum chemistry tools (Fonville et al. 2012; Sahakyan and Vendruscolo 2013; Suardiaz et al. 2013; Swails et al. 2015; Jin et al. 2016). In complement, a number of semiempirical approaches have been developed using additive contributions of ring currents, local magnetic anisotropies, and induced electric fields (Wijmenga et al. 1997; Sahakyan and Vendruscolo 2013; Suardiaz et al. 2013), with applications for both validation and refinement of RNA structures (Frank et al. 2013a; van der Werf et al. 2013; Sripakdeevong et al. 2014). Prediction of the RNA 1H and 13C chemical shifts from structure is also possible via machine learning empirical methods (Frank et al. 2013b, 2014), with model validation applications. Increases in the amount of RNA NMR data also led to development of empirical chemical shift prediction tools based solely on the primary sequence and the secondary structure (Barton et al. 2013; Brown et al. 2015), finding use for validation and automation of NMR resonance assignments (Aeschbacher et al. 2013).

Chemical shifts remain underutilized in NMR structural studies of RNA

In light of the considerable body of work aimed at interpretation of the nucleic acids’ chemical shifts, it could seem surprising that they were almost never used quantitatively for determining RNA structure. To a degree, the challenges of inverting the structure/chemical shift relationship for RNA reflect the high density of aromatic rings in nucleic acids and the widespread occurrence of chemical shift referencing imperfections (Aeschbacher et al. 2012). Ultimately, the main challenge is a severe shortage of accurate structural models that could be used for chemical shifts analysis. Out of approximately 240 RNA-only chemical shift data sets in the Biological Magnetic Resonance Bank (BMRB) (Ulrich et al. 2008), fewer than 10 can be associated with crystal structures in the PDB, which limits the analysis to models determined via solution NMR. However, even though approximately 200 such correspondences can be established from the BMRB and PDB databases, a majority of those do not exhibit restraint density sufficient for precise specification of the variable torsion angles, with average respective restraint uncertainties of ∼30°. As a result, empirical RNA structure-based chemical shift parameterizations so far have been optimized from sparse sets of only 20–30 structures (Cromsigt et al. 2001; Frank et al. 2013b, 2014), compared to the case of proteins with primary databases of up to 600 crystal structures and booster data sets containing approximately 9500 additional models (Shen and Bax 2010). In consequence, limited accuracies of modeling both the ring current effects for the 1H shifts and the torsion angle dependencies of the 13C and 31P chemical shifts have been noted as factors adversely impacting their use for structural analysis (Frank et al. 2013a; Brown et al. 2015; Swails et al. 2015).

Information contained in the NMR chemical shifts of RNA can be processed directly if converted to structural similarity restraints

Considering the challenges of linking NMR chemical shifts with RNA structure, increasing the coordinate accuracy of the NMR-determined RNA models, and thus expanding the set of reliable chemical shift/structure pairs, is an issue of utmost importance. Accuracy of the NMR-determined structures can be readily assessed via RDCs due to their steep dependence on the orientations of the respective internuclear vectors (Simon et al. 2005). For the structures refined against the RDC data, cross-validation statistics averaged over subsets of RDCs excluded from the refinement (Clore and Garrett 1999) can be monitored via free Q- or R-factor metrics, analogous to the free R-factors used in crystallography (Brünger 1992), allowing reliable discrimination between closely related models (Chen and Tjandra 2011). We explore the idea of directly exploiting patterns of chemical shift similarities to improve RNA structural accuracy independently of any preestablished structure/chemical shifts relationships. Specifically, we refine RNA structures applying pseudoenergy terms that enforce agreement between the internal coordinates of local fragments exhibiting similar 13C/1H ribose chemical shifts, in essence lowering the number of the model's degrees of freedom according to the observed chemical shifts. The accuracy of the resulting models can then be gauged against reference calculations in which such terms are absent. The notion of a link between the similarity of the oligonucleotide chemical shifts and the related structural parameters was originally expressed in the NMR analysis of the Dickerson dodecamer B-DNA (Tjandra et al. 2000), and subsequently used in a study of an A-form RNA helix (PDB deposition 2GBH; O'Neil-Cabello et al. 2004). This logic is also the driving force behind TALOS programs for protein chemical shifts-based torsion angle prediction (Cornilescu et al. 1999).
Noncrystallographic symmetry (NCS) restraint terms that can enforce fragmental similarities are commonly used for NMR refinement of multimeric assemblies. However, they can be configured in a more flexible manner after removal of source code safeguards restricting them to matching primary sequences. We set out to determine whether introduction of the chemical shift-based fragment similarity restraints produces consistent favorable effects on the RNA structural accuracy. As this approach does not involve direct refinement against chemical shifts data, the resulting models would remain perfectly appropriate for subsequent derivations of the chemical shifts/structure relationships. To establish a set of test cases, we analyzed all of the approximately 90 NMR-determined structures of RNA in the PDB that include ribose C–H RDCs among the deposited restraints, selecting those associated with close-to-complete sets of 13C and 1H ribose chemical shifts in the BMRB. Our search yielded six systems: 36-nucleotide (nt) stem–loop SL1 of HIV-1 (PDB code 1N8X; Lawrence et al. 2003), 23-nt stem–loop from Rous sarcoma virus (PDB code 1S34; Cabello-Villegas et al. 2004), 32-nt stem–loop U6 from S. cerevisiae (PDB code 1XHP; Sashital et al. 2004), 14-nt cUUCGg tetraloop from 16S rRNA (PDB code 2KOC; Nozinovic et al. 2009), 20-nt U2 snRNA stem I from S. cerevisiae (PDB code 2O33; Sashital et al. 2007), and the complex between the 16-nt HIV TAR RNA and a 16-nt aptamer (PDB code 2RN1; Van Melckebeke et al. 2008). These cases represent typical NMR-accessible size ranges while also sampling RDC restraint densities from 0.7 RDCs/nt for 1N8X to 4.5 RDCs/nt for 2RN1, aiming to cover the majority of modern NMR-determined structures of RNA. Structure refinement against the chemical shift data as done here is applicable to the majority of NMR studies of RNA and is thus expected to become a useful addition to the RNA structural biology toolbox.

RESULTS AND DISCUSSION

Analysis of the database-deposited RNA chemical shifts reveals nucleotide type dependencies and allows corrections for referencing and assignment imperfections

Chemical shifts-based structure analysis commonly involves corrections for the residue type dependence, assuming additive contributions of the chemical bonding and the tertiary structure effects. Such “secondary” chemical shifts are naturally formulated for RNA setting the canonical A-form as the reference state. Therefore, we have determined the average 1H/13C ribose chemical shift values for each of the four common RNA nucleotide types from the BMRB-deposited data. Our analysis was restricted to the A-form Watson–Crick paired bases similarly flanked on both sides. In agreement with earlier assessments (Aeschbacher et al. 2012), we find ∼40% of the RNA-only chemical shift data sets in the BMRB misreferenced in the carbon dimension. In line with earlier recommendations, we have corrected the 13C data using the 5′-GG and 3′-CC ribose marker chemical shift patterns when such corrections were consistent within 10%. We have also swapped the 2′/3′ assignments for a small fraction of sites that exhibited C2′/C3′ resonance flips. For the RNA H5′/H5″ chemical shifts reported in the BMRB, only 30% correspond to the resolved stereo-specifically assigned pairs. This subset forms two clusters in the 3.9–4.7 parts per million (p.p.m.) range, indicating possible stereo-misassignments (Supplemental Fig. S1). Approximately two thirds of such resonance pairs with 1H shifts differing by more than 0.35 p.p.m. constitute the dominant downfield/upfield H5′/H5″ cluster. In light of this likely stereo-ambiguity, the analyses were carried out for the downfield and upfield 5′ proton signals instead. Chemical shifts data for the A/G/C/U nucleotides selected according to the above criteria were further subjected to two cycles of removal of outliers lying more than three standard deviations from the mean. The resulting accumulated statistics are reported in Table 1. As expected, substantial nucleotide type dependence is exhibited by the 1′ sites, with approximately 1 p.p.m. and 0.3 p.p.m. differences between the C1′ and H1′ shifts, respectively, in purines versus pyrimidines.
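The outlier removal and the nucleotide-type correction described above can be sketched as follows. This is a minimal illustration, assuming that each cycle recomputes the mean and standard deviation from the values kept so far; the function names and the synthetic shift values are hypothetical and are not taken from Table 1.

```python
import statistics

def clip_outliers(values, n_sigma=3.0, cycles=2):
    """Drop values lying more than n_sigma standard deviations from the mean.

    The mean and standard deviation are recomputed in every cycle
    (an assumption about how the two cycles were applied).
    """
    kept = list(values)
    for _ in range(cycles):
        if len(kept) < 2:
            break  # a standard deviation needs at least two points
        mean = statistics.mean(kept)
        sd = statistics.stdev(kept)
        kept = [v for v in kept if abs(v - mean) <= n_sigma * sd]
    return kept

def secondary_shift(observed, reference_mean):
    """Secondary shift: observed shift minus the A-form average
    for the same nucleotide type and atom (both in p.p.m.)."""
    return observed - reference_mean

# Synthetic example: 21 values centered on 90 p.p.m. plus one clear outlier.
synthetic = [90.0 + 0.1 * i for i in range(-10, 11)] + [99.0]
cleaned = clip_outliers(synthetic)
reference = statistics.mean(cleaned)
```

With these synthetic values the first cycle removes the 99.0 p.p.m. point and the second cycle leaves the remaining 21 values unchanged.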

TABLE 1.

Average chemical shifts in p.p.m. for the A-form A/G/C/U nucleotides

Patterns of the C5′ chemical shifts support differences in strengths of the purine and the pyrimidine C–H…O5′ hydrogen bonds in A-RNA

Dependence of the C/H shifts on the nucleotide type gradually decreases from the 1′ to the 4′ sites, consistent with the increasing distances to the nucleobases and the diminishing chemical bonding and ring current effects. Surprisingly, C5′ resonances exhibit purine/pyrimidine differences nearly equaling those observed for the C1′ sites, while H5′/H5″ chemical shifts show little nucleotide type dependence. We propose that this phenomenon reflects shorter and stronger intranucleotide C6-H6…O5′-C5′ C–H/O hydrogen bonds for the A-RNA pyrimidines compared to the C8-H8…O5′-C5′ hydrogen bonds in purines. NMR evidence for such purine/pyrimidine differences was previously suggested based on the marked differences between the chemical shift tensors for the C8 and C6 base sites in A-RNA versus B-DNA (Ying et al. 2006). Shortening of the pyrimidine H6…O5′ distances in A-RNA compared to the purine H8…O5′ is apparent from the analysis of crystal structures as well, becoming more pronounced with increasing resolution of the diffraction data (Supplemental Fig. S2). Therefore, our observation of C5′ purine/pyrimidine chemical shift differences can be considered supporting evidence of the systematic differences in the corresponding C–H…O hydrogen bonds observable via the O5′-C5′ sites.

Analysis of the nucleotide-type corrected chemical shifts for the six tested RNA constructs produces significant numbers of novel fragmental similarity restraints

Raw ribose 13C/1H chemical shifts for the six tested RNA constructs were corrected for the nucleotide type dependence as specified in Table 1. Pairwise similarities of the resulting secondary chemical shift vectors were calculated according to Equation 1 with the α parameter set to 1.0, placing emphasis on carbon data's decreased dependence on the interresidue ring currents. Nucleotide-type corrected chemical shift similarity maps for the tested RNA constructs are shown in Figure 1. The active NCS restraints were selected as those below the CSS threshold of 0.2 p.p.m., compared to the maximum observed CSS values of 0.8 p.p.m.–3.3 p.p.m. The majority of the CSS-paired sites with similar local geometries involve canonical Watson–Crick base-paired nucleotides (Fig. 2). Overall, selected sets of similarity restraints include a small fraction of all possible nucleotide pairs in the sequences, ranging between 2% and 7%.
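The selection step can be illustrated with a short sketch. It assumes that pairwise CSS scores are already available as a mapping from nucleotide pairs to values in p.p.m.; the function names and the toy scores below are hypothetical, and only the 0.2 p.p.m. threshold comes from the text.

```python
def select_restraint_pairs(css_scores, threshold=0.2):
    """Return the nucleotide pairs whose CSS score is below the threshold.

    css_scores maps (i, j) tuples with i < j to CSS values in p.p.m.
    """
    return sorted(pair for pair, score in css_scores.items() if score < threshold)

def fraction_of_possible_pairs(n_selected, n_nucleotides):
    """Selected pairs as a fraction of all unordered nucleotide pairs."""
    n_possible = n_nucleotides * (n_nucleotides - 1) // 2
    return n_selected / n_possible

# Toy scores for a five-nucleotide fragment (illustrative values only).
toy_scores = {
    (1, 2): 0.09, (1, 3): 0.85, (1, 4): 0.31, (1, 5): 1.40,
    (2, 3): 0.77, (2, 4): 0.15, (2, 5): 2.10,
    (3, 4): 0.66, (3, 5): 0.52, (4, 5): 3.30,
}
active = select_restraint_pairs(toy_scores)
fraction = fraction_of_possible_pairs(len(active), 5)
```

In this toy case two of the ten possible pairs fall below the threshold.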

FIGURE 1.

Internucleotide CSS maps for the six tested RNA constructs. Progression of colors from red to blue corresponds to the CSS values increasing from 0.07 p.p.m. to 0.40 p.p.m. Nucleotide ID numbers match those in the corresponding PDB depositions.

FIGURE 2.

Schematic representation of the location of the CSS-related nucleotides within the tertiary structures of the six tested constructs. Lowest-scoring CSS scores corresponding to the active NCS restraints are highlighted in red.

Using the structure of the UUCG tetraloop as an example (PDB entry 2KOC), the following nucleotides are paired up based on the similarity of the secondary chemical shifts (see Supplemental Fig. S3): 3/4, 3/11, 3/12, 3/13, 4/11, 4/12, 4/13, 11/13, and 12/13. These pairings enforce close geometric similarities within a cluster of nucleotides including 3, 4, 11, 12, and 13 (see Fig. 2; Supplemental Fig. S3). Compared to the ≈0.1 p.p.m. r.m.s. deviations between the secondary chemical shifts within this cluster, all of the remaining nucleotides exhibit unique patterns of chemical shifts differing by as much as 3.3 p.p.m. r.m.s. As they are determined by the chemical connectivity, these patterns need to be subtracted from the chemical shifts data in order to reveal the structural effects we are aiming to capture.
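How pairwise matches combine into a single cluster can be shown with a short sketch. It treats the 2KOC pairs listed above as edges of a graph and collects the connected nucleotides; the function name and the union-find approach are illustrative choices, not the procedure used in the study.

```python
def cluster_pairs(pairs):
    """Group nucleotides linked directly or indirectly by pairwise restraints."""
    parent = {}

    def find(x):
        # Follow parent links to the representative, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())

# Nucleotide pairs quoted in the text for the UUCG tetraloop (PDB entry 2KOC).
pairs_2koc = [(3, 4), (3, 11), (3, 12), (3, 13), (4, 11),
              (4, 12), (4, 13), (11, 13), (12, 13)]
clusters = cluster_pairs(pairs_2koc)
```

For these nine pairs the result is one cluster containing nucleotides 3, 4, 11, 12, and 13, matching the cluster named in the text.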

Reference calculations excluding chemical shift similarity restraints suggest variation of structural accuracy depending on the density of NMR restraints and the extent of conformation dynamics

The initial rounds of reference calculations against the distance and the torsion angle restraints starting from the PDB-deposited coordinates for the six test cases produced structures exhibiting poor agreement with the excluded RDCs (Q-factors of 0.7–1.4). Such results are generally representative of the NOE-based models (Bax and Grishaev 2005), and demonstrate that the preparatory simulation stages accomplish their purpose. Refinements including RDC restraints for the reference sets of calculations without the chemical shifts-based NCS terms resulted in structures exhibiting lower free Q-factors ranging between 0.78 for the loop domain of 1S34 and 0.37 for 2RN1 (Table 2). High average Qfree for the 1S34 loop relative to the respective value of 0.59 for the stem, as well as the twofold difference between their Qfit values are consistent with previous findings of the loop's increased conformational dynamics (Cabello-Villegas et al. 2004), suggesting inadequacy of the single-structure representation for this part of the construct. The difference between the eigenvalues of the alignment tensors for the stem and loop domains of 1S34, with the respective magnitudes and rhombicities (Da/R) of −16.8 Hz/0.47 and −13.1 Hz/0.55, is consistent with conformational variability of the interfacial flip-out base of A7. Aside from the 1S34 entry, cross-validation statistics of the reference NCS-free structure calculations appear fairly uniform with the free Q-factors ranging from 0.37 to 0.52. Among the two best cross-validating entries, 2KOC and 2RN1, the former is characterized by an exceptional density of high-precision torsion angle restraints derived from an extensive set of cross-correlated spin relaxation rates and homo- and hetero-nuclear scalar couplings, while the latter exhibits the highest density of the fitted 1-bond C–H RDCs (4.2 per nt) of all tested cases. 
These observations suggest that RNA coordinate accuracy is correlated with the ratio of the number of NMR observables to the number of the degrees of freedom necessary to describe the structural model.
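For orientation, a Q-factor of the kind quoted here is commonly computed as the r.m.s. difference between observed and back-calculated RDCs divided by the r.m.s. of the observed RDCs. The sketch below assumes that definition; other normalizations exist, and the one used for Table 2 is not specified in this section. Function names and numerical values are illustrative only.

```python
import math

def q_factor(observed, calculated):
    """Q = sqrt(sum((D_obs - D_calc)^2) / sum(D_obs^2)); RDCs in Hz."""
    if len(observed) != len(calculated) or not observed:
        raise ValueError("observed and calculated RDC lists must be non-empty and equal in length")
    numerator = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    denominator = sum(o ** 2 for o in observed)
    return math.sqrt(numerator / denominator)

def average_free_q(held_out_sets):
    """Average Q over subsets of RDCs that were excluded from refinement.

    held_out_sets is a list of (observed, calculated) pairs, one per excluded
    subset, with 'calculated' taken from the structure refined without that subset.
    """
    q_values = [q_factor(obs, calc) for obs, calc in held_out_sets]
    return sum(q_values) / len(q_values)

def relative_decrease(q_reference, q_restrained):
    """Fractional improvement of a restrained run over the reference run."""
    return (q_reference - q_restrained) / q_reference

# Illustrative values only, not data from the study.
q_example = q_factor([10.0, -10.0], [8.0, -8.0])
```

Under this definition a structure that predicts all RDCs as zero gives Q = 1, and values above 1 are possible, which is consistent with the range of 0.7–1.4 reported for the NOE-only models.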

TABLE 2.

RDC fit and cross-validation statistics for the tested RNA constructs

Introduction of the chemical shifts-based similarity restraints significantly improves structural accuracy as measured via residual dipolar coupling cross-validation

Except for the loop domain of the 1S34 entry, introduction of the CSS-based restraints brings substantial improvement in cross-validation statistics with relative decreases in Qfree ranging between 14% and 38%, and ending with the free Q-factors varying between 0.26 and 0.44 (Table 2; Fig. 3). Negligible cross-validation improvement for the 1S34 loop domain likely reflects low density of CSS-based restraints acting upon it (one out of six total). In addition to the previously noted well-cross-validating 2KOC and 2RN1 constructs which, with NCS restraints included, exhibit free Q-factors of 0.26 and 0.32, respectively, 1N8X and 1XHP now also cross-validate with low Qfree values of 0.34. Significant accuracy improvements in these cases are likely aided by particularly high numbers of close chemical shift matches and the corresponding active CSS restraints (see Table 3). Application of the CSS-derived pairwise NCS terms leads to effectively identical geometries for the corresponding nucleotide pairs (coordinate r.m.s.d. values below 0.01 Å). Superimposed structures calculated with and without CSS restraints are shown in Supplemental Figure S5.

FIGURE 3.

The impact of the CSS-based restraints on the RDC cross-validation statistics (average free Q-factors) for the tested RNA constructs.

TABLE 3.

Data summary for the test systems considered in this study

Comparison of the empirical validation parameters for the structures refined with and without the CSS-derived geometric similarity terms (Supplemental Table S3) indicates that the predominant effect of enforcing the CSS lies in improvement of the fraction of nucleotide “suites” within well-recognized rotameric states (Murray et al. 2003; Williams et al. 2018). Clash statistics remain largely unchanged, dominated by the specifications of the nonbonded force field and the overall quality of the experimental distance restraints. Similarly, no systematic changes are observed for the coordinate precision.

CSS restraints act to limit the number of the degrees of freedom in the RNA structural model in agreement with the experimental chemical shifts

Improvement of the RNA coordinate accuracy with the introduction of the fragment similarity restraints suggests that restriction of the models’ degrees of freedom in this manner reflects real responses of chemical shifts to the structural changes. These observations are consistent with early analyses of the effect of NCS restraints on the structural accuracy of medium-resolution protein crystal structures (Kleywegt and Jones 1995). Application of the chemical shift-based similarity restraints is straightforward and comes with no additional experimental or computational costs, as any NMR-based structure analysis implies chemical shift availability. Moreover, poor ribose chemical shift dispersion that generally complicates formulation of the NMR restraints becomes an advantage with this approach, as it increases the number of enforceable close chemical shift matches. Since the NCS-based restraints involve similarities rather than the specific chemical shift values, the resulting structures remain appropriate for studying chemical shifts/torsion angles relationships, simply representing more accurate variants of the original models. As the procedure used for obtaining the similarity restraints compares chemical shifts within the same spectral data set, it is unaffected by any possible imperfections of the absolute chemical shift referencing or pulse sequence effects, again facilitating its routine use. The price for this simplicity of application is the necessarily limited amount of information associated with such chemical shift processing, when compared to direct mapping of the chemical shifts to the torsion angles currently achievable for proteins.

Further improvements of structural accuracy are likely achievable with expansion and continued curation of the BMRB, the public database of RNA chemical shifts

Interpretation of the RNA chemical shifts through the proximal torsion angles as done here neglects the impact of the remote nucleobases. Earlier studies concluded that such effects are generally small for the ribose carbon sites (Frank et al. 2013b), while dependences of the ribose 1H shifts on types of the neighboring nucleotides are certainly possible (Barton et al. 2013). Indeed, ribose H1′ and H2″ sites (PDB nomenclature) are positioned closest to the bases of their 3′ or 5′ neighbors in A-RNA. However, inspection of the sites included in the CSS-based restraints reveals that their H1′ atoms are generally located 5–6 Å away from the centers of the nearest aromatic rings of their 5′ neighbors, relatively far considering the r^−3 distance dependence of the ring current terms. This observation is consistent with earlier findings of independence of the H1′ shifts on the nature of both 3′ and 5′ neighboring bases in A-RNA (Cromsigt et al. 2001). On the other hand, H2″ sites in the NCS-restrained residues are positioned ∼3 Å from the centers of their 3′ neighboring nucleobases and could be impacted by the corresponding ring currents. Earlier studies, however, have not identified a clear correlation of the H2″ shifts with the nature of the 3′ neighbors in A-RNA (Cromsigt et al. 2001; Barton et al. 2013). Fine-tuning of the CSS terms to suppress these effects should be possible via additional corrections of the reference state chemical shifts for the nearest neighbor effects. As demonstrated by the results for the 1N8X and 1XHP entries, increases in the density of the CSS restraints likely bring additional improvements in the structural accuracy. The general idea of the approach outlined here offers ample potential for further development, such as fine-tuning of the contributions of the individual carbon and proton sites within the CSS score.
We also propose that the CSS-based restraints may be applicable even between independently refined constructs, similar to the manner in which they were used here for the two domains of 1S34. Refinement of a cluster of RNA structures linked by the CSS restraints would be a novel concept that has not been previously considered in biomolecular NMR. A significant advantage of this approach is that it could bring additional increases in the density of the active CSS-based restraints proportionally to the number of structures within the co-refined set. Since the CSS-based strategy relies on detection of close chemical shift matches, development of the interconstruct fragmental similarity restraints would require establishing precise chemical shift referencing for the separate entries, as well as corrections for small effects brought by pulse sequence variations (Aeschbacher et al. 2012; Brinson et al. 2019). For the six test cases considered here, ribose 1H chemical shifts are consistent with BMRB statistics. In contrast, two of the studied constructs, 1N8X and 1XHP, exhibit sizeable shifts in ribose 13C referencing, in agreement with earlier findings (Aeschbacher et al. 2012; Frank et al. 2013b). For the 1XHP entry, the offset appears correctable as the five ribose carbon resonance types are shifted by similar amounts (2.6 p.p.m.–1.9 p.p.m.). In contrast, resonance offsets of the five ribose carbon types for 1N8X are not consistent as they range between 6.2 p.p.m. and 13.7 p.p.m., thus precluding data re-referencing. This issue does not impact restraint generation as carried out in this study but would need to be addressed before any joint analysis of the chemical shifts data can be performed. The strategy outlined here does not appear directly extendable to the 13C/1H chemical shifts of the nucleobases due to a complex interplay of the effects of the nearby aromatic ring systems.
However, structural accuracy improvements resulting from application of the ribose CSS terms coupled with better descriptions of the base-base and base-backbone interactions may facilitate better empirical parameterization of the ring currents, opening up a way toward more routine use of the nucleobase chemical shifts in RNA structure determination.

Future avenues for improvement include better characterization of lower-populated conformers

Even with the improved structural accuracy brought by the CSS restraints, derivation of the torsion angle/chemical shifts mapping for RNA is expected to be challenging. Variation of the torsion angles within the main A-form rotamer that encompasses 75% of the nucleotides is limited (r.m.s. of 8° or less for the main chain torsions and 3° or less for the ring torsions in sub-2.7 Å resolution RNA crystal structures in the PDB). Such tight definition of the major rotameric cluster (3′emmtp3′ in the δi−1/εi−1/ζi−1/αi/βi/γi/δi suite notation of Murray et al. 2003) would require very high precision of chemical shift modeling in order to yield nontrivial structural restraints; however, this analysis may be aided by the large fraction of data included in it. Ribose 1H/13C chemical shifts are likely to be more informative for separating the discrete RNA backbone rotamers, in which case the main challenge is the low population of such states relative to the dominant A-form. The small number of suitable RNA constructs considered in this study does not allow for making conclusions regarding the mapping of the chemical shifts to structural parameters; however, we can evaluate sensitivity of the secondary chemical shifts to variation of the local structure. Analysis of the ribose secondary 13C shifts for the two best-cross-validating constructs (2KOC and 2RN1) reveals that one-third of the outliers constitute 3′- or 5′- terminal nucleotides, associated with chain termination effects and likely increased conformational dynamics (Cromsigt et al. 2001). Canonical A-form nucleotides with ribose secondary 13C chemical shifts near zero exhibit tight clustering with standard deviations from 0.1 p.p.m. to 0.6 p.p.m. In order to investigate structural origins of the remaining outliers, we have carried out structure refinement with active CSS terms while fitting all experimental NMR restraints for the 2KOC and 2RN1 constructs, followed by the torsion angle analysis. 
For the 2KOC construct, nonterminal ribose secondary chemical shift outliers are observed for nucleotides U7–G10 (Supplemental Fig. S3), corresponding to the last three nucleotides of the UUCG tetraloop and the 3′-flanking Watson–Crick base-paired G10. Considering the tight definition of the respective backbone torsion angles (Supplemental Table S1), chemical shift deviations from the values typical for the A-form RNA reflect the 3′emttp2′, 2′emmtp2′, and 2′epptt3′ rotamers of the three torsion angle suites within the tetraloop (Murray et al. 2003). C2′-endo pucker of the nucleotides U7 and C8 is thus associated with upfield 4–5 p.p.m. shifts for the C1′, downfield 5–8 p.p.m. shifts for the C3′, and downfield 2–5 p.p.m. shifts for the C4′ resonances. C2′ resonances are least affected by the unusual rotamers of the UUCG tetraloop, with 2 p.p.m. downfield secondary shifts for C8 and G9, while C5′ resonances exhibit fairly uniform downfield secondary shifts of 3–4 p.p.m. for U7–G10. These outlier 13C chemical shifts are also expected to reflect altered base stacking and unusual glycosidic χ torsions, suggesting that ribose 13C resonance positions respond to RNA backbone rotamer switches in a complex and coordinated manner, which would likely preclude simple analysis of the isolated torsion angles based on small contiguous clusters of local chemical shifts. For the 2RN1 construct, nonterminal secondary ribose 13C chemical shift outliers are observed for the nucleotides C5–U7, G20–C23, and G27 (Supplemental Fig. S4). Out of these, stretches C6–U7 and G21–C23 make up the domain interface of the TAR-TAR*GA kissing complex. The original report of the 2RN1 structure (Van Melckebeke et al. 2008) highlighted the unusual C4′-exo pucker for A11 and G21, as well as the C2′-endo pucker for G20, with the rest of the nonterminal sites confined to the common C3′-endo conformation.
Analysis of the structure bundles generated with the CSS restraints indicates that, rather than corresponding to the unusual but precisely determined pucker states, nucleotides C5–C6, A11–G12, G21–U22, and A28–C29 exhibit greatly increased backbone torsion angle uncertainties (Supplemental Table S2). Inspection of the corresponding torsion angle restraints reveals that all of these sites are associated with either completely unrestrained α, β, and γ torsions, or restraint uncertainties of approximately ±100° (G6, G12, G20–U22, A28–C29). Compared to the remaining ±30° torsion angle restraints representing the canonical A-RNA conformation, these omissions were meant to allow greater conformational freedom for both the noncanonical G21–A28 base pair and the interface between the TAR and TAR*GA domains. However, the resulting decreased structural precision near these sites complicates the analysis of the nearby unusual chemical shifts. On the other hand, structure/chemical shifts analysis for 2RN1 highlights two discrepancies that appear readily addressable. Nucleotide U7 belongs to the canonical A-form suite 3′emmtp3′, inconsistent with its extreme ∼12 p.p.m. downfield C4′ resonance and warranting further investigation. Similarly, nucleotide G12, while poorly defined structurally as stated above, displays ribose chemical shifts entirely consistent with the canonical A-form rotamer 3′emmtp3′. These examples highlight benefits attainable with closer integration of the RNA chemical shifts data in structure refinement.

Concluding remarks

We propose routine use of the chemical shift-based fragment similarity restraints described here in all structural studies of RNA using solution NMR data. Such applications are best performed while monitoring cross-validation statistics, which is most straightforward when RDC restraints are available. Keeping in mind that one of the payoffs of better accuracy of the NMR-derived RNA models is the possibility of improved NMR structure-based mapping between the torsion angles and the chemical shifts, further applications can be selected from the approximately 90 structures of RNA constructs in the PDB that include RDCs as deposited restraints. Such a study would require access to the corresponding data sets of chemical shifts, ∼40% of which have not been deposited to the BMRB, and a joint effort of the RNA NMR community. The strategy outlined in this study readily lends itself to a number of extensions outlined above, paving the way toward better integration of experimental ribose and nucleobase chemical shifts in RNA structure determination.

MATERIALS AND METHODS

Ribose 13C and 1H chemical shifts corresponding to the PDB depositions 1N8X, 1S34, 1XHP, 2KOC, 2RN1, and 2O33 were obtained from the BMRB database (entries 5773, 6062, 6320, 5705, 11014, and 15080, respectively). RNA-only data entries deposited to the BMRB were also analyzed to construct the nucleotide-type specific chemical shift distributions for the ribose H1′, H2′, H3′, H4′, H5′, H5″, C1′, C2′, C3′, C4′, and C5′ sites based on the data recorded at temperatures between 15°C and 35°C. In total, the assembled set included 537 A, 781 G, 577 C, and 548 U nucleotides from the 78 entries listed in the Supplemental Material. Raw ribose chemical shifts for the six tested RNA constructs were corrected for the residual systematic differences between the nucleotide types calculated from the BMRB-extracted statistics. The agreement between the vectors of such secondary proton and carbon ribose chemical shifts was quantified for all nucleotide pairs via the chemical shift similarity (CSS) index. In its definition, N1H and N13C are the numbers of 1H and 13C ribose chemical shifts of the same type reported for both nucleotides within a given pair, Δδ values denote the differences between the respective secondary chemical shifts, and the factor α balances the impact of the 1H and 13C data. Nucleotides at the 3′ and 5′ chain ends were excluded from the CSS index evaluation due to differences in chemical bonding and base stacking, as well as likely increased conformational dynamics that would preclude their representation by a single conformation. In cases when H5′ and H5″ resonances were reported as unambiguously assigned, their chemical shifts were used as deposited. Otherwise, the shifts were swapped where necessary so that H5′ was the downfield and H5″ the upfield resonance. Nucleotide pairs with the lowest CSS scores were selected for application of the respective similarity restraints via the NCS terms. 
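The bookkeeping in this paragraph can be illustrated with a minimal Python sketch. The functional form of the CSS index is not reproduced in this text, so the root-mean-square form below, with 13C differences scaled by α, is an assumption chosen only to match the stated ingredients (N1H, N13C, Δδ, α). The reference shifts, the default α value, and all function names are hypothetical.

```python
import math

# Hypothetical mean shifts (p.p.m.) per nucleotide type. Real values would come
# from the BMRB-derived distributions described in the text.
REFERENCE = {
    "G": {"H1'": 5.7, "C1'": 92.0},
    "C": {"H1'": 5.5, "C1'": 93.5},
}


def order_h5_pair(shifts, unambiguous=False):
    """Return a copy with H5' downfield of H5'' unless the pair is assigned."""
    out = dict(shifts)
    if not unambiguous and "H5'" in out and "H5''" in out:
        downfield = max(out["H5'"], out["H5''"])  # larger p.p.m. value
        upfield = min(out["H5'"], out["H5''"])
        out["H5'"], out["H5''"] = downfield, upfield
    return out


def secondary_shifts(nuc_type, shifts, reference=REFERENCE):
    """Subtract the nucleotide-type reference shift from each raw shift."""
    ref = reference[nuc_type]
    return {atom: value - ref[atom] for atom, value in shifts.items() if atom in ref}


def css_index(sec_a, sec_b, alpha=0.25):
    """ASSUMED form: RMS difference over shared shifts, 13C terms scaled by alpha."""
    shared = sorted(set(sec_a) & set(sec_b))  # shifts reported for both nucleotides
    if not shared:
        raise ValueError("no shared chemical shifts for this nucleotide pair")
    total = 0.0
    for atom in shared:
        diff = sec_a[atom] - sec_b[atom]
        if atom.startswith("C"):  # carbon shifts span a wider range than protons
            diff *= alpha
        total += diff * diff
    return math.sqrt(total / len(shared))  # len(shared) plays the role of N1H + N13C


g = secondary_shifts("G", {"H1'": 5.9, "C1'": 93.0})
c = secondary_shifts("C", {"H1'": 5.5, "C1'": 93.5})
print(css_index(g, c))
```

Under this assumed form a lower score means more similar ribose shifts, which is consistent with the selection of the lowest-scoring pairs for the similarity restraints.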
RNA structure refinements were carried out with the experimental NMR distance, torsion angle, and RDC restraints associated with the PDB depositions 1N8X, 1S34, 1XHP, 2KOC, 2RN1, and 2O33. The most precise 1-bond C–H and N–H RDC restraints were split into multiple exhaustive sets of fitted and cross-validated couplings (Table 3). In the cases when the experimental RDCs included less precisely determined coupling types, the corresponding restraints were fitted but not cross-validated, with the corresponding fit force constants scaled down to reflect their higher uncertainties. Such examples included 1-bond base and ribose C–C RDCs for the 1S34 entry, 1-bond C–C and 2-bond C–H base RDCs for the 2O33 entry, and lower-precision 1-bond C–H RDCs for the 2RN1 entry (see Table 3 for details). Flexible linkage of the stem and loop domains previously noted for the 1S34 entry (Cabello-Villegas et al. 2004) necessitated refinement with separate alignment tensors fitted to the two domains and resulted in their independently evaluated cross-validation statistics. Xplor-NIH (Schwieters et al. 2003, 2006) and CNS (Brünger et al. 1998) software packages were used for all structure refinements with source code modifications allowing for (i) the continuous SVD fit of the fully variable 5-parameter alignment tensors during the refinement against the RDC restraints, and (ii) the application of the NCS restraints between nucleotides of differing types. The empirical force field used in the structure refinement included harmonic terms for bonds, angles, and chirality-specifying improper dihedrals, as well as quartic repulsive-only terms representing nonbonded interactions with the atomic radii multiplier of 0.9 and the force constant of 4 kcal Å−4 M−1. Simulated annealing protocols included 500 psec of variable-step torsion angle molecular dynamics with the temperature linearly ramped down from 1000 K to 1 K, followed by 100 steps of Powell energy minimization. 
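The splitting of the RDCs into exhaustive fitted and cross-validated sets can be sketched as follows. This is an illustration only: the interleaved assignment of couplings to sets and the function name are assumptions, and the actual set compositions are those given in Table 3.

```python
def exhaustive_splits(rdcs, n_sets):
    """Yield (fitted, cross_validated) lists of couplings.

    Every coupling is left out of the fit (cross-validated) exactly once,
    so the cross-validated subsets together cover the whole data set.
    """
    if not 1 < n_sets <= len(rdcs):
        raise ValueError("n_sets must be between 2 and the number of couplings")
    for k in range(n_sets):
        free = rdcs[k::n_sets]  # every n_sets-th coupling, starting at index k
        fitted = [r for i, r in enumerate(rdcs) if i % n_sets != k]
        yield fitted, free


# Ten hypothetical coupling labels split into five fitted/cross-validated sets.
couplings = [f"rdc{i}" for i in range(10)]
for fitted, free in exhaustive_splits(couplings, 5):
    print(len(fitted), "fitted;", "cross-validated:", free)
```

Each refinement would then use one fitted subset as active restraints and reserve the complementary subset for evaluating the free Q-factor.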
The master RDC force constants for the most precise 1-bond C–H RDCs were ramped up geometrically during simulated annealing starting from 0.01 kcal Hz−2 M−1 to the final values that matched RDC fit qualities for the respective PDB-deposited structures. NCS force constants for each nucleotide pair exhibiting lowest CSS scores were set inversely proportional to the corresponding CSS values, enforcing closer geometric agreement for the pairs with more similar chemical shifts. In all cases, the original structures deposited into the PDB were first subjected to 1-nsec rounds of simulated annealing that only included distance and torsion angle restraints with the temperature linearly decreased from 1000 K to 1 K. This stage was necessary to ensure removal of the residual effect of the fitted RDCs that could otherwise favorably bias cross-validation statistics. The resulting structures served as inputs for the simulated annealing refinements against multiple sets of fitted RDCs, performed either with or without the CSS-based NCS terms. In the cases where complete sets of 1′–5′ ribose 1H/13C chemical shifts were reported for a given nucleotide pair, NCS restraint terms between purines and pyrimidines were applied to the entire ribose rings, C5′, O5′, and P atoms, or to the sets of atoms additionally including base C4/N3/C8/H8 or C2/O2/C6/H6 for the purine/purine or the pyrimidine/pyrimidine pairs, respectively. The former cases corresponded to the similarity restraints acting on the backbone torsion angles only, while the latter additionally included the glycosidic χ torsions, capturing the effects of the nucleobase orientation on the ribose 1H/13C chemical shifts. In cases when C5′/H5′/H5″ chemical shifts were not reported for at least one of the paired nucleotides, P atoms were removed from the corresponding NCS terms. 
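The two force-constant rules stated at the start of this paragraph can be sketched in a few lines of Python. The number of ramp stages, the proportionality constant `scale`, the number of retained pairs, and the function names are hypothetical, since the text does not specify them.

```python
def geometric_ramp(k_start, k_final, n_stages):
    """Force constants that grow by a constant factor from k_start to k_final."""
    if n_stages < 2 or k_start <= 0 or k_final <= 0:
        raise ValueError("need at least two stages and positive force constants")
    ratio = (k_final / k_start) ** (1.0 / (n_stages - 1))
    return [k_start * ratio ** i for i in range(n_stages)]


def ncs_force_constants(css_by_pair, scale=1.0, n_lowest=5):
    """Keep the n_lowest CSS pairs and weight each as scale / CSS."""
    ranked = sorted(css_by_pair.items(), key=lambda item: item[1])[:n_lowest]
    if any(css <= 0 for _, css in ranked):
        raise ValueError("CSS values must be positive to take the inverse")
    return {pair: scale / css for pair, css in ranked}


# RDC force constant ramped from 0.01 to a hypothetical final value of 1.0.
print(geometric_ramp(0.01, 1.0, 5))

# Hypothetical CSS scores for three nucleotide pairs; the two lowest are kept.
scores = {("G2", "G10"): 0.5, ("C3", "C9"): 0.25, ("U4", "A8"): 2.0}
print(ncs_force_constants(scores, n_lowest=2))
```

With inverse weighting, the pair with the most similar chemical shifts (lowest CSS) receives the largest NCS force constant, as described in the text.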
When neither C5′/H5′/H5″ nor C4′/H4′ chemical shift matches could be evaluated for a given nucleotide pair, both O5′ and P atoms were removed from the NCS term. The above-defined NCS term atom selections are illustrated in Supplemental Figure S6. RDC values in each cross-validated set were predicted from the final structures using the alignment tensors determined from the fitted RDC set. Cross-validation statistics are reported via free Q-factors. The numerator of the Q-factor expression contains the sum over the cross-validated (inactive) set of RDCs with the predicted values calculated from the alignment tensor fitted to the active RDCs, and the denominator contains the corresponding magnitude Dafit and rhombicity Rfit. Fitted and free Q-factor statistics are reported as the averages and the uncertainties of the averages over the multiple sets of fitted/cross-validated RDCs used for each entry. Supplemental Material includes the list of RNA entries from BMRB used for extraction of the chemical shift statistics, distributions of the backbone torsion angles in the entries 2KOC and 2RN1 with the structures refined against the CSS-based terms and all RDC data fitted, structure statistics for the models refined with and without the CSS terms (Supplemental Tables S1, S2, S3, respectively), clustering of the stereo-specifically assigned H5′ and H5″ chemical shifts from the BMRB statistics (Supplemental Fig. S1), distributions of the nucleobase/backbone C–H…O hydrogen bond donor/acceptor distances in the crystal structures of RNA in the PDB (Supplemental Fig. S2), the secondary chemical shifts of the ribose carbon sites in the 2KOC and 2RN1 entries (Supplemental Figs. S3, S4, respectively), overlays of the structures calculated with and without the CSS-based terms (Supplemental Fig. S5), and illustration of the atom sets used for specification of the NCS terms (Supplemental Fig. S6).
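A minimal sketch of the free Q-factor evaluation is given below. The exact expression is not reproduced in this text, so the code assumes the commonly used normalization in which the r.m.s. difference between observed and predicted couplings is divided by sqrt(Da² (4 + 3R²) / 5). It also assumes that all couplings are of one type or have been scaled to a common type. Function names and numerical values are hypothetical.

```python
import math
import statistics


def q_factor(observed, predicted, da_fit, r_fit):
    """ASSUMED form: rms(D_obs - D_pred) / sqrt(Da^2 * (4 + 3 R^2) / 5)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equally sized, non-empty lists of couplings")
    mean_sq = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    norm_sq = da_fit ** 2 * (4.0 + 3.0 * r_fit ** 2) / 5.0
    return math.sqrt(mean_sq / norm_sq)


def mean_and_uncertainty(values):
    """Average and standard error of the average over cross-validation sets."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical cross-validated couplings (Hz) and fitted tensor parameters.
q_free = q_factor([10.0, -5.0, 3.0], [8.5, -4.0, 3.5], da_fit=10.0, r_fit=0.3)
print(q_free)
print(mean_and_uncertainty([0.20, 0.30, 0.40]))
```

For a free Q-factor, `observed` would hold only the couplings left out of the fit, while `da_fit` and `r_fit` would come from the tensor fitted to the active couplings.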

44 in total

1. Prediction of proton chemical shifts in RNA. Their use in structure refinement and validation.

Authors: J A Cromsigt; C W Hilbers; S S Wijmenga
Journal: J Biomol NMR Date: 2001-09 Impact factor: 2.835

Review 2. The use of residual dipolar coupling in studying proteins by NMR.

Authors: Kang Chen; Nico Tjandra
Journal: Top Curr Chem Date: 2012

3. Estimating the accuracy of protein structures using residual dipolar couplings.

Authors: Katya Simon; Jun Xu; Chinpal Kim; Nikolai R Skrynnikov
Journal: J Biomol NMR Date: 2005-10 Impact factor: 2.835

4. A geometrical parametrization of C1'-C5' RNA ribose chemical shifts calculated by density functional theory.

Authors: Reynier Suardíaz; Aleksandr B Sahakyan; Michele Vendruscolo
Journal: J Chem Phys Date: 2013-07-21 Impact factor: 3.488

5. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures.

Authors: A T Brünger
Journal: Nature Date: 1992-01-30 Impact factor: 49.962

6. Analysis of the contributions of ring current and electric field effects to the chemical shifts of RNA bases.

Authors: Aleksandr B Sahakyan; Michele Vendruscolo
Journal: J Phys Chem B Date: 2013-02-11 Impact factor: 2.991

7. MolProbity: More and better reference data for improved all-atom structure validation.

Authors: Christopher J Williams; Jeffrey J Headd; Nigel W Moriarty; Michael G Prisant; Lizbeth L Videau; Lindsay N Deis; Vishal Verma; Daniel A Keedy; Bradley J Hintze; Vincent B Chen; Swati Jain; Steven M Lewis; W Bryan Arendall; Jack Snoeyink; Paul D Adams; Simon C Lovell; Jane S Richardson; David C Richardson
Journal: Protein Sci Date: 2017-11-27 Impact factor: 6.725

8. Structure and thermodynamics of a conserved U2 snRNA domain from yeast and human.

Authors: Dipali G Sashital; Vincenzo Venditti; Cortney G Angers; Gabriel Cornilescu; Samuel E Butcher
Journal: RNA Date: 2007-01-22 Impact factor: 4.942

9. PREDITOR: a web server for predicting protein torsion angle restraints.

Authors: Mark V Berjanskii; Stephen Neal; David S Wishart
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971

10. BioMagResBank.

Authors: Eldon L Ulrich; Hideo Akutsu; Jurgen F Doreleijers; Yoko Harano; Yannis E Ioannidis; Jundong Lin; Miron Livny; Steve Mading; Dimitri Maziuk; Zachary Miller; Eiichi Nakatani; Christopher F Schulte; David E Tolmie; R Kent Wenger; Hongyang Yao; John L Markley
Journal: Nucleic Acids Res Date: 2007-11-04 Impact factor: 16.971