The essential splicing factor Prp24 contains four RNA Recognition Motif (RRM) domains, and functions to anneal U6 and U4 RNAs during spliceosome assembly. Here, we report the structure and characterization of the C-terminal RRM4. This domain adopts a novel non-canonical RRM fold with two additional flanking α-helices that occlude its β-sheet face, forming an occluded RRM (oRRM) domain. The flanking helices form a large electropositive surface. oRRM4 binds to and unwinds the U6 internal stem loop (U6 ISL), a stable helix that must be unwound during U4/U6 assembly. NMR data indicate that the process starts with the terminal base pairs of the helix and proceeds toward the loop. We propose a mechanistic and structural model of Prp24's annealing activity in which oRRM4 functions to destabilize the U6 ISL during U4/U6 assembly.
Proper gene expression in eukaryotes requires production of functional messenger RNA (mRNA). Through the essential process of pre-mRNA splicing, introns are removed from pre-mRNA, and exons are joined together. Many proteins and RNAs are required for splicing. Key to splicing are five small nuclear RNAs (U1, U2, U4, U5 and U6 snRNAs), each of which combines with multiple proteins to form a small nuclear ribonucleoprotein (snRNP) (1). The five snRNPs and additional proteins form a dynamic complex known as the spliceosome that catalyzes splicing. During splicing, many proteins enter or leave the spliceosome, and both the pre-mRNA and the snRNAs undergo essential dynamic rearrangements.
Saccharomyces cerevisiae Prp24 [known as p110 or Sart3 in humans (2)] is an essential component of the U6 snRNP (3–9). For U6 to enter the splicing cycle, its internal stem loop (ISL) must be unwound, and U6 must base pair to U4 RNA to form the U4/U6 di-snRNP (10,11). Extensive evidence demonstrates that Prp24 greatly accelerates annealing of the U4 and U6 RNAs, but the mechanism by which it does so is as yet unclear (3–8,12,13). In addition to Prp24, the U6 snRNP includes a 7-membered ring of Lsm proteins (Lsm2-8) (14). Prp24 binds both the U6 RNA proximal to the ISL and the Lsm ring (5,6,8,15–17). The Lsm ring binds Prp24 and the uridine rich 3′-end of U6 RNA (8,12,18).
Prp24 is a 51 kDa protein containing three known and one predicted RNA recognition motif (RRM) domains (Figure 1A) (3,8,16). RRMs are common protein motifs that typically bind single-stranded RNA along a four stranded antiparallel β-sheet through two conserved RNP motifs enriched in aromatic residues (19,20). Previous work has demonstrated that the N-terminal 40 amino acids of Prp24 are unstructured and that the first three RRMs fold canonically (16). 
The first two RRMs pack together tightly and form a single RNA binding surface, with RRM2 binding U6 RNA canonically and RRM1 binding non-canonically through an electropositive surface (16,21). In contrast, RRM3 makes no stable contacts with RRM2 in solution (21), and its RNA binding behavior is unknown.
Figure 1.
Structured regions in the C-terminal portion of Prp24. (A) Domains in Prp24. Black boxes indicate the three known and one predicted RRMs. The four lower black bars represent the protein constructs used in this study. Non-native sequence is shown as text. (B) NMR T1/T2 relaxation data of 292–444 and oRRM4. The higher variability of the T1/T2 ratios for 292–444 likely results from decreased spectral quality and difficulties in precisely measuring the shorter T2 relaxation times for the larger protein. (C) Cα secondary shift data for 292–444 and oRRM4. The predicted secondary structure is shown across the top. αN and αC indicate the non-canonical helices. (D) Overlay of 1H-15N-HSQC spectra of 292–444 (red) and oRRM4 (blue). The 292–444 spectrum has wider contour lines than the oRRM4 spectrum to enhance its visibility. (E) Overlay of 1H-15N-HSQC spectra of 208–444 (red) and oRRM4 (blue). The 208–444 has been given wider contour lines. The additional oRRM4 peak at 10 ppm arises from the non-native tryptophan residue.
While the extreme C-terminal region of Prp24 contains a conserved Lsm interaction site (referred to as the C-terminal motif or SNFFL box) (8,15), little else is known about the structure or function of residues C-terminal of RRM3. The RNP motifs of the fourth RRM are so divergent from the canonical sequences that it was only identified as an RRM through comparison with fungal homologs and secondary structure prediction (3,8). RRM4 is functionally important, as a triple alanine mutation of RNP1 positions 1, 3 and 5 results in a temperature sensitive phenotype (8), and deletion of RRM4 (residues 317–391) is lethal even when Prp24 is expressed at or above wild-type levels (S.S. Kwan and D.A.B., unpublished data).
Here, we present the structural and functional characterization of the C-terminal region of Prp24. While the extreme C-terminus is unstructured, there is a 108 residue structured region centered on the predicted fourth RRM. 
This domain is a non-canonical RRM in which the expected βαββαβ RRM-fold is flanked by N- and C-terminal α-helices. These α-helices are rigidly bound to the β-sheet face of the RRM fold, forming a domain, which we refer to as an occluded RRM (oRRM). A substantial electropositive surface is present on the solvent exposed sides of the flanking α-helices, which may constitute an alternative RNA binding surface. Indeed, oRRM4 is capable of binding and, unexpectedly, unwinding the U6 ISL in vitro. We present a structural model for the complex between Prp24 RRMs 1–4 and the ISL region of U6 RNA, which suggests a mechanism for the nucleation of U4/U6 pairing.
MATERIALS AND METHODS
Construct design
All constructs were derived from an Escherichia coli-based pET21b expression vector containing Prp24 208–444 prepared as described (15). 292–444, RRM3 and oRRM4 were obtained by truncation through a modification of the QuikChange™ Site-Directed Mutagenesis protocol (22,23). The additional tryptophan in RRM3 and oRRM4 (Figure 1A) was incorporated to facilitate spectrophotometric quantification. Reactions contained 1× Pfu Turbo Buffer (Agilent Technologies), 0.2 mM dNTPs (Agilent Technologies), 2 µM forward or reverse primer (Integrated DNA Technologies), 100 ng template plasmid and 2 U Pfu Turbo DNA polymerase (Agilent Technologies) in a 50 µl volume. Reactions were incubated for 3 min at 94°C in a thermocycler (Eppendorf Mastercycler), followed by four cycles of 1 min at 94°C, 1 min at 52°C and 12 min at 68°C. Twenty-five microliters of each reaction were combined with an additional 2 U Pfu Turbo and subjected to the same thermocycler program, except with 15 cycles and a final 1 h incubation at 68°C. The remaining steps followed the manufacturer’s protocols, using XL2-Blue Ultracompetent cells (Agilent Technologies) for the transformation.
Protein and RNA preparation
All proteins were prepared essentially as described (24), with the following changes. The HisPur cobalt spin column (Pierce Biotechnology) elution buffer used 50 mM potassium phosphate pH 6 instead of 50 mM sodium phosphate pH 7.4. NMR Buffer contained 20 mM potassium phosphate pH 6, 50 mM potassium chloride and 1 mM dithiothreitol. Gel filtration was performed in NMR Buffer, and buffer changes were performed through buffer exchange in a centrifugal filter device (Millipore Amicon, 3 kDa cutoff).
The U6 nucleotide 49–88 and nucleotide 58–97 RNAs were prepared through in vitro transcription using purified His6-tagged T7 RNA polymerase. Synthetic DNA oligonucleotides (Integrated DNA Technologies) containing either the T7 promoter sequence, two non-natural G nucleotides and the appropriate U6 RNA sequence or the reverse complement were used (Supplementary Figure S1A and B). Transcribed RNA was purified using denaturing 10% polyacrylamide gel electrophoresis and identified through UV shadowing. Gel fragments containing RNA were excised, and the RNA was diffused out of the gel into 0.3 M sodium acetate pH 5.0 and precipitated with cold ethanol. RNA was purified by anion exchange chromatography (Bio-Rad Bio-Scale Mini Macro-Prep High Q Cartridge on a BioLogic LP chromatography unit), then buffer exchanged into NMR Buffer.
Fluorescently labeled RNAs were purchased from Dharmacon (U2) or IDT (U4 and U6), and prepared according to the manufacturer’s protocols. U2: 5′-FITC-ACGAAUCUCUUUGCCUUUUGGCUUAGAUCAAGUGUAGUAUCUGUUCUUUUC-3′; U4: 5′-Cy3-AUCCUUAUGCACGGGAAAUUUUGCUGGUU-3′ and U6: 5′-Cy5-AGAGAUGAUCAGCAGUUCCCCUGCAUAAGGAUGAACCGUU-3′.
NMR resonance assignments
All NMR experiments, unless stated otherwise, were acquired at 25°C in 90% NMR Buffer/10% D2O using 600 µM 13C15N labeled protein. Standard 2D and 3D spectra of oRRM4 were acquired as follows: 2D 1H, 15N-HSQC, 3D CBCACONH, 3D NOESY-1H, 15N-HSQC, 3D HNCACB and 3D HNCO on a Varian 900 MHz spectrometer; 3D HBHACONH, 3D CCONH and 3D HCCONH on a Varian 600 MHz spectrometer; 2D aliphatic and aromatic 1H, 13C-HSQC, 2D NOESY (unlabeled protein), 3D HCCH-TOCSY and 3D aliphatic and aromatic NOESY-1H, 13C-HSQC on a Varian 900 MHz spectrometer in 99% deuterated NMR Buffer. Deuterated NMR Buffer was obtained by lyophilizing NMR Buffer to dryness and resuspending in an equal volume of D2O three times.
Resonance assignments of RRM3 and 292–444 were obtained using 2D 1H, 15N-HSQC, 3D CBCACONH, 3D HNCACB, 3D HBHACONH and 3D HNCO spectra, all acquired on a Varian 900 MHz spectrometer in 90% NMR Buffer/10% D2O.
Spectra were processed using nmrPipe (25) and analyzed using Sparky (University of California San Francisco).
NMR relaxation measurements
15N NMR relaxation experiments of 292–444 and oRRM4 were acquired on a 600 MHz Varian spectrometer. Longitudinal (T1) experiments had relaxation delays of 0.1, 0.3, 0.5, 0.7, 0.9 or 1.1 s; transverse (T2) experiments had relaxation delays of 0.01, 0.03, 0.05, 0.07, 0.09 or 0.13 s. Spectra were processed using nmrPipe (25), and relaxation curves fit using the Sparky relaxation peak heights tool.
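The relaxation analysis described above can be illustrated with a minimal Python sketch. It assumes a single-exponential decay of peak height, I(t) = I0·exp(−t/T), and uses a log-linear least-squares fit; both choices are assumptions made for illustration and are not necessarily what the Sparky relaxation tool does. The delay lists are those given in the text, while the peak heights and the function name `fit_relaxation_time` are hypothetical.

```python
import math

def fit_relaxation_time(delays, heights):
    """Estimate T from I(t) = I0 * exp(-t / T) by log-linear least squares.

    Assumption: single-exponential decay to zero, fitted on log(height).
    This is an illustrative stand-in, not the Sparky fitting routine.
    """
    logs = [math.log(h) for h in heights]
    n = len(delays)
    mean_t = sum(delays) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in delays)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays, logs))
    slope = sxy / sxx  # slope of log(I) versus t equals -1 / T
    return -1.0 / slope

# Relaxation delays from the text, in seconds.
T1_DELAYS = [0.1, 0.3, 0.5, 0.7, 0.9, 1.1]
T2_DELAYS = [0.01, 0.03, 0.05, 0.07, 0.09, 0.13]

# Hypothetical, noise-free peak heights for a single residue.
true_t1, true_t2 = 0.8, 0.06
t1_heights = [100.0 * math.exp(-t / true_t1) for t in T1_DELAYS]
t2_heights = [100.0 * math.exp(-t / true_t2) for t in T2_DELAYS]

t1 = fit_relaxation_time(T1_DELAYS, t1_heights)
t2 = fit_relaxation_time(T2_DELAYS, t2_heights)
ratio = t1 / t2  # per-residue T1/T2, the quantity plotted in Figure 1B
```

With real data, a non-linear fit with error estimates would usually be preferable, since the log transform gives too much weight to the noisy, low-intensity points at long delays.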
RNP sequence alignment and statistical analysis
Fifty RRMs of known structure and function were identified from the PDB by searching for ‘RRM’. Their RNP motifs were manually aligned based on sequence and structure (Supplementary Table S1). They were classified as follows: canonical RRMs are known to bind nucleic acid along their β-sheet face, either through presence of nucleic acid in the structure or through other evidence presented in the ‘Primary Citation’ linked to the structure in the PDB (i.e. NMR chemical shift perturbation, mutations, etc.); all other RRMs are non-canonical. Quasi-RRM domains were excluded from the analysis. Thirty-four ‘canonical’ RRMs and 16 ‘non-canonical’ RRMs (including oRRM4) were identified. Each sequence was checked for conservation at every RNP position. Note that tryptophan was accepted at conserved F/Y positions, and methionine at conserved I/L/V positions. The data were used to construct a contingency table for each position, showing the number of canonical and non-canonical RRMs, which either have or do not have sequence conservation at that position. A two-sided Fisher’s exact chi-square test (GraphPad Prism) was used to calculate P-values.
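The per-position comparison can be sketched as a two-sided Fisher's exact test on a 2×2 contingency table. The pure-Python version below is an illustration only: the analysis in the text was run in GraphPad Prism, the function name `fisher_exact_two_sided` is hypothetical, and the example counts are invented (only the row totals of 34 canonical and 16 non-canonical RRMs come from the text).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts for one RNP position (not taken from the paper):
# rows = canonical / non-canonical RRMs,
# columns = consensus residue present / absent.
p_value = fisher_exact_two_sided(30, 4, 5, 11)
```

Repeating the call for each RNP position, each with its own table, would yield the kind of per-position P-values summarized in Figure 4.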
Residual dipolar coupling experiments
RDC values for 13C15N labeled oRRM4 were obtained and analyzed as described (21).
Structure determination
Initial structures of oRRM4 were obtained using the ATNOS-CANDID module of Unio 08 (26–28). Twenty structures were calculated based on the assigned peak list, the 3D NOESY spectra, and dihedral angle and hydrogen bond restraints derived from TALOS+ (29). Ultimately, 1289 NOE distance restraints were obtained. Ninety additional NOE distance restraints were manually identified. NOEs from ATNOS-CANDID were restrained as 1.8 Å lower bound to 0.5 Å above the assigned distance upper bound, while manual NOEs were restrained as 1.8–6 Å. All 20 ATNOS-CANDID structures underwent water refinement using CNS (30) under the HADDOCK 2.0 interface (31,32). As no structure had any distance restraint violation >0.5 Å, all 20 structures were aligned and had their pairwise RMSD calculated using an in-house script. Structure statistics are shown in Table 1. Structure statistics and buried surface area values were obtained from HADDOCK output files, except for RDC Q factors, which were obtained from PALES (33). Ramachandran plot statistics for all 20 structures (based on the PDB validation suite) are: most favored 78.4%, additional allowed 19.7%, generously allowed 1.7% and disallowed 0.2%.
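The in-house script used for the pairwise RMSD calculation is not described, so the following is only a sketch of the final averaging step. It assumes the models have already been superimposed and that each model is a list of (x, y, z) coordinates for the same atoms in the same order; the function names and toy coordinates are hypothetical.

```python
import math
from itertools import combinations

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the two models are already superimposed; no fitting is done here.
    """
    squared = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                  for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def average_pairwise_rmsd(models):
    """Mean RMSD over all unordered pairs of models in the ensemble."""
    pairs = list(combinations(models, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)

# Toy ensemble of three two-atom "models" (hypothetical coordinates, in Å).
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
avg = average_pairwise_rmsd(models)
```

For a 20-model ensemble this averages over 190 pairs; restricting the atom list to backbone or to all heavy atoms would give the two values of the kind reported in Table 1.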
Table 1.
NMR structure determination statistics

NMR distance and angle constraints
  Distance constraints                                   1432
    Total NOE                                            1379
      Intraresidue                                       460
      Interresidue                                       919
        Sequential (|i – j| = 1)                         363
        Medium range (|i – j| < 4)                       210
        Long range (|i – j| > 3)                         346
    Hydrogen bonds                                       53
  Total dihedral angle restraints                        177
    ϕ                                                    92
    ψ                                                    85
  Total residual dipolar coupling restraints             120
    NH                                                   79
    CH                                                   41
Structure statistics
  Violations (mean and SD)
    Distance constraints                                 0 ± 0
    Dihedral angle constraints                           2.8 ± 1.4
    Max. dihedral angle violation (°)                    12.8
    Max. distance constraint violation (Å)               0.34
  Q-factor (%)                                           16 ± 1
  Deviations from idealized geometry
    Bond lengths (Å)                                     0.0064
    Bond angles (°)                                      0.94
    Impropers (°)                                        1.1
  Average pairwise r.m.s. deviation (20 structures) (Å)
    Heavy                                                1.78
    Backbone                                             0.95
The structural model of Prp24 bound to U6 RNA was calculated starting from structures of Prp24 residues 41–399 and U6 nucleotides 49–91. HADDOCK 2.0 (31,32) was used to dock Prp24 and U6. However, HADDOCK does not typically allow the large rearrangements necessary to bring RRM3 and oRRM4 into contact with U6. Therefore, the lowest energy HADDOCK structure was refined in XPLOR-NIH 2.21 (34) to produce the final model. In order to maintain the structures of the individual domains, all distance and dihedral angle restraints from the NMR structures of oRRM4, RRMs 1 and 2, and the extended U6 ISL (BMRB IDs: 17490, 7070 and 6320) were included in the calculations, along with distance restraints generated from the crystal structure of RRM3 (PDB ID: 2GHP). Missing residues from the crystal structure (197–205 and 248–250) were manually added using the ‘build residue’ and ‘sculpting’ functions in PyMol (DeLano Scientific). oRRM4 was added by using the ‘translate’ and ‘create bond’ functions in PyMol to join the NMR structure of oRRM4 to the crystal structure of RRMs 1–3. U6 RNA nucleotide 49–91 was generated as described (21). Intermolecular distance restraints were derived from the previous model of RRMs 1 and 2 bound to U6 RNA (21), chemical shift perturbations on RRM3 induced by U6 RNA (Supplementary Figure S1) and perturbations on the U6 ISL induced by oRRM4 (Figure 5D). Because RRM3 appears to bind the ISL, and oRRM4 was shown to preferentially destabilize the lower ISL, RRM3 was restrained to bind the upper ISL.
Figure 5.
oRRM4 unwinds the U6 ISL. (A) Schematic representation of the U6 RNA construct used in these assays (see also ‘Materials and Methods’ section, Supplementary Figure S2). Colored positions show relative attenuation in (D), and the red star indicates the location of Cy5. (B) Fluorescence anisotropy measurements of binding affinity between oRRM4 and the three RNAs. Calculated apparent Kd values are shown. (C) UV monitored RNA unwinding assay. The arrow indicates a data point collected after the addition of NaCl to 1 M. (D) NMR monitored RNA unwinding assay. Spectra of 10 µM U6 RNA with 0 µM (black), 10 µM (blue) or 20 µM (red) oRRM4.
Structure figures were prepared using PyMol, and electrostatic surfaces were calculated using APBS (35) displaying the solvent accessible surface from −4 kT to 4 kT.
Fluorescence anisotropy RNA binding assay
RNA of 10 nM in Assay Solution (150 mM potassium chloride, 1 mM magnesium chloride, pH 6) was titrated with increasing amounts of oRRM4 as shown in Figure 5B. Fluorescence polarization measurements were taken in a Varian Cary Eclipse spectrofluorimeter fitted with automated polarizers. At each concentration of protein, anisotropy was measured five times using a 10 s averaging time. The mean and standard deviations for three independent experiments were fit using the following one-site equation:
where Yf is the final anisotropy, Yo is the initial anisotropy, X is the protein concentration, Kd is the apparent dissociation constant and n is the Hill Coefficient. Kd and n values were determined through non-linear least squares fitting using GraphPad Prism 4.0. Values for the Hill coefficient were all between 0.9 and 1.0 suggesting no cooperativity. Anisotropy values were normalized to between 0 and ±1.
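The symbols defined above are consistent with a Hill-type one-site binding model. The sketch below assumes the common form Y = Yo + (Yf − Yo)·X^n / (Kd^n + X^n); that exact form is an assumption made here, and the function name `hill_anisotropy` is hypothetical. The Kd of 3.1 µM is the apparent value reported for U6 in the text, and the normalized anisotropy running from 0 to −1 reflects the decrease in anisotropy observed for U6.

```python
def hill_anisotropy(x, y0, yf, kd, n=1.0):
    """Anisotropy at protein concentration x under an assumed Hill model.

    y0: initial anisotropy, yf: final anisotropy,
    kd: apparent dissociation constant, n: Hill coefficient.
    Assumed form: Y = y0 + (yf - y0) * x**n / (kd**n + x**n).
    """
    if x <= 0:
        return y0
    xn = x ** n
    return y0 + (yf - y0) * xn / (kd ** n + xn)

# Normalized anisotropy for a U6-like titration: starts at 0, ends at -1.
KD_U6 = 3.1  # µM, apparent Kd reported in the text for U6
concentrations = [0.0, 1.0, 3.1, 10.0, 30.0]  # µM, hypothetical titration points
curve = [hill_anisotropy(x, 0.0, -1.0, KD_U6) for x in concentrations]
```

Under this form the curve passes through the midpoint between Yo and Yf when X equals Kd, for any value of n. In a real analysis, Kd and n would be obtained by non-linear least squares against the measured anisotropies, as the text describes.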
UV-monitored RNA unwinding assay
All measurements were acquired at 25°C in Assay Solution. U6 RNA of 2 µM was titrated with increasing amounts of protein, as indicated in Figure 5C. For oRRM4 + U6 RNA, the final data point was obtained by adding solid sodium chloride to each sample to bring them to 1 M NaCl. Absorbance at 260 nm was followed in triplicate for the following samples for each protein tested: A1, U6 RNA + protein; A2, Assay Solution + protein (control for protein absorbance); A3, U6 RNA + mock protein addition (control for dilution effects) and A4, Assay Solution + mock protein addition (blank for A3). Mock protein addition entailed addition of an equal volume of Assay Solution instead of protein. All absorbance values were baseline corrected using the no protein value. Changes in absorbance were calculated as:
where the first term reflects the absorbance of U6 in the presence of protein, and the second term reflects the absorbance of U6 in the absence of protein. The range of concentrations tested was limited by the relatively poor solubility of the protein/RNA complex. Denaturing polyacrylamide gel electrophoresis analysis of the RNA before and after addition of protein showed no degradation (data not shown).
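From the sample definitions and the description of the two terms, the correction can be read as ΔA = (A1 − A2) − (A3 − A4). This reading is an assumption, not a quotation of the original expression. The sketch below also assumes that baseline correction means subtracting each series' no-protein (first) reading; the function names and absorbance values are hypothetical.

```python
def baseline_correct(series):
    """Subtract the no-protein (first) reading from every point in a series."""
    return [value - series[0] for value in series]

def delta_absorbance(a1, a2, a3, a4):
    """Assumed form: dA = (A1 - A2) - (A3 - A4), after baseline correction.

    a1: U6 RNA + protein            a2: Assay Solution + protein
    a3: U6 RNA + mock addition      a4: Assay Solution + mock addition
    """
    c1, c2, c3, c4 = (baseline_correct(s) for s in (a1, a2, a3, a4))
    return [(p - q) - (r - s) for p, q, r, s in zip(c1, c2, c3, c4)]

# Hypothetical A260 readings at three titration points (first = no protein).
a1 = [0.50, 0.58, 0.66]
a2 = [0.00, 0.02, 0.04]
a3 = [0.50, 0.49, 0.48]
a4 = [0.00, 0.00, 0.00]
changes = delta_absorbance(a1, a2, a3, a4)
```

In this toy data set, the protein's own absorbance (A2) and the dilution effect (A3 − A4) are both removed, leaving the hyperchromic change attributable to the RNA.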
NMR-monitored RNA unwinding assay
One-dimensional proton NMR spectra with flip-back water suppression were acquired on a 10 µM sample of U6 RNA in 90% NMR Buffer/10% D2O with 0 µM, 10 µM or 20 µM oRRM4 on a Bruker 500 MHz spectrometer at 25°C. Because of the low sample concentration, 6144 scans were necessary (∼3.5 h). To accentuate relative differences and control for signal loss due to partial aggregation, signal intensity was normalized to the peak for nucleotides 77/78, which showed the least absolute change. Spectra were processed and analyzed using TopSpin (Bruker). Resonance assignments were based on BMRB ID: 6320.
Chemical shift perturbation
Chemical shift perturbation of RRM3 was performed as described previously (21) using 167 µM 15N RRM3 in 90% NMR Buffer/10% D2O on a 900 MHz spectrometer at 25°C. For U6 nucleotide 49–88, either 0 µM or 333 µM RNA was used. For U6 nucleotide 58–97 either 0 µM or 15 µM RNA was used. The putative binding site was identified as residues with a change in peak position of >0.08 ppm when U6 nucleotide 49–88 RNA was added.
RESULTS
Structural characterization of the Prp24 C-terminal region
The previously determined crystal structure of Prp24 included residues 1–291, covering the first three RRM domains (16). We investigated the structure of the remaining C-terminal region of Prp24 corresponding to residues 292–444 (Figure 1A). NMR relaxation times were measured in order to identify which regions are structured in solution (Figure 1B). There is a highly structured region overlapping the predicted fourth RRM (residues 302–398, average T1/T2 of 13.3), while the C-terminal tail is unstructured (residues 406-end, average T1/T2 of 3.2). The secondary structure of the C-terminal region was predicted based on Cα secondary shifts (Figure 1C) (36). In addition to a canonical βαββαβ RRM-fold (residues 312–387), additional flanking α-helices are predicted (residues 297–302, 308–311 and 388–397). An α-helix is also predicted in the far C-terminal region (residues 434–442, overlapping the conserved SNFFL box), but subsequent NMR investigation was unable to confirm the existence of this helix (data not shown).
Because the large unstructured C-terminal tail results in extensive spectral overlap, a shorter construct incorporating only the structured regions was studied (oRRM4, Figure 1A). NMR relaxation time measurements showed that the entire sequence, with the exception of the extreme termini, is structured in solution (average T1/T2 of 9.4, Figure 1B). Both the Cα secondary shifts (Figure 1C) and 1H-15N-HSQC spectra (Figure 1D) of oRRM4 are consistent with the larger 292–444 construct, demonstrating that the presence of the C-terminal tail does not significantly affect the structured region. We also investigated potential interdomain contacts between oRRM4 and RRM3, as Prp24's first two RRMs are known to make extensive interdomain contacts (16,21). The chemical shifts of the isolated oRRM4 do not change in a construct containing RRM3 (Figure 1E), indicating that no stable contacts occur between oRRM4 and RRM3.
oRRM4 is a non-canonical RRM
The solution structure of oRRM4 was determined by NMR (Table 1). It displays the βαββαβ RRM fold, but with N-terminal and C-terminal α-helices occluding the β-sheet face (Figure 2A), consistent with the secondary structure predictions (Figure 1C). The solvent exposed faces of the flanking α-helices are rich in basic residues, and therefore could serve as a non-canonical RNA binding site. Together, they form a substantial electropositive surface (Figure 2A). As the opposite side of oRRM4 is neutral to electronegative (Figure 2B), any RNA binding by oRRM4 would likely be through the electropositive patch on the N- and C-terminal α-helices.
Figure 2.
The solution structure of oRRM4. (A) Overlay of the 20 calculated structures, with the core RRM colored dark blue and the N-terminal and C-terminal helices colored light blue. The solvent accessible electrostatic surface is shown below, with blue representing electropositive and red electronegative. (B) As in (A), but rotated 180° about the vertical axis.
The N- and C-terminal α-helices appear to be rigidly attached to the β-sheet face of oRRM4, as evidenced by T1/T2 ratios (Figure 1B). An extensive hydrophobic core forms between the flanking α-helices and the β-sheet face of oRRM4 (Figure 3A and B). The majority of the residues on oRRM4’s β-sheet face that would be exposed in a canonical RRM are hydrophobic (Figure 3C). In addition, the flanking α-helices are amphipathic with a hydrophobic face oriented toward the β-sheet and a hydrophilic face oriented toward solution (Figure 3D and E). Approximately 1000 Å2 of surface area on the β-sheet face is buried by the flanking helices, consistent with an extensive and stable interaction.
Figure 3.
The N- and C-terminal α-helices are rigidly attached to the β-sheet face of oRRM4. (A) Buried surface area between the β-sheet of oRRM4 and the N-terminal helices. The C-terminal helix was removed for clarity. Residues that pack against each other are highlighted with spheres. Atoms are colored as follows: hydrogen white, carbon blue (same shade as the associated domain), oxygen red, nitrogen dark blue and sulfur yellow. (B) As in (A), but showing the C-terminal helix. The N-terminal helices were removed for clarity. (C) Residues on the β-sheet face of oRRM4, which would be solvent exposed in the absence of the flanking helices. (D) Helical wheel representation of the larger N-terminal α-helix. (E) As in (D), but for the C-terminal helix.
The structure of oRRM4 reveals why it lacks all of the canonical RNA binding side chains in the RNP1 (β3) and RNP2 (β1) motifs. The canonical basic and aromatic residues are instead hydrophobic residues that contribute to the binding surface for the flanking α-helices. In contrast, RNP1 and RNP2 motif residues that pack internally to stabilize the overall fold of oRRM4 match the RRM consensus. To see if this conservation pattern is a general feature of non-canonical RRMs, we identified 50 RRMs of known structure and RNA binding activity from the Protein Data Bank (PDB) (Supplementary Table S1), and classified them as canonical (n = 34) or non-canonical (n = 16) based on whether or not they are reported to bind nucleic acid along their β-sheet face. Statistically significant deviations from the consensus identity in the RNP1 and RNP2 motifs of non-canonical RRMs were observed only at the four canonical RNA binding positions (Figure 4).
Figure 4.
Sequence conservation of the RNP1 and RNP2 motifs from 50 RRM domains (see also Supplementary Table S1). oRRM4′s RNP motifs are indicated at the top, with solvent exposed side chains as black text and internal side chains as white text. The canonical RNP sequences are at the bottom: canonical RNA binding residues (red), internal side chains (gray), solvent exposed or off the β-sheet (black). Parentheses indicate residues that are not in the canonical motif, but were counted as canonical in this analysis. RRMs classified as canonical RNA binding are shown as red rectangles; non-canonical as white rectangles. Positions with statistical differences between canonical and non-canonical RRMs based on two-sided Fisher’s exact chi-square test are indicated as *P < 0.05, **P < 0.005, ***P < 0.0005.
The results of our analysis suggest that RRMs that deviate from the consensus at these four positions are unlikely to bind RNA in a canonical fashion. In these cases, the RRM fold is maintained for some other purpose, such as non-canonical RNA binding or a protein interaction. However, such a conclusion is not definitive, since there are examples of RRMs binding RNA canonically despite conserving only one or none of the RNA binding positions [Polypyrimidine Tract Binding Protein RRMs 2–4 (37)], and of RRMs conserving three of the RNA binding positions and not binding RNA canonically [Prp24-RRM1 (16), La-RRM1 (38), Y14 (39) and p14 (40)].
oRRM4 binds RNA and unwinds the U6 ISL
Despite the occlusion of the canonical β-sheet RNA binding site, oRRM4's large electropositive surface suggests that it may be capable of RNA binding. To investigate this, a fluorescence anisotropy assay was used to determine its binding affinity for segments of three different yeast spliceosomal RNAs. RNAs containing the U4/U6 base pairing sequence from U4 (nucleotide 1–18 + 54–64 + 5′-Cy3) and U6 (nucleotide 49–88 + 5′-Cy5) were used (Figure 5A and Supplementary Figure S2). In addition, the region of U2 that base pairs with U6 (nucleotide 1–51 + 5′-fluorescein isothiocyanate) was assayed. oRRM4 is capable of binding all three RNAs with an apparent dissociation constant (Kd) ranging between 3.1 µM for U6 and 7.1 µM for U4 (Figure 5B). The similarity of the Kd values for the three RNAs suggests that RNA binding is primarily non-specific, consistent with oRRM4 binding RNA predominantly through its electropositive surface.
A surprising result of the binding studies is that oRRM4 binding to U6 RNA causes a decrease rather than the expected increase in fluorescence anisotropy (Figure 5B). Our U6 construct forms the U6 ISL (nucleotide 59–88, Figure 5A), and previous work has found that Cy5 can bind nucleic acid helices either at a terminal base pair (41) or in the major groove (42). 
The decrease in anisotropy upon oRRM4 binding indicates that the fluorophore is able to rotate more freely when the protein is bound. There are two possible explanations for this behavior: oRRM4 binding either displaces the fluorophore from the intact U6 ISL, or disrupts the U6 ISL so that the fluorophore can no longer interact with it.
To discriminate between these two possibilities, we assayed the effect of oRRM4 binding on U6 RNA (nucleotide 49–88) secondary structure by monitoring the UV absorbance at 260 nm. While adding bovine serum albumin (BSA) or RRM3 had no effect on the UV absorbance of U6 RNA, adding oRRM4 caused a dramatic increase in UV absorbance (Figure 5C), consistent with unwinding of the ISL. This large change in absorbance is similar to that observed for the heat-induced melting of U6 RNA (data not shown). Adding sodium chloride to 1 M to the oRRM4/U6 complex returns the UV absorbance of the sample to that of folded U6 RNA in the absence of oRRM4 (Figure 5C). This result indicates that the protein-induced destabilization of the ISL is reversible. It is also consistent with an electrostatic interaction between the protein and RNA that is disrupted at high-ionic strength. Attempts at mapping the interaction of RNA to the surface of oRRM4 by NMR chemical shift perturbation were unsuccessful due to the limited solubility of the complex. Although RRM3 is incapable of unwinding the U6 ISL (Figure 5C), chemical shift perturbation data indicate that it is able to bind the ISL (Supplementary Figure S1).
In order to verify destabilization of RNA base pairing upon oRRM4 binding, 1D NMR spectra were obtained of the hydrogen bonded imino proton region of U6 RNA (Figure 5D). As oRRM4 is added to U6, imino peaks corresponding to base paired uridines and guanosines in the lower half of the ISL (63–65, 81 and 86) are attenuated relative to those in the upper half of the ISL (70, 77 and 78) (Figure 5A and D). 
This relative attenuation of a subset of peaks is consistent with oRRM4 destabilizing the lower half of the ISL. While the poor solubility of the oRRM4/U6 complex precludes obtaining NMR spectra of RNA fully bound by protein, the fluorescence anisotropy, UV-monitored hyperchromicity and NMR data all indicate that oRRM4 binding unwinds the U6 ISL in vitro. The unwinding activity could arise from a conformational trapping mechanism in which oRRM4 binds preferentially to a frayed form of the ISL and/or an induced fit mechanism where oRRM4 binding increases the rate of helical breathing.
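The relationship between an apparent Kd and the anisotropy signal in a titration such as that in Figure 5B can be illustrated with a simple single-site model. The following is a minimal sketch only, not the fitting procedure used in this work: it assumes that protein is in excess over labeled RNA, that binding is 1:1, and that fluorescence intensity does not change on binding, so that the observed anisotropy is a population-weighted average. The function names and the free and bound anisotropy values are hypothetical; only the Kd values are taken from the text.

```python
def fraction_bound(protein_uM, kd_uM):
    """Fraction of RNA bound in a 1:1 model with protein in excess."""
    return protein_uM / (kd_uM + protein_uM)


def observed_anisotropy(protein_uM, kd_uM, r_free, r_bound):
    """Population-weighted anisotropy of free and bound RNA."""
    f = fraction_bound(protein_uM, kd_uM)
    return r_free + (r_bound - r_free) * f


# Apparent Kd values reported in the text (µM).
KD_U6 = 3.1
KD_U4 = 7.1

# Hypothetical anisotropy endpoints, chosen only for illustration.
# r_bound < r_free mimics the decrease in anisotropy seen for U6.
R_FREE, R_BOUND = 0.20, 0.15

for conc_uM in (0.0, KD_U6, 10 * KD_U6):
    r = observed_anisotropy(conc_uM, KD_U6, R_FREE, R_BOUND)
    print(f"[oRRM4] = {conc_uM:5.1f} uM -> anisotropy = {r:.4f}")
```

In this model the signal is halfway between its endpoints when the protein concentration equals Kd, regardless of whether anisotropy rises or falls upon binding.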
DISCUSSION
To our knowledge, Prp24-oRRM4 is unique in having both N- and C-terminal flanking α-helices that are stably anchored to the β-sheet face, and which form an electropositive patch suitable for RNA binding. Using the PDBe Fold server (43) plus a manual inspection of PDB structures annotated as RRMs, we were able to identify seven other proteins where RRMs have non-canonical α-helices blocking the β-sheet surface, plus two quasi-RRMs (qRRMs) (Supplementary Figure S3). Eight have only a C-terminal α-helix, while one has only an N-terminal helix. Drosophila melanogaster GW182 (44) and S. cerevisiae Set1-RRM1 (45) were not reported to bind RNA in vitro, and the C-terminal α-helix is fairly neutral in charge. GW182's RRM is thought to interact with protein through a hydrophobic cleft (44), while Set1-RRM1 has an unclear regulatory function (45). Homo sapiens p14 has a rigidly attached C-terminal α-helix involved in protein binding (40). The H. sapiens U1A (46,47) and Cstf-64 (48) proteins have a flexible C-terminal α-helix that can interact with the β-sheet face, but which moves away or unfolds upon RNA binding. Two unpublished structures have flanking helices: PDB ID: 2CQ2 has a short N-terminal helix and PDB ID: 2CPY has a C-terminal helix, but both are fairly neutral in charge. Finally, two of the qRRMs in H. sapiens hnRNP F have C-terminal helices, but they are neither electropositive nor involved in RNA binding (49).

Our data show that RRM3 is capable of binding the U6 ISL (Supplementary Figure S1). Analysis of RRM3 binding to partially randomized libraries of single-stranded RNA using Scaffold Independent Analysis (50) followed by NMR titration experiments indicated that RRM3 binds non-specifically to single-stranded RNA with poor affinity (Kd ∼1 mM) (data not shown).
While some residues on the β-sheet of RRM3 are perturbed in the presence of the U6 ISL, the largest changes occur in the β2–β3 loop (Supplementary Figure S1), showing that the interaction between RRM3 and the ISL is at least partially non-canonical. Interestingly, a previous structure of an RRM bound to double-stranded RNA (H. sapiens RBMY) also showed extensive interactions between the β2–β3 loop and RNA, coupled with a canonical interaction on the β-sheet (51).

We previously proposed that S. cerevisiae Prp24 is composed of two ‘matchmaker domains’, each of which binds a site on U6 RNA through one RRM responsible for sequence-specific recognition and a second RRM, which interacts electrostatically to facilitate unwinding and/or annealing of RNA helices (21). The N-terminal half of Prp24 appears to fit this description: RRM2 binds U6 RNA sequence specifically, positioning a weakly base paired region of U6 near a large electropositive patch on RRM1 (21). Our results are consistent with the C-terminal half of Prp24 also acting as a matchmaker domain. oRRM4 appears to bind RNA through a large electropositive surface, and is capable of destabilizing the U6 ISL, which suggests that it may serve a similar function to RRM1. Although the U6 ISL is an attractive target for oRRM4's destabilization activity, we cannot rule out that it may act on other secondary structure elements in the context of full-length U6 RNA, such as the telestem (nucleotides 29–39 + 92–103 in the most recent model) (7,17) or the previously proposed central stem loop (nucleotides 29–59) (11).

Based on previous data (21) and results presented here, we propose a mechanistic (Figure 6A) and structural (Figure 6B) model for all four domains of Prp24 bound to U6 RNA. This model extends our previous model of RRMs 1 and 2 bound adjacent to the U6 ISL (21).
RRMs 1 and 2 are bound to U6 nucleotides 49–60; RRM2 binds sequence specifically, while RRM1 destabilizes a weakly paired region (nucleotides 54–61 + 86–91) of U6 that extends from the ISL (17,21). RRM3 may provide additional affinity by interacting with the ISL, consistent with chemical shift perturbation studies (Supplementary Figure S1). Finally, the electropositive flanking helices of oRRM4 are positioned at the base of the ISL, where they would be able to disrupt base pairs or capture helical fraying motions. This model shows that the individual domains of Prp24 are physically capable of interacting with U6 RNA in a manner consistent with available data. For example, our model is consistent with previously reported hydroxyl radical footprinting results for an in vitro complex between Prp24 and U6 snRNA (17). The separation between RRMs 2 and 3 in the model is made possible by a flexible 10 residue linker (16); the presence (but not sequence) of this linker appears universally conserved (data not shown).
Figure 6.
Modeling the Prp24-U6 complex. (A) Schematic model showing a potential mechanism by which Prp24 could recognize and unwind the U6 ISL. (B) Structural model of a complex between Prp24 and U6 RNA (nucleotides 49–91). Colors as in (A), except U6 is now green and the electropositive helices on oRRM4 are light blue. Yellow indicates sites of U6 A62G suppressor mutations (7). This complex corresponds to the final stage shown in (A).
U6-A62G is a cold-sensitive mutation thought to act through hyperstabilizing the U6 ISL (11). A screen for spontaneous suppressors of A62G found mutations in U6, U4 and RRMs 2 and 3 of Prp24 (7,11). When we map the locations of the A62G suppressor mutations in Prp24 onto our structural model, we find that they are all located near the U6 RNA (Figure 6B). This finding suggests that the source of suppression may be a modulation of Prp24's RNA binding activity.

Beyond demonstrating a potential mechanism by which Prp24 helps anneal U4 and U6, our model suggests a mechanism by which Prp24 is released from the U4/U6 di-snRNP. The helices in the U4/U6 di-snRNA complex, once formed, are predicted to have a substantially more favorable free energy than the U6 ISL (−37.9 versus −7.0 kcal/mol, respectively, at 1 M NaCl, 37°C) (52,53). If RRM1 and oRRM4 bind preferentially to single-stranded or frayed regions of RNA, the increased stability of the U4/U6 complex would eliminate their binding sites. Coupled with the loss of the RRM3 binding site due to ISL unwinding, this would result in release of Prp24. Furthermore, the model makes testable predictions related to the orientation of protein–RNA contacts, and may help guide further investigations into the mechanism of Prp24-mediated assembly of the U4/U6 di-snRNP.
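To give a sense of scale for the predicted free energy difference between the U4/U6 helices and the U6 ISL, the difference can be converted into a Boltzmann factor. This is an illustrative calculation only: it treats the two predicted free energies as directly comparable two-state values and ignores the fact that U4/U6 pairing is bimolecular and therefore concentration dependent. The variable names are hypothetical; the free energies and temperature are those quoted in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
T_KELVIN = 310.15     # 37 °C, as quoted in the text

# Predicted free energies from the text (kcal/mol, 1 M NaCl, 37 °C).
DG_U4_U6 = -37.9
DG_U6_ISL = -7.0

# Difference in stability between the U4/U6 helices and the U6 ISL.
ddg = DG_U4_U6 - DG_U6_ISL

# Boltzmann factor corresponding to that difference.
boltzmann_ratio = math.exp(-ddg / (R_KCAL * T_KELVIN))

print(f"ddG = {ddg:.1f} kcal/mol")
print(f"exp(-ddG/RT) = {boltzmann_ratio:.2e}")
```

Under these simplifying assumptions, the difference of about 31 kcal/mol corresponds to a very large factor, in line with the expectation that frayed or single-stranded binding sites would be rare once the U4/U6 helices have formed.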
ACCESSION NUMBERS
The structure of oRRM4 has been deposited to the Protein Data Bank (PDB ID: 2L9W) and to the BioMagResBank (BMRB ID: 17490). Backbone resonance assignments for 292–444 and RRM3 have been deposited to the BioMagResBank (BMRB IDs: 17491 and 17589, respectively).
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
The National Institutes of Health (NIH) (grant number GM065166); NIH pre-doctoral training grant (grant number GM007215 to S.M.-T.); William R. and Dorothy E. Sullivan Wisconsin Distinguished Graduate Fellowship to S.M.-T. This study made use of the National Magnetic Resonance Facility at Madison, which is supported by the NIH (grant numbers P41RR02301 and P41GM66326). Additional NMRFAM equipment was purchased with funds from the University of Wisconsin, the NIH (grant numbers RR02781 and RR08438), the National Science Foundation (grant numbers DMB-8415048, OIA-9977486 and BIR-9214394) and the United States Department of Agriculture. Funding for open access charge: Grant number R01 GM065166 (to S.E.B. and D.A.B.).
Conflict of interest statement. None declared.