
Increased versatility despite reduced molecular complexity: evolution, structure and function of metazoan splicing factor PRPF39.

Francesca De Bortoli¹, Alexander Neumann¹, Ana Kotte¹, Bernd Timmermann², Thomas Schüler³, Markus C Wahl^4,5, Bernhard Loll⁴, Florian Heyd¹.

Abstract

In the yeast U1 snRNP the Prp39/Prp42 heterodimer is essential for early steps of spliceosome assembly. In metazoans no Prp42 ortholog exists, raising the question how the heterodimer is functionally substituted. Here we present the crystal structure of murine PRPF39, which forms a homodimer. Structure-guided point mutations disrupt dimer formation and inhibit splicing, manifesting the homodimer as functional unit. PRPF39 expression is controlled by NMD-inducing alternative splicing in mice and human, suggesting a role in adapting splicing efficiency to cell type specific requirements. A phylogenetic analysis reveals coevolution of shortened U1 snRNA and the absence of Prp42, which correlates with overall splicing complexity in different fungi. While current models correlate the diversity of spliceosomal proteins with splicing complexity, our study highlights a contrary case. We find that organisms with higher splicing complexity have substituted the Prp39/Prp42 heterodimer with a PRPF39 homodimer.

Year: 2019 PMID: 30949712 PMCID: PMC6582350 DOI: 10.1093/nar/gkz243

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Splicing is an essential step in precursor messenger RNA (pre-mRNA) processing. It is catalyzed by the spliceosome, a multi-megadalton protein RNA complex (1). This elaborate molecular machine consists of five core components, the U1, U2, U4, U5 and U6 snRNPs, and multiple auxiliary proteins, which are assembled in a stepwise manner for each splicing event on a pre-mRNA substrate. It is crucial that splice site (ss) recognition is consistently accurate, as a single mistake can result in the production of a nonfunctional protein and in some cases in disease. For 5′ss recognition in the early steps of splicing, the U1 snRNP plays a critical role. During this step a single stranded region of the U1 snRNA base pairs with the 5′ss of the pre-mRNA (2). Structural work on the spliceosomal machinery has mainly been conducted in the human or Saccharomyces cerevisiae (referred to as yeast in the following) systems. The yeast system more readily yields stable complexes for structural analysis and serves as a good model for spliceosomes from higher eukaryotes. The human spliceosome is more complex than the spliceosome in yeast, including around 80 additional, mostly non-snRNP, proteins (3,4), that are considered to yield increased functional versatility such as alternative splicing. However, the U1 snRNP seems to represent an exception. The yeast U1 snRNA alone is already ∼3.5 times larger than its human counterpart (5,6). Purified human U1 snRNP contains merely ten proteins (seven Sm proteins, hU1-70K, hU1A, and hU1C). In contrast, seven additional, stably associated proteins (yLuc7, yNam8, yPrp39, yPrp40, yPrp42, ySnu56 and ySnu71) can be found in purified yeast U1 snRNPs. Among the yeast U1 proteins, yPrp42 and ySnu56 have no known human orthologs (7,8). Earlier studies in yeast showed that yPrp39 associates with the U1 snRNP and is necessary for pre-mRNA splicing (9). 
A cryo-electron microscopy (cryo-EM) study of the yeast U1 snRNP (10) showed that the core components of the U1 snRNP are connected to its auxiliary proteins through a yPrp39/yPrp42 heterodimer. Both yPrp39 and yPrp42 consist mainly of half-a-tetratricopeptide repeats (HAT repeats). HAT-repeat proteins are mostly involved in protein-protein and protein-RNA interactions and act as scaffolding proteins. They participate in various RNA processing pathways, prominent examples being SART3, which is involved in splicing and translesion DNA synthesis (11) and RNA14, which is important for mRNA 3′-end maturation (12). Furthermore, a cryo-EM structure of a yeast pre-catalytic pre-B complex (13) revealed that the contact between the U1 and U2 snRNP is largely mediated by the yPrp42/yPrp39 heterodimer. In the pre-B complex, yPrp39 contacts the U2-specific protein yLea1 (human homolog U2A’) via its N-terminal HAT domain (HAT-NTD), and a positively charged groove in the HAT-NTD of yPrp42 accommodates the U2 snRNA. Moreover, a yeast pre-spliceosome structure (14) provided novel insights into the early spliceosome assembly steps, showing two interfaces between the U1 snRNP and the U2 snRNP. The more stable interaction is mediated by yPrp39 and yLea1. These findings indicate that the yPrp39/yPrp42 heterodimer is crucial for the relative spatial positioning of the U1 and U2 snRNPs during early spliceosome assembly. Together, these studies revealed a central role of the yeast yPrp39/yPrp42 heterodimer as a crucial scaffolding complex involved in protein-protein and protein-RNA interactions at different stages of the splicing cycle. Notably, in human there is no Prp42 ortholog, leading to the question how the yPrp39/yPrp42-based protein/RNA interactions are established in the human spliceosome. 
Furthermore, surprisingly little is known about Prp39 in higher eukaryotes, although the protein has been demonstrated to be essential in diverse human cell lines in independent CRISPR-screens (15–17), indicating a similar importance in metazoan species as in yeast. A first indication how the loss of Prp42 may be compensated in mammals came from a recent study. Negative stain EM data revealed the shape of hPRPF39 to resemble the yPrp39/yPrp42 heterodimer, and co-immunoprecipitation experiments suggested that human PRPF39 (hPRPF39) forms a homodimer (10). Murine PRPF39 (mPRPF39, 94% amino acid identity with hPRPF39; Supplementary Figure S1) initially came to our attention because it is alternatively spliced in murine naïve versus memory T-cells. Given the essential role of Prp39 in spliceosome assembly and to further address the question how the loss of Prp42 is compensated in metazoans, we solved the crystal structure of mPRPF39. Structural and biophysical data reveal mPRPF39 to be arranged as a homodimer that can fulfill the function of the yPrp42/yPrp39 heterodimer. Based on our structure, we investigated the effect of point mutations that interfere with dimerization and we show that monomeric mPRPF39 reduces splicing efficiency in vitro. Furthermore, a phylogenetic analysis suggests a coevolutionary reduction in U1 snRNA length and the loss of Prp42 homologs. Organisms that use a PRPF39 homodimer show higher splicing complexity, providing an example for how reduced diversity of spliceosomal proteins can increase splicing versatility.

MATERIALS AND METHODS

Cloning

Open reading frames (ORFs) encoding mPRPF39 were amplified from a murine cDNA sample and cloned into the pGEX-6P-1 (or pCMV-GFP-N3 or pCMV-Flag-N3) vector using the restriction sites BamHI (XhoI) at the 5′-end and SalI (BamHI) at the 3′-end. Point mutations were introduced according to the QuikChange protocol (Agilent). The pGEX-6P-1 vector guides the production of N-terminally glutathione S-transferase (GST)-tagged, PreScission-cleavable fusion proteins. Constructs were confirmed by sequencing. Nucleotide sequences of all primers used in this study are summarized in Supplementary Table S1.

Protein expression and purification

Proteins bearing an N-terminal, PreScission-cleavable GST-tag were produced in Escherichia coli BL21 pLys cells in LB-medium overnight at 18°C after induction at an OD600 of ∼0.6 with 0.5 mM IPTG. The following steps were performed at 4°C. Cells were resuspended in solubilization buffer (50 mM Tris/HCl, pH 8.4, 1 mM DTT) and lysed by sonication. Cell debris was separated from the soluble fraction by centrifugation for 45 min at 55 900 × g in an Avanti J-26 XP centrifuge (Beckman Coulter). Target proteins were captured on glutathione agarose (Macherey-Nagel) and washed with solubilization buffer. The GST-tag was cleaved on the column with PreScission protease (1:50) overnight, or for 4 h in the case of the mutant mPRPF39 variants. The flow-through was collected and loaded on a 1 ml Q column (GE Healthcare). The protein was eluted with a salt gradient from 0 mM to 500 mM followed by concentration and size exclusion chromatography (SEC) in solubilization buffer using a Superdex 200 column (GE Healthcare). Peak fractions were analyzed by SDS-PAGE. Fractions containing the target proteins were pooled, concentrated, and shock-frozen in liquid nitrogen. For production of selenomethionine-derivatized protein, transformed E. coli Rosetta2 DE3 cells were cultured in defined medium containing selenomethionine (18), and at an OD600 of ∼0.6 protein expression was induced by addition of 0.5 mM IPTG. Purification was carried out as described above for wild type mPRPF39 (mPRPF39wt), except that all buffers contained 2 mM DTT. Integrity of the purified proteins was confirmed by MALDI-TOF mass spectrometry.

Crystallographic procedures

Crystals of the native protein were obtained by the sitting-drop vapor-diffusion method at 4°C with a reservoir solution composed of 0.1 M Bis–Tris–propane/HCl, pH 6.5, 1.8 M sodium acetate. Crystals of selenomethionine-labeled protein were obtained by the hanging-drop method at 18°C with a reservoir solution composed of 0.1 M Bis–Tris–propane/HCl, pH 6.5, 1.6 M sodium acetate, 0.75 mM TCEP. Crystals were cryo-protected with a solution composed of 80% mother liquor and 20% (v/v) propylene glycol and subsequently flash-cooled in liquid nitrogen. Synchrotron diffraction data were collected at the MX beamlines of BESSY II (Berlin, Germany) and at beamline P14-2 of PETRA III (DESY Hamburg, Germany). Diffraction data were processed with XDS (Table 1) (19). Experimental phases were determined by the selenomethionine single-wavelength anomalous diffraction method with the AUTOSOL routine in PHENIX (20) using PHASER (21) and SOLVE/RESOLVE (22), using diffraction data collected from a selenomethionine-labeled mPRPF39 crystal (Table 1). An initial, partial model of mPRPF39 was built with the program AUTOSOL in PHENIX (20) and completed by cycles of maximum-likelihood restrained refinement using PHENIX (23,24), including TLS refinement (25), and manual model building with COOT (26). Model quality was evaluated with MolProbity (27) and the JCSG validation server (28). Secondary structure elements were assigned with DSSP (29), and ALSCRIPT (30) was used for displaying sequence alignments generated by ClustalOmega (31). Structure figures were prepared using PyMOL.

Table 1.

Crystallographic data collection and model refinement statistics

Data collection	SeMet	Native
Wavelength [Å]	0.9763	0.9184
Temperature [K]	100	100
Space group	C2	C2
Unit cell parameters
a, b, c [Å]; β [°]	189.2, 73.0, 206.7; 112.4	189.5, 72.8, 207.1; 112.5
Resolution range [Å]^a	50.00–3.80 (3.90–3.80)	50.00–3.30 (3.45–3.30)
Reflections^a
Unique	49 231 (3671)	39 170 (4809)
Completeness [%]	98.1 (97.6)	98.5 (97.9)
Multiplicity	9.9 (9.9)	4.7 (4.8)
Data quality^a
Intensity [I/σ(I)]	11.8 (1.5)	9.4 (0.9)
R_meas [%]	17.7 (168.3)	14.9 (260.3)
CC_1/2	99.9 (83.7)	99.8 (63.7)
Wilson B value [Å²]	120.0	102.1
Number of selenium atoms	25	-
FOM	0.28	-
BAYES-CC	2.9	-
Refinement
Resolution range [Å]^a		50.00–3.30 (3.42–3.30)
Reflections^a		38 901 (3770)
R_work^a		0.246 (0.443)
R_free^a		0.293 (0.443)
Contents of an asymmetric unit
Residues, Atoms		1019, 8592
Mean B-factor [Å²]		154.5
RMSD from target geometry
Bond lengths [Å]		0.005
Bond angles [°]		1.03
Validation statistics
Ramachandran plot
Residues in allowed regions [%]		4.6
Residues in favored regions [%]		95.7
Molprobity score^b		2.1
Molprobity Clashscore ^b,c		16.2

^a Data for the highest resolution shell in parentheses.

^b Calculated with MOLPROBITY (27).

^c Clashscore is the number of serious steric overlaps (>0.4 Å) per 1000 atoms.

Size exclusion chromatography coupled to multi-angle light scattering (SEC-MALS)

SEC-MALS experiments were performed at 18°C with buffer containing 50 mM Tris/HCl, pH 8.4, 300 mM NaCl, 0.02% NaN3. 80 μg of mPRPF39wt or of mPRPF39 variants were loaded onto a Superdex 200 increase 10/300 column (GE Healthcare) that was coupled to a miniDAWN TREOS three-angle light scattering detector (Wyatt Technology) in combination with a RefractoMax520 refractive index detector. For calculation of the molecular mass, protein concentrations were determined from the differential refractive index with a specific refractive index increment (dn/dc) of 0.185 ml/g. Data were analyzed with the ASTRA 6.1.4.25 software (Wyatt Technology).

Immuno-biochemical assays

For immunoprecipitations (IPs), HEK293 cells were lysed in RIPA buffer (10 mM Tris/HCl, pH 7.5, 100 mM NaCl, 2 mM EDTA, 1% (v/v) NP-40, with proteinase inhibitors). 100 μg of lysate were diluted in 400 μl RIPA buffer containing 2% (w/v) BSA and pre-cleared for 1 h using protein A/G Sepharose (Invitrogen). Pre-cleared lysates were incubated with αFlag-beads, rotating at 4°C overnight. Beads were washed four times in RIPA buffer. After the last wash, SDS-sample buffer was added, samples were boiled and analyzed by SDS-PAGE and Western blot.

T cells and RNA sequencing

Naïve and memory CD8+ T cells were generated using the OT-I system as described previously (32,33). RNA-Seq was done essentially as described (34). Briefly, total RNAs were prepared using RNA-Tri (Bio&SELL) and further purified using the RNeasy mini kit (Qiagen) in combination with a DNase I (Qiagen) treatment. RNA sequencing libraries were prepared by using the TruSeq mRNA Library Preparation kit (Illumina). 125-bp paired-end reads were generated by using a HiSeq 2500 sequencer (Illumina) with V4 sequencing chemistry. Triplicate samples from naïve and memory T cells were sequenced (around 40 × 10⁶ reads per sample) and analyzed using a MISO-based pipeline (35).

In vitro splicing

Splicing-active nuclear extracts were prepared according to (36), except HEK cells harvested at ca. 80% confluency were used. For splicing analysis, m7G-capped RNAs were produced by in vitro transcription using linearized plasmid as template. In vitro splicing assays were performed in 52% HEK nuclear extract, incubating 1 fmol of pre-mRNA and 5 μg of protein or equivalent volume of purification buffer per 25-μl reaction mixture under splicing conditions for 30 min. The reaction was stopped by proteinase K treatment, followed by RNA extraction. Twenty percent of the RNA was reverse-transcribed and analyzed by RT-PCR.
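
For orientation, the amounts given above can be converted into approximate molar concentrations. The following is a minimal sketch that assumes the added protein has the theoretical monomeric mass of mPRPF39wt (78 335 Da, as quoted in the figure legends) and ignores any endogenous PRPF39 in the extract; the function name is illustrative only.

```python
# Approximate molar concentrations in a 25-ul in vitro splicing reaction.
# Assumption: the added protein has the theoretical monomeric mass of
# mPRPF39wt (78 335 Da); endogenous PRPF39 in the extract is ignored.
MONOMER_MASS_G_PER_MOL = 78_335
REACTION_VOLUME_L = 25e-6


def molar_concentration(amount_g: float, molar_mass_g_per_mol: float,
                        volume_l: float) -> float:
    """Convert a mass of solute into mol/l for a given volume."""
    return amount_g / molar_mass_g_per_mol / volume_l


# 5 ug of protein per reaction
protein_molar = molar_concentration(5e-6, MONOMER_MASS_G_PER_MOL, REACTION_VOLUME_L)
# 1 fmol of pre-mRNA per reaction
pre_mrna_molar = 1e-15 / REACTION_VOLUME_L

print(f"protein:  {protein_molar * 1e6:.2f} uM (monomer equivalents)")
print(f"pre-mRNA: {pre_mrna_molar * 1e12:.0f} pM")
```

Under these assumptions the exogenous protein is present at roughly 2.6 μM monomer equivalents, in large molar excess over the 40 pM pre-mRNA substrate.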

RT-PCR

RT-PCRs were done as previously described (37). Briefly, RNA was extracted using RNATri (Bio&Sell) and 1 μg RNA was used in a gene-specific RT-reaction. Low-cycle PCR with a 32P-labeled forward primer was performed, products were separated by denaturing PAGE and quantified using a Phosphoimager and ImageQuantTL software.
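
Exon inclusion is reported in the figure legends as percent spliced in (PSI). A minimal sketch of how PSI could be derived from two quantified bands is given here. It assumes that, because only the forward primer is labeled, each PCR product carries one label and band signals can be compared without length correction; the band intensities are hypothetical and the function is not taken from the ImageQuantTL software.

```python
def percent_spliced_in(inclusion_signal: float, exclusion_signal: float) -> float:
    """PSI = inclusion / (inclusion + exclusion) * 100.

    Assumes background-subtracted band signals from an end-labeled primer,
    so that signal is proportional to the molar amount of each product.
    """
    total = inclusion_signal + exclusion_signal
    if total <= 0:
        raise ValueError("no signal in either band")
    return 100.0 * inclusion_signal / total


# Hypothetical band intensities (arbitrary units), not measured data.
psi = percent_spliced_in(inclusion_signal=300.0, exclusion_signal=900.0)
print(f"PSI = {psi:.1f}%")
```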

Bioinformatics

U1 snRNA sequences were extracted from Rfam (38) and NCBI (39). Protein sequences of Prp39 and Prp42 orthologs were extracted from NCBI and the canonical isoform was defined using UniProt (40). RNA secondary structure predictions were performed using the RNAstructure Web Server (41). ClustalOmega was used for multiple sequence alignments (31). The Common Tree tool of the NCBI Taxonomy database was used to create a phylogenetic tree of selected species. The tree was visualized using Interactive Tree Of Life (42).
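
The phylogenetic analysis compares median U1 snRNA lengths per organism. A minimal sketch of that summary step follows; the organism names and sequences are placeholders rather than Rfam or NCBI records, and the function name is illustrative.

```python
from statistics import median


def median_lengths(sequences_by_organism: dict[str, list[str]]) -> dict[str, float]:
    """Return the median sequence length (nt) for each organism."""
    return {
        organism: median(len(seq) for seq in sequences)
        for organism, sequences in sequences_by_organism.items()
    }


# Placeholder sequences of defined length; real input would be the
# annotated U1 snRNA sequences retrieved for each organism.
toy_sequences = {
    "organism_A": ["A" * 160, "A" * 164, "A" * 166],
    "organism_B": ["A" * 540, "A" * 568],
}

for organism, length in median_lengths(toy_sequences).items():
    print(f"{organism}: median U1 snRNA length = {length} nt")
```

Using the median rather than the mean keeps a single unusually long or short annotated copy from dominating the per-organism value.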

RESULTS

mPRPF39 forms a homodimer

Full-length M. musculus mPRPF39 was purified and crystallized, and the structure was determined at 3.3 Å resolution. The electron density was of excellent quality and readily allowed tracing of the amino acid sequence (Supplementary Figure S2). The asymmetric unit contains two mPRPF39 polypeptide chains that are practically indistinguishable, with a root mean square deviation (rmsd) of 0.39 Å. The protein is purely α-helical, consisting of 12 pairs of anti-parallel α-helices arranged as half-a-tetratricopeptide (HAT) repeats. The overall architecture of mPRPF39 can be divided into three distinct regions: an N-terminal domain (HAT-NTD; residues 74–251) comprising five HAT repeats, a C-terminal domain (HAT-CTD; residues 348–602) with seven HAT repeats, and a curved, mainly α-helical linker region (residues 252–347) that connects the two (Figure 1A and B).

Figure 1.

mPRPF39 is a homodimer in structure and solution. (A) Schematic representation of the domain architecture of mPRPF39. The HAT-NTD and HAT-CTD are colored in gradients from yellow to orange and teal to aqua, respectively; linker – pale green. Regions that are unstructured are indicated as dotted lines. (B) Cartoon and combined cartoon/surface representation of the mPRPF39 homodimer in two orientations. Ellipse symbol – non-crystallographic pseudo-twofold axis. One protomer is colored as in panel A, the other protomer is in gray. HAT repeats are numbered 1 through 12 starting at the N-terminus. In this and the following figures, orientations relative to panel A are indicated by rotation symbols. (C) Analysis of the oligomeric state of mPRPF39 by SEC-MALS. Molecular mass (Mm) values across the elution peak (solid line; absorbance at 280 nm) are indicated by the dashed line. Average Mm = 157 000 Da; theoretical monomeric mPRPF39wt Mm = 78 335 Da. (D) Western blot of an αFlag-IP of a cellular mixture of Flag- and GFP-tagged mPRPF39. GFP-mPRPF39 can only be pulled down in the presence of Flag-mPRPF39. Asterisk indicates an unspecific band.

The linker region is composed of two short α-helices (Lα1 and Lα2) and an unstructured region (residues 275–294), which connects the short α-helices to a long, curved helix (Lα3). We did not observe interpretable electron density for the 73 N-terminal as well as the 63 C-terminal residues, most likely due to their flexibility. The two mPRPF39 molecules are arranged in an anti-parallel manner, forming a homodimer (Figure 1B). Such a homodimeric organization has been suggested before for hPRPF39 (10). Homodimerization is mediated through the concave surfaces of the HAT-CTDs, with the HAT-NTDs facing away from the dimer interface. The overall architecture resembles the arrangement observed for other HAT-repeat proteins involved in mRNA processing, such as SART3 (43) and CSTF77 (44).
In mPRPF39, ∼1536 Å² of the surface area of each monomer are buried at the interface of the dimer, and the PISA server (45) suggests the presence of a stable dimer in solution. To further investigate whether the dimer exists in solution, we performed analytical size exclusion chromatography coupled to multi-angle light scattering (SEC-MALS) to precisely determine the molecular mass of mPRPF39 in solution. Our SEC-MALS experiments revealed a molecular mass of 157 kDa, confirming the presence of dimeric mPRPF39 in solution (calculated monomeric molecular mass 78.3 kDa; Figure 1C). In addition, FLAG- and GFP-tagged mPRPF39 were co-transfected into HEK 293 cells and a FLAG-IP was performed. GFP-mPRPF39 was co-precipitated in the presence of FLAG-mPRPF39 (Figure 1D). Thus, the mPRPF39 homodimer is also formed in a cellular environment, in agreement with similar findings for hPRPF39 (10).
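
The inference from the SEC-MALS mass to the oligomeric state is a simple ratio. A minimal sketch of that arithmetic, using the measured and theoretical masses quoted above (the function name is illustrative):

```python
def oligomeric_state(measured_mass_da: float, monomer_mass_da: float) -> tuple[float, int]:
    """Return the measured/monomer mass ratio and the nearest whole number of subunits."""
    ratio = measured_mass_da / monomer_mass_da
    return ratio, round(ratio)


# Average SEC-MALS mass and theoretical monomer mass of mPRPF39wt.
ratio, n_subunits = oligomeric_state(157_000, 78_335)
print(f"mass ratio = {ratio:.2f}, nearest oligomer = {n_subunits}")
```

A ratio close to 2 corresponds to a dimer; a value between whole numbers, as described for the mPRPF39R458D variant, would instead point to a mixture of species.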

The mPRPF39 homodimer can be disrupted by single point mutations

We mapped the amino acid sequence conservation on the surface of our mPRPF39 structure using The ConSurf Server (46). A highly conserved surface patch is located on the concave side of the HAT-CTD, exactly matching the dimerization interface (Figure 2A and B), indicating that presumably the homodimeric arrangement is conserved in higher eukaryotes. Mapping the electrostatic potential to the surfaces of the mPRPF39 structure revealed two areas in the concave surface of the HAT-CTD with a strongly positive and negative charge, respectively (Figure 2C). Upon dimerization, these positive and negative patches reciprocally interact with each other (Figure 2D). Although homodimerization is thus dominated by charged interactions, the dimer is highly salt-resistant, as even NaCl at a concentration of 900 mM did not lead to disruption of the dimer (Supplementary Figure S3).

Figure 2.

Impact of dimerization on splicing. (A) Surface representation of a mPRPF39 monomer. Interface and non-interface residues are colored teal and grey, respectively. View identical to the lower panel of Figure 1B. (B) Sequence conservation projected onto the surface of mPRPF39. Color-coding according to bins: bin 9 (magenta) contains the most conserved positions, bin 1 (cyan) contains the most variable positions. The homodimer interface on the concave side of the HAT-CTD is highly conserved. (C) Electrostatic potential mapped on the surface of mPRPF39. The homodimer interface on the concave side of the HAT-CTD is carpeted by complementary patches of positive and negative surface potential. (D) Zoom into the dimerization interface. Amino acid residues subjected to site-directed mutagenesis are indicated in light blue (mPRPF39R458D), blue (mPRPF39R464D), red (mPRPF39E576K/D577K) and black (mPRPF39Y536W). Same view as upper panel in Figure 1B. (E) Analysis of the oligomeric states of mPRPF39 variants in solution by SEC-MALS. Molecular mass (Mm) values across the elution peaks (solid lines; absorbance at 280 nm; legend provided in the inset) are indicated by the dashed lines, average Mm values are indicated next to the respective peaks. Theoretical monomeric mPRPF39wt Mm = 78 335 Da. mPRPF39wt and mPRPF39Y536W migrate as dimers, mPRPF39R458D forms a mixture of molecular species, mPRPF39R464D and mPRPF39E576K/D577K migrate as monomers. (F) Schematic representation of the modified 3-exon/2-intron Adml construct used for in vitro splicing. (G) Exemplary gel of RT-PCR analysis of an in vitro splicing reaction, comparing buffer-, mPRPF39wt- and mPRPF39R464D-treated samples. mPRPF39wt and mPRPF39R464D treatments lead to elevated and significantly reduced splicing, respectively. The asterisk indicates a non-specific band. (H) Quantification of RT-PCRs of four biologically independent in vitro splicing experiments performed in technical triplicates. Values represent means ± standard deviations. Significance was estimated via a one-sample t-test (*P ≤ 0.05; **P ≤ 0.01).

To test the importance of interacting residues for homodimerization, we exchanged conserved contact residues (Figure 2D). R458 and R464 are two highly conserved residues in the positive interface patch, which we individually mutated to oppositely charged residues (mPRPF39R458D, mPRPF39R464D). Likewise, E576 and D577 are highly conserved in the negative interface patch, which we jointly converted to positively charged residues (mPRPF39E576K/D577K). Y536 is the only aromatic residue in the interface, which we exchanged for a tryptophan to potentially disrupt the dimer due to steric hindrance (mPRPF39Y536W). The dimerization properties of all variants were investigated by SEC-MALS and compared to those of wild type (wt) mPRPF39. Dimerization was unaffected in the mPRPF39Y536W variant and partially lost in the mPRPF39R458D variant (Figure 2E). mPRPF39R464D and mPRPF39E576K/D577K clearly migrated as monomers in SEC. These results confirm that dimerization is based on the interactions between the conserved, oppositely charged interface patches in the HAT-CTD of mPRPF39 and demonstrate that exchange of a single residue (mPRPF39R464D) can lead to complete disruption of the dimer.

mPRPF39 homodimerization is functionally important

To test if the observed homodimerization is required for the functionality of mPRPF39, we performed in vitro splicing assays with mPRPF39wt and a monomeric version of mPRPF39. For the functional assays the R464D exchange was chosen, as it led to complete monomerization with the least number of residue exchanges. A modified Adml construct (Figure 2F) was used for in vitro splicing and the 2-exon product, representing the main product, was quantified. Splicing extracts treated with mPRPF39wt showed a trend towards higher splicing efficiency (Figure 2G and H), but the effect was not significant when compared to the samples treated only with buffer. This suggests that PRPF39 is not limiting in nuclear extracts, so that exogenous PRPF39 does not strongly affect splicing. In contrast, samples treated with mPRPF39R464D showed significantly reduced splicing levels compared to samples treated with either buffer alone or mPRPF39wt. These findings indicate that the mPRPF39 homodimer represents the functional unit of mPRPF39 and is required for splicing.
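
Significance in these assays was estimated with a one-sample t-test. The sketch below shows how the test statistic is computed using only the Python standard library. It assumes that splicing values are expressed relative to the buffer control, so that the null value is 1.0; the text does not state the normalization, and the four replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values: list[float], mu0: float) -> tuple[float, int]:
    """Return the one-sample t statistic and its degrees of freedom.

    t = (mean - mu0) / (s / sqrt(n)), with s the sample standard deviation.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    t_stat = (mean(values) - mu0) / (stdev(values) / sqrt(n))
    return t_stat, n - 1


# Hypothetical relative splicing values from four independent experiments,
# normalized to the buffer control (assumed null value 1.0).
t_stat, df = one_sample_t([0.6, 0.7, 0.5, 0.6], mu0=1.0)
print(f"t = {t_stat:.2f}, degrees of freedom = {df}")
```

The P value follows from comparing |t| with the t distribution for the given degrees of freedom; for three degrees of freedom, the two-sided 5% critical value is 3.182.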

mPRPF39 is an NMD target

Our data indicate that reducing the level of the active mPRPF39 homodimer can be used to control splicing efficiency. Interestingly, the prpf39 gene has an alternative exon which contains a premature stop codon (Figure 3A). The alternative exon and its flanking regions are highly conserved between mouse and human (Supplementary Figure S4). Upon inclusion of the alternative exon, the premature stop codon could potentially lead to NMD and might thus regulate PRPF39 levels. RNA-Seq and RT-PCR validation in independent samples of murine T-cells showed that inclusion of the alternative exon is strongly increased in memory versus naïve T cells (Supplementary Figure S5). Testing different murine tissue types, we observed that inclusion of the alternative exon is particularly high in testis and low in lymph node (Figure 3D).

Figure 3.

Alternative splicing of prpf39 pre-mRNA with resulting mRNA and protein isoforms. (A) An alternative exon (orange) can either be included or excluded. Upon exon exclusion, the mRNA can be translated to full-length mPRPF39; upon exon inclusion, a premature stop codon is introduced, which may lead to NMD or production of a non-functional truncated mPRPF39. Red arrows indicate location of primers used for RT-PCR. (B) Representative validation of RNA-seq results by RT-PCR. (C) Quantification of validation as shown in panel B. Percent spliced in (PSI) values represent means ± standard deviations for four independent experiments. (D) Exemplary gel of RT-PCR in different murine tissues. The degree of alternative exon inclusion is tissue specific, with the highest inclusion in testis and the lowest in lymph node. Results of quantification of experiments performed with samples from three individual mice are shown below the respective lanes. Values represent means ± standard deviations. (E–H) Analysis of prpf39 (pre-)mRNAs in mouse EL4 cells (E, F) and human Jsl1 cells (G, H). Left – representative RT-PCR analyses; right – quantification. DMSO/CHX – DMSO/cycloheximide-treated samples. In the CHX treated cells, an accumulation of the transcript containing the alternative exon can be seen, indicating that this isoform is an NMD-target. Values represent means ± standard deviations of three independent experiments. Significance was estimated via a one-sample t-test (*P ≤ 0.05, **P ≤ 0.01).

Detection of the exon in the mRNA suggests that the inclusion mRNA isoform is not a strong NMD target. To address whether inclusion leads to degradation at all, cells of a murine and human T-cell line (EL4 and Jsl1, respectively) were treated with cycloheximide (CHX). RT-PCR demonstrated an increase in the isoform containing the alternative exon in cells treated with CHX (Figure 3E–H), confirming that inclusion of the alternative exon induces NMD. This effect was moderate, indicating that the corresponding mRNA is only a weak NMD target. However, regardless of whether the inclusion isoform is degraded or translated, this isoform will lead to reduced functional protein (Supplementary Figure S6), as the translated product would be missing the C-terminal domain involved in dimerization and will thus represent a dominant negative protein isoform. Furthermore, the importance of the alternative exon in regulating PRPF39 protein levels is supported by its conservation and its conserved effect as a weak NMD exon in mouse and human.

A PRPF39 homodimer might functionally substitute for the yPrp39/yPrp42 heterodimer in metazoans

Although a search using the DALI server (47) identified murine CSTF-77 (PDB entry 2OOE (44)), its Kluyveromyces lactis ortholog RNA14 (PDB entry 4EBA (12)) and human SART3 (PDB entry 5JPZ, DOI: 10.2210/pdb5JPZ/pdb) as the closest structural neighbors of mPRPF39 (Supplementary Table S2), the structurally most similar snRNP proteins were yPrp39 and yPrp42 (PDB entry 5UZ5 (10); rmsd of 7.6/4.5 Å for 583/539 residues, respectively). While all three proteins share the same domain organization, mPRPF39 and yPrp42 contain an additional α-helix in their HAT-NTDs (Supplementary Figure S7 and Supplementary Discussion). Moreover, while the linker region is the structurally most divergent part in the three proteins (Supplementary Figures S7 and S8 and Supplementary Discussion), the yPrp42 linker with two α-helices is more similar to the three-helix mPRPF39 linker than the yPrp39 linker that bears only a single α-helix. Thus, somewhat surprisingly, mPRPF39 is structurally more similar to yPrp42 than to the formal ortholog, yPrp39. Consistently, mPRPF39 exhibits a slightly higher sequence identity to yPrp42 than to yPrp39 (23 vs. 22%, respectively). Notably, however, human hPRPF39 shows a sequence identity of 17% to yPrp42 compared to 18% to yPrp39. Hence, PRPF39 may be considered an ortholog of both yPrp39 and yPrp42. Because of the similar central architecture of mPRPF39, yPrp39 and yPrp42, the mPRPF39 homodimer is also globally similar to the yPrp39/yPrp42 heterodimer (Figure 4A). In both complexes the interaction is established via the respective HAT-CTDs. The mPRPF39 homodimer is more elongated than the yPrp39/yPrp42 heterodimer and it shows less curvature in its bridging HAT-CTDs (Figure 4B and C). In contrast to the mPRPF39 homodimer interface, the interface of yPrp39 and yPrp42 is dominated by hydrophobic interactions. Only one salt-bridge is established between the yPrp39 CTD extension and the yPrp42 HAT-NTD.
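
Sequence identities such as those quoted above are derived from pairwise alignments. The sketch below shows one common way to compute percent identity from two aligned sequences, counting only columns in which neither sequence has a gap. The text does not state which denominator was used, so this convention is an assumption, and the aligned fragments are toy examples rather than PRPF39 or Prp42 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns without gaps in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical): 4 identical residues in 5 ungapped columns.
identity = percent_identity("MKT-AY", "MRTLAY")
print(f"identity = {identity:.1f}%")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which yields lower values for gapped alignments; differences of one percentage point, as between 23% and 22%, can depend on this choice.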

Figure 4.

Comparison of mPRPF39, yPrp39 and yPrp42 structures. (A) Schematic representation of the domain architecture of mPRPF39, yPrp39 and yPrp42. Domain coloring as in Figure 1A. Regions that are unstructured are indicated as dotted lines. (B) Ribbon diagram of the mPRPF39 homodimer (teal and pale green). (C) Ribbon diagram of the yPrp42/yPrp39 heterodimer. yPrp42 is colored in dark gray and yPrp39 in light gray. The dimensions of the dimer are indicated in Å. Same view as in the lower panel of Figure 1B.

As (i) metazoans lack a yPrp42 ortholog, (ii) mPRPF39 shows higher sequence and structural similarity to yPrp42 than to yPrp39, (iii) mPRPF39 forms a homodimer, (iv) the mPRPF39 homodimer resembles the yPrp39/yPrp42 heterodimer and (v) monomeric mPRPF39 has a detrimental effect on splicing, a PRPF39 homodimer is likely to functionally substitute for the yPrp39/yPrp42 heterodimer in metazoan species. Consistent with this notion, the area on yPrp42 that mediates protein-protein contacts connecting the core U1 snRNP to its auxiliary proteins is highly conserved in mPRPF39 (Supplementary Figure S9).

PRPF39 homodimerization is evolutionarily connected to a short U1 snRNA

The yeast U1 snRNA has a strongly elongated stem-loop (SL) 2 compared to its human counterpart (Figure 5). It consists of two sub-domains, the conserved SL2-1 and elongated SL2-2 (Figure 5). Interestingly, an evolutionary study focusing on the SL3 (48) suggested that the duplication of prp39 went along with an increasing length of SL3. The recent cryo-EM structure of the yeast U1 snRNP indeed revealed an interface between the sub-domain SL3-4 (Figure 5B) and yPrp42 (10). However, the buried surface area between yPrp42 and SL3-4 is much smaller than between yPrp42 and SL2-2, indicating a larger influence of SL2-2 on the evolution of the dimeric system. The convex side of the yPrp42 HAT-NTD has a strongly positively charged groove that accommodates SL2-2 of U1 snRNA (Figure 5A). A similar positive patch is not seen on mPRPF39 (Figure 5A), most likely because the U1 SL2 is much shorter in metazoans than in yeast (Figure 5B) and will thus not reach this surface area of PRPF39. This observation led us to ask whether PRPF39 homodimerization in metazoans might be evolutionarily linked to a short U1 SL2. The sequence lengths of all annotated U1 snRNAs from Rfam (38) range between 96 and 825 nucleotides (nts). Notably, higher eukaryotic U1 snRNAs all lie on the lower end of the range, with most of them having a length of 164 nts (Figure 6B). In contrast, fungi tend to have longer U1 snRNA sequences, most of them over 450 nts (5). The longer U1 snRNA likely goes hand in hand with the presence of a yeast-like SL2. However, we found a few fungi with short U1 snRNAs; e.g., Aspergillus fumigatus and Debaryomyces hansenii have U1 sequence lengths of 149 and 165 nts, respectively. When performing secondary structure predictions (41), we observed that their U1 snRNA structures resemble the human U1 structure with a short SL2 (Supplementary Figure S10). The Candida albicans snRNA with an intermediate length of 244 nts shows a shortened SL3, but a yeast-like SL2.

Figure 5.

U1 snRNA binding to yPrp42. (A) Electrostatic potential mapped on the surface of yPrp42 (left) and mPRPF39 (right) in the same orientation as in the upper panel of Figure 1B, with the U1 snRNA SL2-2 shown as cartoon bound to yPrp42. Only yPrp42 shows the positively charged groove needed to accommodate the RNA. (B) Schematic representations of yeast U1 snRNA (left) and of murine U1 snRNA (right). Characters in red indicate nucleotides that are not conserved between human and murine U1 snRNA. The pale orange box highlights the elongated SL2-2 in yeast.

Figure 6.

Phylogenetic analysis of U1 snRNA length and presence of a hetero- or homodimer. (A) Phylogenetic tree of the organisms included in the analysis. Phylogenetic distances are not to scale but are given in Supplementary Figure S11. Fungal clade – orange, metazoan clade – black. On the right side, the respective median U1 snRNA lengths are plotted as a bar chart (organisms with an annotated Prp42 – black; organisms lacking an annotated Prp42 – grey). Most fungi, except A. fumigatus and D. hansenii, have a Prp42 ortholog. A. fumigatus and D. hansenii also have short U1 snRNAs. A list of all NCBI GeneIDs and NCBI sequence identifiers can be found in Supplementary Table S3. (B) The number of organisms plotted against the median U1 snRNA length. There are two populations that cluster around 160 nts and 550 nts.

We then queried NCBI for hPRPF39, yPrp39 and yPrp42 orthologs. Interestingly, only fungi contain Prp42 proteins (Figure 6A and Supplementary Figure S11). Notably, we found that the only fungi with an annotated Prp39 but no annotated Prp42 homolog are D. hansenii, Y. lipolytica, S. pombe, A. fumigatus and N. crassa (Table 2, Figure 6A and Supplementary Figure S11), which also harbor short U1 snRNAs with a metazoan-like SL2 and SL3. Thus, organisms that lack Prp42, and most likely depend on Prp39 homodimers, also contain U1 snRNAs with a short SL2 and SL3. In contrast, the U1 snRNAs of organisms that have both Prp39 and Prp42, and that therefore most likely use a Prp39/Prp42 heterodimer, exhibit an extended SL2 and SL3.

Table 2.

Analysis of splicing complexity in connection with U1 snRNP properties. The lengths of SL2 and SL3 were estimated based on secondary structure predictions of the respective U1 snRNA. The lengths of SL2 and SL3 of S. cerevisiae are based on the structure of the U1 snRNP (10). The fraction of intron-containing genes is given in %, and the maximal number of introns per gene is given according to Ivashchenko et al. (49).

species	Prp39/Prp42	Prp(f)39/Prp(f)39	U1 snRNA [nts]	SL2 [nts]	SL3 [nts]	Intron-containing genes [%]	max. number of introns per gene
C. glabrata	x		595	147	396	1.5	2
K. lactis	x		528	132	330	2.4	2
E. gossypii	x		483	134	276	4.5	2
S. cerevisiae	x		568	122	377	4.5	2
D. hansenii	x		165	45	28	5.0	4
Y. lipolytica	x		150	43	14	10.6	4
S. pombe		x	149	44	22	45.6	15
A. fumigatus		x	149	46	18	78.0	18
N. crassa		x	130	43	22	79.6	11

PRPF39 homodimerization and short U1 snRNA correlate with higher splicing complexity

A previous analysis of the exon-intron structure of genes in fungal genomes revealed a large variation in the percentage of intron-containing genes (from 0.7 to 97%) and in the number of introns per gene (2 to 18), as well as a lack of correlation between splicing complexity and genome size, number of chromosomes or number of genes (49). We noticed that fungi with less complex splicing have long U1 snRNAs and a Prp39/Prp42 heterodimer, while fungi with more complex splicing have short U1 snRNAs and a Prp39 homodimer (Table 2). Initially, D. hansenii and Y. lipolytica seemed to represent exceptions to this trend (relatively low splicing complexity and short U1 snRNAs). However, BLAST searches with S. cerevisiae yPrp39 or yPrp42 uncovered a further Prp39/Prp42-like protein in these species besides the annotated Prp39 ortholog. These observations suggest that D. hansenii and Y. lipolytica occupy intermediate phylogenetic positions, with a Prp39/Prp42 heterodimer and short U1 snRNAs, yet only slightly elevated splicing complexity (Table 2). Our observations suggest that the lack of a Prp42 ortholog, and thus the use of a Prp39 homodimer, in conjunction with a short U1 snRNA enabled the development of more complex splicing patterns. While we cannot pinpoint a particular sequence of evolutionary events, these findings clearly demonstrate that the presence of a Prp39/Prp42 heterodimer phylogenetically correlates with U1 snRNAs bearing a long SL2 and SL3. Moreover, the shortening of the U1 SL2 and SL3 alone appears to allow only slightly higher splicing complexity (D. hansenii, Y. lipolytica), but it is the absence of a Prp42 homolog that then causes dramatically increased levels of splicing complexity. This indicates that relying on a Prp(f)39 homodimer is the driving force behind diversifying splicing in these organisms.
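The trend described above can be retraced directly from the values in Table 2. The following Python sketch is an illustration only, not part of the original analysis: it groups the nine fungi by dimer type and U1 snRNA length and reports the median percentage of intron-containing genes per group. The names `TABLE_2`, `classify` and `summarize` are hypothetical, and the 300 nt cutoff separating short from long U1 snRNAs is an assumption chosen to lie between the two populations at around 160 and 550 nts (Figure 6B).

```python
from statistics import median

# Rows transcribed from Table 2:
# (species, dimer type, U1 snRNA [nts], SL2 [nts], SL3 [nts],
#  intron-containing genes [%], max. number of introns per gene)
TABLE_2 = [
    ("C. glabrata",   "hetero", 595, 147, 396,  1.5,  2),
    ("K. lactis",     "hetero", 528, 132, 330,  2.4,  2),
    ("E. gossypii",   "hetero", 483, 134, 276,  4.5,  2),
    ("S. cerevisiae", "hetero", 568, 122, 377,  4.5,  2),
    ("D. hansenii",   "hetero", 165,  45,  28,  5.0,  4),
    ("Y. lipolytica", "hetero", 150,  43,  14, 10.6,  4),
    ("S. pombe",      "homo",   149,  44,  22, 45.6, 15),
    ("A. fumigatus",  "homo",   149,  46,  18, 78.0, 18),
    ("N. crassa",     "homo",   130,  43,  22, 79.6, 11),
]


def classify(dimer, u1_length, long_u1_cutoff=300):
    """Label a species by dimer type and U1 length class.

    The 300 nt cutoff is an assumption for illustration; it is not a
    value taken from the paper.
    """
    u1_class = "long U1" if u1_length >= long_u1_cutoff else "short U1"
    return f"{dimer}dimer, {u1_class}"


def summarize(rows):
    """Return the median % of intron-containing genes for each group."""
    groups = {}
    for _species, dimer, u1_length, _sl2, _sl3, intron_pct, _max_introns in rows:
        groups.setdefault(classify(dimer, u1_length), []).append(intron_pct)
    return {label: median(values) for label, values in groups.items()}


if __name__ == "__main__":
    for label, value in summarize(TABLE_2).items():
        print(f"{label}: median intron-containing genes = {value:.2f}%")
```

With only nine species, such a summary merely restates the table and carries no statistical weight; it simply makes the three groups (heterodimer with long U1, heterodimer with short U1, homodimer with short U1) explicit.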

DISCUSSION

We have determined a crystal structure of full-length mPRPF39 composed of a HAT-NTD that is connected via an extended linker to the HAT-CTD. This structure, in combination with biophysical and CoIP assays, demonstrates that mPRPF39 forms a stable homodimer in vitro and in vivo. As disruption of the mPRPF39 homodimer has detrimental effects on splicing, we conclude that mPRPF39 functions as a homodimer in splicing, functionally replacing the Prp39/Prp42 heterodimer of the yeast system. Given the high sequence conservation, in particular at the dimer interface, PRPF39 most likely forms similar homodimers in all eukaryotes lacking a Prp42 ortholog. Prp39 and Prp42 presumably emerged from the same ancestral gene by gene duplication in some fungi. Our detailed structural and phylogenetic comparisons revealed that PRP(F)39 proteins in higher eukaryotes should be considered orthologs of yeast Prp42 rather than of Prp39. There are ∼80 proteins in the human spliceosome that lack an obvious counterpart in yeast, but only a few examples, such as Prp42, for which a yeast spliceosomal protein lacks an assigned ortholog in human (3,4). We find that the level of functional mPRPF39 has a large impact on splicing in vitro. Furthermore, mPRPF39 expression is controlled in a tissue- and activation-dependent manner by alternative splicing, which leads either to NMD or to a dominant negative protein incapable of forming a dimer. These data suggest a role of PRP(F)39 in adapting splicing efficiency to the requirements of specific cells or tissues. In particular, in immune cell differentiation and activation, regulated intron retention has been shown to play an important role in controlling gene expression and function (50). Controlling PRPF39 levels through inclusion of an exon containing a premature stop codon may contribute to fine-tuning splicing efficiency and intron retention in such contexts, as we find it strongly regulated between naive and memory T cells.
Our phylogenetic analysis shows that the lack of a Prp42 ortholog (as presently assigned) correlates with more complex splicing patterns. While it is straightforward to imagine mechanisms by which splicing complexity may be increased in a molecularly more complex spliceosome, the converse scenario appears counter-intuitive. The S. cerevisiae Prp39/Prp42 heterodimer is crucial in organizing the spliceosome during early steps of splicing (13,14). However, while in yeast the Prp39/Prp42 heterodimer is stably associated with the U1 snRNP, hPRPF39 is only transiently associated with the spliceosome. A less stable association of splicing factors with the spliceosome may allow for more flexibility in splicing, e.g. with respect to splice site choice, to enabling other modes of alternative splicing or to tuning the ratio of productive and abortive splicing, by rendering particular splicing events dependent on additional factors or signals. In the present case, increased splicing complexity correlates not only with the lack of a Prp42 ortholog but also with a U1 snRNA bearing a short SL2 and SL3. These two features seem to allow for more flexibility in the apposition of U1 and U2 snRNPs during the early steps of splicing, as PRPF39 and a short U1 SL2 and SL3 are less tightly associated than yPrp42 and a long U1 SL2 and SL3. While the yPrp39/yPrp42 heterodimer mediates a stable connection between the U1 and U2 snRNPs in the pre-spliceosome (A-complex (14)) and in the spliceosomal pre-B complex in yeast (13), stable inter-U1/U2 snRNP contacts are lacking in the pre-B state of the human spliceosome (51). Thus, PRPF39-mediated U1-U2 bridging in the human system seems to depend on additional proteins, and this dependency may be one of the molecular principles by which splicing is additionally regulated. Other splicing factors have also been reported to be ‘reprogrammed’ from stable to facultative snRNP subunits. For example, Prp38 and Snu23 are stable snRNP components in yeast, but in human they enter the spliceosome individually upon integration of the tri-snRNP into the (pre-)B complex (52). The present case provides a particularly striking example that splicing complexity does not necessarily scale with the number of spliceosomal proteins. Even though the yPrp39/yPrp42 heterodimer in yeast represents a molecularly more complex spliceosomal subunit, the molecularly simpler PRP(F)39 homodimer in metazoan species correlates with higher flexibility in spliceosome assembly and thus with higher splicing complexity. The increased splicing complexity seems to be provided at least in part by a reduced pre-organization of the U1 snRNP, conferred by an absent PRP(F)39-U1 snRNA interface, and in part by alternative splicing of the prp(f)39 gene itself, which controls the level of functional PRP(F)39.

DATA AVAILABILITY

The atomic coordinates and structure factor amplitudes have been deposited in the Protein Data Bank under the accession code 6G70.
