
Structure and function of pre-mRNA 5'-end capping quality control and 3'-end processing.

Ashley R Jurado, Dazhi Tan, Xinfu Jiao, Megerditch Kiledjian, Liang Tong.

Abstract

Messenger RNA precursors (pre-mRNAs) are produced as the nascent transcripts of RNA polymerase II (Pol II) in eukaryotes and must undergo extensive maturational processing, including 5'-end capping, splicing, and 3'-end cleavage and polyadenylation. This review will summarize the structural and functional information reported over the past few years on the large machinery required for the 3'-end processing of most pre-mRNAs, as well as the distinct machinery for the 3'-end processing of replication-dependent histone pre-mRNAs, which have provided great insights into the proteins and their subcomplexes in these machineries. Structural and biochemical studies have also led to the identification of a new class of enzymes (the DXO family enzymes) with activity toward intermediates of the 5'-end capping pathway. Functional studies demonstrate that these enzymes are part of a novel quality surveillance mechanism for pre-mRNA 5'-end capping. Incompletely capped pre-mRNAs are produced in yeast and human cells, in contrast to the general belief in the field that capping always proceeds to completion, and incomplete capping leads to defects in splicing and 3'-end cleavage in human cells. The DXO family enzymes are required for the detection and degradation of these defective RNAs.

Year: 2014 PMID: 24617759 PMCID: PMC3977584 DOI: 10.1021/bi401715v

Source DB: PubMed Journal: Biochemistry ISSN: 0006-2960 Impact factor: 3.162

In eukaryotes, mRNA precursors (pre-mRNAs) are transcribed by RNA polymerase II (Pol II) from the genome and must undergo extensive cotranscriptional processing to become mature mRNAs. The typical progression of pre-mRNA maturation involves 5′-end capping, splicing, and 3′-end cleavage and polyadenylation. The integrity and precision of each of these steps are critical for generating stable, functional mRNAs. Moreover, recent studies have demonstrated the importance of alternative splicing, alternative polyadenylation (APA), and RNA editing in producing an incredibly diverse, often cell-specific mRNA library that contributes to the biological complexity of higher eukaryotes. 5′-end capping occurs very early during Pol II transcription, typically after the synthesis of ∼20 nucleotides of the pre-mRNA. Capping has been linked to splicing and 3′-end processing of the pre-mRNA, and the export of the mature mRNA. In addition, the 5′-end cap is directly recognized by the eukaryotic translation initiation factor eIF-4E, which is essential for mRNA translation by the ribosome. A majority of pre-mRNAs acquire a poly(A) tail after 3′-end processing, which is important for the export of the mature mRNAs from the nucleus to the cytoplasm. The poly(A) tail also promotes the translation of the mRNAs and protects them from degradation. In comparison, 3′-end processing of replication-dependent histone pre-mRNAs involves only the cleavage reaction, and these mRNAs do not carry a poly(A) tail. Instead, a conserved stem–loop structure at their 3′-end supports many of the functions that are associated with the poly(A) tail. This review will focus on recent advances (within the past ∼5 years) in structural and functional studies of pre-mRNA 3′-end processing, and the newly reported structures are summarized in Table 1. 
There are also many other excellent reviews on these topics, some of which are listed here.[1−8] In addition, a novel quality surveillance mechanism for 5′-end capping was discovered recently and will be reviewed here, as well. Other aspects of pre-mRNA processing, such as splicing, APA,[9−11] and poly(A) length regulation,[12,13] and other mechanisms of mRNA quality control and decay, such as nonsense-mediated decay and no-go decay, will not be covered here because of space limitations.

Table 1

Recently Published Structures of Protein Factors Involved in Pre-mRNA 3′-End Processing or 5′-End Capping Quality Surveillance

protein factor	subdomains	Protein Data Bank entries	refs
CPSF
archaeal CPSF-73	metallo-β-lactamase, β-CASP	2YCB, 2XR1	(21), (22)
archaeal CPSF-73–RNA complex	metallo-β-lactamase, β-CASP	3AF6	(20)
CPSF-30–NS1A complex	CPSF-30: zinc fingers 2 and 3; NS1A: effector	2RHK	(29)
CstF
Rna14–Rna15 complex	Rna14: full length; Rna15: hinge	4EBA, 4E85, 4E6H	(37)
Rna14–Rna15 complex	Rna14: monkeytail; Rna15: hinge	2L9B	(40)
CstF-50	homodimerization	2XZ2	(39)
Rna15	RRM	2X1B	(44)
Rna15–RNA complex	RRM	2X1A, 2X1F	(44)
Rna15–Hrp1–RNA complex	Rna15: RRM; Hrp1: RRM	2KM8	(43)
CF I_m
CF I_m25	full length	3BAP, 2CL3, 3BHO, 2J8Q	(50), (51)
CF I_m25–RNA complex	full length	3MDG, 3MDI	(52)
CF I_m25–CF I_m68 complex	CF I_m25: full length; CF I_m68: RRM	3Q2S	(53)
CF I_m25–CF I_m68–RNA complex	CF I_m25: full length; CF I_m68: RRM	3Q2T	(53)
CF I_m25–CF I_m59 complex	CF I_m25: full length; CF I_m59: RRM	3N9U	unpublished (2010)
PAP
PAPγ	core	4LT6	(74)
PAP–Fip1 complex	PAP: full length; Fip1: NTD fragment	3C66	(23)
PAPD1		3PQ1	(77)
cytoplasmic polyadenylation
CPE-binding protein (CPEB)	ZZ domain	2M13	(86)
symplekin-Ssu72
symplekin	NTD	3ODR, 3ODS, 3GS3, 3O2T	(88), (146)
Ssu72	full length	3OMW, 3OMX, 3FDF	(147)
Ssu72–Pol II CTD pSer5 complex	full length	3P9Y	(98)
symplekin–Ssu72 complex	Symplekin: NTD; Ssu72: full length	3O2S	(88)
symplekin–Ssu72–Pol II CTD pSer5 complexes	Symplekin: NTD; Ssu72: full length	3O2Q, 4IMJ, 4IMI	(88), (97)
symplekin–Ssu72–Pol II CTD pSer7 complexes	Symplekin: NTD; Ssu72: full length	4H3H, 4H3K	(99)
histone mRNA 3′-end processing
SLBP–RNA complex	RBD fragment	2KJM	(148)
SLBP–3′hExo–RNA complex	SLBP: RBD; 3′hExo: full length	4L8R	(112)
SLBP–SLIP1 complex	SLBP: fragment; SLIP1: full length	4JHK	(106)
mRNA 5′-end capping quality surveillance
DXO–RNA complexes	full length	4J7L, 4J7M	(143)
DXO–m⁷GpppG complex	full length	4J7N	(143)
Dxo1	full length	4GPU, 4GPS	(142)
Dom3Z (DXO)	full length	3FQI	(140)
Dom3Z–GDP complex	full length	3FQJ	(140)
Rai1	full length	3FQG	(140)
Rat1–Rai1 complex		3FQD	(140)

Canonical Pre-mRNA 3′-End Processing

Key sequence elements in the 3′-untranslated regions (UTRs) of pre-mRNAs are recognized for 3′-end processing. In mammals, the major sequence elements include a hexanucleotide poly(A) signal (PAS, oftentimes AAUAAA) 10–30 nucleotides upstream of the cleavage site,[14] the cleavage site itself (oftentimes after a CA dinucleotide), and a U- or G/U-rich downstream sequence element (DSE) (Figure 1). In addition, auxiliary sequence elements can be recognized, which may also help alter the site of 3′-end processing by APA.
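
As a rough illustration of how these elements are laid out, the following Python sketch scans an RNA sequence for the canonical AAUAAA hexamer and reports CA dinucleotides that lie 10–30 nucleotides downstream of it. The function name, the exact-match rule, the way the gap is counted (from the end of the hexamer to the C of the CA), and the toy sequence are all assumptions made for illustration. Real poly(A) site selection also depends on PAS variants, the DSE, and auxiliary elements, none of which are modeled here.

```python
def find_candidate_poly_a_sites(rna, pas="AAUAAA", min_gap=10, max_gap=30):
    """Return (pas_start, cleavage_pos) pairs for a toy PAS/CA search.

    cleavage_pos is the 0-based index of the A in a CA dinucleotide,
    i.e. cleavage would occur immediately after that position.
    The gap is counted from the end of the hexamer to the C of CA
    (an assumed convention, not taken from the review).
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    start = rna.find(pas)
    while start != -1:
        pas_end = start + len(pas)  # index just past the hexamer
        for gap in range(min_gap, max_gap + 1):
            c_pos = pas_end + gap
            if rna[c_pos:c_pos + 2] == "CA":
                hits.append((start, c_pos + 1))
        start = rna.find(pas, start + 1)
    return hits


# Hypothetical toy transcript: PAS, a 15-nt spacer, a CA site, a G/U-rich tail.
toy = "GG" + "AAUAAA" + "U" * 15 + "CA" + "GUGUUUGUGU"
print(find_candidate_poly_a_sites(toy))  # [(2, 24)]
```

A sequence with no AAUAAA hexamer simply yields an empty list.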

Figure 1

Schematic drawing of the canonical mammalian pre-mRNA 3′-end processing machinery showing the various protein factors and their subcomplexes. Many additional protein factors are involved in 3′-end processing but are not shown.

A large protein machinery is responsible for 3′-end processing in mammals, which consists of several subcomplexes such as cleavage and polyadenylation specificity factor (CPSF), cleavage stimulation factor (CstF), cleavage factor I (CF Im), CF IIm, and other protein factors such as poly(A) polymerase (PAP), symplekin, and Ssu72 (Figure 1). CPSF-73 (the 73 kDa subunit of CPSF) is the endoribonuclease for the cleavage reaction. CPSF-160 recognizes the PAS, and CstF-64 recognizes the DSE. This large machinery ensures the fidelity of 3′-end processing, supports APA in response to specific molecular and cellular environments, and is also connected to the DNA damage response.[15] Moreover, many protein factors in the machinery communicate with other transcription processes, such as Pol II initiation and termination. A proteomic study identified more than 90 protein factors that may be associated with the pre-mRNA, although the exact roles of many of these proteins in 3′-end processing remain to be established.[16] Similarly, pre-mRNA 3′-end processing in yeast also involves a large protein machinery. Many of the protein factors have homologues in the mammalian machinery, although the subcomplexes in yeast can have compositions and functions different from those of the subcomplexes in mammals. For example, yeast CF IA contains Rna14 and Rna15, the homologues of mammalian CstF-77 and CstF-64, respectively. CF IA also contains Clp1 and Pcf11, which belong to CF IIm in mammals (Figure 1). CF II in yeast contains Ysh1, Ydh1, and Yhh1, which are homologues of CPSF-73, CPSF-100, and CPSF-160, respectively.
CF II also contains Pta1, the homologue of symplekin, while the other two subunits of CPSF, CPSF-30 (Yth1) and hFip1 (Fip1), belong to polyadenylation factor I (PF I). CF II, PF I, and many other protein factors comprise the cleavage and polyadenylation factor (CPF) in yeast. The machinery for 3′-end processing of most eukaryotic pre-mRNAs will be termed the canonical machinery, to distinguish it from the machinery required for 3′-end processing of replication-dependent histone pre-mRNAs (see below). We will describe below the various subcomplexes and protein factors of the mammalian machinery, together with the equivalent proteins in the yeast machinery.

CPSF

CPSF has five subunits: CPSF-160, CPSF-100, CPSF-73, CPSF-30, and hFip1. CPSF-160 contains three β-propeller domains. CPSF-73 and CPSF-100 contain a metallo-β-lactamase and a β-CASP domain in the N-terminal region. CPSF-30 has five zinc fingers and one zinc knuckle. hFip1 does not contain any recognizable domains and is likely disordered on its own. The structure of human CPSF-73 showed that its active site is located at the interface of the metallo-β-lactamase and β-CASP domains.[17] CPSF-73 homologues are found in all three domains of life, with important functions in RNA processing and/or decay.[18,19] Recently published structures of CPSF-73 homologues from two different archaeal species revealed the presence of two type II K homology (KH) RNA-binding motifs at the N-terminus, as well as the formation of a homodimer via the C-terminal region of the metallo-β-lactamase domain (Figure 2A). The RNA is likely recognized by the KH domains in one monomer and cleaved by the active site in the other monomer.[20−22] This mechanism may be unique to archaea as mammalian CPSF-73 and its yeast homologue Ysh1 do not contain KH domains at the N-terminus.

Figure 2

Recently published structures of CPSF subunits. (A) Structure of the Methanosarcina mazei CPSF-73 homologue dimer [Protein Data Bank (PDB) entry 2XR1].[21] The bound position of the RNA analogue is modeled from the structure of the Pyrococcus horikoshii CPSF-73 homologue (PDB entry 3AF6).[20] The 2-fold axis of the dimer is depicted as a black oval. (B) Structure of yeast PAP in complex with Fip1 (PDB entry 3C66).[23] (C) Structure of human CPSF-30 (second and third zinc fingers) in complex with the influenza virus NS1A effector domain (PDB entry 2RHK).[29] All the structures were produced with PyMOL (http://www.pymol.org).

Fip1, the yeast homologue of hFip1, tethers PAP to the processing machinery; PAP recognizes an intrinsically unstructured segment in Fip1 near its N-terminus (Figure 2B).[23] PAP mutants that retain polymerase activity but cannot bind Fip1 are nonetheless lethal, indicating that the Fip1–PAP interaction serves an essential function in yeast. An N-terminal deletion mutant of Fip1 in which this binding site is disrupted cannot complement the loss of wild-type Fip1, but the mutant is fully functional if it is fused directly to PAP.[24] CPSF-30 is targeted by the C-terminal effector domain of the nonstructural protein (NS1A) from the influenza A family of viruses,[25−27] and the viral polymerase stabilizes this complex.[28] NS1A binding inhibits host antiviral responses such as production of type I interferon and activation of dendritic cells. The effector domain of NS1A is recognized by the second and third zinc fingers of CPSF-30, in a 2:2 heterotetrameric complex (Figure 2C).[29] Single-site mutations of NS1A residues in the interface prevent binding to CPSF-30, and an influenza virus carrying such a mutation in NS1A cannot inhibit interferon-β pre-mRNA processing and is attenuated in cells.
Arabidopsis thaliana CPSF-30 (AtCPSF-30) binds the A-rich near upstream element (NUE, which contains the AAUAAA motif) that is present in a subset of pre-mRNAs, located 10–30 nucleotides upstream of the cleavage site.[30] Binding of RNA by AtCPSF-30 is mostly mediated through the first of its three zinc fingers.[31] AtCPSF-30 also possesses endonuclease activity, which is mediated by its third zinc finger and inhibited by the N-terminal region of AtFip1(V), a plant homologue of Fip1.[31] Loss of AtCPSF-30 results in an enhanced tolerance to oxidative stress because of the overexpression of proteins with thioredoxin- and glutaredoxin-like domains.[32] The nuclease activity of AtCPSF-30 itself is redox-sensitive, as the third zinc finger contains a disulfide bond that stabilizes the overall structure of the protein.[33] Some of these properties may be unique to AtCPSF-30 as it is localized in the cytoplasm in the absence of other CPSF subunits.[34]

CstF

CstF contains three subunits: CstF-50, CstF-64, and CstF-77. CstF-50 has a WD40 domain in the C-terminal region. CstF-64 has an RNA recognition motif (RRM) at the N-terminus, followed by a hinge region, a Pro/Gly-rich region, and a small C-terminal domain (CTD). CstF-77 contains a HAT domain in the N-terminal region, followed by a Pro-rich region. The crystal structure of the HAT domain of CstF-77 revealed a dimeric association, providing the first evidence that CstF may function as a dimer.[35,36] The recently published crystal structure of the Kluyveromyces lactis Rna14–Rna15 complex also showed a dimeric association of this heterodimer into a heterotetramer, mediated by the HAT domain of Rna14 (Figure 3A).[37] Mutation of two residues in the HAT domain dimer interface caused a temperature-sensitive phenotype in yeast, and the cell extract was defective in cleavage and polyadenylation.[38] The structure of the N-terminal segment of CstF-50 is also a dimer (Figure 3B).[39] Overall, the structures as well as biochemical studies support a stable, dimeric association of the CstF complex. While Rna14 and Rna15 are dimeric in the CF IA complex in yeast, Clp1 and Pcf11 may actually be monomeric, giving an overall 2:2:1:1 stoichiometry for the complex.[38]

Figure 3

Recently published structures of the CstF subunits and their homologues. (A) Structure of the K. lactis Rna14–Rna15 dimer (PDB entry 4EBA).[37] Only one copy of the complex between the Rna14 C-terminal Pro-rich segment (red) and the Rna15 hinge region (pale green) is ordered (shown as a molecular surface). (B) Structure of the Drosophila CstF-50 N-terminal domain dimer (PDB entry 2XZ2).[39] (C) Structure of the heterodimer of the Pro-rich segment of Rna14 (red) with the hinge region of Rna15 (pale green) (PDB entry 2L9B).[40] (D) Structure of the yeast Rna15 RRM (blue)–Hrp1 (magenta)–RNA (orange) complex (PDB entry 2KM8).[43] (E) Structure of the yeast Rna15 RRM–RNA complex (PDB entry 2X1F).[44] The binding site at the top (RNA labeled GU) mediates specific recognition. The RNA in front of the β-sheet (labeled G′U′) is related by crystal symmetry to the GU RNA and is not specifically recognized.

The interactions between Rna14 and Rna15 are mediated by the C-terminal Pro-rich region of Rna14 and the hinge region of Rna15 (Figure 3C).[37,40] The formation of the CstF-64–CstF-77 complex is also important for the nuclear localization of CstF.[41] CstF-77 contains a poly(A) site within its intron 3, and the usage of this site depends on the expression levels of CstF-77, resulting in a negative feedback mechanism.[42] The amino acid sequences of the RRMs of CstF-64 and Rna15 are ∼50% identical, but the RRMs display distinct sequence preferences for RNA. CstF-64 recognizes the G/U-rich DSE, while Rna15 recognizes the A-rich positioning element (PE) in yeast. This distinct preference for Rna15 may be due to Hrp1, which constitutes CF IB but does not have a counterpart in the mammalian machinery. A crystal structure of the Rna15 RRM–Hrp1–RNA complex showed that the A-rich RNA interacts with the surface of the RRM β-sheet (Figure 3D),[43] which is the canonical mode of RNA recognition for RRMs.
On the other hand, a crystal structure of Rna15 RRM in complex with U-rich RNA, in the absence of Hrp1, shows that the RRM has a second, noncanonical RNA binding surface, which involves conserved loops above the β-sheet of the RRM (Figure 3E).[44] The interaction between Hrp1 and the Rna14–Rna15 dimer has also been studied by NMR, and a model for the Rna14–Rna15–Hrp1–RNA complex has been proposed.[45] The regulation of the RNA preference of Rna15 by Hrp1 may have important functional relevance. If there are two copies of Rna15 and only one copy of Hrp1 in the yeast 3′-end processing machinery, the two copies of Rna15 may bind to two different sequence elements in the transcript, one being A-rich and the other U- or G/U-rich. This may explain why 3′-end processing is enhanced by the addition of U-rich sequences between the PE and the cleavage site in the absence of Hrp1.[44] Whether CstF-64 has a binding partner that is functionally homologous to Hrp1 to facilitate an A-rich sequence preference has not been determined. Musashi1, a mammalian homologue of Hrp1, with a similar RNA binding mode,[46] is known to bind to the 3′-UTR of mRNAs but exerts its control at the translation level rather than the transcription level. Such regulation of the RNA preference of CstF-64 could also be important for its function in APA. A second isoform of CstF-64 in mammals, CstF-64τ, was originally thought to be restricted to the testis and brain, although recent studies suggest that it is more widely expressed.[47] CstF-64τ may complement the function of CstF-64. Moreover, a dimeric CstF complex could include one copy each of CstF-64 and CstF-64τ, which could be another mechanism for regulating 3′-end processing and APA. In addition, a family of splicing variants of CstF-64 has been identified, known as βCstF-64, which may have roles in APA in neuronal cells.[48]

CF Im

CF Im is comprised of two subunits: CF Im25 and CF Im68. CF Im25 has a Nudix nucleotide hydrolase fold but lacks hydrolase activity. CF Im68 is the most common second subunit, but there are alternative 59 and 72 kDa subunits. These three proteins contain an N-terminal RRM, a central Pro/Gly-rich region, and a C-terminal Arg/Ser-, Arg/Asp-, and Arg/Glu-rich segment. CF Im binds UGUA elements and is typically positioned 40–50 nucleotides upstream of the cleavage site.[49] Several crystal structures have been reported for this complex over the past few years, which have greatly enhanced our understanding of its molecular mechanism.[50−54] The structures show that CF Im is a heterotetramer, with a central CF Im25 dimer and two CF Im68 monomers bound to opposite sides of the CF Im25 dimer (Figure 4A).[53,54] The two Nudix domains in the CF Im25 dimer are arranged antiparallel to each other, which would require the two UGUA cis elements of the pre-mRNA to make a 180° turn to bind to them simultaneously (Figure 1). The two RRMs of CF Im68 enhance RNA binding by ∼3-fold and promote RNA loop formation but are dispensable.

Figure 4

Structures of CF Im and PAPD1. (A) Structure of the human CF Im25 (cyan)–CF Im68 (magenta)–UGUA RNA (orange) complex dimer (PDB entry 3Q2T).[53] (B) Structure of the human mitochondrial PAPD1 D325A mutant dimer (PDB entry 3PQ1).[77] The domains are shown in different colors. The catalytic residues are shown as stick models.

CF Im has a key role in APA[49,55,56] and in the export of mRNA from the nucleus.[57] Knockdown of the 25 and 68 kDa subunits in HEK293 cells increased the extent of global use of proximal poly(A) sites. On the other hand, knockdown of CF Im59 has no effect on poly(A) site choice. Therefore, the CF Im25–CF Im68 complex promotes the selection of distal poly(A) sites, producing mRNAs with an extended 3′-UTR that may be subject to specific 3′-UTR-mediated regulation. CF Im68 interacts with the nuclear export machinery through the Thoc5 protein of the TREX complex and the nuclear export receptor NXF1/TAP and shuttles between the nucleus and the cytoplasm.[58] Knockdown of Thoc5 also promotes the usage of proximal poly(A) sites.[59] CF Im68 is recruited by the capsid protein of HIV[60] and helps the virus to evade host innate immune recognition.[61] A C-terminal deletion of CF Im68 promotes HIV-1 capsid disassembly.[62]

CF IIm

CF IIm comprises two subunits: hClp1 and hPcf11. hClp1 contains three domains: N-terminal, central, and C-terminal domains. The central domain contains a Walker A P-loop motif and can bind ATP. hPcf11 contains a Pol II C-terminal domain (CTD) interaction domain (CID) at the N-terminus, two zinc fingers, a short sequence between the two zinc fingers that interacts with Clp1,[63] and other sequence motifs. Their homologues in yeast, Clp1 and Pcf11, belong to CF IA and interact with Rna14 and Rna15. An equivalent interaction between CF IIm and CstF in mammalian cells has not been demonstrated. In addition to the Clp1–Pcf11 interface identified from earlier studies,[63] there may be a “distant” binding site for Pcf11 on Clp1.[64,65] hClp1 exhibits ATP hydrolase activity and plays an important role in pre-tRNA splicing[66−68] as well as pre-mRNA 3′-end processing. Yeast Clp1 does not have ATP hydrolase activity but still requires ATP binding for its function. Yeast cells carrying Clp1 mutations in the ATP binding pocket, which induce a conformational change but do not occlude ATP binding, are not viable.[65,69] The correct conformation of Clp1 induced by ATP binding may be essential for interactions with other protein factors in the machinery, including Ssu72, Ysh1, Pta1, and Rna14. Clp1 may therefore be an important structural protein in the machinery, which is supported by the observation that reconstitution of CF IA from individual components requires Clp1.[64] Clp1 may contribute to gene looping and transcriptional directionality at bidirectional promoters, possibly through its interaction with Ssu72. hClp1 may also compete with the mRNA nuclear export factor Aly (Yra1 in yeast) for binding to hPcf11, and yeast Clp1 can displace Yra1 from Pcf11 in affinity experiments.[70] Recombinant Yra1 inhibits in vitro CF IA-mediated cleavage and polyadenylation reactions.[71]

PAP

CPSF stimulates the activity of PAP so that it processively extends the poly(A) tail, the length of which is regulated by the nuclear poly(A)-binding protein (PABPN1).[72] Once the tail reaches ∼250 nucleotides, PABPN1 interferes with this stimulation. PABPN1 can also stimulate PAP and cause hyperadenylation, which can mediate RNA degradation by the exosome.[73] A recently published structure of the human PAPγ core,[74] from the γ clade of mammalian PAPs, confirms the three-domain core structure shared among the canonical PAPs, with N-terminal, middle, and C-terminal domains (Figure 2B). Noncanonical PAPs have very weak conservation of sequence with respect to that of canonical PAPs and lack the C-terminal domain in the core.[75,76] Some of these enzymes catalyze the oligo- or polyuridylation of their substrates and are also known as poly(U) polymerases (PUPs) or terminal uridylate transferases (TUTs or TUTases). The structure of human mitochondrial PAPD1 reveals a dimer of this enzyme, involving a RL (RNA-binding domain-like) domain unique to this enzyme (Figure 4B), and biochemical studies suggest that dimerization is required for PAPD1 activity.[77] PAPD1 can use all four nucleotides as substrates in vitro, and how it achieves nucleotide specificity in the mitochondria is currently not known. Structures of other noncanonical PAPs (TUTs) have also been reported.[76]

Cytoplasmic Polyadenylation

Cytoplasmic polyadenylation is important for the post-transcriptional control of gene expression through the reactivation of deadenylated and dormant but otherwise intact cytoplasmic mRNAs, which is directed by the presence of a cytoplasmic polyadenylation element (CPE) in the 3′-UTR of the mRNAs.[78,79] The CPE is bound by the regulatory cytoplasmic polyadenylation element binding protein (CPEB), which in turn interacts with many other proteins to regulate cytoplasmic polyadenylation and mRNA translation. This CPEB complex contains competing deadenylase (PARN) and PAP (Gld2) enzymes to regulate the length of the poly(A) tail.[80] CPEB-dependent protein synthesis plays a key role in synapse formation and long-term memory persistence in sensory neuron–motor neuron cultures,[81] and CPEB prion-like multimerization is associated with changes in synapse persistence.[82] Many protein factors can affect the expression of their target mRNA by binding to CPEB and freeing the mRNA for polyadenylation and translation.[79] For example, translation of the mRNA for tumor suppressor p53 is promoted by the interaction between CPEB and noncanonical PAP Gld4[83] and inhibited by overexpression of CPEB in the absence of Gld2.[84] Another tumor suppressor protein, parafibromin, also exerts control over cell fate through its interaction with CPEB[85] and affects the translation of multiple genes. CPEB also recruits several protein factors of the canonical, nuclear 3′-end processing machinery, including CPSF and symplekin, for cytoplasmic polyadenylation. A recent solution structure of the C-terminal zinc-binding domain of CPEB reveals a structural similarity to ZZ-type zinc fingers, which are known to facilitate protein–protein interactions with sumoylated proteins.[86] Both symplekin and CPSF are known to be sumoylated, suggesting a possible mechanism for how CPEB recognizes these factors.

Symplekin–Ssu72–Pol II CTD Complex

Symplekin is a scaffold protein and mediates interactions among many proteins in the 3′-end processing machinery. The yeast homologue, Pta1, shares very weak sequence conservation with symplekin but has generally equivalent protein partners. The N-terminal domain (NTD) of symplekin/Pta1 interacts with Ssu72, a central region with CstF-64/Pti1 (a yeast homologue of Rna15), and a C-terminal domain with CPSF-73/Ysh1.[87−89] Ssu72 is a Pol II CTD phosphatase and has functions in 3′-end processing as well as gene looping,[90] which helps to maintain correct transcription directionality and prevents transcription of certain noncoding RNAs from bidirectional promoters.[91] The Pol II CTD contains heptapeptide repeats (26 in yeast and 52 in humans) with the Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7 consensus sequence, and the phosphorylation state of the CTD regulates the function of Pol II.[92−94] Ssu72 is a well-characterized pSer5 phosphatase and has recently been reported to have pSer7 phosphatase activity, as well.[95,96] The recent crystal structure of the symplekin NTD in complex with Ssu72 and a Pol II CTD pSer5 peptide defines the detailed interactions in this ternary complex (Figure 5A).[88] Although the NTD–Ssu72 interface is ∼25 Å from the active site of Ssu72, the NTD can stimulate the phosphatase activity of Ssu72, indicating that symplekin is not just a passive scaffold in the 3′-end processing machinery.
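
For orientation, the consensus heptad and its phosphorylatable serines can be written out programmatically. The Python sketch below builds an idealized CTD from perfect consensus repeats and lists the residue positions of Ser2, Ser5, and Ser7. It assumes that every repeat matches the consensus exactly, which is a simplification because many repeats in the real CTD deviate from it, and the function names are illustrative only.

```python
CONSENSUS = "YSPTSPS"  # Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7


def idealized_ctd(n_repeats):
    """Return an idealized CTD made of n perfect consensus heptads."""
    return CONSENSUS * n_repeats


def serine_positions(n_repeats):
    """Map 'Ser2', 'Ser5', 'Ser7' to 1-based residue positions in the idealized CTD."""
    sites = {"Ser2": [], "Ser5": [], "Ser7": []}
    for repeat in range(n_repeats):
        offset = repeat * len(CONSENSUS)
        for name, pos_in_repeat in (("Ser2", 2), ("Ser5", 5), ("Ser7", 7)):
            sites[name].append(offset + pos_in_repeat)
    return sites


# Repeat counts quoted in the text: 26 in yeast and 52 in humans.
print(len(idealized_ctd(26)), len(idealized_ctd(52)))  # 182 364
print(serine_positions(52)["Ser5"][:3])  # [5, 12, 19]
```

In this numbering, Ser7 of one repeat is immediately followed by Tyr1 of the next repeat, so each repeat boundary is a Ser7–Tyr1 peptide bond.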

Figure 5

Structures of human symplekin NTD–Ssu72–Pol II CTD phosphopeptide complexes. (A) Structure of the human symplekin NTD (cyan)–Ssu72 (yellow)–Pol II CTD pSer5 phosphopeptide (green) complex (PDB entry 3O2Q).[88] The seven pairs of antiparallel helices are labeled. (B) Mode of binding of the Pol II CTD pSer5 peptide in the active site of Ssu72. (C) Overlay of the modes of binding of the Pol II CTD pSer7 peptide (green) and the pSer5 peptide (gray) in the active site of Ssu72 (PDB entry 4H3H).[99] The directions of the polypeptide backbones are denoted by the arrows. The primed numbers indicate residues in the second CTD repeat.

The structure also reveals that the pSer5–Pro6 peptide bond of the Pol II CTD assumes the cis configuration in the active site of Ssu72 (Figure 5B), the first time that a protein phosphatase (or protein kinase) has been shown to recognize the cis configuration of the substrate. Subsequent structures of another ternary complex (with pThr4 and pSer5)[97] as well as an Ssu72–phosphopeptide binary complex[98] confirmed this finding. The peptidyl-prolyl isomerase Pin1 enhances the dephosphorylation activity of Ssu72.[88] The symplekin NTD can regulate the function of Ssu72 in transcription-coupled 3′-end processing with the HeLa cell nuclear extract.[88] Studies in yeast show that the Pta1 NTD can inhibit 3′-end processing, and binding of Ssu72 to the NTD relieves this inhibition.[87] The binding mode of the pSer5 peptide in the active site of Ssu72 contrasts with its reported pSer7 phosphatase activity, as it could require that the pSer7–Tyr1 peptide bond be in the cis configuration. The crystal structure of the symplekin NTD–Ssu72–Pol II CTD pSer7 peptide ternary complex showed that the peptide, with all its amide bonds in the trans configuration, is bound in the reverse orientation compared to that of the pSer5 peptide in the active site of Ssu72 (Figure 5C).[99] The substrate and the general acid for catalysis are misaligned in this complex.
In vitro assays using peptide substrates indicate that the pSer7 phosphatase activity is ∼4000-fold weaker than the pSer5 phosphatase activity. On the other hand, assays with the entire CTD as the substrate, using antibodies to monitor dephosphorylation, showed no differences between the two activities. The reason behind this discrepancy is currently not known. A CTD peptide phosphorylated at Ser2, Ser5, and Ser7 is bound exclusively with pSer5 in the active site, suggesting that Ssu72 has a higher affinity for pSer5 than for pSer7.[99]

Replication-Dependent Histone Pre-mRNA 3′-End Processing

Metazoan replication-dependent (also known as canonical) histone proteins H1, H2A, H2B, H3, and H4 are involved in de novo chromatin packaging during DNA replication, while variant histones, notably, H3.3, H2A.Z, CENP-A, macroH2A, and H1.0, are important for chromatin remodeling, centromere function, and epigenetic silencing.[100−102] Variant histone pre-mRNAs carry introns and are cleaved and polyadenylated. In comparison, the replication-dependent histone pre-mRNAs do not have introns and are cleaved but not polyadenylated. Their 3′-end processing is conducted by a different machinery, which will be the focus of this section. The 3′-UTRs of replication-dependent histone pre-mRNAs contain two signature cis-acting elements: a highly conserved stem–loop (SL) 25–50 nucleotides downstream of the open reading frame and a purine-rich segment further downstream (15–20 nucleotides in vertebrates) named the histone downstream element (HDE) (Figure 6A). The SL is recognized by the stem–loop-binding protein (SLBP, also known as the hairpin binding protein, HBP), which has a central role in replication-dependent histone mRNA processing and function. The SL is also bound by a 3′–5′ exoribonuclease, known as 3′hExo or Eri-1, which has an important role in histone mRNA degradation, although Drosophila lacks this nuclease. The HDE recruits the U7 snRNP, another important component for histone pre-mRNA 3′-end processing. These various factors will be described in more detail below.

Figure 6

Structural information about histone pre-mRNA 3′-end processing. (A) Schematic drawing of the replication-dependent histone pre-mRNA 3′-end processing machinery. (B) Structure of Danio rerio SLIP1 bound to the SLIP1-binding motif (SBM) of SLBP (PDB entry 4JHK).[106] (C) Structure of the human SLBP RNA binding domain (RBD)–3′hExo–stem-loop RNA complex (PDB entry 4L8R).[112] (D) Specific recognition of the second guanine in the stem by Arg181 of SLBP.

SL, SLBP, and 3′hExo

The SL has a 6 bp stem and a four-nucleotide loop (Figure 6A). A G-C base pair at the second position of the stem is strictly conserved, while the loop is generally rich in pyrimidines.[103] Systematic as well as focused studies on the effects of SL mutations on binding to SLBP are consistent with sequence conservation.[102,104] SLBP is found in all metazoans and is a 31 kDa protein in humans.[102] A highly conserved 70-residue RNA-binding domain (RBD) near the center of SLBP binds tightly to the SL, with a dissociation constant (Kd) of ∼10 nM. Phosphorylation of Thr171 in the RBD enhances the affinity, reducing the Kd by ∼7-fold. Immediately C-terminal to the RBD is a 20-residue segment that is dispensable for RNA binding but is required for efficient 3′-end processing. The N-terminal region of SLBP binds SLBP interaction protein 1 (SLIP1) (Figure 6B), which interacts with eukaryotic translation initiation factor eIF-4G and is essential for promoting the translation of histone mRNAs.[105,106] Phosphorylation of several residues in this region is correlated with polyubiquitination and rapid breakdown of SLBP at the end of the S phase.[107,108] 3′hExo belongs to the DEDD family of 3′–5′ exonucleases and prefers single-stranded RNA as the substrate. The activity of 3′hExo requires two Mg2+ cations coordinated by four invariant acidic residues (DEDD) in its active site. In addition to the nuclease domain, 3′hExo also contains an N-terminal SAP domain, previously characterized as a nucleic acid binding motif.[109] 3′hExo trims up to two nucleotides from the 3′-end of histone mRNAs after cleavage, and it also participates in the rapid decay of histone mRNAs at the end of the S phase.[110] In addition to its roles in histone mRNA metabolism, 3′hExo is critical for trimming the 3′-end of the 5.8S rRNAs[111] and regulating microRNA homeostasis. 
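As a rough illustration of what the quoted affinities imply, the sketch below converts the ∼7-fold reduction in Kd upon Thr171 phosphorylation into a binding free energy difference and an occupancy estimate. It assumes simple 1:1 binding with the RNA at trace concentration and a temperature of 25 °C; neither assumption comes from the review, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fraction_bound(protein_nM, kd_nM):
    """Fraction of SL RNA bound, assuming 1:1 binding and trace RNA."""
    return protein_nM / (protein_nM + kd_nM)

def ddg_from_fold_change(fold, temp_K=298.15):
    """Change in binding free energy (kcal/mol) for a fold-reduction in Kd."""
    return -R_KCAL * temp_K * math.log(fold)

kd_unphos = 10.0            # nM, value quoted in the text
kd_phos = kd_unphos / 7.0   # nM, after the ~7-fold reduction quoted in the text

ddg = ddg_from_fold_change(7.0)  # about -1.15 kcal/mol at the assumed 25 °C

# Occupancy at an arbitrary, illustrative SLBP concentration of 10 nM
occ_unphos = fraction_bound(10.0, kd_unphos)
occ_phos = fraction_bound(10.0, kd_phos)
```

Under these assumptions, the phosphorylated RBD would occupy a larger fraction of the SL at any given protein concentration, but the size of the effect in cells depends on concentrations that the review does not give.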
The recently published crystal structure of the human SLBP RBD–3′hExo–SL ternary complex provided the first molecular insights into the architecture of this complex (Figure 6C).[112] The stem adopts the conformation of A-form RNA, and three of the four bases of the loop are flipped out. The SLBP RBD interacts with the 5′-flanking sequence, the 5′-arm of the stem, and the loop of SL. The SAP domain of 3′hExo interacts with the loop, and the nuclease domain with the 3′-arm and flanking sequence. In particular, the last nucleotide of the SL is located in the active site of the nuclease domain, explaining how 3′hExo can trim the last two nucleotides of the histone mRNA after cleavage. The observed binding mode is consistent with most of the mutagenesis data on this complex. Only the guanine base in the second base pair of the stem is specifically recognized, by the side chain of Arg181 in SLBP (Figure 6D), which explains why this nucleotide is invariant in all metazoans. The structure indicates that SLBP and 3′hExo primarily recognize the shape, rather than the sequence, of the SL. There are no direct contacts between the SLBP RBD and 3′hExo in the ternary complex (Figure 6C). The cooperative binding between the two proteins observed earlier likely results from the induced-fit behavior of the SL, as there are large conformational differences between the SL in the complex versus that free in solution.[112] Therefore, binding of one protein induces a conformation of the SL that promotes the binding of the other protein.
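The stem–loop architecture described in this section (a 6 bp stem, a four-nucleotide loop, and an invariant G-C pair at the second position of the stem) can be expressed as a small sequence check. This is a minimal sketch: it accepts only Watson–Crick pairs (G-U wobble pairs are excluded by assumption), and both the function name and the test sequences are illustrative, not taken from the review.

```python
# Watson–Crick pairs only; excluding G-U wobble pairs is an assumption of this sketch.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}

def check_stem_loop(seq, stem_len=6, loop_len=4):
    """Return True if seq folds as stem_len bp + loop_len nt with G-C at stem position 2."""
    seq = seq.upper()
    if len(seq) != 2 * stem_len + loop_len:
        return False
    five_arm = seq[:stem_len]
    three_arm = seq[stem_len + loop_len:]
    # Base i of the 5' arm pairs with the mirrored base of the 3' arm.
    paired = all(
        (five_arm[i], three_arm[stem_len - 1 - i]) in PAIRS
        for i in range(stem_len)
    )
    # Second base pair: G on the 5' arm, C on the 3' arm.
    second_is_gc = five_arm[1] == "G" and three_arm[stem_len - 2] == "C"
    return paired and second_is_gc

ok = check_stem_loop("GGCUCUUUUCAGAGCC")      # illustrative sequence with G-C at position 2
mutant = check_stem_loop("GACUCUUUUCAGAGUC")  # stem still pairs, but position 2 is A-U
```

The check captures sequence-level constraints only. As the structure indicates, SLBP and 3′hExo mainly recognize the shape of the SL, which a string comparison cannot model.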

HDE and U7 snRNP

HDE recruits the U7 snRNP through base pairing with the 5′-extension of U7 snRNA.[101] The heptameric Sm ring of U7 snRNP contains Sm proteins B, D3, E, F, and G that are found in spliceosomal snRNPs as well as two unique subunits, Sm-like proteins Lsm10 and Lsm11. Lsm11 is a 40 kDa protein in humans, which is substantially larger than the typical Sm protein (∼13 kDa). It has a unique N-terminal segment that is mostly unstructured but is essential for histone pre-mRNA processing via the recruitment of other processing factors, including the zinc finger protein ZFP100 and FLASH (see below).

Other Processing Factors

As for polyadenylated mRNAs, CPSF-73 is the endoribonuclease for the cleavage reaction of replication-dependent histone pre-mRNAs.[102] The cleavage site is also typically after a CA dinucleotide, five (in vertebrates) or four (in fruit fly and sea urchin) nucleotides downstream of the stem.[103] The 5′-end-capped and 3′-end-cleaved histone mRNA, accompanied by SLBP and 3′hExo, is then exported into the cytoplasm by the antigen peptide transporter. Besides CPSF-73, several other protein factors in the canonical 3′-end processing machinery are also important for histone pre-mRNA 3′-end processing, including CPSF-100, symplekin, CstF-64, and CstF-77.[113,114] These proteins form the heat labile factor (HLF), discovered in the 1980s as an essential component of the histone pre-mRNA 3′-end processing machinery,[115] and are recruited to the machinery through FLASH. FLASH (FLICE-associated huge protein) was initially identified as a 220 kDa proapoptotic protein that is part of the death-inducing signaling complex (DISC).[116] Only a small segment of ∼140 residues at the N-terminus of FLASH, especially a Leu-Asp-Leu-Tyr motif, is required for recruiting the various protein factors for histone pre-mRNA processing.[117−121] This segment also has tight interactions with Lsm11, which brings FLASH to the 3′-end of histone pre-mRNAs. The central region of FLASH recruits arsenite resistance protein 2 (ARS2).[122] ARS2 directly binds to histone mRNAs and interacts with the nuclear cap-binding complex (CBC).[123] The CBC–ARS2 complex can stimulate the 3′-end processing of histone mRNAs, presumably through CBC’s interactions with the negative elongation factor (NELF) and SLBP.[123−125] The 100 kDa zinc finger protein (ZFP100) interacts with both SLBP and U7 snRNP and is crucial for efficient 3′-end processing.[126] It contains a poorly conserved N-terminal domain and a C-terminal domain that is composed of 18 C2H2-type zinc fingers.
Overexpression of ZFP100 greatly enhances the 3′-end processing of a reporter RNA that mimics histone pre-mRNAs, while overexpression of the components of the U7 snRNP alone does not, indicating that ZFP100 is the limiting factor for histone pre-mRNA processing. The primary role of ZFP100 is probably to bridge the SLBP–SL complex with the U7 snRNP–HDE complex, thereby stabilizing the overall processing machinery. Phosphorylation of Thr4 in the Pol II CTD is required for histone 3′-end processing.[127] It is also required for Pol II transcription elongation.[128] The kinase that phosphorylates Thr4 is CDK9, which also targets NELF.[127,129] CDK9 is recruited to histone genes by the nuclear protein ataxia-telangiectasia locus (NPAT), the expression of which is negatively regulated by p53.[130] It remains unclear how the Pol II CTD communicates with the integral components of the histone pre-mRNA 3′-end processing machinery.

Cell Cycle Regulation of Histone Pre-mRNA 3′-End Processing

The demand for histones during the S phase of the cell cycle is enormous, and an estimated 10⁸ molecules of each of the five histones are synthesized within a period of several hours.[131] Levels of histone mRNA increase by 35-fold at the beginning of the S phase, through transcription activation (3.5-fold increase) and enhanced pre-mRNA processing (10-fold increase), which together account for the overall change (3.5 × 10 = 35). At the conclusion of the S phase, or through inhibition of DNA synthesis in the S phase, levels of histone mRNA are reduced back to the G1 phase baseline. SLBP is a key factor in the regulation of histone mRNA levels, and the level of this protein correlates with those of histone mRNAs during the cell cycle. In late G1 phase, inhibition of protein degradation and activation of translation synergistically increase the level of SLBP.[132] At the end of the S phase, several residues in the N-terminal segment of SLBP (including Ser20, Ser23, Thr60, and Thr61) are phosphorylated, which facilitates its polyubiquitination and rapid clearance by the ubiquitin–proteasome system.[107,108] Pin1 may also have a role in facilitating the phosphorylation of Ser20 and Ser23, and the dephosphorylation of Thr171.[108] On the other hand, artificial inhibition of DNA replication in the S phase has little effect on the cellular levels of SLBP, indicating the existence of other mechanism(s) for regulating histone levels.

A Novel Quality Surveillance Mechanism for mRNA 5′-End Capping

The 5′-end 7-methylguanosine (m7G) cap is a significant contributor to mRNA splicing, nuclear export, translation, stability, and other processes.[133,134] The cap is added cotranscriptionally and attached to the terminal nucleotide of the RNA by an unusual 5′–5′ triphosphate linkage. Capping proceeds in three steps: conversion of 5′-end-triphosphorylated RNA (pppRNA, the primary transcript of Pol II) to diphosphorylated RNA (ppRNA), coupling to GMP to produce capped RNA (GpppRNA), and methylation to produce m7GpppRNA (Figure 7). A mature, methylated cap is essential for recognition by the nuclear cap-binding complex (CBC) and eIF-4E, which coordinate many of the functions attributed to the cap.[135,136]
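The three capping steps above can be sketched as an ordered list of 5′-end states, which makes explicit which species count as incompletely capped intermediates. The state names follow the text, while the step labels and function names are descriptive placeholders chosen for this sketch.

```python
# Ordered 5'-end states of the capping pathway, as named in the text.
CAPPING_PATHWAY = ["pppRNA", "ppRNA", "GpppRNA", "m7GpppRNA"]

# One label per transition; these paraphrase the text and are not enzyme names.
CAPPING_STEPS = [
    "conversion of 5'-triphosphate to 5'-diphosphate",
    "coupling to GMP",
    "methylation",
]

def next_state(state):
    """Return the next state in the pathway, or None for the mature cap."""
    i = CAPPING_PATHWAY.index(state)
    return CAPPING_PATHWAY[i + 1] if i + 1 < len(CAPPING_PATHWAY) else None

def is_incompletely_capped(state):
    """True for every pathway intermediate that lacks the mature m7G cap."""
    return state in CAPPING_PATHWAY[:-1]

def steps_remaining(state):
    """List the steps still needed to reach m7GpppRNA from the given state."""
    return CAPPING_STEPS[CAPPING_PATHWAY.index(state):]
```

For example, `steps_remaining("GpppRNA")` returns only the methylation step, and `is_incompletely_capped("m7GpppRNA")` is false.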

Figure 7

Reactions for pre-mRNA 5′-end capping and quality control. Reactions in the capping pathway are denoted by the green arrows. The intermediates in the capping pathway are recognized by the DXO family enzymes (Rai1, Dxo1, and DXO) for degradation (red arrows). The fate of the ppRNA is currently not known, although it may be possible that Rai1 and DXO also mediate its degradation. The reaction catalyzed by the classical decapping enzymes (Dcp2 and Nudt16) is denoted by the blue arrow. DXO and Dxo1 can also remove the mature cap but generate a different product (dashed red arrow).

Removal of the cap (decapping) is a regulated process catalyzed by at least two Nudix hydrolase enzymes, Dcp2 and Nudt16,[137,138] which release m7GDP (m7Gpp) and 5′-end-monophosphorylated RNA (pRNA) (Figure 7). Six additional Nudix proteins possessing decapping activity in vitro have also been reported,[139] although the functional role of these putative decapping enzymes in cells remains to be determined. Until recently, it was generally accepted in the field that capping always proceeds to completion, and a quality control mechanism was not known (or deemed necessary). However, if there are defects in 5′-end capping, the intermediates of the capping pathway (pppRNA, ppRNA, and GpppRNA) could accumulate in cells, because Dcp2 and Nudt16 predominantly function on mature m7GpppRNA and have minimal activity on these intermediates.[137] The intermediates are also protected against degradation by 5′–3′ exoribonucleases (XRNs), which are only active against pRNA substrates. A novel family of enzymes that possess RNA 5′-end pyrophosphohydrolase (PPH, releasing pyrophosphate PPi), decapping, and/or distributive 5′–3′ exoribonuclease activity was recently discovered.
These enzymes include Rai1[140,141] and Ydr370C/Dxo1[142] in yeast and Dom3Z/DXO in mammals.[143] These decapping exonucleases (DXO family of enzymes) primarily act on incompletely capped mRNAs, converting them to substrates for degradation by XRNs or their own exonuclease activity (Figure 7). These biochemical activities are strongly suggestive of a hitherto unrecognized mRNA 5′-end capping quality surveillance mechanism, helping to clear transcripts with incompletely capped 5′-ends. Functional studies in yeast and mammalian cells have confirmed the presence of defects in 5′-end capping and demonstrated the importance of the DXO family enzymes in this quality control mechanism.

Biochemical Properties of the DXO Family Enzymes

The DXO activities were first identified, unexpectedly, from studies on yeast Rai1, the protein partner of the nuclear 5′–3′ exoribonuclease Rat1.[140] The crystal structure of Rai1 revealed a large pocket lined with conserved residues, some of which coordinate a divalent metal ion at the bottom of the pocket. This indicated that Rai1 is an enzyme, although no catalytic activities were known for it at the time. Further studies demonstrated that Rai1 has PPH activity[140] and decapping activity toward GpppRNA, while it has much weaker activity toward the mature m7GpppRNA.[141] Moreover, the product of decapping is GpppN, the entire cap structure, in contrast to Dcp2 and Nudt16, which release m7GDP (m7Gpp) (Figure 7). Rai1 has a weak sequence homologue in yeast, Dxo1 (Ydr370C). Biochemical studies showed that it has decapping activity (toward both GpppRNA and m7GpppRNA) as well as a distributive 5′–3′ exoribonuclease activity, although it lacks PPH activity.[142] The mammalian homologue of Rai1, DXO (previously known as Dom3Z), has all three activities, PPH, decapping (toward both GpppRNA and m7GpppRNA), and exoribonuclease.[143] These activities would allow DXO to single-handedly detect and degrade incompletely capped mRNAs. m7GpppRNAs are protected from DXO degradation by cap-binding proteins in vivo, indicating that mature mRNAs are insensitive to DXO. Therefore, DXO is expected to function preferentially on incompletely capped pre-mRNAs.
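The activity profiles described above can be collected into a small lookup table, which makes it easier to see which enzyme can act on which 5′-end. This is only a sketch of the in vitro activities as summarized in this review. Rai1's activity toward m7GpppRNA is marked "weak" and treated as absent, its lack of exonuclease activity is taken from a later section of the review, and ppRNA is omitted because its fate is reported as unknown. The dictionary layout and function names are hypothetical.

```python
# In vitro activity profiles as summarized in the review (not a kinetic model).
PROFILES = {
    "Rai1": {"PPH": True,  "decap_Gppp": True, "decap_m7Gppp": "weak", "exo": False},
    "Dxo1": {"PPH": False, "decap_Gppp": True, "decap_m7Gppp": True,   "exo": True},
    "DXO":  {"PPH": True,  "decap_Gppp": True, "decap_m7Gppp": True,   "exo": True},
}

# Which activity is needed to convert each 5'-end into pRNA.
# ppRNA is deliberately absent: the review states its fate is not known.
ACTIVITY_FOR_END = {
    "pppRNA": "PPH",
    "GpppRNA": "decap_Gppp",
    "m7GpppRNA": "decap_m7Gppp",
}

def enzymes_acting_on(end):
    """DXO family members with robust activity on the given 5'-end."""
    key = ACTIVITY_FOR_END[end]
    return sorted(name for name, p in PROFILES.items() if p[key] is True)

def can_clear_alone(enzyme, end):
    """True if the enzyme can both convert the 5'-end and degrade the RNA body."""
    p = PROFILES[enzyme]
    return p[ACTIVITY_FOR_END[end]] is True and p["exo"]
```

In this table, only DXO combines all three activities, which is the basis for the statement that it could act single-handedly. The table says nothing about the in vivo situation, where cap-binding proteins protect m7GpppRNA from DXO.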

Structural Basis for the DXO Activities

Biochemical studies showed that mammalian DXO possesses three, apparently distinct, catalytic activities: PPH, decapping, and exonuclease. On the other hand, the RNA body produced by these activities is the same, 5′-end-monophosphorylated RNA (pRNA). Crystal structures of mouse DXO in complex with 5′-end-monophosphorylated RNA oligos, 5-mer RNA (pU5) (Figure 8A), and 6-mer RNA with phosphorothioate linkages to inhibit hydrolysis [pU(S)6] as well as the m7GpppG cap analogue (Figure 8B) have defined the binding modes of the RNA substrate/product and revealed the molecular mechanism for the different activities.[143]

Figure 8

Molecular mechanism for the catalytic activities of DXO family enzymes. (A) Structure of mouse DXO in complex with pU5 oligo RNA (black stick models) (PDB entry 4J7L).[143] The two Mg2+ ions are shown as orange spheres. (B) Structure of mouse DXO in complex with the m7GpppG cap analogue (gray sticks) (PDB entry 4J7N). The expected location of the metal ions is indicated by the red star. The view is related to that of panel A by an ∼60° rotation around the vertical axis. (C) Binding mode of the 5′-end phosphate group of pU5. This RNA is bound in the active site as the product. Binding of the pyrophosphate (PPi), the cap structure [(m7)GpppN], or the first nucleotide (N1) on the other side of the catalytic machinery explains the three catalytic activities. (D) The active site of DXO is located at the bottom of a deep pocket, which is large enough to accommodate only ssRNA.

The pU5 oligo is bound in the DXO active site as a product, with its 5′-end phosphate group mimicking the scissile phosphate of the substrate. A second metal ion is bound in the active site in the presence of this oligo, and a terminal oxygen atom of the 5′-end phosphate group is a bridging ligand to both metal ions (Figure 8C). The pU(S)6 oligo is bound in the active site as a substrate, revealing the recognition pocket for the first nucleotide (especially its 5′-phosphate) for the 5′–3′ exonuclease activity. However, there are disruptions to the conformation of this oligo at the scissile bond caused by the incorporation of the phosphorothioate linkages and the fact that only one metal ion (Ca2+, to prevent hydrolysis) is present in the active site. The structures demonstrate that the same active site machinery supports the three activities, and it is the distinct binding modes of the substrates that determine the outcome of the reaction. The 5′-end pyrophosphate (PPi), (m7)GpppN cap, or the first nucleotide is bound on the other side of the catalytic machinery from the RNA body (Figure 8C).
An attack on the scissile phosphate group, likely by a water/hydroxide coordinated to one of the metal ions, then leads to the hydrolysis. At the same time, different DXO family enzymes have distinct biochemical activity profiles. For example, Rai1 has PPH and GpppRNA decapping activities but no exonuclease activity, while DXO has all the activities (Figure 7). Further studies are needed to elucidate the molecular mechanisms of these differences. The DXO family enzymes share four conserved sequence motifs.[142] Motif I is an Arg residue and recognizes the 5′-phosphate group of the pRNA substrate and pppRNA. Motif II, GΦXΦE (where Φ is an aromatic or hydrophobic residue and X any residue), provides a ligand to the metal ion [Glu192 in mouse DXO (Figure 8C)]. Motif III, EhD (where h is a hydrophobic residue), is ligated to both metal ions in the pU5 complex [Glu234 and Asp236 (Figure 8C)]. Motif IV, EhK, provides a ligand to the metal ion (Glu253), and the Lys residue is likely to stabilize the transition state of the reaction. The structures of the DXO family enzymes have a remote relationship to that of D-(D/E)XK nucleases,[142,144,145] which include some viral and phage nucleases. However, there is little sequence conservation with these enzymes, and only the Asp residue of motif III (EhD) and motif IV (EhK) [the D-(D/E)XK motif] are shared among them. The level of structural conservation outside of these two motifs is much lower among these enzymes. The D-(D/E)XK enzymes also include some type II restriction endonucleases, such as HincII, EcoRV, EcoRI, BamHI, and BglI,[142] but the level of structural conservation with the DXO family enzymes is much lower. For example, the active site of the DXO family enzymes is located at the bottom of a deep pocket (Figure 8D), which is consistent with their exonuclease, decapping, and PPH activity. In comparison, the active site of the type II enzymes is much more open, in line with their endonuclease activity.
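The conserved motifs II to IV described above can be written as regular expressions for scanning a protein sequence. The sketch below is illustrative only. The residue sets chosen for Φ (aromatic or hydrophobic) and h (hydrophobic) are assumptions, the peptide is a made-up example rather than a real DXO sequence, and motif I (a single Arg) is omitted because one residue is too nonspecific to search for.

```python
import re

# Assumed residue classes; the review does not enumerate them.
PHI = "AVILMCFWY"  # Φ: aromatic or hydrophobic
H = "AVILMCFW"     # h: hydrophobic

MOTIFS = {
    "II": f"G[{PHI}].[{PHI}]E",  # GΦXΦE
    "III": f"E[{H}]D",           # EhD
    "IV": f"E[{H}]K",            # EhK
}

def find_motifs(seq):
    """Return {motif: [(0-based start, matched text), ...]} with overlaps allowed."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead wrapper lets overlapping occurrences be reported.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start(), m.group(1)) for m in regex.finditer(seq)]
    return hits

# Hypothetical peptide constructed to contain one copy of each motif.
hits = find_motifs("MGFAWEAAELDKKEVK")
```

Short patterns such as EhD and EhK will match by chance in most proteins, so hits of this kind are meaningful only alongside the structural context discussed in the text, such as the order of the motifs and their metal-ligating roles.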

Functions of DXO Family Enzymes in 5′-End Capping Quality Control

Functional studies in yeast cells harboring a deletion of Rai1 and/or Dxo1 reveal a role for these proteins in ensuring the integrity of mRNA 5′-end caps. Incompletely capped mRNAs were observed in rai1Δ cells following nutritional stress (glucose or amino acid starvation), suggesting that Rai1 is necessary for their detection and degradation.[141] Moreover, incompletely capped mRNAs are detected under normal, nonstress growth conditions in rai1Δdxo1Δ doubly disrupted yeast strains.[142] Collectively, these findings demonstrate that incompletely capped transcripts are normally generated in yeast cells (Figure 9), providing direct evidence that the capping process is less efficient than initially envisioned.

Figure 9

DXO family enzymes function in 5′-end capping quality control. In eukaryotes, pre-mRNAs are transcribed in the nucleus by Pol II and processed into mature mRNAs by the addition of a 5′-end cap, intron splicing, and 3′-end cleavage and polyadenylation. The mature mRNAs are exported to the cytoplasm for protein translation. In yeast cells, incompletely capped Pol II RNA transcripts are subjected to degradation by the Rai1–Rat1 decapping–exonuclease heterodimer, which detects and degrades 5′-end uncapped RNA or 5′-end unmethylated capped RNA in the nucleus. Dxo1, which is predominantly but not exclusively in the cytoplasm, decaps and degrades unmethylated capped Pol II transcripts. In mammalian cells, the incompletely capped Pol II RNA transcripts are substrates for DXO, which decaps and exonucleolytically degrades the defectively capped pre-mRNA prior to further splicing and 3′-end processing. Collectively, the DXO enzymes can hydrolyze the 5′-end of incompletely capped RNAs to expose the 5′-end of the RNA to subsequent exonucleolytic decay (by Dxo1 and DXO directly or by the Rai1–Rat1 heterodimer) in a 5′-end capping quality control mechanism to maintain RNA fidelity. CE is the capping enzyme and CBP the cap-binding protein.

In addition, the fact that a double disruption of Rai1 and Dxo1, but not individual disruptions, is required for the accumulation of incompletely capped mRNAs during nonstress conditions indicates that the Rai1 and Dxo1 proteins function redundantly in the surveillance mechanism to detect and degrade incompletely capped transcripts. On the other hand, Dxo1 cannot complement the loss of Rai1 under nutritional stress conditions. It remains to be determined whether the incompletely capped transcripts are generated as a consequence of intrinsic stochastic inefficiency of the capping process or an indication that capping is normally a regulated process in which not all primary transcripts are destined to acquire a methylated cap.
Functional studies in human embryonic kidney 293T cells confirm the importance of DXO in ensuring mRNA 5′-end capping quality.[143] A decrease in the DXO level in these cells through shRNA knockdown results in a significant accumulation of unprocessed pre-mRNAs (with splicing and polyadenylation defects) with minimal changes in mature mRNA levels. These unprocessed pre-mRNAs harbor incompletely capped 5′-ends, while the mature mRNAs contain an m7G cap. These data indicate that incompletely capped pre-mRNAs do not undergo further processing (splicing or polyadenylation), while the normally capped pre-mRNAs are licensed to undergo processing (Figure 9). While earlier studies had demonstrated a link between capping quality and splicing of the first intron, the accumulated pre-mRNAs in DXO knockdown cells show retention of all the introns tested, irrespective of their positions.[143] Therefore, the data suggest that incompletely capped pre-mRNAs are inefficiently spliced at all introns. These pre-mRNAs also have compromised 3′-end processing, consistent with earlier reports that the cleavage step is facilitated by the 5′-end cap. The reported findings demonstrate that incompletely capped transcripts are generated in mammalian cells and define a novel link between capping and pre-mRNA processing. They also indicate that the capping process may function as a critical checkpoint that determines whether a pre-mRNA should be further processed (Figure 9). DXO serves as a surveillance protein in a 5′-end capping quality control mechanism to clear incompletely capped pre-mRNAs.

Future Perspectives

Structural, biochemical, and functional studies over the past few years have provided great new insights into pre-mRNA 3′-end processing in eukaryotes, and we are gaining a better understanding of the molecular mechanisms for the functions of various proteins in the 3′-end processing machineries in yeast and humans. However, this is still a burgeoning field of research, and there is much to learn about the architecture of these machineries and the regulation of their cellular functions, for example, in APA. It is also important to understand how the core machineries can acquire post-translational modifications and additional protein factors in response to specific cellular conditions or localizations. Further characterizations of the molecular mechanism and cellular functions of cytoplasmic polyadenylation will be another important area for research. The discovery of the 5′-end capping quality surveillance mechanism has opened up a new field of research. For the first time, it is apparent that the capping step does not always proceed to completion. An important unanswered question is whether this is simply a consequence of the intrinsic inefficiency of the capping process or a regulated event to modulate subsequent pre-mRNA processing by controlling cap addition. If the latter is true, what are the components involved and how are the decisions about which pre-mRNAs are capped made? Regardless of how incompletely capped transcripts are generated, important future studies in this area also include defining the genomewide and cellular impacts of 5′-end capping quality surveillance, as well as the molecular mechanism of how the authenticity of the 5′-end cap influences pre-mRNA splicing and polyadenylation.
