Jonathan Neve, Radhika Patel, Zhiqiao Wang, Alastair Louey, André Martin Furger.
Abstract
Cleavage and polyadenylation (pA) is a fundamental step that is required for the maturation of primary protein encoding transcripts into functional mRNAs that can be exported from the nucleus and translated in the cytoplasm. 3′end processing is dependent on the assembly of a multiprotein processing complex on the pA signals that reside in the pre-mRNAs. Most eukaryotic genes have multiple pA signals, resulting in alternative cleavage and polyadenylation (APA), a widespread phenomenon that is important to establish cell state and cell type specific transcriptomes. Here, we review how pA sites are recognized and comprehensively summarize how APA is regulated and creates mRNA isoform profiles that are characteristic for cell types, tissues, cellular states and disease.
Keywords: 3′end processing; alternative cleavage and polyadenylation; gene expression
Year: 2017 PMID: 28453393 PMCID: PMC5546720 DOI: 10.1080/15476286.2017.1306171
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652
Figure 1. The cis-elements that define pA sites. The cleavage and polyadenylation machinery relies on key cis elements to mediate 3′end processing. Canonical cis elements include the A[A/U]UAAA hexamer and its variants, which lie ∼21 nucleotides upstream of the cleavage site (CS), and a less well defined downstream GU/U-rich element. Additional auxiliary elements may be positioned upstream and/or downstream of the cleavage site and are often U-, GU- and/or G-rich.
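The positional relationship described in this caption can be sketched as a simple sequence scan. The following is a minimal illustration only: the function name, the 10 to 30 nt search window, and the convention of measuring distance from the first base of the hexamer to the cleavage site are assumptions made for the example, and hexamer variants other than A[A/U]UAAA are not covered.

```python
import re

# Canonical hexamer A[A/U]UAAA as given in the caption.
# The lookahead lets overlapping candidates be reported as well.
PAS_HEXAMER = re.compile(r"(?=(A[AU]UAAA))")


def find_pas_hexamers(rna, cleavage_site, min_dist=10, max_dist=30):
    """Return (start, hexamer, distance) for each hexamer whose first base
    lies min_dist..max_dist nt upstream of cleavage_site (0-based index).

    The window bounds are hypothetical defaults chosen around the ~21 nt
    spacing mentioned in the caption.
    """
    seq = rna.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for match in PAS_HEXAMER.finditer(seq):
        distance = cleavage_site - match.start()
        if min_dist <= distance <= max_dist:
            hits.append((match.start(), match.group(1), distance))
    return hits


# Toy sequence: hexamer at index 20, cleavage site assumed at index 41.
toy = "G" * 20 + "AAUAAA" + "C" * 15 + "GUGUGU"
hits = find_pas_hexamers(toy, cleavage_site=41)
```

A real pA site predictor would also score the downstream GU/U-rich element and auxiliary elements; this sketch only checks the hexamer position.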
Figure 2. The core factors of the cleavage and polyadenylation complex. There are more than 80 proteins associated with the cleavage and polyadenylation machinery but fewer than 20 factors are considered to build the core of the processing complex. The major components are made up of multi-subunit factors including the cleavage and polyadenylation specificity factor CPSF (WDR33, hFip1, CPSF160, CPSF100, CPSF70, CPSF30); the cleavage stimulatory factor CstF (CstF77, CstF64, CstF50), the CFI (CFIm65, CFIm25) and CFII (∼15 subunits). The core factors involved in cleavage and polyadenylation, and the cis elements to which they bind are outlined here. Details of the individual factors are given in the text.
Figure 3. Coding region APA (CR-APA) and UTR APA. Depending on the location of the different pA sites, APA events can be classed into 2 major groups. CR-APA is the result of differential usage of pA sites that are located within the body of the gene, and alternative usage produces APA mRNA isoforms that differ in their coding potential. UTR-APA summarizes events where the different pA sites are located downstream of the stop codon, and alternative usage modulates 3′UTR length but does not change the coding potential. pA sites can be found in the introns and in the UTR of a gene. Intronic pA sites (pAi) are often cryptic poly(A) sites (pAc) that need to be actively repressed to enable gene expression. pA sites in the 3′UTR are generally separated into proximal (pAp) or distal (pAd) sites. Usage of the proximal site generates mRNA isoforms that have a so-called constitutive 3′UTR (cUTR), whereas isoforms generated by usage of the distal site contain both the constitutive and the alternative 3′UTR (aUTR) regions. The respective resulting APA mRNA isoforms are indicated; dotted lines refer to the removal of introns (i) and fusion of exons (E), and the 5′ splice sites and 3′ splice sites are indicated by the green and purple triangles respectively. The terminal exon is indicated by “tE” and “7meG” refers to the 5′ cap.
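The classification in this caption can be written down as a small sketch. Everything below is an illustrative assumption rather than a published method: coordinates are taken as 0-based positions along the pre-mRNA increasing 5′ to 3′, the function names are hypothetical, and any site at or upstream of the end of the stop codon is simply treated as a gene-body site.

```python
def classify_pa_sites(pa_sites, stop_codon_end):
    """Label each pA site by its position relative to the stop codon.

    Sites at or upstream of stop_codon_end are gene-body sites (usage would
    give CR-APA); sites downstream lie in the 3'UTR (usage gives UTR-APA).
    """
    utr_sites = sorted(p for p in pa_sites if p > stop_codon_end)
    labels = {}
    for pos in pa_sites:
        if pos <= stop_codon_end:
            labels[pos] = "gene body"
        elif len(utr_sites) > 1 and pos == utr_sites[0]:
            labels[pos] = "3'UTR proximal"
        elif len(utr_sites) > 1 and pos == utr_sites[-1]:
            labels[pos] = "3'UTR distal"
        else:
            labels[pos] = "3'UTR"
    return labels


def utr_segments(stop_codon_end, proximal_pa, distal_pa):
    """Lengths of the constitutive (cUTR) and alternative (aUTR) regions."""
    if not stop_codon_end < proximal_pa < distal_pa:
        raise ValueError("expected stop codon < proximal pA < distal pA")
    return {
        "cUTR": proximal_pa - stop_codon_end,  # shared by both isoforms
        "aUTR": distal_pa - proximal_pa,       # present only in the long isoform
    }


labels = classify_pa_sites([500, 1200, 2000], stop_codon_end=1000)
segments = utr_segments(1000, 1200, 2000)
```

In this toy example the short isoform carries a 200 nt cUTR, while the long isoform carries the same cUTR plus an 800 nt aUTR.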
Methods of 3′end targeted deep sequencing.
| Category | Technique | Overview | References |
|---|---|---|---|
| 3′end capture | 3P-Seq | A biotinylated primer is added to the poly(A) tail and solely the poly(A) fragments can then be isolated using streptavidin. RT with just TTP is used to fill in the poly(A) tail and RNase H is then used to cleave the poly(A) tail, leaving just the very 3′end, which is then used as input for sequencing library preparation. | |
| | 3′READS | Uses a unique chimeric CU5T45 oligo isolation system, which completely eradicates internal priming and amplification of oligoadenylated transcripts. | |
| Direct RNA sequencing | DRS | DRS uses the Helicos BioSciences system, which starts with an oligo(dT)-coated surface to which the poly(A) tail binds. Reverse transcription using only dTTP is then used to fill in the entire poly(A) tail. Sequencing is then initiated from the 3′-most non-A base. | |
| Oligo(dT)-based priming | 3Seq | Standard oligo(dT)-based priming technique using an oligo(dT)25-containing RT primer, and sequencing the terminal 25 bp upstream of the cleavage site to map the pA site. | |
| | 3′end RNA-Seq | Standard oligo(dT)-based priming technique using paired-end sequencing to obtain strand-specificity. | |
| | 3′end-seq | Includes an | |
| | 3Seq | Adapted slightly from 3Seq | |
| | 3′T-fill | Directly before sequencing, the poly(A) tail is filled using TTP, resulting in sequencing starting from the base directly upstream of the cleavage site. | |
| | A-seq | Uses an RT primer that consists of an anchor nucleotide followed by an oligo(dT) sequence, with a stem-loop containing the adaptor sequence for priming the subsequent PCR reaction in the middle of the oligo(dT). | |
| | MAPS | Standard oligo(dT)-based priming using a random primer for second-strand cDNA synthesis. | |
| | PAS-Seq | A variation on the standard oligo(dT)-based priming technique that uses the terminal transferase activity of MMLV reverse transcriptase, which allows generation of cDNAs with linkers in a single RT step, thus skipping several enzymatic steps. | |
| | PolyA-seq | Standard oligo(dT)-based priming using a random primer for second-strand cDNA synthesis. | |
| | Quant-seq | Commercialised standard oligo(dT)-based priming using a random primer for second-strand cDNA synthesis. | |
| | SAPAS | Oligo(dT)-based priming using template switching and optimised primer anchoring in the RT reaction to avoid sequencing into the long poly(A) tail. | |
| | SMPSS | Uses a single-molecule system based on the HeliScope single molecule sequencer. It is amplification- and ligation-free, allowing very little bias in quantitation. | |
Figure 4. Distinguishing between ACTIVE and PASSIVE APA. The APA profile can be modified either at the point of cleavage (Active) or at the post-transcriptional level (Passive). In active APA, factors that inhibit or enhance one pA site over another produce APA isoforms that can avoid a particular regulatory pathway. In passive APA, on the other hand, the availability of factors such as RBPs (dark red circle) and miRNAs (navy) in the cytoplasm alters the APA profile by specifically downregulating a particular isoform. For example, as depicted here, miRNAs can target the aUTR and recruit the RNA-induced silencing complex (RISC), resulting in degradation by exoribonucleases (red “PacMan”). Different RBPs that bind to the aUTR can either stabilize or degrade the isoform. In this case although the whole cell APA profile is the same, the nuclear APA profile is different, highlighting the importance of assessing changes in the cytoplasm compared with the nucleus to distinguish Active and Passive APA. This gives a better resolution of the causes that enforce specific APA changes in different environments.
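The nucleus-versus-cytoplasm comparison described in this caption can be illustrated with a toy calculation. This is a deliberately simplified sketch, not an established analysis method: the read counts, the function names and the 0.1 threshold are all hypothetical, and a real analysis would need replicates and statistical testing.

```python
def distal_fraction(proximal_reads, distal_reads):
    """Fraction of 3'end reads that map to the distal pA site."""
    total = proximal_reads + distal_reads
    if total == 0:
        raise ValueError("no reads at either pA site")
    return distal_reads / total


def classify_apa_shift(nuc_ctrl, nuc_treat, cyt_ctrl, cyt_treat, min_delta=0.1):
    """Each argument is a (proximal_reads, distal_reads) pair.

    A shift already visible in the nucleus is taken as consistent with
    active APA (changed at the point of cleavage); a shift seen only in the
    cytoplasm is taken as consistent with passive APA (isoform-specific
    decay). min_delta is an arbitrary threshold chosen for illustration.
    """
    delta_nuc = distal_fraction(*nuc_treat) - distal_fraction(*nuc_ctrl)
    delta_cyt = distal_fraction(*cyt_treat) - distal_fraction(*cyt_ctrl)
    if abs(delta_nuc) >= min_delta:
        return "active"
    if abs(delta_cyt) >= min_delta:
        return "passive"
    return "no change"


# Long isoform lost only in the cytoplasm, e.g. through miRNA targeting of the aUTR.
passive_case = classify_apa_shift((50, 50), (50, 50), (50, 50), (80, 20))
# Proximal site favoured already in the nucleus.
active_case = classify_apa_shift((50, 50), (80, 20), (50, 50), (80, 20))
```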
Figure 5. Factors that regulate APA at the point of cleavage. Numerous RNA-binding proteins and environmental stresses have been associated with modulating active APA at the point of cleavage. Factors are grouped into enhancing (green) or repressing (red) effects on a particular site, and factors that are between a green and red bracket can either enhance or repress a site depending on the circumstances. For more details, see Tables 2, 3 and 4 or download the interactive slide. Red lines indicate inhibitory effects on pA sites and green lines indicate enhancing effects of factors on particular pA sites. Black and gray dots with arrows indicate the position of the different types of pA sites: pAi = intronic pA site; pAc = cryptic pA site; pAp = proximal pA site; pAd = distal pA site. The gene structure is detailed by specifying introns as blue double lines (i) and exons as black double lines (E), and the 5′ splice sites and 3′ splice sites are indicated by yellow and purple triangles respectively. The terminal exon is symbolised by tE.
pA factors known to influence pA efficiency that may be involved in regulating APA.
| Factor | Motif bound | Proposed model | References |
|---|---|---|---|
| CFIm | (UGUA)n | At high levels, CFIm interacts with suboptimal CFIm binding sites preventing the interaction of CPSF with these proximal pA sites and promotes usage of distal pA sites. Depletion of CFIm allows the interaction of CPSF with proximal pA sites, resulting in 3′UTR shortening. At the single gene level increase of CFIm causes distal pA usage in MeCP2, a protein which is important for brain function. Thus CFIm mediated APA in MeCP2 links APA to neuropsychiatric conditions. | |
| CstF64 | U-rich | Co-depletion of CstF64 and CstF64τ leads to APA shifts in a small number of genes primarily to the distal pA site, which is thought to be reflective of the general higher efficiency of distal pA sites. Furthermore, CstF64 has been found to promote usage of weaker pA sites containing the downstream GUKKU motif. | |
| CstF77 | | High levels of CstF77 result in activation of the pA site in intron 3 of the CstF77 gene, resulting in a negative feedback loop. Additionally, it influences both shortening and lengthening changes in the APA profiles of cell cycle genes, specifically where U-rich regions surround the pA sites. | |
| hFip1 | | A component of the CPSF complex. Regulation of APA by Fip1 is dependent on the distance between pA sites. When far apart, low levels of Fip1 result in reduced pA efficiency and decreased use of weaker, proximal pA sites. When close together, Fip1 blocks CstF binding at the proximal site and therefore results in distal pA site usage. | |
| Pcf11 | | Pcf11 is a component of CFIIm. It enhances the use of proximal pA sites through direct binding to the pre-mRNA. | |
| βCstF-64 | | A neuronal splice variant of CstF64 that associates with the CstF complex and stimulates pA, thereby activating weaker pA sites. | |
| Star-PAP | AUA | Star-PAP is a noncanonical poly(A) polymerase. It associates with RNAs that have an AUA motif upstream of a pA site that also has a suboptimal DSE. This Star-PAP mediated selection of pA sites may play a role in the regulation of APA. | |
| PABPC1 | | No obvious | |
| PABPN1 | | Promotes the use of distal pA sites by inhibiting pA at weaker proximal pA sites through competition with CPSF for binding to the PAS. Reduced availability of functional PABPN1 in OPMD causes widespread 3′UTR shortening. | |
Other RNA-binding factors known to influence pA efficiency.
| Factor | Motif bound | Proposed model | References |
|---|---|---|---|
| α CP (αCP) | C-rich motifs | αCP binds mRNAs containing a subgroup of C-rich elements in their UTRs and acts as an upstream 3′end processing enhancer. Usage of distal or proximal pA sites can be influenced depending on upstream C-rich regions close to the respective pA site by varying αCP levels. | |
| Cirbp and Rbm3 | GNNGNNG | Upon cold-shock, these factors are upregulated and, through 3′UTR binding, inhibit the use of proximal pA sites. | |
| CPEB1 | CPE | CPEB1 shuttles to the nucleus, where it binds cytoplasmic polyadenylation elements and enhances polyadenylation at nearby pA sites. It also prevents U2AF65 binding, which inhibits splicing. CPEB1 in the nucleus causes 3′UTR shortening, and this correlates with cell proliferation. | |
| DICER | | Nuclear Dicer affects pA site usage by modifying the chromatin landscape surrounding the 3′end processing sites. In a region of closed chromatin, Pol II progression is slowed down, increasing the likelihood that a weak pA site is recognized. In contrast, if the weak pA site is in an open conformation, Pol II progression is fast, decreasing pA site usage. | |
| ELAV ( | | In neuronal tissues, ELAV is recruited to the promoter-paused Pol II complex. Upon resuming transcription, ELAV is deposited near proximal pA sites, inhibiting their usage and resulting in extended 3′UTRs. | |
| FUS | UGGUU | FUS binds directly downstream of a proximal pA site, which enhances CPSF160 recruitment and activates the pA site, leading to short transcripts. If there is no pA site upstream of a FUS binding site, FUS binding causes Pol II stalling and premature termination, producing short transcripts that are not polyadenylated. | |
| hnRNP C | U-rich | hnRNP C binds to U-rich sequences, masking pA sites in their vicinity and repressing their use. The transcripts affected by hnRNP C mediated APA are enriched in ELAVL1 binding sites and this process may thus be linked to the HuR ( | |
| hnRNP F | G-rich DSE | Competes with CstF-64 by binding to G-rich motifs near pA sites. | |
| hnRNP H1 | Auxiliary DSE | Depletion results in a general shift to distal pA sites, with hnRNP H1 binding sites surrounding proximal pA sites. | |
| hnRNP H2 | G-rich | Binds near pA sites and enhances binding of CstF-64. | |
| hnRNP K | UCCCUU | Competes with CFI for binding the pre-mRNA, reducing pA efficiency and usage of that pA site. | |
| hnRNP L | CA-rich elements | Functions as a splicing regulator, so altering levels of hnRNP L can sway the balance between competing splicing and intronic pA events. | |
| HuR ( | AU-rich elements (AREs) | HuR ( | |
| | | When associated with particular aUTRs, HuR can also control the final destination of the protein product. For example, the | |
| Mbnl proteins | R/YGCY | Muscleblind-like proteins (Mbnl) are important regulators of alternative splicing during development. Mbnl is also implicated in APA and can either inhibit pA site usage if it binds close to a pA site or enhance pA site usage if it binds further upstream. Inhibition is thought to occur through steric hindrance. Mbnl is critical for creating a normal APA landscape during development and dysregulation of this process is associated with myotonic dystrophy. | |
| MED23 | | Mediator complex subunit 23 (MED23) interacts with hnRNP L and affects hnRNP L regulated APA events, possibly by controlling hnRNP L occupancy at the promoter. | |
| Nkx2-5 | | In conjunction with Xrn2, Nkx2-5 regulates pA site usage, which is of high importance during mouse heart development. This tissue-specifically expressed factor regulates APA, and its knockdown causes 3′UTR lengthening. | |
| Nova | YCAY | NOVA is a neural-specific factor that binds YCAY elements in the 3′UTR. Depending on the location of these motifs, binding of NOVA can influence pA site choice by suppressing their use. | |
| Paf1C | | Depletion of some Paf1C subunits (Paf1, Cdc73, Ski8) results in global 3′UTR shortening. Regarding CR-APA, only Paf1 and Cdc73 depletion activated coding region pA sites. Paf1C subunits also play a role in suppressing transcription site intronic pA sites. Absence of Paf1 may cause increased Pol II pausing, which stimulates recognition of a pA site in the coding region. | |
| PTB | G-rich USE | PTB competes with CstF64 to bind the DSE, thereby inhibiting pA site usage. However, it can also aid recruitment of hnRNP H1, which stimulates pA site usage. | |
| RBBP6 | Unknown | RBBP6 competes with its isoform iso3 for binding with the core pA machinery. When RBBP6 is bound, it enhances pA site cleavage efficiency and promotes the use of weaker proximal pA sites. RBBP6 and iso3 particularly affect APA in transcripts that have AU-rich 3′UTRs such as c-jun. | |
| SRm160 | Unknown | Enhances pA through association with CPSF. | |
| SRSF3 | CNUC | Promotes biogenesis of long 3′UTR APA isoforms and regulates their nuclear cytoplasmic export. | |
| SRSF7 | | Promotes biogenesis of short 3′UTR APA isoforms and regulates their nuclear cytoplasmic export. | |
| TDP-43 | UG rich | High levels of TDP-43 cause inhibition of pA1 site in intron 7 of its own | |
| THOC5 | | THOC5 is a member of the human transcription export complex (TREX). THOC5 knockdown activates proximal pA site usage. It is suggested that THOC5 recruits CFIm68 to target genes, promoting distal pA site usage. | |
| U1 snRNP | AGGURAGU | Suppresses cryptic pA sites in the gene body, which is essential for the formation of full-length transcripts. Shown to suppress premature transcription termination in polycistronic pre-mRNAs in | |
| | | U1 snRNA levels drop after UV-induced DNA damage and activate intronic pA sites. | |
| U1A | AUGCN(1–3)C | Component of U1 snRNP which binds GU-rich regions downstream of pA sites, inhibiting the binding of CstF64 and thus inhibiting polyadenylation. U1A can also bind to PAP, inhibiting the polyadenylation reaction itself. | |
| | | U1A is known to inhibit polyadenylation of its own mRNA, and has also been shown to act independently of U1 snRNP to inhibit polyadenylation of the | |
| U2 | 3SS | U2 interacts with CPSF and enhances polyadenylation efficiency. | |
| U2AF | Pyrimidine tract | U2AF interacts with CFI, stimulating pA. | |
Features and conditions that can influence pA site choice.
| Factor | Motif bound or affected | Proposed model | References |
|---|---|---|---|
| DNA methylation (imprinting) | CpG islands | The methylation status of CpG islands influences pA site selection in the murine imprinted gene | |
| E2F | TTGGCGG | Through enhanced proliferation, increased levels of the transcription factor E2F result in the increased use of proximal pA sites by upregulation of key 3′end processing genes. | |
| Nucleosome positioning | | High nucleosome occupancy directly upstream of proximal pA sites generally correlates with increased proximal pA usage. | |
| Transcription rate | | Slow transcription rates result in a longer time between when the proximal and distal pA sites are transcribed, thereby increasing the probability of proximal pA site utilization. At the single gene level, pausing downstream of the intronic (μS) pA site in the | |
| H3K4me3 levels | | Chromatin status regulates pA site choice. An “open chromatin” state, as measured by high H3K4me3 levels in spermatids compared with spermatocytes, influences pA site usage, resulting in global UTR shortening accompanied by greater transcript stability. | |
| Neuronal activity | | Neuronal activity promotes the use of proximal and internal pA sites, affecting many target genes of the transcription factor MEF2. | |
| Stress: arsenite, anisomycin, viral stress | | Viral stress or exposure of cells to stress agents such as arsenite and anisomycin tends to enhance the usage of intergenic pA sites and generate 3′extended transcripts. Anisomycin-mediated stress also suppressed intronic pA sites and pA sites that are located in the ORF. No clear trend is observed regarding 3′UTR-APA events. | |
Figure 6. Consequences of APA: APA-isoform dependent decay rates and protein output. The 3′UTR length changes arising from APA can have implications for mRNA localization and transcript stability, which can impact protein output and also determine the final destination of the encoded protein. This figure depicts the case where a short 3′UTR evades miRNA target sites in the aUTR, making it a more stable transcript and enabling increased protein output (protein symbolised by gray globules; ribosomes symbolised by mustard colored structures). The longer isoform shown here is bound by an RBP (dark green) in the nucleus, which prevents its export into the cytoplasm. The transcripts that are exported can be targeted for degradation by miRNA binding to the aUTR. The aUTR of the longer isoform can also be bound by an RBP (dark red circle) in the cytoplasm, which alters the localization of the transcript, for example bringing it into close proximity to the Endoplasmic Reticulum for protein synthesis. Therefore, the UTR is important in mediating nuclear export, transcript stability, translatability and mRNA localization, and the modulation of this is achieved by changing the expression of RBPs and miRNAs.
Cis-elements in the 3′UTR.
| Class of element | Sequence element | Overview | References |
|---|---|---|---|
| AU-rich elements (AREs) | AUUUA | These are present in 5–8% of all genes and can trigger mRNA destabilisation and translational repression. This is triggered by the binding of ARE-binding proteins (ARE-BPs), including TTP. | |
| | | The Hu family of proteins bind AREs and stabilize the corresponding transcript, particularly during neuronal differentiation. | |
| GU-rich elements (GREs) | GUUUG | These are contained in at least 5% of human mRNAs and trigger mRNA deadenylation and degradation, acting through binding of proteins from the CELF family. | |
| CU-rich elements (CUREs) | (C/U)CCANxCCC | PTB is the best-characterized CURE-binding protein and can affect translational repression, polyadenylation and mRNA stability. | |
| | (U/A)PyxUC(C/U)CC | | |
| CA-rich elements (CAREs) | (CA)n | A stabilizing dinucleotide repeat, which acts primarily via hnRNP L binding, altering the susceptibility of the mRNA to endo- and exonucleases. | |
| microRNA target sites | NNNNNNN | By far the most common destabilising element; target sites are present in > 60% of all genes. Regulation occurs primarily via destabilisation of the target mRNA (> 84%) rather than translational inhibition. | |
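
The core motifs listed in this table lend themselves to a simple pattern search. The sketch below counts three of them in a 3′UTR sequence. It is an illustration under stated assumptions: the function name is hypothetical, a CA-rich element is arbitrarily required to have at least three CA repeats because the table only gives (CA)n, and CUREs and microRNA sites are left out because their patterns depend on details not given here.

```python
import re

# Core motifs taken from the table. Pentamers use a lookahead so that
# overlapping copies (e.g. AUUUAUUUA) are each counted.
MOTIFS = {
    "ARE": r"(?=AUUUA)",
    "GRE": r"(?=GUUUG)",
    "CARE": r"(?:CA){3,}",  # assumption: n >= 3 repeats counts as one element
}


def count_utr_motifs(utr):
    """Count ARE, GRE and CARE core motifs in a 3'UTR sequence."""
    seq = utr.upper().replace("T", "U")  # accept DNA or RNA input
    return {
        name: sum(1 for _ in re.finditer(pattern, seq))
        for name, pattern in MOTIFS.items()
    }


# Toy aUTR: two overlapping AREs, one GRE and one CA repeat.
counts = count_utr_motifs("GGAUUUAUUUACCGUUUGCACACACAGG")
```

Running the same count on a cUTR and an aUTR separately gives a rough picture of which regulatory elements an isoform loses when the proximal pA site is used.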
Examples of genes producing differentially regulated UTR-APA isoforms.
| Gene | Gene function | Summary | References |
|---|---|---|---|
| | Cell cycle regulator | 3′UTR shortening is seen in cancer cell lines relative to normal tissues, thereby avoiding regulation by miR-15/16. Preferential use of the proximal pA site has been shown to increase the number of cells present in S-phase. | |
| | Cell cycle regulator | Usage of the proximal pA sites avoids miRNA-mediated repression, resulting in increased CDC6 protein levels. This is triggered by the potent proliferation signal 17β-estradiol (E2), and may, therefore, be a mechanism by which the cell promotes cell cycle progression in response to proliferation signals. | |
| | RNA-binding protein | The aUTR of HuR mRNA contains an ARE region where HuR and TTP bind competitively, resulting in mRNA stabilization or destabilisation respectively. This, therefore, creates an autoregulatory loop, which may amplify the pathological role of HuR. | |
| | Stabilises microtubules, specifically in neurons | UTR-APA isoforms are differentially regulated in neuroblastoma cell lines, with miR-34 family members targeting solely the distal APA isoforms. This gene encodes the Tau protein, which is one of the key components of protein aggregates formed during Alzheimer disease. | |
| | DNA repair | Glioblastomas are shown to shift pA site usage to a distal site, resulting in the inclusion of target sites for miR-767-3p, miR-181d and miR-648, thus reducing the expression of | |
| | Transcription factor which controls myogenesis | In quiescent muscle stem cells, APA results in the production of | |
| | Putative modulator of heterotrimeric G proteins | Several AREs are located between 2 pA sites in the 3′UTR of | |
| | Neuron development | | |
| | | Transcription of ZFR to produce miR-579 also regulates CPSF2 in a negative feedback loop. The longer CPSF2 isoform is targeted by miR-579, favoring the usage of the proximal pA site, which is resistant to regulation by miR-579. | |