Benoit P Nicolet1,2, Nordin D Zandhuis1,2, V Maria Lattanzio1,2, Monika C Wolkers1,2. 1. Department of Hematopoiesis, Sanquin Research and Landsteiner Laboratory, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands. 2. Oncode Institute, Utrecht, The Netherlands.
Abstract
T cell homeostasis, T cell differentiation, and T cell effector function rely on the constant fine-tuning of gene expression. To alter the T cell state, substantial remodeling of the proteome is required. This remodeling depends on the intricate interplay of regulatory mechanisms, including post-transcriptional gene regulation. In this review, we discuss how the sequence of a transcript influences these post-transcriptional events. In particular, we review how sequence determinants such as sequence conservation, GC content, and chemical modifications define the levels of the mRNA and the protein in a T cell. We describe the effect of different forms of alternative splicing on mRNA expression and protein production, and their effect on subcellular localization. In addition, we discuss the role of sequences and structures as binding hubs for miRNAs and RNA-binding proteins in T cells. The review thus highlights how the intimate interplay of post-transcriptional mechanisms dictates cellular fate decisions in T cells.
T cells are key in our defense against infections. Their capacity to differentiate into various effector cell types allows T cells to respond to a plethora of pathogens. T cells also differentiate into a variety of memory T cells, which can rapidly respond to recurring infections.
To sustain the memory T cell pool long‐term, memory T cells undergo homeostatic proliferation. These differentiation processes, combined with the capacity of T cells to respond to pathogenic insults, require that both the transcriptome and the proteome can undergo profound and rapid remodeling.
To achieve this remodeling, substantial switches occur at several layers of regulation, such as the metabolome, the transcription rate, the control of mRNA, and the translation efficiency.
A second layer of the intricate regulation of the proteome can be found at the RNA level. This includes different RNA splicing events and RNA modifications. Furthermore, it defines the subcellular localization of RNA, the RNA stability, and the level of translation into protein. These post‐transcriptional events are critical for the fine‐tuning of gene expression during T cell activation and T cell differentiation.
As much as 50% of the alterations in the transcriptome of activated T cells occur independently of de novo transcription, highlighting the potential of post‐transcriptional regulation in T cells.
Post‐transcriptional regulation (PTR) is executed at many different levels (Figure 1). Factors such as microRNAs (miRNAs) and RNA‐binding proteins (RBPs) define the fate of the transcript and the protein output. For these factors to interact with the target transcript, the sequence and structure of the RNA are key. However, sequence and structure are not only required as hubs for the RNA‐binding factors: other features of the sequence, such as GC content, sequence motifs, and codon usage, can also contribute to the level of gene expression. In fact, the mRNA (defined here as the coding sequence and the two untranslated regions, UTRs) contains ample information in its sequence that determines its fate. In this review, we will discuss how the sequence itself, in concert with the RNA‐binding factors, contributes to gene and protein expression in T cells.
FIGURE 1
General principles of post‐transcriptional regulation. Schematic representation of post‐transcriptional regulatory events in human cells. In the nucleus, pre‐mRNA is spliced (1) and polyadenylated (2). Constitutive splicing results in the inclusion of all exons, while alternative splicing results in the exclusion of exons (1). Intron retention leads to the inclusion of intronic sequences in the mature mRNA molecule (1). Spliced mRNA molecules are polyadenylated at one of the two polyadenylation sites (PAS) present in the 3′UTR in exon 3 (2). Canonical polyadenylation makes use of the PAS most proximal to the 3′ end of exon 3, whereas alternative polyadenylation makes use of a more distal PAS (2). In the cytoplasm, miRISC complexes bind to complementary mRNA sequences, and RBPs bind to linear or structural sequences to regulate mRNA stability and/or translation (3, 4). Sequence characteristics located in the CDS and sequence elements found in the 3′UTR also mediate the localization of mRNAs to different cellular locations, including P‐bodies, stress granules, and TIS granules (5)
INTRINSIC SEQUENCE CHARACTERISTICS AS POST‐TRANSCRIPTIONAL REGULATOR
Here, we discuss how the inherent characteristics of the sequence of a gene can define RNA stability and the translation efficiency in the context of T cells, and how these intrinsic sequence characteristics modulate gene expression.
Sequence conservation
The sequence of our genome is under continuous pressure of evolution. This results in the removal and the addition of regulatory elements that define the mRNA and protein expression levels.
Intriguingly, highly conserved genes generally display higher gene expression levels. This also holds true for the gene expression in human CD8+ T cells.
We observe that highly conserved genes have generally higher mRNA and protein expression levels than genes with intermediate or low conservation levels.
Furthermore, integration of the mRNA and protein abundance reveals that genes with a high sequence conservation have on average a higher protein‐to‐mRNA ratio than genes with low sequence conservation.
This finding implies that translation, and possibly also other post‐transcriptional events, is influenced by sequence conservation. This complements the known effect of sequence conservation on promoter regions and other transcription regulation mechanisms.
It also indicates that the sequence of the gene product itself contributes to the gene expression levels. Furthermore, it suggests that post‐transcriptional gene regulation is associated with sequence conservation. Investigating the gain and loss of regulatory elements through sequence conservation in the context of PTR could thus help reveal the key regulators in human T cells.
The evolutionary pressure on sequences in the gene body is also detectable when measuring the length of untranslated regions. For example, whereas the median length of 5′UTRs in zebrafish is only 131nt, the median length of mouse 5′UTRs is 175nt, and the median length of human 5′UTRs is 218nt.
Similarly, the median length of 3′UTRs is substantially longer in humans (804nt) than in zebrafish, where 3′UTRs only reach a median length of 402nt, or in mice, where the median length is 704nt.
The length of the 5′UTR and of the 3′UTR also correlates with the gene expression levels (Figure 2A). Genes with the 25% longest 5′UTRs (>200nt) in human CD8+ T cells have significantly lower protein expression levels than those with the 25% shortest 5′UTRs (<50nt). This correlation of the UTR length with gene expression is even more pronounced for the 3′UTR: both the mRNA and protein expression levels significantly decrease as the length of the 3′UTR increases.
Furthermore, the length of both the 5′UTR and the 3′UTR influences the protein‐to‐mRNA ratio,
implying that the UTRs also influence the translational output of mRNAs. Of note, the length of 3′UTRs may not per se be the prime factor influencing the mRNA expression levels and/or the protein output. Rather, a longer 5′UTR or 3′UTR increases the likelihood of containing regulatory elements, such as binding hubs for miRNAs or RNA‐binding proteins (RBPs). In addition, the number of adenine‐uridine rich elements (AREs; as defined by the motif AUUUA), a well‐studied RBP hub, increases with the 3′UTR length in Drosophila
and in humans (Figure 2B). Of note, the occurrence of AUUUA motifs is higher than that of GUUUG and CUUUC motifs (Figure 2B), suggesting that the increased occurrence of AREs may not be solely present by chance in longer 3′UTRs. Hence, the sequence conservation of genes and the length of the UTRs are directly linked to gene expression, both at the mRNA and at the protein level.
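The comparison of motif occurrences with 3′UTR length can be illustrated with a minimal Python sketch. It is not the analysis behind Figure 2B: the function name `count_motif` and the two toy sequences are hypothetical, and overlapping occurrences are counted here as an assumption, since the text does not state how overlaps were handled.

```python
import re

def count_motif(utr, motif):
    """Count occurrences of `motif` in an RNA sequence, including overlapping ones."""
    # Accept DNA-style input by converting T to U (assumption: sequences may come as cDNA).
    seq = utr.upper().replace("T", "U")
    # A zero-width lookahead lets overlapping hits such as AUUUAUUUA count twice.
    return len(re.findall(f"(?={re.escape(motif)})", seq))

# Toy 3'UTR sequences, invented for illustration only.
toy_utrs = {
    "short": "GCAUUUAGC",
    "long": "GCAUUUAUUUAGGCUUUCAAUUUAGG",
}

for name, seq in toy_utrs.items():
    counts = {m: count_motif(seq, m) for m in ("AUUUA", "CUUUC", "GUUUG")}
    print(name, len(seq), counts)
```

Applied to real 3′UTR sequences (for example, those exported from Ensembl BioMart), the per-gene counts could be plotted against sequence length to compare the three motifs.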
FIGURE 2
Relation between the length of the 3′UTR and sequence motif occurrences. (A) Schematic representation of how the length of the 5′UTR and 3′UTR influences gene expression. (B) Comparison of the genome‐wide 3′UTR length of human protein coding genes with the occurrence of sequence motifs AUUUA (left panel), CUUUC (middle panel), and GUUUG (right panel). The blue line represents the linear model fitted to the data. 3′UTR sequences for human genes annotated as protein coding were obtained from Ensembl release 104 using BioMart
G‐rich regions and the GC content
Evolutionary sequence conservation can also result in an altered guanine‐cytosine (GC) content in gene sequences. These alterations in GC content can occur both in the UTRs and in the coding region. In the context of CD8+ T cells, a high GC content per gene is a good predictor of high mRNA and protein levels, as well as the protein‐to‐mRNA ratio.
Intriguingly, the location of high GC content defines the effect on gene expression (Figure 3A). Sequences with a high GC content can create tighter tertiary structures, which should in principle hamper translation. However, when these regions are present in the coding sequence, the opposite occurs. A high GC content in the mRNA coding region leads to increased mRNA levels, higher translation efficiency, and thus to more protein output.
For example, increasing the GC content of the IL‐6 and IL‐2 coding sequence, without changing the amino‐acid sequence, results in higher mRNA stability, ribosome occupancy, and protein output.
In line with this finding, coding sequence (CDS) parameters such as the GC content are predictive of mRNA and protein expression levels in human medulloblastoma cells.
Of note, altering the GC content of the coding region can also result in changes of the codon usage, as defined by the GC content at the third position of an amino acid codon, the so‐called GC3. This can also influence the translation rate of an mRNA, as discussed below.
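As a minimal sketch of the two quantities discussed here, the Python snippet below computes the overall GC content of a sequence and the GC3, that is, the GC fraction at the third position of each in-frame codon. The function names and the two toy coding sequences are hypothetical; both toy sequences encode Met-Leu-Ala, so they differ in GC content and GC3 without a change in amino-acid sequence.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)

def gc3_content(cds):
    """Fraction of G and C bases at the third position of each complete in-frame codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    third_positions = cds[2:usable:3]  # bases at index 2, 5, 8, ...
    if not third_positions:
        return 0.0
    return sum(base in "GC" for base in third_positions) / len(third_positions)

# Two synonymous toy coding sequences (Met-Leu-Ala), invented for illustration.
gc_rich = "ATGCTGGCC"
gc_poor = "ATGTTAGCA"

for name, cds in (("gc_rich", gc_rich), ("gc_poor", gc_poor)):
    print(name, round(gc_content(cds), 2), round(gc3_content(cds), 2))
```

The sketch assumes the coding sequence starts in frame at its first base.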
FIGURE 3
Intrinsic sequence parameters as post‐transcriptional regulation. (A‐B) Schematic representation how intrinsic sequence parameters influence gene expression. (A) Effect of GC content in the 5′UTR (forming G‐quadruplexes), coding region (CDS), and the 3′UTR (increasing the strength of mRNA structures) on gene expression. (B) Effect of codon usage on mRNA and protein levels. tRNAs recognizing individual codons for a given amino acid can be found at low and at high abundance. mRNAs that use codons for highly abundant tRNA (green) will translate faster than those with codons for low abundance tRNA (yellow), which in turn affects protein and mRNA levels
A high GC content and the presence of G‐rich regions in 5′UTRs can in turn substantially affect gene expression levels (Figure 3A). G‐rich regions are found in 8% of all human 5′UTRs and form highly stable G‐quadruplex tertiary mRNA structures.
They can effectively repress mRNA translation, because unwinding of the G‐quadruplex structure is required for translation initiation.
For example, the oncogene NRAS contains a G‐quadruplex in its 5′UTR, which is conserved in mammals, and which strongly represses its protein expression.
In the 3′UTR, a high GC content correlates with shorter mRNA half‐life, and with overall lower mRNA levels in murine CD4+ T cells and Jurkat cells.
Again, the higher GC content in the 3′UTR results in increased tertiary structure formation.
The mechanisms by which GC content leads to altered expression are not yet fully understood. Indeed, in addition to changes in the tertiary structure, sequences with high GC content could also directly influence the affinity or capacity of RNA‐binding proteins or miRNAs to interact with mRNA (see below), for instance by altering the sequences that are adjacent to the binding hub of these regulators. Altogether, GC content inherently affects many levels of gene expression regulation, and it does so in a region‐specific manner.
Codon usage
GC content and sequence conservation are intrinsically linked to codon usage in the mRNA. Codon usage dictates the choice of transfer RNAs (tRNAs) carrying the amino acid to the translating ribosome. Yet, due to the redundancy of the genetic code, several codons can encode the same amino acid. In turn, tRNAs corresponding to the different codons encoding the same amino acid can be expressed at varying concentrations. For example, leucine, an essential amino acid, is encoded by 6 different codons, of which the respective tRNAs are found in unequal concentrations in CD4+ T cells.
In fact, the use of a rare codon leads to an increased dwelling time of the ribosome while resolving the low abundance tRNA
(Figure 3B). Optimal codon usage is thus key to ensure rapid and copious protein production by matching the tRNA abundance in the cells. For example, optimizing the codon usage of a transgenic TCR construct based on the average murine tRNA expression levels leads to improved TCR protein expression.
Similar findings were obtained by optimizing the codon usage for the IL15 mRNA, when the protein production was measured in human T cells.
In cell lines and in human tissues, the mRNA coding sequence is a key determinant of global protein expression levels.
It could even override the 3′UTR‐mediated regulation of expression of the p53 mRNA.
Surprisingly, codon usage and coding sequence parameters do not only affect the protein production but also appear to modulate the mRNA expression levels and the mRNA half‐life.
For example, chemical inhibition of translation in activated murine CD4+ T cells substantially influences the mRNA half‐life.
Intriguingly, this occurs in a transcript‐specific manner.
This study also reveals that the degradation of a subset of mRNAs is translation‐dependent and again strongly correlates with codon usage.
Thus, codon usage could possibly cause transcript‐specific translation‐dependent degradation. Interestingly, this study suggests that translation can directly affect the expression of mRNA. Yet more investigation is needed to fully understand the mechanisms.
Whereas the codon usage remains fixed in each mRNA, the tRNA abundance differs between cell types.
The expression levels of individual tRNAs can also be dynamic.
For instance, the tRNA abundance is greatly altered throughout activation of murine CD4+ T cells.
In addition to alterations in tRNA abundance, the availability of amino acids also changes upon T cell activation, as a result of the remodeling of T cell metabolism.
It is therefore conceivable that some mRNAs are sub‐optimally translated in resting T cells due to their codon usage, yet they may acquire a competitive advantage for translation in activated T cells due to the alterations in tRNA abundance and amino acid availability.
Not only the tRNA abundance and the amino acid availability but also the transcript expression alters upon T cell activation.
As a consequence, the composition of codons that are required for translation, termed the codon demand, also changes.
In resting naive CD4+ T cells, the codon demand by the transcriptome is mostly in line with the abundance of tRNAs. Yet the AAG tRNA, carrying a lysine, is in over‐abundance compared to the codon demand.
As lysine is one of the most highly incorporated amino acids upon CD4+ T cell activation, the over‐abundance of lysine tRNA could possibly facilitate the rapid translation rate that is observed during early T cell activation.
Whereas this is an intriguing hypothesis, the full interplay between codon demand and tRNA abundance and its regulatory potential are yet to be uncovered. In conclusion, the intrinsic sequence characteristics of an mRNA are critical determinants of mRNA expression and of the protein output in T cells.
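The notion of codon demand can be made concrete with a small Python sketch. It rests on assumptions that are not taken from the cited studies: codon demand is approximated as codon counts weighted by mRNA abundance, tRNA supply is keyed by the codon it decodes, and all sequences and abundance values are invented toy numbers.

```python
from collections import Counter

def codon_counts(cds):
    """Count complete in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def codon_demand(transcripts):
    """Map {name: (cds, mRNA abundance)} to {codon: share of the total codon demand}."""
    demand = Counter()
    for cds, abundance in transcripts.values():
        for codon, n in codon_counts(cds).items():
            # Assumption: each mRNA copy contributes its codons once.
            demand[codon] += n * abundance
    total = sum(demand.values())
    return {codon: value / total for codon, value in demand.items()} if total else {}

def supply_over_demand(trna_supply, demand):
    """Ratio of relative tRNA supply to codon demand; values above 1 suggest over-abundance."""
    return {
        codon: trna_supply[codon] / share
        for codon, share in demand.items()
        if codon in trna_supply and share > 0
    }

# Hypothetical toy data: two transcripts and relative tRNA supply per decoded codon.
toy_transcripts = {"gene_a": ("ATGAAGAAG", 2.0), "gene_b": ("ATGCTG", 1.0)}
toy_trna_supply = {"ATG": 0.375, "AAG": 0.75, "CTG": 0.125}

demand = codon_demand(toy_transcripts)
ratios = supply_over_demand(toy_trna_supply, demand)
print(demand)
print(ratios)
```

In this toy example, the lysine codon AAG has a supply-to-demand ratio above 1, which mirrors the kind of over-abundance described for resting naive CD4+ T cells. Real analyses would also need to account for wobble pairing, which this sketch ignores.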
SEQUENCES AND STRUCTURAL MOTIFS AS HUBS FOR POST‐TRANSCRIPTIONAL REGULATORS
The role of sequences in determining the fate of mRNA, and of translation into proteins, also stems from their function as binding hubs for post‐transcriptional regulators such as miRNAs and RNA‐binding proteins (RBPs). Long non‐coding RNAs (lncRNAs) were also reported to modulate splicing, mRNA stability, and translation in a sequence‐specific manner (reviewed elsewhere). How and to what extent lncRNAs participate in PTR in T cells is to date not known. We therefore focus here on miRNAs and RBPs. These post‐transcriptional regulators have different means to interact with the sequences, and exert different roles in T cell differentiation, as discussed below.
miRNAs
miRNAs are short non‐coding RNAs of 21‐24 nucleotides that interact with the target mRNA by pairing with the complementary “seed‐match” site located within the target mRNA
(Figure 4A). Through the so‐called “seed” sequence of 7‐8 nucleotides in length that is located in the 5′ region of miRNAs, miRNAs engage with the target mRNA.
Furthermore, additional pairing can occur with the 3′ end sequence of the miRNA.
The vast majority (98%) of all RefSeq annotated genes are considered potential targets of miRNAs.
miRNAs have been reported to interact with coding sequences (CDS), with 5′UTRs, intronic regions and 3′UTRs of their target mRNA.
miRNAs form a so‐called miRNA‐induced Silencing Complex (miRISC), with the Argonaute family proteins as core of this complex. When interacting with their target mRNA, the miRISC can interfere with translation and promote mRNA degradation.
Recent findings suggest that miRNAs interacting with the CDS are more potent in inhibiting translation, while miRNAs that bind the 3′UTR of their target mRNA are more efficient in inducing mRNA degradation.
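Canonical seed pairing can be sketched in a few lines of Python. The sketch assumes the commonly used convention that the seed spans nucleotides 2 to 8 of the miRNA, whereas the text only states a seed of 7‐8 nucleotides in the 5′ region. The miRNA and mRNA sequences are made up for illustration, and the function names are hypothetical.

```python
# Watson-Crick complement for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_match_site(mirna, start=1, end=8):
    """Return the mRNA site (5'->3') that pairs perfectly with the miRNA seed.

    The default 0-based slice [1:8] selects miRNA nucleotides 2-8 (an assumed convention).
    """
    seed = mirna.upper().replace("T", "U")[start:end]
    # Pairing is antiparallel, so the site is the reverse complement of the seed.
    return seed.translate(COMPLEMENT)[::-1]

def find_seed_matches(mirna, mrna):
    """Return 0-based start positions of perfect seed-match sites in the mRNA."""
    site = seed_match_site(mirna)
    target = mrna.upper().replace("T", "U")
    return [
        i for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]

# Hypothetical sequences, not taken from any database.
toy_mirna = "UACGGAUCCAGUUGACCAUGGA"
toy_mrna = "AAGAUCCGUAACCGAUCCGUUU"

print(seed_match_site(toy_mirna))
print(find_seed_matches(toy_mirna, toy_mrna))
```

This sketch only detects perfect seed matches. Sites with a mismatched or bulged nucleotide, or with supplementary pairing to the 3′ region of the miRNA, would not be found and require dedicated target prediction tools.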
FIGURE 4
Sequences and structural motifs as hubs for post‐transcriptional regulators. (A) Schematic representation of miRNAs binding to complementary mRNA sequences. Left panels: canonical miRNA binding with a perfect match between the seed sequence and the seed‐match sequence. Right panels: non‐canonical miRNA binding with an imperfect match between the seed sequence and the seed‐match sequence. miRNAs can also employ 3′ sequence to enhance sequence specificity (bottom panels). (B) Schematic representation of the 3′UTR of IFNG mRNA. Deletion of the AU‐rich elements (AREs) substantially increases the IFN‐γ protein production. (C) Linear and structural elements in the 3′UTR of mRNAs as binding hubs for RBPs. Whereas TTP family members (TTP, ZFP36L1, and ZFP36L2) recognize linear AREs, Regnase‐1 and Arid5a bind to stem loops. (D) Competition between Regnase‐1 and Arid5a for the binding of stem loops present in the 3′UTR of IL‐6 mRNA. Whereas Regnase‐1 destabilizes IL‐6 mRNA, Arid5a increases the stability of the IL‐6 mRNA
miRNAs are important mediators in T cells for several key processes, including T cell differentiation, survival, proliferation, migration, and T cell effector function.
The abundance of miRNAs is highly variable in T cells. For example, whereas the miRNAs miR‐16, miR‐142‐3p, miR‐150, miR‐142‐5p, miR‐15b, and let‐7f are abundantly expressed in naive CD8+ T cells, they become downregulated upon T cell activation.
In fact, this downregulation in expression upon T cell activation holds true for the majority of miRNAs.
One exception to this rule is the group of 7 miRNAs located in the miRNA cluster 17 ~ 92. Constitutive overexpression of this cluster in mice reveals that the miR‐17 ~ 92 cluster promotes T cell proliferation and T cell survival through direct targeting of the proliferation inhibitor PTEN and the pro‐apoptotic molecule Bim, respectively.
The miR‐17 ~ 92 cluster also plays an important role in T cell activation by repressing negative regulators of T cell activation, including Phlpp2, Cyld, and Rcan3, which results in increased calcineurin/NFAT signaling.
The importance of miRNAs in T cell development and differentiation is exemplified by the severely impaired development and differentiation of T cells in mice that are devoid of the miRNA processing ribonuclease Dicer, which impairs the overall miRNA expression.
Recent studies also highlight the role of individual miRNAs. For instance, genetic deletion of the miR‐17 ~ 92 cluster in mice results in the generation of fewer effector T cells, but an increase of the formation of memory CD8+ T cells upon lymphocytic choriomeningitis virus (LCMV) infection.
Conversely, T cell–specific deletion of miRNA 15/16 family members restrains the memory formation of CD8+ T cells by targeting a number of potentially relevant targets, including Il7r and Pim1.
Similarly, miR‐150 is involved in regulating CD8+ T cell differentiation by directly repressing the expression of Foxo1, which promotes the expression of the T cell memory inducing transcription factor TCF1.
Loss of miR‐150 therefore skews CD8+ T cell differentiation toward memory formation.
The effector function of CD8+ T cells relies on the expression of a set of key proteins, including the inflammatory cytokines TNF‐α and IFN‐γ, and the cytotoxic molecules perforin and granzymes. miRNAs are important regulators of inflammatory cytokine production. In fact, Dicer‐deficient T cells produce twice as much IFN‐γ as their WT counterparts.
One of the known miRNAs that regulate the gene expression of IFN‐γ in T cells is the miR‐29 family.
Regulation of IFN‐γ has been proposed to occur either through direct interactions with the 3′UTR of Ifng mRNA, or indirectly through regulation of the T‐box transcription factors.
Likewise, the expression of cytotoxic molecules, that is, perforin and multiple granzymes (Gzmb, Gzmc, Gzmd, Gzme, and Gzmg), is controlled by miR‐31, as revealed by T cell–specific deletion of this miRNA.
Continuous antigen exposure during chronic infections or in tumors causes a gradual loss of T cell effector function.
T cells differentiate into so‐called dysfunctional T cells, which are characterized by the expression of inhibitory receptors, such as PD‐1, LAG‐3, and TIM‐3. Recently, miRNAs have also been implicated in this process. For example, during chronic LCMV infection, miR‐31‐deficient T cells are superior in controlling the infection compared to their WT counterparts, which is concomitant with reduced expression levels of PD‐1.
Conversely, miR‐155 expression in T cells is required to sustain long‐term CD8+ T cell responses during a chronic LCMV infection.
Lastly, TGF‐β induced expression of miR‐23a in tumor‐infiltrating CD8+ T cells has detrimental effects on T cell effector function.
miR‐23a represses the expression of the transcription factor BLIMP‐1, which in turn promotes effector cell differentiation and the expression of cytotoxic molecules like Granzyme B.
Although the functional role of miRNAs is widely studied in T cells, their exact binding characteristics are yet to be determined. Early reports indicated that seed pairing in itself may not be a reliable predictor of miRNA‐mRNA target interactions.
In fact, miRNA‐mRNA interactome studies have revealed that 60% of the miRNA binding could not be explained by a perfect match between the seed and seed‐match sequence.
Indeed, multiple non‐canonical miRNA‐binding sites have been identified, with the majority of these sites sharing extensive complementary sequences with the mRNA but containing a mismatched or bulged nucleotide (Figure 4A). Accumulating evidence suggests that these non‐canonical miRNA‐binding sites are functionally relevant.
Furthermore, a recent study that used individual‐nucleotide resolution cross‐linking immunoprecipitation (iCLIP) of the miRNA‐interacting protein Argonaute in C. elegans to identify miRNA binding genome‐wide reveals that seed pairing alone may not be sufficient for mRNA specificity.
Rather, a subset of miRNAs may require additional pairing with the 3′ region of the miRNA
(Figure 4A). This additional pairing may define the affinity of miRNAs to the target sequences, as recently reported in an elegant screen.
Interestingly, miRNA‐binding sites proximal to the poly(A) tail or the stop codon in the mRNA are more likely to induce miRNA‐mediated repression of gene expression.
Thus, whereas the role of miRNA in T cell differentiation and activation is established, the sequence specificity of miRNAs and their target identification requires further investigation.
RNA‐binding proteins
RNA‐binding proteins (RBPs) are also key regulators of the fate of RNA. RBPs define the process of RNA splicing, the transport of RNA to different subcellular locations, and they determine the stability of mRNA, among other features. RBPs also define the translational output of an mRNA.
To exert their regulatory function, RBPs bind to RNA in a sequence‐specific manner. These binding hubs can be linear sequences or stem‐loop regions. A well‐described linear RBP binding hub is the ARE. AREs are evolutionarily conserved sequences that were identified 35 years ago in the 3′UTR of TNF and of other pro‐inflammatory cytokines.
The importance of ARE‐mediated regulation was later evidenced by germ‐line deletions of the AU‐rich element‐containing region in the Tnf and Ifng 3′UTR in mice, which resulted in hyper‐production of these cytokines, and consequently in hyperinflammation and immunopathology (Figure 4B). CRISPR‐Cas9‐mediated removal of AREs in primary human T cells also increased the production of IFN‐γ.
AREs are found in thousands of mRNAs, are highly conserved, and are localized in the 3′UTRs of mRNAs.
Interestingly, recent studies have also indicated a high prevalence of intronic AREs, which points to yet another layer of regulation of the mRNA levels and thus of the protein expression.
Several RBPs can interact with AREs. ARE‐binding proteins that interact with AREs through their tandem zinc fingers are the tristetraprolin family members TTP, ZFP36L1, and ZFP36L2.
HuR/ELAVL1, TIA, TIAR, and AUF1 can also interact with AREs through the so‐called RNA recognition motifs (RRMs), even though their affinity to U‐rich sequences is generally considered higher.
The critical contribution of RBPs in regulating the gene expression of immune cells was revealed in mice that lacked individual RBPs. Deficiency of TTP family members in thymocytes leads to hyperinflammation, and double deficiency for ZFP36L1/ZFP36L2 drives the development of leukemia.
The prime role of TTP family members has been attributed to RNA stability, yet more recent studies have also highlighted other modes of action such as control of translation and of subcellular mRNA localization.
Likewise, germ‐line deficiency of HuR impairs the survival of progenitor cells in mice, including that of hematopoietic cells.
T cell–specific deficiency of HuR in mice results in impaired thymocyte maturation.
In mature mouse T cells, HuR stabilizes cytokine and transcription factor mRNA upon activation, among other genes.
Notably, the mode of action of RBPs can be cell‐type specific. For instance, TTP is a key regulator of TNF production in macrophages.
In T cells, we and others observed a critical role of its family members ZFP36L1/ZFP36L2 in the production of TNF, and of IFN‐γ.
Intriguingly, even though the production of TNF in macrophages is regulated through mRNA stability modulated through AREs,
we failed to observe stabilization of Tnf mRNA in activated murine T cells.
Not only the cell type but also the differentiation status defines which post‐transcriptional event predominantly happens through sequence recognition of AREs. Whereas in memory T cells the ready‐to‐deploy Ifng and Tnf mRNA is blocked for translation by ZFP36L2 in the absence of reactivation,
in activated T cells AREs primarily govern Ifng RNA stability and translation efficiency.RBPs not only interact with linear sequences but also with structural motifs for RNA binding (Figure 4C). A recent study that mapped RBP interactions with RNA in a cell‐free RBP‐RNA interaction system found that 70% of the 950 tested RBPs interacted with linear sequences, and that the remaining 30% bound to RNA structures.
Examples of RBPs that interact with stem loops and that modulate gene expression in T cells are the Roquin, Regnase, and ARID5a proteins. Roquin was first identified through genetic screens in mice, in which a point mutation in the RNA‐binding motif of Roquin in T cells resulted in autoimmunity.
Its importance in controlling inflammation was later confirmed in humans.
Roquin 1 and 2 deficient mouse models revealed that Roquin blocks the expression of costimulatory molecules in CD4+ T cells and thereby regulates T‐helper cell differentiation.
Roquin interacts for instance with constitutive decay element (CDE) motifs present in the 3′UTR of TNF‐α, in addition to U‐rich hexaloop motifs that are present in the 3′UTR of the costimulatory molecule OX‐40.
Of note, Roquin not only drives mRNA degradation, but it can also block the translation of its target genes.
The CCCH‐type zinc finger protein Regnase 1 (ZC3H12A) also interacts with target mRNAs by binding to stem loops in the 3′UTR of, for instance, IL‐6, IL‐17 receptor, and the transcription factor NFKBIZ.
Recent structural analysis has revealed that the Regnase family utilizes unique features in the CCCH domain to interact with RNA,
which induces degradation through its ribonuclease activity,
in addition to translational silencing.
Intriguingly, Regnase 1 and Roquin act non‐redundantly, as double deficiency results in synergistic effects in Th1 differentiation in mice, indicating sequence specificity of these two RBPs.
Regnase 1 is implicated in the differentiation of CD8+ T cells, and deletion thereof results in higher efficiency of Chimeric Antigen Receptor T cells (CAR‐T cells) in a murine acute lymphoblastic leukemia (ALL) model.
Regnase 4 (ZC3H12D) modulates cytokine production in human T cells, also by dampening cytokine mRNA levels.
The RBP Arid5a, originally described as a transcription factor, translocates to the cytoplasm in lipopolysaccharide‐activated RAW264.7 macrophage cells.
Arid5a interacts with its target mRNAs through its helix‐turn‐helix ARID motif, which binds to stem loops.
The interaction of Arid5a with mRNA results in stabilization and thus an increased protein output of the target mRNAs, such as IL6 mRNA in macrophages, and Stat3 mRNA in T cells.
Interestingly, Arid5a competes with Regnase 1 for interacting with the stem loops (Figure 4D). Thus, Arid5a and Regnase 1 counterbalance each other's activity on their shared target mRNAs.
In the past years, great efforts have been undertaken to create transcriptome‐wide RBP interaction maps. Approaching this from a sequence‐centric point of view, comprehensive maps of RBP binding motifs have been provided for linear sequences and for stem loops.
Intriguingly, recent screens for RBP interaction with target sequences reveal a limited repertoire of recognition motifs for RBPs. Among the 78 human RBPs tested in in vitro binding assays, a high overlap of sequence specificity was reported.
Surprisingly, this overlap is independent of the RNA‐binding domain present in the RBPs.
It is important to note that RBPs bind to only a subset of cognate motifs in expressed transcripts, and the additional requirements for target interaction are yet to be determined.
For instance, RBP affinity to sequences and structures may differ due to mRNA chemical modifications or due to structural changes of the mRNA as discussed above, or by post‐translational modifications of the RBP itself in response to external stimuli. Another layer of complexity in identifying sequences for RNA‐RBP interactions is that RBPs can contain several RNA‐binding domains (RBDs) and thus interact with several motifs. In addition, promiscuous interactions with several motifs have been observed for RBPs that contain only one RBD.
RBP interaction with target mRNA may thus at least in part be determined by contextual features, such as structure and additional motifs.
From the RBP‐centric view, an RBP interaction map in Jurkat cells was generated with enhanced RNA interactome capture (eRIC), and recently also in CD4+ T cells with orthogonal organic phase separation (OOPS).
These data sets revealed the identity of proteins that act as RBPs in T cells. Yet, they do not reveal the sequence specificity of RBPs. To identify sequence specificities for individual RBPs, iCLIP and eCLIP strategies, among others, have been developed.
Recently, a comprehensive compilation of 150 RBP interaction maps was generated with eCLIP and added to the ENCODE project.
These large‐scale efforts form the basis of our understanding of RBP interactions with target RNA. Nevertheless, because the eCLIP studies have been performed in cell lines, they may not directly translate into the binding landscape of RBPs in primary immune cells. They may in particular deviate in their binding (and thus their mode of action) when T cells undergo dynamic changes, such as upon differentiation or activation.
Because RBP activity depends on the cellular context, we have recently mapped the overall RBP expression from RNAseq and MS data sets throughout human B cell and T cell differentiation.
Integrating this data set with the eCLIP data, the eRIC/OOPS data together with the RBP motif maps should further substantiate the RBP expression and sequence binding specificity, in particular when integrated with data sets from different activation statuses. In conclusion, not only the interaction with target RNAs but also the role of RBPs appears highly context dependent.
SUBCELLULAR LOCALIZATION OF mRNA AS REGULATOR OF GENE EXPRESSION
Appropriate cellular function requires a well‐ordered spatial organization within cells. The distribution of most proteins within cells is in fact not uniform, but rather compartmentalized and/or enriched in specific structures.
To achieve this spatial distribution, some proteins are for instance marked by a signal peptide that is located at the N‐terminus and that directs the protein to the intracellular protein transport machinery.
In addition to the protein sequence‐mediated localization, the mRNA itself contains information for the target compartment and defines the ultimate protein localization. For example, Nanos and Oskar mRNA localization ensures localized protein production which defines the proper anteroposterior pattern in the Drosophila embryo.
Likewise, in Xenopus laevis, Vg1 mRNA localization in the vegetal pole is key for the oocyte polarization.
Subcellular mRNA localization also dictates the cell motility in primary chicken embryo fibroblasts.
In neurons, which can span over 1 meter in length, the presorting of mRNA to the dendrites allows neurons to rapidly produce and release neurotransmitters in response to external stimuli at a specific location.
To date, the mechanisms that define the subcellular localization of mRNA are not well understood. The development of single‐molecule FISH (smFISH) was a critical step forward to visualize the localization of mRNA within cells.
Other recent studies have mapped the subcellular localization of RNA genome wide in HEK293T cells with APEX‐seq, that is, proximity labeling and subsequent subcellular fractionation and sequencing.
In addition, machine learning methods have made it possible to decipher part of the ZIP‐code system responsible for the subcellular localization of mRNAs in human HEK293T cells.
In immune cells, the subcellular mRNA localization is not well understood. It may nonetheless be critical in several regulatory processes. For instance, the preformed cytokine mRNA in memory T cells that allows for rapid recall responses is blocked from translation to prevent unwanted production.
The block of translation may be in part imposed by subcellular localization and/or membrane‐less structures such as stress granules (SGs), processing bodies (P‐bodies), or the recently described TIS granules (named after the RBP TIS11B/ZFP36L1). Such membrane‐less structures can be induced as part of the stress response. Upon activation, lymphocytes undergo a complex transcriptional and translational program that is sensed as physiological stress. As a result, the integrated stress response (ISR) is triggered to regulate mRNA translation and to preserve the encoded mRNA from degradation.
ISR induces the localization of specific mRNAs in RNA‐protein aggregates in SGs.
In activated human T cells, PDCD1 transcripts encoding PD‐1 and other immune checkpoint‐related genes such as CTLA4, LAG3, TIM3, TIGIT, and BTLA transcripts accumulate in SGs (Figure 5A), which limits the production of these immune‐inhibitory receptors.
In primary murine B lymphocytes, activation triggers the localization of transcripts of the pro‐apoptotic gene p53 to SGs to induce translation silencing (Figure 5A).
FIGURE 5
mRNA sequence regulates its subcellular localization. (A) The production of p53 and PD‐1 is regulated in two ways: under steady‐state conditions, the mRNA is blocked from nuclear export. In activated B cells, p53 mRNA translocates from the nucleus to stress granules (SGs), which results in translational silencing. In activated T cells, PDCD1 mRNA (encoding PD‐1 protein) interacts with G3BP1 proteins via its 3′UTR inducing formation of SGs, where translation is repressed. (B) The CD47 mRNA containing the short 3′UTR (CD47‐SU) produces CD47 protein that is localized on the endoplasmic reticulum (ER). Conversely, the CD47 mRNA containing the long 3′UTR (CD47‐LU) results in RNA localization within TIS granules, and in efficient protein assembly on the cell surface. (C) Sequences in the 3′UTR (ZIP Code) are the prime sequences to drive transcript localization to eight subcellular compartments between mitochondria, cytoplasm, and nucleus
mRNAs can also be localized in P‐bodies, where they are degraded or stored.
Of note, the localization to these membrane‐less organelles is not mutually exclusive, as observed for instance for the mRNA encoding the costimulatory molecule ICOS, which resides not only in SGs but also in P‐bodies in mouse CD4+ T cells.
Another type of membrane‐less structure, coined TIS granules, was recently described in HEK293T cells.
TIS granules involve the RBP ZFP36L1, which forms RBP‐RNA complexes with ARE‐containing transcripts, and the association of TIS granules with the endoplasmic reticulum facilitates the process of translation.
This selected translation is for instance observed for the mRNAs encoding the “don't eat me” signal CD47, and the immune checkpoint molecule Programmed Death Ligand 1 (PD‐L1).
Whether TIS granules are also formed in lymphocytes, and whether a similar selection of mRNAs to these granules occurs in T cells is yet to be determined.
Not all segments of the transcript contribute equally in the choice of subcellular localization.
In fact, the coding sequence and the 3′UTR appear to play a more important role in defining the transcript localization than the 5′UTR.
For instance, in HEK293T cells, the CDS has been predicted to be a key feature for the transcript localization on the outer mitochondrial membrane (OMM), together with 3′UTR.
Proteins involved in mitochondrial function are encoded in the nucleus and further transported to the mitochondria. However, the transcript itself can also translocate to the OMM, where translation occurs.
In particular, CDSs carrying G/U‐rich 6‐mer sequences appear to have a pivotal role for the transcript localization to the OMM. Also, the splicing and GC content of the coding sequence affect the subcellular localization. In HeLa cells, unspliced transcripts with a high GC content are enriched in the cytoplasmic fraction compared to the nuclear fraction, suggesting that a high GC content increases the localization to the cytoplasm.
Furthermore, transcripts with high GC content are enriched in P‐bodies of human cell lines.
Even more studies elucidate the importance of the 3′UTR in the localization of an mRNA. For instance, to associate with TIS granules, the mRNA requires ARE sequences in its 3′UTR.
Intriguingly, for the CD47 transcript, the localization to TIS granules depends on the 3′UTR isoform. CD47 mRNA comes in two isoforms, of which one contains a long‐3′UTR with 19 ARE motifs and the other one a short 3′UTR with only 4 ARE motifs.
Although the short CD47 mRNA is randomly distributed in the ER, the long isoform localizes specifically in TIS granules (Figure 5B). Notably, this different usage of 3′UTR isoforms also leads to a different protein localization: Protein translated from the long‐3′UTR isoform localizes to the cell membrane, whereas the protein translated from the short‐3′UTR isoform localizes to the endoplasmic reticulum.
Several studies have already pointed out that longer 3′UTRs are associated with transcript accumulation in SGs.
Indeed, PDCD1 transcripts (encoding the immunosuppressive protein PD‐1) contain a 3′UTR of 1,177 nucleotides, which is 1.5 times the size of the median 3′UTR length in the human genome. A long 3′UTR appears to be a common feature of the transcripts that localize to SGs during human T cell activation.
The 3′UTR also contains motifs that function as a ZIP‐code for the correct mRNA localization (Figure 5C). The 3′UTRs of cytokine transcripts carry ARE motifs, which determine their cytoplasmic localization. When a T cell becomes activated, the RBP HuR interacting with AREs then supports the shuttling of the mRNA from the nucleus to the cytosol.
Similarly, U‐rich elements can define the localization of mRNAs through interaction with the RBP TIA1, as was described for the localization of the p53 transcript to SGs during B lymphocyte differentiation.
Thus, subcellular localization is defined by the sequence and the RBPs interacting with the sequences. However, the full potency of the zip‐code system for mRNA localization/sorting remains to be revealed, in particular in the context of T cells.
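The sequence features named in this section (3′UTR length, GC content, and the number of ARE motifs) are simple to compute from a transcript sequence. The following is a minimal sketch rather than a published pipeline. It assumes that an ARE can be approximated by the core pentamer AUUUA, which is a simplification, and the two toy 3′UTRs and all function names are invented for illustration. The last line only restates the arithmetic behind the PDCD1 example: a 3′UTR of 1,177 nucleotides that is 1.5 times the median implies a median of roughly 785 nucleotides.

```python
def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA 'T' to RNA 'U'."""
    return seq.upper().replace("T", "U")


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides (0.0 for an empty sequence)."""
    rna = to_rna(seq)
    if not rna:
        return 0.0
    return (rna.count("G") + rna.count("C")) / len(rna)


def count_motif(seq: str, motif: str = "AUUUA") -> int:
    """Count occurrences of a motif, including overlapping ones.

    The default AUUUA pentamer is an assumed stand-in for an ARE.
    """
    rna, motif = to_rna(seq), to_rna(motif)
    return sum(rna.startswith(motif, i) for i in range(len(rna) - len(motif) + 1))


def utr_features(utr: str) -> dict:
    """Collect the three features discussed in the text for one 3'UTR."""
    return {
        "length": len(utr),
        "gc": gc_content(utr),
        "are_pentamers": count_motif(utr),
    }


# Toy 3'UTRs, invented for illustration only (not real CD47 or PDCD1 sequences).
short_utr = "GCGCAUUUAGCGC"
long_utr = "GCGCAUUUAUUUAGCGCAUUUAGCAUUUAGC"

short_features = utr_features(short_utr)
long_features = utr_features(long_utr)

# Arithmetic from the text: 1,177 nt is stated to be 1.5x the median 3'UTR length.
implied_median_utr_length = 1177 / 1.5
```

Overlapping matches are counted on purpose, because AU‐rich stretches such as AUUUAUUUA contain two pentamers that share one adenosine.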
SEQUENCE MODIFICATION AS POST‐TRANSCRIPTIONAL REGULATOR
Post‐transcriptional events occur by the interplay of regulators (miRNA, RBPs, etc) and sequence motifs, and the sequence characteristics. Yet another mode of regulating the fate of mRNA is by altering the mRNA sequence and properties. The mRNA sequence can be chemically modified. It can also undergo alternative splicing to include or exclude specific sequences. These modifications substantially influence the actions of post‐transcriptional events. In this part, we discuss the effect of the sequence modifications and the underlying regulatory mechanisms in the context of T cells.
Nucleotide modifications in RNA
To influence the fate of mRNA, nucleotides can be edited and chemically modified. For example, a small number of mRNAs can be edited by the ADAR family enzymes, which replace adenosine with inosine (A‐to‐I editing) (Figure 6A). Inosine, a nucleotide interpreted as guanine by the translation machinery, then leads to alterations in the protein coding sequence. Even though the underlying effects of this A‐to‐I editing in T cells remain to be elucidated, the importance thereof has been demonstrated by genetic deletion of the ADAR1 gene in T cells, which leads to autoimmunity and loss of thymic self‐tolerance in mice.
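To make the recoding effect of A‐to‐I editing concrete, the sketch below edits a single adenosine in a toy coding sequence and translates the result with the standard genetic code, reading inosine as guanosine as described above. The sequence, the edited positions, and the function names are hypothetical and chosen only for illustration.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna: str) -> str:
    """Translate an RNA coding sequence; inosine (I) is decoded as guanosine (G)."""
    decoded = rna.upper().replace("I", "G")
    return "".join(
        CODON_TABLE[decoded[i:i + 3]] for i in range(0, len(decoded) - 2, 3)
    )


def edit_a_to_i(rna: str, position: int) -> str:
    """Return a copy of the sequence with the adenosine at `position` (0-based) edited to inosine."""
    rna = rna.upper()
    if rna[position] != "A":
        raise ValueError("only adenosine can be edited to inosine")
    return rna[:position] + "I" + rna[position + 1:]


# Toy coding sequence (hypothetical): AUG CAG AAA encodes Met-Gln-Lys.
unedited = "AUGCAGAAA"
recoded = edit_a_to_i(unedited, 4)      # CAG becomes CIG, read as CGG
synonymous = edit_a_to_i(unedited, 8)   # AAA becomes AAI, read as AAG

protein_unedited = translate(unedited)
protein_recoded = translate(recoded)
protein_synonymous = translate(synonymous)
```

In this toy case the edit in the second codon changes glutamine to arginine, whereas the edit in the third codon position of the lysine codon is silent, which illustrates that only some editing events alter the encoded protein.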
FIGURE 6
Sequence modification as post‐transcriptional regulation. (A) Schematic representation of adenosine to inosine (A‐to‐I) RNA modifications by the ADAR enzyme family. Inosine is interpreted as guanine by the translation machinery, thereby A‐to‐I editing alters the coding sequence (re‐coding event). (B) Schematic representation of the mRNA chemical modifications 7‐Methylguanosine (m7G; mRNA 5′cap modification), N6‐methyladenosines (m6A), pseudo‐uridine (Ψ), and 5‐methylcytosine (m5C). (C‐D) Alternative splicing event can modify the sequence of an mRNA by including or excluding stretches of sequence. This includes (C) alternative polyadenylation and (D) intron retention. (C) Alternative polyadenylation site (PAS) usage in a schematic representation of Stat5b pre‐mRNA containing 2 PAS site, and the long (using the distal PAS) and short (using the proximal PAS) Stat5b mRNA. The short Stat5b mRNA isoform lacks the miRNA and RBP hub found in the longer isoform. (D) Schematic representation of the retention of intron 3 of CXCL2 gene into CXCL2 mRNA leading to decreased protein expression. Colored arrows indicate increase (blue) or decrease (red) in the indicated parameter
In addition to nucleotide editing, the fate of an mRNA can be altered by chemical modification of the nucleotides. The most abundant mRNA modification is the 7‐methylguanosine (m7G) at the 5′ end of an mRNA, forming the so‐called “5′cap”. The 5′cap is co‐transcriptionally added to the pre‐mRNA and protects it from degradation (Figure 6B). The 5′cap then serves as a binding hub for the translation initiation factors and thereby governs translation initiation.
RNA modifications have also been found throughout the mRNA sequence. Several hundreds of RNA modifications have been reported, yet most are observed at very low levels on mRNA.
Therefore, the full extent to which the mRNA modifications modify the fate of RNA is not yet known. The most common RNA modifications include N6‐methyladenosine (m6A), pseudo‐uridine (Ψ), and 5‐methylcytosine (m5C).
These three modifications can affect almost all stages of the life of an mRNA (Figure 6B). Each modification is conferred to the nucleotide by a set of specific “writer” proteins.
In the case of m6A and m5C, specific “reader” proteins confer the effect on the mRNA.
In addition, m6A modifications can be removed by a set of “eraser” proteins.
The intricate interplay between these so‐called readers, writers, and erasers makes this reversible modification of nucleotides a suitable tool for dynamically regulating the mRNA fate, and its translation efficiency.
m6A is the most abundant modification within the mRNA sequence and is thought to be added to the mRNA in the nucleus. Genetic studies have shown that deletion of members of all three gene classes, that is, readers, writers, and erasers can be detrimental. For instance, the lack of the m6A reader YTHDF2 renders mice infertile and disturbs the cellular differentiation.
If the m6A methyltransferase Mettl3 is deleted from CD4+ T cells, both homeostatic proliferation and T cell differentiation are impaired.
Likewise, deletion of the m6A demethylase AlkBH5 interferes with migration of CD4+ T cells and their capacity to induce neuroinflammation in mice.
Importantly, m6A modifications may not only serve as recognition sites for the m6A readers that define the fate of RNA. They may also substantially alter the affinity of other RNA‐binding proteins for their targets. In fact, m6A methylation inhibits splicing events in C. elegans by interfering with the binding of the splicing factor U2AF35.
Furthermore, the presence of the bulky m6A results in reduced hairpin formation, which can influence the binding of RBPs that recognize specific structures. Indeed, m6A methylation increased the binding of the splicing factor HNRNPC to its target sites in HeLa cells.
It is therefore conceivable that the m6A modification could also modify the interaction of RBPs with their target mRNAs, and thus define the fate of RNA in T cells. Of note, m6A methylation can also influence translation. For instance, the m6A reader YTHDF1 dampens the cross‐presentation of MHC class‐I peptides in dendritic cells by promoting the translation of lysosomal cathepsins, which in turn destroy the protein substrates for peptide generation.
It was recently shown that m6A methylation can also decrease the accuracy of translation by influencing the pairing of tRNAs with the codon.
Other RNA modifications also show clear effects on the life of mRNA. For example, specific uridines in an mRNA can be modified to the isomer pseudo‐uridine, an almost certainly irreversible modification which alters the mRNA structure and the translation efficiency in cell‐free translation systems, and which increases the mRNA half‐life in HeLa cells.
Whether and how these RNA modifications alter the mRNA expression and translation in T cells is to date not known, yet conceivable. In conclusion, sequence and structural recognition of RNAs can be substantially influenced by chemical modifications of transcripts.
Alternative splicing
RNA splicing does not only include the mere excision of all introns, with all exons remaining in the transcript to generate the mature mRNA. Rather, a great variation of transcripts—and thus of protein variants—is achieved by a process called alternative splicing. Alternative splicing events result in the use of different sets of exons by exon skipping or alternative exon usage, retention of intron in mRNA, and alternative polyadenylation site usage.
Alternative splicing is a highly dynamic process. It can be regulated throughout the immune lineage maturation,
and by extracellular signals a cell receives.
Interestingly, single‐cell RNA‐sequencing analyses revealed that alternative splicing is heterogeneous in human tumor‐infiltrating T cells (TILs).
Intriguingly, the level of alternative splicing of the RBP WARS is associated with patient survival in lung adenocarcinoma.
Alternative exon usage
Several examples of alternative splicing in the coding region can be observed during T cell differentiation. This includes for instance the membrane protein CD45R (PTPRC gene). Whereas naive T cells express the protein isoform CD45RA, upon T cell activation expression shifts to the alternatively spliced variant CD45RO, which is generated by the splicing factor HNRNPLL.
Another isoform of this protein is CD45RABC, also known as B220, which is expressed by B cells. Alternative splicing can also result in alterations in the signaling pathways. This is for instance the case for MAPKK7 (MKK7), a member of the MAP kinase pathway.
Upon TCR engagement in human CD4+ T cells, MKK7 undergoes CELF2‐mediated alternative splicing. This non‐canonical MKK7 protein contains one additional docking site for the kinase JNK, and this increased binding of JNK strengthens the TCR signaling pathway.
Another protein that promotes T cell activation through alternative splicing and that links the TCR signal to downstream signaling pathways is MALT1.
The inclusion of exon 7 in MALT1 upon TCR triggering, controlled by the splicing factor HNRNPU, results in the addition of 11 amino acids in the protein.
This addition increases the scaffolding function of MALT1, and thereby promotes T cell activation.
Transcription factors can also be subject to alternative splicing. For instance, the core regulatory T cell (Treg) transcription factor FOXP3 can undergo alternative splicing upon activation of the JAK‐STAT pathway,
resulting in isoforms that lack exon 2, or exons 2 and 7.
Loss of these two exons leads to loss of the DNA‐binding domain and the suppressor region mediating transcriptional‐repressor function of FOXP3, which abrogates the suppressive function of Tregs.
Lastly, the membrane‐bound form of the Fas receptor CD95/FAS is induced upon IL‐7 receptor triggering in human CD4+ T cells due to the inclusion of exon 6.
In contrast to the soluble CD95 isoform, which arises from skipping of exon 6, the membrane‐bound CD95 protein could signal and synergize with IL‐7 receptor signaling to promote survival.
Intriguingly, alternative splicing not only occurs in the coding region. It can also result in non‐canonical UTRs, leading to the loss or addition of regulatory sequences.
For instance, alternative splicing in the 3′UTRs was shown to modulate the mRNA half‐life, the protein output, and even subcellular localization, as exemplified by CD47 in various mammalian cell lines (Figure 5B). Alternative splicing of the 3′UTR can also reduce the miRNA‐mediated regulation. A bioinformatic study revealed that as many as 30% of genes can shed miRNA target sequences in their 3′UTR by alternative splicing, thereby escaping miRNA‐mediated regulation.
Alternative 3′UTRs can also alter the mRNA stability.
For example, the key apoptotic regulator BCL2 comes in two isoforms, containing a long (BCL2α) and a short (BCL2β) alternatively spliced 3′UTR. Yet, only BCL2α is sensitive to TRA2β RBP‐mediated degradation, favoring the translation of the BCL2β isoform.
While we start to understand the effect of alternative 3′UTR splicing in many cell types, its functional effects in T cells are yet to be unraveled.
Alternative polyadenylation
Alternative polyadenylation (APA) is a form of alternative splicing in which the poly(A) tail is added at one of several polyadenylation sites (PAS) in a transcript, each typically marked by an AAUAAA signal. While APA is sequence‐encoded, the usage of the polyadenylation sites is dynamically regulated by RBPs.
The usage of the polyadenylation site depends on the state of the cell, that is, proliferation, activation, or the infection status.
APA can take place both in exonic and in intronic regions.
While exonic APA alteration will modify the properties of the 3′UTR, intronic APA can lead to the truncation of proteins.
For example, APA‐mediated protein truncation has been associated with aberrant tumor suppressor proteins in chronic lymphocytic leukemia (CLL), thereby contributing to the disease development.
APA is not only observed in disease but also occurs during B cell and T cell activation and differentiation.
In activated T cells, APA leads to shorter 3′UTRs.
Whereas the 3′UTR shortening does not correlate with alterations in the overall protein levels of cells,
it does affect the expression of specific targets.
For example, the APA‐mediated 3′UTR shortening of the transcription factor STAT5B occurs upon activation, which increases the protein levels in Type‐1‐helper (Th1) CD4+ T cells (Figure 6C). Similarly, 3′UTRs of the transcription factor NF‐ATc and of the surface glycoprotein CD5 are shortened upon T cell activation, which promotes their protein accumulation.
It is conceivable that at the global level, APA‐mediated 3′UTR shortening ablates not only the interaction of negative regulators of PTR but also that of positive regulators of RNA stability and/or translation efficiency. These more subtle effects of APA‐mediated 3′UTR shortening cannot be well captured when the data analysis is performed with linear models.
Revealing such important but subtle differences may thus require more sophisticated data analysis tools such as recently developed machine learning algorithms for clinical data.
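How the choice between a proximal and a distal polyadenylation site can remove a regulatory element from the 3′UTR can be illustrated with a small sketch. It scans a toy 3′UTR for AAUAAA signals and derives the short and the long isoform. Several simplifications are assumptions made here and not taken from the studies discussed above: cleavage is placed at a fixed, hypothetical offset of 20 nucleotides downstream of the signal, and the 3′UTR and the regulatory site (standing in for a miRNA or RBP binding site) are invented sequences.

```python
import re

PAS_SIGNAL = "AAUAAA"   # canonical polyadenylation signal
CLEAVAGE_OFFSET = 20    # hypothetical fixed distance from signal end to cleavage site


def find_pas(utr: str) -> list:
    """Return the 0-based start positions of all AAUAAA signals (overlaps allowed)."""
    return [m.start() for m in re.finditer(f"(?={PAS_SIGNAL})", utr)]


def isoform_for_pas(utr: str, pas_start: int) -> str:
    """Return the 3'UTR isoform produced when the signal at `pas_start` is used."""
    end = min(len(utr), pas_start + len(PAS_SIGNAL) + CLEAVAGE_OFFSET)
    return utr[:end]


# Invented regulatory element, placed between the two polyadenylation signals.
REGULATORY_SITE = "ACAUUCCA"

# Toy 3'UTR (hypothetical): spacer, proximal PAS, spacer, site, spacer, distal PAS, tail.
toy_utr = (
    "GC" * 10 + PAS_SIGNAL + "CU" * 20 + REGULATORY_SITE
    + "CU" * 10 + PAS_SIGNAL + "CU" * 15
)

pas_positions = find_pas(toy_utr)
short_isoform = isoform_for_pas(toy_utr, pas_positions[0])    # proximal PAS
long_isoform = isoform_for_pas(toy_utr, pas_positions[-1])    # distal PAS

short_keeps_site = REGULATORY_SITE in short_isoform
long_keeps_site = REGULATORY_SITE in long_isoform
```

In this toy transcript only the long isoform retains the regulatory site, which mirrors the situation depicted for Stat5b in Figure 6C, where the short isoform lacks the miRNA and RBP hub found in the longer isoform.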
mRNA intron retention
Another consequence of alternative splicing is the retention of specific intronic sequences in the mature mRNA. This intron retention allows for the inclusion of additional sequences into the mRNA. Whereas the full scale of the functional consequences is yet to be determined, intron retention in an mRNA can lead to nuclear retention by preventing nuclear export, to nonsense‐mediated decay, to modulation of protein level, and to altered localization of the mRNA.
In mouse granulocytes and human monocytic cell lines, intron retention results in altered protein expression levels.
For instance, intron retention of CXCL2 intron 3 leads to the nuclear retention of the CXCL2 mRNA, thus impairing the protein production (Figure 6D). Another effect of intron retention can be the localization of proteins in primary rat neuron dendrites, as described for the FMRP protein, an RBP that is critical for the cognitive development.
In B cells, intron retention correlates inversely with the proliferative state of B cells.
The underlying mechanisms of this effect are, however, yet to be determined. Several studies also investigated the intron retention profile of human T cells.
As much as 15.4% of the RNA‐sequencing reads map to retained introns in rested CD4+ T cells compared to 7.3% after 18h of activation, clearly demonstrating a reduction of intron retention upon T cell activation.
The functional consequence of intron retention in T cells remains to date unexplored.
Recently, naive and memory T cells were found to keep “preformed” mRNA pools that are translationally repressed, and which can be rapidly translated upon activation.
Whether some transcripts of this pool of preformed “ready‐to‐go” mRNA also make use of intron retention to prevent the translation of mRNA remains to be determined. First indications come from a macrophage cell line where activation results in the removal of introns in transcripts like CXCL2 and NFKBIZ RNAs, thereby allowing for their timely translation to occur.
Likewise, in murine CD4+ T cells, the nuclear TNF pre‐mRNA is cleaved upon T cell activation.
It then migrates to the cytoplasm, allowing rapid TNF‐α translation to occur prior to the onset of de novo transcription.
Whether this is due to intron retention or solely through splicing out inhibiting pseudoknots in the 3′UTR is yet to be determined. It also remains unknown which splicing factor is responsible for this activation‐induced intron removal of pre‐mRNAs. In summary, alternative splicing and intron retention allow for dramatically different regulation of mRNA and protein expression. How these processes are regulated is still largely unknown, and much remains to be learned.
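The intron retention figures quoted above for human CD4+ T cells (15.4% of RNA‐sequencing reads in resting cells versus 7.3% after 18 h of activation) can be put into perspective with a short calculation. The sketch below is only a worked example of that arithmetic. The helper that derives a retained‐intron fraction from read counts uses hypothetical names and is not the quantification method of the cited studies.

```python
def retained_intron_fraction(intron_reads: int, total_reads: int) -> float:
    """Fraction of sequencing reads that map to retained introns."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return intron_reads / total_reads


# Fractions reported in the text for resting and 18 h activated CD4+ T cells.
resting_fraction = 0.154
activated_fraction = 0.073

# Fold reduction and relative reduction upon activation.
fold_reduction = resting_fraction / activated_fraction
relative_reduction = (resting_fraction - activated_fraction) / resting_fraction

# Example with invented read counts, only to show how the helper is used.
example_fraction = retained_intron_fraction(intron_reads=154, total_reads=1000)
```

With these numbers, activation corresponds to a roughly twofold drop in the share of reads that map to retained introns, that is, a relative reduction of about half.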
CONCLUDING REMARKS
In this review, we have summarized how sequence determinants define the gene expression in T cells. It is important to note that we are only at the beginning of our understanding of the role of sequence determinants in PTR. The sequence determinant for a given interaction of RBPs or miRNA is context dependent and may have to be further validated. For instance, the surrounding sequences of a sequence determinant can influence the RNA structure and thus the accessibility of a given motif. In addition, the combination of neighboring motifs within a gene body may dictate the affinity for a given RNA‐binding factor, which either attracts or repels the interaction between the RNA and the binding factor. Lastly, the relative expression of the RNA binders may also determine which sequence determinant is dominant in defining the fate of mRNA. We and others showed that RBPs are expressed in a cell type–specific manner and are subject to changes upon differentiation and activation.
Nonetheless, integrating mRNA and protein expression measurements with the interplay of sequence determinants will help to further unravel the post‐transcriptional control of gene expression. In addition, more precise determination of amino acid levels and tRNA abundance should help to improve our capacity to crack the code that fine‐tunes protein expression.
One could even speculate that the integration of sequence determinants into a complex mathematical model could help predict protein levels from sequence alone with little information on gene expression levels. Such a model could for instance be used to predict the actual protein make‐up of cells from low input transcriptomic data of rare cell types. Another potential use of predicting gene expression with sequence determinants is the optimization of protein production. Codon usage has already been successfully employed to enhance protein production. Integrating other features into such algorithms could further improve predictions of gene expression levels. This could not only be beneficial for large‐scale in vitro protein production but also for the design of efficient DNA or RNA vaccines.
CONFLICT OF INTEREST
Authors declare no conflict of interest.
AUTHOR CONTRIBUTIONS
BPN and MCW designed the review. BPN, NDZ, MVL, and MCW wrote the review. NDZ and MVL contributed equally to this work. MCW supervised the project.