Poly(A) tails are important elements in mRNA translation and stability, although recent genome-wide studies have concluded that poly(A) tail length is generally not associated with translational efficiency in nonembryonic cells. To investigate whether poly(A) tail size might be coupled to gene expression in an intact organism, we used an adapted TAIL-seq protocol to measure poly(A) tails in Caenorhabditis elegans. Surprisingly, we found that well-expressed transcripts contain relatively short, well-defined tails. This attribute appears to be dependent on translational efficiency, as transcripts enriched for optimal codons and ribosome association had the shortest tail sizes, whereas noncoding RNAs retained long tails. Across eukaryotes, short tails were a feature of abundant and well-translated mRNAs. This seems to contradict the dogma that deadenylation induces translational inhibition and mRNA decay and suggests that well-expressed mRNAs accumulate with pruned tails that accommodate a minimal number of poly(A)-binding proteins, which may be ideal for protective and translational functions.
During transcriptional termination, the majority of eukaryotic mRNAs undergo polyadenylation, resulting in a 3’ tail estimated to contain ~90 (yeast) or ~250 (animals) adenosines[1]. The poly(A) tail has been shown to be important for protection and translation of the mRNA[2,3]. These roles are largely mediated by poly(A) binding proteins (PABPs), which coat the tail[1]. The direct interaction of PABP with the 5’ cap binding complex factor eIF4G is thought to promote mRNA stability and translation by supporting formation of the closed-loop state[1-3]. Conversely, PABP also binds deadenylation complexes (CCR4-NOT-Tob and PAN2-PAN3) and contributes to microRNA-mediated repression[4-6]. These seemingly contradictory roles of PABP suggest that poly(A) tail length and, hence, the number of bound PABPs might determine mRNA fate.
In early embryos and other cellular contexts, regulated cytoplasmic polyadenylation lengthens the tails of select mRNAs, resulting in their translational activation[7]. Yet, recent studies that measured poly(A) tails of individual transcripts genome-wide did not identify a general association between tail size and translational efficiency in most somatic cells[8-10]. Only transcripts containing poly(A) tails shorter than 20 nt were found to have reduced translational efficiency in cultured cells[9]. Consistent with single gene studies showing the importance of tail length and translation in early embryogenesis[7], recent genome-wide analyses of poly(A) tails in frog, zebrafish and Drosophila early embryos confirmed a positive correlation between tail length and translational efficiency in pre-gastrulation stages[10-12]. Since cellular context can regulate poly(A) size and function[7], we asked if tail length was associated with stability and translation of mRNAs in an intact animal. 
To do this, we profiled poly(A) tails in Caenorhabditis elegans worms and utilized available datasets to probe for relationships between tail size and gene expression in this organism, as well as in other eukaryotes.
RESULTS
The C. elegans poly(A) profile
Two distinct high-throughput sequencing methods have been developed to assay global poly(A) tail sizes: TAIL-seq[8] and PAL-seq (poly(A)-tail length profiling by sequencing)[10]. We adapted the TAIL-seq protocol to analyze poly(A) tails in C. elegans because it utilizes a standard and direct sequencing platform. However, the TAIL-seq method relies on costly bead-based ribosomal RNA (rRNA) removal procedures that are ineffective or unavailable for many organisms, including C. elegans. Therefore, it was necessary to modify TAIL-seq to minimize contamination by rRNAs. Inspired by the PAL-seq method, we used a splint ligation approach, in which a DNA oligo bridges the last 9 adenosines of the poly(A) tail and the 3’ adaptor, greatly favoring the ligation reaction of poly(A)+ RNAs over non-adenylated transcripts (Fig. 1a). This adapted TAIL-seq method produces reliable and reproducible libraries (Supplementary Fig. 1a–c), requires less starting material, and can be readily applied to measure poly(A) tails in any organism. Since our adaptation is very similar to the recently published mTAIL-seq (mRNA-TAIL-seq) method[12], we will also refer to it as mTAIL-seq (see Online Methods for our protocol).
Figure 1
The C. elegans poly(A) profile. (a) Outline of the adapted mTAIL-seq procedure. A splint oligo is used to select for polyadenylated RNAs and exclude other RNA contaminants. (b and c) Global size distribution of C. elegans poly(A) tails measured by mTAIL-seq (b) and bulk poly(A) labeling (c). (d) Distribution of median poly(A) tail-length per gene (n = 13,601 protein coding genes). Genes with a median tail ≤ 70 nt were categorized as short-tailed (n = 3,570), genes with a median tail >70 and ≤ 94 nt were categorized as medium-tailed (n = 6,648) and genes with a median tail > 94 nt were categorized as long-tailed (n = 3,383). (e) Functional annotations (Gene Ontology terms) significantly enriched for genes with short or long tails. The colored bars represent the percent of members in each tail-length category. (f) Tissue enrichment profiles for genes with short, medium or long tails. ▲ significant enrichment; ▼ significant depletion for a tissue category (p<0.01, Fisher test). Poly(A) tail measurements, DAVID Gene Ontology Analysis, and tissue enrichment analysis for C. elegans transcripts are available in Supplementary Data Set 1.
We used mTAIL-seq to investigate the poly(A) tail lengths of transcripts produced during the last larval stage of worm development (L4). We found that 90% of all individual mRNA molecules have tail lengths between 26 and 132 nucleotides (nt) and the median overall poly(A) length is 57 nt (Fig. 1b and Supplementary Data Set 1). These sizes are comparable to the bulk tail lengths measured in mammalian[8-10] and Drosophila S2 cells[10]. Interestingly, the most abundant species of polyadenylated mRNAs were 33–34 nt (Fig. 1b), which is close to the reported 25–30 nt footprint for a single PABP[13-15]. Additionally, we observed a phasing pattern with peaks at the poly(A) sizes expected to occur with serial binding of PABP (Fig. 1b), suggesting removal of unprotected 3’ adenosines. Furthermore, the sharp drop in frequency of mRNAs with tail-lengths under 30 nt indicates that the minimal tail length required for stability corresponds to the size of one PABP footprint. We validated this phasing pattern with a ~34 nt peak by direct labeling and visualization of bulk poly(A) tails from total C. elegans RNA (Fig. 1c), which was consistent with previous poly(A) profiling of nematode RNA by this method[16].
The mTAIL-seq method allowed us to analyze the tail distributions and median tail lengths of 13,601 protein coding gene transcripts with 10 or more poly(A) measurements. Within this comprehensive dataset, the most frequent median poly(A) length was 82 nt, with 90% of mRNAs having median tails ranging between 53 and 115 nt (Fig. 1d). To investigate if there were functional classes of genes that tended to have longer or shorter poly(A) tails, genes were sorted according to their median tail lengths. We classified the quartiles of genes with the shortest (short: ≤ 70 nt) and longest (long: > 94 nt) median poly(A) tails (Fig. 1d) and searched for enriched gene ontology (GO) terms within each category (Supplementary Data Set 1). 
Short-tailed transcripts were highly enriched for genes involved in translation, nucleosome components, and cuticular collagens (Fig. 1e). Conversely, long-tailed transcripts were enriched for genes with regulatory functions, such as transcription factors, signal transduction proteins, mediators of neuronal activity, and hormone receptors (Fig. 1e). The observation that the long-tailed category was enriched for genes associated with neuronal functions prompted us to investigate the relationship between tissue-specific expression[17] and poly(A) length. Remarkably, many long-tailed genes were specific to neurons, whereas short-tailed transcripts were enriched for genes with germline and muscle expression (Fig. 1f). Binning of transcripts based on predicted PABP occupancy produced similar results (Supplementary Fig. 1d–e). For example, 68% of “ribosome” genes have median tail lengths expected to bind 1–2 PABPs.
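The tail-length categories used above can be written down as a small function. This is only an illustrative sketch: the cutoffs (≤ 70 nt, > 70 and ≤ 94 nt, > 94 nt) are the ones reported for Fig. 1d, whereas the function name and the example gene identifiers and values are hypothetical and do not come from the study's data.

```python
def classify_tail(median_nt):
    """Assign a tail-length category from a gene's median poly(A) length (nt).

    Cutoffs follow the quartile boundaries reported for Fig. 1d.
    """
    if median_nt <= 70:
        return "short"
    if median_nt <= 94:
        return "medium"
    return "long"


# Hypothetical median tail lengths, for illustration only.
example_medians = {"geneA": 41.0, "geneB": 82.0, "geneC": 120.0}
example_categories = {g: classify_tail(m) for g, m in example_medians.items()}
```

Because the cutoffs are quartile boundaries of this particular dataset, they would need to be recomputed before applying the same scheme to another organism or library.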
Highly expressed mRNAs have short poly(A) tails
As shortening of the poly(A) tail is usually associated with mRNA destabilization[2,3], we were surprised to find that short-tailed transcripts were enriched for highly expressed genes, such as those encoding ribosomal proteins (Fig. 1e). However, this pattern would explain the disparity between the median tail of the global mRNA pool (57 nt) and the median poly(A) size per transcript (82 nt). Our analyses indicate that the transcripts associated with short tails are very abundant, thus skewing the global poly(A) profile towards shorter poly(A) lengths (Fig. 1b and d). To compare steady state transcript levels to poly(A) size, we plotted the median tail lengths of mRNAs categorized by relative abundance (Fig. 2a). This analysis revealed that the majority of highly expressed transcripts contained short tails, whereas the least abundant transcripts had longer tail distributions. When we binned genes according to median poly(A) tail lengths, we observed a striking inverse correlation between poly(A) size and transcript abundance (Fig. 2b and Supplementary Table 1). The mRNAs with shorter median poly(A) tail lengths were, on average, much more abundant than those with the longest tails. The only exception was the small group of 33 transcripts with median tails in the 29–35 nt range, where many RNAs likely contain tails too short to accommodate a single PABP and are undergoing active degradation. This strong inverse relationship between tail length and transcript abundance was unexpected, as it is generally thought that longer tails are associated with stable and highly expressed RNAs[2,5,7,18].
Figure 2
Highly expressed mRNAs have short poly(A) tails. (a) Tail-length distribution is different in pools of genes with distinct expression levels. The transcript abundance categories represent the highest expressed genes (n = 500), those closest to the median expression (n = 500), and lowest expressed (n = 500). All three distributions were significantly different (Mann-Whitney U test). (b) Global relationship between poly(A) length and abundance was measured by plotting the mean normalized abundance of bins of genes (n = 13,601 protein coding genes) divided by median tail lengths. (c and d) Heat maps demonstrating the interplay of the frequency of optimal codons (Fop) and tail size with transcript abundance (n = 13,421 protein coding genes) (c) and ribosome enrichment[25] (n = 13,370 protein coding genes) (d). (e) Violin distribution plots with inlaid box-plots (white dot represents the median) of all tail-length measurements in genes with different frequencies of optimal codons (Fop) and abundance levels. (f to h) C. elegans genes were classified according to codon optimization, demonstrating a significant relationship between translational efficiency and the cumulative distribution of poly(A) length (f), transcript abundance (g) and ribosome enrichment[25] (h). Normalized abundance was calculated as the log2 of the fold-change of the number of tags in a transcript over the median transcript level. P-values were calculated using the Mann-Whitney U test between each codon optimization category and all genes sampled. Poly(A) tail measurements, abundance, Fop, and ribosome enrichment for C. elegans transcripts are available in Supplementary Data Sets 1 and 2.
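The normalized abundance defined in this legend (log2 of the fold-change of a transcript's tag count over the median transcript level) can be sketched as follows. The function name and the toy counts are assumptions made for illustration; the sketch also assumes every count is greater than zero, since the logarithm is undefined otherwise.

```python
import math
from statistics import median


def normalized_abundance(tag_counts):
    """Return log2(count / median count) for each transcript.

    tag_counts maps a transcript identifier to its (non-zero) tag count.
    """
    med = median(tag_counts.values())
    return {gene: math.log2(n / med) for gene, n in tag_counts.items()}


# Hypothetical tag counts, for illustration only.
example_abundance = normalized_abundance({"a": 4, "b": 16, "c": 64})
```

On this scale a value of 0 marks a transcript at the median level, and each unit corresponds to a two-fold difference in abundance.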
We next asked if poly(A) tail size was associated with translational efficiency. In general, the ribosome occupancy and frequency of optimal codons in a given mRNA are indicators of its translational status[19-21]. Additionally, it was recently shown in Saccharomyces cerevisiae and zebrafish that transcripts with optimized codons have higher rates of translational elongation and are more stable than genes with suboptimal codons[20,22-24]. Consistent with these reports, we found that in C. elegans the most abundant transcripts were enriched for optimal codons (Fig. 2c) and ribosome association (Fig. 2d), using data from previously published ribosome profiling studies[25]. Moreover, these favored translation substrates were strongly biased towards short poly(A) tails (Fig. 2c–d and Supplementary Data Sets 1 and 2). However, for these genes, and almost all others, we were still able to detect transcripts with tail lengths consistent with the very long (>200 nt) poly(A) tails synthesized on nascent mRNAs[1]. Specifically, we detected molecules with tail sizes ≥ 200 nt for 78% and ≥ 160 nt for 90% of all genes assayed (Supplementary Fig. 2a). More variability was observed for the minimum and overall range of poly(A) tail sizes of mRNAs (Supplementary Fig. 2b–c). The finding that genes with the highest frequencies of optimal codons were represented by mRNAs that spanned the entire range of detectable tail sizes but were strongly biased for short tailed species (Fig. 2e and Supplementary Fig. 2c) suggests that well-expressed mRNAs undergo poly(A) tail shortening to a defined length, which we refer to as pruning.
Examination of the distribution of poly(A) tail lengths for individual genes revealed distinct patterns based on transcript abundance and codon composition (Fig. 2e). 
For highly expressed and codon-optimized genes such as rpl-21 (a ribosomal protein) and daf-21 (HSP90 - a molecular chaperone), tail lengths ranged from 5–231 nt but concentrated prominently around lengths that would accommodate 1–2 PABPs (~30–60 nt). In contrast, less abundant mRNAs with poorly optimized codons, such as egl-15 (fibroblast growth factor receptor) and svh-1 (neuronal growth factor), tended to have much longer and more diffusely distributed poly(A) tail sizes. On a genome-wide scale, we observed significant differences in the distribution of median poly(A) lengths, abundance and ribosome enrichment for transcripts containing low, medium, and high levels of optimal codons (Fig. 2f–h). Consistent with the general trend of highly expressed genes being compact[26], we found that C. elegans genes with short poly(A) tails tended to have short open reading frame (ORF) and 3’ untranslated region (UTR) lengths (Supplementary Table 1).
To further investigate the relationship between gene expression and poly(A) tail size, we focused on a set of mRNAs undergoing translational activation or repression during the last larval stage of development, using published RNA-seq and ribosome profiling time course data for C. elegans[25]. During a two-hour window that spans the time point we used for mTAIL-seq, transcripts for 365 genes become at least 8-fold enriched while those for 341 genes become at least 8-fold depleted from ribosomes, after normalization to changes in mRNA abundance. Remarkably, the ribosome enriched transcripts, and presumably more actively translated group, had significantly shorter median poly(A) tail sizes compared to the transcripts associated with translational repression (Fig. 3a). Further evidence suggesting an inverse relationship between poly(A) tail size and translation surfaced from our analysis of annotated long non-coding RNAs (lncRNAs)[27]. 
In general, lncRNAs, including antisense RNAs, had long poly(A) tails and showed no evidence of the phasing seen for mRNAs (Fig. 3b). Taken together, our findings suggest that pruned poly(A) tails are a feature of well-translated mRNAs.
Figure 3
Efficient translation is associated with short poly(A) tails. (a) Cumulative median tail length distributions of genes that are enriched (n = 365) or depleted (n = 341) in ribosomes (at least 8 fold) over a 2 hour period[25] that spans the time point used for mTAIL-seq (29 h). P-values were calculated using the Mann-Whitney U test between each category and all genes sampled. (b) Density plot comparing the bulk distribution of poly(A) tails between mRNAs and two classes of long non-coding RNAs: lincRNAs (long intervening non-coding RNAs) and antisense RNAs. Poly(A) tail measurements are available in Supplementary Data Set 1.
Short poly(A) tails are associated with highly expressed genes across eukaryotes
We next asked if the association between mRNA expression and poly(A) tail size might be conserved in other eukaryotes. We analyzed published datasets for poly(A) tail lengths[10], ribosome enrichment[10], RNA stability[20,28,29], and translation[29] for S. cerevisiae, Drosophila and mouse transcripts. We observed that highly translated mRNAs tended to have shorter tails (Fig. 4a–b and Supplementary Fig. 3a and Supplementary Table 1), higher steady state expression levels (Fig. 4c and Supplementary Fig. 3b and Supplementary Table 1), and longer half-lives (Fig. 4d and Supplementary Table 1). Notably, the shorter relative median tail length of transcripts encoding ribosomal proteins was well conserved among the different organisms (Supplementary Fig. 4a). Additionally, in the C. elegans dataset this class of mRNAs exhibited highly uniform median tail lengths of ~40 nt (Supplementary Fig. 4a), with the largest fraction of tails sized to accommodate one, and to a lesser extent, two PABPs (Supplementary Fig. 4b–c). Overall, these results suggest that pruned poly(A) tails are a feature of stable and efficiently translated mRNAs across species.
Figure 4
Short poly(A) tails are features of highly expressed mRNAs in yeast and mouse. (a to d) Cumulative distribution plots showing the relationship between translation levels and poly(A) length[10] (yeast n = 3,526; mouse n = 3,469) (a), ribosome enrichment[10] (yeast n = 3,394; mouse n = 3,214) (b), transcript abundance[10] (yeast n = 3,394; mouse n = 3,214) (c), and transcript half-lives (yeast n = 2,702; mouse n = 3,469) (d) in S. cerevisiae[20] and mouse NIH3T3[29] cells. In mouse cells, translation rate constants (ksp) represent the number of proteins synthesized per mRNA per hour[29]. In yeast, translation rates are reflected in the codon optimization of the transcripts. (e) Relationship between poly(A) tail size and co-translational decay in yeast transcripts (n = 2,994). Higher CPI (Codon Protection Index) values correspond to higher rates of co-translational 5’ decapping[28]. P-values were calculated using the Mann-Whitney U test between each tail size category (short = 1st quartile; medium = 2nd and 3rd quartiles; long = 4th quartile, based on median length) and all genes sampled. Source data are from refs. 10, 20, 28, 29 and summarized in Supplementary Table 2.
Recent studies have shown that codon composition strongly influences mRNA stability and translation efficiency[20,22-24]. In S. cerevisiae, a series of HIS3 reporters that differ only in their percentages of optimal codons revealed that mRNA half-life is remarkably sensitive to this variable[24]. Using these same reporters, we analyzed steady state poly(A) tail lengths and observed that transcripts with high percentages of optimal codons accumulated with relatively short poly(A) tails (Supplementary Fig. 5). In contrast, transcripts with lower codon optimality had longer, more diffuse tail sizes (Supplementary Fig. 5). These results suggest that the influence of codon optimality on translation efficiency and mRNA stability extends to poly(A) tail length regulation.
Initially, it was puzzling to find that the class of relatively unstable and poorly translated mRNAs had the longest median poly(A) tail sizes (Figs. 2 and 4). One possibility is that this pool mainly consists of recently synthesized transcripts that have not yet been targeted for rapid decay. In yeast, unstable mRNAs have been shown to undergo rapid deadenylation to a ~10 nt oligo(A) tail length, followed by decapping and 5’ to 3’ exonucleolytic degradation[30]. Although decay intermediates are rare in wild type cells[31,32], a recent study used deep sequencing methods (5PSeq) to identify decapped yeast mRNAs on a genome-wide scale[28]. Using published 5PSeq datasets for yeast mRNAs[28], we found that genes for transcripts with long median tails were represented by the highest levels of 5’ decapped mRNAs (Fig. 4e and Supplementary Table 1). The 5’ decay intermediates only accounted for ~12% of cellular RNAs that could be captured by oligo(dT) isolation methods[28], which is consistent with the brief existence of decapped RNAs in wild type cells[31,32]. Thus, many transcripts in the “long” poly(A) tail class may actually be detected in a transient state prior to rapid destabilization. 
Conversely, most “short” class transcripts seem to be those that accumulate with pruned poly(A) tails.
DISCUSSION
Here we provide genome-wide evidence that short poly(A) tail sizes are a feature of abundant and efficiently translated mRNAs across eukaryotes. Previous poly(A) tail sequencing studies concluded that tail length was not associated with translational efficiency in non-embryonic cells[8-10]. However, the PAL-seq study reported that in yeast and mouse NIH3T3 cells tail sizes and measures of translation rates were negatively correlated (Rs = −0.12, P < 10^−9 (S. cerevisiae); Rs = −0.20, P < 10^−16 (mouse))[10], findings confirmed in our analyses (Fig. 4a and Supplementary Table 1). Additionally, the classes of transcripts found to have long or short tails by our study and PAL-seq[10] are largely in agreement, with short-tailed transcripts generally considered to be among the most abundant and well translated in the cell. These observations are also consistent with conclusions from direct labeling experiments where short poly(A) tails were associated with the most stable mRNAs in vegetatively growing Dictyostelium discoideum cells[33,34]. Presently, it is unclear why other analyses of poly(A) tail size on individual genes in yeast or NIH3T3 cells found that ribosomal protein and other abundantly expressed transcripts had relatively long tails[8,35]. Those conclusions are at odds with single-gene Northern blot or PCR-based assays that have detected relatively short poly(A) tails on ribosomal protein mRNAs in yeast[10,36,37], mouse NIH3T3 cells[38], and worms (Supplementary Fig. 4c). It is possible that, like with translation, gene specific control of poly(A) tail length is sensitive to differences in cellular contexts[39-41]. 
Furthermore, the categories of “short” and “long” are relative to the population of polyadenylated transcripts analyzed, which was limited in some of the previous studies[8,35].
Although our study challenges the longstanding idea that longer tails promote mRNA stability and translation[2,5,7,18], it suggests that instead there might be an optimal tail size that results from a shortening process we refer to as pruning. Since poorly translated mRNAs and non-coding transcripts were found to contain long, less defined poly(A) tails, pruning seems to be associated with translational activity. Additionally, bulk and single gene analyses revealed a ~30 nt distribution of poly(A) tail sizes that was primarily associated with highly expressed mRNAs. This phased binding pattern of PABP may be related to translation status and thus may help distinguish coding from long ncRNAs. The currently available datasets are insufficient for determining if translation directly promotes pruning or stabilizes mRNAs with short poly(A) tails. In a model open to either possibility, the initially long poly(A) tails on newly synthesized transcripts become deadenylated to different extents depending on translational status: for well translated mRNAs, tail shortening ceases at lengths that accommodate a minimal number of PABPs, and for inefficiently translated mRNAs, deadenylation progresses to critically short lengths that trigger decapping and rapid mRNA decay (Fig. 5). Processive deadenylation may result when the last PABP is dislodged from the poly(A) tail, and efficient translation may antagonize this event by stabilizing the PABP-poly(A) tail association, perhaps through direct interactions with initiation (eIF4G) and termination (eRF3) factors[1,42]. Numerous studies have pointed to dual, seemingly contradictory, roles for PABP in regulating mRNA stability. 
Whereas binding of PABP can protect the poly(A) tail from degradation[43,44], it also has been shown to recruit the major deadenylase complexes PAN2-PAN3 and CCR4-NOT[45-47]. The multiple PABPs bound to initially long tailed transcripts could engage deadenylation factors that either reduce the tails to lengths that exclude PABP binding, resulting in rapid decay, or that stall at short tail sizes bound by a minimal number of PABPs stably associated with actively translated mRNAs (Fig. 5). While consistent with the well-established connection between translation and mRNA decay[3,48], this model implicates an optimal poly(A) tail length that is achieved through translational activity and, in turn, may contribute to the stability and efficient decoding of the mRNA. Overall, our analyses led to the surprising conclusion that in somatic cells short poly(A) tails are a general feature of highly-expressed genes across eukaryotes.
Figure 5
Model for short poly(A) tails on highly expressed mRNAs. Newly transcribed mRNAs receive long (>200 nt) tails, which are coated with PABP[1]. The PABP C-terminal domain (PABC, black triangles) binds the CCR4-NOT-Tob and PAN2-3 deadenylation complexes[5,6]. In strong translation substrates, interactions between a proximal PABP and translation initiation factor eIF4G promote a closed-loop structure and the translation termination factor eRF3 may compete with the deadenylases for binding the PABC domain[6,42]. These interactions are predicted to stabilize the proximal PABP and prevent processive deadenylation of the transcript, allowing the tail to be pruned to a defined length. Trimming of the poly(A) tail to limit the number of associated PABPs may be important for removing binding sites for factors that catalyze deadenylation and translational repression. For weak translation substrates, the deadenylases recruited to the PABC domain can act processively without the impediment of stabilizing interactions provided by translational activity, resulting in critically short tails that trigger decapping and 5’→3’ degradation of the mRNA.
ONLINE METHODS
Nematode culture and RNA extraction
WT Caenorhabditis elegans (N2 Bristol) animals were cultured on OP50 bacteria at 25 °C, and collected at the last larval stage (mid-L4 – 29 h time point). Standard worm synchronization methods were used[49]. RNA was extracted with Trizol and DNase treated. RNA quality was measured by 260/280 ratio and confirmed by gel electrophoresis.
Bulk poly(A) labeling
1 µg total RNA (DNase treated) was 3’ labeled by performing a 3’ ligation reaction containing 20 U T4 RNA Ligase (NEB) and 1 µM [32P]pCp (Perkin Elmer) overnight at 16 °C. Enzymes were inactivated at 68 °C for 5 min and unincorporated nucleotides were removed with MicroSpin G-50 columns (GE Healthcare). Labeled RNA was digested with 80 U RNase T1 and 4 µg RNase A (which cannot act on the poly(A) tail) for 2 h at 37 °C; 40 µg unlabeled yeast RNA was used as ballast. The reaction was stopped by Proteinase K digestion of the RNases and the labeled poly(A)s were extracted with acid-phenol:chloroform:IAA and ethanol precipitated. Labeled poly(A) tails were resuspended in 20 µL, of which 5 µL were run on a long 15% Urea-PAGE sequencing gel along with labeled Decade RNA Marker (Ambion). The gel was dried onto Whatman paper and scanned on a PhosphorImager.
Poly(A) analysis by northern blot
As detailed in Sallés et al., 1999[50], total RNA samples were digested with RNase H (NEB) in the presence of a gene specific complementary oligonucleotide and, in the case of poly(A)- samples, also oligo(dT)18. The samples were then resolved on a 6% Urea-PAGE minigel along with RNA Century marker (Ambion) for size determination of the fragments. Northern blotting was performed as described in Van Wynsberghe et al., 2011[51].
Yeast culture and RNA analysis
RNA samples from strains expressing HIS3 mRNA reporters with varying degrees of codon optimality were prepared and subjected to poly(A) tail length analyses by RNase H Northerns as previously described[24,32].
mTAIL-seq
mTAIL-seq was performed as in the original TAIL-seq[8], with the following modifications. 3’ adaptor splint ligation: A splint oligonucleotide was used to favor capture of poly(A)+ RNAs. We incubated 20 µg of total RNA in a 5 µL volume with 1 µL 10 µM biotinylated 3’ adaptor and 1 µL 10 µM splint oligonucleotide (5’-NNNGTCAGTTTTTTTTT-3’) at room temperature for 5 min. Next, 1 µL 10× RNA ligase buffer (NEB), 0.5 µL of Superase-In (Ambion) and 1 µL T4 RNA ligase 2 truncated (NEB) were added and the ligation was performed overnight at 18 °C. RNA size selection: After partially digesting the RNA from the ligation reaction with 2 U of RNase T1 (1 U/µL) for 5 min at 22 °C and performing the original protocol for biotin pull-down and on-bead 5’ phosphorylation, we eluted the RNA and size-selected fragments of 250–1000 nt. This was done by gel extraction and purification from a 6% Urea-PAGE gel. Libraries were normalized, pooled and then sequenced on the Illumina MiSeq platform (51 × 251 bp paired end run) with PhiX control library and the spike-in controls mixture. The quantified fluorescent signals were saved and processed by tailseeker2. Since this protocol is very similar to the recently published method, mTAIL-seq[12], we refer to our method by the same name.
mTAIL-seq data analysis
Base calling, trimming of adapter sequences, removal of duplicated reads and determination of poly(A) tail sizes were performed by tailseeker2. Reads were analyzed by mapping to the WS247 assembly of the C. elegans genome using RNA-STAR[52]. Poly(A) lengths were then assigned to individual coding genes by intersecting the mapped sequences with WormBase.org WS247 gene annotations using BEDTools[53]. Assignment to WormBase annotated non-coding RNAs[27] was determined after ruling out matches to other overlapping coding and non-coding transcripts. Sequenced tags without a poly(A) tail were discarded and represented less than 0.02 percent of the data. The minimal poly(A) length detected was 5 nt.
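The per-gene summary used throughout the Results (a median tail length for each gene with 10 or more poly(A) measurements) can be illustrated with a short sketch. It is not the tailseeker2, RNA-STAR or BEDTools workflow described above; it assumes that tail lengths have already been assigned to genes, and the function name, the input format and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_tail_per_gene(measurements, min_reads=10):
    """Return the median tail length (nt) for each sufficiently covered gene.

    measurements: iterable of (gene_id, tail_length_nt) pairs, one per read.
    min_reads: genes with fewer measurements than this are left out.
    """
    tails = defaultdict(list)
    for gene, length in measurements:
        tails[gene].append(length)
    return {g: median(v) for g, v in tails.items() if len(v) >= min_reads}


# Hypothetical reads: geneA has 10 measurements, geneB only 9.
example_reads = [("geneA", n) for n in range(30, 40)]
example_reads += [("geneB", n) for n in range(50, 59)]
example_medians = median_tail_per_gene(example_reads)
```

The median is used here rather than the mean because individual genes can carry a small number of very long tails alongside a majority of short ones.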
RNA-seq
Three independent replicates of wildtype C. elegans were cultured at 25 °C and collected at the L4 stage for RNA. These samples were prepared for sequencing by rRNA depletion with Ribo-Zero rRNA Removal Mix – Gold (Illumina), and the TruSeq Stranded Total RNA Library Prep Kit (Illumina) according to the Low Sample Protocol. After sequencing on the Illumina HiSeq platform, read counts were quantified using kallisto[54] and aligned to C. elegans genome WS247.
Frequency of optimal codons (Fop) and ribosome enrichment
Optimal codons have been identified for yeast[20], C. elegans[55] and D. melanogaster[56]. Fop was calculated as in a previous study[55] and represents the ratio of optimal codons relative to the total number of codons in a transcript, excluding codons for amino acids represented by a single codon (methionine and tryptophan) and stop codons. Values can range from 0 to 1, with a Fop of 1 meaning that every codon is optimal. Ribosome enrichment was determined by calculating the log2 fold change of normalized RPKM values for each transcript in the ribosome fraction relative to total RNA using paired RNA-seq and ribosome profiling datasets[10,25]. The first 50 nucleotides of the ORF were excluded from this analysis in order to avoid biases at the start codon.
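The two quantities defined above can be sketched in a few lines. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the set of optimal codons in the example is a toy placeholder and not the published C. elegans table[55], and the exclusion of the first 50 nucleotides of the ORF from the ribosome enrichment calculation is assumed to have been applied upstream when the RPKM values were computed.

```python
import math

# Codons left out of Fop: Met and Trp (single-codon amino acids) and stops.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def fop(cds, optimal_codons):
    """Frequency of optimal codons in an in-frame coding sequence (0 to 1)."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    scored = [c for c in codons if c not in EXCLUDED]
    if not scored:
        raise ValueError("no scorable codons in sequence")
    return sum(c in optimal_codons for c in scored) / len(scored)


def ribosome_enrichment(ribo_rpkm, total_rpkm):
    """log2 fold change of ribosome-fraction RPKM over total-RNA RPKM."""
    return math.log2(ribo_rpkm / total_rpkm)


# Toy optimal-codon set and sequence, for illustration only.
TOY_OPTIMAL = {"AAG", "CTC"}
example_fop = fop("ATGAAGCTCAAATTGTAA", TOY_OPTIMAL)
example_enrichment = ribosome_enrichment(8.0, 2.0)
```

In the toy sequence the start and stop codons are skipped, leaving four scored codons of which two are in the toy optimal set.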
Gene Ontology (GO) and tissue enrichment analysis
GO terms associated with long and short-tailed gene pools were identified using DAVID[57]. Analysis for tissue enrichment in long and short-tailed genes was performed by employing scores from a dataset of global predictions of tissue-specific gene expression in C. elegans[17].
Statistics
Fisher’s exact test was used to test for enrichment in gene classes. The Mann-Whitney U test was used to test for differences in the distribution of values belonging to specific gene categories and all genes tested. Spearman correlations were used to measure the strength and direction of association between two ranked variables.
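As an illustration of the last of these measures, Spearman's correlation is the Pearson correlation computed on ranks, with tied values given their average rank. The sketch below is dependency-free and written only to make that definition concrete; the function names and example values are hypothetical, and it is not a statement about the software used for the analyses in this study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation between two equal-length sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical median tail lengths (nt) and normalized abundances.
example_tails = [40, 60, 80, 100, 120]
example_abundance = [9.0, 7.5, 6.0, 3.0, 1.0]
example_rs = spearman(example_tails, example_abundance)
```

A perfectly inverse ordering such as the toy example gives a coefficient at the negative end of the range; real tail-length and abundance data would give intermediate values.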
Data availability
The datasets generated in this study for analyzing poly(A) tail length and RNA expression in L4 stage C. elegans are available on GEO under the accession number GSE104502. Source data for figures 1d–f; 2a–d, f–h; 3a; Supplementary figures 2a–c; 4a; Supplementary Table 1 are available with the paper online. Previously published datasets used in this study are summarized in Supplementary Table 2 and include the following. C. elegans: ribosome profiling and RNA-seq[25], ORF and 3’UTR lengths[58]. S. cerevisiae: poly(A) measurements[10], ribosome profiling and RNA-seq[10], RNA half-life[20], and co-translational 5’ decapping (codon protection index of cycloheximide treated cells)[28]. NIH3T3: poly(A) measurements[10], ribosome profiling and RNA-seq[10], RNA half-life[29], and translation rates[29]. Drosophila S2: poly(A) measurements and RNA-seq[10]. HeLa: poly(A) measurements[10]. Other data are available upon request. A Life Sciences Reporting Summary for this article is available.