Angela L Nicholson-Shaw1, Eric R Kofman2,3,4, Gene W Yeo2,3,4, Amy E Pasquinelli1.
Abstract
The poly(A)-tail appended to the 3'-end of most eukaryotic transcripts plays a key role in their stability, nuclear transport, and translation. These roles are largely mediated by Poly(A) Binding Proteins (PABPs) that coat poly(A)-tails and interact with various proteins involved in the biogenesis and function of RNA. While it is well-established that the nuclear PABP (PABPN) binds newly synthesized poly(A)-tails and is replaced by the cytoplasmic PABP (PABPC) on transcripts exported to the cytoplasm, the distribution of transcripts for different genes or isoforms of the same gene on these PABPs has not been investigated on a genome-wide scale. Here, we analyzed the identity, splicing status, poly(A)-tail size, and translation status of RNAs co-immunoprecipitated with endogenous PABPN or PABPC in human cells. At steady state, many protein-coding and non-coding RNAs exhibit a strong bias for association with PABPN or PABPC. While PABPN-enriched transcripts were more often incompletely spliced and harbored longer poly(A)-tails, and PABPC-enriched RNAs had longer half-lives and higher translation efficiency, there are curious outliers. Overall, our study reveals the landscape of RNAs bound by PABPN and PABPC, providing new details that support and advance the current understanding of the roles these proteins play in poly(A)-tail synthesis, maintenance, and function.
Year: 2022 PMID: 35438785 PMCID: PMC9071453 DOI: 10.1093/nar/gkac263
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Figure 1. Validation of PABPN and PABPC RIPs and annotation pipeline. (A) Western blots showing that siRNA knockdown of PABPN or PABPC reduces the protein band recognized by the antibodies used for immunoprecipitations. Tubulin and actin are used as loading controls. (B) Western blots showing representative IPs pulling down PABPN or PABPC from total cell lysate. (C) Western blots showing matched IgG isotype control antibodies that were also used for pulldown. The RNA isolated from these RIPs was so minimal that it could not be prepared for sequencing, suggesting that the RIP conditions prevented non-specific binding of RNA. (D) Schematic of part of the pipeline developed to analyze RNA sequencing experiments. Any portion of the annotation file in which two or more genes overlapped the exact same genomic space was removed to avoid false positives. (E) Schematic showing ‘readthrough_transcript’ annotations and how they block any assignment of reads to the two genes that comprise that readthrough event. Readthrough_transcripts were removed from the annotation file. (F) Genome browser track read pileup for PABPN RIP showing reads that suggest failure to properly terminate transcription after the CENPM protein-coding gene, resulting in intergenic reads until reaching the neighboring gene, TNFRSF13C. (G) Principal Component Analysis (PCA) plot from Illumina RNA sequencing of total input, PABPC IP and PABPN IP, three independent biological replicates.
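The overlap filter described in panel (D) can be illustrated with a minimal sketch. It assumes strand-agnostic, half-open intervals and a hypothetical helper named `unique_regions`; the authors' actual pipeline and file format are not described here, so this shows only one way such a filter might work.

```python
from collections import defaultdict


def unique_regions(features):
    """Return the parts of each feature that no other gene overlaps.

    features: iterable of (chrom, start, end, gene_id) tuples using
    half-open coordinates [start, end). Strand is ignored (an assumption).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, gene in features:
        by_chrom[chrom].append((start, end, gene))

    kept = []
    for chrom, items in by_chrom.items():
        # Every feature boundary is a point where gene coverage can change.
        points = sorted({p for s, e, _ in items for p in (s, e)})
        for left, right in zip(points, points[1:]):
            # Genes whose interval covers the segment [left, right).
            genes = {g for s, e, g in items if s < right and e > left}
            if len(genes) == 1:  # keep only unambiguous genomic space
                kept.append((chrom, left, right, genes.pop()))
    return kept


if __name__ == "__main__":
    demo = [("chr1", 100, 200, "A"), ("chr1", 150, 300, "B")]
    # The shared stretch 150-200 is dropped; 100-150 (A) and 200-300 (B) remain.
    print(unique_regions(demo))
```

The pairwise scan is quadratic in the number of features per chromosome, which keeps the sketch short; a real annotation file would call for a sorted sweep or an interval tree.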
Figure 2. RNA Immunoprecipitation of PABPN and PABPC reveals distinct enrichment profiles. (A and B) Transcripts per kilobase million (TPM) of input compared to TPM of PABPN IP (A) or PABPC IP (B) from Illumina RNA-Seq experiments. ▴ indicates a canonical histone gene as annotated in HistoneDB 2.0 (94). Blue dashed line is an overlaid 1:1 line. Significantly enriched or depleted genes were determined by comparison to input samples using DESeq2 (57) with cut-offs of log2FoldChange 0.5/−0.5, Padj ≤ 0.01, and baseMean > 50. Non-significant genes are colored in grey. (C and D) Volcano plots showing enrichment and depletion of protein-coding and non-coding genes in PABPN IP versus input (C) or PABPC IP versus input (D). Grey dashed lines indicate significance cut-offs of Padj ≤ 0.01 and log2FoldChange > 0.5 or < −0.5. (E) Venn diagrams showing genes considered significantly enriched in PABPN IP or PABPC IP compared to input with the overlap showing the genes that were enriched in both. Cut-offs used were baseMean > 50, Padj ≤ 0.01 and log2FoldChange > 0.5. (F) Venn diagrams showing genes considered significantly depleted in PABPN IP or PABPC IP compared to input with the overlap showing the genes that were depleted in both. Cut-offs used were the same as in (E). (G) Intron presence in PABPN and PABPC IPs, analyzed by normalizing each exon or intron to its respective length and then comparing the ratio of intron reads/exon reads for each gene. A value of 1 would indicate completely unspliced, and a value of 0 indicates fully spliced. Genes used for calculation had a TPM of at least 1 in both IP conditions. Box and whisker plots show the median as the central line in the box. The upper and lower edges of the box indicate the range of the upper and lower quartiles. (H) Transcripts per kilobase million (TPM) of intronic versus exonic reads detected in PABPN IP. A pseudocount of 0.5 was added before taking the log of TPM values. Blue dashed line is an overlaid 1:1 line.
(I) Transcripts per kilobase million (TPM) of intronic versus exonic reads detected in PABPC IP. A pseudocount of 0.5 was added before taking the log of TPM values. Blue dashed line is an overlaid 1:1 line. (J) Gene ontology (GO) molecular function enrichment analysis of protein coding genes significantly enriched in PABPN IP compared to Input (n = 1893) using PANTHER. Significance cut-offs used are baseMean > 50, Padj ≤ 0.01 and log2FC > 0.5. Reference list used was the protein coding genes that were detected with a baseMean greater than 50 overall (n = 12 608). (K) Gene ontology (GO) molecular function enrichment analysis of protein coding genes significantly enriched in PABPC IP compared to Input (n = 5113) using PANTHER. Significance cut-offs used are baseMean > 50, Padj ≤ 0.01 and log2FC > 0.5. Reference list used was the protein coding genes that were detected with a baseMean greater than 50 overall (n = 12 398). (L) Protein coding genes determined to be enriched or depleted in PABPN IP or PABPC IP were grouped and compared to published half-life values (73). Significant differences in the cumulative distributions attributable to enrichment or depletion with PABPN or PABPC are indicated: ***P < 0.001; two tailed Kolmogorov–Smirnov test. The number of genes in each boxplot is displayed at the base of the graph. (M) Groupings were compared to published translation efficiency data as determined by ribosome profiling (49). Otherwise, this panel is the same as in (L).
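The intron presence measure in panel (G) can be sketched as a ratio of length-normalized read densities. The function below is a hypothetical illustration that assumes per-gene totals of intronic and exonic reads and lengths; the legend does not say whether the authors normalized per feature or per gene, so this is not their exact procedure.

```python
def intron_exon_ratio(intron_reads, intron_length, exon_reads, exon_length):
    """Length-normalized intron signal divided by length-normalized exon signal.

    Values near 0 suggest fully spliced transcripts; values near 1 suggest
    unspliced transcripts. Returns None when the ratio is undefined
    (for example, intronless genes or genes without exonic reads).
    """
    if intron_length <= 0 or exon_length <= 0 or exon_reads <= 0:
        return None
    intron_density = intron_reads / intron_length  # reads per intronic nt
    exon_density = exon_reads / exon_length        # reads per exonic nt
    return intron_density / exon_density


if __name__ == "__main__":
    # Hypothetical gene: 50 reads over 10 kb of intron, 200 reads over 2 kb of exon.
    print(intron_exon_ratio(50, 10_000, 200, 2_000))
```

Normalizing by length first matters because introns are typically much longer than exons, so raw read counts alone would overstate intron retention.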
Figure 3. Transcripts bound to PABPC associate with the ribosome. (A) Boxplot of edit scores (≥0.5 confidence score) for all transcripts in PABPC IP or PABPN IP. Genes displayed have a TPM ≥ 5 in both IP conditions. ***P < 0.001; two tailed Kolmogorov–Smirnov test. (B) Boxplot of edit scores (≥0.5 confidence score) for protein-coding genes (TPM ≥ 5) when they are associated with PABPN or when associated with PABPC. ***P < 0.001; two tailed Kolmogorov–Smirnov test. (C) Boxplot of edit scores (≥0.5 confidence score) for non-coding genes (TPM ≥ 5) when they are associated with PABPN or when associated with PABPC. Significance calculated with two tailed Kolmogorov–Smirnov test. (D) Boxplot of edit scores (≥0.5 confidence score) from cytoplasmic input STAMP-RPS2 condition (no IP pulldown) for transcripts of protein-coding genes depleted or enriched with PABPN (TPM ≥ 5). ***P < 0.001; two tailed Kolmogorov–Smirnov test. Significance was determined by DESeq2 with cut-offs of log2FoldChange > 0.5 (enriched) or < –0.5 (depleted), baseMean > 50 and Padj ≤ 0.01. (E) Metagene plot showing edit (≥0.5 confidence score) distribution for transcripts from protein-coding genes enriched or depleted from PABPN across 5′ UTR, CDS and 3′ UTR gene regions, when they were associated with PABPN. (F) Boxplot of edit scores (≥0.5 confidence score) from cytoplasmic input STAMP-RPS2 condition (no IP pulldown) for transcripts of protein-coding genes depleted or enriched with PABPC (TPM ≥ 5). ***P < 0.001; two tailed Kolmogorov–Smirnov test. Significance was determined by DESeq2 with cut-offs of log2FoldChange > 0.5 (enriched) or < –0.5 (depleted), baseMean > 50 and Padj ≤ 0.01. (G) Metagene plot showing edit (≥0.5 confidence score) distribution for transcripts from protein-coding genes enriched or depleted from PABPC across 5′ UTR, CDS and 3′ UTR gene regions, when they were associated with PABPC.
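The significance cut-offs restated in panels (D) and (F) amount to a three-way classification of each gene. A minimal sketch follows, assuming the three values have already been exported from a DESeq2 results table; the function name `classify` and its argument names are hypothetical, and a missing adjusted P-value (which DESeq2 can report as NA) is represented here as `None`.

```python
def classify(log2_fold_change, padj, base_mean,
             lfc_cutoff=0.5, padj_cutoff=0.01, base_mean_cutoff=50):
    """Label a gene as enriched, depleted, or not significant in IP versus input.

    Cut-offs follow the legend: |log2FoldChange| > 0.5, Padj <= 0.01,
    baseMean > 50. A missing padj (None) is treated as not significant.
    """
    if padj is None or padj > padj_cutoff or base_mean <= base_mean_cutoff:
        return "not significant"
    if log2_fold_change > lfc_cutoff:
        return "enriched"
    if log2_fold_change < -lfc_cutoff:
        return "depleted"
    return "not significant"


if __name__ == "__main__":
    # Hypothetical rows: (log2FoldChange, padj, baseMean)
    for row in [(1.2, 0.001, 300), (-0.8, 0.005, 75), (2.0, 0.2, 500)]:
        print(row, classify(*row))
```

Checking the P-value and expression filters before the fold change keeps lowly expressed genes with large but noisy fold changes out of both the enriched and depleted groups.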
Figure 4. Incompletely spliced transcripts with longer poly(A) tails are associated with PABPN. (A) TPM of PABPN IP compared to TPM of PABPC IP, colored by degree of enrichment in cytoplasmic (blue) or nuclear (red) fractions. Select abundant genes highly enriched in the nucleus or cytoplasm are labeled by name. A pseudocount of 0.5 was added before taking the log of TPM values. Only genes with mature length > 200 nt are plotted, as PABPC and PABPN rarely bind to shorter transcripts. Replication-dependent histone genes (which are depleted from both IPs) were used as a threshold for background in the IP conditions, and any gene that was detected at a lower ratio than these histone genes was not included in this plot. (B) Density plot showing overall poly(A) tail length distribution of all nuclear-encoded genes detected by Nanopore direct RNA sequencing. Dashed lines indicate total cytoplasm and total nuclear fraction, and solid lines indicate the two IP conditions. Region from 0–150 nt is magnified in the bottom panel to show the differences in distributions for shorter poly(A) tails. The number of reads in each density plot line is as follows: Nuclear 585 914; Cytoplasm 1 039 310; Nuclear PABPN 300 565; Cytoplasm PABPC 1 047 845. (C) Density plot of poly(A) tail length for reads that contained one or more introns (dashed line) or no introns (solid line) while with PABPN in the nucleus. (D) Density plot of poly(A) tail length for reads that contained one or more introns (dashed line) or no introns (solid line) while with PABPC in the cytoplasm.
Figure 5. Poly(A) tail size differs depending on whether a transcript is associated with PABPN or PABPC. (A) Cumulative plot showing median and maximum poly(A) tail length for each gene that had at least 10 reads in PABPN IP from the nucleus (n genes = 7343). Graphing area for all cumulative plots has been limited to 0–550 nt. (B) Cumulative plot showing median tail length detected in PABPN IP in the nucleus, separated by gene biotype, for genes that had at least 10 reads. PCG n = 7025, processed pseudogene n = 73, unprocessed pseudogene n = 72, lncRNA n = 168. (C) Cumulative plot showing maximum tail length detected in PABPN IP in the nucleus, separated by gene biotype, for genes that had at least 10 reads. PCG n = 7025, processed pseudogene n = 73, unprocessed pseudogene n = 72, lncRNA n = 168. (D) Cumulative plot showing maximum tail length detected in PABPN IP in the nucleus for protein coding genes that contain none, one, two or multiple introns, as well as processed pseudogenes, for genes that had at least 10 reads. PCG no introns n = 104, PCG 1 intron n = 140, PCG 2 introns n = 217, PCG 3+ introns n = 6563, processed pseudogene n = 72. (E) Cumulative plot showing median and maximum poly(A) tail length for each gene that had at least 10 reads in PABPC IP from the cytoplasm (n genes = 7736). (F) Cumulative plot showing median tail length detected in PABPC IP from the cytoplasm, separated by gene biotype, for genes that had at least 10 reads. PCG n = 7209, processed pseudogene n = 236, unprocessed pseudogene n = 52, lncRNA n = 233. (G) Cumulative plot showing maximum tail length detected in PABPC IP from the cytoplasm, separated by gene biotype, for genes that had at least 10 reads. PCG n = 7209, processed pseudogene n = 236, unprocessed pseudogene n = 52, lncRNA n = 233.
(H) Cumulative plot showing maximum tail length detected in PABPC IP from the cytoplasm for protein coding genes that contain none, one, two or multiple introns, as well as processed pseudogenes, for genes that had at least 10 reads. PCG no introns n = 134, PCG 1 intron n = 209, PCG 2 introns n = 308, PCG 3+ introns n = 6557, processed pseudogene n = 236. (I) Violin plots of individual genes and their poly(A) tail distributions when associated with PABPN in the nucleus and when associated with PABPC in the cytoplasm. RPS2 and RPL7A are among the most abundant genes in both PABPN IP from the nucleus and PABPC IP from the cytoplasm. CKB and HNRNPA2B1 are among the most abundant with PABPN in the nucleus. The number of reads for each violin is shown in black text at the base of the violin. White boxplots are inlaid within the violin, indicating the median (line) and upper and lower quartiles (box). The lines extending out from the central box indicate the minimum and maximum value in that dataset. (J) Median poly(A) tail length when transcripts were associated with PABPN compared to their degree of enrichment or depletion as determined by total cell lysates and comparison of PABPN IP to input condition. Genes had to have at least 10 poly(A) reads to be considered in this analysis. Blue line is a best fit line using a linear model. Data points in black are genes that are significantly enriched/depleted, determined as previously in Figure 1, using cut-offs of Padj ≤ 0.01, baseMean > 50 and log2FC of > 0.5 or < −0.5. (K) Median poly(A) tail length when transcripts were associated with PABPC compared to their degree of enrichment or depletion as determined by total cell lysates and comparison of PABPC IP to input condition. Cut-offs are the same as described in (J). (L) Density plots of the change in median poly(A) tail length when a transcript is with PABPN in the nucleus compared to PABPC in the cytoplasm, separated by gene biotype.
A negative change indicates that the poly(A) tail was shorter when associated with PABPC in the cytoplasm. Genes must have been represented by at least 35 reads in each IP to be displayed. Unprocessed pseudogenes did not have enough reads to pass cut-offs and are thus not displayed here. (M) Change in median poly(A) tail length of all protein coding genes that were displayed in blue in (L), compared to their maximum detected poly(A) tail length when associated with PABPN in the nucleus. Blue line is a best fit line using a linear model.
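The per-gene summaries used throughout Figure 5 (median and maximum tail length for genes with at least 10 reads, and the change in median for genes with at least 35 reads in each IP) can be sketched as below. The input layout of `(gene_id, tail_length)` pairs and the function names are assumptions made for illustration; the sign convention follows panel (L), where a negative change means a shorter tail with PABPC in the cytoplasm.

```python
from collections import defaultdict
from statistics import median


def summarize_tails(reads, min_reads=10):
    """Per-gene read count, median, and maximum poly(A) tail length.

    reads: iterable of (gene_id, tail_length_nt) pairs, one per read.
    Genes with fewer than min_reads reads are omitted.
    """
    tails = defaultdict(list)
    for gene, length in reads:
        tails[gene].append(length)
    return {
        gene: {"n": len(values), "median": median(values), "max": max(values)}
        for gene, values in tails.items()
        if len(values) >= min_reads
    }


def median_change(pabpn_reads, pabpc_reads, min_reads=35):
    """PABPC median minus PABPN median for genes passing the cut-off in both IPs."""
    nuclear = summarize_tails(pabpn_reads, min_reads)
    cytoplasmic = summarize_tails(pabpc_reads, min_reads)
    shared = nuclear.keys() & cytoplasmic.keys()
    return {g: cytoplasmic[g]["median"] - nuclear[g]["median"] for g in shared}


if __name__ == "__main__":
    # Hypothetical reads: one gene with 35 reads in each IP.
    pabpn = [("GENE1", 100)] * 35
    pabpc = [("GENE1", 60)] * 35
    print(median_change(pabpn, pabpc))
```

Requiring the read cut-off in both IPs before taking the difference avoids reporting a change for genes whose median in one condition rests on only a handful of reads.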