Widespread intronic polyadenylation inactivates tumour suppressor genes in leukaemia.

Shih-Han Lee¹, Irtisha Singh²,³, Sarah Tisdale¹, Omar Abdel-Wahab⁴, Christina S Leslie², Christine Mayr⁵.

Abstract

DNA mutations are known cancer drivers. Here we investigated whether mRNA events that are upregulated in cancer can functionally mimic the outcome of genetic alterations. RNA sequencing or 3'-end sequencing techniques were applied to normal and malignant B cells from 59 patients with chronic lymphocytic leukaemia (CLL)[1-3]. We discovered widespread upregulation of truncated mRNAs and proteins in primary CLL cells that were not generated by genetic alterations but instead occurred by intronic polyadenylation. Truncated mRNAs caused by intronic polyadenylation were recurrent (n = 330) and predominantly affected genes with tumour-suppressive functions. The truncated proteins generated by intronic polyadenylation often lack the tumour-suppressive functions of the corresponding full-length proteins (such as DICER and FOXN3), and several even acted in an oncogenic manner (such as CARD11, MGA and CHST11). In CLL, the inactivation of tumour-suppressor genes by aberrant mRNA processing is substantially more prevalent than the functional loss of such genes through genetic events. We further identified new candidate tumour-suppressor genes that are inactivated by intronic polyadenylation in leukaemia and by truncating DNA mutations in solid tumours[4,5]. These genes are understudied in cancer, as their overall mutation rates are lower than those of well-known tumour-suppressor genes. Our findings show the need to go beyond genomic analyses in cancer diagnostics, as mRNA events that are silent at the DNA level are widespread contributors to cancer pathogenesis through the inactivation of tumour-suppressor genes.

Substances:
RNA, Messenger

Year: 2018 PMID: 30150773 PMCID: PMC6527314 DOI: 10.1038/s41586-018-0465-8

Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962

In addition to DNA-based mutations, recent studies found that alterations in mRNA processing, including splicing, promote tumorigenesis[6]. In CLL, up to 25% of patients have mutations in ATM or SF3B1, but a third have fewer than two mutated driver genes and most patients (58%) only have a 13q deletion or a normal karyotype[3,7-9]. Here, we investigated whether intronic polyadenylation (IPA) might serve as a novel driver of tumorigenesis. As 16% of genes in normal immune cells use IPA to generate truncated mRNAs that contribute to transcriptome diversity[2], we hypothesized that cancer-specific IPA would generate truncated proteins that lack essential domains and thus may phenocopy truncating (TR) mutations (Fig. 1a).

Figure 1.

Hundreds of genes generate recurrent CLL-IPAs.

(a) Schematic showing full-length mRNA and protein expression in normal cells and generation of a truncated mRNA and protein through cancer-specific IPA, despite no difference in DNA sequence. Polyadenylation sites (pA) are shown in light green. Loss of essential protein domains (dark green boxes) through cancer-gained IPA may inactivate TSGs, thus contributing to cancer pathogenesis. (b) Representative CLL-IPAs (from N = 330) are shown. mRNA 3′ ends detected by 3′-seq are depicted as peaks whose height corresponds to transcript abundance shown in transcripts per million (TPM). The bottom panel shows RNA-seq reads and numbers correspond to read counts. MemB, memory B cells; NB, naïve B cells. Full-length and IPA-generated truncated proteins are depicted in grey, known domains are shown in green and the domains lost through IPA are named. CC, coiled-coil. For CLL-IPA, the number of retained and novel amino acids (aa) and aa of full-length proteins are given. (c) Representative RNA-seq tracks from two independent CLL data sets are shown as in (b); one is indicated by 'L'. Zoom-in shows the exonized part of intron 23 of DICER1 (green). (d) Difference in relative abundance (usage) of IPA isoforms between CLL and normal CD5+B cells. A GLM was used to identify significant events. CLL-IPAs with significantly higher usage are shown in red (FDR-adjusted P value < 0.1, usage difference ≥ 0.05, TPM in CD5+B < 8) and CD5+B-IPAs are shown in blue. Grey, IPAs present in CLL and CD5+B cells without significantly different usage. (e) Number of CLL-IPAs per sample is shown as box plots: horizontal line, median; box, 25th through 75th percentile; error bars, range. CLL high, N = 21/59, median of CLL-IPAs/sample = 98 vs CLL low, N = 38/59, median = 29. Two-sided Mann-Whitney test, ***, P = 6E-10.

Using 3′-seq of 44 samples, including normal B cells and CLL, we identified 5,587 IPA isoforms, including 3,484 without previous annotation (Extended Data Table 1 and methods)[1,2]. We validated 4,630 IPA isoforms using RNA-seq and additional 3′ end sequencing data (Extended Data Fig. 1a, 1b)[2,10]. To assess IPA usage in CLL, we first identified the normal B cell subset whose gene expression profile was most closely related to CLL cells. Lymphoid tissue-derived CD5+ B (CD5+B) cells were most similar (Extended Data Fig. 2), but clustered separately from CLL samples based on IPA site usage (Extended Data Fig. 1c). Using a generalized linear model (GLM), we identified 931 IPA events with significantly higher expression among 13 CLL samples, but low or absent expression in CD5+B cells (Fig. 1b, Extended Data Fig. 1d)[1,2]. As CLL-IPAs are detectable by RNA-seq, we used an unrelated RNA-seq data set to validate our CLL-IPA events (Fig. 1c)[3]. We verified up to 71% of testable IPAs by this independent method and data set (Extended Data Fig. 1d). For further analysis, we combined the data sets (N = 59 CLL samples) and focused only on CLL-IPAs present in more than 10% of the sample cohort resulting in 330 CLL-IPAs, derived from 306 genes (Fig. 1d, Supplementary Table S1). While CLL-IPAs were detected in all CLL samples, one third of samples had a significantly higher number of CLL-IPAs (Fig. 1e, Extended Data Fig. 1e).
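The selection criteria described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (FDR-adjusted P value < 0.1, usage difference ≥ 0.05, TPM in CD5+B < 8, present in more than 10% of the cohort), not the GLM or the pipeline used in the study. The record layout `IpaCall` and the function names are hypothetical, and the adjusted P values are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass

# Thresholds as stated in the Fig. 1d legend and the main text.
FDR_MAX = 0.1               # FDR-adjusted P value must be below this
MIN_USAGE_DIFF = 0.05       # usage in CLL minus usage in CD5+B
MAX_CD5B_TPM = 8.0          # isoform must be low or absent in CD5+B cells
MIN_COHORT_FRACTION = 0.10  # kept only if present in MORE than 10% of samples


@dataclass
class IpaCall:
    """One IPA isoform tested in one CLL sample (hypothetical layout)."""
    ipa_id: str
    sample: str
    padj: float        # assumed to come from an upstream GLM, not computed here
    usage_cll: float
    usage_cd5b: float
    tpm_cd5b: float


def is_cll_ipa(call):
    """Apply the three per-sample thresholds to a single call."""
    return (call.padj < FDR_MAX
            and call.usage_cll - call.usage_cd5b >= MIN_USAGE_DIFF
            and call.tpm_cd5b < MAX_CD5B_TPM)


def recurrent_cll_ipas(calls, n_samples):
    """Map each IPA passing the cohort filter to its number of samples."""
    samples_by_ipa = {}
    for call in calls:
        if is_cll_ipa(call):
            samples_by_ipa.setdefault(call.ipa_id, set()).add(call.sample)
    return {ipa: len(samples)
            for ipa, samples in samples_by_ipa.items()
            if len(samples) / n_samples > MIN_COHORT_FRACTION}
```

With N = 59 samples, the cohort filter in this sketch keeps an IPA only if it passes the per-sample thresholds in at least 6 samples, since 5/59 is below 10%.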

Extended Data Table 1.

Samples investigated by 3′-seq and RNA-seq.

(a) CLL sample characteristics. (b) Normal human immune cells investigated by 3′-seq. (c) Normal human immune cells investigated by RNA-seq.

a
	CLL low vs CLL high	Number of CLL-IPAs	Age at diagnosis	Rai stage at sample collection	WBC count at sample collection	IgVH status	Cytogenetics	Treated before sample collection	Treated after sample collection	Diagnosis to sample collection (time; mo)	Treatment-free survival (yr)	RNA-seq	3′-seq
CLL1	L	13	62	III	153	UN	Del 11q	N	Y	10	1	Y	Y
CLL2	L	7	62	III	300	NA	Del 17p	N	Y	112	9	Y	Y
CLL3	L	26	54	IV	139	NA	Tri12, t(14;19)	N	Y	84	8	Y	Y
CLL4	H	93	72	0	173	UN	Normal	N	Y	37	4	Y	Y
CLL5	L	11	55	III/IV	193	UN	Tri8, del 13q	N	Y	46	4.5	Y	Y
CLL6	L	28	39	I	137	MUT	Del 13q	N	Y	49	3.3	Y	Y
CLL7	H	108	54	IV	111	MUT	Del 13q	N	Y	108	8	Y	Y
CLL8	L	12	72	III	365	UN	Tri12	N	Y	109	9	N	Y
CLL9	L	5	63	III	200	UN	Del 13q, t(6;19)	N	Y	30	2	N	Y
CLL10	L	11	51	III	77	UN	Del 11q	N	Y	70	6	N	Y
CLL11	H	274	39	0	100	UN	Del 11q, 13q, 14q	N	Y	44	5.5	Y	Y
CLL12	H	42	49	II	178	NA	NA	N	N	240	23.3	Y	Y
CLL13	L	7	66	I	125	UN	Del 11q, del 13q	N	Y	5	0.5	N	Y
CLL14	H	160	45	NA	NA	NA	NA	N	NA	112	NA	Y	N
CLL15	L	49	NA	NA	NA	NA	NA	N	NA	NA	NA	Y	N

N, No; Y, Yes; NA, not analyzed

BM, bone marrow

Extended Data Figure 1.

Validation of IPA isoforms by independent methods and identification of CLL-IPAs used for further analysis.

(a) RNA-seq data were used to validate the presence of IPA isoforms using a GLM. Within two 100 nt windows (green bars) separated by 51 nt and located up- and downstream of the IPA peak the RNA-seq reads were counted. The IPA peak was considered validated if Padj < 0.1 (see methods). Out of N = 5,587 tested IPA isoforms, N = 1,662 were validated by this method. Shown is MGA as a representative example. (b) As only a fraction of IPA isoforms were validated by the method from (a), additional methods were used to obtain independent evidence for the presence of the IPA isoforms. Independent evidence was obtained using untemplated adenosines from RNA-seq data or through the presence of the IPA isoform in other 3′-seq protocols (10). As the majority of immune cell types used in this study have not been investigated using other 3′ end sequencing protocols and IPA isoform expression is cell type-specific (2), highly expressed IPA isoforms (>10 TPM) were not excluded from further analysis even if no read evidence was found by other protocols. (c) Hierarchical clustering based on IPA site usage separates the 3′-seq dataset into four groups. It separates CD5+B from CLL samples and clusters CLL samples into three different groups. Shown is the usage difference of the 20% most variable IPA isoforms across the data set (N = 342). Four of 13 CLL samples cluster away from the rest of the samples and are characterized by a high number of IPA isoforms (CLL high). (d) The GLM (FDR-adjusted P value < 0.1, IPA usage difference ≥ 0.05, IPA isoform expressed in CD5+B < 8 TPM) identified 477 recurrent (significantly upregulated in at least 2/13 CLL samples by 3′-seq) and 454 non-recurrent (significantly upregulated in 1/13 CLL samples by 3′-seq). IPAs were validated in an independent RNA-seq data set containing 46 new CLL samples. Among the recurrent IPAs, 71% of testable IPAs were verified using another GLM (see a). Among the non-recurrent IPAs, 64% of testable IPAs were verified. 
(e) Plotting the number of CLL-IPAs per sample separates the CLL samples investigated by 3′-seq into two groups: 4/13 samples generate a high number of CLL-IPAs (CLL high, median of CLL-IPAs/sample, N = 100, range, 42 – 274), whereas the rest of the samples generate lower numbers (CLL low, median, N = 9, range, 5 – 28). Center bar shows median and the error bars show the interquartile range. Two-sided Mann-Whitney test, **, P = 0.003.
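The window layout in panel (a) can be made concrete with a short sketch. The legend gives the window size (100 nt) and the separation (51 nt) but not the exact placement, so the code below assumes that the 51-nt gap is centred on the IPA peak and that coordinates are 1-based and inclusive. The function names are hypothetical, and the GLM that compares the two counts is not reproduced here.

```python
WINDOW_NT = 100  # size of each counting window (from the legend)
GAP_NT = 51      # separation between the two windows (from the legend)


def flanking_windows(peak, strand="+"):
    """Return (upstream, downstream) windows as 1-based inclusive (start, end).

    Assumption: the 51-nt gap is centred on the peak position, leaving
    25 nt on each side of the peak uncounted.
    """
    half_gap = GAP_NT // 2  # 25
    left = (peak - half_gap - WINDOW_NT, peak - half_gap - 1)
    right = (peak + half_gap + 1, peak + half_gap + WINDOW_NT)
    if strand == "+":
        return left, right
    if strand == "-":
        # On the minus strand, "upstream" has the higher genomic coordinates.
        return right, left
    raise ValueError("strand must be '+' or '-'")


def count_reads(read_positions, window):
    """Count reads whose position falls inside an inclusive window."""
    start, end = window
    return sum(1 for pos in read_positions if start <= pos <= end)
```

For a plus-strand peak at position 1,000, this layout gives windows 875 to 974 and 1,026 to 1,125, with the 51 nt from 975 to 1,025 left out.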

Extended Data Figure 2.

The normal B cell counterpart of CLL cells are CD5+B cells derived from lymphoid tissue.

(a) Hierarchical clustering of normal human B cells (naive B (NB), memory B (MemB) and CD5+B) derived from lymphoid tissues or peripheral blood based on mRNA expression obtained from RNA-seq. The heatmap shows the 20% most variable genes across the data set (N = 1,887). The gene expression profiles of B cell subsets derived from peripheral blood or lymphoid tissue differ substantially, although the same markers were used for purification. (b) As in (a), but RNA-seq data from CLL samples were added to the analysis. The heatmap shows the 20% most variable genes across the data set (N = 2,078). CLL samples cluster with tissue-derived and not with blood-derived normal immune cells. (c) Number of all differentially expressed genes from the analysis shown in (b).

To investigate whether CLL-IPAs express truncated proteins, we performed western blots on 13 candidates. Whereas normal B cells only expressed the full-length proteins, the malignant B cells also expressed truncated proteins whose size was consistent with the predicted size of IPA-generated proteins (Fig. 2a, Extended Data Fig. 3 and 4).

Figure 2.

IPA-generated truncated proteins resemble the protein products of truncating DNA mutations and have cancer-promoting properties.

(a) 3′-seq and RNA-seq data of functionally validated CLL-IPAs (N = 5) as in Fig. 1b. The remaining tracks are shown in Extended Data Fig. 3. Endogenous full-length proteins are detected by western blot analysis in CLL and normal B cells (BLCL), whereas IPA-generated truncated proteins (red arrows) are only present in primary CLL cells. ACTIN was used as loading control on the same blot. The experiment was replicated with similar results (CARD11, N = 4, DICER, N = 3, MGA, N = 2). For gel source data see Supplementary Fig. 1. *, indicates an unspecific band. (b) Protein models are shown as in Fig. 1b. The aa positions of recurrent TR mutations are shown in blue. (c) Endogenous phospho-NF-κB-p65 levels are shown as normalized mean fluorescent intensity (MFI) values as mean ± SD from N = 5 (shRNA Control (Co), shRNA1 full-length CARD11 (FL) or N = 6 (shRNA2/3 CARD11 IPA, N = 3, each) biologically independent experiments. Two-sided Kruskal-Wallis test, **, P = 0.002; P value of two-sided Mann-Whitney test was adjusted for multiple testing, *, Padj = 0.036. (d) miRNA cleavage assay, performed twice with similar results, showing processing of pre-let-7i into mature let-7i by V5-DICER. Mock, no protein was added. V5-DICER IPA shows a complete loss of function, but no dominant-negative activity. (e) qRT-PCR of endogenous MYC target genes after expression of full-length or MGA IPA in Raji cells. Shown are GAPDH-normalized values as mean ± SD from three biological replicates, each performed in technical triplicates. Two-sided t-test for independent samples, *, P < 0.05, **, P < E-3, NS, not significant. Exact P values are shown in Supplementary Fig. 1. MGA represses all MYC target genes. Binding sites, BS.

Extended Data Figure 3.

3′-seq and RNA-seq tracks of functionally validated CLL-IPAs.

Five CLL-IPAs were functionally validated. Their 3′-seq and RNA-seq tracks are shown here and in Fig. 2a. Data are shown as in Fig. 1b. The corresponding RT-PCRs are shown in Extended Data Fig. 5a.

Extended Data Figure 4.

CLL-IPAs generate truncated mRNAs and proteins.

Gene models and western blots of 10 candidates depicted as in Figures 1b and 2a show that CLL B cells generate full-length and IPA-generated truncated proteins. BLCL were used as control B cells and were included in the 3′-seq tracks. ACTIN was used as loading control on the same blots. For gel source data see Supplementary Fig. 1.

To rule out that proteolytic cleavage truncates the proteins, we validated the presence of the IPA-generated truncated mRNAs (Extended Data Fig. 5a). Moreover, we were able to induce IPA isoform expression through downregulation of splicing factors or through inhibition of 5′ splice site recognition using an antisense oligonucleotide, indicating that de-regulated mRNA processing can cause expression of a truncated protein (Extended Data Fig. 5b)[11,12].

Extended Data Figure 5.

Validation of the IPA-generated truncated mRNAs and validation of their stable expression over time.

(a) Detection of full-length and IPA-generated truncated mRNAs by RT-PCR in normal B cells (CD5+B, BLCL) and CLL cells used in the western blot validations shown in Fig. 2a and Extended Data Fig. 4. All experiments were performed twice with similar results. Primers to amplify the mRNA isoforms are located in the first and last exons shown in the gene models and are listed in Supplementary Table S3. HPRT was used as loading control. (b) Induction of truncated mRNAs and proteins through shRNA-mediated knock-down of splicing factors. All experiments were performed twice with similar results. U2AF1 was knocked-down in HeLa cells, U2AF2 was knocked-down in HEK293 cells and hnRNPC was knocked down in A549 cells. Shown as in (a), except for NUP96 which is shown as in Extended Data Fig. 4. NUP96 is derived from NUP98 precursor. Induction of DICER1 IPA by transfection of increasing amounts of anti-sense morpholinos (MO) directed against the 5′ splice site of intron 23 of DICER1 in HeLa cells. Shown are RT-PCRs. (c) RT-PCRs, performed once, on expression of full-length (FL) and IPA isoforms for eight CLL-IPAs in samples from two CLL patients and control B cells (CD5+B, BLCL). The samples were collected over a time interval of over 6 years. CLL11: T1, 17 months (mo) after diagnosis, T2, 24 mo, T3, 44 mo; CLL6: T1, 16 mo, T2, 49 mo, T3, 91 mo (42 mo after treatment). Samples from all time points (except CLL6, T3) were obtained from untreated patients. The primers for amplifications of the products were located in the first and last exons shown in the gene models and are listed in Supplementary Table S3. Expression of HPRT serves as loading control. The same gel picture of HPRT is shown in Fig. 3b for CLL samples and Extended Data Fig. 5a, far right panel, for BLCL and CD5+ control samples. All tested CLL-IPA isoforms were detectable at several time points during the course of the disease. 
Compared with CD5+B cells, expression of FCHSD2 IPA was not significantly upregulated in CLL. (d) Western blots of full-length and IPA-generated truncated proteins from CARD11, DICER, and SCAF4. All experiments were performed twice with similar results. ACTIN was used as loading control on the same blot. Shown are samples from normal B cells (BLCL) and two CLL patients, both at two different time points 0.5 – 10 months apart. For gel source data see Supplementary Fig. 1.

Many of the truncated proteins generated by CLL-IPAs are strikingly similar to the predicted protein products produced by TR mutations, suggesting that CLL-IPAs may functionally mimic the outcome of genetic mutations (Fig. 2b, Extended Data Fig. 6a). To test this, we investigated the functional consequences of expression of IPA and full-length protein isoforms of four candidates in malignant B cells. CARD11 is a positive regulator of the NF-κB pathway and is important for lymphocyte survival and proliferation[13]. We observed substantial CARD11 IPA protein expression compared to only slightly increased CARD11 IPA mRNA expression, indicating that the truncated protein is more stable and may activate the NF-κB signaling pathway more potently than the full-length protein (Fig. 2a)[14]. To test this, we exclusively knocked-down either full-length or CARD11 IPA in a malignant B cell line that expresses comparable CARD11 IPA levels as CLL cells (Extended Data Fig. 6b, 6c). We measured phospho-p65 (RELA) to assess NF-κB activity and found significantly lower activity after knock-down of CARD11 IPA than of the full-length protein (Fig. 2c, Extended Data Fig. 6d). Thus, CARD11 IPA activates NF-κB more potently than full-length CARD11, suggesting that it may mimic activating mutations present in high-grade lymphomas[13]. CARD11 IPA may contribute to NF-κB activation in CLL where the signaling components are rarely mutated[15].

Extended Data Figure 6.

IPA-generated truncated proteins resemble the protein products of truncating DNA mutations and have cancer-promoting properties.

(a) CARD11 IPA results in translation of intronic nucleotides (grey) until an in-frame stop codon is encountered. This results in the generation of 16 new amino acids (grey) downstream of exon 10. In the case of MGA IPA three new amino acids downstream of exon 9 are generated. (b) Western blot showing that TMD8 cells express similar amounts of CARD11 IPA as CLL samples. The western blot is depicted as in Fig. 2a and was performed twice. ACTIN was used as loading control on the same blot. (c) Western blot (as in b) showing full-length CARD11 as well as CARD11 IPA in TMD8 cells expressing a control shRNA (Co), an shRNA that exclusively knocks-down the full-length protein (FL) and two different shRNAs that exclusively knock-down the CARD11 IPA isoform (IPA). The experiment was performed twice with similar results. GAPDH was used as loading control on the same blot. (d) Endogenous phospho-NF-κB-p65 levels were measured by FACS in TMD8 cells expressing the indicated shRNAs from (c). Mean fluorescent intensity (MFI) values are shown in parentheses in FACS plots of a representative experiment out of three. (e) Immunoprecipitation of V5-DICER or V5-DICER IPA from HEK293T cells using an anti-V5 antibody. The experiment was performed twice with similar results. 2.5% of input was loaded. (f) The extent of miRNA processing depends on the expression levels of full-length (FL) DICER, but not IPA. Shown are wild-type (WT) and DICER knock-out (KO) HCT116 cells. Re-expression of different amounts of FL DICER1 protein in the KO cells (measured by western blot of DICER1 in the top panel) results in different levels of endogenous let-7 expression (measured by northern blot in the bottom panel; compare lanes 3 and 4). Expression of DICER IPA has no influence on miRNA processing (compare lanes 4 and 5). ACTIN and U6 were used as loading controls on the same blots, respectively. The experiment was performed twice with similar results. (g) Western blot of MGA. 
MGA and MGA IPA were cloned and expressed in HEK293T cells to confirm the predicted protein size. The experiment was performed twice with similar results. Shown is also the endogenous MGA expression in Raji cells. ACTIN was used as loading control on the same blot. *, denotes an unspecific band. (h) Protein models of full-length and FOXN3 IPA are shown as in Fig. 2b. The IPA-generated protein truncates the fork-head domain and is predicted to lose the repressive activity. (i) As in (a), but for FOXN3. FOXN3 IPA generates 32 new amino acids downstream of exon 2. (j) FOXN3 IPA significantly de-represses expression of the oncogenic targets MYC and PIM2. Fold-change in mRNA level of endogenous genes in MEC1 B cells after transfection of GFP-FOXN3 IPA compared with transfection of full-length GFP-FOXN3. HPRT-normalized values are shown as boxplots (as in Fig. 1e) from N = 5 biologically independent experiments, each performed in technical triplicates. Two-sided t-test for independent samples was applied, **, P = 0.002. For gel source data see Supplementary Fig. 1.

DICER IPA generates a truncated protein that partially lacks the RNase IIIB domain responsible for microRNA (miRNA) processing (Fig. 2b)[16]. In contrast to full-length DICER, DICER IPA entirely lacks miRNA cleavage ability and mimicked TR mutations that remove both RNase III domains (Fig. 2b, 2d, Extended Data Fig. 6e, 6f)[16]. Although DICER IPA does not act as dominant-negative, its expression reduces functional DICER protein, thus potentially decreasing endogenous miRNA expression. The TSG MGA is targeted by TR mutations in CLL and solid cancers (Fig. 2b)[3,7,17]. MGA negatively regulates the MYC transcriptional program and represses genes with MYC and E2F binding sites in a Polycomb-dependent manner[18,19]. Expression of MGA from constructs validated MGA IPA detected in CLL cells and confirmed the repressive effect of MGA on MYC target gene expression in malignant B cells (Fig. 2e, Extended Data Fig. 6g). Intriguingly, on genes with binding sites for both MYC and E2F, MGA IPA acts as dominant-negative regulator of full-length MGA as it significantly induced the expression of 5/6 genes in cells that endogenously express full-length MGA (Fig. 2e). However, as MGA IPA retains the N-terminal T-box, it still acts as a repressor on T-box target genes (Fig. 2e). Lastly, the IPA isoform of the transcriptional repressor FOXN3[20] derepressed its oncogenic targets MYC and PIM2 (Extended Data Fig. 3, 6h–j). In summary, the CLL-IPA-generated proteins can contribute to cancer pathogenesis in various ways. Their generation can reduce expression of functional TSGs (DICER IPA, FOXN3 IPA) or they behave as dominant-negatives, thus acting in an oncogenic manner (MGA IPA). As all functionally validated CLL-IPAs produced dysfunctional proteins, we investigated if this is a general feature. We compared the retained fraction of amino acids of IPA isoforms present in normal B cells (B-IPA, N = 2,690) with CLL-IPAs. 
Although the protein size of full-length proteins targeted by IPA was similar, CLL-IPAs lose significantly more amino acids than B-IPAs (Fig. 3a, Extended Data Fig. 7a). This suggests that IPA in normal cells contributes to proteome diversity[2], whereas CLL-IPAs tend to produce dysfunctional proteins.
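The comparison above rests on a simple per-isoform quantity, the fraction of the full-length coding region that the IPA isoform retains. A minimal sketch follows, assuming that only retained amino acids are counted and that novel intron-encoded residues are excluded. The function names and the input layout are hypothetical, and the Mann-Whitney test used in the study is not reproduced.

```python
from statistics import median


def retained_fraction(retained_aa, full_length_aa):
    """Fraction of the full-length protein kept by an IPA isoform.

    Assumption: novel amino acids translated from intronic sequence
    are not counted as retained.
    """
    if full_length_aa <= 0:
        raise ValueError("full-length protein size must be positive")
    if not 0 <= retained_aa <= full_length_aa:
        raise ValueError("retained aa must lie between 0 and full length")
    return retained_aa / full_length_aa


def median_retained_fraction(isoforms):
    """Median retained fraction over (retained_aa, full_length_aa) pairs."""
    return median(retained_fraction(r, f) for r, f in isoforms)
```

Computing `median_retained_fraction` separately for the CLL-IPA and B-IPA groups would give the kind of group summaries reported in Fig. 3a (medians of 0.21 and 0.45).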

Figure 3.

TSGs are enriched among CLL-IPAs. CLL-IPAs and TR mutations in CLL target the same genes but in different patients.

(a) The fraction of retained coding region (CDR) is shown for genes that generate CLL-IPAs (N = 306, median fraction of retained CDR = 0.21; 112 aa) and B-IPAs (N = 2,690, median fraction of retained CDR = 0.45; 221 aa). ***, Two-sided Mann-Whitney test, P = E-16. Box plots as in Fig. 1e. (b) RT-PCRs on expression of full-length (FL) and IPA isoforms for two TSGs (DICER1, NUP98) in samples from two CLL patients that were collected over a time interval of several years. CLL11: T1, 17 months (mo) after diagnosis, T2, 24 mo, T3, 44 mo; CLL6: T1, 16 mo, T2, 49 mo, T3, 91 mo (42 mo after treatment). Shown are the exons that contain primers for amplifications of the products. BLCL serve as control cells. Expression of HPRT serves as loading control. (c) Genes that are targeted by TR mutations in CLL and CLL-IPAs are shown (N = 36). Dark green bars indicate the fraction of retained CDR for each IPA-generated protein. Black dots indicate the positions of TR mutations in CLL. CLL-IPAs occur mostly in the vicinity of TR mutations or upstream of them (two-sided Wilcoxon rank sum test, P = 0.004). Right panel, the fraction of CLL samples affected is shown for each gene and represents the fraction of CLL samples (out of 59) with significantly upregulated expression of the IPA isoform (CLL-IPA, grey; TR mutations, red).

Extended Data Figure 7.

Inactivation of TSGs by CLL-IPAs independently of DNA mutations.

(a) The distribution of full-length protein size of genes that generate CLL-IPAs (N = 306) and B-IPAs (N = 2,690) is shown in amino acids (aa). Boxplots as in Fig. 1e. Two-sided Mann-Whitney test, P = 0.87. (b) TR rate (ratio of TR mutations over all mutations) is shown for known TSGs obtained from (5). Boxplots as in Fig. 1e. Two-sided Mann-Whitney test, P = E-155. (c) Known TSGs, obtained from (5) that are targeted by CLL-IPAs (N = 21) are shown. Dark green bars indicate the fraction of retained CDR for each IPA-generated protein. Black dots indicate the hot spot positions of TR mutations obtained from MSK cbio portal. CLL-IPAs mostly occur upstream or within 10% (of overall aa length) of the mutations (two-sided Wilcoxon rank test, P = 0.04). (d) Contingency table for enrichment of TSGs among genes that generate CLL-IPAs. P value was obtained from two-sided Fisher’s exact test. TSGs were obtained from (5). (e) TSGs and genes that generate CLL-IPA isoforms have longer CDRs than genes that do not generate IPA isoforms. Boxplots as in Fig. 1e. Two-sided Kruskal-Wallis test, P = E-80. (f) Five control gene lists (N = 306, each) with a similar size distribution as CLL-IPAs and expressed in CLL were tested for enrichment of TSGs. Shown is the number of TSGs found. Chi-square-test did not show a significant enrichment of TSGs among the control genes. (g) Contingency table for enrichment of TR mutation genes in CLL among genes that generate CLL-IPAs. P value was obtained from two-sided Fisher’s exact test. (h) ZMYM5 is truncated by a TR mutation and an IPA isoform in the same patient, but the aberrations are predicted to result in different truncated proteins. A 10 bp deletion in exon 3 results in a frame-shift leading to the generation of a truncated ZMYM5 protein, whereas ZMYM5 IPA (not yet annotated) produces a truncated protein containing 352 more amino acids in the same patient. The genes shown in Extended Data Fig. 7h and 7i are the only genes with simultaneous presence of a TR mutation and CLL-IPA out of N = 268 tested. The position of the TR mutation is indicated in green. CLL7 and CLL11 3′-seq and RNA-seq tracks are shown for comparison reasons. (i) MGA is truncated by a TR mutation and an IPA isoform in the same patient. The TR mutation affects the 5′ splice site of intron 7, thus generating two additional amino acids downstream of exon 7, whereas the IPA isoform encodes a truncated MGA protein containing three more amino acids downstream of exon 9. Mutation and 3′-seq analysis were performed once. CLL7 and CLL11 are shown for comparison reasons. (j) Shown are additional recurrent (N > 1) DNA mutations found by exome sequencing of CLL patient samples stratified by a high or low number of CLL-IPAs per patient. Only the top and bottom 16 samples with high or low CLL-IPAs are shown to normalize the number of samples analyzed. This analysis is only descriptive and no test was performed. (k) Significant enrichment of SF3B1 mutations in the group of CLL samples with abundant CLL-IPA isoforms. Two-sided Mann-Whitney test was performed. (l) Abundance of CLL-IPAs is not associated with IGVH mutational status. Shown is the number of CLL-IPAs per sample for patients with mutated (MUT, N = 30) or unmutated (UN, N = 21) IGVH genes. Boxplots as in Fig. 1e. Two-sided Mann-Whitney test, P = 0.4.

As genes targeted by TR mutations often are TSGs (Extended Data Fig. 7b)[5], we investigated if TSGs are overrepresented among CLL-IPAs. Compared to protein-size matched control groups, there was a significant enrichment of TSGs among CLL-IPAs (P = 3E-5; Extended Data Fig. 7c–f). Importantly, IPA-generated truncated proteins usually lack either more or a comparable number of amino acids compared to truncated proteins generated by TR mutations, suggesting the IPA isoforms are likely inactive (Extended Data Fig. 7c). However, for CLL-IPAs to inactivate TSGs, they must also be stably expressed. For 11/12 tested CLL-IPAs, we observed stable expression at the mRNA or protein level over a four year time span (Fig. 3b, Extended Data Fig. 5c, 5d), indicating that they have the potential to inactivate TSGs. In addition to TSGs in general, we found that genes inactivated by TR mutations in CLL are enriched among CLL-IPAs (Fig. 3c, Extended Data Fig. 7g)[3,7,8]. Strikingly, the fraction of samples affected by CLL-IPA was substantially larger than the number of CLL samples affected by TR mutations (3.0–85% vs 0.13–2.0%; Fig. 3c, right panel). This indicates that TR mutations and CLL-IPAs target the same genes in different patient groups, thus substantially expanding the proportion of patients with protein truncations in potential drivers. To rule out that CLL-IPAs are caused by somatic mutations, we examined the presence of DNA mutations in the CLL-IPA genes. Two genes were targeted by TR mutations and IPA in the same patient. Interestingly, the two inactivation mechanisms are predicted to generate different truncated protein products, suggesting that they occurred independently (Extended Data Fig. 7h, 7i)[3]. The mutation data also enabled us to associate CLL-IPAs with specific somatic mutations. Interestingly, CLL samples with a high number of IPA were enriched in SF3B1 mutations, but they were independent of IGVH mutational status (Extended Data Fig. 7j–l). 
Because of the enrichment of known TSGs among CLL-IPAs, we examined if CLL-IPAs may enable us to identify novel TSGs. We selected CLL-IPAs present in at least 20% of CLL samples (N = 199, generated from 190 genes; Fig. 4a, Supplementary Tables S1, S2). We next investigated if these genes are inactivated by TR mutations in solid cancers using mutations from more than 86,000 tumors, compiled by the MSK cbio portal[4]. We observed that 72% of these genes are frequently affected by TR mutations in solid tumors and call them novel TSG candidates (136/190; Fig. 4b). This is a significant enrichment over background and this list contains 17 known TSGs and 119 novel TSG candidates (Extended Data Fig. 8a, 8b)[5]. Again, CLL-IPAs lack more or a comparable number of amino acids as the proteins produced by TR mutations, suggesting that CLL-IPAs inactivate the functions of these genes (Extended Data Fig. 8a).
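The classification step described above can be illustrated with a short sketch. The TR mutation rate is the number of TR mutations divided by all mutations in a gene, and the Fig. 4b legend places the cut between the two gene groups at 0.12. Whether a gene exactly at the cutoff counts as high is not stated, so the strict comparison below is an assumption. The function names and the input layout are hypothetical, and genes without any recorded mutation are simply skipped.

```python
TR_RATE_CUTOFF = 0.12  # local minimum of the bimodal distribution (Fig. 4b legend)


def tr_rate(tr_mutations, all_mutations):
    """TR mutation rate, or None when the gene has no recorded mutations."""
    if all_mutations == 0:
        return None
    return tr_mutations / all_mutations


def split_by_tr_rate(counts):
    """Split genes into (high, low) TR-rate groups.

    `counts` maps a gene name to (tr_mutations, all_mutations).
    Assumption: a rate strictly above the cutoff counts as high.
    """
    high, low = [], []
    for gene, (tr, total) in counts.items():
        rate = tr_rate(tr, total)
        if rate is None:
            continue  # no mutations recorded, so the rate is undefined
        (high if rate > TR_RATE_CUTOFF else low).append(gene)
    return high, low
```

As a check on the reported proportion, 136 of 190 genes is 71.6%, which rounds to the 72% given in the text.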

Figure 4.

Novel TSG candidates are inactivated in CLL at the mRNA level and in solid tumors at the DNA level.

(a) Color-coded IPA usage for a subset of CLL-IPAs (97/199 of samples with significant expression of IPA in ≥ 20% of CLL samples). Gene names and number of affected CLL samples per CLL-IPA is indicated (blue bars, 3′-seq, green bars, RNA-seq). (b) Truncating mutation rates (number of TR mutations/all mutations) in solid tumors, obtained from the MSK cbio portal for genes that generate abundant CLL-IPAs, partially shown in (a). The bimodal distribution was separated at the local minimum (TR mutation/all mutations = 0.12, red line) into two gene groups: those rarely targeted by TR mutations and those with high TR mutation rates in solid cancers, defined as novel TSG candidates. (c) TR mutation rates of known and novel TSG candidates. Two-sided Mann-Whitney test, **, P = 0.0002. Box plots as in Fig. 1e. (d) As in (c), but for overall mutation rates. Two-sided Mann-Whitney test, ***, P = E-10. (e) CHST11 protein models as in Fig. 2b. Loops depict membrane domains. A chromosomal translocation in CLL results in fusion of the immunoglobulin heavy chain locus (IGH) with a truncated CHST11 (23). (f) Western blot of WNT5B, performed once, shown as in Fig. 2a, from cell lysates or conditioned media (CM) of B cells stably expressing GFP, GFP-CHST11 or GFP-CHST11 IPA. CM from cells expressing CHST11 IPA contains unglycosylated WNT5B (25). (g) CM from samples described in (f) was added to HEK293T cells expressing a WNT reporter. Shown is normalized luciferase activity as mean ± SD from N = 7 biologically independent experiments. Two-sided Kruskal-Wallis test: **, P = 0.002; P value of two-sided Mann-Whitney test was adjusted for multiple testing, **, Padj = 0.002.

Extended Data Figure 8.

Novel TSG candidates and validation of CHST11 IPA as cancer-promoting isoform.

(a) As in Fig. 3c, but shown are known (red gene names) and novel TSG candidates (black gene names) among the abundant CLL-IPAs. CLL-IPAs seem to inactivate these genes as they mostly occur upstream or within 10% (of overall aa length) of the mutations. Two-sided Wilcoxon rank sum test performed on all 136 TSGs, P = E-8; two-sided Wilcoxon rank sum test performed on the novel TSGs, N = 119, P = E-8. Position of TR mutation was determined using the data obtained from the MSK cbio portal and indicates the hot spot mutation. Right panel, the fraction of CLL samples affected represents the fraction of CLL samples (out of 59) with significant expression of the IPA isoform. Genes were included if they were affected in at least 20% of samples investigated either by 3′-seq or RNA-seq. (b) Contingency table for enrichment of novel TSGs among highly recurrent CLL-IPAs. P value was obtained from two-sided Fisher’s exact test. (c) TSGs have larger protein sizes. Boxplots as in Fig. 1e. Two-sided Mann-Whitney test, **, P = 0.005. The increased overall mutation rate of known TSGs correlates with larger protein size. Spearman′s correlation coefficient, r = 0.74, P = E-6. (d) CHST11 IPA generates 18 new amino acids (grey) downstream of exon 1. (e) Experimental set-up to measure paracrine WNT activity produced by MEC1 B cells either expressing GFP, GFP-CHST11 or GFP-CHST11 IPA and using a WNT reporter expressed in HEK293T cells. Primary CLL cells and the CLL cell line MEC1 express several WNTs, including WNT5B. In the presence of CHST11 WNT (red dots) binds to sulfated proteins on the surface of WNT producing cells, whereas WNT is secreted into the media in the presence of CHST11 IPA. WNT-conditioned media activates a WNT reporter in HEK293T cells. This set-up refers to Fig. 4f and 4g. (f) Western blot, performed once, for WNT5 shown as in Fig. 4f, but including HeLa cells as positive control for WNT5 expression. ACTIN was used as loading control on the same blot.

Although the TR mutation rates of the novel TSG candidates were comparable with known TSGs found at the lower end of the spectrum, their protein size and overall mutation rates were substantially lower (Fig. 4c, 4d, Extended Data Fig. 8c). This may explain why these potentially cancer-relevant genes have been overlooked thus far[21]. As they are targeted at the mRNA level in leukemia and at the DNA level in solid cancers, they should be considered as a novel class of TSG candidates. To support this, we functionally validated a highly-recurrent CLL-IPA isoform that affected a poorly known cancer gene. CHST11 encodes a Golgi-associated carbohydrate sulfotransferase that modifies chondroitin on the surface of WNT expressing cells. The modification results in binding of secreted WNT and prevents its paracrine action[22]. CHST11 IPA lacks catalytic activity, but retains the cytoplasmic tail (Fig. 4e, Extended Data Fig. 8d) [23]. As exclusive expression of the cytoplasmic tail of Golgi enzymes inhibited localization of full-length enzymes[24], we hypothesized that CHST11 IPA may act as a dominant-negative. We expressed CHST11 and CHST11 IPA, collected the conditioned media, and detected secreted WNT in media only after expressing CHST11 IPA (Fig. 4f, Extended Data Fig. 8e, 8f)[25]. The conditioned media activated a WNT reporter in HEK293T cells (Fig. 4g), demonstrating that CHST11 IPA enabled paracrine WNT action on neighboring cells through dominant-negative action. Thus, in addition to mutations in the WNT pathway[26], CLL-IPAs may also contribute to WNT activation in CLL. A member of this novel class of TSGs was recently found in breast cancers, where tumor-specific expression of MAGI3 IPA generates a truncated protein with dominant-negative activity (Extended Data Fig. 9a)[27]. Combined with our findings on T-ALL (T-lineage acute lymphoblastic leukemia), where we detected more than 100 IPA isoforms (Extended Data Fig. 9b), these data indicate that cancer-upregulated IPA isoforms are not restricted to CLL.

Extended Data Figure 9.

Cancer-upregulated IPA isoforms are also detected in breast cancer and T-ALL.

(a) MAGI3 is a TSG that is preferentially targeted by IPA in breast cancer (27). Shown is the mutation profile obtained from MSK cbio portal. (b) Expression of IPA isoforms in T-ALL detected by RNA-seq. Shown are 3′-seq and RNA-seq tracks of a representative mRNA (out of N = 101) from CLL samples, T-ALL samples and normal thymus. The T-ALL RNA-seq data were obtained from (32). We detected N = 381 IPA isoforms in at least one T-ALL sample, N = 133 in at least one thymus sample, N = 104 in at least one T-ALL and one thymus sample, and N = 101 in at least two T-ALL samples, but not in any of the thymus samples.

In summary, we found that TSGs can be inactivated, either fully or partially, by IPA. Even partial loss of TSG function was shown to critically contribute to tumorigenesis[28]. As CLL-IPAs are not generated by DNA mutations in their corresponding transcription units, DNA and mRNA alterations occur in different patient groups. In CLL, the fraction of patients whose TSGs are inactivated by CLL-IPAs is considerably larger than the fraction with TSG disruption by TR mutations (Fig. 3c); thus, CLL-IPAs substantially expand the number of patients with affected drivers. Moreover, these data identify a class of TSGs that are predominantly inactivated at the mRNA rather than the DNA level[27]. Thus, our study demonstrates that cancer-gained changes in mRNA processing can functionally mimic the effects of somatic mutations and shows the need to go beyond genomic analyses in cancer diagnostics.

Methods

Samples for 3′-seq and RNA-seq analyses

Samples were obtained from untreated CLL patients seen at Memorial Sloan Kettering Cancer Center, New York (Extended Data Table 1a). All patients provided written informed consent before participating in the study. The sample collection was approved by the Institutional Review Board of Memorial Sloan Kettering Cancer Center. Peripheral blood mononuclear cells (PBMCs) from CLL samples with a minimum white blood cell count of 75,000/μl were isolated by Ficoll (GE Healthcare) gradient centrifugation at 400 rcf for 30 min, followed by two washes in PBS at room temperature. Cells were treated with red blood cell lysis buffer (155 mM NH4Cl, 12 mM NaHCO3, 0.1 mM EDTA) for 5 min at room temperature and were washed twice with PBS. Pure CLL B cells were obtained from PBMCs using B-CLL isolation kit (Miltenyi Biotec). This selected untouched CLL cells using a cocktail of magnetic beads coated with CD2, CD3, CD4, CD14, CD15, CD16, CD56, CD61, CD235a, FcεRI, and CD34. The purity of CLL B cells (CD5+ and CD19+) was analyzed by FACS and the cells were immediately dissolved in TRI Reagent (Ambion) for RNA extraction, followed by 3′-seq or RNA-seq library preparation. For longitudinal analyses, samples from two patients were investigated at different time points during the course of the disease. CLL11: time point 1 (T1), 17 months (mo) after diagnosis; T2, 24 mo after diagnosis; T3, 44 mo after diagnosis. The patient was not treated with chemotherapy during the sample collection period. CLL6: T1, 16 mo after diagnosis; T2, 49 mo; T3, 91 mo (42 mo after chemotherapeutic treatment). In addition to the newly generated CLL 3′-seq data, we also used 3′-seq data from normal tissues, cell lines and immune cell subsets that were previously generated by us (Extended Data Table 1b)[1,2]. We performed RNA-seq on 11 CLL samples (Supplementary Table S1) and obtained access to a previously published RNA-seq data set from 44 CLL patients[3], which was kindly provided by Dr. Dan A. Landau (NY Genome Center). RNA-seq data from normal immune cells were obtained from samples previously generated by us (Extended Data Table 1c)[2]. For validation of 3′-seq data, we also used publicly available RNA-seq data (tonsil-derived NB, GSE45982 (GSM1129340-GSM1129347)[29]; blood-derived NB, ERR431624, ERR431586[30]; CD3+ T cells, GSM1576415[31]) and 3′-end sequencing data[10]. For RNA-seq based identification of IPA isoforms expressed in T-ALL we used publicly available RNA-seq data from 10 primary T-ALL samples and 2 whole human thymus extracts (GSE57982)[32].

FACS sorting of immune cell populations

Cells were washed with ice-cold PBS once, incubated with appropriate fluorochrome-conjugated antibodies for 30 min at 4°C and washed twice with ice-cold PBS containing 0.5% FCS. The following antibodies were used: anti-CD3-PE (mouse, BD Biosciences, 555333), anti-CD5-FITC (mouse, BD Biosciences, 555352), anti-CD14-PECy7 (mouse, ebioscience, 25–0149-42), anti-CD19-APC (mouse, BD Biosciences, 555415), anti-CD27-PE (mouse BD Biosciences, 555441), anti-CD38-APC (mouse, BD Biosciences, 555462), anti-CD38-FITC (mouse, BD Biosciences, 555459). Surface protein expression was detected by a BD FACSCalibur cell analyzer (BD Biosciences) and data were analyzed using the FlowJo software.

3′-seq and RNA-seq analyses

3′-seq libraries were generated as previously described and sequenced with Illumina HiSeq using single-end 50 nt reads[1,2]. RNA-seq libraries were prepared at the Weill Cornell and the MSKCC Genomics core facilities. Analysis of 3′-seq data was performed as described previously by us[1] with a few modifications that are extensively described in Singh et al.[2]. Briefly, a gene was considered to be expressed if either the IPA isoform (≥ 5 transcripts per million (TPM)) or the full-length isoform (≥ 5.5 TPM) was expressed in 75% of the samples of a particular cell type. We focused our analysis on robustly expressed transcript isoforms and filtered 3′-seq peaks according to their usage. Robustly expressed 3′UTR isoforms that are part of the atlas are expressed with at least 3 TPM in at least one sample and each peak combines at least 10% of all reads that map to the 3′UTR. Robustly expressed IPA isoforms that are part of the atlas are expressed with 5 TPM or more and had ≥ 0.1 IPA site usage in at least one sample. IPA site usage is the relative expression of each IPA isoform with respect to the total expression of 3′UTR isoforms (all reads that fall into robust 3′UTR peaks are summed up). We only analyzed IPA isoforms of protein coding genes.
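The filter for robustly expressed IPA isoforms can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function names and the per-sample input format are hypothetical, and IPA site usage is computed by a literal reading of the definition above (IPA expression divided by the summed expression of robust 3′UTR isoforms). Only the thresholds (5 TPM, 0.1 IPA site usage, at least one sample) are taken from the text.

```python
def ipa_site_usage(ipa_tpm, utr_tpms):
    """IPA isoform expression relative to the summed expression of the
    robust 3'UTR isoforms of the same gene (literal reading of the text)."""
    total_utr = sum(utr_tpms)
    if total_utr == 0:
        return None  # usage is undefined without 3'UTR isoform expression
    return ipa_tpm / total_utr


def is_robust_ipa(samples, min_tpm=5.0, min_usage=0.1):
    """samples: list of (ipa_tpm, [utr_tpm, ...]) tuples, one per sample
    (hypothetical input format). Returns True if at least one sample passes
    both the expression and the usage threshold."""
    for ipa_tpm, utr_tpms in samples:
        usage = ipa_site_usage(ipa_tpm, utr_tpms)
        if usage is not None and ipa_tpm >= min_tpm and usage >= min_usage:
            return True
    return False
```

In this sketch, an isoform with 6 TPM in a gene whose 3′UTR isoforms sum to 40 TPM has a usage of 0.15 and passes, whereas the same isoform in a gene with 100 TPM of 3′UTR isoforms (usage 0.06) does not.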

Validation of IPA isoforms using external data sources

To obtain evidence of IPA isoforms from independent methods, we first used RNA-seq data obtained from the same RNA or from the same cell type to identify IPA isoforms. We used the coordinates of the IPA events obtained from 3′-seq and tested the RNA-seq read counts in windows of 100 nucleotides (nt) located upstream and downstream of the IPA peak using a GLM (Extended Data Fig. 1a)[2]. The windows were separated by 51 nt centered on the first nt of the polyadenylation signal. Not all IPA isoforms could be tested. For example, if the defined windows overlapped with an annotated exon, the IPA event was excluded from further analysis. An IPA isoform was considered present if we detected a significant difference in read counts within the upstream and downstream windows (Padj < 0.1) using DESeq. This analysis was also used to validate CLL-gained IPA events in an independent CLL data set. We further regarded an IPA isoform as validated if reads that overlap with IPA peaks had at least four untemplated adenosines in the RNA-seq data and a polyadenylation signal (or one of its variants)[33] was detected within 50 nt upstream of the read. In addition, we considered IPA isoforms as validated if we detected read evidence in independent 3′-seq data sets[10]. As no previous 3′-end sequencing data exist for many of our cell types, we also included highly expressed (≥ 10 TPM and ≥ 0.1 IPA site usage) IPA isoforms with an upstream polyadenylation signal (AAUAAA and its variants)[33] in our downstream analysis.
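The window geometry used for the RNA-seq test can be sketched as follows. This only illustrates how two 100-nt windows separated by a 51-nt gap can be placed around the polyadenylation signal; it does not implement the GLM or the DESeq test. The function names, the 1-based inclusive coordinate convention and the strand handling are assumptions made for illustration.

```python
def flanking_windows(pas_pos, strand, window=100, gap=51):
    """Return the upstream and downstream windows (1-based, inclusive) around
    the first nucleotide of the polyadenylation signal at pas_pos.
    The gap is centered on pas_pos (25 nt on either side for gap=51)."""
    half = gap // 2
    left = (pas_pos - half - window, pas_pos - half - 1)
    right = (pas_pos + half + 1, pas_pos + half + window)
    if strand == "+":
        return {"upstream": left, "downstream": right}
    if strand == "-":
        # on the minus strand, transcript-upstream lies at higher coordinates
        return {"upstream": right, "downstream": left}
    raise ValueError("strand must be '+' or '-'")


def overlaps(interval_a, interval_b):
    """True if two inclusive intervals share at least one position."""
    return interval_a[0] <= interval_b[1] and interval_b[0] <= interval_a[1]


def is_testable(pas_pos, strand, annotated_exons):
    """An IPA event is excluded if either window overlaps an annotated exon."""
    windows = flanking_windows(pas_pos, strand)
    return not any(
        overlaps(w, exon) for w in windows.values() for exon in annotated_exons
    )
```

For a plus-strand signal at position 1,000, the sketch places the upstream window at 875 to 974 and the downstream window at 1,026 to 1,125, leaving positions 975 to 1,025 as the 51-nt gap.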

Identification of the normal counterpart of CLL and of CLL-IPAs

Hierarchical clustering was performed on the normal human B cell subsets derived from lymphoid tissues or peripheral blood and CLL samples using RNA-seq derived mRNA expression levels (quantile normalized log2 reads per kilobase of transcript per million mapped reads (rpkm)). Genes expressed with greater than 5.5 rpkm in 75% of normal B cells or any of the CLL samples went into the analysis. The 20% most variable genes by median absolute deviation across the data set were used for the clustering. The heatmap was generated using aheatmap (http://cran.r-project.org/package=NMF) with row scaling. This analysis showed that lymphoid-tissue derived CD5+B cells are most closely related in their gene expression profile to CLL cells (Extended Data Fig. 2). We performed hierarchical unsupervised clustering of CLL and control samples based on IPA site usage to test if IPA site usage separates normal and malignant B cells (Extended Data Fig. 1c). The top 20% most variable genes by median absolute deviation across all the CD5+B and CLL samples were used. This analysis showed two main clusters: Four CLL samples (CLL4, CLL7, CLL11, CLL12) clustered separately from the rest of the samples. However, within the rest of the samples, the control group (CD5+B) clustered separately. The four CLL samples that differed the most from the rest of the samples had a high number of significantly upregulated IPA isoforms (CLL high, median number of CLL-IPAs/sample, N = 100, range, N = 42 – 274), whereas the remaining samples had a low number of CLL-IPAs (CLL low, median, N = 9, range, N = 5 – 28, Extended Data Fig. 1e). To identify CLL-upregulated IPA isoforms, we applied a GLM[1,2,34] and tested usage of each IPA isoform between the normal B cell group and each CLL sample. 
We only considered IPA isoforms that were significantly upregulated in CLL (FDR-adjusted P value < 0.1, usage difference between CLL and CD5+B ≥ 0.05) and were either not or lowly expressed in CD5+B cells (TPM < 8, corresponding to 75% quantile for CD5+B TPM). This resulted in 931 significantly upregulated IPA events observed in 13 CLL samples. N = 454 IPA events were detected in only a single sample and were regarded as non-recurrent, whereas 477 IPA events occurred in more than one sample (≥ 2/13), and were considered recurrent events by 3′-seq (Extended Data Fig. 1d). The recurrent events resulted in 168 recurrent CLL-IPA isoforms. As CLL-IPAs are detectable by RNA-seq, we used an independent RNA-seq data set containing 46 CLL samples for validation[3]. We verified up to 71% of testable IPAs by this independent method and data set. Because of the high validation rate, we combined the two data sets (N = 59 CLL samples) and focused on CLL-IPAs present in more than 10% of the whole CLL sample cohort. This resulted in 330 CLL-upregulated IPA isoforms, derived from 306 genes (Supplementary Table S1). The list of 330 CLL upregulated IPA isoforms contains the 168 CLL-IPAs identified in at least 2/13 3′-seq samples, but contains also CLL-IPA isoforms detected in one 3′-seq and in at least five additional RNA-seq samples (≥ 6/59 total samples). We detected 33 IPA events that showed significantly higher IPA site usage in CD5+B cells compared with CLL. IPA site usage was required to be higher than in 2 CLL samples (TPM < 10, corresponding to 75% quantile for CLL TPM; FDR-adjusted P value < 0.1, usage difference between CLL and CD5+B ≥ 0.05, Supplementary Table S1). The fraction of CLL patients affected by IPA or TR mutations shown in Fig. 3d, Extended Data Fig. 7c, and 8a were calculated as follows: If the CLL-IPA isoform was testable by RNA-seq, all 59 CLL samples were considered. 
If the CLL-IPA isoform could not be tested by RNA-seq (for example, because the upstream exon is located too close to the IPA isoform), only the 13 CLL samples analyzed by 3′-seq were taken into account for calculating the fraction of samples with significant expression of the IPA isoform.
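The recurrence criteria and the choice of denominator described above can be summarized in a short sketch. The function names and arguments are hypothetical, and the inclusion rule reflects our reading of the text (at least 2 of 13 3′-seq samples, or one 3′-seq sample plus at least five RNA-seq samples); the cohort sizes (13 and 59) are taken from the text.

```python
N_3SEQ = 13   # CLL samples analyzed by 3'-seq
N_TOTAL = 59  # all CLL samples (3'-seq plus RNA-seq)


def is_recurrent_cll_ipa(n_3seq, n_rnaseq):
    """n_3seq / n_rnaseq: number of samples with significant IPA expression
    in the 3'-seq and RNA-seq cohorts, respectively."""
    return n_3seq >= 2 or (n_3seq >= 1 and n_rnaseq >= 5)


def fraction_affected(n_3seq, n_rnaseq, testable_by_rnaseq):
    """Fraction of CLL samples with significant expression of the IPA isoform.
    If the isoform is not testable by RNA-seq, only the 3'-seq samples count."""
    if testable_by_rnaseq:
        return (n_3seq + n_rnaseq) / N_TOTAL
    return n_3seq / N_3SEQ
```

For example, an isoform detected in 2 3′-seq and 10 RNA-seq samples would be reported as 12/59, whereas an untestable isoform detected in 3 3′-seq samples would be reported as 3/13.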

Cell lines

B lymphoblastoid cells (BLCL) are Epstein Barr virus-immortalized human blood B cells[1]. MEC1 cells are malignant B cells from B-Prolymphocytic leukemia and were provided by Dr. Abdel-Wahab. Raji and TMD8 cells are malignant B cells from lymphomas and were a gift from Dr. Hans-Guido Wendel (MSKCC). HEK293 and HEK293T cells (embryonic kidney), HeLa cells (cervical cancer) and A549 cells (lung adenocarcinoma) were purchased from ATCC. Wild-type and DICER KO HCT116 cells were generously provided by V. Narry Kim (Seoul National University)[35]. BLCL, MEC1, and Raji cells were cultured in RPMI with 20% FBS and 1% penicillin/streptomycin. HEK293, HEK293T, HeLa, and A549 were cultured in DMEM with 10% FBS and 1% penicillin/streptomycin, whereas HCT116 cells were cultured in McCoy′s media with 10% FBS and 1% penicillin/streptomycin.

Western blotting

Cells were lysed on ice for 30 min with RIPA buffer (50 mM Tris pH 7.4, 150 mM NaCl, 1% NP-40, 1% Na-deoxycholate, 1 mM EDTA, 0.05% SDS), containing freshly added proteinase inhibitor cocktail (Thermo Scientific). For MGA, NUP98, SGK223, and DICER immunoblotting, cell lysates were run using 3–8% Tris-Acetate NuPAGE® gels with Tris-Acetate running buffer (Life Technologies). For CARD11, AKAP10, BAZ1B, SENP1, CUL3, and RIPK1, 4–12% Bis-Tris NuPAGE® gels (Life Technologies) were run with MOPS running buffer, and all other proteins were run with MES running buffer (Natural Diagnostics). The separated proteins were transferred to nitrocellulose membranes (Bio-Rad, 1620252), blocked with Odyssey Blocking Buffer (Li-Cor, 927–40000) for 1 hour at room temperature, followed by incubation with primary antibodies at 4°C overnight. After two washes using PBS and 0.1% Tween 20 (PBST), the blots were incubated with IRDye-conjugated secondary antibodies for 50 min at room temperature. After one wash with PBST and two washes with PBS, proteins were detected with Odyssey CLx imaging system (Li-Cor). The following primary antibodies were used: anti-ACTIN (mouse, Sigma, A4700; rabbit, Sigma, A2066), anti-AKAP10 (mouse, clone 51, Santa Cruz Biotechnology, sc-136512), anti-CARD11 (rabbit, Cell Signaling, 4440S), anti-DICER (rabbit, a kind gift from Dr. Witold Filipowicz (FMI Basel)), anti-DNM1L (mouse, Abcam, ab56788), anti-MGA (rabbit, H-286, Santa Cruz Biotechnology, sc-382569), anti-SFRS15 (SCAF4; mouse, Abnova, H00057466-B01), anti-WSTF (BAZ1B; mouse, clone G-5, Santa Cruz Biotechnology, sc-514287), anti-NUP98 (rabbit, Novus Biologicals, NB100–93325), anti-SGK223 (mouse, Santa Cruz Biotechnology, sc-398164), anti-SENP1 (rabbit, Bethyl Labs, A302–927A-T), anti-CUL3 (rabbit, Bethyl Labs, A301–108A-T), anti-PAWR (Abcam ab92590), anti-RIPK1 (Cell Signaling #4926), anti-GAPDH (goat, V-18, Santa Cruz Biotechnology), and anti-WNT5a/b (rabbit, clone C27E8, Cell Signaling 2530). 
The secondary antibodies used included anti-mouse IRDye 700 (donkey, Rockland Immunochemicals, 610–730-002), anti-rabbit IRDye 680 (donkey, Li-Cor Biosciences, 926–68073), anti-rabbit IRDye 800 (donkey, Li-Cor Biosciences, 926–32213), and anti-mouse IRDye 800 (donkey, Li-Cor Biosciences, 926–32212).

RT-PCR of IPA isoforms

Total RNA was isolated using Tri reagent solution (Invitrogen #AM9738) and digested with DNase I (Invitrogen #AM1906). RNA was reverse transcribed using the qScript cDNA SuperMix (Quanta Biosciences #101414–106). RT-PCR reactions were carried out with purified Taq polymerase, a 50°C annealing temperature and a 30 s extension at 72°C. The linear range of amplification was determined by independent PCRs for each primer set. Primers were designed to be intron-spanning and are listed in Supplementary Table S3.

Induction of IPA isoforms

Endogenous U2AF1, U2AF2, and hnRNPC were knocked-down using pLKO-puro lentiviral vector-based shRNAs (Sigma). Virus was produced using the helper plasmids pCMV-VSVG and pCMV-dR8.2 and cells were transduced in 6-well plates, selected with puromycin (2 μg/ml) for 5 days and then harvested for RT-PCR or western blot analysis. To induce IPA isoform expression of DICER, an antisense morpholino oligonucleotide (GeneTools) targeting the 5′ splice site of DICER exon 23 was added directly to sub-confluent HeLa cells at the indicated concentrations in the presence of 6 μM EndoPorter-PEG delivery peptide (GeneTools) and harvested at the indicated time points. The control morpholino was used at 12 μM concentration.

Knock-down of CARD11 full-length and IPA isoforms

Isoform-specific shRNA primers were cloned into the TRC2-pLKO-GFP plasmid using KpnI and EcoRI. Lentivirus was produced as described above and centrifuged at 25,000 rpm for 1 h 45 min at 4°C (Sorvall WX Ultracentrifuge). Pellets were resuspended and dissolved in cold PBS overnight at 4°C. The virus titer was estimated by transducing wild-type HEK293T cells. The 12-well culture plate was coated overnight with 5 μg/ml fibronectin. TMD8 cells were spin-infected and cultivated for three days, followed by western blot analysis of FACS-sorted GFP-positive cells.

Constructs

The V5-DICER construct was obtained from Dr. Joshua Mendell (UT Southwestern). To generate the DICER-IPA expression plasmid, the DICER-IPA cDNA was amplified from BLCL and cloned into the pCK-V5 plasmid using the BamHI and ApaI restriction sites. The human MGA cDNA (Dharmacon, clone BC136659) was used to PCR-amplify the coding region of full-length MGA (8,571 nt plus 6 nt of endogenous Kozak sequence) as well as MGA IPA (3,430 nt (end of exon 9) plus gtgagtattaa [intronic sequence that will be translated, followed by a stop codon; see Extended Data Fig. 6a]). MGA IPA was cloned into the pcDNA3.1 expression vector (Life Technologies) using NheI and XhoI sites. GFP fused-MGA IPA was generated by inserting MGA IPA downstream of eGFP using the restriction sites BsrGI and XhoI in the pcDNA3.1-GFP vector. MGA was cloned into pcDNA3.1-GFP using Gibson Assembly Cloning (New England Biolabs) from three pieces. The full-length FOXN3 mRNA was amplified from BLCL cDNA. To obtain GFP-FOXN3, it was cloned into pcDNA3.1-GFP[36] using BsrGI and XhoI restriction sites. FOXN3 IPA was PCR-amplified from two fragments. Fragment 1 was amplified from BLCL cDNA and corresponds to amino acids 1–180, whereas fragment 2 was amplified from genomic DNA from PBMC and corresponds to the 32 amino acids generated from intronic sequence, followed by a stop codon. FOXN3 IPA was fused with GFP at the C-terminus as described above. Full-length CHST11 was amplified from BLCL cDNA, whereas CHST11 IPA was amplified from genomic DNA. Both were fused to GFP at the C-terminus as described above. The integrity of all constructs was confirmed by sequencing.

Functional validation of CLL-IPAs

CARD11 IPA

To assess NF-κB activation, lentiviral-transduced TMD8 cells (described above) were used. Cells were fixed with 4% formaldehyde at room temperature for 15 mins. After two washes with excess PBS, fixed cells were resuspended in ice-cold PBS and permeabilized with 90% methanol for 20 mins on ice. Cells were then washed with cold PBS twice and resuspended in the incubation buffer (PBS + 0.5% BSA). Cells were aliquoted and incubated with anti-phospho-NF-κB p65 (1:1,500 dilution, Cell Signaling #3033) for 1.5 hrs at room temperature. Cells were washed with incubation buffer twice and incubated with fluorochrome-conjugated secondary antibody solution (1:10,000 Alexa Fluor 647 A27040, Invitrogen) for 15 mins at room temperature. After two washes with incubation buffer, cells were analyzed using a FACSCalibur.

DICER IPA

Full-length V5-DICER and V5-DICER IPA were immunoprecipitated from HEK293T cells as described before[16]. Briefly, 48 hours after transfection, cells were washed with cold PBS and lysed with IP buffer (20 mM Tris-HCl pH = 8.0, 150 mM NaCl, 1 mM EDTA, 0.5% NP-40 and 1x EDTA-free protease inhibitor (Thermo Fisher)) for 30 mins on ice with occasional vortexing. The cell lysate was then centrifuged at 20,000 x g for 10 mins at 4°C and the supernatant was collected. The cell lysate was incubated with 3 μg of anti-V5 tag antibody (Invitrogen R960–25) for 30 mins on ice, then 900 μg of protein G Dynabeads were added and the reaction was rotated for an additional 2 hrs at 4°C. After five washes with IP buffer and twice in DICER assay buffer (20 mM Tris-HCl pH = 8.0, 100 mM KCl, 0.2 mM EDTA), 90% of beads were resuspended in DICER assay buffer for miRNA cleavage assay and the remaining beads were stored in 2x Laemmli sample buffer (Sigma) for western blot analysis. The miRNA cleavage assay was performed as described previously[16]. Briefly, synthesized pre-miRNA let-7i oligo (Dharmacon) was incubated with immunoprecipitated beads prepared as described above in the enzymatic mixture (10 μl of immunoprecipitated beads in DICER assay buffer, 2 μl of 20 mM MgCl2, 0.2 μl of 0.4 μM pre-miRNA, 0.1 μl of 100 mM DTT, 0.5 μl of RNaseOUT (Invitrogen) and 7.2 μl of RNase-free water) at 37°C for 30 mins with interval mix. The reaction was stopped by chilling samples on ice and analyzed by northern blot. To investigate if DICER IPA acts as dominant-negative version of full-length DICER, different ratios of V5-DICER and V5-DICER IPA were mixed and tested with respect to miRNA cleavage. Reaction mixtures (10 μl) were added to 10 μl RNA loading buffer (95% formamide, 0.025% SDS, 0.025% bromophenol blue, 0.025% xylene cyanol FF, 0.5 mM EDTA) and denatured at 95°C for 5 minutes followed by quenching on ice. 
Samples were run on a 15% TBE/Urea gel followed by transfer to a Hybond-N+ nylon membrane (GE Healthcare #RPN303B) using a semi-dry transfer apparatus (Hoefer TE70X). Following transfer, membranes were briefly dried and then UV cross-linked twice with 1200 µJ/cm2 each cycle. Cross-linked membranes were pre-hybridized for 1 hour at 37°C in ULTRAhyb-Oligo hybridization buffer (Ambion #AM8663) in a rotary oven. DNA probes against the intended target RNA were synthesized as oligos and labeled with γ32P-ATP in the presence of T4 polynucleotide kinase (NEB #M0201S) for 30 minutes at 37°C. Labeled probes were purified through G-25 microspin columns containing Sephadex resin (GE Healthcare #27–5325-01). Membranes were hybridized with labeled probe overnight at 37°C in a rotary oven. The next day, membranes were washed twice in 2x SSC/0.1% SDS for 5 minutes each at 37°C followed by one wash in 0.1x SSC/0.1% SDS for 5 minutes at 37°C. Membranes were exposed to phosphorimager screens and scanned. To assess if expression of DICER IPA influences miRNA expression in vivo, endogenous let-7 miRNA expression levels were measured by northern blot analysis of total RNA (22 μg) from wild-type and DICER KO HCT116 cells. DICER KO HCT116 cells were transfected with different amounts of V5-DICER and V5-DICER IPA. Cells were harvested 3 days after transfection with Lipofectamine 2000 to assess DICER protein expression and corresponding endogenous let-7 levels.
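As a worked check of the miRNA cleavage reaction described above, the listed component volumes can be summed and converted into final concentrations. The sketch below is only an arithmetic illustration: the dictionary layout and function name are hypothetical, and the calculation ignores any salts or EDTA contributed by the DICER assay buffer in which the beads are suspended.

```python
# name: (volume in ul, stock concentration or None, unit of the stock)
REACTION = {
    "immunoprecipitated beads in DICER assay buffer": (10.0, None, None),
    "MgCl2": (2.0, 20.0, "mM"),
    "pre-miRNA": (0.2, 0.4, "uM"),
    "DTT": (0.1, 100.0, "mM"),
    "RNaseOUT": (0.5, None, None),
    "RNase-free water": (7.2, None, None),
}

# total reaction volume in ul (sum of all listed components)
TOTAL_VOLUME_UL = sum(volume for volume, _, _ in REACTION.values())


def final_concentration(name):
    """Final concentration of a component, in the unit of its stock solution."""
    volume, stock, unit = REACTION[name]
    if stock is None:
        raise ValueError(f"no stock concentration given for {name}")
    return stock * volume / TOTAL_VOLUME_UL, unit
```

Under these assumptions the components add up to a 20 μl reaction containing about 2 mM added MgCl2, 0.5 mM DTT and 0.004 μM (4 nM) pre-miRNA.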

FOXN3 IPA

The fork-head domain of FOXN3 is necessary for transcriptional repression of FOXN3 target genes. Thus, truncation of the fork-head domain predicts de-repression of the target genes. Known target genes are PIM2 and MYC[20,37]. MEC1 cells were nucleofected with pcDNA 3.1 vector containing GFP, GFP-FOXN3 or GFP-FOXN3 IPA using SF Cell Line 4D-Nucleofector® X Kit (Lonza, Program FF-120). After 48 hours, GFP-positive cells were FACS sorted, lysed immediately (Cells-to-cDNA™ II Kit, Ambion) and RNA was extracted. cDNA was synthesized by qScript™ cDNA SuperMix (Quanta Biosciences) and quantitative PCR was performed using FastStart universal SYBR green master mix (Roche) on a 7900HT Fast Real-Time PCR System (Applied Biosystems). The experiment was performed from five biologically different replicates.

MGA IPA

Raji cells were nucleofected with pcDNA3.1 vector containing GFP, GFP-MGA or GFP-MGA IPA using Cell Line Nucleofector Kit V (Lonza, Program M-013). After 48 hours, GFP-positive Raji cells were FACS-sorted and lysed immediately in lysis buffer (Cells-to-cDNA™ II Kit, Ambion) and RNA was extracted. cDNA synthesis and qRT-PCR were performed as described for FOXN3. qRT-PCR was done in technical triplicates from three biologically different experiments. MYC target genes were previously published[38,39]. E2F binding sites in MYC target genes were identified using the Encode Transcription Factor ChIP-seq track, or they were previously described[19,39-41]. T-boxes were described for ATF4 and CDKN1B[42,43].

CHST11 IPA

3′-seq data were used to identify overexpressed WNTs in CLL cells compared to normal B cells. The expression of WNTs was validated in MEC1 cells by qRT-PCR. WNT5B was the WNT with the highest expression in MEC1 cells. For WNT detection in media, MEC1 cells stably expressing GFP, GFP-CHST11 or GFP-CHST11 IPA were counted and washed once with RPMI without FCS. Twenty million cells were cultured in 10 ml RPMI + 1% Pen/Strep in one 10 cm culture dish. After 18 hrs, conditioned media was collected by centrifugation at 280 x g for 5 mins and passed through a 0.45 μm filter. The supernatant was concentrated by an Amicon Ultra-4 centrifugal filter (Millipore, UFC800324) at 3,000 x g at 10°C for 2 hrs. The concentrated media (~50 μl) was collected and subjected to western blot analysis using anti-WNT5a/b antibody (Cell Signaling #2530). The corresponding cell pellets were also collected for western blot analysis. To assess paracrine WNT activity in MEC1 cells expressing CHST11 IPA, MEC1 cells were nucleofected with pcDNA3.1 vector containing GFP, GFP-CHST11 or GFP-CHST11 IPA. After 24 hours, GFP-positive cells were FACS sorted and cultivated for three days. The conditioned media was collected and added to HEK293T cells which were transiently transfected with a WNT reporter plasmid (Addgene #12456, M50, Super 8x TOPFlash) or WNT reporter control plasmid with mutated TCF/LEF binding sites (Addgene #12457, M51, Super 8x TOPFlash mutant)[44]. The conditioned media was added 24 hours after transfection. Luciferase activity was measured 24 hours after the addition of conditioned media using a Glomax 96 Microplate Luminometer as described previously[45].

Intersection of somatic mutations in CLL with IPA

CLL RNA-seq samples (N = 44) with somatic DNA mutation and prognostic data were available to us for mapping IPA isoform expression[3]. The somatic mutations were obtained using exome sequencing that included extended exon boundaries[46]. We intersected the occurrence of somatic mutations with IPA isoforms in these samples. We focused on truncating mutations (nonsense mutations, frame-shift mutations and splice-site mutations) in expressed genes as they were likely to have a similar outcome to IPA. The IGVH status of CLL samples was assessed at MSKCC for the CLL samples studied by 3′-seq. The IGVH status of the 44 RNA-seq samples was published[3].

Positions of TR mutations

The positions of TR mutations in CLL were obtained from the published CLL somatic mutation data sets[3,7,8]. The positions of truncating (TR) mutations in solid cancers of TSGs and of genes targeted by CLL-IPAs were obtained from the MSK cbio portal (date of reference, 02/23/2018, containing > 86,000 cancer samples with 97% derived from solid tumors)[4]. The position with the highest number of TR mutations was used (hot spot) and is indicated by the symbol. The symbol is lacking if the genes had TR mutations without a hot spot.
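The selection of a hot spot position can be sketched as follows. The text does not define when a gene has TR mutations "without a hot spot"; the sketch assumes that a hot spot requires a single position with more TR mutations than any other position and at least two mutations. Both this rule and the function name are assumptions made for illustration.

```python
from collections import Counter


def hot_spot(tr_positions, min_count=2):
    """tr_positions: amino acid positions of all TR mutations in one gene.
    Returns the position with the highest number of TR mutations, or None
    if no single position stands out (assumed rule, see lead-in)."""
    if not tr_positions:
        return None
    ranked = Counter(tr_positions).most_common()
    top_position, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if top_count < min_count or tied:
        return None
    return top_position
```

With this rule, positions [10, 10, 55, 200] yield a hot spot at position 10, whereas [10, 55] or [10, 10, 55, 55] yield none.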

Number of amino acids of full-length or IPA-generated truncated proteins

To calculate the number of amino acids of full-length proteins, we used the longest Ref-seq annotated mRNA isoform, obtained the number of coding nucleotides and divided this number by three to obtain the total number of amino acids. To calculate the number of amino acids of the IPA-generated truncated proteins we counted the number of nucleotides from the start codon to the end of the exon located upstream of the IPA isoform and divided this number by three to obtain the number of retained amino acids. This number also provided information about the reading frame of the protein at the exon/intron junction located upstream of the IPA isoform. We then used the correct reading frame and translated the intronic nucleotides until an in-frame stop codon was detected. The amino acids translated from intronic sequence were added to the retained amino acids to obtain the size of the IPA-generated truncated proteins. The fraction of retained CDR is the number of amino acids retained (up to the end of the exon located upstream of the IPA isoform) divided by the number of amino acids calculated from the longest mRNA isoform encoding the full-length protein.
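The calculation of the truncated protein size can be sketched in a few lines. The function names and inputs are hypothetical, the sequences in the usage note are toy examples rather than real transcripts, and two details are assumptions: the number of retained amino acids is rounded down to complete codons, and a codon that spans the exon/intron junction is counted among the intron-derived amino acids.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def ipa_truncated_protein(cds_upstream, intronic_seq):
    """cds_upstream: coding sequence from the start codon to the end of the
    exon upstream of the IPA site; intronic_seq: intronic sequence that follows.
    Returns (retained_aa, intronic_aa, total_aa); the last two are None if no
    in-frame stop codon is found in the supplied intronic sequence."""
    cds_upstream = cds_upstream.upper()
    intronic_seq = intronic_seq.upper()
    retained_aa = len(cds_upstream) // 3
    remainder = len(cds_upstream) % 3  # nt of an incomplete codon at the junction
    # keep the reading frame: leftover exonic nt followed by intronic nt
    reading = cds_upstream[len(cds_upstream) - remainder:] + intronic_seq
    intronic_aa = 0
    for i in range(0, len(reading) - 2, 3):
        if reading[i:i + 3] in STOP_CODONS:
            return retained_aa, intronic_aa, retained_aa + intronic_aa
        intronic_aa += 1
    return retained_aa, None, None


def retained_fraction(retained_aa, full_length_coding_nt):
    """Retained amino acids divided by the amino acids of the full-length protein."""
    return retained_aa / (full_length_coding_nt // 3)
```

For a toy upstream coding sequence of 7 nt (ATGGCTG) followed by the intronic sequence GTGAGTATTAA, the frame continues as GGT GAG TAT TAA, giving 2 retained amino acids, 3 intron-derived amino acids and a truncated protein of 5 amino acids.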

Identification of known and novel TSGs

For known TSGs, we used the 301 TSGs reported by Davoli et al.[5] that were expressed in CLL samples. Davoli et al. used a computational method (TUSON Explorer) to predict 301 TSGs from genomic sequencing data obtained from more than 8,200 cancers (> 90% are derived from solid tumors). For novel TSGs, we used the data from the MSK cbio portal (see above). It was previously reported that the variable with the highest predictive power for TSGs was the proportion of TR mutations to all mutations[5]. We calculated this ratio for the 190 genes that generated CLL-IPAs in more than 20% of samples and identified a bimodal distribution with a separation point at 12% TR mutations to all mutations. The genes that generated CLL-IPAs in more than 20% of samples and had a TR mutation rate ≥ 12% in the data from the MSK cbio portal were called novel TSG candidates (Supplementary Table S2). To assess whether known TSGs are enriched among CLL-IPAs, a chi-square test was performed. To exclude that this association occurred by chance, five control lists containing genes with similar coding region length and expression were generated and tested for enrichment of TSGs.
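The two thresholds used to call novel TSG candidates can be expressed as a simple filter. The sketch below is illustrative only: the input layout, key names and function name are assumptions, while the default cut-offs (more than 20% of samples, TR mutation ratio of at least 12%) are the values stated above.

```python
def novel_tsg_candidates(genes, min_sample_fraction=0.20, min_tr_ratio=0.12):
    """Return genes passing both thresholds, sorted by name.

    genes: dict mapping gene name to a dict with the hypothetical keys
        'ipa_sample_fraction' (fraction of CLL samples with the CLL-IPA),
        'tr_mutations' and 'all_mutations' (counts in the mutation data).
    """
    candidates = []
    for gene, g in genes.items():
        if g["all_mutations"] == 0:
            continue  # ratio undefined without any mutations
        tr_ratio = g["tr_mutations"] / g["all_mutations"]
        if g["ipa_sample_fraction"] > min_sample_fraction and tr_ratio >= min_tr_ratio:
            candidates.append(gene)
    return sorted(candidates)
```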

Other statistical methods

To perform enrichment statistics, we used a chi-square test and calculated the P value using a two-sided Fisher's exact test. To assess the functional differences between full-length proteins and IPA-generated truncated proteins (MGA, FOXN3), we used a two-sided t-test for independent samples. When comparing three groups (CARD11, CHST11), a two-sided Kruskal-Wallis test was used. For subsequent pairwise comparisons, a two-sided Mann-Whitney test was applied and the P values were adjusted with Bonferroni multiple testing correction. For all other tests that assessed the differences of features between two groups, we used a two-sided Mann-Whitney test. To investigate the spatial relationship between the IPA-generated truncated proteins and hot spot TR mutations, we performed a two-sided Wilcoxon rank sum test.
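For readers unfamiliar with two of these procedures, the sketch below shows a two-sided Fisher's exact test on a 2×2 table and a Bonferroni adjustment, written with the Python standard library only. The statistical software used in the study is not specified here, so this is a generic illustration rather than the authors' code, and the function names are chosen for the example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

In an enrichment setting, the four cells would hold, for example, the counts of TSGs and non-TSGs among genes with and without CLL-IPAs.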

Data Availability Statement

All 3′-seq and RNA-seq data generated and analyzed for this study have been deposited in the Gene Expression Omnibus database under accession numbers GSE111310 and GSE111793. The code to analyze the data is available at https://bitbucket.org/leslielab/apa_2018/ and the processed data are available in Supplementary Table S1 (for Fig. 1b–d, 2a, 4a, Extended Data Fig. 3, and 4) and Supplementary Table S2 (for Extended Data Fig. 8a), and in the Source data files (for Fig. 1e, 2c, 2e, 3a, 3d, 4b–d, 4g, Extended Data Fig. 2c, 6j, 7c, and 8a). Data on DNA mutations from CLL patients were provided by Dan A. Landau (Weill-Cornell Medical College) and need to be requested from him. The mutation data on solid cancers were obtained through the MSK cbio portal and can be accessed through www.cbioportal.org.

Validation of IPA isoforms by independent methods and identification of CLL-IPAs used for further analysis.

The normal B cell counterparts of CLL cells are CD5+ B cells derived from lymphoid tissue.

3′-seq and RNA-seq tracks of functionally validated CLL-IPAs.

Five CLL-IPAs were functionally validated. Their 3′-seq and RNA-seq tracks are shown here and in Fig. 2a. Data are shown as in Fig. 1b. The corresponding RT-PCRs are shown in Extended Data Fig. 5a.

CLL-IPAs generate truncated mRNAs and proteins.

Validation of the IPA-generated truncated mRNAs and validation of their stable expression over time.

IPA-generated truncated proteins resemble the protein products of truncating DNA mutations and have cancer-promoting properties.

Inactivation of TSGs by CLL-IPAs independently of DNA mutations.

Novel TSG candidates and validation of CHST11 IPA as cancer-promoting isoform.

Cancer-upregulated IPA isoforms are also detected in breast cancer and T-ALL.

Samples investigated by 3′-seq and RNA-seq.

(a) CLL sample characteristics. (b) Normal human immune cells investigated by 3′-seq. (c) Normal human immune cells investigated by RNA-seq. N, No; Y, Yes; NA, not analyzed; BM, bone marrow.

