Shih-Han Lee1, Irtisha Singh2,3, Sarah Tisdale1, Omar Abdel-Wahab4, Christina S Leslie2, Christine Mayr5. 1. Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 2. Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 3. Tri-I Program in Computational Biology and Medicine, Weill Cornell Graduate College, New York, NY, USA. 4. Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 5. Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA. mayrc@mskcc.org.
Abstract
DNA mutations are known cancer drivers. Here we investigated whether mRNA events that are upregulated in cancer can functionally mimic the outcome of genetic alterations. RNA sequencing or 3'-end sequencing techniques were applied to normal and malignant B cells from 59 patients with chronic lymphocytic leukaemia (CLL)[1-3]. We discovered widespread upregulation of truncated mRNAs and proteins in primary CLL cells that were not generated by genetic alterations but instead occurred by intronic polyadenylation. Truncated mRNAs caused by intronic polyadenylation were recurrent (n = 330) and predominantly affected genes with tumour-suppressive functions. The truncated proteins generated by intronic polyadenylation often lack the tumour-suppressive functions of the corresponding full-length proteins (such as DICER and FOXN3), and several even acted in an oncogenic manner (such as CARD11, MGA and CHST11). In CLL, the inactivation of tumour-suppressor genes by aberrant mRNA processing is substantially more prevalent than the functional loss of such genes through genetic events. We further identified new candidate tumour-suppressor genes that are inactivated by intronic polyadenylation in leukaemia and by truncating DNA mutations in solid tumours[4,5]. These genes are understudied in cancer, as their overall mutation rates are lower than those of well-known tumour-suppressor genes. Our findings show the need to go beyond genomic analyses in cancer diagnostics, as mRNA events that are silent at the DNA level are widespread contributors to cancer pathogenesis through the inactivation of tumour-suppressor genes.
In addition to DNA-based mutations, recent studies found that alterations in mRNA
processing, including splicing, promote tumorigenesis[6]. In CLL, up to 25% of patients have mutations in ATM or SF3B1,
but a third of patients have fewer than two mutated driver genes, and most patients (58%) have only a
13q deletion or a normal karyotype[3,7-9]. Here, we investigated whether intronic polyadenylation (IPA) might serve
as a novel driver of tumorigenesis. As 16% of genes in normal immune cells use IPA to
generate truncated mRNAs that contribute to transcriptome diversity[2], we hypothesized that cancer-specific IPA would
generate truncated proteins that lack essential domains and thus may phenocopy
truncating (TR) mutations (Fig. 1a).
Figure 1.
Hundreds of genes generate recurrent CLL-IPAs.
(a) Schematic showing full-length mRNA and protein
expression in normal cells and generation of a truncated mRNA and protein
through cancer-specific IPA, despite no difference in DNA sequence.
Polyadenylation sites (pA) are shown in light green. Loss of essential protein
domains (dark green boxes) through cancer-gained IPA may inactivate TSGs, thus
contributing to cancer pathogenesis. (b) Representative CLL-IPAs
(from N = 330) are shown. mRNA 3′ ends detected by
3′-seq are depicted as peaks whose height corresponds to transcript
abundance shown in transcripts per million (TPM). The bottom panel shows RNA-seq
reads and numbers correspond to read counts. MemB, memory B cells, NB,
naïve B cells. Full-length and IPA-generated truncated proteins are
depicted in grey, known domains are shown in green and the domains lost through
IPA are named. CC, coiled-coil. For each CLL-IPA, the numbers of retained and novel
amino acids (aa) and the length of the full-length protein (aa) are given. (c)
Representative RNA-seq tracks from two independent CLL data sets are shown as in
(b), one is indicated by ′L′. Zoom-in shows the exonized part of
intron 23 of DICER1 (green). (d) Difference in relative abundance
(usage) of IPA isoforms between CLL and normal CD5+B cells. A GLM was used to
identify significant events. CLL-IPAs with significantly higher usage are shown
in red (FDR-adjusted P value < 0.1, usage difference
≥ 0.05, TPM in CD5+B < 8) and CD5+B-IPAs are shown in blue. Grey,
IPAs present in CLL and CD5+B cells without significantly different usage.
(e) Number of CLL-IPAs per sample is shown as box plots,
horizontal line, median; box, 25th through 75th percentile; error bars, range.
CLL high, N = 21/59, median of CLL-IPAs/sample = 98 vs CLL low,
N = 38/59, median = 29. Two-sided Mann-Whitney test, ***,
P = 6E-10.
Using 3′-seq of 44 samples, including normal B cells and CLL, we
identified 5,587 IPA isoforms, including 3,484 without previous annotation (Extended Data Table 1 and methods)[1,2]. We validated 4,630 IPA isoforms using RNA-seq
and additional 3′ end sequencing data (Extended
Data Fig. 1a, 1b)[2,10]. To
assess IPA usage in CLL, we first identified the normal B cell subset whose gene
expression profile was most closely related to CLL cells. Lymphoid tissue-derived CD5+ B
(CD5+B) cells were most similar (Extended Data Fig.
2), but clustered separately from CLL samples based on IPA site usage (Extended Data Fig. 1c). Using a generalized linear model (GLM), we identified 931
IPA events with significantly higher expression among 13 CLL samples, but low or absent
expression in CD5+B cells (Fig. 1b, Extended Data Fig. 1d)[1,2]. As CLL-IPAs are
detectable by RNA-seq, we used an independent RNA-seq data set to validate our CLL-IPA
events (Fig. 1c)[3]. We verified up to 71% of testable IPAs with this second
method and data set (Extended Data Fig. 1d). For
further analysis, we combined the data sets (N = 59 CLL samples) and
focused only on CLL-IPAs present in more than 10% of the sample cohort, resulting in 330
CLL-IPAs, derived from 306 genes (Fig. 1d, Supplementary Table S1). While
CLL-IPAs were detected in all CLL samples, one third of samples had a significantly
higher number of CLL-IPAs (Fig. 1e, Extended Data Fig. 1e).
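The event-calling criteria above (FDR-adjusted P < 0.1, usage difference ≥ 0.05, IPA TPM in CD5+B < 8, and presence in more than 10% of samples) can be illustrated with a minimal sketch. It is not the authors' pipeline: the GLM itself is not reproduced, raw P values are assumed to be computed upstream, the Benjamini-Hochberg procedure is assumed as the FDR adjustment, usage is assumed to be IPA TPM divided by the summed TPM of the IPA and full-length isoforms, and all names (`IpaEvent`, `call_cll_ipas`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IpaEvent:
    """One IPA event tested in one CLL sample (hypothetical structure)."""
    ipa_id: str
    sample: str
    p_value: float      # raw GLM P value, assumed to be computed upstream
    ipa_tpm_cll: float  # IPA isoform TPM in the CLL sample
    fl_tpm_cll: float   # full-length isoform TPM in the CLL sample
    ipa_tpm_cd5: float  # IPA isoform TPM in CD5+B cells
    fl_tpm_cd5: float   # full-length isoform TPM in CD5+B cells


def usage(ipa_tpm: float, fl_tpm: float) -> float:
    """Relative abundance of the IPA isoform (assumed definition)."""
    total = ipa_tpm + fl_tpm
    return ipa_tpm / total if total > 0 else 0.0


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_cll_ipas(events: list[IpaEvent], n_samples: int, fdr: float = 0.1,
                  min_usage_diff: float = 0.05, max_cd5_tpm: float = 8.0,
                  min_recurrence: float = 0.10) -> list[str]:
    """IPA ids passing all per-sample filters in more than min_recurrence of samples."""
    padj = benjamini_hochberg([e.p_value for e in events])
    hits: dict[str, set[str]] = {}
    for e, q in zip(events, padj):
        diff = usage(e.ipa_tpm_cll, e.fl_tpm_cll) - usage(e.ipa_tpm_cd5, e.fl_tpm_cd5)
        if q < fdr and diff >= min_usage_diff and e.ipa_tpm_cd5 < max_cd5_tpm:
            hits.setdefault(e.ipa_id, set()).add(e.sample)
    return sorted(i for i, s in hits.items() if len(s) / n_samples > min_recurrence)
```

Keeping the per-sample significance filters separate from the recurrence filter mirrors the two-step description in the text: events are first called per sample, then only those recurring across the cohort are retained.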
Extended Data Table 1.
Samples investigated by 3′-seq and RNA-seq.
(a) CLL sample characteristics. (b) Normal
human immune cells investigated by 3′-seq. (c) Normal
human immune cells investigated by RNA-seq.
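(a)
| Sample | CLL low (L) vs CLL high (H) | Number of CLL-IPAs | Age at diagnosis | Rai stage at sample collection | WBC count at sample collection | IgVH status | Cytogenetics | Treated before sample collection | Treated after sample collection | Diagnosis to sample collection (mo) | Treatment-free survival (yr) | RNA-seq | 3′-seq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CLL1 | L | 13 | 62 | III | 153 | UN | Del 11q | N | Y | 10 | 1 | Y | Y |
| CLL2 | L | 7 | 62 | III | 300 | NA | Del 17p | N | Y | 112 | 9 | Y | Y |
| CLL3 | L | 26 | 54 | IV | 139 | NA | Tri12, t(14;19) | N | Y | 84 | 8 | Y | Y |
| CLL4 | H | 93 | 72 | 0 | 173 | UN | Normal | N | Y | 37 | 4 | Y | Y |
| CLL5 | L | 11 | 55 | III/IV | 193 | UN | Tri8, del 13q | N | Y | 46 | 4.5 | Y | Y |
| CLL6 | L | 28 | 39 | I | 137 | MUT | Del 13q | N | Y | 49 | 3.3 | Y | Y |
| CLL7 | H | 108 | 54 | IV | 111 | MUT | Del 13q | N | Y | 108 | 8 | Y | Y |
| CLL8 | L | 12 | 72 | III | 365 | UN | Tri12 | N | Y | 109 | 9 | N | Y |
| CLL9 | L | 5 | 63 | III | 200 | UN | Del 13q, t(6;19) | N | Y | 30 | 2 | N | Y |
| CLL10 | L | 11 | 51 | III | 77 | UN | Del 11q | N | Y | 70 | 6 | N | Y |
| CLL11 | H | 274 | 39 | 0 | 100 | UN | Del 11q, 13q, 14q | N | Y | 44 | 5.5 | Y | Y |
| CLL12 | H | 42 | 49 | II | 178 | NA | NA | N | N | 240 | 23.3 | Y | Y |
| CLL13 | L | 7 | 66 | I | 125 | UN | Del 11q, del 13q | N | Y | 5 | 0.5 | N | Y |
| CLL14 | H | 160 | 45 | NA | NA | NA | NA | N | NA | 112 | NA | Y | N |
| CLL15 | L | 49 | NA | NA | NA | NA | NA | N | NA | NA | NA | Y | N |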
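L, CLL low; H, CLL high; UN, unmutated; MUT, mutated; Del, deletion; Tri, trisomy; WBC, white blood cell; N, No; Y, Yes; NA, not analyzed
BM, bone marrow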
Extended Data Figure 1.
Validation of IPA isoforms by independent methods and identification of
CLL-IPAs used for further analysis.
(a) RNA-seq data were used to validate the presence of
IPA isoforms using a GLM. RNA-seq reads were counted within two 100-nt windows
(green bars) located upstream and downstream of the IPA peak and separated by 51 nt.
The IPA peak was considered validated if Padj
< 0.1 (see methods). Out of
N = 5,587 tested IPA isoforms, N =
1,662 were validated by this method. Shown is MGA as a
representative example. (b) As only a fraction of IPA isoforms
were validated by the method from (a), additional methods were used to
obtain independent evidence for the presence of the IPA isoforms.
Independent evidence was obtained using untemplated adenosines from RNA-seq
data or through the presence of the IPA isoform in other 3′-seq protocols (10). As the majority of immune cell
types used in this study have not been investigated using other 3′
end sequencing protocols and IPA isoform expression is cell type-specific
(2), highly expressed IPA isoforms (>10 TPM)
were not excluded from further analysis even if no read evidence was found
by other protocols. (c) Hierarchical clustering based on IPA
site usage separates the 3′-seq dataset into four groups. It
separates CD5+B from CLL samples and clusters CLL samples into three
different groups. Shown is the usage difference of the 20% most variable IPA
isoforms across the data set (N = 342). Four of 13 CLL
samples cluster away from the rest of the samples and are characterized by a
high number of IPA isoforms (CLL high). (d) The GLM
(FDR-adjusted P value < 0.1, IPA usage difference
≥ 0.05, IPA isoform expressed in CD5+B < 8 TPM) identified 477
recurrent (significantly upregulated in at least 2/13 CLL samples by
3′-seq) and 454 non-recurrent CLL-IPAs (significantly upregulated in 1/13 CLL
samples by 3′-seq). IPAs were validated in an independent RNA-seq
data set containing 46 new CLL samples. Among the recurrent IPAs, 71% of
testable IPAs were verified using another GLM (see a). Among the
non-recurrent IPAs, 64% of testable IPAs were verified. (e)
Plotting the number of CLL-IPAs per sample separates the CLL samples
investigated by 3′-seq into two groups: 4/13 samples generate a high
number of CLL-IPAs (CLL high, median of CLL-IPAs/sample, N
= 100, range, 42 – 274), whereas the rest of the samples generate
lower numbers (CLL low, median, N = 9, range, 5 –
28). Center bar shows median and the error bars show the interquartile
range. Two-sided Mann-Whitney test, **, P = 0.003.
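The window-based read counting in panel (a) can be illustrated with a short sketch. It assumes one plausible geometry that the legend does not spell out: a 51-nt gap centred on the IPA peak on a plus-strand gene, one 100-nt window ending just before the gap and one starting just after it, with each RNA-seq read represented by a single genomic position. Function names are hypothetical, and only the counting step is shown; the GLM that tests the two counts across samples (Padj < 0.1) is not reproduced.

```python
def ipa_windows(peak: int, window: int = 100, gap: int = 51) -> tuple[tuple[int, int], tuple[int, int]]:
    """Half-open (start, end) intervals upstream and downstream of an IPA peak.

    Assumes a plus-strand gene and a gap of `gap` nt centred on `peak`.
    """
    half = gap // 2
    upstream = (peak - half - window, peak - half)
    downstream = (peak + half + 1, peak + half + 1 + window)
    return upstream, downstream


def count_in(positions: list[int], interval: tuple[int, int]) -> int:
    """Number of read positions falling inside a half-open interval."""
    start, end = interval
    return sum(start <= pos < end for pos in positions)


def window_counts(read_positions: list[int], peak: int) -> tuple[int, int]:
    """Read counts in the upstream and downstream windows of one IPA peak."""
    upstream, downstream = ipa_windows(peak)
    return count_in(read_positions, upstream), count_in(read_positions, downstream)
```

An expressed IPA isoform is expected to contribute reads to the upstream window but not the downstream one, so a relative drop in downstream coverage is the signal a statistical model would then assess.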
Extended Data Figure 2.
The normal B cell counterpart of CLL cells is CD5+B cells derived from
lymphoid tissue.
(a) Hierarchical clustering of normal human B cells
(naive B (NB), memory B (MemB) and CD5+B) derived from lymphoid tissues or
peripheral blood based on mRNA expression obtained from RNA-seq. The heatmap
shows the 20% most variable genes across the data set (N =
1,887). The gene expression profiles of B cell subsets derived from
peripheral blood or lymphoid tissue differ substantially, although the same
markers were used for purification. (b) As in (a), but RNA-seq
data from CLL samples were added to the analysis. The heatmap shows the 20%
most variable genes across the data set (N = 2,078). CLL
samples cluster with tissue-derived and not with blood-derived normal immune
cells. (c) Number of all differentially expressed genes from
the analysis shown in (b).
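Both heatmaps, like Extended Data Fig. 1c, are restricted to the 20% most variable features before clustering. A minimal sketch of that selection step follows; it assumes variance across samples as the variability measure (the legends do not name the measure), and the function name is hypothetical.

```python
from statistics import pvariance


def most_variable(matrix: dict[str, list[float]], fraction: float = 0.2) -> list[str]:
    """Names of the rows with the highest variance across samples.

    `matrix` maps a gene (or IPA isoform) to its values across samples.
    """
    ranked = sorted(matrix, key=lambda name: pvariance(matrix[name]), reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]
```

The selected rows could then be passed to a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage`; the distance metric and linkage method used for these figures are not stated in the legends.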
To investigate whether CLL-IPAs generate truncated proteins, we performed western blots
on 13 candidates. Whereas normal B cells expressed only the full-length proteins, the
malignant B cells also expressed truncated proteins whose size was consistent with the
predicted size of IPA-generated proteins (Fig. 2a,
Extended Data Fig. 3 and 4).
Figure 2.
IPA-generated truncated proteins resemble the protein products of truncating
DNA mutations and have cancer-promoting properties.
(a) 3′-seq and RNA-seq data of functionally
validated CLL-IPAs (N = 5) as in Fig. 1b. The remaining tracks are shown in Extended Data Fig. 3. Endogenous full-length proteins
are detected by western blot analysis in CLL and normal B cells (BLCL), whereas
IPA-generated truncated proteins (red arrows) are only present in primary CLL
cells. ACTIN was used as loading control on the same blot. The experiment was
replicated with similar results (CARD11, N = 4, DICER,
N = 3, MGA, N = 2). For gel source data
see Supplementary Fig.
1. *, indicates a nonspecific band. (b) Protein models
are shown as in Fig. 1b. The aa positions
of recurrent TR mutations are shown in blue. (c) Endogenous
phospho-NF-κB-p65 levels, shown as normalized mean fluorescence intensity (MFI)
values (mean ± SD) from biologically independent experiments: N = 5 for control
shRNA (Co) and for shRNA1 against full-length CARD11 (FL), and N = 6 for CARD11 IPA
(shRNA2 and shRNA3, N = 3 each).
Two-sided Kruskal-Wallis test, **, P = 0.002;
P value of two-sided Mann-Whitney test was adjusted for
multiple testing, *, Padj = 0.036. (d) miRNA
cleavage assay, performed twice with similar results, showing processing of
pre-let-7i into mature let-7i by V5-DICER.
Mock, no protein was added. V5-DICER IPA shows a complete loss of function, but
no dominant-negative activity. (e) qRT-PCR of endogenous MYC target
genes after expression of full-length or MGA IPA in Raji cells. Shown are
GAPDH-normalized values as mean ± SD from three
biological replicates, each performed in technical triplicates. Two-sided t-test
for independent samples, *, P < 0.05, **,
P < 1E-3, NS, not significant. Exact
P values are shown in Supplementary Fig. 1. MGA represses
all MYC target genes. Binding sites, BS.
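Panel (e) reports GAPDH-normalized qRT-PCR values. The legend does not state how normalization was performed; one common approach is the comparative Ct (2^-ΔΔCt) method, sketched here under the assumption of roughly 100% amplification efficiency. The function and argument names are hypothetical.

```python
from statistics import mean


def ddct_fold_change(ct_target_test: list[float], ct_ref_test: list[float],
                     ct_target_ctrl: list[float], ct_ref_ctrl: list[float]) -> float:
    """Comparative Ct (2^-ddCt) fold change of a target gene.

    Inputs are technical-replicate Ct values; `ref` is the normalizer gene
    (GAPDH in panel e). Assumes ~100% amplification efficiency.
    """
    d_ct_test = mean(ct_target_test) - mean(ct_ref_test)
    d_ct_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    return 2.0 ** -(d_ct_test - d_ct_ctrl)
```

A target that crosses threshold one cycle earlier (relative to GAPDH) after expression of an isoform corresponds to a twofold increase in that sketch.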
Extended Data Figure 3.
3′-seq and RNA-seq tracks of functionally validated
CLL-IPAs.
Five CLL-IPAs were functionally validated. Their 3′-seq and
RNA-seq tracks are shown here and in Fig.
2a. Data are shown as in Fig.
1b. The corresponding RT-PCRs are shown in Extended Data Fig. 5a.
Extended Data Figure 4.
CLL-IPAs generate truncated mRNAs and proteins.
Gene models and western blots of 10 candidates depicted as in Figures 1b and 2a show that CLL B cells generate full-length and
IPA-generated truncated proteins. BLCL were used as control B cells and were
included in the 3′-seq tracks. ACTIN was used as loading control on
the same blots. For gel source data see Supplementary Fig. 1.
To rule out the possibility that the truncated proteins arise from proteolytic cleavage, we validated the
presence of the IPA-generated truncated mRNAs (Extended
Data Fig. 5a). Moreover, we were able to induce IPA isoform expression
through downregulation of splicing factors or through inhibition of 5′ splice
site recognition using an antisense oligonucleotide, indicating that deregulated mRNA
processing can cause expression of a truncated protein (Extended Data Fig. 5b)[11,12].
Extended Data Figure 5.
Validation of the IPA-generated truncated mRNAs and validation of their
stable expression over time.
(a) Detection of full-length and IPA-generated
truncated mRNAs by RT-PCR in normal B cells (CD5+B, BLCL) and CLL cells used
in the western blot validations shown in Fig.
2a and Extended Data Fig. 4.
All experiments were performed twice with similar results. Primers to
amplify the mRNA isoforms are located in the first and last exons shown in
the gene models and are listed in Supplementary Table S3.
HPRT was used as loading control. (b)
Induction of truncated mRNAs and proteins through shRNA-mediated knock-down
of splicing factors. All experiments were performed twice with similar
results. U2AF1 was knocked down in HeLa cells, U2AF2 in
HEK293 cells and hnRNPC in A549 cells. Shown as in (a),
except for NUP96, which is shown as in Extended
Data Fig. 4. NUP96 is derived from the NUP98 precursor. Induction of
DICER1 IPA by transfection of increasing amounts of antisense morpholinos
(MO) directed against the 5′ splice site of intron 23 of
DICER1 in HeLa cells. Shown are RT-PCRs.
(c) RT-PCRs, performed once, on expression of full-length
(FL) and IPA isoforms for eight CLL-IPAs in samples from two CLL patients
and control B cells (CD5+B, BLCL). The samples were collected over a time
interval of over 6 years. CLL11: T1, 17 months (mo) after diagnosis, T2, 24
mo, T3, 44 mo; CLL6: T1, 16 mo, T2, 49 mo, T3, 91 mo (42 mo after
treatment). Samples from all time points (except CLL6, T3) were obtained
from untreated patients. The primers for amplifications of the products were
located in the first and last exons shown in the gene models and are listed
in Supplementary Table
S3. Expression of HPRT serves as loading
control. The same gel picture of HPRT is shown in Fig. 3b for CLL samples and Extended Data Fig. 5a,
far right panel, for BLCL and CD5+ control samples. All tested CLL-IPA
isoforms were detectable at several time points during the course of the
disease. Compared with CD5+B cells, expression of FCHSD2 IPA was not
significantly upregulated in CLL. (d) Western blots of
full-length and IPA-generated truncated proteins from CARD11, DICER, and
SCAF4. All experiments were performed twice with similar results. ACTIN was
used as loading control on the same blot. Shown are samples from normal B
cells (BLCL) and two CLL patients, both at two different time points 0.5
– 10 months apart. For gel source data see Supplementary Fig. 1.
Many of the truncated proteins generated by CLL-IPAs are strikingly similar to
the predicted protein products produced by TR mutations, suggesting that CLL-IPAs may
functionally mimic the outcome of genetic mutations (Fig.
2b, Extended Data Fig. 6a). To test
this, we investigated the functional consequences of expression of IPA and full-length
protein isoforms of four candidates in malignant B cells. CARD11 is a positive regulator
of the NF-κB pathway and is important for lymphocyte survival and
proliferation[13]. We observed
substantial CARD11 IPA protein expression compared with only slightly increased CARD11 IPA
mRNA expression, suggesting that the truncated protein is more stable and may activate
the NF-κB signaling pathway more potently than the full-length protein (Fig. 2a)[14]. To test this, we selectively knocked down either full-length
CARD11 or CARD11 IPA in a malignant B cell line (TMD8) that expresses CARD11 IPA at levels comparable to those in
CLL cells (Extended Data Fig. 6b, 6c). We measured phospho-p65 (RELA) to assess NF-κB
activity and found significantly lower activity after knock-down of CARD11 IPA than of
the full-length protein (Fig. 2c, Extended Data Fig. 6d). Thus, CARD11 IPA activates
NF-κB more potently than full-length CARD11, suggesting that it may mimic
activating mutations present in high-grade lymphomas[13]. CARD11 IPA may contribute to NF-κB activation in CLL
where the signaling components are rarely mutated[15].
Extended Data Figure 6.
IPA-generated truncated proteins resemble the protein products of
truncating DNA mutations and have cancer-promoting properties.
(a) CARD11 IPA results in translation of intronic
nucleotides (grey) until an in-frame stop codon is encountered. This results
in the generation of 16 new amino acids (grey) downstream of exon 10. In the
case of MGA IPA three new amino acids downstream of exon 9 are generated.
(b) Western blot showing that TMD8 cells express similar
amounts of CARD11 IPA as CLL samples. The western blot is depicted as in
Fig. 2a and was performed twice.
ACTIN was used as loading control on the same blot. (c) Western
blot (as in b) showing full-length CARD11 as well as CARD11 IPA in TMD8
cells expressing a control shRNA (Co), an shRNA that exclusively knocks down
the full-length protein (FL) and two different shRNAs that exclusively
knock down the CARD11 IPA isoform (IPA). The experiment was performed twice
with similar results. GAPDH was used as loading control on the same blot.
(d) Endogenous phospho-NF-κB-p65 levels were
measured by FACS in TMD8 cells expressing the indicated shRNAs from (c).
Mean fluorescent intensity (MFI) values are shown in parentheses in FACS
plots of a representative experiment out of three. (e)
Immunoprecipitation of V5-DICER or V5-DICER IPA from HEK293T cells using an
anti-V5 antibody. The experiment was performed twice with similar results.
2.5% of input was loaded. (f) The extent of miRNA processing
depends on the expression levels of full-length (FL) DICER, but not IPA.
Shown are wild-type (WT) and DICER knock-out (KO) HCT116 cells.
Re-expression of different amounts of FL DICER1 protein in the KO cells
(measured by western blot of DICER1 in the top panel) results in different
levels of endogenous let-7 expression (measured by northern
blot in the bottom panel; compare lanes 3 and 4). Expression of DICER IPA
has no influence on miRNA processing (compare lanes 4 and 5). ACTIN and U6
were used as loading controls on the same blots, respectively. The
experiment was performed twice with similar results. (g)
Western blot of MGA. MGA and MGA IPA were cloned and expressed in HEK293T
cells to confirm the predicted protein size. The experiment was performed
twice with similar results. Shown is also the endogenous MGA expression in
Raji cells. ACTIN was used as loading control on the same blot. *, denotes
a nonspecific band. (h) Protein models of full-length and FOXN3
IPA are shown as in Fig. 2b. The
IPA-generated protein truncates the forkhead domain and is predicted to
lose the repressive activity. (i) As in (a), but for FOXN3.
FOXN3 IPA generates 32 new amino acids downstream of exon 2.
(j) FOXN3 IPA significantly de-represses expression of the
oncogenic targets MYC and PIM2.
Fold-change in mRNA level of endogenous genes in MEC1 B cells after
transfection of GFP-FOXN3 IPA compared with transfection of full-length
GFP-FOXN3. HPRT-normalized values are shown as boxplots (as
in Fig. 1e) from N = 5
biologically independent experiments, each performed in technical
triplicates. Two-sided t-test for independent samples was applied, **,
P = 0.002. For gel source data see Supplementary Fig. 1.
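Panels (a) and (i) describe how an IPA isoform continues translation into intronic sequence until the first in-frame stop codon. The sketch below counts those novel amino acids using the standard genetic code. It assumes that the retained coding sequence starts at the start codon and contains no premature stop; the function names and example sequences are hypothetical and are not the CARD11, MGA or FOXN3 sequences.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_until_stop(seq: str) -> str:
    """Translate from the first base until the first in-frame stop codon."""
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def novel_amino_acids(retained_cds: str, intron: str) -> str:
    """Amino acids encoded wholly or partly by intronic nucleotides.

    `retained_cds` runs from the start codon to the end of the last retained
    exon and is assumed to contain no stop codon. A codon spanning the
    exon-intron junction counts as novel. If the supplied intron has no
    in-frame stop, translation simply runs to the end of the sequence.
    """
    protein = translate_until_stop((retained_cds + intron).upper())
    return protein[len(retained_cds) // 3:]
```

Counting from the combined sequence, rather than translating the intron on its own, keeps the reading frame correct when the last retained exon ends mid-codon.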
DICER IPA generates a truncated protein that partially lacks the RNase IIIB
domain responsible for microRNA (miRNA) processing (Fig.
2b)[16]. In contrast to
full-length DICER, DICER IPA entirely lacks miRNA cleavage ability and mimics TR
mutations that remove both RNase III domains (Fig.
2b, 2d, Extended Data Fig. 6e, 6f)[16]. Although DICER IPA does not act as a
dominant-negative, its expression reduces the amount of functional DICER protein, thus potentially
decreasing endogenous miRNA expression.
The TSG MGA is targeted by TR mutations in CLL and solid
cancers (Fig. 2b)[3,7,17]. MGA negatively regulates the MYC
transcriptional program and represses genes with MYC and E2F binding sites in a
Polycomb-dependent manner[18,19]. Expression of MGA and MGA IPA from constructs confirmed the size of the MGA
IPA protein detected in CLL cells, and full-length MGA repressed MYC target gene
expression in malignant B cells (Fig. 2e, Extended Data Fig. 6g). Intriguingly, on genes with
binding sites for both MYC and E2F, MGA IPA acts as a dominant-negative regulator of
full-length MGA, as it significantly induced the expression of 5/6 genes in cells that
endogenously express full-length MGA (Fig. 2e).
However, as MGA IPA retains the N-terminal T-box, it still acts as a repressor on T-box
target genes (Fig. 2e).
Lastly, the IPA isoform of the transcriptional repressor FOXN3[20]
derepressed its oncogenic targets MYC and PIM2 (Extended Data Fig. 3, 6h–j). In summary,
CLL-IPA-generated proteins can contribute to cancer pathogenesis in various ways: their
generation can reduce expression of functional TSGs (DICER IPA, FOXN3 IPA), or they can
behave as dominant-negatives and thus act in an oncogenic manner (MGA IPA).
As all functionally validated CLL-IPAs produced dysfunctional proteins, we
investigated whether this is a general feature. We compared the retained fraction of amino
acids of IPA isoforms present in normal B cells (B-IPA, N = 2,690) with that of
CLL-IPAs. Although the full-length proteins targeted by IPA were of similar size,
CLL-IPAs lose significantly more amino acids than B-IPAs (Fig. 3a, Extended Data Fig. 7a). This
suggests that IPA in normal cells contributes to proteome diversity[2], whereas CLL-IPAs tend to produce dysfunctional
proteins.
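The comparison above rests on the fraction of the full-length coding region retained by each IPA isoform, compared between CLL-IPAs and B-IPAs with a two-sided Mann-Whitney test (Fig. 3a). A minimal standard-library sketch follows. It uses the large-sample normal approximation with a tie correction and no continuity correction, so P values can differ slightly from exact implementations such as `scipy.stats.mannwhitneyu` for small samples; the function names are hypothetical.

```python
import math


def retained_fraction(retained_aa: int, full_length_aa: int) -> float:
    """Fraction of the full-length coding region retained by an IPA isoform."""
    return retained_aa / full_length_aa


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x: list[float], y: list[float]) -> tuple[float, float]:
    """U statistic for x and a two-sided P value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    tie_counts: dict[float, int] = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

A rank-based test suits this comparison because retained fractions are bounded between 0 and 1 and need not be normally distributed.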
Figure 3.
TSGs are enriched among CLL-IPAs. CLL-IPAs and TR mutations in CLL target the
same genes but in different patients.
(a) The fraction of retained coding region (CDR) is shown
for genes that generate CLL-IPAs (N = 306, median fraction of
retained CDR = 0.21; 112 aa) and B-IPAs (N = 2,690, median
fraction of retained CDR = 0.45; 221 aa). ***, Two-sided Mann-Whitney test,
P = 1E-16. Box plots as in Fig. 1e. (b) RT-PCRs on expression of full-length (FL)
and IPA isoforms for two TSGs (DICER1, NUP98) in samples from two CLL patients
that were collected over a time interval of several years. CLL11: T1, 17 months
(mo) after diagnosis, T2, 24 mo, T3, 44 mo; CLL6: T1, 16 mo, T2, 49 mo, T3, 91
mo (42 mo after treatment). Shown are the exons that contain primers for
amplifications of the products. BLCL serve as control cells. Expression of
HPRT serves as loading control. (c) Genes that
are targeted by TR mutations in CLL and CLL-IPAs are shown (N =
36). Dark green bars indicate the fraction of retained CDR for each
IPA-generated protein. Black dots indicate the positions of TR mutations in CLL.
CLL-IPAs occur mostly in the vicinity of TR mutations or upstream of them
(two-sided Wilcoxon rank sum test, P = 0.004). Right panel, fraction of CLL
samples affected for each gene, either by the CLL-IPA (grey; fraction of the 59 CLL samples
with significantly upregulated expression of the IPA isoform) or by TR mutations (red).
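Panel (c), together with Extended Data Fig. 7c, summarizes where CLL-IPAs truncate proteins relative to TR mutations. A hedged sketch of one way to classify each CLL-IPA follows. It assumes positions are given in amino acids and uses the tolerance of 10% of the protein length stated in the Extended Data Fig. 7c legend; comparing against the most C-terminal mutation and the function name are assumptions, not the published procedure.

```python
def classify_ipa_position(ipa_aa: int, mutation_aas: list[int], protein_length: int,
                          tolerance: float = 0.10) -> str:
    """Place a CLL-IPA truncation point relative to TR mutation positions (in aa).

    Compares against the most C-terminal TR mutation, i.e. the closest one
    whenever the IPA lies downstream of all mutations.
    """
    last_mutation = max(mutation_aas)
    if ipa_aa <= last_mutation:
        return "upstream"
    if ipa_aa - last_mutation <= tolerance * protein_length:
        return "within_tolerance"
    return "downstream"
```

Tallying these categories across genes gives the kind of summary the legends describe, in which most CLL-IPAs fall upstream of or close to TR mutations.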
Extended Data Figure 7.
Inactivation of TSGs by CLL-IPAs independently of DNA mutations.
(a) The distribution of full-length protein size of
genes that generate CLL-IPAs (N = 306) and B-IPAs
(N = 2,690) is shown in amino acids (aa). Boxplots as
in Fig. 1e. Two-sided Mann-Whitney
test, P = 0.87. (b) TR rate (ratio of TR
mutations over all mutations) is shown for known TSGs obtained from
(5). Boxplots as in Fig.
1e. Two-sided Mann-Whitney test, P = 1E-155.
(c) Known TSGs, obtained from (5) that are
targeted by CLL-IPAs (N = 21) are shown. Dark green bars
indicate the fraction of retained CDR for each IPA-generated protein. Black
dots indicate the hotspot positions of TR mutations obtained from the MSK cBioPortal.
CLL-IPAs mostly occur upstream of the mutations or within a distance of 10% of the overall
protein length (aa) from them (two-sided Wilcoxon rank test, P = 0.04).
(d) Contingency table for enrichment of TSGs among genes
that generate CLL-IPAs. P value was obtained from two-sided
Fisher’s exact test. TSGs were obtained from (5).
(e) TSGs and genes that generate CLL-IPA isoforms have
longer CDRs than genes that do not generate IPA isoforms. Boxplots as in
Fig. 1e. Two-sided Kruskal-Wallis
test, P = 1E-80. (f) Five control gene lists
(N = 306, each) with a similar size distribution as
CLL-IPAs and expressed in CLL were tested for enrichment of TSGs. Shown is
the number of TSGs found. A chi-square test did not show a significant
enrichment of TSGs among the control genes. (g) Contingency
table for enrichment of TR mutation genes in CLL among genes that generate
CLL-IPAs. P value was obtained from two-sided
Fisher’s exact test. (h) ZMYM5 is truncated by a TR
mutation and an IPA isoform in the same patient, but the aberrations are
predicted to result in different truncated proteins. A 10 bp deletion in
exon 3 results in a frame-shift leading to the generation of a truncated
ZMYM5 protein, whereas ZMYM5 IPA (not yet annotated) produces a truncated
protein containing 352 more amino acids in the same patient. The genes shown
in Extended Data Fig. 7h and 7i are the only genes, out of N = 268 tested, with
simultaneous presence of a TR mutation and a CLL-IPA. The position of the TR mutation is indicated in green. CLL7 and
CLL11 3′-seq and RNA-seq tracks are shown for comparison.
(i) MGA is truncated by a TR mutation and an IPA isoform in
the same patient. The TR mutation affects the 5′ splice site of
intron 7, thus generating two additional amino acids downstream of exon 7,
whereas the IPA isoform encodes a truncated MGA protein containing three
more amino acids downstream of exon 9. Mutation and 3′-seq analysis
were performed once. CLL7 and CLL11 are shown for comparison.
(j) Shown are additional recurrent (N
> 1) DNA mutations found by exome sequencing of CLL patient samples
stratified by a high or low number of CLL-IPAs per patient. Only the top and
bottom 16 samples with high or low CLL-IPAs are shown to normalize the
number of samples analyzed. This analysis is only descriptive and no test
was performed. (k) Significant enrichment of SF3B1 mutations in
the group of CLL samples with abundant CLL-IPA isoforms. Two-sided
Mann-Whitney test was performed. (l) Abundance of CLL-IPAs is
not associated with IGVH mutational status. Shown is the number of CLL-IPAs
per sample for patients with mutated (MUT, N = 30) or
unmutated (UN, N = 21) IGVH genes. Boxplots as in Fig. 1e. Two-sided Mann-Whitney test,
P = 0.4.
As genes targeted by TR mutations are often TSGs (Extended Data Fig. 7b)[5], we
investigated whether TSGs are overrepresented among CLL-IPAs. Compared with control gene sets
matched for protein size, there was a significant enrichment of TSGs among CLL-IPAs
(P = 3E-5; Extended Data Fig.
7c–f). Importantly,
IPA-generated truncated proteins usually lack a comparable or greater number of
amino acids than truncated proteins generated by TR mutations, suggesting that the IPA
isoforms are inactive (Extended Data Fig.
7c). However, for CLL-IPAs to inactivate TSGs, they must also be stably
expressed. For 11/12 tested CLL-IPAs, we observed stable expression at the mRNA or
protein level over a four-year time span (Fig. 3b,
Extended Data Fig. 5c, 5d), indicating that they have the potential to inactivate
TSGs.
In addition to TSGs in general, we found that genes inactivated by TR mutations
in CLL are enriched among CLL-IPAs (Fig. 3c, Extended Data Fig. 7g)[3,7,8]. Strikingly, the fraction of samples affected by
CLL-IPAs was substantially larger than the fraction of CLL samples affected by TR mutations
(3.0–85% vs 0.13–2.0%; Fig. 3c, right
panel). This indicates that TR mutations and CLL-IPAs target the same genes in different
patient groups, thus substantially expanding the proportion of patients with protein
truncations in potential drivers.
To rule out the possibility that CLL-IPAs are caused by somatic mutations, we examined the
presence of DNA mutations in the CLL-IPA genes. Two genes were targeted by both a TR mutation
and IPA in the same patient. Interestingly, the two inactivation mechanisms are
predicted to generate different truncated protein products, suggesting that they
occurred independently (Extended Data Fig. 7h,
7i)[3]. The mutation data also enabled us to associate CLL-IPAs with
specific somatic mutations. Notably, CLL samples with a high number of CLL-IPAs were
enriched in SF3B1 mutations, whereas CLL-IPA abundance was independent of IGVH mutational status (Extended Data Fig. 7j–l).
Because of the enrichment of known TSGs among CLL-IPAs, we examined whether CLL-IPAs
may enable us to identify novel TSGs. We selected CLL-IPAs present in at least 20% of
CLL samples (N = 199, generated from 190 genes; Fig. 4a, Supplementary Tables S1, S2). We next investigated whether these genes are inactivated by TR mutations in
solid cancers, using mutations from more than 86,000 tumors compiled by the MSK
cBioPortal[4]. We observed that 72%
of these genes (136/190) are frequently affected by TR mutations in solid tumors
(Fig. 4b). This is a
significant enrichment over background; of these 136 genes, 17 are known TSGs and 119
are novel TSG candidates (Extended Data Fig. 8a, 8b)[5]. Again, CLL-IPAs lack more amino acids than, or a comparable number of amino acids to,
the proteins produced by TR mutations, suggesting that CLL-IPAs inactivate the functions
of these genes (Extended Data Fig. 8a).
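The gene-level classification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes per-gene counts of truncating and total somatic mutations (such as could be exported from the MSK cBioPortal), applies the 0.12 cutoff given in the Fig. 4b legend, and includes a generic two-sided Fisher's exact test of the kind reported for the enrichment in Extended Data Fig. 8b. The function names, the input layout and the handling of genes exactly at the cutoff are hypothetical.

```python
from math import comb

# Cutoff from the Fig. 4b legend: local minimum of the bimodal distribution
# of TR mutations / all mutations in solid tumors.
TR_RATE_CUTOFF = 0.12


def tr_rate(n_truncating: int, n_all: int) -> float:
    """Fraction of a gene's somatic mutations that are truncating."""
    return n_truncating / n_all if n_all else 0.0


def classify_genes(mutation_counts: dict) -> tuple:
    """Split genes into high- and low-TR-rate groups.

    mutation_counts maps gene -> (n_truncating, n_all). Treating a rate
    exactly equal to the cutoff as 'low' is an assumption of this sketch.
    """
    high, low = [], []
    for gene, (n_tr, n_all) in mutation_counts.items():
        (high if tr_rate(n_tr, n_all) > TR_RATE_CUTOFF else low).append(gene)
    return sorted(high), sorted(low)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are kept.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)
```

For the textbook table [[1, 9], [11, 3]], `fisher_exact_two_sided(1, 9, 11, 3)` gives approximately 0.0028. Summing all tables no more probable than the observed one is the usual definition of the two-sided exact P value.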
Figure 4.
Novel TSG candidates are inactivated in CLL at the mRNA level and in solid
tumors at the DNA level.
(a) Color-coded IPA usage for a subset of CLL-IPAs (97 of the 199 CLL-IPAs
with significant expression of IPA in ≥ 20% of CLL samples).
Gene names and the number of affected CLL samples per CLL-IPA are indicated (blue
bars, 3′-seq; green bars, RNA-seq). (b) Truncating mutation
rates (number of TR mutations/all mutations) in solid tumors, obtained from the
MSK cBioPortal for genes that generate abundant CLL-IPAs, partially shown in
(a). The bimodal distribution was separated at the local minimum (TR
mutation/all mutations = 0.12, red line) into two gene groups: those rarely
targeted by TR mutations and those with high TR mutation rates in solid cancers,
defined as novel TSG candidates. (c) TR mutation rates of known and
novel TSG candidates. Two-sided Mann-Whitney test, **, P =
0.0002. Box plots as in Fig. 1e.
(d) As in (c), but for overall mutation rates. Two-sided
Mann-Whitney test, ***, P = E-10. (e) CHST11
protein models as in Fig. 2b. Loops depict
membrane domains. A chromosomal translocation in CLL results in fusion of the
immunoglobulin heavy chain locus (IGH) with a truncated CHST11
(23). (f) Western blot of WNT5B, performed
once, shown as in Fig. 2a, from cell
lysates or conditioned media (CM) of B cells stably expressing GFP, GFP-CHST11
or GFP-CHST11 IPA. CM from cells expressing CHST11 IPA contains unglycosylated
WNT5B (25). (g) CM from samples described in (f)
was added to HEK293T cells expressing a WNT reporter. Shown is normalized
luciferase activity as mean ± SD from N = 7 biologically
independent experiments. Two-sided Kruskal-Wallis test: **, P =
0.002; P value of two-sided Mann-Whitney test was adjusted for
multiple testing, **, Padj = 0.002.
Extended Data Figure 8.
Novel TSG candidates and validation of CHST11 IPA as cancer-promoting
isoform.
(a) As in Fig. 3c,
but shown are known (red gene names) and novel TSG candidates (black gene
names) among the abundant CLL-IPAs. CLL-IPAs likely inactivate these genes,
as the IPA-generated truncations mostly occur upstream of, or within 10% (of overall aa length) of, the
TR mutations. Two-sided Wilcoxon rank sum test performed on all 136 TSGs,
P = E-8; two-sided Wilcoxon rank sum test performed on
the novel TSGs, N = 119, P = E-8. The position
of the TR mutation was obtained from the MSK
cBioPortal and indicates the hotspot mutation. Right panel, the fraction of CLL
samples affected represents the fraction of CLL samples (out of 59) with
significant expression of the IPA isoform. Genes were included if they were
affected in at least 20% of samples investigated either by 3′-seq or
RNA-seq. (b) Contingency table for enrichment of novel TSGs
among highly recurrent CLL-IPAs. P value was obtained from
two-sided Fisher’s exact test. (c) Known TSGs have larger
protein sizes than novel TSG candidates. Boxplots as in Fig. 1e.
Two-sided Mann-Whitney test, **, P = 0.005. The higher
overall mutation rate of known TSGs correlates with their larger protein size.
Spearman's correlation coefficient, r = 0.74, P =
E-6. (d) CHST11 IPA generates 18 new amino acids (grey)
downstream of exon 1. (e) Experimental set-up to measure
paracrine WNT activity produced by MEC1 B cells either expressing GFP,
GFP-CHST11 or GFP-CHST11 IPA and using a WNT reporter expressed in HEK293T
cells. Primary CLL cells and the CLL cell line MEC1 express several WNTs,
including WNT5B. In the presence of CHST11, WNT (red dots) binds to sulfated
proteins on the surface of WNT-producing cells, whereas WNT is secreted into
the medium in the presence of CHST11 IPA. WNT-containing conditioned medium activates a
WNT reporter in HEK293T cells. This set-up refers to Fig. 4f and 4g. (f) Western blot for WNT5, performed once,
shown as in Fig. 4f, but including HeLa
cells as a positive control for WNT5 expression. ACTIN was used as a loading
control on the same blot.
Although the TR mutation rates of the novel TSG candidates were comparable with
known TSGs found at the lower end of the spectrum, their protein size and overall
mutation rates were substantially lower (Fig. 4c,
4d, Extended
Data Fig. 8c). This may explain why these potentially cancer-relevant genes
have been overlooked thus far[21]. As
they are targeted at the mRNA level in leukemia and at the DNA level in solid cancers,
they should be considered a novel class of TSG candidates. To support this, we
functionally validated a highly recurrent CLL-IPA isoform that affects a poorly characterized
cancer gene. CHST11 encodes a Golgi-associated carbohydrate
sulfotransferase that modifies chondroitin on the surface of WNT-expressing cells. The
modification results in binding of secreted WNT and prevents its paracrine
action[22]. CHST11 IPA lacks
catalytic activity, but retains the cytoplasmic tail (Fig.
4e, Extended Data Fig. 8d) [23]. As exclusive expression of the
cytoplasmic tail of Golgi enzymes inhibited localization of full-length
enzymes[24], we hypothesized
that CHST11 IPA may act as a dominant-negative isoform. We expressed CHST11 and CHST11 IPA
in MEC1 B cells, collected the conditioned media and detected secreted WNT only in media
from cells expressing CHST11 IPA (Fig. 4f, Extended Data Fig. 8e, 8f)[25]. The conditioned
media activated a WNT reporter in HEK293T cells (Fig.
4g), demonstrating that CHST11 IPA enabled paracrine WNT action on
neighboring cells through dominant-negative action. Thus, in addition to mutations in
the WNT pathway[26], CLL-IPAs may also
contribute to WNT activation in CLL.
A member of this novel class of TSGs was recently found in breast cancers, where
tumor-specific expression of MAGI3 IPA generates a truncated protein with
dominant-negative activity (Extended Data Fig.
9a)[27]. Combined with
our findings on T-ALL (T-lineage acute lymphoblastic leukemia), where we detected more
than 100 IPA isoforms (Extended Data Fig. 9b),
these data indicate that cancer-upregulated IPA isoforms are not restricted to CLL.
Extended Data Figure 9.
Cancer-upregulated IPA isoforms are also detected in breast cancer and
T-ALL.
(a) MAGI3 is a TSG that is preferentially targeted by
IPA in breast cancer (27). Shown is the mutation profile
obtained from the MSK cBioPortal. (b) Expression of IPA isoforms
in T-ALL detected by RNA-seq. Shown are 3′-seq and RNA-seq tracks of
a representative mRNA (out of N = 101) from CLL samples,
T-ALL samples and normal thymus. The T-ALL RNA-seq data were obtained from
(32). We detected N = 381 IPA isoforms
in at least one T-ALL sample, N = 133 in at least one
thymus sample, N = 104 in at least one T-ALL and one thymus
sample, and N = 101 in at least two T-ALL samples, but not
in any of the thymus samples.
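The T-ALL-specific set in Extended Data Fig. 9b (isoforms detected in at least two T-ALL samples but in none of the thymus samples) amounts to a simple set filter. The sketch below is illustrative only; it assumes per-isoform sets of samples with detected IPA expression, and the function and sample names are hypothetical.

```python
def tall_specific(ipa_positive_samples: dict, tall_samples: set,
                  thymus_samples: set, min_tall: int = 2) -> set:
    """IPA isoforms seen in >= min_tall T-ALL samples and in no thymus sample.

    ipa_positive_samples maps an IPA isoform ID to the set of sample IDs in
    which that isoform was detected (hypothetical input layout).
    """
    selected = set()
    for ipa, samples in ipa_positive_samples.items():
        n_tall = len(samples & tall_samples)
        in_thymus = bool(samples & thymus_samples)
        if n_tall >= min_tall and not in_thymus:
            selected.add(ipa)
    return selected
```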
In summary, we found that TSGs can be inactivated, either in full or partially,
by IPA. Even partial loss of TSG function was shown to critically contribute to
tumorigenesis[28]. As CLL-IPAs
are not generated by DNA mutations in their corresponding transcription units, DNA and
mRNA alterations occur in different patient groups. In CLL, the fraction of patients
whose TSGs are inactivated by CLL-IPAs is considerably larger than TSG disruption by TR
mutations (Fig. 3c); thus, CLL-IPAs substantially expand the number of patients with
affected drivers. Moreover, these data identify a class of TSGs that are predominantly
inactivated at the mRNA rather than
the DNA level[27]. Thus, our study
demonstrates that cancer-gained changes in mRNA processing can functionally mimic the
effects of somatic mutations and shows the need to go beyond genomic analyses in cancer
diagnostics.
Methods
Samples for 3′-seq and RNA-seq analyses
Samples were obtained from untreated CLL patients seen at Memorial Sloan
Kettering Cancer Center, New York (Extended Data
Table 1a). All patients provided written informed consent before
participating in the study. The sample collection was approved by the
Institutional Review Board of Memorial Sloan Kettering Cancer Center. Peripheral
blood mononuclear cells (PBMCs) from CLL samples with a minimum white blood cell
count of 75,000/μl were isolated by Ficoll (GE Healthcare) gradient
centrifugation at 400 rcf for 30 min, followed by two washes in PBS at room
temperature. Cells were treated with red blood cell lysis buffer (155 mM NH4Cl,
12 mM NaHCO3, 0.1 mM EDTA) for 5 min at room temperature and were washed twice
with PBS. Pure CLL B cells were obtained from PBMCs using the B-CLL isolation kit
(Miltenyi Biotec), which isolates untouched CLL cells by depleting other cells with a cocktail of
magnetic beads coated with antibodies against CD2, CD3, CD4, CD14, CD15, CD16, CD56, CD61, CD235a,
FcεRI and CD34. The purity of CLL B cells (CD5+ and CD19+) was analyzed
by FACS, and the cells were immediately lysed in TRI Reagent (Ambion) for RNA
extraction, followed by 3′-seq or RNA-seq library preparation.
For longitudinal analyses, samples from two patients were investigated
at different time points during the course of the disease. CLL11: time point 1
(T1), 17 months (mo) after diagnosis; T2, 24 mo after diagnosis; T3, 44 mo after
diagnosis. This patient was not treated with chemotherapy during the sample
collection period. CLL6: T1, 16 mo after diagnosis; T2, 49 mo; T3, 91 mo (42 mo
after chemotherapeutic treatment).
In addition to the newly generated CLL 3′-seq data, we also used
3′-seq data from normal tissues, cell lines and immune cell subsets that
were previously generated by us (Extended Data
Table 1b)[1,2].
We performed RNA-seq on 11 CLL samples (Supplementary Table S1) and
obtained access to a previously published RNA-seq data set from 44 CLL
patients[3], which was
kindly provided by Dr. Dan A. Landau (NY Genome Center). RNA-seq data from
normal immune cells were obtained from samples previously generated by us (Extended Data Table 1c)[2]. For validation of 3′-seq data, we
also used publicly available RNA-seq data (tonsil-derived NB, GSE45982
(GSM1129340–GSM1129347)[29]; blood-derived NB, ERR431624 and ERR431586[30]; CD3+ T cells, GSM1576415[31]) and 3′-end sequencing
data[10].
For RNA-seq-based identification of IPA isoforms expressed in T-ALL, we
used publicly available RNA-seq data from 10 primary T-ALL samples and 2 whole
human thymus extracts (GSE57982)[32].
FACS sorting of immune cell populations
Cells were washed with ice-cold PBS once, incubated with appropriate
fluorochrome-conjugated antibodies for 30 min at 4°C and
washed twice with ice-cold PBS containing 0.5% FCS. The following antibodies
were used: anti-CD3-PE (mouse, BD Biosciences, 555333), anti-CD5-FITC (mouse, BD
Biosciences, 555352), anti-CD14-PECy7 (mouse, eBioscience, 25–0149-42),
anti-CD19-APC (mouse, BD Biosciences, 555415), anti-CD27-PE (mouse, BD
Biosciences, 555441), anti-CD38-APC (mouse, BD Biosciences, 555462),
anti-CD38-FITC (mouse, BD Biosciences, 555459). Surface protein expression was
detected by a BD FACSCalibur cell analyzer (BD Biosciences) and data were
analyzed using the FlowJo software.
3′-seq and RNA-seq analyses
3′-seq libraries were generated as previously described and
sequenced with Illumina HiSeq using single-end 50 nt reads[1,2].
RNA-seq libraries were prepared at the Weill Cornell and MSKCC Genomics Core
facilities.
Analysis of 3′-seq data was performed as described previously[1], with a few
modifications that are described in detail in Singh et al.[2]. Briefly, a gene was considered to be
expressed in a particular cell type if either the IPA isoform (≥ 5 TPM, transcripts per million) or the full-length isoform
(≥ 5.5 TPM) was expressed in at least 75% of the samples of that cell
type. We focused our analysis on robustly expressed transcript isoforms and
filtered 3′-seq peaks according to their usage. Robustly expressed
3′UTR isoforms that are part of the atlas are expressed with at least 3
TPM in at least one sample, and each peak accounts for at
least 10% of all reads that map to the 3′UTR. Robustly expressed IPA
isoforms that are part of the atlas are expressed with 5 TPM or more and have an
IPA site usage of ≥ 0.1 in at least one sample. IPA site usage is the
expression of an IPA isoform relative to the total expression of the
3′UTR isoforms (the sum of all reads that fall into robust 3′UTR peaks).
We only analyzed IPA isoforms of protein-coding genes.
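The expression and atlas filters above can be written down compactly. The sketch below is a minimal illustration, not the published pipeline: it takes per-sample TPM values as plain lists, computes IPA site usage exactly as defined in the text (IPA isoform expression divided by the summed expression of robust 3′UTR peaks), and assumes that the paired criteria (TPM and read fraction, or TPM and usage) must hold in the same sample and that, for the gene-expression call, a sample counts if either isoform passes its threshold. These interpretations and all function names are assumptions.

```python
from typing import Sequence

# Thresholds as stated in the Methods text.
IPA_EXPR_TPM = 5.0       # IPA isoform TPM for the gene-expression call
FULL_EXPR_TPM = 5.5      # full-length isoform TPM for the gene-expression call
MIN_SAMPLE_FRAC = 0.75   # fraction of samples of one cell type


def gene_is_expressed(ipa_tpm: Sequence[float], full_tpm: Sequence[float]) -> bool:
    """Assumption: a sample counts if either isoform passes its threshold."""
    passing = sum(1 for i, f in zip(ipa_tpm, full_tpm)
                  if i >= IPA_EXPR_TPM or f >= FULL_EXPR_TPM)
    return passing >= MIN_SAMPLE_FRAC * len(ipa_tpm)


def ipa_site_usage(ipa_tpm: float, robust_utr_tpms: Sequence[float]) -> float:
    """IPA isoform expression relative to summed robust 3'UTR isoform expression."""
    total = sum(robust_utr_tpms)
    return ipa_tpm / total if total > 0 else 0.0


def is_robust_utr_peak(peak_tpm: Sequence[float], peak_read_frac: Sequence[float]) -> bool:
    """>= 3 TPM and >= 10% of 3'UTR reads, assumed to hold in the same sample."""
    return any(t >= 3.0 and f >= 0.10 for t, f in zip(peak_tpm, peak_read_frac))


def is_robust_ipa(ipa_tpm: Sequence[float],
                  robust_utr_tpms: Sequence[Sequence[float]]) -> bool:
    """>= 5 TPM and usage >= 0.1, assumed to hold in the same sample."""
    return any(t >= 5.0 and ipa_site_usage(t, utr) >= 0.1
               for t, utr in zip(ipa_tpm, robust_utr_tpms))
```

For example, an IPA isoform at 2 TPM in a sample whose robust 3′UTR peaks sum to 10 TPM has a usage of 0.2 under this definition.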
Validation of IPA isoforms using external data sources
To obtain evidence for IPA isoforms from independent methods, we first
used RNA-seq data obtained from the same RNA or from the same cell type to
identify IPA isoforms. We used the coordinates of the IPA events obtained from
3′-seq and compared the RNA-seq read counts in windows of 100 nucleotides
(nt) located upstream and downstream of the IPA peak using a GLM (Extended Data Fig. 1a)[2]. The two windows were separated by a 51-nt gap
centered on the first nt of the polyadenylation signal. Not all IPA isoforms
could be tested; for example, if the defined windows overlapped with an
annotated exon, the IPA event was excluded from further analysis. An IPA isoform
was considered present if we detected a significant difference between the read counts
in the upstream and downstream windows (Padj < 0.1)
using DESeq. This analysis was also used to validate CLL-gained IPA events in an
independent CLL data set.
We further regarded an IPA isoform as validated if reads that overlap
with IPA peaks had at least four untemplated adenosines in the RNA-seq data and
a polyadenylation signal (or one of its variants)[33] was detected within 50 nt upstream of
the read. In addition, we considered IPA isoforms as validated if we detected
read evidence in independent 3′-seq data sets[10]. As no previous 3′-end sequencing
data exist for many of our cell types, we also included highly expressed
(≥ 10 TPM and ≥ 0.1 IPA site usage) IPA isoforms with an upstream
polyadenylation signal (AAUAAA and its variants)[33] in our downstream analysis.
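The geometry of the RNA-seq test windows and the poly(A)-read criterion can be sketched as below. This is an illustration under explicit assumptions, not the authors' implementation: the 51-nt gap is read as spanning 25 nt on either side of the first nt of the polyadenylation signal, windows are inclusive genomic intervals returned in transcript orientation, the untemplated tail (for example a soft-clipped read end) is assumed to have been extracted already, and the hexamer list contains the canonical signals plus a few commonly reported variants rather than the exact list of ref. [33]. All names are hypothetical.

```python
# Canonical PAS hexamers plus some commonly reported variants (DNA alphabet).
# Assumption: this is not the exact variant list from ref. [33].
PAS_HEXAMERS = ("AATAAA", "ATTAAA", "AGTAAA", "TATAAA", "CATAAA",
                "GATAAA", "AATATA", "AATACA", "AATAGA", "ACTAAA")


def ipa_windows(pas_start: int, strand: str, width: int = 100, gap: int = 51):
    """(upstream, downstream) windows as inclusive genomic intervals.

    The gap is centred on the first nt of the PAS; the windows are returned
    in transcript orientation, so they are swapped on the '-' strand.
    """
    half = gap // 2  # 25 nt on each side of the PAS first nt
    left = (pas_start - half - width, pas_start - half - 1)
    right = (pas_start + half + 1, pas_start + half + width)
    return (left, right) if strand == "+" else (right, left)


def overlaps(a, b) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_testable(windows, exons) -> bool:
    """Exclude the event if either window overlaps an annotated exon."""
    return not any(overlaps(w, e) for w in windows for e in exons)


def leading_a_run(tail: str) -> int:
    """Length of the run of A bases at the start of the untemplated tail."""
    n = 0
    for base in tail.upper():
        if base != "A":
            break
        n += 1
    return n


def read_supports_ipa(untemplated_tail: str, upstream_seq: str,
                      min_a: int = 4, max_dist: int = 50) -> bool:
    """>= min_a untemplated As and a PAS within max_dist nt upstream of the read end."""
    window = upstream_seq.upper()[-max_dist:]
    has_pas = any(h in window for h in PAS_HEXAMERS)
    return leading_a_run(untemplated_tail) >= min_a and has_pas
```

The window counts themselves would then be compared with a count-based GLM (DESeq in the text); that step is not reproduced here.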
Identification of the normal counterpart of CLL and of CLL-IPAs
Hierarchical clustering was performed on the normal human B cell subsets
derived from lymphoid tissues or peripheral blood and CLL samples using RNA-seq
derived mRNA expression levels (quantile normalized log2 reads per
kilobase of transcript per million mapped reads (rpkm)). Genes expressed with
greater than 5.5 rpkm in 75% of normal B cells or in any of the CLL samples were included
in the analysis. The 20% most variable genes by median absolute deviation
across the data set were used for the clustering. The heatmap was generated
using aheatmap (http://cran.r-project.org/package=NMF) with row scaling. This
analysis showed that lymphoid tissue-derived CD5+B cells are most closely
related in their gene expression profile to CLL cells (Extended Data Fig. 2).
We performed hierarchical unsupervised clustering of CLL and control
samples based on IPA site usage to test whether IPA site usage separates normal and
malignant B cells (Extended Data Fig. 1c).
The top 20% most variable genes by median absolute deviation across all the
CD5+B and CLL samples were used. This analysis revealed two main clusters: four
CLL samples (CLL4, CLL7, CLL11, CLL12) clustered separately from the rest of the
samples, and within the remaining samples, the control group (CD5+B)
clustered separately. The four CLL samples that differed most from the rest
of the samples had a high number of significantly upregulated IPA isoforms (CLL
high; median number of CLL-IPAs per sample, N = 100; range,
N = 42–274), whereas the remaining samples had a
low number of CLL-IPAs (CLL low; median, N = 9; range,
N = 5–28; Extended Data Fig. 1e).
To identify CLL-upregulated IPA isoforms, we applied a GLM[1,2,34] and tested
usage of each IPA isoform between the normal B cell group and each CLL sample.
We only considered IPA isoforms that were significantly upregulated in CLL
(FDR-adjusted P value < 0.1, usage difference between CLL and CD5+B
≥ 0.05) and were either not expressed or expressed at low levels in CD5+B cells (TPM
< 8, corresponding to the 75% quantile of CD5+B TPM). This resulted in 931
significantly upregulated IPA events observed in 13 CLL samples.
Of these, 454 IPA events were detected in only a single sample
and were regarded as non-recurrent, whereas 477 IPA events occurred in more than
one sample (≥ 2/13) and were considered recurrent events by
3′-seq (Extended Data Fig. 1d). The
recurrent events corresponded to 168 recurrent CLL-IPA isoforms.
As CLL-IPAs are detectable by RNA-seq, we used an independent RNA-seq
data set containing 46 CLL samples for validation[3]. We verified up to 71% of testable IPAs
by this independent method and data set. Because of the high validation rate, we
combined the two data sets (N = 59 CLL samples) and focused on
CLL-IPAs present in more than 10% of the whole CLL sample cohort. This resulted
in 330 CLL-upregulated IPA isoforms derived from 306 genes (Supplementary Table S1). This list
of 330 CLL-upregulated IPA isoforms contains the 168 CLL-IPAs identified in at
least 2/13 3′-seq samples, but also CLL-IPA isoforms detected in
one 3′-seq sample and in at least five additional RNA-seq samples (≥ 6/59
total samples).
We detected 33 IPA events that showed significantly higher IPA site
usage in CD5+B cells compared with CLL. IPA site usage was required to be higher
than in 2 CLL samples (TPM < 10, corresponding to 75% quantile for CLL
TPM; FDR-adjusted P value < 0.1, usage difference between CLL and CD5+B
≥ 0.05, Supplementary
Table S1).
The fractions of CLL patients affected by IPA or TR mutations shown in
Fig. 3d, Extended Data Fig. 7c, and 8a
were calculated as follows: if the CLL-IPA isoform was testable by RNA-seq, all
59 CLL samples were considered. If the CLL-IPA isoform could not be tested by
RNA-seq (for example, because the upstream exon is located too close to the IPA
isoform), only the 13 CLL samples analyzed by 3′-seq were used
to calculate the fraction of samples with significant expression of
the IPA isoform.
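The calling and counting rules in this section can be summarized in a short sketch. It is a minimal illustration, not the authors' code: it assumes that per-sample GLM results (FDR-adjusted P value, usage difference versus CD5+B, and CD5+B TPM) are already available, applies the stated thresholds, groups upregulated events by isoform to separate recurrent (≥ 2 samples) from single-sample isoforms, and computes the fraction of affected patients with the 59- or 13-sample denominator described above. The data class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpaTest:
    """One GLM comparison of an IPA isoform: CD5+B group vs one CLL sample."""
    ipa_id: str
    sample: str
    padj: float        # FDR-adjusted P value
    usage_diff: float  # IPA site usage in the CLL sample minus CD5+B usage
    cd5b_tpm: float    # IPA isoform TPM in CD5+B cells


def is_cll_upregulated(t: IpaTest) -> bool:
    """Thresholds from the text: padj < 0.1, usage diff >= 0.05, CD5+B TPM < 8."""
    return t.padj < 0.1 and t.usage_diff >= 0.05 and t.cd5b_tpm < 8


def split_by_recurrence(tests, min_samples: int = 2):
    """Return (recurrent, single-sample) sets of upregulated IPA isoform IDs."""
    samples_per_ipa = {}
    for t in tests:
        if is_cll_upregulated(t):
            samples_per_ipa.setdefault(t.ipa_id, set()).add(t.sample)
    recurrent = {i for i, s in samples_per_ipa.items() if len(s) >= min_samples}
    return recurrent, set(samples_per_ipa) - recurrent


def fraction_affected(n_positive: int, rnaseq_testable: bool,
                      n_all: int = 59, n_3seq: int = 13) -> float:
    """Use all 59 samples if testable by RNA-seq, else only the 13 3'-seq samples."""
    return n_positive / (n_all if rnaseq_testable else n_3seq)
```

Counting distinct samples per isoform, rather than raw rows, keeps a duplicated test result from making an isoform look recurrent.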
Cell lines
B lymphoblastoid cells (BLCL) are Epstein-Barr virus-immortalized human
blood B cells[1]. MEC1 cells are
malignant B cells from B-prolymphocytic leukemia and were provided by Dr.
Abdel-Wahab. Raji and TMD8 cells are malignant B cells from lymphomas and were a
gift from Dr. Hans-Guido Wendel (MSKCC). HEK293 and HEK293T cells (embryonic
kidney), HeLa cells (cervical cancer) and A549 cells (lung adenocarcinoma) were
purchased from ATCC. Wild-type and DICER KO HCT116 cells were generously
provided by V. Narry Kim (Seoul National University)[35]. BLCL, MEC1, and Raji cells were
cultured in RPMI with 20% FBS and 1% penicillin/streptomycin. HEK293, HEK293T,
HeLa, and A549 were cultured in DMEM with 10% FBS and 1%
penicillin/streptomycin, whereas HCT116 cells were cultured in McCoy's
media with 10% FBS and 1% penicillin/streptomycin.
Western blotting
Cells were lysed on ice for 30 min with RIPA buffer (50 mM Tris pH 7.4,
150 mM NaCl, 1% NP-40, 1% Na-deoxycholate, 1 mM EDTA, 0.05% SDS), containing
freshly added protease inhibitor cocktail (Thermo Scientific). For MGA, NUP98,
SGK223 and DICER immunoblotting, cell lysates were run on 3–8%
Tris-Acetate NuPAGE® gels with Tris-Acetate running buffer (Life
Technologies). For CARD11, AKAP10, BAZ1B, SENP1, CUL3 and RIPK1, 4–12%
Bis-Tris NuPAGE® gels (Life Technologies) were run with
MOPS running buffer, and all other proteins were run with MES running buffer
(National Diagnostics). The separated proteins were transferred to nitrocellulose
membranes (Bio-Rad, 1620252), blocked with Odyssey Blocking Buffer (Li-Cor,
927–40000) for 1 hour at room temperature, followed by incubation with
primary antibodies at 4°C overnight. After two washes
using PBS and 0.1% Tween 20 (PBST), the blots were incubated with
IRDye-conjugated secondary antibodies for 50 min at room temperature. After one
wash with PBST and two washes with PBS, proteins were detected with the Odyssey CLx
imaging system (Li-Cor).
The following primary antibodies were used: anti-ACTIN (mouse, Sigma,
A4700; rabbit, Sigma, A2066), anti-AKAP10 (mouse, clone 51, Santa Cruz
Biotechnology, sc-136512), anti-CARD11 (rabbit, Cell Signaling, 4440S),
anti-DICER (rabbit, a kind gift from Dr. Witold Filipowicz (FMI Basel)),
anti-DNM1L (mouse, Abcam, ab56788), anti-MGA (rabbit, H-286, Santa Cruz
Biotechnology, sc-382569), anti-SFRS15 (SCAF4; mouse, Abnova, H00057466-B01),
anti-WSTF (BAZ1B; mouse, clone G-5, Santa Cruz Biotechnology, sc-514287),
anti-NUP98 (rabbit, Novus Biologicals, NB100–93325), anti-SGK223 (mouse,
Santa Cruz Biotechnology, sc-398164), anti-SENP1 (rabbit, Bethyl Labs,
A302–927A-T), anti-CUL3 (rabbit, Bethyl Labs, A301–108A-T),
anti-PAWR (Abcam ab92590), anti-RIPK1 (Cell Signaling #4926), anti-GAPDH (goat,
V-18, Santa Cruz Biotechnology), and anti-WNT5a/b (rabbit, clone C27E8, Cell
Signaling 2530). The secondary antibodies used included anti-mouse IRDye 700
(donkey, Rockland Immunochemicals, 610–730-002), anti-rabbit IRDye 680
(donkey, Li-Cor Biosciences, 926–68073), anti-rabbit IRDye 800 (donkey,
Li-Cor Biosciences, 926–32213), and anti-mouse IRDye 800 (donkey, Li-Cor
Biosciences, 926–32212).
RT-PCR of IPA isoforms
Total RNA was isolated using Tri reagent solution (Invitrogen #AM9738)
and digested with DNase I (Invitrogen #AM1906). RNA was reverse transcribed
using the qScript cDNA SuperMix (Quanta Biosciences #101414–106). RT-PCR
reactions were carried out with purified Taq polymerase, using an annealing
temperature of 50°C and a 30-s extension at
72°C. The linear range of amplification was
determined by independent PCRs for each primer set. Primers were designed to be
intron-spanning and are listed in Supplementary Table S3.
Induction of IPA isoforms
Endogenous U2AF1, U2AF2 and hnRNPC were knocked down using pLKO-puro
lentiviral vector-based shRNAs (Sigma). Virus was produced using the helper
plasmids pCMV-VSVG and pCMV-dR8.2, and cells were transduced in 6-well plates,
selected with puromycin (2 μg/ml) for 5 days and then harvested for
RT-PCR or western blot analysis.
To induce IPA isoform expression of DICER, an antisense morpholino
oligonucleotide (GeneTools) targeting the 5′ splice site of DICER exon 23
was added directly to sub-confluent HeLa cells at the indicated concentrations
in the presence of 6 μM EndoPorter-PEG delivery peptide (GeneTools) and
harvested at the indicated time points. The control morpholino was used at 12
μM concentration.
Knock-down of CARD11 full-length and IPA isoforms
Isoform-specific shRNA oligonucleotides were cloned into the TRC2-pLKO-GFP
plasmid using KpnI and EcoRI. Lentivirus was produced as described above and
centrifuged at 25,000 rpm for 1 h 45 min at 4°C (Sorvall
WX Ultracentrifuge). Pellets were resuspended in cold PBS
overnight at 4°C. The virus titer was estimated by
transducing wild-type HEK293T cells. Culture plates (12-well) were coated
overnight with 5 μg/ml fibronectin. TMD8 cells were spin-infected and
cultivated for three days, followed by western blot analysis of FACS-sorted
GFP-positive cells.
Constructs
The V5-DICER construct was obtained from Dr. Joshua Mendell (UT
Southwestern). To generate the DICER-IPA expression plasmid, the DICER-IPA cDNA was
amplified from BLCL and cloned into the pCK-V5 plasmid using the BamHI and ApaI
restriction sites.
The human MGA cDNA (Dharmacon, clone BC136659) was used to PCR-amplify the
coding region of full-length MGA (8,571 nt plus 6 nt of endogenous Kozak sequence)
as well as MGA IPA (3,430 nt, up to the end of exon 9, plus gtgagtattaa, the intronic sequence
that is translated, followed by a stop codon; see Extended Data Fig. 6a). MGA IPA was cloned into the
pcDNA3.1 expression vector (Life Technologies) using NheI and XhoI sites. GFP-fused
MGA IPA was generated by inserting MGA IPA downstream of eGFP using the
BsrGI and XhoI restriction sites of the pcDNA3.1-GFP vector. Full-length MGA was cloned into
pcDNA3.1-GFP from three fragments using Gibson Assembly Cloning (New England Biolabs).
The full-length FOXN3 mRNA was amplified from BLCL cDNA. To obtain
GFP-FOXN3, it was cloned into pcDNA3.1-GFP[36] using BsrGI and XhoI restriction sites. FOXN3 IPA was
PCR-amplified from two fragments. Fragment 1 was amplified from BLCL cDNA and
corresponds to amino acids 1–180, whereas fragment 2 was amplified from
genomic DNA from PBMC and corresponds to the 32 amino acids generated from intronic
sequence, followed by a stop codon. FOXN3 IPA was fused with GFP at the C-terminus
as described above.
Full-length CHST11 was amplified from BLCL cDNA, whereas CHST11 IPA was
amplified from genomic DNA. Both were fused to GFP at the C-terminus as described
above. The integrity of all constructs was confirmed by sequencing.
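A quick reading-frame check helps when designing IPA constructs that end in translated intronic sequence. The sketch below assumes, for illustration only, that the 3,430 nt of the MGA IPA construct are counted from the A of the start codon; under that assumption the first two intronic bases (GT) complete the last exonic codon, the remaining tail reads GAG and TAT, and then TAA, so the insert (3,430 + 11 = 3,441 nt) ends in an in-frame stop. The helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def tail_codons(cds_len: int, tail: str):
    """Split an intronic tail appended to a CDS of cds_len nt into codons.

    Returns the tail bases that complete the last, partial exonic codon and
    the full codons read from the rest of the tail. Assumes cds_len is
    counted from the first base of the start codon.
    """
    tail = tail.upper()
    fill = (3 - cds_len % 3) % 3
    codons = [tail[i:i + 3] for i in range(fill, len(tail) - 2, 3)]
    return tail[:fill], codons


def first_stop_index(codons):
    """Index of the first stop codon, or None if the tail has none."""
    for i, codon in enumerate(codons):
        if codon in STOP_CODONS:
            return i
    return None
```

Usage: `tail_codons(3430, "gtgagtattaa")` returns `("GT", ["GAG", "TAT", "TAA"])`, and `first_stop_index` on those codons returns 2.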
Functional validation of CLL-IPAs
CARD11 IPA
To assess NF-κB activation, lentiviral-transduced TMD8 cells
(described above) were used. Cells were fixed with 4% formaldehyde at room
temperature for 15 mins. After two washes with excess PBS, fixed cells were
resuspended in ice-cold PBS and permeabilized with 90% methanol for 20 mins
on ice. Cells were then washed with cold PBS twice and resuspended with the
incubation buffer (PBS + 0.5% BSA). Cells were aliquoted and incubated with
anti-phospho-NF-κB p65 (1:1,500 dilution, Cell Signaling #3033) for
1.5 hrs at room temperature. Cells were washed with incubation buffer twice
and incubated with fluorochrome-conjugated secondary antibody solution
(1:10,000 Alexa Fluor 647 A27040, Invitrogen) for 15 mins at room
temperature. After two washes with incubation buffer, cells were analyzed
using a FACS Calibur.
DICER IPA
Full-length V5-DICER and V5-DICER IPA were immunoprecipitated from
HEK293T cells as described before[16]. Briefly, 48 hours after transfection, cells were
washed with cold PBS and lysed with IP buffer (20 mM Tris-HCl pH = 8.0, 150
mM NaCl, 1 mM EDTA, 0.5% NP-40 and 1x EDTA-free protease inhibitor (Thermo
Fisher)) for 30 mins on ice with occasional vortexing. The cell lysate was
then centrifuged at 20,000 x g for 10 mins at 4°C and the supernatant
was collected. The cell lysate was incubated with 3 μg of anti-V5 tag
antibody (Invitrogen R960–25) for 30 mins on ice, then 900 μg
of protein G Dynabeads were added and the reaction was rotated for an
additional 2 hrs at 4°C. After five washes with IP buffer and two washes
with DICER assay buffer (20 mM Tris-HCl pH = 8.0, 100 mM KCl, 0.2 mM EDTA),
90% of the beads were resuspended in DICER assay buffer for the miRNA cleavage assay
and the remaining beads were stored in 2x Laemmli sample buffer (Sigma) for
western blot analysis.
The miRNA cleavage assay was performed as described
previously[16].
Briefly, synthesized pre-miRNA let-7i oligo (Dharmacon) was
incubated with immunoprecipitated beads prepared as described above in the
enzymatic mixture (10 μl of immunoprecipitated beads in DICER assay
buffer, 2 μl of 20 mM MgCl2, 0.2 μl of 0.4
μM pre-miRNA, 0.1 μl of 100 mM DTT, 0.5 μl of RNaseOUT
(Invitrogen) and 7.2 μl of RNase-free water) at 37°C for 30
min with intermittent mixing. The reaction was stopped by chilling samples on ice
and analyzed by northern blot. To investigate whether DICER IPA acts as a
dominant-negative version of full-length DICER, different ratios of V5-DICER
and V5-DICER IPA were mixed and tested for miRNA cleavage.
Reaction mixtures (10 μl) were added to 10 μl RNA
loading buffer (95% formamide, 0.025% SDS, 0.025% bromophenol blue, 0.025%
xylene cyanol FF, 0.5 mM EDTA) and denatured at 95°C for 5 minutes
followed by quenching on ice. Samples were run on a 15% TBE/Urea gel
followed by transfer to a Hybond-N+ nylon membrane (GE Healthcare #RPN303B)
using a semi-dry transfer apparatus (Hoefer TE70X). Following transfer,
membranes were briefly dried and then UV cross-linked twice with 1200
µJ/cm² per cycle. Cross-linked membranes were
pre-hybridized for 1 hour at 37°C in ULTRAhyb-Oligo hybridization
buffer (Ambion #AM8663) in a rotary oven. DNA probes against the intended
target RNA were synthesized as oligos and labeled with
γ32P-ATP in the presence of T4 polynucleotide kinase (NEB
#M0201S) for 30 minutes at 37°C. Labeled probes were purified through
G-25 microspin columns containing Sephadex resin (GE Healthcare
#27–5325-01). Membranes were hybridized with labeled probe overnight
at 37°C in a rotary oven. The next day, membranes were washed twice
in 2x SSC/0.1% SDS for 5 minutes each at 37°C followed by one wash in
0.1x SSC/0.1% SDS for 5 minutes at 37°C. Membranes were exposed to
phosphorimager screens and scanned.
To assess whether expression of DICER IPA influences miRNA expression
in cells, endogenous let-7 miRNA
expression levels were measured by northern blot analysis of total RNA (22
μg) from wild-type and DICER KO HCT116 cells. DICER KO HCT116 cells
were transfected with different amounts of V5-DICER and V5-DICER IPA. Cells
were harvested 3 days after transfection with Lipofectamine 2000 to assess
DICER protein expression and corresponding endogenous let-7
levels.
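The final concentrations in the 20-μl DICER cleavage reaction described above follow from C1V1 = C2V2. The sketch below assumes that the 10 μl of bead suspension in DICER assay buffer contributes no MgCl2, DTT or pre-miRNA; under that assumption the reaction contains 2 mM MgCl2, 4 nM (0.004 μM) pre-let-7i and 0.5 mM DTT. The dictionary layout is illustrative.

```python
# (volume in ul, stock concentration) per component of the cleavage reaction;
# None marks components without a tracked concentration.
REACTION = {
    "IP beads in DICER assay buffer": (10.0, None),
    "MgCl2 (mM)": (2.0, 20.0),
    "pre-miRNA (uM)": (0.2, 0.4),
    "DTT (mM)": (0.1, 100.0),
    "RNaseOUT": (0.5, None),
    "RNase-free water": (7.2, None),
}


def final_concentrations(components: dict):
    """Total volume and final concentrations (same units as the stocks)."""
    total = sum(vol for vol, _ in components.values())
    finals = {
        name: stock * vol / total  # C2 = C1 * V1 / V_total
        for name, (vol, stock) in components.items()
        if stock is not None
    }
    return total, finals
```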
FOXN3 IPA
The forkhead domain of FOXN3 is necessary for transcriptional
repression of FOXN3 target genes. Thus, truncation of the forkhead domain
is predicted to de-repress the target genes. Known target genes are
PIM2 and MYC[20,37]. MEC1 cells were nucleofected with pcDNA3.1 vector
containing GFP, GFP-FOXN3 or GFP-FOXN3 IPA using SF Cell Line
4D-Nucleofector® X Kit (Lonza, Program FF-120). After 48 hours,
GFP-positive cells were FACS sorted, lysed immediately
(Cells-to-cDNA™ II Kit, Ambion) and RNA was extracted. cDNA was
synthesized by qScript™ cDNA SuperMix (Quanta Biosciences) and
quantitative PCR was performed using FastStart universal SYBR green master
mix (Roche) on a 7900HT Fast Real-Time PCR System (Applied Biosystems). The
experiment was performed with five independent biological replicates.
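The Methods do not state how the qPCR data were normalized. If a reference gene and the comparative Ct method were used (an assumption, not stated in the source), the fold change of a FOXN3 target such as PIM2 or MYC in GFP-FOXN3 IPA cells relative to GFP-only cells could be computed as below. The 2^-ΔΔCt formula assumes roughly equal, near-100% amplification efficiency for target and reference; the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: Sequence[float], reference_ct: Sequence[float]) -> float:
    """Mean target Ct minus mean reference-gene Ct (replicates averaged)."""
    return mean(target_ct) - mean(reference_ct)


def fold_change(sample_target: Sequence[float], sample_ref: Sequence[float],
                control_target: Sequence[float], control_ref: Sequence[float]) -> float:
    """2^-ddCt of a sample relative to the control condition (e.g. GFP only)."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(control_target, control_ref)
    return 2.0 ** (-ddct)
```

A target whose ΔCt is one cycle lower than in the control (for example, Ct 24 vs reference 20, against Ct 25 vs reference 20) comes out as a 2-fold increase.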
MGA IPA
Raji cells were nucleofected with pcDNA3.1 vector containing GFP,
GFP-MGA or GFP-MGA IPA using Cell Line Nucleofector Kit V (Lonza, Program
M-013). After 48 hours, GFP-positive Raji cells were FACS-sorted and lysed
immediately in lysis buffer (Cells-to-cDNA™ II Kit, Ambion) and RNA
was extracted. cDNA synthesis and qRT-PCR were performed as described for FOXN3.
qRT-PCR was performed in technical triplicates for each of three independent biological
experiments. MYC target genes have been described previously[38,39]. E2F binding sites in MYC target genes were identified
using the ENCODE Transcription Factor ChIP-seq track, or they were
previously described[19,39-41]. T-boxes were described for
ATF4 and CDKN1B[42,43].
CHST11 IPA
3′-seq data were used to identify overexpressed WNTs in CLL
cells compared to normal B cells. The expression of WNTs was validated in
MEC1 cells by qRT-PCR. WNT5B was the WNT with the highest expression in MEC1
cells.
For WNT detection in media, MEC1 cells stably expressing GFP,
GFP-CHST11 or GFP-CHST11 IPA were counted and washed once with RPMI without
FCS. Twenty million cells were cultured in 10 ml RPMI + 1% Pen/Strep in one
10 cm culture dish. After 18 hrs, conditioned medium was collected by
centrifugation at 280 x g for 5 mins and passed through a 0.45 μm
filter. The supernatant was concentrated by an Amicon Ultra-4 centrifugal
filter (Millipore, UFC800324) at 3,000 x g at 10°C for 2 hrs. The
concentrated media (~50 μl) was collected and subjected to western
blot analysis using anti-WNT5a/b antibody (Cell Signaling #2530). The
corresponding cell pellets were also collected for western blot
analysis.To assess paracrine WNT activity in MEC1 cells expressing CHST11
IPA, MEC1 cells were nucleofected with pcDNA3.1 vector containing GFP,
GFP-CHST11 or GFP-CHST11 IPA. After 24 hours, GFP-positive cells were FACS
sorted and cultivated for three days. The conditioned media was collected
and added to HEK293T cells which were transiently transfected with a WNT
reporter plasmid (Addgene #12456, M50, Super 8x TOPFlash) or WNT reporter
control plasmid with mutated TCF/LEF binding sites (Addgene #12457, M51,
Super 8x TOPFlash mutant)[44]. The conditioned media was added 24 hours after
transfection. Luciferase activity was measured 24 hours after the addition
of conditioned media using a Glomax 96 Microplate Luminometer as described
previously[45].
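The reporter assay compares luciferase signal from the TCF/LEF reporter (TOPFlash) with signal from the mutated control reporter. A common way to summarize such readouts is the TOP/FOP ratio, normalized to a reference medium. The sketch below assumes that summary and uses the GFP-conditioned medium as the reference. The source does not specify its normalization, and the condition labels and luminescence readings are hypothetical.

```python
from statistics import mean

def top_fop_ratio(top_rlu, fop_rlu):
    """Mean TOPFlash luminescence divided by mean mutant-reporter (FOP) luminescence."""
    return mean(top_rlu) / mean(fop_rlu)

def relative_wnt_activity(conditions, reference="GFP"):
    """TOP/FOP ratio of each conditioned medium relative to the reference medium.

    conditions: dict label -> (TOPFlash RLUs, mutant-reporter RLUs) from replicate wells
    Assumption: normalizing to GFP-conditioned medium (not stated in the source).
    """
    ratios = {name: top_fop_ratio(top, fop) for name, (top, fop) in conditions.items()}
    return {name: ratio / ratios[reference] for name, ratio in ratios.items()}

# Hypothetical relative light units (RLU) from replicate wells.
readings = {
    "GFP": ([1000, 1100, 900], [500, 500, 500]),
    "GFP-CHST11": ([900, 1000, 1100], [500, 500, 500]),
    "GFP-CHST11 IPA": ([3000, 3100, 2900], [500, 500, 500]),
}
activity = relative_wnt_activity(readings)
print(activity)
```

Dividing by the mutant-reporter signal removes WNT-independent changes in luciferase expression. Only the TCF/LEF-dependent part of the signal is then compared across the three conditioned media.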
Intersection of somatic mutations in CLL with IPA
We used 44 CLL RNA-seq samples with available somatic DNA
mutation and prognostic data[3] to map IPA isoform
expression. The
somatic mutations had been identified by exome sequencing that included extended
exon boundaries[46]. We
intersected the occurrence of somatic mutations with IPA isoforms in these
samples. We focused on truncating mutations (nonsense mutations, frame-shift
mutations and splice-site mutations) in expressed genes because they are likely to
have an outcome similar to that of IPA.
The IGVH status of the CLL samples studied by 3′-seq was assessed at MSKCC.
The IGVH status of the 44 RNA-seq samples was published previously[3].
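The intersection step amounts to a join on (sample, gene) pairs after mutations are filtered to the truncating classes listed above. The sketch below assumes that each mutation call carries a consequence label and that IPA calls are available as (sample, gene) pairs. The label strings, sample names and records are hypothetical.

```python
from typing import NamedTuple

# Consequence classes treated as truncating (TR), as listed in the text.
# The label strings themselves are hypothetical.
TRUNCATING = {"nonsense", "frame_shift", "splice_site"}

class Mutation(NamedTuple):
    sample: str
    gene: str
    consequence: str

def tr_ipa_overlap(mutations, ipa_calls, expressed_genes):
    """Return (sample, gene) pairs that carry both a TR mutation and an IPA isoform.

    mutations:       iterable of Mutation records
    ipa_calls:       set of (sample, gene) pairs with significant IPA isoform expression
    expressed_genes: genes considered expressed; mutations in other genes are ignored
    """
    tr_pairs = {
        (m.sample, m.gene)
        for m in mutations
        if m.consequence in TRUNCATING and m.gene in expressed_genes
    }
    return tr_pairs & set(ipa_calls)

# Hypothetical example records.
muts = [
    Mutation("CLL_A", "MGA", "splice_site"),
    Mutation("CLL_A", "ZMYM5", "frame_shift"),
    Mutation("CLL_B", "SF3B1", "missense"),  # not truncating, ignored
]
ipas = {("CLL_A", "MGA"), ("CLL_A", "ZMYM5"), ("CLL_B", "CARD11")}
overlap = tr_ipa_overlap(muts, ipas, expressed_genes={"MGA", "ZMYM5", "SF3B1", "CARD11"})
print(sorted(overlap))
```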
Positions of TR mutations
The positions of truncating (TR) mutations in CLL were obtained from the published
CLL somatic mutation data sets[3,7,8]. The positions of TR mutations in solid
cancers, for TSGs and for genes targeted by CLL-IPAs, were obtained from the MSK
cbio portal (accessed 02/23/2018; > 86,000 cancer
samples, 97% of them derived from solid tumors)[4]. The position with the highest number of TR mutations
(hot spot) was used and is indicated by a symbol in the figures. No symbol is shown for
genes whose TR mutations lack a hot spot.
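Selecting the hot spot amounts to taking the most frequent TR mutation position per gene. The source does not define when a gene "lacks a hot spot". This sketch assumes that a hot spot must be hit at least twice and must be the unique maximum, and both cut-offs are assumptions. The positions in the usage example are hypothetical.

```python
from collections import Counter

def hot_spot(tr_positions):
    """Return the amino acid position with the most TR mutations, or None.

    Assumptions (not stated in the source): a hot spot must be hit at least
    twice, and it must be the unique most frequent position.
    """
    counts = Counter(tr_positions).most_common()
    if not counts:
        return None
    top_pos, top_n = counts[0]
    if top_n < 2 or (len(counts) > 1 and counts[1][1] == top_n):
        return None
    return top_pos

# Hypothetical TR mutation positions (amino acid coordinates) for one gene.
print(hot_spot([120, 450, 450, 450, 800]))
```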
Number of amino acids of full-length or IPA-generated truncated proteins
To calculate the number of amino acids of full-length proteins, we used
the longest RefSeq-annotated mRNA isoform, obtained the number of coding
nucleotides and divided this number by three to obtain the total number of amino
acids. To calculate the number of amino acids of the IPA-generated truncated
proteins, we counted the number of nucleotides from the start codon to the end of
the exon located upstream of the IPA isoform and divided this number by three to
obtain the number of retained amino acids. The remainder of this division indicated
the reading frame of the protein at the exon/intron junction located
upstream of the IPA isoform. Using this reading frame, we
translated the intronic nucleotides until an in-frame stop codon was reached.
The amino acids translated from intronic sequence were added to the retained
amino acids to obtain the size of the IPA-generated truncated protein.
The fraction of retained coding region (CDR) is the number of retained amino acids (up
to the end of the exon located upstream of the IPA isoform) divided by the
number of amino acids calculated from the longest mRNA isoform encoding the
full-length protein.
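The size calculation for IPA-generated proteins can be sketched as follows. The sketch assumes two DNA strings on the transcript strand: the coding sequence from the start codon to the end of the exon upstream of the IPA site, and the intronic sequence that follows it. It also assumes the full-length CDS string excludes the stop codon. The function names and sequences are hypothetical, and this is not the authors' pipeline. The code only counts codons up to the stop codon, so no full translation table is needed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def ipa_protein_size(upstream_cds, intron):
    """Return (retained_aa, new_aa, total_aa) for an IPA-generated protein.

    upstream_cds: nucleotides from the start codon to the end of the exon
                  located upstream of the IPA isoform
    intron:       intronic nucleotides that follow this exon
    new_aa and total_aa are None if the supplied intron has no in-frame stop.
    """
    upstream_cds, intron = upstream_cds.upper(), intron.upper()
    retained_aa = len(upstream_cds) // 3
    # The remainder sets the reading frame: the leftover exonic nucleotides
    # form the first read-through codon together with the first intronic bases.
    leftover = len(upstream_cds) % 3
    readthrough = upstream_cds[len(upstream_cds) - leftover:] + intron
    for i in range(0, len(readthrough) - 2, 3):
        if readthrough[i:i + 3] in STOP_CODONS:
            new_aa = i // 3  # codons translated before the stop codon
            return retained_aa, new_aa, retained_aa + new_aa
    return retained_aa, None, None

def retained_cdr_fraction(retained_aa, full_length_cds):
    """Retained amino acids divided by the full-length protein length
    (longest RefSeq isoform; CDS assumed to exclude the stop codon)."""
    return retained_aa / (len(full_length_cds) // 3)

# Hypothetical sequences: 10 coding nt upstream, then intron read-through.
print(ipa_protein_size("ATGAAACCCG", "CCTTTTGAA"))
```

In the example, 10 upstream nucleotides give 3 retained amino acids, and the leftover G starts the first read-through codon (GCC). Two intronic amino acids are added before the TGA stop.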
Identification of known and novel TSGs
For known TSGs, we used those of the 301 TSGs reported by Davoli et al.[5] that were expressed in CLL
samples. Davoli et al. used a computational method (TUSON Explorer) to predict these 301 TSGs
from genomic sequencing data of more than 8,200 cancers (> 90%
derived from solid tumors).
For novel TSGs, we used the data from the MSK cbio portal (see above).
It was previously reported that the variable with the highest predictive power
for TSGs is the proportion of TR mutations among all mutations[5]. We calculated this ratio for the 190
genes that generated CLL-IPAs in more than 20% of samples and observed a
bimodal distribution with a separation point at 12% TR mutations among all
mutations. Genes that generated CLL-IPAs in more than 20% of samples and had
a TR mutation rate ≥ 12% in the MSK cbio portal data were called
novel TSG candidates (Supplementary Table S2).
To assess whether known TSGs are enriched among CLL-IPAs, a chi-square test
was performed. To exclude that this association occurred by chance, five control
lists containing genes with similar coding region length and expression were
generated and tested for enrichment of TSGs.
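The novel TSG candidate rule described above reduces to two thresholds. The first is IPA recurrence above 20% of samples, and the second is a TR rate of at least 12%. The sketch below assumes per-gene counts of TR mutations and of all mutations (for example, exported from the MSK cbio portal) and the fraction of CLL samples expressing the IPA isoform. The gene names and numbers are hypothetical.

```python
def tr_rate(tr_mutations, all_mutations):
    """Proportion of truncating (TR) mutations among all mutations of a gene."""
    return tr_mutations / all_mutations if all_mutations else 0.0

def novel_tsg_candidates(genes, min_ipa_fraction=0.20, min_tr_rate=0.12):
    """Genes with CLL-IPAs in more than 20% of samples and a TR rate of at least 12%.

    genes: dict gene -> (fraction of CLL samples with the IPA isoform,
                         number of TR mutations, number of all mutations)
    """
    return sorted(
        gene
        for gene, (ipa_fraction, n_tr, n_all) in genes.items()
        if ipa_fraction > min_ipa_fraction and tr_rate(n_tr, n_all) >= min_tr_rate
    )

# Hypothetical genes and counts.
example = {
    "GENE_A": (0.35, 30, 150),  # TR rate 20%: candidate
    "GENE_B": (0.35, 5, 200),   # TR rate 2.5%: not a candidate
    "GENE_C": (0.10, 40, 100),  # IPA in only 10% of samples: excluded
}
candidates = novel_tsg_candidates(example)
print(candidates)
```

The strict ">" for IPA recurrence and the "≥" for the TR rate mirror the wording of the criteria in the text.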
Other statistical methods
For enrichment statistics, we used a chi-square test and
calculated the P value using a two-sided Fisher's exact
test. To assess functional differences between full-length proteins and
IPA-generated truncated proteins (MGA, FOXN3), we used a two-sided t-test for
independent samples. When comparing three groups (CARD11, CHST11), a two-sided
Kruskal-Wallis test was used. For subsequent pairwise comparisons, a two-sided
Mann-Whitney test was applied and the P values were adjusted
with the Bonferroni correction for multiple testing. For all other tests that assessed
differences in features between two groups, we used a two-sided Mann-Whitney
test. To investigate the spatial relationship between the IPA-generated
truncated proteins and hot spot TR mutations, we performed a two-sided Wilcoxon
rank sum test.
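For readers reproducing the enrichment and pairwise analyses, the two-sided Fisher's exact test on a 2 × 2 contingency table and the Bonferroni adjustment can both be written with the Python standard library. This is a minimal sketch and not the authors' code. In practice, a library implementation such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # A small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Rows: TSG / not TSG; columns: generates CLL-IPA / does not (hypothetical counts).
print(fisher_exact_two_sided(3, 1, 1, 3))
print(bonferroni([0.01, 0.02, 0.5]))
```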
Data Availability Statement
All 3′-seq and RNA-seq data generated and analyzed for this study
have been deposited in the Gene Expression Omnibus database under accession
numbers GSE111310 and GSE111793. The code to analyze the data is available at https://bitbucket.org/leslielab/apa_2018/, and
the processed data are available in Supplementary Table S1 (for Fig. 1b–d, 2a, 4a, and Extended Data
Figs. 3 and 4), Supplementary Table S2 (for Extended Data Fig. 8a) and the Source
data files (for Fig. 1e, 2c, 2e, 3a, 3d,
4b–d, 4g, and Extended Data Fig. 2c, 6j, 7c, and 8a). Data on DNA mutations from CLL patients were
provided by Dan A. Landau (Weill Cornell Medical College) and must be
requested from him. The mutation data on solid cancers were obtained through the
MSK cbio portal and can be accessed at www.cbioportal.org.
Validation of IPA isoforms by independent methods and identification of
CLL-IPAs used for further analysis.
(a) RNA-seq data were used to validate the presence of
IPA isoforms using a GLM. RNA-seq reads were counted within two 100-nt windows
(green bars) that are separated by 51 nt and located upstream and downstream
of the IPA peak. The IPA peak was considered validated if the adjusted P value
(Padj) was < 0.1 (see Methods). Of
N = 5,587 tested IPA isoforms, N =
1,662 were validated by this method. Shown is MGA as a
representative example. (b) As only a fraction of IPA isoforms
were validated by the method from (a), additional methods were used to
obtain independent evidence for the presence of the IPA isoforms.
Independent evidence was obtained using untemplated adenosines from RNA-seq
data or through the presence of the IPA isoform in other 3′-seq protocols (10). As the majority of immune cell
types used in this study have not been investigated using other 3′
end sequencing protocols and IPA isoform expression is cell type-specific
(2), highly expressed IPA isoforms (>10 TPM)
were not excluded from further analysis even if no read evidence was found
by other protocols. (c) Hierarchical clustering based on IPA
site usage separates the 3′-seq dataset into four groups. It
separates CD5+B from CLL samples and clusters CLL samples into three
different groups. Shown is the usage difference of the 20% most variable IPA
isoforms across the data set (N = 342). Four of 13 CLL
samples cluster away from the rest of the samples and are characterized by a
high number of IPA isoforms (CLL high). (d) The GLM
(FDR-adjusted P value < 0.1, IPA usage difference
≥ 0.05, IPA isoform expressed in CD5+B < 8 TPM) identified 477
recurrent (significantly upregulated in at least 2/13 CLL samples by
3′-seq) and 454 non-recurrent (significantly upregulated in 1/13 CLL
samples by 3′-seq) IPA isoforms. IPAs were validated in an independent RNA-seq
data set containing 46 new CLL samples. Among the recurrent IPAs, 71% of
testable IPAs were verified using another GLM (see a). Among the
non-recurrent IPAs, 64% of testable IPAs were verified. (e)
Plotting the number of CLL-IPAs per sample separates the CLL samples
investigated by 3′-seq into two groups: 4/13 samples generate a high
number of CLL-IPAs (CLL high; median, 100 CLL-IPAs per sample; range,
42–274), whereas the remaining samples generate lower numbers (CLL low;
median, 9; range, 5–28). The center bar shows the median and the error
bars show the interquartile range. Two-sided Mann-Whitney test, **, P = 0.003.
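The read counting behind panel (a) can be sketched as follows. Reads ending at the IPA site contribute to the upstream window only, so a relative drop in the downstream window supports usage of the IPA site. The sketch makes three assumptions: the 51-nt gap is centred on the IPA peak, read positions are plus-strand start coordinates, and a read is counted when its start falls inside a window. All three are assumptions, and the function name is hypothetical. The GLM that compares these counts between samples is not reproduced here.

```python
from bisect import bisect_left

def window_counts(read_starts, peak, window=100, gap=51):
    """Count reads in two windows flanking an IPA peak (plus strand).

    Assumptions: the gap of `gap` nt is centred on the peak, and a read is
    counted if its start coordinate lies in a window (half-open intervals).
    """
    half_gap = gap // 2  # 25 nt on each side of the peak for gap = 51
    upstream = (peak - half_gap - window, peak - half_gap)
    downstream = (peak + half_gap + 1, peak + half_gap + 1 + window)
    starts = sorted(read_starts)

    def count(interval):
        lo, hi = interval
        return bisect_left(starts, hi) - bisect_left(starts, lo)

    return count(upstream), count(downstream)

# Hypothetical read start coordinates around an IPA peak at position 1000.
print(window_counts([880, 900, 974, 975, 1000, 1025, 1026, 1125, 1126], peak=1000))
```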
The normal B cell counterparts of CLL cells are CD5+B cells derived from
lymphoid tissue.
(a) Hierarchical clustering of normal human B cells
(naive B (NB), memory B (MemB) and CD5+B) derived from lymphoid tissues or
peripheral blood based on mRNA expression obtained from RNA-seq. The heatmap
shows the 20% most variable genes across the data set (N =
1,887). The gene expression profiles of B cell subsets derived from
peripheral blood or lymphoid tissue differ substantially, although the same
markers were used for purification. (b) As in (a), but RNA-seq
data from CLL samples were added to the analysis. The heatmap shows the 20%
most variable genes across the data set (N = 2,078). CLL
samples cluster with tissue-derived and not with blood-derived normal immune
cells. (c) Number of all differentially expressed genes from
the analysis shown in (b).
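The heatmaps in panels (a) and (b) use the 20% most variable genes. This selection can be sketched as ranking genes by their variance across samples. The legend does not name the variability measure, so using the population variance of (for example, log-transformed) expression values is an assumption. The gene names and values are hypothetical, and the clustering step itself is not shown.

```python
from statistics import pvariance

def most_variable(expression, fraction=0.20):
    """Return the top `fraction` of genes ranked by variance across samples.

    expression: dict gene -> list of expression values (one per sample)
    Assumption: variability is measured as population variance.
    """
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]

# Hypothetical expression values for five genes in three samples.
expr = {
    "G1": [1, 1, 1],
    "G2": [0, 5, 10],
    "G3": [1, 2, 3],
    "G4": [2, 2, 3],
    "G5": [0, 0, 9],
}
print(most_variable(expr))
```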
3′-seq and RNA-seq tracks of functionally validated
CLL-IPAs.
Five CLL-IPAs were functionally validated. Their 3′-seq and
RNA-seq tracks are shown here and in Fig.
2a. Data are shown as in Fig.
1b. The corresponding RT-PCRs are shown in Extended Data Fig. 5a.
CLL-IPAs generate truncated mRNAs and proteins.
Gene models and western blots of 10 candidates, depicted as in Figs. 1b and 2a, show that CLL B cells generate full-length and
IPA-generated truncated proteins. BLCL were used as control B cells and were
included in the 3′-seq tracks. ACTIN was used as loading control on
the same blots. For gel source data see Supplementary Fig. 1.
Validation of the IPA-generated truncated mRNAs and validation of their
stable expression over time.
(a) Detection of full-length and IPA-generated
truncated mRNAs by RT-PCR in normal B cells (CD5+B, BLCL) and CLL cells used
in the western blot validations shown in Fig.
2a and Extended Data Fig. 4.
All experiments were performed twice with similar results. Primers to
amplify the mRNA isoforms are located in the first and last exons shown in
the gene models and are listed in Supplementary Table S3.
HPRT was used as loading control. (b)
Induction of truncated mRNAs and proteins through shRNA-mediated knock-down
of splicing factors. All experiments were performed twice with similar
results. U2AF1 was knocked down in HeLa cells, U2AF2 was knocked down in
HEK293 cells and hnRNPC was knocked down in A549 cells. Shown as in (a),
except for NUP96 which is shown as in Extended
Data Fig. 4. NUP96 is derived from the NUP98 precursor. Induction of
DICER1 IPA by transfection of increasing amounts of anti-sense morpholinos
(MO) directed against the 5′ splice site of intron 23 of
DICER1 in HeLa cells. Shown are RT-PCRs.
(c) RT-PCRs, performed once, on expression of full-length
(FL) and IPA isoforms for eight CLL-IPAs in samples from two CLL patients
and control B cells (CD5+B, BLCL). The samples were collected over a period
of more than 6 years. CLL11: T1, 17 months (mo) after diagnosis; T2, 24
mo; T3, 44 mo. CLL6: T1, 16 mo; T2, 49 mo; T3, 91 mo (42 mo after
treatment). Samples from all time points (except CLL6, T3) were obtained
from untreated patients. The primers for amplifications of the products were
located in the first and last exons shown in the gene models and are listed
in Supplementary Table
S3. Expression of HPRT serves as loading
control. The same gel picture of HPRT is shown in Fig. 3b for CLL samples and Extended Data Fig. 5a,
far right panel, for BLCL and CD5+ control samples. All tested CLL-IPA
isoforms were detectable at several time points during the course of the
disease. Compared with CD5+B cells, expression of FCHSD2 IPA was not
significantly upregulated in CLL. (d) Western blots of
full-length and IPA-generated truncated proteins from CARD11, DICER, and
SCAF4. All experiments were performed twice with similar results. ACTIN was
used as loading control on the same blot. Shown are samples from normal B
cells (BLCL) and two CLL patients, both at two different time points 0.5
– 10 months apart. For gel source data see Supplementary Fig. 1.
IPA-generated truncated proteins resemble the protein products of
truncating DNA mutations and have cancer-promoting properties.
(a) CARD11 IPA results in translation of intronic
nucleotides (grey) until an in-frame stop codon is encountered. This results
in the generation of 16 new amino acids (grey) downstream of exon 10. In the
case of MGA IPA three new amino acids downstream of exon 9 are generated.
(b) Western blot showing that TMD8 cells express similar
amounts of CARD11 IPA as CLL samples. The western blot is depicted as in
Fig. 2a and was performed twice.
ACTIN was used as loading control on the same blot. (c) Western
blot (as in b) showing full-length CARD11 as well as CARD11 IPA in TMD8
cells expressing a control shRNA (Co), an shRNA that exclusively knocks down
the full-length protein (FL) and two different shRNAs that exclusively
knock down the CARD11 IPA isoform (IPA). The experiment was performed twice
with similar results. GAPDH was used as loading control on the same blot.
(d) Endogenous phospho-NF-κB-p65 levels were
measured by FACS in TMD8 cells expressing the indicated shRNAs from (c).
Mean fluorescence intensity (MFI) values are shown in parentheses in FACS
plots of a representative experiment out of three. (e)
Immunoprecipitation of V5-DICER or V5-DICER IPA from HEK293T cells using an
anti-V5 antibody. The experiment was performed twice with similar results.
2.5% of input was loaded. (f) The extent of miRNA processing
depends on the expression level of full-length (FL) DICER but not of DICER IPA.
Shown are wild-type (WT) and DICER knock-out (KO) HCT116 cells.
Re-expression of different amounts of FL DICER1 protein in the KO cells
(measured by western blot of DICER1 in the top panel) results in different
levels of endogenous let-7 expression (measured by northern
blot in the bottom panel; compare lanes 3 and 4). Expression of DICER IPA
has no influence on miRNA processing (compare lanes 4 and 5). ACTIN and U6
were used as loading controls on the same blots, respectively. The
experiment was performed twice with similar results. (g)
Western blot of MGA. MGA and MGA IPA were cloned and expressed in HEK293T
cells to confirm the predicted protein size. The experiment was performed
twice with similar results. Shown is also the endogenous MGA expression in
Raji cells. ACTIN was used as loading control on the same blot. *,
nonspecific band. (h) Protein models of full-length FOXN3 and FOXN3
IPA are shown as in Fig. 2b. The
IPA-generated protein has a truncated forkhead domain and is predicted to
lose the repressive activity. (i) As in (a), but for FOXN3.
FOXN3 IPA generates 32 new amino acids downstream of exon 2.
(j) FOXN3 IPA significantly de-represses expression of the
oncogenic targets MYC and PIM2.
Fold-change in mRNA level of endogenous genes in MEC1 B cells after
transfection of GFP-FOXN3 IPA compared with transfection of full-length
GFP-FOXN3. HPRT-normalized values are shown as boxplots (as
in Fig. 1e) from N = 5
biologically independent experiments, each performed in technical
triplicates. Two-sided t-test for independent samples was applied, **,
P = 0.002. For gel source data see Supplementary Fig. 1.
Inactivation of TSGs by CLL-IPAs independently of DNA mutations.
(a) The distribution of full-length protein size of
genes that generate CLL-IPAs (N = 306) and B-IPAs
(N = 2,690) is shown in amino acids (aa). Boxplots as
in Fig. 1e. Two-sided Mann-Whitney
test, P = 0.87. (b) TR rate (ratio of TR
mutations over all mutations) is shown for known TSGs obtained from
(5). Boxplots as in Fig.
1e. Two-sided Mann-Whitney test, P = E-155.
(c) Known TSGs, obtained from (5) that are
targeted by CLL-IPAs (N = 21) are shown. Dark green bars
indicate the fraction of retained CDR for each IPA-generated protein. Black
dots indicate the hot spot positions of TR mutations obtained from MSK cbio
portal. CLL-IPAs mostly occur upstream or within 10% (of overall aa length)
of the mutations (two-sided Wilcoxon rank sum test, P = 0.04).
(d) Contingency table for enrichment of TSGs among genes
that generate CLL-IPAs. P value was obtained from two-sided
Fisher’s exact test. TSGs were obtained from (5).
(e) TSGs and genes that generate CLL-IPA isoforms have
longer CDRs than genes that do not generate IPA isoforms. Boxplots as in
Fig. 1e. Two-sided Kruskal-Wallis
test, P = E-80. (f) Five control gene lists
(N = 306, each) with a similar size distribution as
CLL-IPAs and expressed in CLL were tested for enrichment of TSGs. Shown is
the number of TSGs found. Chi-square-test did not show a significant
enrichment of TSGs among the control genes. (g) Contingency
table for enrichment of TR mutation genes in CLL among genes that generate
CLL-IPAs. P value was obtained from two-sided
Fisher’s exact test. (h) ZMYM5 is truncated by a TR
mutation and an IPA isoform in the same patient, but the aberrations are
predicted to result in different truncated proteins. A 10 bp deletion in
exon 3 results in a frame-shift leading to the generation of a truncated
ZMYM5 protein, whereas ZMYM5 IPA (not yet annotated) produces a truncated
protein containing 352 more amino acids in the same patient. The genes shown
in Extended Data Fig. 7h and 7i are the only genes with simultaneous
presence of a TR mutation and CLL-IPA out of N = 268
tested. The position of the TR mutation is indicated in green. CLL7 and
CLL11 3′-seq and RNA-seq tracks are shown for comparison.
(i) MGA is truncated by a TR mutation and an IPA isoform in
the same patient. The TR mutation affects the 5′ splice site of
intron 7, thus generating two additional amino acids downstream of exon 7,
whereas the IPA isoform encodes a truncated MGA protein containing three
more amino acids downstream of exon 9. Mutation and 3′-seq analysis
were performed once. CLL7 and CLL11 are shown for comparison.
(j) Shown are additional recurrent (N
> 1) DNA mutations found by exome sequencing of CLL patient samples
stratified by a high or low number of CLL-IPAs per patient. Only the 16
samples with the most and the 16 samples with the fewest CLL-IPAs are shown, so
that both groups contain the same number of samples. This analysis is only descriptive and no test
was performed. (k) Significant enrichment of SF3B1 mutations in
the group of CLL samples with abundant CLL-IPA isoforms. Two-sided
Mann-Whitney test was performed. (l) Abundance of CLL-IPAs is
not associated with IGVH mutational status. Shown is the number of CLL-IPAs
per sample for patients with mutated (MUT, N = 30) or
unmutated (UN, N = 21) IGVH genes. Boxplots as in Fig. 1e. Two-sided Mann-Whitney test,
P = 0.4.
Novel TSG candidates and validation of CHST11 IPA as cancer-promoting
isoform.
(a) As in Fig. 3c,
but shown are known (red gene names) and novel TSG candidates (black gene
names) among the abundant CLL-IPAs. CLL-IPAs seem to inactivate these genes
as they mostly occur upstream or within 10% (of overall aa length) of the
mutations. Two-sided Wilcoxon rank sum test performed on all 136 TSGs,
P = E-8; two-sided Wilcoxon rank sum test performed on
the novel TSGs, N = 119, P = E-8. Position
of TR mutation was determined using the data obtained from the MSK cbio
portal and indicates the hot spot mutation. Right panel, the fraction of CLL
samples affected represents the fraction of CLL samples (out of 59) with
significant expression of the IPA isoform. Genes were included if they were
affected in at least 20% of samples investigated either by 3′-seq or
RNA-seq. (b) Contingency table for enrichment of novel TSGs
among highly recurrent CLL-IPAs. P value was obtained from
two-sided Fisher’s exact test. (c) TSGs have larger
protein sizes. Boxplots as in Fig. 1e.
Two-sided Mann-Whitney test, **, P = 0.005. The increased
overall mutation rate of known TSGs correlates with larger protein size.
Spearman's correlation coefficient, r = 0.74, P =
E-6. (d) CHST11 IPA generates 18 new amino acids (grey)
downstream of exon 1. (e) Experimental set-up to measure
paracrine WNT activity produced by MEC1 B cells either expressing GFP,
GFP-CHST11 or GFP-CHST11 IPA and using a WNT reporter expressed in HEK293T
cells. Primary CLL cells and the CLL cell line MEC1 express several WNTs,
including WNT5B. In the presence of CHST11, WNT (red dots) binds to sulfated
proteins on the surface of WNT-producing cells, whereas WNT is secreted into
the media in the presence of CHST11 IPA. WNT-conditioned media activates a
WNT reporter in HEK293T cells. This set-up refers to Fig. 4f and 4g. (f) Western blot, performed once, for WNT5
shown as in Fig. 4f, but including HeLa
cells as positive control for WNT5 expression. ACTIN was used as loading
control on the same blot.
Cancer-upregulated IPA isoforms are also detected in breast cancer and
T-ALL.
(a) MAGI3 is a TSG that is preferentially targeted by
IPA in breast cancer (27). Shown is the mutation profile
obtained from MSK cbio portal. (b) Expression of IPA isoforms
in T-ALL detected by RNA-seq. Shown are 3′-seq and RNA-seq tracks of
a representative mRNA (out of N = 101) from CLL samples,
T-ALL samples and normal thymus. The T-ALL RNA-seq data were obtained from
(32). We detected N = 381 IPA isoforms
in at least one T-ALL sample, N = 133 in at least one
thymus sample, N = 104 in at least one T-ALL and one thymus
sample, and N = 101 in at least two T-ALL samples, but not
in any of the thymus samples.
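The T-ALL counts in panel (b) correspond to simple set operations on per-sample IPA calls. The sketch below assumes that each sample's detected IPA isoforms are available as a set of identifiers. It returns the isoforms seen in at least two T-ALL samples but in no thymus sample. The identifiers are hypothetical.

```python
from collections import Counter

def tall_specific_ipas(tall_calls, thymus_calls, min_tall_samples=2):
    """IPA isoforms detected in >= min_tall_samples T-ALL samples and in no thymus sample.

    tall_calls, thymus_calls: lists of sets, one set of IPA identifiers per sample
    """
    in_thymus = set().union(*thymus_calls)
    # Each sample is a set, so an isoform is counted at most once per sample.
    tall_counts = Counter(ipa for sample in tall_calls for ipa in sample)
    return {ipa for ipa, n in tall_counts.items()
            if n >= min_tall_samples and ipa not in in_thymus}

# Hypothetical per-sample IPA calls.
tall = [{"IPA_A", "IPA_B", "IPA_C"}, {"IPA_A", "IPA_C"}, {"IPA_B"}]
thymus = [{"IPA_C"}, {"IPA_D"}]
print(sorted(tall_specific_ipas(tall, thymus)))
```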
Samples investigated by 3′-seq and RNA-seq.
(a) CLL sample characteristics. (b) Normal
human immune cells investigated by 3′-seq. (c) Normal
human immune cells investigated by RNA-seq. N, no; Y, yes; NA, not analyzed; BM, bone marrow.