Hiromichi Suzuki1,2, Sachin A Kumar1,2,3, Shimin Shuai4,5, Ander Diaz-Navarro6,7, Ana Gutierrez-Fernandez6,7, Pasqualino De Antonellis1,2, Florence M G Cavalli1,2, Kyle Juraschka1,2,3, Hamza Farooq1,2,3, Ichiyo Shibahara1,2, Maria C Vladoiu1,2,3, Jiao Zhang1,2, Namal Abeysundara1,2, David Przelicki1,2,3, Patryk Skowron1,2,3, Nicole Gauer1,2, Betty Luu1,2, Craig Daniels1,2, Xiaochong Wu1,2, Antoine Forget8,9, Ali Momin1,2,5, Jun Wang10, Weifan Dong1,2,5, Seung-Ki Kim11, Wieslawa A Grajkowska12, Anne Jouvet13, Michelle Fèvre-Montange14, Maria Luisa Garrè15, Amulya A Nageswara Rao16, Caterina Giannini17, Johan M Kros18, Pim J French19, Nada Jabado20, Ho-Keung Ng21, Wai Sang Poon22, Charles G Eberhart23,24,25, Ian F Pollack26, James M Olson27, William A Weiss28,29,30, Toshihiro Kumabe31, Enrique López-Aguilar32, Boleslaw Lach33,34, Maura Massimino35, Erwin G Van Meir36,37,38, Joshua B Rubin39,40, Rajeev Vibhakar41, Lola B Chambless42, Noriyuki Kijima43, Almos Klekner44, László Bognár44, Jennifer A Chan45, Claudia C Faria46,47, Jiannis Ragoussis48,49, Stefan M Pfister50,51,52, Anna Goldenberg53,54, Robert J Wechsler-Reya10,55, Swneke D Bailey56,57, Livia Garzia57,58, A Sorana Morrissy45,59, Marco A Marra60,61, Xi Huang1,2, David Malkin62, Olivier Ayrault8,9, Vijay Ramaswamy2,62, Xose S Puente6,7, John A Calarco63, Lincoln Stein4, Michael D Taylor64,65,66,67. 1. The Arthur and Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada. 2. Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada. 3. Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada. 4. Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, Ontario, Canada. 5. Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada. 6. 
Departamento de Bioquímica y Biología Molecular, Instituto Universitario de Oncología, Universidad de Oviedo, Oviedo, Spain. 7. Centro de Investigación Biomédica en Red de Cáncer, Madrid, Spain. 8. CNRS UMR, INSERM, Institut Curie, PSL Research University, Orsay, France. 9. CNRS UMR 3347, INSERM U1021, Université Paris Sud, Université Paris-Saclay, Orsay, France. 10. Tumor Initiation and Maintenance Program, NCI-Designated Cancer Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA. 11. Department of Neurosurgery, Division of Pediatric Neurosurgery, Seoul National University Children's Hospital, Seoul, South Korea. 12. Department of Pathology, The Children's Memorial Health Institute, Warsaw, Poland. 13. Centre de Pathologie EST, Groupement Hospitalier EST, Université de Lyon, Bron, France. 14. CNRS UMR5292, INSERM U1028, Centre de Recherche en Neurosciences, Université de Lyon, Lyon, France. 15. Neuro-Oncology Unit, Istituto Giannina Gaslini, Genova, Italy. 16. Division of Pediatric Hematology/Oncology, Mayo Clinic, Rochester, MN, USA. 17. Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA. 18. Department of Pathology, Erasmus University Medical Center, Rotterdam, The Netherlands. 19. Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands. 20. Division of Experimental Medicine, McGill University, Montreal, Quebec, Canada. 21. Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, Hong Kong, China. 22. Department of Surgery, The Chinese University of Hong Kong, Hong Kong, China. 23. Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA. 24. Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD, USA. 25. Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA. 26. 
Department of Neurological Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA. 27. Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA. 28. Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA. 29. Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA. 30. Department of Neurology, University of California San Francisco, San Francisco, CA, USA. 31. Department of Neurosurgery, Kitasato University School of Medicine, Sagamihara, Japan. 32. Division of Pediatric Hematology/Oncology, Hospital Pediatría Centro Médico Nacional Century XXI, Mexico City, Mexico. 33. Department of Pathology and Molecular Medicine, Division of Anatomical Pathology, McMaster University, Hamilton, Ontario, Canada. 34. Department of Pathology and Laboratory Medicine, Hamilton General Hospital, Hamilton, Ontario, Canada. 35. Fondazione IRCCS Istituto Nazionale Tumori, Milan, Italy. 36. Winship Cancer Institute, Emory University, Atlanta, GA, USA. 37. Laboratory of Molecular Neuro-Oncology, Department of Neurosurgery, School of Medicine, Emory University, Atlanta, GA, USA. 38. Department of Hematology and Medical Oncology, School of Medicine, Emory University, Atlanta, GA, USA. 39. Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, USA. 40. Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA. 41. Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA. 42. Department of Neurological Surgery, Vanderbilt Medical Center, Nashville, TN, USA. 43. Department of Neurosurgery, Osaka National Hospital, Osaka, Japan. 44. Department of Neurosurgery, Medical and Health Science Centre, University of Debrecen, Debrecen, Hungary. 45. Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada. 46. 
Division of Neurosurgery, Centro Hospitalar Lisboa Norte, Hospital de Santa Maria, Lisbon, Portugal. 47. Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal. 48. McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montreal, Canada. 49. Department of Bioengineering, McGill University, Montreal, Canada. 50. Hopp Children's Cancer Center Heidelberg (KiTZ), Heidelberg, Germany. 51. Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, Germany. 52. Department of Pediatric Hematology and Oncology, Heidelberg University Hospital, Heidelberg, Germany. 53. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada. 54. Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada. 55. Department of Pediatrics, University of California San Diego, San Diego, CA, USA. 56. Department of Surgery, Division of Thoracic and Upper Gastrointestinal Surgery, Faculty of Medicine, McGill University, Montreal, Quebec, Canada. 57. Cancer Research Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada. 58. Department of Surgery, Division of Orthopedic Surgery, Faculty of Medicine, McGill University, Montreal, Quebec, Canada. 59. Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada. 60. Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada. 61. Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada. 62. Division of Haematology/Oncology, Department of Pediatrics, The Hospital for Sick Children, Toronto, Ontario, Canada. 63. Department of Cell and Systems Biology, University of Toronto, Toronto, Canada. 64. 
The Arthur and Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada. mdtaylor@sickkids.ca. 65. Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada. mdtaylor@sickkids.ca. 66. Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada. mdtaylor@sickkids.ca. 67. Division of Neurosurgery, The Hospital for Sick Children, Toronto, Ontario, Canada. mdtaylor@sickkids.ca.
Abstract
In cancer, recurrent somatic single-nucleotide variants, which are rare in most paediatric cancers, are confined largely to protein-coding genes[1-3]. Here we report highly recurrent hotspot mutations (r.3A>G) of U1 spliceosomal small nuclear RNAs (snRNAs) in about 50% of Sonic hedgehog (SHH) medulloblastomas. These mutations were not present across other subgroups of medulloblastoma, and we identified these hotspot mutations in U1 snRNA in only <0.1% of 2,442 cancers, across 36 other tumour types. The mutations occur in 97% of adults (subtype SHHδ) and 25% of adolescents (subtype SHHα) with SHH medulloblastoma, but are largely absent from SHH medulloblastoma in infants. The U1 snRNA mutations occur in the 5' splice-site binding region, and snRNA-mutant tumours have significantly disrupted RNA splicing and an excess of 5' cryptic splicing events. Alternative splicing mediated by mutant U1 snRNA inactivates tumour-suppressor genes (PTCH1) and activates oncogenes (GLI2 and CCND2), and represents a target for therapy. These U1 snRNA mutations provide an example of highly recurrent and tissue-specific mutations of a non-protein-coding gene in cancer.
The cerebellar neuronal cancer medulloblastoma comprises four distinct molecular subgroups (Wnt, Shh, Group 3, and Group 4), each with its own distinct clinical, transcriptomic, and genetic make-up[4-6]. These four molecular subgroups can be further subdivided into molecular subtypes, including Shh-MB which comprises Shhα, Shhβ, Shhγ, and Shhδ[7]. Recently, non-coding SNVs have been discovered in the promoter regions of TERT and a handful of other loci, giving impetus to examine non-coding segments carefully[8,9]. Thus, we sought to explore the genomic landscape of MB, with a particular focus on non-coding regions. We analyzed whole-genome sequencing (WGS) of 114 MBs and observed a novel recurrent hotspot mutation of the non-coding U1-snRNA genes in 10 out of 114 cases (8.8%) (Fig. 1a; Extended Data Fig. 1; Supplementary Table 1, 2; see Methods). Hotspot mutations of U1-snRNA genes occur in the third nucleotide (r.3a>g), and are restricted to Shh-MB. Interestingly, hotspot mutations are localized within the 5′ splice site (SS) recognition sequence, which is ultra-conserved in eukaryotes through nearly one billion years of evolution (Fig. 1b and Extended Data Fig. 2a). The human reference genome (hg19) has four annotated U1-snRNA genes (RNU1–1, RNU1–2, RNU1–3, and RNU1–4) and three ‘pseudogenes’ (RNU1–27P, RNU1–28P, and RNVU1–18), all of which encode completely identical 164-base-pair transcripts. In addition, there are >100 U1-snRNA pseudogenes spread across the genome, which greatly complicates mutation calling because short reads cannot be aligned uniquely to any one individual U1-snRNA gene (Extended Data Fig. 3)[10]. We re-mapped sequence reads permitting multi-mapping, and successfully detected the U1-snRNA mutation in five additional cases (see Methods). We validated hotspot U1-snRNA mutations in an additional 40/227 MB cases from the International Cancer Genome Consortium (ICGC) (Supplementary Table 2–4). 
We also detected recurrent hotspot mutations of the U11-snRNA gene (RNU11) at the fifth nucleotide (r.5a>g), in the highly conserved 5′SS recognition sequence (total 4/341 cases, Extended Data Fig. 2b–d; Supplementary Table 2). Taken together, 51% (56/109) of Shh-MBs have at least one U1/U11 snRNA mutation (Fig. 2). The snRNA mutation significantly co-occurs with mutations of the TERT promoter and DDX3X (Supplementary Table 5,6). We assessed the U1-snRNA(r.3a>g) mutation across 2,442 samples from 36 cancer histologies from ICGC and found the mutation in only one sample (0.04%) – a lone pancreatic ductal adenocarcinoma (Supplementary Table 7). We conclude that U1-snRNA(r.3a>g) mutations are both highly recurrent, and extremely specific to Shh-MB.
Figure 1. –
Highly recurrent mutations of the U1-snRNAs in Shh-MB
a) Cartoon illustrating the number and subgroup-specific distribution of somatic mutations in the U1-snRNA genes. U1-snRNA sequence conservation scores were determined using the Rfam database.
b) Secondary structure of the mutant U1-snRNA. The red circle identifies the location of the hotspot mutation. The yellow and green rectangles indicate the 5′ splice site recognition site and the Sm protein binding site respectively. Numerals I to IV indicate stem-loops.
Extended Data Fig. 1. –
Overview of analyzed cohorts and methods.
a) The detection methods for U1-snRNA mutations by each cohort and comparison methods for alternative splicing analysis. b) Cohort specification. c) Subgroup distribution of whole genome sequencing cohorts.
Extended Data Fig. 2. –
U11-snRNA mutations and conservation of U1 and U11 snRNA genes across evolution.
a) Seed sequences of the U1-snRNA obtained from the Rfam database demonstrate a high level of conservation across a variety of eukaryotic species, particularly at the site of the Shh-MB mutation. The consensus sequence and the first 50 nucleotides of reference sequences are included for comparison. Grey indicates nucleotide differences, and red identifies the Shh-MB hotspot mutation.
b) Cartoon illustrating the number of somatic mutations in the U11-snRNA genes. U11-snRNA sequence conservation scores were determined using the Rfam database.
c) Secondary structure of the mutant U11-snRNA. The red circle identifies the location of the hotspot mutation. The yellow and green rectangles indicate the 5′ splice site recognition site and the Sm protein binding site respectively. Numerals I to IV indicate stem-loops.
d) Seed sequences of the U11-snRNA obtained from the Rfam database demonstrate a high level of conservation across a variety of eukaryotic species, particularly at the site of the Shh-MB mutation. The consensus sequence and the first 30 nucleotides of reference sequences are included for comparison. Grey indicates nucleotide differences, and red identifies the Shh-MB hotspot mutation.
Extended Data Fig. 3. –
High levels of genomic conservation surrounding human U1-snRNAs complicate the specific PCR amplification of any one individual locus.
a) Genomic locations of the four expressed U1 spliceosomal RNA genes (on Chromosome 1p, red), and 136 pseudogenes across the H. sapiens genome as indicated. Three pseudogenes with sequences identical (hg19) to the expressed U1 genes are indicated in orange.
b) Average bwa-mem mapping quality and coverage of each expressed U1 and U11 spliceosomal RNA gene in WGS of germline samples from medulloblastoma patients (n = 341). Blue bars represent Alignability of 100mers by GEM from ENCODE/CRG. Distances beyond 1,000 bases upstream and downstream are shown on a log10 scale. Red bar indicates the gene body.
c) Average number of multi-mapped reads overlapped for each gene pair by STAR aligner. Heatmap shows the average number of mapped reads across WGS from germline samples of medulloblastoma patients (n = 341).
d) Sequence similarity of U1-snRNA genes, U1-snRNA pseudogenes with identical 164 bp sequences, and the U11-snRNA gene. The numbers in each square and the heatmap indicate identity scores and bit scores calculated by BLAST. Blank indicates no hit found.
Figure 2. –
Mutational repertoire of snRNA mutant Shh-MBs
Genomic landscape of mutations in Shh-MBs (n = 109) with and without U1/U11 mutations. Odds ratios (red dots) of coexistence of U1 and U11 snRNA mutations with other somatic events are shown with 95% confidence intervals. Arrowheads represent values out of axis range. Significantly correlated mutations are denoted in red (false-discovery rate (FDR) < 0.1; asymptotic P-values from odds-ratio tests (H0: odds ratio = 1, see Methods) with Benjamini and Hochberg adjustment for multiple testing).
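The co-occurrence statistics named in this legend (an odds ratio with a 95% confidence interval, an asymptotic test of H0: odds ratio = 1, and Benjamini and Hochberg adjustment) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a Wald interval on the log odds ratio, requires all four cells of the 2x2 table to be non-zero, and the function names and any counts passed to it are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z_crit=1.959964):
    """Odds ratio, Wald 95% CI and asymptotic two-sided P-value for a 2x2 table.

    a: snRNA-mutant tumours with the other event
    b: snRNA-mutant tumours without it
    c: snRNA-wildtype tumours with the other event
    d: snRNA-wildtype tumours without it
    Assumes every cell is > 0 (no continuity correction is applied).
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Two-sided P-value under a standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    ci_low = math.exp(log_or - z_crit * se)
    ci_high = math.exp(log_or + z_crit * se)
    return math.exp(log_or), ci_low, ci_high, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_wald(10, 5, 5, 10))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

An event would then be flagged as significantly co-occurring when its adjusted value falls below the FDR threshold of 0.1 stated in the legend.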
We validated the U1-snRNA(r.3a>g) mutation in an additional 159 cases of Shh-MB using allele-specific PCR. We detected mutations in the RNU1–27P and/or RNU1–28P genes, confirmed by Sanger sequencing, which were not identified by WGS (Extended Data Fig. 4a, b; Supplementary Table 8, see Methods). Combining the results of WGS and allele-specific PCR, we found that U1-snRNA(r.3a>g) mutations were largely restricted to adulthood (Shhδ - 97%) and adolescence (Shhα - 25%), and absent from infancy (Fig. 3a, b). This remains true if only age and not molecular subtype is accounted for. Indeed, most Shhα patients with TP53 mutations also have U1-snRNA(r.3a>g) mutations (Fig. 3c). Both broad and focal somatic copy number variations (sCNVs) are divergent between Shhα U1-wildtype, Shhα U1-mutants and Shhδ U1-mutants, supporting a model where they follow different genetic pathways to transformation (Extended Data Fig. 4c, d; Supplementary Table 9, 10). An analysis of focal CNVs demonstrates that Shhα U1-wildtype tumors have an increased incidence of CNVs that encompass several oncogenes and tumor-suppressor genes, including MYCN, CCND2, and PPM1D.
Extended Data Fig. 4. –
Allele-specific rhAmp SNP PCR of RNU1 loci, significant copy number changes in U1-mutant versus U1-wildtype Shh-MB and prognostic analysis.
a) The frequency of any U1-snRNA mutation by RNU1_Batch primer set (RNU1–1, RNU1–2, RNU1–3, RNU1–4, and RNVU1–18) (left) and RNU1_Pseudo primer set (RNU1–27P and RNU1–28P) (right).
b) Hotspot mutations of RNU1–27P/RNU1–28P U1-snRNA pseudogenes as confirmed by Sanger sequencing.
c) Broad copy number aberrations in U1-wildtype Shhα (n = 25), U1-mutant Shhα (n = 8), and U1-mutant Shhδ (n = 41). Dark blue and dark red bars, as well as asterisks, identify statistically significant regions comparing Shhα U1-mutant versus wildtype (P < 0.05, two-sided Fisher’s exact test).
d) Significant focal copy number aberrations in U1-wildtype Shhα (n = 25), U1-mutant Shhα (n = 8), and U1-mutant Shhδ (n = 41) illustrate significant genomic differences between U1-wildtype and U1-mutant cases. Candidate target genes within the corresponding loci are indicated. q-values were calculated by GISTIC (see Methods).
e–g) Overall survival of Shhα stratified by mutational status of U1-snRNA mutation (n = 10 for mutant, n = 27 for wildtype) (e), TP53 (n = 15 for mutant, n = 22 for wildtype) (f), or both (n = 9 for both mutant, n = 1 for U1 mutation only, n = 6 for TP53 mutation only, n = 21 for both wildtype) (g). P-values were determined using the two-sided log-rank test. + indicates censored cases.
h, i) Progression-free survival (h) and overall survival (i) stratified by U1-snRNA mutation and Shh subtypes (n = 10 for U1-mutant Shhα, n = 27 for U1-wildtype Shhα, n = 23 for U1-wildtype Shhβ, n = 24 for U1-wildtype Shhγ, n = 46 for U1-mutant Shhδ). P-values were determined using the two-sided log-rank test. + indicates censored cases.
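Several comparisons in these legends rely on a two-sided Fisher's exact test on a 2x2 table. A minimal sketch of that test is given below, assuming the common definition in which the two-sided P-value sums the probabilities of all tables no more likely than the observed one. The function name is hypothetical and the counts in the usage line are illustrative, not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table with the same margins
    whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical example: gain present/absent in mutant vs wildtype tumours.
    print(fisher_exact_two_sided(3, 1, 1, 3))
```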
Figure 3. –
Clinical and cytogenetic features of U1-mutant Shh-MBs
a) Frequency of U1-snRNA mutations across Shh-MB subtypes. NA denotes samples where subtype is unknown.
b) Upper: frequency of U1-snRNA mutation by age group (n = 74 for infants, n = 53 for children, n = 32 for adolescents, n = 95 for adults). Bottom: age distribution by subtype (n = 47 for Shhα, n = 28 for Shhβ, n = 34 for Shhγ, n = 63 for Shhδ, n = 180 for unknown subtype) and U1-snRNA mutational status (n = 122 for mutant, n = 132 for wildtype). P-values were calculated by two-sided Wilcoxon rank-sum test. Boxplot center lines show data median; box limits indicate the interquartile range (IQR) from the 25th and 75th percentiles; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points.
c) Frequency of TP53 mutation in U1-mutant and wildtype tumors.
d–f) Progression-free survival of Shhα stratified by mutational status of U1-snRNA (d) (n = 10 for mutant, n = 27 for wildtype), TP53 (e) (n = 15 for mutant, n = 22 for wildtype), or both (f) (n = 9 for both mutant, n = 1 for U1 mutation only, n = 6 for TP53 mutation only, n = 21 for both wildtype). P-values were determined using the two-sided log-rank test. + indicates censored cases.
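As a sanity check on the multivariate Cox estimates quoted in the text for Shhα, the standard error, Wald statistic and two-sided P-value can be recovered approximately from each reported hazard ratio and its 95% confidence interval. The sketch below assumes the intervals are Wald intervals that are symmetric on the log scale, which the source does not state explicitly. The function and dictionary names are hypothetical; the numbers are the ones reported in the text.

```python
import math


def wald_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover SE(log HR), the Wald z and a two-sided P from HR and its 95% CI.

    Assumes the CI was built as exp(log HR +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# (HR, CI low, CI high, reported P) as given in the text.
REPORTED = {
    "PFS U1-snRNA(r.3a>g)": (5.51, 1.15, 26.35, 0.03),
    "PFS TP53": (3.01, 0.55, 16.65, 0.21),
    "OS U1-snRNA(r.3a>g)": (3.72, 0.74, 18.87, 0.11),
    "OS TP53": (2.70, 0.46, 15.88, 0.27),
}

if __name__ == "__main__":
    for label, (hr, low, high, p_reported) in REPORTED.items():
        se, z, p = wald_from_hr(hr, low, high)
        print(f"{label}: z = {z:.2f}, recomputed P = {p:.3f}, reported P = {p_reported}")
```

Under that assumption the recomputed values should land close to the reported P-values, which suggests the intervals and P-values in the text are internally consistent.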
A univariate log-rank analysis of both progression-free survival (PFS) and overall survival (OS) reveals that within Shhα both U1-snRNA(r.3a>g) and TP53 mutational status are each associated with a significantly poorer outcome (Fig. 3d–f; Extended Data Fig. 4e–i). However, in a multivariate Cox regression analysis, TP53 mutations alone are no longer significant for PFS, whereas U1-snRNA(r.3a>g) confers a very strong risk for relapse (U1-snRNA(r.3a>g): hazard ratio (HR) 5.51, 95% confidence interval (CI) 1.15–26.35, P = 0.03; TP53: HR 3.01, 95% CI 0.55–16.65, P = 0.21). A similar trend was observed for OS (U1-snRNA(r.3a>g): HR 3.72, 95% CI 0.74–18.87, P = 0.11; TP53: HR 2.70, 95% CI 0.46–15.88, P = 0.27). This suggests that within Shhα, the combination of both a TP53 mutation and the U1-snRNA(r.3a>g) mutation is associated with an extremely poor prognosis.
Intron-centric alternative splicing analysis using LeafCutter confirms that both U1-mutant Shhα and Shhδ have 2.5–3 times more alternative 5′ cryptic splicing events than Shh-MBs with wildtype U1-snRNA (Extended Data Fig. 5a, b, 6a–c; Supplementary Table 11)[11]. The U1-snRNA(r.3a>g) mutations would be predicted to affect the recognition of the 6th intronic nucleotide from the 5′SS, and indeed, cryptic 5′SSs recognized in U1-mutant Shh-MB demonstrate enrichment of a dominant ‘C’ base as opposed to the ‘T’ base observed in U1-wildtype tumors (Extended Data Fig. 5c and 6d, e). Pathway analysis of differentially expressed transcripts between U1-mutant and wildtype Shh-MB demonstrates an increase in nonsense-mediated decay, consistent with destruction of aberrantly spliced transcripts (Extended Data Fig. 7a). To validate the effect of the U1-snRNA mutation, we transfected wildtype or mutant U1-snRNA(r.3a>g) vectors into human embryonic kidney 293T cells, and examined effects on splicing. 
Intron-centric analysis clearly demonstrates an enrichment of a ‘C’ base at the 6th intronic position, and a significant increase in the incidence of cryptic 5′ splicing events which do not overlap with U1-wildtype Shh (Extended Data Fig. 7b–d, Supplementary Table 12, 13).
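The shift from ‘T’ to ‘C’ at the 6th intronic position follows from base-pairing between the 5′ end of U1 and the 5′ splice site. The sketch below is an illustration under stated assumptions, not the authors' analysis: the 11-nucleotide U1 5′ sequence is taken from general knowledge of human U1 (with pseudouridines written as U) rather than from this text, only Watson-Crick pairs are considered, U1 positions 8 down to 3 are assumed to pair with intron positions +1 to +6, and all names are hypothetical.

```python
# Watson-Crick partners only; G-U wobble pairing is deliberately ignored.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Assumed 5' end of human U1 snRNA, positions 1-11 (pseudouridines shown as U).
U1_5P_WILDTYPE = "AUACUUACCUG"


def apply_substitution(seq, position, ref, alt):
    """Return seq with a 1-based substitution applied, checking the reference base."""
    if seq[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return seq[: position - 1] + alt + seq[position:]


def preferred_intron_start(u1_5p):
    """Intron positions +1..+6 that pair perfectly with U1 positions 8..3.

    Pairing is antiparallel: U1 position 8 faces intron +1 and
    U1 position 3 faces intron +6.
    """
    return "".join(WATSON_CRICK[u1_5p[i]] for i in range(7, 1, -1))


U1_5P_MUTANT = apply_substitution(U1_5P_WILDTYPE, 3, "A", "G")  # r.3a>g

if __name__ == "__main__":
    for label, seq in (("wildtype", U1_5P_WILDTYPE), ("r.3a>g", U1_5P_MUTANT)):
        rna = preferred_intron_start(seq)
        # Show the DNA-style spelling used in the text (T instead of U).
        print(label, rna, rna.replace("U", "T"))
```

In this simplified model the wildtype sequence favours a U (T in the genome) at intron position +6, whereas the r.3a>g sequence favours a C, in line with the enrichment described above.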
Extended Data Fig. 5. –
Intron-centric analysis of Shhδ medulloblastomas.
a) Quantitation of alternative splicing events by Shh subtype as detected by intron-centric alternative splicing analysis (n = 30 each subtype). Bar plot shows adjusted standardized residual of included alternative splicing events, of which positive values indicate relatively higher number, and negative values indicate relatively lower number among subtypes.
b) Volcano plots of alternative splicing events (n = 30 each subtype). Significant events (FDR < 0.01 and absolute log effect size > 1.5 calculated by LeafCutter, see Methods) are illustrated by color. Alternative splicing events of PTCH1 and GLI2 with the highest effect size are annotated.
c) Splice site sequences of included alternative splicing events by subtype (n = 30 each subtype). Asterisk denotes nucleotide sites with q-value < 10⁻² (Chi-squared test and BH method).
Extended Data Fig. 6. –
Intron-centric analysis of Shhα medulloblastomas.
a,b) Quantitation (a) and proportion (b) of alternative splicing events between U1-mutant Shhα medulloblastoma (n = 13), and U1-wildtype Shhα medulloblastoma (n = 39) as detected by intron-centric alternative splicing analysis. P-value was calculated by Chi-Squared Test.
c) Volcano plots of alternative splicing events (n = 13 for U1-mutant Shhα, n = 39 for U1-wildtype Shhα). X axis shows the difference of PSI (percent spliced in) calculated by LeafCutter. Significant events (FDR < 0.01 and absolute log effect size > 1.5 calculated by LeafCutter, see Methods) are illustrated by color.
d) Splice site sequences of included alternative splicing events in U1-mutant Shhα subtype (n = 13 for U1-mutant Shhα, n = 39 for U1-wildtype Shhα). Size and color of each circle indicate the q-value and Cramer’s V value for each nucleotide position (q-values were calculated by Chi-squared test and BH method; the precise values are given in Supplementary Table 11).
e) Residual analysis of 5′ splice site sequences of annotated and cryptic 5′ alternative splicing (n = 13 for U1-mutant Shhα, n = 39 for U1-wildtype Shhα). The size and color of each circle denote the two-sided P-value and the adjusted standardized residual calculated by Haberman’s method. The precise values are given in Supplementary Table 11.
Extended Data Fig. 7. –
Nonsense mediated decay pathway in U1-mutant Shh-MB and exogenous expression analyses
a) Enrichment plots of “GO NUCLEAR TRANSCRIBED MRNA CATABOLIC PROCESS NONSENSE MEDIATED DECAY” by GSEA between U1-mutant Shhδ (n = 30) and U1-wildtype other Shh subtypes (n = 90), and between U1-mutant Shhα (n = 13) and U1-wildtype Shhα (n = 39). P-values were calculated by GSEA (see Methods).
b) Quantitation of alternative splicing events between U1-mutant HEK293T and U1-wildtype HEK293T as detected by intron-centric alternative splicing analysis.
c) Splice site sequences of included alternative splicing events in U1-mutant HEK293T. Asterisk denotes nucleotide sites with q-value < 10⁻² (Chi-squared test and BH method).
d) Comparison of the extent of overlap between alternative splicing events detected in U1-mutant Shh (either Shhα or Shhδ), U1-wildtype Shh (Shhα, Shhβ or Shhγ) and HEK293T with exogenous expression of mutant U1. (Left) alternatively spliced events with cryptic 5′ sites, and (Right) alternatively spliced events with cryptic 5′ sites and a ‘C’ base at the 6th intronic position.
e) Alternative splicing signatures by t-SNE analysis. Left: The percent spliced-in (psi) values of detected cryptic 5′ alternative splicing events, with a ‘C’ nucleotide at the 6th base in the intron from 5′ splice site. Right Upper: psi values of all cryptic 5′ alternative-splicing events. Right Lower: psi values of all alternative splicing events.
Clustering based on significant alternative splicing events is clearly driven by U1-snRNA mutational status (Extended Data Fig. 7e, see Methods), with U1-mutant tumors segregated distinctly from the U1-wildtype tumors. We conclude that the U1-snRNA(r.3a>g) mutation has a profound effect on alternative splicing in affected tumors.
As a complementary approach, we conducted exon-centric alternative splicing analysis using rMATS[12]. We observed that U1-mutant Shh tumors have a higher incidence of cassette exons than U1-wildtype controls (Extended Data Fig. 8a–c and 9a, b; Supplementary Table 14). Similar to cryptic 5′ alternative splicing events, the dominant base at the 6th intronic position is ‘C’ (Extended Data Fig. 8d, 9c; Supplementary Table 15). In addition, an increase of retained introns (RIs) is observed in U1-mutant tumors. The 5′SS sequences of missed splice sites in RIs do not have a dominant ‘C’ at the 6th nucleotide, but rather the canonical ‘T’. This latter result suggests a novel mechanism in which mutant U1-snRNA(r.3a>g) not only recognizes alternative 5′SSs, but also inhibits the wildtype U1-snRNA from detecting canonical SSs, resulting in their aberrant splicing. The RI events with the highest psi, validated by real-time qPCR, occur in PAX6, which undergoes frequent somatic mutation in Shh-MB, and in the chromatin-remodeling gene TOX4 (Extended Data Fig. 8e–h, 9d; Supplementary Table 16)[13,14]. The RI in both genes results in a frameshift, leading to loss of function. These data may support a model in which the U1-snRNA(r.3a>g) impedes normal splicing, leading to intron retention, and an mRNA frameshift.
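For readers less familiar with the psi (percent spliced in) values used to rank these events, the quantity can be sketched as the fraction of junction reads that support inclusion of the event. This is a simplified illustration: it ignores the effective-length normalization that exon-centric tools typically apply, the function names are hypothetical, and the read counts in the usage lines are made up. The thresholds are the ones quoted in the figure legends for the exon-centric analysis (FDR < 0.01 and absolute differential PSI > 0.05).

```python
def psi(inclusion_reads, exclusion_reads):
    """Fraction of reads supporting inclusion; None when there is no coverage."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def is_significant(delta_psi, fdr, min_abs_delta=0.05, max_fdr=0.01):
    """Apply the legend thresholds: FDR < 0.01 and |delta PSI| > 0.05."""
    return fdr < max_fdr and abs(delta_psi) > min_abs_delta


if __name__ == "__main__":
    # Hypothetical retained-intron read counts in a mutant and a wildtype sample.
    psi_mutant = psi(inclusion_reads=30, exclusion_reads=70)
    psi_wildtype = psi(inclusion_reads=5, exclusion_reads=95)
    delta = psi_mutant - psi_wildtype
    print(psi_mutant, psi_wildtype, delta, is_significant(delta, fdr=0.001))
```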
Extended Data Fig. 8. –
Retained introns inactivate tumor suppressor genes in U1-snRNA(r.3a>g) mutant tumors.
a) Illustration of the different types of alternative splicing events analyzed by rMATS (n = 30 each subtype). Red arrows indicate expected 5′ splice sites recognized by the mutant U1-snRNA.
b) Quantitation of alternative splicing events by Shh subtype, as detected by exon-centric alternative splicing analysis.
c) Scatter plots of alternative splicing events (n = 30 each subtype). X axis shows the difference of PSI (percent spliced in) calculated by rMATS. Different types of significant events (FDR < 0.01 and absolute differential PSI > 0.05 calculated by rMATS, see Methods) are illustrated by different colors as annotated.
d) Splice site sequences of alternative 5′ splice site, included cassette exon (CE) and included retained intron (RI) events in U1-snRNA mutant Shhδ (n = 30). Each event corresponds to a red arrow cartoon in ‘a’. Asterisk denotes nucleotide sites with q-value < 10⁻² (Chi-squared test and BH method).
e) Distribution of percent spliced in for PAX6 based on U1-snRNA mutation status (n = 13 for Shhα U1-mutant, n = 30 for Shhδ U1-mutant, n = 99 for U1-wildtype Shh (n = 90) and normal brain tissue (n = 9)). Dashed line defines threshold that divides the dataset into two groups (k-means method). Table displays number of samples above the threshold (high) or below (low) based on mutational status. P-value is calculated using two-sided Fisher’s exact test compared to U1-wildtype samples. U1-snRNA mutant samples are indicated in pink, and wildtype samples in blue.
f) Sashimi-plot of splicing of PAX6 based on mutational status determined by exon-centric alternative splicing analysis (rMATS). The bar plot shows modified fragments per kilobase per million mapped (mFPKM). Numbers enumerate average junctional reads across all samples. Annotated exon tracks are shown below with genomic positions marked.
g) Distribution of percent spliced in for TOX4 based on U1-snRNA mutation status (n = 13 for U1-mutant Shhα, n = 30 for U1-mutant Shhδ, n = 99 for U1-wildtype Shh (n = 90) and normal brain tissue (n = 9)). Dashed line defines threshold that divides the dataset into two groups (k-means method). Table displays number of samples above the threshold (high) or below (low) based on mutational status. P-value is calculated using two-sided Fisher’s exact test compared to U1-wildtype samples. U1-snRNA mutant samples are indicated in pink, and wildtype samples in blue.
h) Sashimi-plot of splicing of TOX4 based on mutational status determined by exon-centric alternative splicing analysis (rMATS). The bar plot shows modified fragments per kilobase per million mapped (mFPKM). Numbers enumerate average junctional reads across all samples. Annotated exon tracks are shown below with genomic positions marked.
Extended Data Fig. 9. –
Exon-centric analysis of Shhα medulloblastomas and overlapped splicing events
a) Quantitation of alternative splicing events between U1-mutant Shhα medulloblastoma, and U1-wildtype Shhα medulloblastoma, as detected by exon-centric alternative splicing analysis.
b) Scatter plots of alternative splicing events (n = 13 for U1-mutant Shhα, n = 39 for U1-wildtype Shhα). X axis shows the difference of PSI (percent spliced in) calculated by rMATS. Different types of significant events (FDR < 0.01 and absolute differential PSI > 0.05 calculated by rMATS, see Methods) are illustrated by different colors as annotated.
c) Splice site sequences of alternative five prime splice site, included cassette exon (CE), and included retained intron (RI) events in U1-mutant Shhα medulloblastoma and U1-wildtype Shhα medulloblastoma. Each event corresponds to a red arrow cartoon in ‘a’. Asterisk denotes nucleotide sites with q-value < 10⁻² (Chi-squared test and BH method).
d) Boxplot of fold changes in expression of the alternatively spliced isoform as compared to the wildtype isoform in subsets of Shh-MB as determined by real-time qPCR. Boxplot center lines show data median; box limits indicate the IQR from the 25th and 75th percentiles; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using two-sided Wilcoxon-rank sum test.
e) Comparison of the extent of overlap between splicing events by Shh subtype and U1 mutational status. Effect sizes are calculated by LeafCutter with an absolute effect size threshold of 1.5.
To detect pathogenic alternative splicing, we identified cryptic 5′ events with a ‘C’ base at the 6th intronic position shared by both U1-mutant Shhα and Shhδ tumors (Extended Data Fig. 9e; Supplementary Table 17,18). Fascinatingly, we detected cryptic splicing events with high effect sizes in both PTCH1 and GLI2, highly specific to both Shhα and Shhδ tumors carrying the U1-snRNA(r.3a>g) mutation as compared to wildtype U1-snRNA controls by both RNA sequencing and real-time qPCR (Fig. 4a–e). PTCH1 is known to have at least three different initial exons. Splicing mediated by the U1-snRNA(r.3a>g) mutant results in the inclusion of a cassette exon between exon 2 and 3, causing a frameshift, and therefore predicted translation from the ATG in exon 3 (Fig. 4f). It has been previously reported that loss of expression of the 1,447 amino acid isoform of PTCH1 results in de-repression of Hedgehog signaling[15]. Similarly, the U1-snRNA(r.3a>g) cassette exon in GLI2 is spliced between exon 4 and 5, resulting in a putative GLI2 protein lacking the repressor domain (Extended Data Fig. 10a–f). Physiological GLI2 protein has a repressor domain at its amino terminus, and constructs missing the amino terminus are much more potent at activating Hedgehog signaling than the full-length protein[16].
Figure 4. –
Aberrant splicing of Hedgehog signaling genes in U1-mutant Shh-MB
a) Overview of cryptic alternative splicing of PTCH1 demonstrating the position of a cryptic cassette exon with the 5′ splice site sequence.
b) Sashimi-plot of splicing of PTCH1 in representative cases. The bar plot shows counts per million reads. Numbers enumerate junctional reads. Annotated exon tracks are shown below with genomic positions marked. Junctional reads specific to U1-mutants are in red.
c) Scatter plot comparing detected alternatively spliced reads with total junction reads sharing the 3 prime splice site. Jittering was performed for both values.
d) ‘Percent spliced in’ values by U1-mutant Shhα, U1-mutant Shhδ, and U1-wildtype Shh (all Shh subtypes). Boxplot center lines show data median; box limits indicate the IQR from the 25th and 75th percentiles; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using two-sided Wilcoxon-rank sum test.
e) Boxplot of fold changes in expression of the alternatively spliced isoform of PTCH1 as compared to the wildtype isoform of PTCH1 in subsets of Shh-MB as determined by real-time qPCR. Boxplot center lines show data median; box limits indicate the IQR from the 25th and 75th percentiles; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. Data represent means ± standard deviation. P-values were calculated using two-sided Wilcoxon-rank sum test.
f) Illustration of canonical isoforms and the cryptic alternative isoform of PTCH1. Putative translation start sites are indicated with an arrow. Resulting proteins (and size) are displayed for each isoform. UTR denotes an untranslated region. Amino acids are denoted aa.
Extended Data Fig. 10. –
Aberrant splicing of oncogenes and tumor suppressor genes in U1-mutant Shh-MB.
a) Overview of cryptic alternative splicing of GLI2 demonstrating the position of a cryptic cassette exon with the 5′ splice site sequence.
b) Sashimi-plot of splicing of GLI2 in representative cases. The bar plot shows counts per million reads. Numbers enumerate junctional reads, with U1-mutant isoform reads in red.
c) Scatter plot comparing detected alternatively spliced reads with total junction reads sharing the 3 prime splice site. Jittering was performed for both values.
d) ‘Percent spliced in’ values by U1-mutant Shhα, U1-mutant Shhδ, and U1-wildtype Shh (all Shh subtypes). Boxplot center lines show data median; box limits indicate the IQR from the 25th and 75th percentiles; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using two-sided Wilcoxon-rank sum test.
e) Boxplot of fold changes in expression of the alternatively spliced isoform as compared to the wildtype isoform of GLI2 in subsets of Shh-MB as determined by real-time qPCR. Boxplot center lines show data median; box limits indicate the IQR from the 25th and 75th percentiles; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using two-sided Wilcoxon-rank sum test.
f) Illustration of canonical and cryptic isoform of GLI2. Translation start sites are indicated with an ATG arrow. Resulting proteins (and size) are displayed for each isoform. Repression and activation domains are indicated in blue and orange respectively. aa denotes amino acids.
g) Overview of cryptic alternative splicing of CCND2 illustrating the position of a cryptic cassette exon with the 5′ splice site sequence.
h) Sashimi-plot of representative cases demonstrates alternative splicing at the CCND2 locus. Numbers illustrate junctional reads. Junctional reads specific to U1-mutants are in red.
i) The canonical isoform and the cryptic isoform of CCND2.
j) Scatter plot comparing detected alternatively spliced reads with total junction reads sharing the 3 prime splice site. Jittering was performed for both values.
k) ‘Percent spliced in’ values by U1-mutant Shhα (n = 13), U1-mutant Shhδ (n = 58), and U1-wildtype Shh (all Shh subtypes, n = 104). Boxplot center lines show data median; box limits indicate the IQR from the 25th and 75th percentiles; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using two-sided Wilcoxon-rank sum test.
l) Real-time qPCR comparing the expression of the cryptic isoform of CCND2 demonstrates high levels of expression of CCND2 restricted to Shhδ cases (n = 6 for U1-mutant Shhα, n = 6 for U1-mutant Shhδ, n = 6 for U1-wildtype Shhα). Boxplot center lines show data median; box limits indicate the IQR from the 25th and 75th percentiles; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using two-sided Wilcoxon-rank sum test.
m) Overview of cryptic alternative splicing of PAX5 illustrating the position of a cryptic cassette exon with the 5′ splice site sequence.
n) Sashimi-plot of representative cases demonstrates alternative splicing at the PAX5 locus. Numbers illustrate junctional reads. Junctional reads specific to U1-mutants are in red.
o) The canonical isoform and the cryptic isoform of PAX5.
p) Scatter plot comparing detected alternatively spliced reads with total junction reads sharing the 3 prime splice site. Jittering was performed for both values.
q) ‘Percent spliced in’ values by U1-mutant Shhα (n = 5), U1-mutant Shhδ (n = 27), and U1-wildtype Shh (all Shh subtypes, n = 7). Boxplot center lines show data median; box limits indicate the IQR from the 25th and 75th percentiles; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using two-sided Wilcoxon-rank sum test.
Alternative splicing of the cell cycle gene CCND2, a known downstream target of Shh signaling that is recurrently amplified in Shh-MB, is detected in Shhδ U1-snRNA(r.3a>g) mutants, but not in Shhα (Extended Data Fig. 10g–l) [17,18]. Curiously, focal amplifications of CDK6 are highly recurrent in Shhα U1-snRNA(r.3a>g) mutants, but not in Shhα U1-wildtype or Shhδ U1-snRNA(r.3a>g) mutants, suggesting convergence on dysregulation of the G1/S cell cycle checkpoint. The CCND2 alternative isoform is prematurely terminated, resulting in N-terminal sequences where the PEST domain is predicted to be deleted. Deletion of the PEST domain causes resistance to protein degradation and impaired export from the nucleus, resulting in CCND2 accumulating in the nucleus to promote cell cycle progression[19]. PAX5, another known tumor suppressor gene, is affected by cryptic 5′ alternative splicing in U1-snRNA(r.3a>g) mutants (Extended Data Fig. 10m–q). Both U1-mutant and U1-wildtype Shh-MBs express distinct cryptic isoforms. The cryptic isoform present in U1-wildtype Shh-MBs translates the complete DNA binding domain of PAX5. However, the cryptic exon (also called a poison exon[20,21]) present in U1-mutant Shh-MBs results in a stop codon before the DNA binding domain. Mutations of PAX5 in cancer are typically concentrated in the DNA binding site[22]. Taken together, the data on alternative splicing of PTCH1, GLI2, CCND2, and PAX5 support a model in which cryptic alternative splicing mediated by mutant U1-snRNA(r.3a>g) functions as a driver in subsets of Shh-MB.
The U1-snRNA(r.3a>g) mutation is the most common SNV in MB. The restriction of these mutations not just to Shh-MB, but to the Shhα and Shhδ subtypes, suggests a model in which either the specific cell of origin, the temporally specific microenvironment, or co-occurring mutations (i.e., TP53) are necessary for U1 to contribute to oncogenesis. While the almost universal occurrence of U1-snRNA mutation in Shhδ strongly supports its role in tumor initiation, proof of the ongoing role of mutant U1-snRNA(r.3a>g) in tumor maintenance will await its knockdown in a tumor where it was the initiating genetic event.
Shhα patients with the U1-snRNA(r.3a>g) mutation are an extremely high-risk population that should be prioritized for the development of targeted therapies. Drugs are under development that directly target the spliceosome, which may show anti-tumor effects in cancers with spliceosomal mutations[23]. Loss of expression of specific genes through cryptic splicing or intron retention could create opportunities for synthetic lethal approaches. Finally, cryptic splicing in U1-mutant Shh-MB leads to a unique form of post-transcriptional hypermutation, which would be predicted to result in the expression of numerous cell surface neo-epitopes, which are never seen in healthy tissues, and which could be targeted using immunotherapies.
Methods
Subjects and materials
The study included two large cohorts of medulloblastomas, from Toronto and from the International Cancer Genome Consortium (ICGC) (Extended Data Fig. 1). The Toronto cohort consisted of 294 cases (WGS, 114 cases; RNA-seq, 225 cases; overlap, 46 cases), which were collected at diagnosis after informed consent was obtained from subjects as part of the Medulloblastoma Advanced Genomics International Consortium. All patient recruitment and tumour sample collection was approved by and in compliance with the ethical regulations of each of the following institutions: The Hospital for Sick Children, Seoul National University Children’s Hospital, The Children’s Memorial Health Institute, Mayo Clinic, The Chinese University of Hong Kong, Johns Hopkins University School of Medicine, Seattle Children’s Hospital, University of California San Francisco, McMaster University, Erasmus University Medical Center, Kitasato University School of Medicine, Fondazione IRCCS Istituto Nazionale Tumori, Emory University, Osaka National Hospital, Washington University School of Medicine, University of Calgary, Children’s Hospital of Pittsburgh, Hospital Pediatría Centro Médico Nacional Century XXI, University of Debrecen, McGill University, Vanderbilt Medical Center, University of Colorado Denver, Istituto Giannina Gaslini, Université de Lyon. The whole genome sequencing set consists of 109 published[3] and 5 unpublished cases (Wnt, n = 2; Shh, n = 37; Group 3, n = 26; Group 4, n = 49). Samples were obtained as fresh frozen tissue from the time of diagnosis and stored at −80°C until processed for the purification of nucleic acids. Genomic DNA was isolated by incubation with proteinase K overnight at 55°C followed by three sequential phenol extractions and ethanol precipitation. Messenger RNA library construction and sequencing were performed as previously described[24]. The ICGC cohort consisted of 227 cases, which were downloaded from ICGC under accession DACO-1036229.
Whole-genome sequencing
Whole genome sequencing (WGS) was performed at Canada’s Michael Smith Genome Science Centre at the BC Cancer Agency using the Illumina HiSeq 2000/2500 platform as previously described[24].
Sequence Alignment of Whole Genome Sequencing Data
Whole genome sequencing reads were aligned to the human reference genome “hs37d5” from the 1000 Genomes Project Phase II using Burrows-Wheeler Aligner (BWA) - MEM, version 0.7.8 with the ‘-T 0’ parameter. Duplicates were marked using biobambam version 0.0.148. Sequencing coverage was calculated using GenomonQC software, downloaded from the Genomon-Project, and is shown in Supplementary Table 1.
Somatic Variant Calling
Somatic variants were called using eight variant callers: MuTect2[25], EBCall[26], Varscan2[27], Strelka[28], SomaticSniper[29], Virmid[30], Platypus[31], and Seurat[32]. MuTect2 was run using GATK v3.5.0 with the default settings. Candidate variants were filtered against a panel of normals, which was built using MuTect2 with ‘--artifact_detection_mode’ and the GATK ‘CombineVariants’ function with ‘-minN 2’. EBCall v0.2.1 was run with the default settings. We required P-value (by EBCall) < 10⁻³, variant reads in tumor ≥ 2, and variant reads in normal ≤ 1. Varscan2 v2.4.3 was run with the parameters ‘--strand-filter 1 --min-var-freq 0.08’. The results were filtered by the ‘fpfilter’ function with the option ‘--dream3-settings’. Strelka v1.0.15 was run with default parameters. Virmid v1.1.0 was run with the option ‘-q 10’. SomaticSniper v1.0.5.0 was run with the parameters ‘-Q 15 -q 1 -G -L’ and the results were filtered with the authors’ recommended filter using bam-readcount. Candidates with a variant allele frequency above 0.03 in the matched normal sample were discarded. Platypus v0.8.1 was run with default settings. Detected variants which passed the standard Platypus filtering criteria or showed “allele bias” were used. We applied the following additional criteria: likelihood (reference allele)/likelihood (variant allele) < 10⁻⁵ in tumor, likelihood (variant allele)/likelihood (reference allele) < 10⁻⁵ in matched control, variant reads in tumor ≥ 2, and variant reads in normal ≤ 1. Seurat v2.5 was run with the option ‘--indels’. We used variants that were called by at least two callers. The resulting calls were filtered to require ≤ 2 variant reads in the matched normal control, as calculated by the realignment function of GenomonMutationFilter v0.2.1. Variants were annotated using ANNOVAR[33]. Correlations of U1 and U11 snRNA mutations with other somatic events were analyzed using the R package “Epi” version 2.30. Asymptotic P-values from odds-ratio tests were calculated using the twoby2 function, followed by Benjamini and Hochberg adjustment for multiple testing.
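The two generic steps in this paragraph, keeping variants reported by at least two callers and adjusting P-values by the Benjamini and Hochberg method, can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function names, the variant key format (chromosome, position, reference, alternate) and the toy inputs are assumptions made only for illustration.

```python
from collections import Counter


def consensus_variants(calls_by_caller, min_callers=2):
    """Return variants reported by at least `min_callers` distinct callers."""
    support = Counter()
    for variants in calls_by_caller.values():
        # set() makes each caller vote at most once per variant
        support.update(set(variants))
    return {variant for variant, n in support.items() if n >= min_callers}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy input: variant keys are hypothetical (chrom, pos, ref, alt) tuples
calls = {
    "MuTect2": [("1", 100, "A", "G"), ("2", 200, "C", "T")],
    "Strelka": [("1", 100, "A", "G")],
    "Platypus": [("3", 300, "G", "A")],
}
kept = consensus_variants(calls)
q_values = benjamini_hochberg([0.01, 0.04, 0.03])
```

Only the variant seen by both MuTect2 and Strelka survives the vote in this toy input; the caller-specific thresholds described above would be applied before this step.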
Copy number calling for WGS
Copy number alterations were detected using Control-FREEC v10.3 with the following parameters: breakPointType=4, ploidy=”2,3,4”, step=10000, window=50000[34].
Variant Calling of U1 and U11 snRNA genes
To explore mutations in low-mappability regions, we first extracted reads mapping to U1 and U11 snRNA genes and pseudogenes from the whole genome sequencing data using samtools and biobambam. To accept multi-mapping, we employed the STAR aligner[35]. To prevent gaps, we used the settings ‘--scoreGap -20 --alignEndsType EndToEnd’. Mutations were called by EBCall with the same settings as for WGS, except that secondary alignments were accepted. We required P-value (by EBCall) < 10⁻³, variant reads in tumor ≥ 4, and variant reads in matched control ≤ 1.
To evaluate the exact loci of variant reads and multiple mutations of U1-snRNAs, we re-mapped variant reads to a case-specific reference. First, we extracted all variant reads of the U1-snRNA mutation (r.3a>g) together with their paired mates. Then, we constructed a case-specific reference that included the U1-snRNA hotspot mutation (r.3a>g) and case-specific germline variants detected from the extracted variant reads using the samtools mpileup function. Variant reads were mapped again to the case-specific reference using bwa-mem with the same settings as in the WGS analysis. Using BAM files aligned to the case-specific reference, we called variants in the flanking regions of the U1-snRNA hotspot mutation (r.3a>g) with the samtools mpileup function to evaluate multiple mutations. No samples had recurrent variant reads. Therefore, we conclude that the U1-snRNA mutation occurs in one allele. To identify the mutated genes, we extracted the consecutive consensus sequence upstream of the U1-snRNA from positions supported by two or more reads. Then, the consensus sequence was mapped using BLAST to U1-snRNA genes and pseudogenes with 1,000 bp of upstream sequence from the hg19 reference. Because of the many variants and high similarity in the upstream sequences, we could not determine the exact positions of mutated reads except for RNVU1–18 mutations. Therefore, we classified U1-snRNA mutations into 1) RNU1 genes (RNU1–1, RNU1–2, RNU1–3, or RNU1–4), 2) RNVU1–18, and 3) RNU1 pseudogenes (RNU1–27P or RNU1–28P) based on the sequence similarity of the flanking regions. Finally, we manually reviewed detected mutations with the Integrative Genomics Viewer (IGV)[36]. Detected mutations are shown in Supplementary Tables 2–4.
Secondary structure of U1 and U11 snRNAs
The conservation scores of U1 (RF00003) and U11 (RF00548) snRNAs were downloaded from Rfam[37]. U1 and U11 sequences of other species were taken from the Rfam seed sequences. The secondary structures were drawn based on the consensus structure in Rfam using VARNA software[38]. U2-type intron and U12-type intron sequences were downloaded from SpliceRack[39].
rhAmp Genotyping
Genomic DNA from primary tumours was tested using custom rhAmp™ SNP assays (Integrated DNA Technology). Briefly, locus and allele specific primers were generated individually for RNU1_Batch (RNU1–1, RNU1–2, RNU1–3, RNU1–4, and RNVU1–18) and RNU1_Pseudo (RNU1–27P and RNU1–28P). Assays were run in technical triplicate in 5μL volume (DNA concentration is at least 5ng/μL), with control gBlocks for wildtype, mutant and heterozygous genotypes. Reporter mix used Yakima Yellow (mutant) and FAM (wildtype) dyes as well as ROX dye for passive reference. Plates were read on the StepOnePlus (Applied Biosystems) RT-PCR machine, and genotypes called using the StepOne v2.3 software. The primer sequences are available in Supplementary Table 19.
RNA sequencing
Sequencing reads were mapped with STAR version 2.5.1b to a FASTA file that included the human reference genome “hs37d5” from the 1000 Genomes Project Phase II and the spike-in sequences of profile C1_2 ERCC spike-in concentrations used for C1 fluidigm and Caltech profile 3 spike-ins by ENCODE, with the options ‘--outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignMatesGapMax 200000 --alignIntronMax 200000 --alignSJDBoverhangMin 10 --alignSJstitchMismatchNmax 5 -1 5 5 --outSAMmultNmax 20 --twopassMode Basic’[35]. Mapping results are shown in Supplementary Table 20.
Intron-Centric Alternative Splicing Analysis
Intron-centric alternative splicing analysis was performed using LeafCutter[11]. LeafCutter is an annotation-free quantification method. Intron clustering was run with a minimum of 50 required reads and max_intron = 500000. LeafCutter was run with the option “-g 0”. Thirty cases of each Shh subtype were compared with samples of the other Shh subtypes, five adult brain samples, and four fetal brain samples using default settings. Shhα U1-snRNA(r.3a>g) mutants (n = 13) were compared with Shhα U1-snRNA wildtype cases (n = 39). The results were filtered to clusters with a q-value < 0.01 in which at least one absolute effect size calculated by LeafCutter was more than 1.5. Each event was annotated by LeafViz with the GENCODE v19 gtf file. Events with unknown strand direction were not analyzed. Logo sequences were built using the R package “ggseqlogo” v0.1[40]. Statistical comparison of sequences was performed by Chi-square test. Adjusted standardized residuals were calculated by Haberman’s method. We selected cryptic 5′ splicing events with a C base at the sixth base in the intron. Subsequently, we further prioritized alternatively spliced genes that are reported as recurrent genetic aberrations in Shh-MB[3,41], are transcriptionally up-regulated or down-regulated in both the Shhα and Shhδ subtypes[7], or are registered as tier 1 in the Cancer Gene Census.
t-SNE analysis was performed using the R package “Rtsne” v0.13. Analyzed events were chosen according to the following criteria: 1) the event is significant in at least one Shh subtype; 2) the length of the junction-read cluster is the same among all subtypes. Percent Spliced In (PSI) was calculated as the number of junction reads of the alternative splicing event divided by the total number of junction reads in the cluster. t-SNE was run with default settings, together with 3 Wnt, 20 Group 3, and 22 Group 4 medulloblastomas that were used in our previous study[42].
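The PSI definition and the cluster-level filter described above can be written out as a short Python sketch. It is an illustration only and does not call LeafCutter: the function names, the dictionary layout of junction counts and the example numbers are assumptions, while the thresholds (q-value < 0.01, absolute effect size > 1.5) are the ones stated in the text.

```python
def percent_spliced_in(junction_counts, event):
    """PSI = reads supporting `event` / total junction reads in its cluster."""
    total = sum(junction_counts.values())
    if total == 0:
        return None  # PSI is undefined for a cluster without reads
    return junction_counts[event] / total


def cluster_is_significant(q_value, effect_sizes, q_max=0.01, min_abs_effect=1.5):
    """Keep a cluster if q < q_max and any junction has |effect size| > threshold."""
    return q_value < q_max and any(abs(e) > min_abs_effect for e in effect_sizes)


# Hypothetical cluster with one canonical and one cryptic junction
cluster = {"canonical": 90, "cryptic_5ss": 30}
psi = percent_spliced_in(cluster, "cryptic_5ss")  # 30 of 120 junction reads
```

Returning None for an empty cluster is a design choice of this sketch; how such clusters were handled in the study is not stated.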
Exon-Centric Alternative Splicing Analysis
Exon-centric alternative splicing analysis was performed using rMATS version 4.0.1[12]. rMATS was run with default settings and the GENCODE v19 annotation for alternative 3′ splice site, alternative 5′ splice site, retained intron, and skipped exon events. We kept events with FDR < 0.01 and an absolute change in splicing inclusion, as calculated by rMATS, of > 0.05. Sashimi plots were drawn using MISO v0.5.4[43].
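As an illustration of the event filter, the following Python sketch applies the two thresholds to rows that have already been parsed into dictionaries. The keys `FDR` and `IncLevelDifference` follow the column names of rMATS output tables; the event identifiers and values are invented for the example, and the sketch assumes the inclusion change is compared as an absolute value, as in the legend of Extended Data Fig. 9b.

```python
def filter_rmats_events(events, fdr_max=0.01, min_abs_dpsi=0.05):
    """Keep events with FDR < fdr_max and |inclusion difference| > min_abs_dpsi."""
    return [
        event for event in events
        if event["FDR"] < fdr_max
        and abs(event["IncLevelDifference"]) > min_abs_dpsi
    ]


# Hypothetical parsed rows; only the two filtered columns are shown
events = [
    {"ID": "SE_1", "FDR": 0.001, "IncLevelDifference": -0.20},
    {"ID": "SE_2", "FDR": 0.001, "IncLevelDifference": 0.02},
    {"ID": "SE_3", "FDR": 0.200, "IncLevelDifference": 0.40},
]
significant = filter_rmats_events(events)
```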
Gene set enrichment analysis of nonsense mediated decay
We counted reads using the GENCODE v19 gtf file and htseq version 0.6.0 with the setting “--stranded reverse -m union”. Differential expression analysis was performed using DESeq2 version 1.16.1 with the default settings after extracting genes expressed at >5 counts per million in at least 20% of cases. We performed two comparisons: U1-mutant Shhδ (n = 30) vs U1-wildtype other Shh subtypes (n = 90), and U1-mutant Shhα (n = 13) vs U1-wildtype Shhα (n = 39). Gene set enrichment analysis (GSEA) for differentially expressed genes was performed with GSEA v3.0 using pre-ranked gene lists ordered by -log10(P-value) multiplied by +1 for up-regulation or −1 for down-regulation. We used two gene sets for the nonsense-mediated decay pathway: “GO NUCLEAR TRANSCRIBED MRNA CATABOLIC PROCESS NONSENSE MEDIATED DECAY” from the C5 collection and “REACTOME NONSENSE MEDIATED DECAY ENHANCED BY THE EXON JUNCTION COMPLEX” from the C2 collection.
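The ranking metric for the pre-ranked gene lists can be sketched in a few lines of Python. The function name, the use of the log2 fold change to decide the direction of regulation, and the toy results are assumptions for illustration; the sketch also assumes P-values are strictly positive and treats a fold change of exactly zero as down-regulation, which is an arbitrary choice.

```python
import math


def gsea_rank_score(p_value, log2_fold_change):
    """Signed score: -log10(P) times +1 (up-regulated) or -1 (down-regulated)."""
    sign = 1.0 if log2_fold_change > 0 else -1.0
    return sign * -math.log10(p_value)


# Hypothetical differential expression results: gene -> (P-value, log2 fold change)
results = {
    "GENE_A": (1e-3, 2.0),
    "GENE_B": (1e-5, -1.0),
    "GENE_C": (0.5, 0.3),
}
ranked = sorted(results, key=lambda gene: gsea_rank_score(*results[gene]), reverse=True)
```

Sorting in descending order places strongly up-regulated genes at the top of the list and strongly down-regulated genes at the bottom.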
TP53 mutation status
Germline mutations of TP53 were analyzed using EBCall v.0.2.1. EBCall was run with the default settings. We required P-value (by EBCall) < 10⁻³ and a 90% posterior quantile calculated by EBCall > 0.3. The results were annotated using ANNOVAR.
Mutation calling from RNA-seq was performed using GATK v3.8.0. Read groups were added and duplicate reads flagged using Picard tool v2.18.0. Then, we split reads into exon segments using GATK with the setting ‘-rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS’. Base recalibration was performed using GATK. Mutations were called using the ‘HaplotypeCaller’ function of GATK with the setting ‘-dontUseSoftClippedBases -stand_call_conf 20.0’. Variants were filtered using the ‘VariantFiltration’ function of GATK with the setting ‘-window 35 -cluster 3 -filterName FS -filter “FS > 30.0” -filterName QD -filter “QD < 2.0”’. The variants were then filtered against a panel of normals generated from nine normal brain samples. Sanger sequencing was performed in the previous study[44]. We discarded mutations with a frequency of 0.01 or more in 1000 Genomes v5b, ESP-6500, or dbSNP138.
Survival analysis
Overall survival and progression-free survival were evaluated using the log-rank test with the R package “survival” version 2.40.1. Overall survival was defined as the time from date of surgery to death or date of last follow-up, and progression-free survival as the time from date of surgery to first event (progression or relapse) or date of last follow-up.
Pan-cancer analysis
We analyzed 2,442 cases across 36 tumor types from ICGC. The hotspot mutations were analyzed with the same method described above, except for the mapping tool. For pan-cancer data, we used the bowtie aligner instead of STAR[45].
SNP6 Copy Number analysis
Array files were downloaded from Gene Expression Omnibus (GEO) under GSE37385, and the relevant Affymetrix SNP6 arrays were extracted. Affymetrix Power Tools v1.18.2 was used to process and normalize the probe intensities to generate Log R Ratio (LRR) and B Allele Frequency (BAF) using the PennCNV-Affy pipeline[46]. The affygw6.hg19.pfb file was used to map the probes onto the hg19 genome. All other parameters were left on default. The resulting probe-level LRR and BAF were taken into ASCAT v2.4.3[47]. GC wave correction was then performed, followed by prediction of germline genotypes, and finally the ASCAT algorithm was run to determine the copy number values for each genomic region as well as the overall ploidy and purity of the sample. Samples whose model fit was less than 80% failed their ASCAT processing stage. Log ratios for each segment were calculated by using the copy number of each segment as well as the average ploidy of the sample, according to the equation:
Adjacent segments whose log ratios differed by less than 0.25 were then merged using their size-weighted mean:
Copy number states were assigned to each segment based on their log ratio and their ploidy values, according to Supplementary Table 21. Broad copy number changes were defined as those spanning 75% or more of a chromosome arm. Focal copy number variants were analyzed using GISTIC v.2.0.23[48]. GISTIC was run with the setting ‘-ta 0.25 -td 0.3 -js 10 -brlen 0.7 -gcm “extreme” -armpeel’.
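The segment log-ratio and merging steps described above can be sketched in Python under explicit assumptions. The formulas themselves are assumptions and are not taken from the source: the log ratio is taken here as log2(segment copy number / sample ploidy), and two adjacent segments are merged by weighting each log ratio by segment length in base pairs. The sketch also assumes segments are sorted by position, are contiguous within a chromosome, and have a non-zero copy number.

```python
import math


def log_ratio(copy_number, ploidy):
    """Assumed form: log2 of segment copy number over average sample ploidy."""
    return math.log2(copy_number / ploidy)


def merge_segments(segments, max_diff=0.25):
    """Merge adjacent same-chromosome segments whose log ratios differ by < max_diff.

    Each segment is a dict with keys chrom, start, end and log_ratio.
    Merged segments take the size-weighted mean of the two log ratios.
    """
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (prev is not None and prev["chrom"] == seg["chrom"]
                and abs(prev["log_ratio"] - seg["log_ratio"]) < max_diff):
            size_prev = prev["end"] - prev["start"]
            size_seg = seg["end"] - seg["start"]
            prev["log_ratio"] = (
                prev["log_ratio"] * size_prev + seg["log_ratio"] * size_seg
            ) / (size_prev + size_seg)
            prev["end"] = seg["end"]
        else:
            merged.append(dict(seg))  # copy so the input list is not modified
    return merged


# Hypothetical segments on one chromosome
segments = [
    {"chrom": "1", "start": 0, "end": 100, "log_ratio": 0.0},
    {"chrom": "1", "start": 100, "end": 400, "log_ratio": 0.2},
    {"chrom": "1", "start": 400, "end": 500, "log_ratio": 1.0},
]
merged = merge_segments(segments)
```

In this toy input the first two segments merge to a log ratio of about 0.15, and the third stays separate because it differs by more than 0.25.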
RT-PCR and qPCR analysis
RNA was obtained for 18 patient samples from our larger cohort with FPKM values above 2 for the targeted genes (6 U1-wildtype Shhα, 6 U1-mutant Shhα, 6 U1-mutant Shhδ). cDNA was synthesized using SuperScript III (ThermoFisher 18080400). PCRs were performed with cDNA and Taq polymerase using 35 cycles, and products were run on a 2% agarose gel. qPCRs were performed using SYBR-Green with ROX (ThermoFisher 11744500), two-step at 35 cycles. ΔΔCT was calculated by comparing mutant isoform to WT isoform expression. The primer sequences are available in Supplementary Table 19.
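A common way to turn such a comparison into a fold change is the 2^(−ΔΔCT) calculation, sketched below in Python. The details are assumptions, since the source does not give the formula: ΔCT is taken as the mutant-isoform CT minus the WT-isoform CT within a sample, ΔΔCT as that value minus the ΔCT of a reference sample or group, and amplification efficiency is assumed to be 100%. The CT values are invented.

```python
def delta_ct(ct_mutant_isoform, ct_wt_isoform):
    """ΔCT within one sample: mutant isoform relative to the WT isoform."""
    return ct_mutant_isoform - ct_wt_isoform


def fold_change(sample_delta_ct, reference_delta_ct):
    """Fold change 2^(-ΔΔCT), assuming perfect doubling per cycle."""
    delta_delta_ct = sample_delta_ct - reference_delta_ct
    return 2.0 ** -delta_delta_ct


# Hypothetical CT values for one tumor sample and one reference sample
sample = delta_ct(ct_mutant_isoform=24.0, ct_wt_isoform=22.0)     # ΔCT = 2
reference = delta_ct(ct_mutant_isoform=27.0, ct_wt_isoform=22.0)  # ΔCT = 5
fc = fold_change(sample, reference)  # ΔΔCT = -3
```

A lower CT means earlier amplification, so a negative ΔΔCT corresponds to higher relative expression of the mutant isoform in the sample than in the reference.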
Generation of a lentiviral vector for the expression of U1 r.3a>g
The pLKO.1-puro U6 sgRNA BfuAI stuffer lentiviral vector (Addgene #50920) was modified by removing the internal U6 promoter (between NdeI and EcoRI), and it was replaced by the U1 locus, including 393 bases of internal native U1 promoter, the U1 sequence, and 39 bases of 3’-flanking region using the following oligonucleotides (5’-GTCGAGAATTCTTGGCGTACAGTCTGTTTTTG and 5’-CTATCATATGTAAGGACCAGCTTCTTTGGGA). The PCR products were digested with NdeI and EcoRI, and cloned in the modified pLKO.1 plasmid. The r.3a>g mutation was introduced by site-directed mutagenesis. All plasmids were verified by Sanger sequencing.
Exogenous expression of the U1 r.3a>g mutation
Human embryonic kidney 293T (HEK-293T) cells were grown in DMEM, 10% FBS, 1% PSG. For exogenous expression of U1-snRNA, HEK-293T cells (5 × 10⁶ cells) were cultured in 10 cm plates and transfected using Lipofectamine Plus (Invitrogen) with 2 μg of either pLKO.1-U1wt (containing the wild-type U1 locus) or pLKO.1-U1r.3a>g (containing the r.3a>g mutation) in duplicate. Twelve hours after transfection the medium was replaced with complete media, and 48 hours later total RNA was extracted with the Trizol method.
Verification of the expression of the U1 r.3a>g mutation
Rapid amplification of cDNA ends (RACE) was performed using 1 μg of total RNA from HEK-293T cells transfected with either pLKO.1-U1wt or pLKO.1-U1r.3a>g following the recommendations of the manufacturer (Sigma-Aldrich 3353621001), and the following specific oligonucleotides (U1-RACE-SP1: 5’- CAGGGGAAAGCGCGAACGCAGT and U1-RACE-SP2: 5’- CCCACTACCACAAATTATGC). A single amplification band of the expected size (160 bps) was excised from the gel, purified and sequenced with the internal oligonucleotide U1-RACE-SP2.
Sequence analysis of exogenous expression experiments
Messenger RNA library construction was performed based on oligo dT-based mRNA isolation using the NEBNext® Poly(A) mRNA Magnetic Isolation Module. RNA sequencing was performed on a NextSeq 550 in 100-bp paired-end mode. Mapping and intron clustering were performed with the same methods described above. LeafCutter was run with the option “-g 0 -i 2” and the results were filtered to clusters with a q-value < 0.1 in which at least one absolute effect size calculated by LeafCutter was more than 1.5.
Data availability
Sequencing data have been deposited in the European Genome-Phenome Archive (EGA) and Gene Expression Omnibus (GEO): RNA-seq (EGAD00001001899, and EGAD00001004958), whole genome sequence (EGAD00001003125 and EGAD00001004347) and RNA-seq of exogenous expression analyses (GSE128005).
Overview of analyzed cohorts and methods.
a) The detection methods for U1-snRNA mutations by each cohort and comparison methods for alternative splicing analysis. b) Cohort specification. c) Subgroup distribution of whole genome sequencing cohorts.
U11-snRNA mutations and conservation of U1 and U11 snRNA genes across evolution.
a) Seed sequences of the U1-snRNA obtained from the Rfam database demonstrate high-level conservation across a variety of eukaryotic species, particularly at the site of the Shh-MB mutation. The consensus sequence, and first 50 nucleotides of reference sequences are included for comparison. Grey indicates nucleotide differences, and red identifies the Shh-MB hotspot mutation.
b) Cartoon illustrating the number of somatic mutations in the U11-snRNA genes. U11-snRNA sequence conservation scores as determined by Rfam database.
c) Secondary structure of the mutant U11-snRNA. The red circle identifies the location of the hotspot mutation. The yellow and green rectangles indicate the 5′ splice site recognition site and the Sm protein binding site respectively. Numerals I to IV indicate stem-loops.
d) Seed sequences of the U11-snRNA obtained from the Rfam database demonstrate high-level conservation across a variety of eukaryotic species, particularly at the site of the Shh-MB mutation. The consensus sequence, and first 30 nucleotides of reference sequences are included for comparison. Grey indicates nucleotide differences, and red identifies the Shh-MB hotspot mutation.
High levels of genomic conservation surrounding human U1-snRNAs complicate the specific PCR amplification of any one individual locus.
a) Genomic locations of the four expressed U1 spliceosomal RNA genes (on Chromosome 1p, red), and 136 pseudogenes across the H. sapiens genome as indicated. Three pseudogenes with sequences identical (hg19) to the expressed U1 genes are indicated in orange.
b) Average mapping quality of bwa-mem and coverage of each expressed U1 and U11 spliceosomal RNA gene from WGS of germline samples of medulloblastoma patients are illustrated (n = 341). Blue bars represent Alignability of 100mers by GEM from ENCODE/CRG. Scales of >1,000 bases upstream and downstream are on a log10 scale. Red bar indicates the gene body.
c) Average number of multi-mapped reads overlapped for each gene pair by STAR aligner. Heatmap shows the average number of mapped reads across WGS from germline samples of medulloblastoma patients (n = 341).
d) Sequence similarity of U1-snRNA genes, U1-snRNA pseudogenes with identical 164 bps, and U11-snRNA gene. The numbers in each square and heatmap indicate identity scores and bit scores calculated by blast software. Blank indicates no hit found.
Allele-specific rhAmp SNP PCR of RNU1 loci, significant copy number changes in U1-mutant versus U1-wildtype Shh-MB, and prognostic analysis.
a) The frequency of any U1-snRNA mutation detected by the RNU1_Batch primer set (RNU1–1, RNU1–2, RNU1–3, RNU1–4, and RNVU1–18) (left) and the RNU1_Pseudo primer set (RNU1–27P and RNU1–28P) (right). b) Hotspot mutations of the RNU1–27P/RNU1–28P U1-snRNA pseudogenes as confirmed by Sanger sequencing. c) Broad copy number aberrations in U1-wildtype Shhα (n = 25), U1-mutant Shhα (n = 8), and U1-mutant Shhδ (n = 41). Dark blue and dark red bars, as well as asterisks, identify regions that differ significantly between U1-mutant and U1-wildtype Shhα (P < 0.05, two-sided Fisher’s exact test). d) Significant focal copy number aberrations in U1-wildtype Shhα (n = 25), U1-mutant Shhα (n = 8), and U1-mutant Shhδ (n = 41) illustrate significant genomic differences between U1-wildtype and U1-mutant cases. Candidate target genes within the corresponding loci are indicated. q-values were calculated by GISTIC (see Methods). e–g) Overall survival of Shhα stratified by mutational status of U1-snRNA (n = 10 for mutant, n = 27 for wildtype) (e), TP53 (n = 15 for mutant, n = 22 for wildtype) (f), or both (n = 9 for both mutant, n = 1 for U1 mutation only, n = 6 for TP53 mutation only, n = 21 for both wildtype) (g). P-values were determined using the two-sided log-rank test. + indicates censored cases. h, i) Progression-free survival (h) and overall survival (i) stratified by U1-snRNA mutation and Shh subtype (n = 10 for U1-mutant Shhα, n = 27 for U1-wildtype Shhα, n = 23 for U1-wildtype Shhβ, n = 24 for U1-wildtype Shhγ, n = 46 for U1-mutant Shhδ). P-values were determined using the two-sided log-rank test. + indicates censored cases.
Intron-centric analysis of Shhδ medulloblastomas.
a) Quantitation of alternative splicing events by Shh subtype as detected by intron-centric alternative splicing analysis (n = 30 for each subtype). The bar plot shows the adjusted standardized residual of included alternative splicing events; positive values indicate a relatively higher number, and negative values a relatively lower number, among subtypes. b) Volcano plots of alternative splicing events (n = 30 for each subtype). Significant events (FDR < 0.01 and absolute log effect size > 1.5, calculated by LeafCutter; see Methods) are illustrated by color. Alternative splicing events of PTCH1 and GLI2 with the highest effect size are annotated. c) Splice site sequences of included alternative splicing events by subtype (n = 30 for each subtype). Asterisks denote nucleotide sites with q-value < 10⁻² (Chi-squared test and BH method).
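The q-values in this legend come from a Chi-squared test followed by the Benjamini-Hochberg (BH) method. As a minimal sketch of the BH step only, assuming a plain list of raw P-values (the values below are hypothetical and are not taken from the study), the adjustment could look like this:

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input P-values."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical per-position P-values, for illustration only.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
example_qvalues = benjamini_hochberg(example_pvalues)
significant = [q < 1e-2 for q in example_qvalues]
```

With the threshold used in the legend (q < 10⁻²), only positions whose adjusted value falls below 0.01 would receive an asterisk.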
Intron-centric analysis of Shhα medulloblastomas.
a, b) Quantitation (a) and proportion (b) of alternative splicing events between U1-mutant Shhα medulloblastoma (n = 13) and U1-wildtype Shhα medulloblastoma (n = 39) as detected by intron-centric alternative splicing analysis. The P-value was calculated by Chi-squared test. c) Volcano plots of alternative splicing events (n = 13 for U1-mutant Shhα, n = 39 for U1-wildtype Shhα). The x axis shows the difference in PSI (percent spliced in) calculated by LeafCutter. Significant events (FDR < 0.01 and absolute log effect size > 1.5, calculated by LeafCutter; see Methods) are illustrated by color. d) Splice site sequences of included alternative splicing events in the U1-mutant Shhα subtype (n = 13 for U1-mutant Shhα, n = 39 for U1-wildtype Shhα). The size and color of each circle indicate the q-value and Cramér’s V value for each nucleotide position (q-values were calculated by Chi-squared test and BH method; precise values are given in Supplementary Table 11). e) Residual analysis of 5′ splice site sequences of annotated and cryptic 5′ alternative splicing (n = 13 for U1-mutant Shhα, n = 39 for U1-wildtype Shhα). The size and color of each circle denote the two-sided P-value and the adjusted standardized residual calculated by Haberman’s method. Precise values are given in Supplementary Table 11.
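The legend above summarizes each nucleotide position with a Chi-squared test, Cramér’s V, and adjusted standardized residuals. The sketch below shows how these quantities are commonly computed from a contingency table. It assumes the usual form of the adjusted residual, (O − E) / sqrt(E (1 − row/n)(1 − col/n)); whether the authors used exactly this form is an assumption, and the counts are hypothetical.

```python
import math


def chi_square_summary(table):
    """Return (chi2, Cramér's V, adjusted standardized residuals) for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Adjusted standardized residual for cell (i, j).
            scale = math.sqrt(
                expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            )
            residual_row.append((observed - expected) / scale)
        residuals.append(residual_row)
    k = min(len(row_totals), len(col_totals)) - 1
    cramers_v = math.sqrt(chi2 / (n * k))
    return chi2, cramers_v, residuals


# Hypothetical counts: rows are two groups of splicing events,
# columns are "C at the position" versus "any other base".
counts = [[30, 10],
          [10, 30]]
chi2, cramers_v, residuals = chi_square_summary(counts)
```

A positive residual marks a cell that is over-represented relative to independence, which matches how the bar plots and circle colors in these legends are read.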
Nonsense-mediated decay pathway in U1-mutant Shh-MB and exogenous expression analyses.
a) Enrichment plots of “GO NUCLEAR TRANSCRIBED MRNA CATABOLIC PROCESS NONSENSE MEDIATED DECAY” by GSEA, comparing U1-mutant Shhδ (n = 30) with other U1-wildtype Shh subtypes (n = 90), and U1-mutant Shhα (n = 13) with U1-wildtype Shhα (n = 39). P-values were calculated by GSEA (see Methods). b) Quantitation of alternative splicing events between U1-mutant HEK293T and U1-wildtype HEK293T as detected by intron-centric alternative splicing analysis. c) Splice site sequences of included alternative splicing events in U1-mutant HEK293T. Asterisks denote nucleotide sites with q-value < 10⁻² (Chi-squared test and BH method). d) Comparison of the extent of overlap between alternative splicing events detected in U1-mutant Shh (either Shhα or Shhδ), U1-wildtype Shh (Shhα, Shhβ, or Shhγ), and HEK293T with exogenous U1-mutant expression. Left: alternatively spliced events with cryptic 5′ sites. Right: alternatively spliced events with cryptic 5′ sites and a ‘C’ at the 6th base of the intron. e) Alternative splicing signatures by t-SNE analysis. Left: the percent spliced-in (PSI) values of detected cryptic 5′ alternative splicing events with a ‘C’ nucleotide at the 6th base of the intron from the 5′ splice site. Upper right: PSI values of all cryptic 5′ alternative splicing events. Lower right: PSI values of all alternative splicing events.
Retained introns inactivate tumor suppressor genes in U1-snRNA(r.3a>g) mutant tumors.
a) Illustration of the different types of alternative splicing events analyzed by rMATS (n = 30 for each subtype). Red arrows indicate the expected 5′ splice sites recognized by the mutant U1-snRNA. b) Quantitation of alternative splicing events by Shh subtype, as detected by exon-centric alternative splicing analysis. c) Scatter plots of alternative splicing events (n = 30 for each subtype). The x axis shows the difference in PSI (percent spliced in) calculated by rMATS. Different types of significant events (FDR < 0.01 and absolute differential PSI > 0.05, calculated by rMATS; see Methods) are illustrated by different colors as annotated. d) Splice site sequences of alternative 5′ splice site, included cassette exon (CE), and included retained intron (RI) events in U1-snRNA mutant Shhδ (n = 30). Each event corresponds to a red arrow cartoon in ‘a’. Asterisks denote nucleotide sites with q-value < 10⁻² (Chi-squared test and BH method). e) Distribution of percent spliced in for PAX6 based on U1-snRNA mutation status (n = 13 for U1-mutant Shhα, n = 30 for U1-mutant Shhδ, n = 99 for U1-wildtype Shh (n = 90) and normal brain tissue (n = 9)). The dashed line defines the threshold that divides the dataset into two groups (k-means method). The table displays the number of samples above the threshold (high) or below it (low) based on mutational status. The P-value was calculated using a two-sided Fisher’s exact test compared to U1-wildtype samples. U1-snRNA mutant samples are indicated in pink, and wildtype samples in blue. f) Sashimi plot of PAX6 splicing based on mutational status, determined by exon-centric alternative splicing analysis (rMATS). The bar plot shows modified fragments per kilobase per million mapped (mFPKM). Numbers enumerate average junctional reads across all samples. Annotated exon tracks are shown below with genomic positions marked. g) Distribution of percent spliced in for TOX4 based on U1-snRNA mutation status (n = 13 for U1-mutant Shhα, n = 30 for U1-mutant Shhδ, n = 99 for U1-wildtype Shh (n = 90) and normal brain tissue (n = 9)). The dashed line defines the threshold that divides the dataset into two groups (k-means method). The table displays the number of samples above the threshold (high) or below it (low) based on mutational status. The P-value was calculated using a two-sided Fisher’s exact test compared to U1-wildtype samples. U1-snRNA mutant samples are indicated in pink, and wildtype samples in blue. h) Sashimi plot of TOX4 splicing based on mutational status, determined by exon-centric alternative splicing analysis (rMATS). The bar plot shows modified fragments per kilobase per million mapped (mFPKM). Numbers enumerate average junctional reads across all samples. Annotated exon tracks are shown below with genomic positions marked.
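The PAX6 and TOX4 panels above split samples into high and low PSI groups with a k-means threshold and then compare mutant and wildtype counts with a two-sided Fisher’s exact test. The following is a rough sketch of that two-step idea under stated assumptions: a one-dimensional two-cluster k-means initialized at the minimum and maximum, and a Fisher test that sums all tables no more probable than the observed one. The PSI values are hypothetical, and the authors' actual implementation is not described in this legend.

```python
from math import comb


def two_means_threshold(values, max_iter=100):
    """Midpoint between the two cluster centers of a 1-D two-cluster k-means."""
    low_center, high_center = min(values), max(values)
    if low_center == high_center:
        raise ValueError("need at least two distinct values")
    for _ in range(max_iter):
        threshold = (low_center + high_center) / 2
        low = [v for v in values if v <= threshold]
        high = [v for v in values if v > threshold]
        new_centers = (sum(low) / len(low), sum(high) / len(high))
        if new_centers == (low_center, high_center):
            break
        low_center, high_center = new_centers
    return (low_center + high_center) / 2


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    return sum(prob(x) for x in support if prob(x) <= observed * (1 + 1e-7))


# Hypothetical PSI values, for illustration only.
mutant_psi = [0.62, 0.70, 0.66]
wildtype_psi = [0.05, 0.08, 0.03]
threshold = two_means_threshold(mutant_psi + wildtype_psi)
mutant_high = sum(v > threshold for v in mutant_psi)
wildtype_high = sum(v > threshold for v in wildtype_psi)
p_value = fisher_exact_two_sided(
    mutant_high, len(mutant_psi) - mutant_high,
    wildtype_high, len(wildtype_psi) - wildtype_high,
)
```

With only three samples per group, even a perfect split cannot reach a small P-value, which is one reason the cohort sizes quoted in the legend matter for interpreting the tables.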
Exon-centric analysis of Shhα medulloblastomas and overlapping splicing events.
a) Quantitation of alternative splicing events between U1-mutant Shhα medulloblastoma and U1-wildtype Shhα medulloblastoma, as detected by exon-centric alternative splicing analysis. b) Scatter plots of alternative splicing events (n = 13 for U1-mutant Shhα, n = 39 for U1-wildtype Shhα). The x axis shows the difference in PSI (percent spliced in) calculated by rMATS. Different types of significant events (FDR < 0.01 and absolute differential PSI > 0.05, calculated by rMATS; see Methods) are illustrated by different colors as annotated. c) Splice site sequences of alternative 5′ splice site, included cassette exon (CE), and included retained intron (RI) events in U1-mutant Shhα medulloblastoma and U1-wildtype Shhα medulloblastoma. Each event corresponds to a red arrow cartoon in ‘a’. Asterisks denote nucleotide sites with q-value < 10⁻² (Chi-squared test and BH method). d) Boxplot of fold changes in expression of the alternatively spliced isoform compared to the wildtype isoform in subsets of Shh-MB, as determined by real-time qPCR. Boxplot center lines show the data median; box limits indicate the IQR from the 25th to the 75th percentile; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using the two-sided Wilcoxon rank-sum test. e) Comparison of the extent of overlap between splicing events by Shh subtype and U1 mutational status. Effect sizes were calculated by LeafCutter with an absolute effect size threshold of 1.5.
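The boxplot conventions stated in panel d (median, IQR box, whiskers at 1.5 times the IQR, outliers as points) can be made concrete with a short sketch. It assumes the common Tukey convention in which each whisker ends at the most extreme data point inside the 1.5 × IQR fence, and it uses the inclusive quartile method from Python's standard library; the plotting software used by the authors may compute quartiles slightly differently. The input values are hypothetical.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Median, quartiles, whisker ends, and outliers under the Tukey convention."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme points still inside the fences.
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


# Hypothetical fold-change values, for illustration only.
fold_changes = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = boxplot_summary(fold_changes)
```

In this toy input the value 30 lies beyond the upper fence, so it would be drawn as an individual point rather than stretching the whisker.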
Aberrant splicing of oncogenes and tumor suppressor genes in U1-mutant Shh-MB.
a) Overview of cryptic alternative splicing of GLI2, demonstrating the position of a cryptic cassette exon with the 5′ splice site sequence. b) Sashimi plot of GLI2 splicing in representative cases. The bar plot shows counts per million reads. Numbers enumerate junctional reads, with U1-mutant isoform reads in red. c) Scatter plot comparing detected alternatively spliced reads and total junction reads sharing the 3′ splice site. Jittering was performed for both values. d) ‘Percent spliced in’ values by U1-mutant Shhα, U1-mutant Shhδ, and U1-wildtype Shh (all Shh subtypes). Boxplot center lines show the data median; box limits indicate the IQR from the 25th to the 75th percentile; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using the two-sided Wilcoxon rank-sum test. e) Boxplot of fold changes in expression of the alternatively spliced isoform compared to the wildtype isoform of GLI2 in subsets of Shh-MB, as determined by real-time qPCR. Boxplot center lines show the data median; box limits indicate the IQR from the 25th to the 75th percentile; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using the two-sided Wilcoxon rank-sum test. f) Illustration of the canonical and cryptic isoforms of GLI2. Translation start sites are indicated with an ATG arrow. Resulting proteins (and sizes) are displayed for each isoform. Repression and activation domains are indicated in blue and orange, respectively. aa denotes amino acids. g) Overview of cryptic alternative splicing of CCND2, illustrating the position of a cryptic cassette exon with the 5′ splice site sequence. h) Sashimi plot of representative cases demonstrates alternative splicing at the CCND2 locus. Numbers illustrate junctional reads. Junctional reads specific to U1-mutants are in red. i) The canonical isoform and the cryptic isoform of CCND2. j) Scatter plot comparing detected alternatively spliced reads and total junction reads sharing the 3′ splice site. Jittering was performed for both values. k) ‘Percent spliced in’ values by U1-mutant Shhα (n = 13), U1-mutant Shhδ (n = 58), and U1-wildtype Shh (all Shh subtypes, n = 104). Boxplot center lines show the data median; box limits indicate the IQR from the 25th to the 75th percentile; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using the two-sided Wilcoxon rank-sum test. l) Real-time qPCR comparing the expression of the cryptic isoform of CCND2 demonstrates high levels of expression of CCND2 restricted to Shhδ cases (n = 6 for U1-mutant Shhα, n = 6 for U1-mutant Shhδ, n = 6 for U1-wildtype Shhα). Boxplot center lines show the data median; box limits indicate the IQR from the 25th to the 75th percentile; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using the two-sided Wilcoxon rank-sum test. m) Overview of cryptic alternative splicing of PAX5, illustrating the position of a cryptic cassette exon with the 5′ splice site sequence. n) Sashimi plot of representative cases demonstrates alternative splicing at the PAX5 locus. Numbers illustrate junctional reads. Junctional reads specific to U1-mutants are in red. o) The canonical isoform and the cryptic isoform of PAX5. p) Scatter plot comparing detected alternatively spliced reads and total junction reads sharing the 3′ splice site. Jittering was performed for both values. q) ‘Percent spliced in’ values by U1-mutant Shhα (n = 5), U1-mutant Shhδ (n = 27), and U1-wildtype Shh (all Shh subtypes, n = 7). Boxplot center lines show the data median; box limits indicate the IQR from the 25th to the 75th percentile; lower and upper whiskers extend 1.5 times the IQR. Outliers are represented by individual points. P-values were calculated using the two-sided Wilcoxon rank-sum test.
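The scatter plots and ‘percent spliced in’ panels above relate cryptic-isoform junction reads to all junction reads sharing the same 3′ splice site. As a hedged illustration, one simple way to express that ratio is shown below. Treating PSI as cryptic reads divided by total reads at the shared 3′ splice site is an assumption made for this sketch; the function name and read counts are hypothetical.

```python
def percent_spliced_in(cryptic_reads, total_junction_reads):
    """Fraction of junction reads at a shared 3' splice site using the cryptic site.

    Returns None when no junction reads cover the site, since the ratio
    is undefined in that case.
    """
    if total_junction_reads == 0:
        return None
    return cryptic_reads / total_junction_reads


# Hypothetical (cryptic, total) junction read counts per sample.
junction_counts = {
    "mutant_case": (18, 60),
    "wildtype_case": (0, 55),
    "no_coverage": (0, 0),
}
psi_by_sample = {
    name: percent_spliced_in(cryptic, total)
    for name, (cryptic, total) in junction_counts.items()
}
```

Samples without coverage are kept as None rather than 0 so that they are not mistaken for wildtype-like splicing in downstream comparisons.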