
The landscape and therapeutic relevance of cancer-associated transcript fusions.

K Yoshihara^1,2, Q Wang^1, W Torres-Garcia^1, S Zheng^1, R Vegesna^1, H Kim^1, R G W Verhaak^1,3.

Abstract

Transcript fusions as a result of chromosomal rearrangements have been a focus of attention in cancer as they provide attractive therapeutic targets. To identify novel fusion transcripts with the potential to be exploited therapeutically, we analyzed RNA sequencing, DNA copy number and gene mutation data from 4366 primary tumor samples. To avoid false positives, we implemented stringent quality criteria that included filtering of fusions detected in RNAseq data from 364 normal tissue samples. Our analysis identified 7887 high confidence fusion transcripts across 13 tumor types. Our fusion prediction was validated by evidence of a genomic rearrangement for 78 of 79 fusions in 48 glioma samples where whole-genome sequencing data were available. Cancers with higher levels of genomic instability showed a corresponding increase in fusion transcript frequency, whereas tumor samples harboring fusions contained statistically significantly fewer driver gene mutations, suggesting an important role for tumorigenesis. We identified at least one in-frame protein kinase fusion in 324 of 4366 samples (7.4%). Potentially druggable kinase fusions involving ALK, ROS, RET, NTRK and FGFR gene families were detected in bladder carcinoma (3.3%), glioblastoma (4.4%), head and neck cancer (1.0%), low-grade glioma (1.5%), lung adenocarcinoma (1.6%), lung squamous cell carcinoma (2.3%) and thyroid carcinoma (8.7%), suggesting a potential for application of kinase inhibitors across tumor types. In-frame fusion transcripts involving histone methyltransferase or histone demethylase genes were detected in 111 samples (2.5%) and may additionally be considered as therapeutic targets. In summary, we described the landscape of transcript fusions detected across a large number of tumor samples and revealed fusion events with clinical relevance that have not been previously recognized. 
Our results support the concept of basket clinical trials where patients are matched with experimental therapies based on their genomic profile rather than the tissue where the tumor originated.

Year: 2014 PMID: 25500544 PMCID: PMC4468049 DOI: 10.1038/onc.2014.406

Source DB: PubMed Journal: Oncogene ISSN: 0950-9232 Impact factor: 9.867

Introduction

Transcript fusions resulting from chromosomal rearrangements are an important class of cancer-contributing somatic alteration[1]. Examples such as BCR-ABL1, first reported in chronic myeloid leukemias[2], have led to novel first line therapies with ABL inhibitors such as dasatinib[3]. Similarly, EML4-ALK fusions were detected in a subset of non-small cell lung cancers[4], and ALK inhibitors were reported to improve outcome for patients with EML4-ALK positive tumors[5]. Recent advances in sequencing technology have enabled the comprehensive detection of rearrangements in the cancer genome and transcriptome[6, 7]. For example, transcriptome sequencing has identified FGFR3-TACC3 fusions in glioblastoma[8], bladder cancer[9], and head and neck and lung squamous cell carcinoma[10], and cell lines expressing FGFR3 chimeras were found to be sensitive to FGFR inhibitors. In addition, recent studies have revealed highly frequent oncogenic fusions in rare tumor types, such as the C11orf95-RELA fusion in supratentorial ependymoma[11] and the DNAJB1-PRKACA fusion in fibrolamellar hepatocellular carcinoma[12]. Tumor specific fusion gene landscapes of different cancers have been described using genomic and transcriptomic data[13-16]. To comprehensively identify fusion transcripts with the potential to be exploited therapeutically across many cancers, we analyzed RNA sequencing and DNA copy number data from 4,366 primary tumor samples and 364 normal samples spanning 13 tumor types. We assessed the significance of fusions per cancer type and evaluated their potential as molecular therapeutic targets by integrating mRNA exon/gene expression, somatic mutations, copy number gains and losses, and protein kinase annotation. Our fusion gene list of TCGA samples is available through a web portal at http://www.tumorfusions.org.

Results

Detection of fusion transcripts

An overview of this study is shown in Supplementary Figure 1. We compiled an mRNA sequencing data set consisting of 4,366 primary tumor samples and 369 normal samples from 13 tissue types (Table 1). Data were generated by The Cancer Genome Atlas and made available through the Cancer Genomics Hub (CGHub, https://cghub.ucsc.edu/). Using supervised hierarchical clustering analysis, we identified five normal samples with a high likelihood of tumor cell contamination, and these were excluded from further study (see Supplementary Figure 2 and Methods). We used the Pipeline for RNA sequencing Data Analysis (PRADA)[17] to detect 26,995 fusion transcripts supported by at least two discordant read pairs plus one perfect-match junction spanning read, with the other end of the read pair mapping to either of the fusion gene partners. To reduce the number of false positive predictions, we filtered fusion transcripts according to gene homology, transcript allele fraction, and partner gene variety. We used BLASTn to determine homology between partner genes and removed 6,138 fusion pairs consisting of two genes with high similarity. Next, to account for the influence of transcript expression level on fusion detection, we calculated the transcript allele fraction, which is the ratio of junction spanning reads to the total number of reads crossing the junction points in the reference transcripts, and removed fusion candidates with a transcript allele fraction of less than 0.01. Finally, we calculated partner gene variety for each gene and excluded non-specific fusions involving genes showing a large diversity amongst partner genes. After filtering, 9,047 and 192 fusion pairs were identified in 4,366 primary tumor and 364 normal tissue samples, respectively. After removing fusion pairs overlapping between tumor and normal samples, 8,695 tumor specific fusion pairs were identified (Supplementary Table 1).
We further classified the final fusion transcript list into four tiers based on level of evidence. Fusions designated as tier 1 were detected through at least three discordant read pairs, two perfect-match junction spanning reads, and gene partner uniqueness within a sample. Tier 2 fusions required at least two discordant read pairs, one perfect-match junction spanning read, plus breakpoints detected in the DNA profile within 100 Kb of the predicted junction point. Tier 3 comprised fusions with at least two discordant read pairs, one perfect-match junction spanning read, high consistency of predicted junction, and gene partner uniqueness within a sample. The remaining fusions were assigned to tier 4. In total, 6,219 and 1,668 fusion pairs were annotated as tier 1 and tier 2, respectively.
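The tier rules above can be sketched as a small decision function. This is only an illustration, not the PRADA implementation: the class name, the field names and the assumption that tiers are checked in order from 1 to 4 (so a fusion receives the highest tier it qualifies for) are ours.

```python
from dataclasses import dataclass


@dataclass
class FusionEvidence:
    """Hypothetical container for the evidence attached to one fusion call."""
    discordant_pairs: int        # discordant read pairs supporting the fusion
    junction_reads: int          # perfect-match junction spanning reads
    partner_unique: bool         # gene partner uniqueness within the sample
    dna_breakpoint_nearby: bool  # DNA breakpoint within 100 Kb of the junction
    junction_consistent: bool    # high consistency of the predicted junction


def assign_tier(ev: FusionEvidence) -> int:
    """Return the evidence tier (1 to 4); tiers are tested in order (assumed)."""
    meets_minimum = ev.discordant_pairs >= 2 and ev.junction_reads >= 1
    if ev.discordant_pairs >= 3 and ev.junction_reads >= 2 and ev.partner_unique:
        return 1
    if meets_minimum and ev.dna_breakpoint_nearby:
        return 2
    if meets_minimum and ev.junction_consistent and ev.partner_unique:
        return 3
    return 4
```

For instance, a fusion with two discordant pairs and one junction read that lacks DNA support but has a consistent junction and a unique partner falls into tier 3 under these assumptions.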

Table 1

A list of The Cancer Genome Atlas RNAseq data sets

Tumor type	Tumor*	Normal
Bladder urothelial carcinoma	121	16
Breast cancer	1,019	110
Glioblastoma multiforme	158	-
Head and neck squamous cell carcinoma	300	37
Clear cell renal cell carcinoma	474	71
Acute myeloid leukemia	171	-
Low grade glioma	266	-
Lung adenocarcinoma	487	57
Lung squamous cell carcinoma	220	17
Ovarian serous cystadenocarcinoma	400	-
Prostate adenocarcinoma	178	-
Skin cutaneous melanoma	78	-
Thyroid carcinoma	494	56

Total	4,366	364

* Primary tumor samples with both RNAseq and Affymetrix SNP6 array data were analyzed.

Validation of fusion transcript predictions

To verify the reliability of our fusion transcript predictions, we ran BreakDancer[18] on whole genome sequencing data from 48 glioma samples and low pass whole genome sequencing data from 15 melanoma tumors. A minimum of five supporting read pairs was required for detection of structural variants in whole genome sequencing data and three supporting read pairs in low-pass whole genome sequencing data. Next, we correlated the presence of genomic structural variants with fusion gene predictions from RNA. Structural DNA variants involving both fusion gene partners were considered high confidence validation, and events involving one of the gene partners were interpreted as medium confidence. As a result, high or medium confidence structural variants within 1 Mb of the predicted junction points were found to support 78 of 79 fusion transcripts detected in 48 glioma samples (Supplementary Table 2). As expected, the rate of validation was reduced in low pass sequencing data, where we found support for 31 of 48 fusion transcripts detected in 15 melanoma samples. The validation rate of tier 1 and 2 events in low pass sequencing data was higher than that of tier 3 and 4 events, and we limited further analysis to the 7,887 tier 1 and 2 fusion transcripts.

Diversity of fusion transcripts across 13 tumor types

Determining how fusion transcripts promote cancer in various tumor types is an important goal. We categorized fusion transcripts into eight categories based on (i) distance between the two fusion gene partners and (ii) the presence of copy number alterations in proximity to the fusion junction, and examined the distribution of each category for each tumor type. We observed substantial diversity in the frequency of gene fusions, with thyroid carcinoma, clear cell renal cell carcinoma, and acute myeloid leukemia representing the lower end of the spectrum (Figure 1A). A corresponding relative reduction in the frequency of DNA segments was found in these cancer types (Figure 1B). In nine of ten remaining tumor types, the exception being prostate adenocarcinoma, more than 80% of fusion transcripts were associated with DNA amplification or deletion events (Supplementary Figure 3). Acute myeloid leukemia and thyroid carcinoma demonstrated a relatively high frequency of copy-neutral interchromosomal fusions (Figure 1C), suggesting the frequent occurrence of balanced genomic rearrangements[1]. Fusion transcripts originating from genes within 1 Megabase of each other were dominant in ovarian cancer, which might be related to the high frequency of copy number alteration in ovarian cancer[19]. Overall, these findings suggest that fusion transcripts resulting from copy number balanced translocations are relatively rare and instead are preferentially derived through genomic instability[20, 21].

Figure 1

The distribution of fusion transcripts across 13 tumor types

(A) Bar plots show the fraction of samples in which at least a single fusion transcript was detected per tumor type (green). The dot plots illustrate the number of detected fusion transcripts per megabase per sample normalized by the sequencing coverage. Tumor types were sorted according to the fraction of samples with fusions. (B) Box-Whisker plots showing the number of DNA segments per sample as a relative measure of genome instability across 13 tumor types. (C) Bar plots representing the fraction of different types of fusions classified based on the distance between the genes constituting the fusion and the presence or absence of a DNA copy number alteration within 100 Kb of the junction point.

Next, we generated a summary of recurrent fusion transcripts across 13 tumor types (Supplementary Table 3), which included 263 fusions occurring at least twice. Of these, 24 recurrent fusions have been previously reported[22, 23]. Furthermore, we focused on fusions with the same gene fused to multiple different partners (Figure 2 and Supplementary Table 4). Perhaps the most prominent and novel recurrent gene was the estrogen receptor 1 (ESR1). We identified 16 ESR1 associated fusions in breast cancer, which represents 1.5% of the entire breast cancer cohort. Only one of these was predicted to be in frame, and these fusions may thus be disruptive events rather than activating (Supplementary Figure 4). On the basis of this result, we extracted 221 fusions involving a tumor suppressor gene (TSG) that have the potential to result in loss of function (Supplementary Table 5). All samples harboring TSG fusions were called wild type except one low grade glioma sample.

Figure 2

The chromosomal location of recurrent fusion genes for each tumor type

Line plots representing the frequency of fusion gene A and B across the genome (green), the negative log (q-value) of DNA amplifications (red) and deletions (blue) per tumor type.

DNA copy number alterations with q-value less than 0.05 as determined by GISTIC are shown.

Approximately 36% of detected fusion transcripts were predicted to be in-frame and thus may result in a functional protein, with acute myeloid leukemia and thyroid carcinoma showing relatively high fractions of in-frame fusions (78.5% and 70.3%) compared to other tumor types (Supplementary Figure 5A). When we re-evaluated the distribution of the eight fusion categories using only the 2,811 in-frame fusions, the distribution for each tumor type was generally similar to that based on all 7,887 fusion transcripts (Supplementary Figure 5B, Supplementary Table 6). In total, 80 of 2,811 in-frame fusion transcripts were detected in at least two samples across the entire cohort (Supplementary Table 7), including well known fusions such as TMPRSS2-ERG[24], PML-RARA[25], FGFR3-TACC3[8], and EGFR-SEPT14[26, 27]. Interestingly, we observed reduced frequencies of significant gene mutation in samples with recurrent in-frame fusion transcripts compared to those without recurrent in-frame fusion transcripts. The difference was statistically significant in bladder carcinoma, breast cancer, head and neck squamous cell carcinoma, clear cell renal cell carcinoma, acute myeloid leukemia, and thyroid cancer (Welch’s t-test, P = 0.0067, 0.022, 0.030, 0.063, 4.8e–15, and 8.3e–88, respectively), suggesting that fusions in these cancer types could be functioning as incidental cancer driving events (Supplementary Figure 6).

Protein kinase fusions across 13 tumor types

Fusion genes with oncogenic kinase activation have been identified in many cancers[1, 2, 4], and cancer cells harboring these types of fusions are frequently highly susceptible to kinase inhibitors[28]. To discover fusion candidates with therapeutic potential, we focused on fusion transcripts involving a protein kinase gene. An in-frame protein kinase fusion was detected in 324 (7.4%) of 4,366 samples (minimum, 0.8% in clear cell renal cell carcinoma; maximum, 11.6% in bladder carcinoma) (Supplementary Table 8). The majority of in-frame kinase fusions belonged to the tyrosine kinase family (36.1%), the AGC serine/threonine protein kinases (14.8%) and the tyrosine kinase-like serine/threonine protein kinase group (10.1%) (Supplementary Table 9). The fraction of protein kinase fusions was significantly higher in thyroid carcinoma compared to other tumor types, involving genes such as RET (n=24), NTRK3 (n=9) and BRAF (n=16) (Fisher’s exact test, P = 2.2e–16) (Figure 3A). BRAF fusions were also detected in two prostate adenocarcinoma, two melanoma and one low grade glioma samples (Supplementary Table 10). BRAF fusions are notable because of their mutual exclusivity with BRAF mutation (Figure 4) as well as the life-prolonging effects of RAF and MEK inhibitors for patients with melanoma harboring BRAF V600E mutations[29]. RET is frequently activated by mutations in medullary thyroid cancer, and inhibitors of multiple tyrosine kinases including RET have been approved for medullary thyroid cancer by the Food and Drug Administration (FDA), while treatment of NTRK1 fusion positive lung cancer cells with a kinase inhibitor led to suppression of cell growth[30]. Our findings suggest that kinase inhibition may have broad applicability for treatment of thyroid cancers[31, 32].

Figure 3

An overview of protein kinase fusions across 13 tumor types

(A) Bar plots show the fraction of in-frame protein kinase fusions relative to the total number of in frame fusions per tumor type. (B) Recurrent in-frame protein kinase fusion across 13 tumor types (n≥2). Color represents tumor type. (C) The landscape of protein kinase fusions across cancer. The horizontal and vertical axes represent tumor samples and kinase genes, respectively. Genes were ordered based on kinase family annotation. Color bar depicts tumor type.

Figure 4

Significance of RAF family fusions in thyroid cancer

(A) The top panel indicates frequencies of somatic mutations (light blue) and significant mutations (pink). To compare the frequency between samples with and without recurrent fusions (n≥2), a Welch’s t-test was performed. The bottom panel shows a heatmap of fusions and significant gene mutations in 312 thyroid cancers. (B) Position of each domain in the BRAF gene and junction points of BRAF fusions. (C) Exon expression plots showing Z-normalized exon expression for each exon in thyroid cancers. Red and blue represent relatively high and low exon expression.

Amongst 357 kinase fusions, the ALK-ROS1-RET lineage, FGFR, and NTRK family kinase fusions have previously been considered as druggable[28] and were commonly detected in tumor types including bladder carcinoma (3.3%), glioblastoma (4.4%), head and neck cancer (1.0%), low grade glioma (1.5%), lung adenocarcinoma (1.6%), lung squamous cell carcinoma (2.3%), prostate adenocarcinoma (1.7%), and thyroid carcinoma (8.7%) (Figure 3B and Supplementary Table 10), suggesting a potential for application of kinase inhibitors across tumor types (Figure 3C and Supplementary Table 11). ALK fusions can be targeted by ALK inhibitors and have been reported in non-small cell lung cancer as well as breast, colorectal, esophageal, renal cell, and renal medullary cancers[33]. We detected ALK fusions in lung adenocarcinoma (0.8%), bladder carcinoma (0.8%), melanoma (1.3%), and thyroid cancer (0.6%), suggesting that ALK fusions are rare but occur across different tumor lineages.

Chromatin modifier fusions across 13 tumor types

Recent studies demonstrated that genes associated with chromatin modification are frequently mutated and drive many types of cancers, prompting the development of new drugs for epigenetic protein families. Inhibitors of DNA methylation and histone deacetylases (HDACs) show antitumor activity[34, 35] and have been approved for the treatment of myelodysplastic syndrome[36] and cutaneous T cell lymphoma[37] by the FDA. In-frame gene fusions involving a chromatin modifier gene were detected in 115 (2.6%) of 4,366 samples (Supplementary Table 12 and 13) and were mutually exclusive with protein kinase fusions (Fisher’s exact test, P = 0.031). The fraction of chromatin modifier fusions in acute myeloid leukemia was higher than in other tumor types (Figure 5A), and included four samples with MLL fusions (5.8%), which may be druggable by DOT1L inhibitors[38]. Although there were only seven recurrent chromatin modifier fusions across 13 tumor types (Figure 5B), fusions related to histone methyltransferase and demethylase families with potential as targets for anticancer therapy were detected in 48 (1.1%) of 4,366 samples (Figure 5C)[35]. For example, an association of lysine-specific demethylase 5A gene (KDM5A/JARID1A/RBP2) overexpression with tumorigenesis or metastasis has been previously reported in lung cancer[39]. The KDM5A JmjC domain plays an important role in demethylating lysine 4 of histone 3, and upregulation of this domain was observed in three of four samples harboring KDM5A fusions (Supplementary Figure 7).

Figure 5

A survey of chromatin modifier fusions across 13 tumor types

(A) Bar plots show the fraction of in-frame chromatin modifier fusions relative to the total number of in frame fusions per tumor type. (B) Recurrent in-frame chromatin modifier fusions across 13 tumor types (n≥2). Color represents tumor type. (C) The landscape of chromatin modifier fusions across cancer. The horizontal and vertical axes represent tumor samples and chromatin modifier genes, respectively. Genes were ordered based on chromatin modifier class. Color bar depicts tumor type.

A resource of fusion transcripts from The Cancer Genome Atlas

To allow integration of structural transcript variations with other types of molecular data generated by The Cancer Genome Atlas, we developed a fusion gene database, which is accessible at http://54.84.12.177/PanCanFusV2/. Through a user-friendly web interface, this portal enables users to search fusion transcripts by gene, by fusion, by TCGA patient ID and by tumor type.

Discussion

This study presents a bona-fide catalog of fusion transcripts through analysis of 4,730 paired-end RNA sequencing data sets. We comprehensively identified the diversity of fusion transcripts across 13 tumor types, including the association of fusion transcripts with somatic mutation and DNA double strand breaks. Although recurrent fusion transcripts are generally substantially less frequent than somatic mutation events[40] such as those in TP53, PIK3CA, or PTEN, the detection of specific events such as the EML4-ALK protein kinase fusion in non-small cell lung cancer has led to the development of treatments that effectively target this lesion[28]. Importantly, we showed that in-frame and potentially activating NTRK1, and ALK rearrangements are not limited to breast, thyroid, and lung cancer respectively[28, 41], but can be detected across cancer at low frequency. Similarly, FGFR related fusions with therapeutic potential have been reported across tumor types[10], which was confirmed by our study. Similar to kinase fusions, our cross-sectional fusion list suggests that there may be opportunity for sporadic application of DNA methylation and histone deacetylase inhibitors, such as those approved for clinical use in hematological malignant tumors[35, 42]. For instance, Cadot et al. have reported that suppressing HDAC4 causes chromosome segregation defects in p53-deficient tumor cells[43], and one lung adenocarcinoma sample harboring an HDAC4-SNX18 fusion showed HDAC4 mRNA overexpression and a TP53 somatic mutation, suggesting a possible beneficial effect of HDAC inhibitors. We observed a significant anti-correlation between the presence of a transcript fusion and significant gene mutations in most tumor types, which suggested that driver genome and transcriptome rearrangements may occur infrequently but with high relevance to the tumor in which they are detected. Our findings provide a strong rationale for unbiased clinical testing of targetable fusion events.
Basket clinical trials in which patients are treated on the basis of gene abnormalities, instead of tissue of origin, have the potential to overcome the infrequency of druggable events and may particularly be evaluated in the context of transcript fusions. Our fusion database (http://54.84.12.177/PanCanFusV2/) would be among the largest databases of fusion transcripts obtained from paired-end RNA sequencing data based on unified criteria, demonstrating that druggable fusions are infrequent but relevant across many tumor types. A comprehensive understanding of fusion transcripts across tumor types could facilitate development of new therapeutic strategies for various tumor types based on fusion events.

Methods

Data preparation

TCGA RNA sequencing data were downloaded from the Cancer Genomics Hub (CGHub, https://cghub.ucsc.edu). In this study, we used RNA sequencing data obtained from 4,730 TCGA samples (4,366 primary tumor and 364 normal tissues) consisting of 13 tumor types (Table 1). To exclude the possibility of tumor cell contamination in normal tissue, we compared gene expression profiles between primary tumor and normal samples using the SAM algorithm[44] (fold change >2 and p < 0.0001) and performed supervised hierarchical clustering using differentially expressed genes for each tumor type. Of 369 normal samples, five samples (one clear cell renal cell cancer, one lung adenocarcinoma, and three thyroid carcinoma) belonging to the tumor cluster were excluded from this study (Supplementary Figure 2).

Identification of fusion transcripts

We used the Pipeline for RNA sequencing Data Analysis (PRADA, http://bioinformatics.mdanderson.org/Software/PRADA/)[17]. Briefly, PRADA extracts all best alignments per read from the dual (genome and transcriptome) reference file using BWA[45]. After initial mapping, the alignments of reads that map to both genome and transcriptome are collapsed into single genome coordinates. Once mapped, reads are filtered out if their best placements map to multiple genomic coordinates. Quality scores are recalibrated using the Genome Analysis Toolkit (GATK)[46]. Index files are generated using Samtools[47] and duplicate reads are flagged using Picard (http://picard.sourceforge.net). The PRADA fusion module detects fusion transcripts through identification of discordant read pairs and junction spanning reads. Discordant read pairs are paired read ends that map uniquely to different protein-coding genes with orientation consistent with formation of a sense-sense chimera. Junction spanning reads are detected by the construction of a sequence database that holds all possible exon-exon junctions that match the 3’ end of one gene fused to the 5’ end of a second gene. Unmapped reads are aligned to this database of all hypothetical exon junctions, which is created using the Ensembl transcriptome reference. Only reads whose mate maps to either of the two fusion partner genes are considered as evidence of a fusion transcript. In this study, we extracted fusions (1) with at least two discordant read pairs, (2) with at least one junction spanning read, and (3) without high gene homology between the fusion gene partners (E-value > 0.001). Next, we applied the concept of mutation allele fraction to RNA sequencing data and calculated the ratio of junction spanning reads to the total number of reads crossing over the junction point in the reference transcript (Supplementary Figure 8). We used this transcript allele fraction (TAF) to exclude artifacts arising from highly expressed transcripts.
We included fusion transcripts with a TAF greater than 0.01 for both genes in our fusion list. In addition, we assessed the variety of partner genes for each gene. The partner gene variety was defined as the number of distinct chromosome arms on which partner genes were located (Supplementary Figure 9). We calculated the random distribution of partner gene variety (permutation: 100,000 times) per number of fusions comprising one specific gene, taking into account the gene frequency of each chromosomal arm, and excluded fusions in which partner genes were randomly distributed to various chromosome arms (p < 0.00001). Next, we utilized TCGA level 3 copy number data to scan for breakpoints within 100 Kb of the predicted junction point[26]. We set the copy number threshold value to 0.3. We assigned fusion transcripts to a four-tier system as follows. Tier 1: fusions harboring at least three discordant read pairs, at least two junction spanning reads, and gene partner uniqueness within a sample[48]. Tier 2: fusions having at least two discordant read pairs and at least one junction spanning read, plus breakpoints within 100 Kb of the predicted junction point. Tier 3: fusions showing high consistency of predicted junction and gene partner uniqueness within a sample as well as having at least two discordant read pairs and at least one junction spanning read. Tier 4: fusions not meeting the criteria of tiers 1 to 3. We used a total of 7,887 tier 1 and tier 2 fusion transcripts in this study. Fusions that had never been reported were annotated as “novel” based on the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer (http://cgap.nci.nih.gov/Chromosomes/Mitelman), the Cancer Genome Project (http://www.sanger.ac.uk/research/projects/cancergenome/) and ChimerDB 2.0 (http://biome.ewha.ac.kr:8080/FusionGene/).
We included genes overlapping between TSGene: Tumor Suppressor Gene Database (http://bioinfo.mc.vanderbilt.edu/TSGene/)[49] and the Cancer Gene Census (http://cancer.sanger.ac.uk/cancergenome/projects/census/)[50] in our list of tumor suppressor genes.
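The transcript allele fraction filter described above can be illustrated with a minimal sketch. The function names are hypothetical, and two points are assumptions rather than statements of the PRADA code: the denominator is taken literally as the number of reads crossing the junction point in the reference (unfused) transcript, and a partner gene with no such reads is treated as passing the filter.

```python
def transcript_allele_fraction(junction_reads: int, reference_reads: int) -> float:
    """Ratio of junction spanning reads to reads crossing the junction point
    in the reference transcript of one partner gene."""
    if reference_reads == 0:
        # Assumption: without reference reads, high expression of the
        # reference transcript cannot explain the call, so TAF is unbounded.
        return float("inf")
    return junction_reads / reference_reads


def passes_taf_filter(junction_reads: int,
                      ref_reads_gene_a: int,
                      ref_reads_gene_b: int,
                      threshold: float = 0.01) -> bool:
    """Keep a fusion only if the TAF exceeds the threshold for both genes."""
    taf_a = transcript_allele_fraction(junction_reads, ref_reads_gene_a)
    taf_b = transcript_allele_fraction(junction_reads, ref_reads_gene_b)
    return taf_a > threshold and taf_b > threshold
```

With three junction spanning reads, a partner gene covered by 5,000 reference reads at the junction point gives a TAF of 0.0006 and the candidate is dropped, whereas 100 and 50 reference reads give TAFs of 0.03 and 0.06 and the candidate is kept.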

Validation of fusion transcripts

We obtained TCGA whole genome sequence data on 28 glioblastoma, 20 low-grade glioma, 18 melanoma (low pass), and matched normal samples from CGHub. We applied BreakDancer (version 1.12)[18] to whole genome sequencing data and identified somatic rearrangements that had 5 or more supporting reads in whole genome sequencing data, or 3 or more supporting reads in low-pass whole genome sequencing data, and were not present in matched normal samples. To validate fusion transcripts using whole genome sequencing data, we set two confidence levels (high and medium) and two window sizes (100 Kb and 1 Mb). When BreakDancer predicted a structural variant connecting both gene partners of a fusion transcript or involving one of the gene partners, we defined the validation as “high confidence” or “medium confidence”, respectively (Supplementary Figure 9).
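As an illustration of the two confidence levels, the sketch below classifies one predicted fusion against a list of structural variants. It is a simplified reading of the description above, not the scripts used in the study: the type aliases and function names are hypothetical, each structural variant is reduced to its two end coordinates, and "involving a gene partner" is approximated as an end lying within the chosen window of that partner's predicted junction point.

```python
from typing import List, Optional, Tuple

Locus = Tuple[str, int]                  # (chromosome, position)
StructuralVariant = Tuple[Locus, Locus]  # the two ends of one rearrangement


def near(a: Locus, b: Locus, window: int) -> bool:
    """True if both loci are on the same chromosome within `window` bases."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= window


def validation_level(junction_a: Locus,
                     junction_b: Locus,
                     variants: List[StructuralVariant],
                     window: int = 1_000_000) -> Optional[str]:
    """Return 'high', 'medium' or None for one predicted fusion."""
    level = None
    for end1, end2 in variants:
        connects_both = (
            (near(end1, junction_a, window) and near(end2, junction_b, window))
            or (near(end2, junction_a, window) and near(end1, junction_b, window))
        )
        if connects_both:
            return "high"  # one variant links both gene partners
        touches_one = any(
            near(end, junction, window)
            for end in (end1, end2)
            for junction in (junction_a, junction_b)
        )
        if touches_one:
            level = "medium"  # keep looking in case a later variant links both
    return level
```

Calling the function once with `window=100_000` and once with `window=1_000_000` mirrors the two window sizes mentioned in the text.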

Exon expression analysis

TCGA level 3 RNA sequencing exon level expression data were obtained from the TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga/). The Generic Annotation File (GAF), including annotations for all exons, was downloaded from https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/other/GAF/GAF_bundle/outputs/TCGA.Sept2010.09202010.gaf. We used the exon quantification text files to perform Z-normalization of each exon's expression in each tumor type. To examine the association of fusion events with gene expression, we performed Welch’s t-test between exons before and after the junction point for each gene.
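The two computations named above can be sketched with the Python standard library. The function names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) for Z-normalization is an assumption; the study itself used R. The sketch returns only the Welch t statistic, not a p-value, and it assumes non-constant input (a zero standard deviation would raise a division error).

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import List


def z_normalize(values: List[float]) -> List[float]:
    """Z-normalize one exon's expression across the samples of a tumor type."""
    mu, sd = mean(values), stdev(values)  # sample standard deviation (assumed)
    return [(v - mu) / sd for v in values]


def welch_t(before: List[float], after: List[float]) -> float:
    """Welch's t statistic comparing exons before and after the junction."""
    standard_error = sqrt(variance(before) / len(before)
                          + variance(after) / len(after))
    return (mean(after) - mean(before)) / standard_error
```

A large positive statistic for a sample's Z-normalized exon values would indicate higher expression of the exons after the junction point than of those before it, which is the pattern expected for the retained portion of a 3' fusion partner.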

Copy number alteration analysis

TCGA level 3 copy number data based on the Affymetrix SNP 6.0 array were obtained from the TCGA Data Portal. We calculated the number of DNA segments per sample as a measure of genome instability. To detect regions of frequent copy number alteration and the copy number status of each gene in each tumor type, we used the Genomic Identification of Significant Targets in Cancer (GISTIC) algorithm (version 2)[51]. Copy number status was categorized into five levels (high and low-level amplification, high and low-level deletion, and no alteration).

Mutation data analysis

We downloaded somatic mutation data (syn1710680) from Synapse (https://www.synapse.org/#) and determined significantly mutated genes per tumor type using MutSigCV[40]. Of 13 tumor types, melanoma samples with recurrent fusions had no mutation data. For each tumor type, we extracted the samples overlapping between the fusion and mutation data sets to compare mutation rate and significant mutation frequency between samples with and without recurrent fusions. Low grade glioma, prostate adenocarcinoma and melanoma data sets, in which no sample with recurrent fusions was detected in the overlapping data set, were excluded from this analysis.

Protein expression data analysis

We downloaded reverse phase protein array (RPPA) data for TCGA breast cancer from The Cancer Protein Atlas (TCPA) website (http://app1.bioinformatics.mdanderson.org/tcpa/_design/basic/index.html)[52]. We focused on ER alpha protein expression in ESR1 fusion positive breast cancer samples and compared ER alpha and phosphorylated ER alpha expression between ESR1 fusion positive and ESR1 fusion negative breast cancer samples.

Statistical analysis

We conducted all computations with R 3.0.1[53] and used standard statistical tests as appropriate.

51 in total

Review 1. Epigenetic protein families: a new frontier for drug discovery.

Authors: Cheryl H Arrowsmith; Chas Bountra; Paul V Fish; Kevin Lee; Matthieu Schapira
Journal: Nat Rev Drug Discov Date: 2012-04-13 Impact factor: 84.694

2. Whole-genome and whole-exome sequencing of bladder cancer identifies frequent alterations in genes involved in sister chromatid cohesion and segregation.

Authors: Guangwu Guo; Xiaojuan Sun; Chao Chen; Song Wu; Peide Huang; Zesong Li; Michael Dean; Yi Huang; Wenlong Jia; Quan Zhou; Aifa Tang; Zuoquan Yang; Xianxin Li; Pengfei Song; Xiaokun Zhao; Rui Ye; Shiqiang Zhang; Zhao Lin; Mingfu Qi; Shengqing Wan; Liangfu Xie; Fan Fan; Michael L Nickerson; Xiangjun Zou; Xueda Hu; Li Xing; Zhaojie Lv; Hongbin Mei; Shengjie Gao; Chaozhao Liang; Zhibo Gao; Jingxiao Lu; Yuan Yu; Chunxiao Liu; Lin Li; Xiaodong Fang; Zhimao Jiang; Jie Yang; Cailing Li; Xin Zhao; Jing Chen; Fang Zhang; Yongqi Lai; Zheguang Lin; Fangjian Zhou; Hao Chen; Hsiao Chang Chan; Shirley Tsang; Dan Theodorescu; Yingrui Li; Xiuqing Zhang; Jian Wang; Huanming Yang; Yaoting Gui; Jun Wang; Zhiming Cai
Journal: Nat Genet Date: 2013-10-13 Impact factor: 38.330

3. Combined BRAF and MEK inhibition in melanoma with BRAF V600 mutations.

Authors: Keith T Flaherty; Jeffery R Infante; Adil Daud; Rene Gonzalez; Richard F Kefford; Jeffrey Sosman; Omid Hamid; Lynn Schuchter; Jonathan Cebon; Nageatte Ibrahim; Ragini Kudchadkar; Howard A Burris; Gerald Falchook; Alain Algazi; Karl Lewis; Georgina V Long; Igor Puzanov; Peter Lebowitz; Ajay Singh; Shonda Little; Peng Sun; Alicia Allred; Daniele Ouellet; Kevin B Kim; Kiran Patel; Jeffrey Weber
Journal: N Engl J Med Date: 2012-09-29 Impact factor: 91.245

4. Comprehensive genomic analysis of rhabdomyosarcoma reveals a landscape of alterations affecting a common genetic axis in fusion-positive and fusion-negative tumors.

Authors: Jack F Shern; Li Chen; Juliann Chmielecki; Jun S Wei; Rajesh Patidar; Mara Rosenberg; Lauren Ambrogio; Daniel Auclair; Jianjun Wang; Young K Song; Catherine Tolman; Laura Hurd; Hongling Liao; Shile Zhang; Dominik Bogen; Andrew S Brohl; Sivasish Sindiri; Daniel Catchpoole; Thomas Badgett; Gad Getz; Jaume Mora; James R Anderson; Stephen X Skapek; Frederic G Barr; Matthew Meyerson; Douglas S Hawkins; Javed Khan
Journal: Cancer Discov Date: 2014-01-23 Impact factor: 39.397

Review 5. Histone lysine demethylases as targets for anticancer therapy.

Authors: Jonas W Højfeldt; Karl Agger; Kristian Helin
Journal: Nat Rev Drug Discov Date: 2013-11-15 Impact factor: 84.694

6. Gene fusions associated with recurrent amplicons represent a class of passenger aberrations in breast cancer.

Authors: Shanker Kalyana-Sundaram; Sunita Shankar; Scott Deroo; Matthew K Iyer; Nallasivam Palanisamy; Arul M Chinnaiyan; Chandan Kumar-Sinha
Journal: Neoplasia Date: 2012-08 Impact factor: 5.715

7. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers.

Authors: Craig H Mermel; Steven E Schumacher; Barbara Hill; Matthew L Meyerson; Rameen Beroukhim; Gad Getz
Journal: Genome Biol Date: 2011-04-28 Impact factor: 13.583

8. Oncogenic and drug-sensitive NTRK1 rearrangements in lung cancer.

Authors: A Vaishnavi; M Capelletti; P A Jänne; R C Doebele; A T Le; S Kako; M Butaney; D Ercan; S Mahale; K D Davies; D L Aisner; A B Pilling; E M Berge; J Kim; H Sasaki; S Park; G Kryukov; L A Garraway; Peter S Hammerman; J Haas; S W Andrews; D Lipson; P J Stephens; V A Miller; M Varella-Garcia
Journal: Nat Med Date: 2013-10-27 Impact factor: 53.440

9. TCPA: a resource for cancer functional proteomics data.

Authors: Jun Li; Yiling Lu; Rehan Akbani; Zhenlin Ju; Paul L Roebuck; Wenbin Liu; Ji-Yeon Yang; Bradley M Broom; Roeland G W Verhaak; David W Kane; Chris Wakefield; John N Weinstein; Gordon B Mills; Han Liang
Journal: Nat Methods Date: 2013-09-15 Impact factor: 28.547

10. Exploration of the gene fusion landscape of glioblastoma using transcriptome sequencing and copy number data.

Authors: Nameeta Shah; Michael Lankerovich; Hwahyung Lee; Jae-Geun Yoon; Brett Schroeder; Greg Foltz
Journal: BMC Genomics Date: 2013-11-22 Impact factor: 3.969
