Babak Alaei-Mahabadi, Kerryn Elliott, Erik Larsson.
Abstract
One of the ways in which genes can become activated in tumors is by somatic structural genomic rearrangements leading to promoter swapping events, typically in the context of gene fusions that cause a weak promoter to be replaced by a strong one. While such events are identifiable by whole genome sequencing, limited availability of this type of data has prevented comprehensive study of the phenomenon. Here, we leveraged the fact that copy number alterations (CNAs) arise as a result of structural alterations in DNA, and that they may therefore be informative of gene rearrangements, to pinpoint recurrent promoter swapping at a previously intractable scale. CNA data from nearly 9500 human tumors were combined with transcriptomic sequencing data to identify several cases of recurrent activating intrachromosomal promoter substitution events, involving either proper gene fusions or juxtaposition of strong promoters to gene upstream regions. Our computational screen demonstrates that a combination of CNA and expression data can be useful for identifying novel fusion events with potential driver roles in large cancer cohorts.
Year: 2020 PMID: 33097743 PMCID: PMC7584658 DOI: 10.1038/s41598-020-74420-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Pipeline overview. (a) Underlying principle of the promoter substitution events. A deletion (blue box) or a tandem duplication (red box) can replace the weaker promoter of orange gene B with the strong promoter of green gene A. The breakpoints near orange gene B can lie either within or upstream of the gene body. (b) Analysis workflow: 9423 SNP6-derived copy number profiles and RNA-seq-based gene expression profiles were used in this study. (c) Violin plots showing SNP6-based deletions (left) and tandem duplications (right) across multiple cancer types. OV, Ovarian serous cystadenocarcinoma; SARC, Sarcoma; UCS, Uterine Carcinosarcoma; ESCA, Esophageal carcinoma; UCEC, Uterine Corpus Endometrial Carcinoma; BRCA, Breast invasive carcinoma; ACC, Adrenocortical carcinoma; BLCA, Bladder Urothelial Carcinoma; STAD, Stomach adenocarcinoma; LUSC, Lung squamous cell carcinoma; SKCM, Skin Cutaneous Melanoma; LIHC, Liver hepatocellular carcinoma; GBM, Glioblastoma multiforme; LUAD, Lung adenocarcinoma; READ, Rectum adenocarcinoma; CESC, Cervical squamous cell carcinoma and endocervical adenocarcinoma; HNSC, Head and Neck squamous cell carcinoma; PRAD, Prostate adenocarcinoma; CHOL, Cholangiocarcinoma; MESO, Mesothelioma; KICH, Kidney Chromophobe; DLBC, Lymphoid Neoplasm Diffuse Large B-cell Lymphoma; COAD, Colon adenocarcinoma; UVM, Uveal Melanoma; PCPG, Pheochromocytoma and Paraganglioma; TGCT, Testicular Germ Cell Tumors; PAAD, Pancreatic adenocarcinoma; LGG, Brain Lower Grade Glioma; KIRP, Kidney renal papillary cell carcinoma; KIRC, Kidney renal clear cell carcinoma; THYM, Thymoma; THCA, Thyroid carcinoma. Colors in this graph are used throughout to indicate cancer type. The distribution of deletions and tandem duplications is shown for each cancer type. Green boxes and red plus signs show the median and mean, respectively.
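The geometry described in panel (a) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Gene` record, the function names, the restriction to plus-strand genes, and the `max_upstream` window are all assumptions introduced here for illustration. The idea is that a deletion joins its start to its end, whereas a tandem duplication joins its end back to its start, so the roles of the two breakpoints are swapped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical plus-strand gene record (so tss < end)."""
    name: str
    tss: int  # transcription start site
    end: int  # last base of the gene body


def spans_breakpoint(gene, pos):
    """True if pos falls inside the gene body, downstream of the TSS."""
    return gene.tss < pos <= gene.end


def near_or_inside(gene, pos, max_upstream):
    """True if pos is inside the gene body or at most max_upstream bases before the TSS."""
    return gene.tss - max_upstream <= pos < gene.end


def candidate_pairs(kind, start, end, genes, max_upstream=10_000):
    """Return (5' partner, 3' partner) name pairs that a CNA segment could juxtapose.

    kind is "deletion" or "tandem_duplication"; start < end are the segment borders.
    max_upstream is an arbitrary illustrative window, not a value from the paper.
    """
    if kind == "deletion":
        five_bp, three_bp = start, end   # junction joins start -> end
    elif kind == "tandem_duplication":
        five_bp, three_bp = end, start   # junction joins end -> start
    else:
        raise ValueError(f"unknown segment kind: {kind}")

    pairs = []
    for five in genes:
        # The 5' partner keeps its promoter and is cut inside its gene body.
        if not spans_breakpoint(five, five_bp):
            continue
        # For a duplication, the 5' promoter must lie inside the duplicated segment.
        if kind == "tandem_duplication" and five.tss < start:
            continue
        for three in genes:
            # The 3' breakpoint may be within or upstream of the 3' gene body.
            if three is not five and near_or_inside(three, three_bp, max_upstream):
                pairs.append((five.name, three.name))
    return pairs
```

With this sketch, a deletion places the 5' partner upstream of the 3' partner in the reference genome, while a tandem duplication allows a 5' partner that lies downstream of its 3' partner. Minus-strand genes would need the mirrored logic.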
Figure 2. Pan-cancer analysis of SVs resulting in promoter substitutions. (a) Manhattan plot showing the recurrent promoter substitution (PS) events across the genome by chromosome in all cancer types. Dot color represents the number of cancer types with the event. (b) Induction of 3′ partners tends to occur when the 5′ partners have a stronger promoter. The y-axis shows the log2-transformed expression difference of the 3′ partner of PS events, comparing affected samples with the median of the unaffected samples in the same cancer type. The x-axis shows the expression difference between the 5′ partner and the 3′ partner, comparing median expression within the same cancer type. Circle sizes correspond to the frequency of the event. (c) Volcano plots showing recurrent cases with 3′ partner induction, where the 5′ partner has the stronger promoter. Pairs highlighted in the text are labeled. Cancers are color-coded as in Fig. 1c. WT, unaffected wild type samples, with respect to the indicated alteration, from the same cancer type. q, false discovery rate.
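The two axes of panel (b) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the `pseudocount` added before taking logarithms is an assumption made here to avoid taking the log of zero.

```python
import math
from statistics import median


def log2_induction(affected_expr, unaffected_exprs, pseudocount=1.0):
    """y-axis: log2 difference of the 3' partner in one affected sample
    versus the median of unaffected samples from the same cancer type.
    The pseudocount is an assumption, not a value from the paper."""
    baseline = median(unaffected_exprs)
    return math.log2((affected_expr + pseudocount) / (baseline + pseudocount))


def log2_promoter_gap(five_prime_exprs, three_prime_exprs, pseudocount=1.0):
    """x-axis: log2 difference between the median expression of the 5' partner
    and the median expression of the 3' partner within one cancer type."""
    five = median(five_prime_exprs)
    three = median(three_prime_exprs)
    return math.log2((five + pseudocount) / (three + pseudocount))
```

Under this sketch, a positive x value means the 5′ partner is normally expressed more strongly than the 3′ partner, and a positive y value means the 3′ partner is induced in the affected sample.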
Figure 3. TIAM2 overexpression as a result of promoter substitution with SCAF8. (a) Genomic deletions (blue bars) juxtapose the SCAF8 promoter with TIAM2 (below Fig. 3c) in five ovarian (brown) and one endometrial (cyan) tumors. Cancers are color-coded as in Fig. 1c. (b) Strong activation of TIAM2 in PS positive cases. The mRNA level of TIAM2 is shown across 418 ovarian and 180 endometrial tumors. Wild type tumors (WT) without SCAF8-TIAM2 fusions, SCAF8-TIAM2 fusion samples (Fus), as well as two SCAF8-TIAM2 read-through (RT) samples and amplified samples (Amp) are shown separately for ovarian cancer (brown). P-values are calculated using the Wilcoxon rank-sum test comparing the expression of the altered tumors with other samples. (c) Splice junctions derived from RNA-Seq data for PS events. Red arcs are RNA reads. Reads supporting the deletions (blue boxes) are shown in green. (d) Two possible ORFs of the new fusion transcript. Blue lines indicate exon structure for the SCAF8 and TIAM2 genes. Dashed lines show exon junctions. (e) Splice junctions for two samples with read-through events. (f) SCAF8 expression across normal tissues from GTEx. Red plus signs indicate mean expression per tissue. The dashed red line indicates the mean expression of all tissue samples.
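The Wilcoxon rank-sum comparison named in panel (b) can be sketched with the standard library alone. This is an illustrative sketch, not the authors' implementation: it uses the normal approximation without a tie correction, which is an assumption made here and is only rough for groups as small as a handful of fusion-positive tumors, where an exact test would be preferable. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(altered, others):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, p). A positive z means the altered group tends to rank higher.
    """
    n1, n2 = len(altered), len(others)
    ranks = average_ranks(list(altered) + list(others))
    w = sum(ranks[:n1])                      # rank sum of the altered group
    mean_w = n1 * (n1 + n2 + 1) / 2          # expected rank sum under the null
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail probability
    return z, p
```

For example, `rank_sum_test(fusion_expr, wild_type_expr)` would compare TIAM2 expression in fusion-positive tumors against the remaining samples, where both argument names are placeholders for lists of expression values.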
Figure 4. Overexpression of ANK3 and SCARB1 through hijacking of the strong promoters of CCDC6 and NCOR2. (a) Recurrent tandem duplication events causing CCDC6-ANK3 fusion. Red bars indicate copy number gains, blue lines indicate exon structure of genes, green lines indicate Pfam protein domains. The scissors show the transcript junction for ANK3 derived from matching RNA-Seq. Cancers are color-coded as in Fig. 1c. Crosses next to cancer types indicate how the transcript breakpoint affects the coding sequence (CDS). (b) Expression versus copy number change for ANK3 in breast (yellow), ovarian (brown) and endometrial (cyan) tumors, respectively. PS positive samples are marked in green. P-values are calculated using the Wilcoxon rank-sum test comparing the expression of the altered tumors with other samples. (c) The novel fusion transcript is expressed at a higher level than WT ANK3. Read-count-based estimates of the expression level of the WT 5′ gene (CCDC6), the WT 3′ gene (ANK3) and the predicted chimeric gene (CCDC6-ANK3) were calculated using the EricScript tool. (d) Recurrent tandem duplications creating a novel transcript containing the first two noncoding exons of NCOR2 and SCARB1. Red bars indicate amplified regions, blue lines indicate exon structure of genes, grey lines indicate Pfam protein domains. (e) Expression versus copy number change for SCARB1 in stomach (light blue), lung adenocarcinoma (light grey) and esophageal (dark blue) tumors, respectively. PS positive samples are marked in red. (f) NCOR2 expression across different tissues from GTEx. Red plus signs indicate mean expression per tissue. The dashed red line indicates the mean expression of all tissue samples. TPM, transcripts per million.