Julianne K David1,2, Sean K Maden1,2, Benjamin R Weeder1,2, Reid F Thompson1,2,3,4,5,6,7, Abhinav Nellore1,2,8.
Abstract
This study probes the distribution of putatively cancer-specific junctions across a broad set of publicly available non-cancer human RNA sequencing (RNA-seq) datasets. We compared cancer and non-cancer RNA-seq data from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) Project and the Sequence Read Archive. We found that (i) averaging across cancer types, 80.6% of exon-exon junctions thought to be cancer-specific based on comparison with tissue-matched samples (σ = 13.0%) are in fact present in other adult non-cancer tissues throughout the body; (ii) 30.8% of junctions not present in any GTEx or TCGA normal tissues are shared by multiple samples within at least one cancer type cohort, and 87.4% of these distinguish between different cancer types; and (iii) many of these junctions not found in GTEx or TCGA normal tissues (15.4% on average, σ = 2.4%) are also found in embryological and other developmentally associated cells. These findings refine the meaning of RNA splicing event novelty, particularly with respect to the human neoepitope repertoire. Ultimately, cancer-specific exon-exon junctions may have a substantial causal relationship with the biology of disease.
Year: 2020 PMID: 34316681 PMCID: PMC8209686 DOI: 10.1093/narcan/zcaa001
Source DB: PubMed Journal: NAR Cancer ISSN: 2632-8674
Figure 2. Clustering by cohort prevalence of shared novel junctions not found in core normal samples. (A) Heatmap showing junction prevalences across every TCGA cohort for each cancer type's top 200 shared junctions that are at least 1% prevalent in that cancer type and are not found in any core normal samples. (B) Heatmap showing shared junction prevalences across selected TCGA cancer types and their assigned histological subtypes for each subtype's top 200 shared junctions that are at least 1% prevalent in that subtype and are not found in any core normal samples. See Supplementary Table S1 for TCGA subtype abbreviations. (C) Heatmap showing shared junction prevalences across selected TCGA cancer types and a set of their matched SRA tissue and cell types of origin, for each cancer type's top 200 shared junctions that are at least 1% prevalent in that cancer cohort and are not found in any core normal samples.
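
The junction selection described in the Figure 2 caption (shared, at least 1% prevalent in a cohort, absent from core normals, top 200) might be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the function name `top_shared_junctions`, the `(chrom, start, end, strand)` tuple representation, the reading of "shared" as present in at least two samples, and the ranking of "top" junctions by prevalence are all assumptions.

```python
from collections import Counter


def top_shared_junctions(sample_junctions, core_normals,
                         min_prevalence=0.01, top_n=200):
    """Rank a cohort's junctions that are absent from core normals.

    sample_junctions: dict mapping sample ID -> set of junctions seen in it.
    core_normals: set of junctions found in any core normal sample.
    Returns a list of (junction, prevalence) pairs, highest prevalence first.
    """
    n_samples = len(sample_junctions)
    counts = Counter()
    for junctions in sample_junctions.values():
        # Count each junction at most once per sample, skipping core normals.
        counts.update(set(junctions) - core_normals)

    selected = []
    for junction, count in counts.items():
        prevalence = count / n_samples
        # Assumption: "shared" means the junction occurs in >= 2 samples.
        if count >= 2 and prevalence >= min_prevalence:
            selected.append((junction, prevalence))

    # Assumption: "top" means highest prevalence; ties broken by coordinates.
    selected.sort(key=lambda item: (-item[1], item[0]))
    return selected[:top_n]
```

Calling this once per cancer type and taking the union of the returned junctions would give the rows of a prevalence matrix like the one the heatmaps display.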
Junction novelty specification
| Junction novelty stage | Definition |
|---|---|
| 0 | All junctions |
| 1+ | Junctions not found in tissue-matched GTEx or TCGA normal samples |
| 2+ | Junctions not found in any GTEx or TCGA normal (‘core normal’) samples |
| 3+ | Junctions not found in any core normal samples or in selected SRA tissue and cell type non-cancer samples |
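
Read as nested set filters, the stages in this table could be assigned along the lines of the sketch below. It is a minimal illustration under stated assumptions, not the authors' code: the function name `novelty_stage`, the tuple representation of a junction, and the three input sets are hypothetical, and tissue-matched normals are assumed to be a subset of the core normals.

```python
def novelty_stage(junction, tissue_matched_normals, core_normals, sra_noncancer):
    """Return the highest novelty stage (0 to 3) that a junction reaches.

    A junction at stage k also belongs to every lower "k+" set in the table,
    e.g. a stage-3 junction is also counted under 1+ and 2+.
    """
    if junction in tissue_matched_normals:
        return 0  # found in tissue-matched GTEx or TCGA normal samples
    if junction in core_normals:
        return 1  # absent from tissue-matched normals, present in other core normals
    if junction in sra_noncancer:
        return 2  # absent from all core normals, present in selected SRA non-cancer samples
    return 3      # absent from core normals and selected SRA non-cancer samples


# Hypothetical junctions as (chromosome, start, end, strand) tuples.
matched = {("chr1", 100, 200, "+")}
core = matched | {("chr1", 300, 400, "+")}
sra = {("chr2", 50, 90, "-")}

stages = [
    novelty_stage(j, matched, core, sra)
    for j in [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
              ("chr2", 50, 90, "-"), ("chr5", 1, 2, "+")]
]
```

For a cancer type with no tissue-matched normal samples, `tissue_matched_normals` would simply be an empty set in this sketch.
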
Figure 1. Distribution of exon–exon junctions across and within TCGA cancer cohorts. (A) Log-scale bar charts describing the percentage of all junctions of a given cancer type cohort present in three subcohorts. Blue (left) bars give the percentage of cohort junctions found in GTEx or TCGA tissue-matched normal samples (Supplementary Table S1), green (center) bars give the percentage of the remaining junctions that are found in other core normals and yellow (right) bars give the percentage of cohort junctions not found in any core normals; cancer types are ordered by relative abundance of junctions in this last set. Cancer types with no blue (left) bar have no tissue-matched normal samples (Supplementary Table S1). (B) Log-scale sorted strip plots representing the number of non-core normal junctions per sample for each of 33 TCGA cancer types. Each point represents a single TCGA tumor sample and the width of each strip is proportional to the size of the cancer type cohort (15). Supplementary Figure S1B shows analogous data with additional filters applied. (C) Log-scale box plots representing the prevalences within each cancer type cohort of junctions occurring in at least 1% of cancer type samples, summarized across all TCGA cancer types. Junction prevalences are shown in blue (left) for those found in GTEx or TCGA tissue-matched normal samples (Supplementary Table S1), junctions not present in tissue-matched normals but found in other core normals are shown in green (center) and junctions not found in any core normals are shown in yellow (right). Note that any junction found in multiple cancer types is represented by multiple data points, one for each cancer type in which it is found. A detailed breakdown by TCGA cancer type is available in Supplementary Figure S1E.
Figure 3. Junction set assignments and antisense junction prevalence in additional normal tissue and cell type categories from the SRA, across cancers. (A) Upset-style plot with bar plots showing junction abundances across major sets (left) and set overlaps (top) across 33 cancers (error bars). Shown junctions are absent from all core normals. Unexplained junctions (red highlights) comprise junctions not present in any set categories studied (see also expanded set assignments in Supplementary Figure S3A). The developmental set comprises human development-related junctions not present in the category placenta. Scale is log10 of percent of junctions not found in core normals, calculated for each cancer. (B) Box plots showing, for each TCGA cancer type, the percent of junctions that are antisense for (green) junctions found in core normals, (aqua) junctions not found in core normals but found in other selected non-cancer adult tissue and cell samples from the SRA, (lavender) junctions not found in core normals or SRA non-cancer adult samples but found in selected developmental samples on the SRA, (apricot) junctions not found in core normals or SRA non-cancer adult samples but found in selected stem cell samples on the SRA and (red) junctions not found in core normals or selected non-cancer adult, developmental or stem cell samples from the SRA. Each point represents the percent of junctions from one cancer type in the given category (e.g. developmental) that are antisense. The table shows the median and interquartile range of the number of junctions in that category across all TCGA cancer types.