Yongfu Xiong1, Ying Deng2, Kang Wang3, He Zhou4, Xiangru Zheng4, Liangyi Si5, Zhongxue Fu6. 1. Department of Gastrointestinal Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China. 2. Department of Cardiovascular, The First Branch, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China. 3. Department of Breast Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China. 4. Department of Gastrointestinal Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China; Central Laboratory, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China. 5. Department of Cardiovascular, The Third Affiliated Hospital of Chongqing Medical University, Chongqing, China. Electronic address: Si_Liangyi@outlook.com. 6. Department of Gastrointestinal Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China. Electronic address: fzx19990521@126.com.
Abstract
BACKGROUND: Alternative splicing (AS), as a potent and pervasive mechanism of transcriptional regulatory, expands the genome's coding capacity and involves in the initiation and progression of cancer. Systematic analysis of alternative splicing in colorectal cancer (CRC) is lacking and greatly needed. METHODS: RNA-Seq data and corresponding clinical information of CRC cohort were downloaded from the TCGA data portal. Then, a java application, known as SpliceSeq, was used to evaluate the RNA splicing patterns and calculate the Percent Spliced In (PSI) value. Differently expressed AS events (DEAS) were identified based on PSI value between paired CRC and adjacent tissues. DEAS and its splicing networks were further analyzed by bioinformatics methods. Kaplan-Meier, Cox proportional regression and unsupervised clustering analysis were used to evaluate the association between DEAS and patients' clinical features. RESULTS: After strict filtering, a total of 34,334 AS events were identified, among which 421 AS events were found expressed differently. Parent genes of these DEAS play a important role in regulating CRC-related processes such as protein kinase activity (FDR<0.0001), PI3K-Akt signaling pathway (FDR = 0.0024) and p53 signaling pathway (FDR = 0.0143). 37 DEAS events were found to be associated with OS, and 68 DEAS events were found to be associated with DFS. Stratifying patients according to the PSI value of AT in CXCL12 and RI in CSTF3 formed significant Kaplan-Meier curves in both OS and DFS survival analysis. Unsupervised clustering analysis using DEAS revealed four clusters with distinct survival patterns, and associated with consensus molecular subtypes. CONCLUSIONS: Large differences of AS events in CRC appear to exist, and these differences are likely to be important determinants of both prognosis and biological regulation. Our identified CRC-related AS events and uncovered splicing networks are valuable in deciphering the underlying mechanisms of AS in CRC, and provide clues of therapeutic targets to further validations.
BACKGROUND: Alternative splicing (AS), as a potent and pervasive mechanism of transcriptional regulatory, expands the genome's coding capacity and involves in the initiation and progression of cancer. Systematic analysis of alternative splicing in colorectal cancer (CRC) is lacking and greatly needed. METHODS: RNA-Seq data and corresponding clinical information of CRC cohort were downloaded from the TCGA data portal. Then, a java application, known as SpliceSeq, was used to evaluate the RNA splicing patterns and calculate the Percent Spliced In (PSI) value. Differently expressed AS events (DEAS) were identified based on PSI value between paired CRC and adjacent tissues. DEAS and its splicing networks were further analyzed by bioinformatics methods. Kaplan-Meier, Cox proportional regression and unsupervised clustering analysis were used to evaluate the association between DEAS and patients' clinical features. RESULTS: After strict filtering, a total of 34,334 AS events were identified, among which 421 AS events were found expressed differently. Parent genes of these DEAS play a important role in regulating CRC-related processes such as protein kinase activity (FDR<0.0001), PI3K-Akt signaling pathway (FDR = 0.0024) and p53 signaling pathway (FDR = 0.0143). 37 DEAS events were found to be associated with OS, and 68 DEAS events were found to be associated with DFS. Stratifying patients according to the PSI value of AT in CXCL12 and RI in CSTF3 formed significant Kaplan-Meier curves in both OS and DFS survival analysis. Unsupervised clustering analysis using DEAS revealed four clusters with distinct survival patterns, and associated with consensus molecular subtypes. CONCLUSIONS: Large differences of AS events in CRC appear to exist, and these differences are likely to be important determinants of both prognosis and biological regulation. Our identified CRC-related AS events and uncovered splicing networks are valuable in deciphering the underlying mechanisms of AS in CRC, and provide clues of therapeutic targets to further validations.
Growing evidence demonstrated that alternative splicing is frequently modified in cancer and is associated with cancer progression. Systematic analysis of alternative splicing signature in colorectal cancer is lacking and greatly needed.
Added Value of This Study
Through our analysis and screening, the genome-wide AS events were profiled, and the CRC-related AS events were identified. Further enrichment analysis confirmed that the parent genes of these AS events (DEAS) have great potential to play vital role in CRC. Interesting splicing correlation network provides novel insights into how these CRC-related AS events were potentially regulated by key splicing factors. In addition, some AS events, which were identified associate with survival, may be most valuable in deciphering the underlying mechanisms of AS in the oncogenesis of CRC, and provide clues of therapeutic targets to further validations.
Implications of All the Available Evidence
Large differences of AS events in CRC appear to exist, and these differences are likely to be important determinants of both prognosis and biological regulation. Here, we provided the most systematic analysis so far of alternative splicing in CRC, which could promote further mechanism and experimental study.Alt-text: Unlabelled Box
Introduction
Despite advances in screening, diagnosis, and curative resection, colorectal cancer (CRC) is still one of the leading causes of cancer-related death worldwide [1]. In addition, it's clinical outcome for individual cases remains unsatisfactory because optimal management and individual therapy strategies still present great challenges [2]. At present, surgical resection is the only potentially curative therapy for CRC. Unfortunately, about 20–25% of patients with newly diagnosed CRC present with distant metastases, and only a small population of these patients can undergo curative operation [3]. More seriously, approximately 50% of CRC patients with resectable tumors will develop recurrence, most within 2 years [4]. Evidence accumulated during the past decades have revealed that cancerogenesis, recurrence and metastasis of colorectal is a complex and tightly regulated process, involving accumulation of numerous genetic alternation over years [5]. These sequential changes not only activate oncogenes or inhibit the action of tumor suppressor genes, but also empower CRC with the ability to invade tissues and metastasize. Thus, further comprehensive understanding of the relationship between the biological mechanism that anticipates in CRC regulation and the corresponding clinicopathological characteristic is the vital step to develop target therapy and improve the prognosis of CRC patients.The rapid development of the high-throughput technology has opened a new era for cancer genomics study. By applying RNA sequencing(RNA Seq) and microarrays in recent year, gene expression and genomic profiling of CRC have been sufficiently evaluated [[6], [7], [8]], and the arm these studies are mainly focusing on differential gene expression for CRC phenotype classification, but also on gene expression variability according to clinical staging [[9], [10], [11]]. For example, a genome-wide CRC-related genes identification was conducted in our recently published study, which combining sequence data of paired tissue samples(e.g., neoplastic vs normal tissue) and corresponding clinical information to insight into the potential biological and clinical value of transcriptome variation [2]. Additionally, studies involved ncRNA expression, copy number variation, DNA methylation and nucleotide polymorphisms have been widely performed, especially in CRC [12,13]. Results obtained from above-referenced studies not only identified CRC-related alterations in several pathways such as TGF-b, WNT, MAPK, PI3K and p53 signaling, but also dynamically revealed the intricate interwoven relationships that govern CRC biological behavior. More importantly, these results reinforcing the concept that multiple genetic events are required to unleash the malignant progression of CRC, and only a genome-wide approach can interpret complex scenarios in the biology of tumors. These studies, although with promising results, mainly focus on alteration at the level of transcript expression. However, systematic analysis of variation on transcript architecture (alternative splicing) is largely ignored, especially in CRC.While human cells contain nearly 20,000 protein-coding genes, their transcriptome is tremendously more complex, with 82,141 distinct mRNA sequences indexed on the current version of GENCODE [14,15]. The remarkable discrepancy between the number of protein-coding genes in an organism and its overall cellular complexity indicate that additional mechanisms are acting to expand the coding capacity of the genome and form an intermediate layer of regulation between transcriptional and post-translational networks [16]. Alternative splicing (AS), the process by which a single RNA precursor can be spliced in different arrangements to produce structurally and functionally distinct mRNA and protein variants, may be one of the most extensively applied mechanisms that account for the proteome diversity and cellular complexity [17]. Indeed, AS have a profound effect on the biologic characteristics of the final protein by adding or deleting functional domains, changing its stability, controlling its location, or modifying its protein-protein interactions [18]. More importantly, data from genome-wide studies suggest that 92–95% of multi-exon genes in human undergo AS [19].In recent years the contribution of AS in human disease, particularly in cancer, has been widely recognized. From the point of view of mechanism, abnormal AS event can directly affect major biogenesis and progression of tumors. For example, the systematic and coordinated alteration of AS in several functionally linked precursor mRNAs could entirely alter the specific process of carcinogenesis. Moreover, increasing evidence suggested that unbalanced expression of splicing variants or the failure to properly express the correct isoforms is another hallmark of cancer [20]. Thus, cancer-specific splice variants may potentially be used as diagnostic, prognostic, and predictive biomarkers as well as therapeutic targets.Due to the technical limitation, the effect or functions of AS events in CRC have been individually studied in only a few cases. But recently several approaches using high-throughout technique have been applied successfully. For example, Mojica and Hawthorn directly compared exon array-based data from paired tumor and normal tissues from 10 CRC patients, revealing the complexity of AS and pointing out the limitations of current transcript annotation [21]. In another study, genome-wide disruption of pre-mRNA splicing was investigated on 160 CRCs and indicated that AS analogous to genomic instability is a characteristic of CRC [5]. More recently, by using whole transcriptome analysis, Bisognin et al. demonstrate that cancer-specific AS is common in early phases of CRC natural history [22]. Comparing with microarray used in these resent examples [5,21,22], RNA-Seq enabling the quantitative measure of known and novel AS isoforms, and provides better resolution, deeper coverage and higher accuracy, is the most suitable method to apply in AS study. However, the scale of RNA-Seq based studies is small because that the cost of sequencing at deep coverage is still very high. Additionally, as rapidly developing technology, the bioinformatics protocols for processing and quantitatively analyzing RNA-Seq data for AS are still need to be optimized. Furthermore, the lack of corresponding clinical information, few studies annotated the CRC-specific AS events with clinical meaning in a systematic way.With the rapid accumulation of RNA-seq data in CRC samples, the Cancer Genome Atlas (TCGA) project provides a rich source for the investigation of AS patterns. At present, there are already a total of 677 RNA-seq data obtained from 627 CRC patients that are publicly available online [23]. The detailed clinical information of these data could also be obtained from TCGA. Therefore, it is possible to study the clinical effect of CRC-related AS events in a relative large-scale of populations. Despite the great potential of using RNA-Seq to study AS in cancers, the advantages could not be realized without efficient and reliable bioinformatics processing. However, recently developed analytical tool, known as SpliceSeq, could unambiguously aligns RNA-Seq reads to gene splice graphs, facilitating accurate analysis of complex or low-frequency AS events [24]. That may be particularly advantageous for application to tumors.To the best of our knowledge, integrating clinical information and large-scale of RNA-seq data to systematic analyses of AS at individual exon resolution have been lacking, and is great needed, especially in CRC. Thus, in this present study, we systematically profiled the genome-wide alternative splicing in CRC cohort from TCGA, and further identified CRC-related AS events and investigated its association with clinical outcome. Our results reveal that such CRC-related AS events as CSTF3-RI, CXCL12-AT et al. are particularly important in CRC, and they can directly affect the major biogenesis and progression of CRC, even serve as reliable prognostic biomarkers.
Materials and methods
Data curation process
RNA-Seq data and corresponding clinical information of CRC cohort were downloaded from the TCGA data portal (https://portal.gdc.cancer.gov/). TCGA data are classified by data type and data level, to allow structured access to this resource with appropriate patient privacy protection. This study meets the publication guidelines provided by TCGA [25]. Then, downloaded samples and corresponding clinical data were cross-referenced by TCGA barcodes. Some patients do not meet the following criteria were eliminated: [1] a histological diagnosis of CRC [2] patients with complete clinical features including sex, age, tumor location, local invasion, lymph node metastasis, distal metastasis, differentiation grade, pathologic stage, survival information; [3] patients were still alive at least 1 month after initial pathologic diagnosis; [4] patients with corresponding RNA-Seq data. To generate the AS profiles for each CRC patient, SpliceSeq, a java application that unambiguously quantify the inclusion level of each exon and splice junction, was used to evaluate the RNA splicing patterns as previous described [26,27]. The Percent Spliced In (PSI) value, rating from zero to one which was commonly used in quantifying AS events, was calculated for each AS events. To generate as reliable a set of AS events as possible we implemented a series of stringent filters (Percentage of Samples with PSI Value ≥75, Average of PSI value ≥0.05). Upset plot generated by UpSetR (version 1.3.3) was used for the quantitative analysis of interactive sets between seven types of AS [28]. Circos Plots generated by Circlize (version 0.3.9) was created to display the detail of AS events and its parent genes in chromosome [29]. Details of this study design are illustrated in Fig. 1 as a flowchart.
Fig. 1
Flowchart for profiling the alternative splicing of CRC in a large-scale RNA-Seq data. RNA-Seq data and corresponding clinical information of CRC cohort were downloaded from the TCGA data portal. After excluding patients with incomplete clinical data and duplications, we combined these datasets into a large-scale CRC cohort, which was further analyzed by SpliceSeq. Through this process, PSI value was calculated for each AS events. Then, a series of stringent filters (Percentage of Samples with PSI Value ≥75, Average of PSI value ≥0.05) was implemented. Based on the PSI value of each AS events, we identified DEAS between CRC and normal tissues and further investigated the parent genes of these AS events by enrichment analysis. Next, the interaction network of these parent genes and regulation network of DEAS and splicing factors was visualized and analyzed. Finally, to assess the prognostic value of each DEAS event in CRC, their effect on OS and DFS was determined by Cox regression and Kaplan-Meier analysis.
Flowchart for profiling the alternative splicing of CRC in a large-scale RNA-Seq data. RNA-Seq data and corresponding clinical information of CRC cohort were downloaded from the TCGA data portal. After excluding patients with incomplete clinical data and duplications, we combined these datasets into a large-scale CRC cohort, which was further analyzed by SpliceSeq. Through this process, PSI value was calculated for each AS events. Then, a series of stringent filters (Percentage of Samples with PSI Value ≥75, Average of PSI value ≥0.05) was implemented. Based on the PSI value of each AS events, we identified DEAS between CRC and normal tissues and further investigated the parent genes of these AS events by enrichment analysis. Next, the interaction network of these parent genes and regulation network of DEAS and splicing factors was visualized and analyzed. Finally, to assess the prognostic value of each DEAS event in CRC, their effect on OS and DFS was determined by Cox regression and Kaplan-Meier analysis.
Identification of differentially expressed AS events (DEAS) and Enrichment Analysis
To identify DEAS between CRC and normal tissues, the PSI value of each AS events was detected from the TCGA CRC cohort (627 CRC tissue and 50 paired normal tissue). The batch effect was removed by using a generalized linear model. The expression differences were characterized by log FC (log 2 fold change) and associated adj.p values. The |log2FC| > 1 and adj.p < .05, represented corresponding AS events were upregulated and downregulated, respectively. The parent genes of these AS events were then subjected to biological function enrichment analysis. GO terms and KEGG pathways with significant enrichment false discovery rate (FDR) values <0.05 were selected for further analysis. The analyses were performed by clusterProfler package (version 3.4.4) [30]. In addition, the parent genes of these AS event were mapped and imported into the Retrieval of Interacting Genes/Proteins (STRING) 9.1 database. The correlation network was then visualized by Cytoscape (version 3.4.0) [31]. Cluster analyses were performed using correlation distance metrics and the average linkage agglomeration algorithm.
Construction of splicing correlation network
A list of 71 human splicing factors was created by hand-curated screenings of literature and databases [32]. All of these splicing factors were experimentally validated in previous studies, including 13 SR proteins, 27 heterogeneous nuclear ribonucleoprotein (hnRNP) proteins and other 31 proteins belonging to the ELAV, KHDRBS, CELF, Nova and Fox families. The expression of splicing factor were obtained from level 3 mRNA-seq data in TCGA. Correlation between the expression of splicing factor and the PSI value of DEAS were analyzed by WGCNA (version 1.51) [33]. p values were adjusted by Benjamini & Hochberg (BH) correlation and adjusted P value of <0.05 was considered significant. The correlation plots were generated by Cytoscape (version 3.4.0).
Survival analysis
For DEAS event, CRC patients were divided into two groups based on the PSI value (median cut), and the two-category was modelled as continuous variables to derive more easily interpretable hazard ratios (HRs). Univariate Cox regression followed by multivariate Cox regression was performed based on overall survival (OS) and disease-free survival (DFS) to determine independent prognostic factors respectively. Kaplan-Meier analysis with log-rank test was used to compare patients' survival between subgroups. The effect of each variable on survival was determined by the Cox multivariate regression analysis. Unless otherwise mentioned, all statistical analyses in this study were performed using R software (version 3.2.2), and P value <.05 were considered to be statistically significant.
Evaluation of the correlation with clinical features
Unsupervised classification of the TCGA CRC cohort was performed using hierarchical clustering (Ward linkage and 1-Pearson correlation coefficient distance used) on the identified DEAS (n = 421). To obtain a robust classification, we used an unsupervised consensus approach implemented in the R package Consensus Cluster Plus [34]. The optimal number of clusters was selected according to the Elbow method and the Gap statistic. Consensus molecular subtyping of CRC was achieved based on gene expression data using the CMScaller package [35], as previously described. The associations between clusters and clinical outcome were assessed using the Chi-squared test and logistic regression as described in Marisa et al. studies [8].
Results
Overview of AS events profiling in CRC
Integrated AS events profiling was established using RNA-Seq data obtained from 627 CRC patients. The included population comprised 334 (53.2%) male and 293 (46.7%) female patients, among which 199 patients (18.9%) developed recurrence and 130 (20.7%) had died, respectively. The median follow-up period after radical resection was 27.5 months (range, 1–147 months). The detailed baseline characteristics and histopathological features of the patients are summarized in Supplementary Table S1. By using SpliceSeq based on the standard protocol as previously reported, we identified a total of 82,411 AS events from 13,201 genes. According to their splicing pattern, these AS events can be roughly divided into seven types, including Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA), which were illustrated in Fig. 2a. Of note is that the majority part of AS events could only be detected in a few samples. In addition, the expression level of certain splicing isoforms was extremely low (PSI value <0.05). In order to generate as reliable a set of AS events as possible we implemented a series of stringent filters (Percentage of Samples with PSI Value ≥75, Average of PSI value ≥0.05). After screening, we obtained 34,334 AS events from 8942 genes, which indicated that one gene might have almost four AS events on average (Fig. 2b). All the detected AS events were well organized in Supplementary Table S2. Considering this, UpSet plot was generated to visualize the intersecting sets of each AS type. As shown in Fig. 2c, most of the AS events were from one gene while one gene might have up to four types of AS events. It is noteworthy that >81.6% of genes contain two or more AS events, different combinations of these AS events provides perhaps the largest potential process for enriching the transcriptome diversity. Finally, in order to intuitively visualized the integrated AS events profiling of CRC, Circos Plots were created to display the detail of AS events and its parent genes in the chromosome (Fig. 2d).
Fig. 2
Overview of AS events profiling in CRC. (a) Illustrations for seven types of AS events, including Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA). (b) A number of AS events and involved genes from the CRC patients were depicted according to the AS types. Color bar represents the preliminarily detected AS events and involved genes. Black bar represents the AS events and involved genes filtered by stringent criteria. (c) UpSet plot of interactions between the seven types of detected AS events (n = 34,334) in CRC. One gene may have up to four types of alternative splicing. (d) The detail of AS events and its parent genes in chromosome was shown in Circos Plots. The ribbons, in the circos plots, represent the potential interaction between the parent gene of DEAS, and the thickness of the ribbons indicating the extent of the interaction strength. The outer circle is composed of the polyline, each polyline represents an AS event and links to the location of the parent gene in chromosomes. Different color of the short line represent the number of AS events that the corresponding gene have: blue, 1–5 AS events; black, 5–10 AS events; red, greater than or equal to 10 AS events.
Identification of CRC-related aberrant AS events
Directly comparing the expression of gene at different pathological state is an effective approach to screen hub genes that involved in the corresponding biological process. This approach has been widely used to identify CRC-related genes in many previous studies. Similarly, significantly different AS events between primary CRC and corresponding adjacent normal tissues could also play a vital role in the whole course of CRC. Thus, AS profiling obtained from 50 paired tumor and normal tissues were integrated to identify differentially expressed alternative splicing (DEAS). Eventually, a total of 421 DEAS were preliminarily screened from 372 genes with the threshold of |log2FC| >1 and adj.P.Value <0.05 (t-test and BH correction), among which there were 174 APs, 81 ESs, 76 ATs, 57 RIs, 18 ADs, 14 AAs and 1 ME (Fig. 3a; Supplementary Table S3). The detailed information of the top 40 most different AS events was listed in Table 1. Compare with recently published results that obtained from microarray [22,36,37], although we applied a series of stringent filters, more AS events were identified in the present study (Supplementary Fig. S1). More importantly, among the AS events (n = 18) that validated by experiment, we detected 12 (66.7%), and that this well above the result from Gardina et al. [37] (n = 8,44.4%) and Bisognin et al. [22] (n = 8,44.4%) (Supplementary Fig. S1). These findings suggested that RNA-seq may be a more suitable method to apply in AS study, and our RNA-seq based results are more robust and reliable.
Fig. 3
Identification of CRC-related aberrant AS. (a) The difference of AS events between paired colorectal cancer and paracancerous tissue. Volcano Plot visualizing the DEAS identified in CRC. The red and blue points in the plot represent the differentially expressed alternative splicing with statistical significance (adj P value <.05, |logFC| ≥ 1). (b) Heat map of the DEAS. The horizontal axis shows the clustering information of samples which were divided into two major clusters, and the clusters were adjacent normal tissue(N = 50) and paired tumor tissue(N = 50), respectively; the left longitudinal axis showed the clustering information of DEAS. The gradual change of color from green to red represents the expression of DEAS altered from low to high. (c) Venn diagram demonstrated the intersection set of DEAS and DEG. (d) The splice graph of some representative DEAS. The thin exon sections represent untranslated regions (UTR) and the thick exon sections represent coding regions. Exons are drawn to scale and the connecting arcs represent splice paths. (e) The difference of PSI value of AS events between primary CRC and corresponding adjacent normal tissues. Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA). For Fig. 3a, t-test and BH correction was used for data analysis. For Fig. 3e, Student's t-test was used for data analysis. *: P < .0001 compared with adjacent normal tissue.
Table 1
The detailed information of the top 40 most different AS events.
Symbol
AS type
exons
From exon
To exon
MeanT
MeanN
logFC
adj.P.Val
Upregulated
DUSP15
AT
11
0.365
0.053
2.774
1.48E-09
TNFRSF10C
AT
4.3
0.369
0.068
2.436
1.04E-11
CD44
ES
7:12.1:13:14
5
15
0.270
0.051
2.393
9.70E-07
S100A4
AP
4
0.258
0.050
2.364
1.18E-08
SERPINA1
AA
2.1:2.2:2.3
1.1
2.4
0.644
0.128
2.334
1.30E-24
PODNL1
AP
2
0.378
0.077
2.300
1.27E-10
ISLR
AP
2
0.669
0.139
2.266
2.75E-46
ALS2CL
AA
18.3
18.1
18.4
0.264
0.057
2.221
3.79E-10
PPFIBP2
AP
6
0.239
0.052
2.190
1.18E-11
CDIP1
AP
3.1
0.254
0.056
2.170
3.97E-07
SULT1A2
RI
1.2:1.3
1.1
1.4
0.811
0.181
2.167
1.08E-18
TNC
ES
12:13:14:15:16:19
11
20
0.433
0.098
2.138
6.97E-15
SERPINA1
AA
2.1:2.2:2.3:2.4
1.1
2.5
0.709
0.165
2.101
1.61E-23
COL6A3
ES
6
5
7
0.614
0.145
2.079
1.45E-35
CD44
ES
10:11:12.1:13:14
5
15
0.510
0.123
2.052
6.93E-18
TTLL12
AP
1
0.779
0.196
1.991
1.95E-30
SRI
AP
2
0.729
0.190
1.939
2.64E-29
CGREF1
AP
1.1
0.586
0.154
1.929
3.06E-17
COL18A1
AP
1
0.427
0.117
1.864
1.22E-05
RPL32
RI
1.2:1.3
1.1
1.4
0.429
0.118
1.860
1.12E-13
Downregulated
RBM26
ES
14
13.2
15
0.066
0.221
−1.744
2.23E-07
SULT1A3
AA
11.1
10
11.2
0.058
0.198
−1.765
1.06E-12
TTLL12
AP
12
0.221
0.804
−1.861
1.95E-30
WSB2
AP
3.1
0.136
0.497
−1.867
3.20E-20
FAM60A
AP
2
0.130
0.478
−1.873
9.08E-38
PAPSS2
ES
9
8
10
0.170
0.626
−1.878
1.45E-35
TACC1
AP
1
0.058
0.215
−1.884
3.37E-11
TNFAIP8
AP
2
0.072
0.273
−1.922
2.45E-12
PCYT1A
AT
12
0.051
0.202
−1.980
4.98E-10
NHEJ1
ES
6
5
7
0.061
0.243
−1.985
1.13E-13
SERPINB5
AT
5.2
0.062
0.248
−1.994
2.29E-09
SMTN
AP
12
0.053
0.216
−2.015
1.24E-07
SULT2B1
AP
2.1
0.133
0.542
−2.028
1.47E-16
CCL24
AP
2.1
0.130
0.539
−2.049
6.71E-12
LRRFIP1
ES
15
14
16
0.055
0.234
−2.100
9.70E-07
HNF4A
AP
2
0.108
0.491
−2.187
5.65E-25
SLC39A14
AP
2
0.111
0.518
−2.225
9.89E-34
SLC39A14
ME
5|6
4
7
0.122
0.587
−2.263
2.60E-36
ACAN
AT
10.2
0.085
0.523
−2.624
8.12E-16
E2F5
AP
2
0.052
0.368
−2.815
5.46E-14
MeanT: the mean PSI value in colorectal cancer tissue; MeanN: the mean PSI value in paired normal tissue; logFC: log2 fold change. The Adj.P value was calculated by t-test and adjusted through BH correction.
Supplementary Fig. S1
UpSet and Venn diagram displaying the overlap of AS events. The overlap of the AS events that respectively obtained from the studies of Gardina, Bisognin et al., hand-curated screenings of literature (experimentally validated) and our own results were displayed by the UpSet and Venn diagram.
Intriguingly, the proportion of AS types between DEAS and the entire AS was inconsistent. As the largest number of AS types, ES events (36.8%) only contribute 17.9% DEAS. Moreover, using unsupervised hierarchical clustering based on these DEAS, the samples of tumor and normal were clearly separated into two discrete groups which indicated that the DEAS screened above were credible (Fig. 3b). Aberrant AS events may directly affect the expression of its parent RNA, especially when AT or AP events occurred. In order to investigate the relationship between DEAS and differently expressed genes (DEG), we analyzed the DEAS that occurred in DEG. Fig. 3c summarized the results and detailed information were provided in Supplementary Table S4. As we expected, 94.4% DEAS in its corresponding DEG is AT (44.4%) and AP (50.0%). Fig. 3d depicts part of identified DEAS in splice graph, which summarises the transcript variations into a directed acyclic graph, and represents exons as rectangular nodes and splice junctions as edges. Furthermore, for intuitively showing the difference of these AS events between primary CRC and corresponding adjacent normal tissues, we generate graphs in which scatter plot is overlaid with the boxplot(Fig. 3e). Considering all of these evidence, it suggested that, like CRC-related genes, CRC-related AS events play a vital role in CRC biological and need further research.
Enrichment and interaction analysis of DEAS
It was evident that AS could directly affect the protein function through several mechanisms. Thus, we can shed light on the potential influence of DEAS by analyzing its corresponding protein. Supplementary Fig. S1 shows that specific GO categories closely associated with CRC, such as identical protein binding (Fisher's Exact Test,FDR = 0.0046), cell morphogenesis (Fisher's Exact Test,FDR = 0.0058), GTPase regulator activity (Fisher's Exact Test,FDR = 0.0040), and protein kinase activity (Fisher's Exact Test,FDR<0.0001), were significantly enriched in these genes that occurred DEAS (Supplementary Fig. S2a-c). Additionally, some KEGG pathways well known to be involved in CRC metastasis and recurrence were enriched, including Pathways in cancer (Fisher's Exact Test,FDR = 0.0036), PI3K-Akt signaling pathway (Fisher's Exact Test,FDR = 0.0024), p53 signaling pathway (Fisher's Exact Test,FDR = 0.0143) and NF-kappa B signaling pathway (FDR = 0.0021) (Supplementary Fig. S2d, e). Taken together, above results indicated that the parent genes of DEAS play an important role in regulating the CRC-related biological process. While variation on the structure of these genes' transcript (alternative splicing) may inevitably affect its protein translation, and further modify protein feature. Thus, it is necessary to investigate these AS events from the protein network point of view. PPI network analysis based on the DEAS associated genes not only demonstrated the interactive relationship in normal condition, but also revealed the potential influence of AS events to the entire network (Fig. 4).
Fig. 4
Protein-protein interaction analysis of identified DEAS. Interactome of the 372 genes showing 249 nodes and 514 edges in the PPI network in CRC. Genes were denoted as nodes in the graph and the interactions between them were presented as edges. The shape, size and color of node respectively represent AS type, the value of logFC and change pattern. Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA).
DEAS evens correlation network of splicing factors
AS events are mainly regulated by splicing factors, which bind to pre-mRNAs and influence exon selection and splicing site choice and more importantly, dysregulated AS within tumor microenvironment may be orchestrated by a limited number of splicing factors [38]. Therefore, an important issue is that whether a significant fraction of these DEAS events is potentially regulated by a few key splicing factors in CRC. By hand-curated screenings of literature and databases, we firstly identified 71 splicing factors (Supplementary Table S5) which were all experimentally validated in the previous study. The expression of these splicing factors was obtained from RNA sequencing data of TCGA CRC cohort. Next, correlation analyses between expression levels of these 71 splicing factors and the PSI values of each DEAS event were performed in the CRC cohort with splicing regulatory network built among the significant correlations (|R| > 0.5; t-test, P < .05). In the splicing correlation network shown in Fig. 5A, 33 DEAS events, including 22 upregulated AS events (red dots) and 11 downregulated AS events (blue dots), were significantly correlated with the 37 splicing factors (gray dots). Interestingly, the majority of splicing factors (gray dots) were correlated with more than one AS events, among which some were playing opposite roles in the regulation of difference AS events(Fig. 5a). In addition, from the network, we can see that different splicing factors are in competition for the same binding sites (AS event), which at least partly, account for the reason why transcripts can yield several different splicing isoforms. Representative correlations between splicing factors and specific AS events were presented in the dot plots (Fig. 5b–g). For example, expression of TRA2A was negatively correlated with AT of SULT1A3 (Fig. 5C), whereas positively correlated with AP of FOXK1(Fig. 5E).
Fig. 5
Splicing correlation network in CRC. (a) DEAS evens correlation network of splicing factors. The correlation analyses between expression levels of these 71 splicing factors and the PSI values of each DEAS event were performed in the CRC cohort with splicing regulatory network built among the significant correlations. The shape, size and color of node respectively represent a type (AS event or splicing factor), the value of logFC and change pattern (upregulated or downregulated). The breadth of the line represents the extent of interaction strength. (b-g) Representative dot plots of correlations between expression of splicing factors and PSI values of AS events. Spearman's rank correlation analysis was used for non-normal distribution data (Fig. 3d and e). Person correlation was used for continuous variables that meet normal distribution (Fig. 3b, c, f and g). All the absolute value of r coefficient > 0.5 and all P < .0001.
The prognostic value of DEAS in CRC
Before studying the prognostic value of DEAS events, univariate survival tests were conducted to assess the relationship between clinicopathological features and outcome in the TCGA CRC cohort. As revealed in Supplementary Table S6, T stage (HR = 2.603, 95%CI: 1.96–3.44; Likelihood-ratio test, P < .001), N stage (HR = 1.903, 95%CI: 1.59–2.77; Likelihood-ratio test, P < .001) and TNM stage (HR = 3.041, 95%CI: 2.44–3.78; Likelihood-ratio test, P < .001) were significantly associated with OS. Meanwhile, T stage (HR = 1.992, 95% CI: 1.45–2.73; Likelihood-ratio test, P < .001) and TNM stage (HR = 2.311, 95% CI: 1.06–5.03; Likelihood-ratio test, P = .035) was significantly associated with DFS. More important, survival rates predicted by TNM stage depicted significant distinction between the Kaplan-Meier curves in both OS and DFS (Supplementary Fig. S3). The results of this preliminary assessment revealed that the survival data for the TCGA CRC cohort, although containing censored data, were informative and appropriate for use in further survival analysis.
Supplementary Fig. S3
Kaplan-Meier estimates of OS and DFS stratified by the Tumor, Node, Metastasis system and TNM stage in the TCGA CRC cohort. (a) Overall survival. (b) Disease-free survival. Local invasion stage(T1, T2, T3, T4), lymph node metastasis (N0, N1, N2), Distant metastasis (M0, M1); The differences between the two curves were determined by the two-sided log-rank test. Overall log-rank test, p-value <.0001.
The relationship between our identified DEAS and CRC prognosis was investigated in the TCGA cohort. For each DEAS events, CRC patients were divided into two groups based on the PSI value (median cut). Then univariate survival analyses for OS and DFS were conducted respectively (Supplementary Table S7). As results, a total of 37 DEAS events were found to be associated with OS, and 68 DEAS events were found to be associated with DFS. Among these prognosis related DEAS events that simultaneously associated with OS and DFS were shown in Fig. 6a. The DEAS events that significantly correlated with survival in the univariate analysis were further assessed by multivariate analysis to identify independent prognostic indicator in CRC (Supplementary Table S8). As shown in Fig. 6b, there were 4 and 11 DEAS events could be recognized as an independent prognostic indicator for OS and DFS, respectively. From the results of the above analysis, we unexpectedly found that AT in CXCL12 and RI in CSTF3 was an independent prognostic indicator for both OS and DFS in CRC cohort. Actually, as shown in Fig. 7a and b, stratifying patients according to the PSI value (median of AT in CXCL12 and RI in CSTF3) formed significant Kaplan-Meier curves in both OS and DFS survival analysis. Above results directly suggested that DEAS events are of not only important biological meaning but also potential clinical value.
Fig. 6
The prognostic value of DEAS in CRC. (a)The DEAS events that simultaneously associated with OS and DFS. Univariate analysis of DEAS on overall and disease-free survival, respectively. Unadjusted HRs (boxes) and 95% confidence intervals (horizontal lines) limited to AS events with P-value <.05. Box size is inversely proportional to the width of the confidence interval. (b) The DEAS events that present with independent prognostic value on survival. Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA). Cox regression (univariate and multivariate) was used for data analysis. Likelihood-ratio test was used to determine the P value. All P < .05.
Fig. 7
Kaplan-Meier curves for OS and DFS according to PSI value of AT in CXCL12 (a) and RI in CSTF3 (b). Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA). Kaplan-Meier analysis and log-rank test were applied to compare the survival outcomes. All P < .05.
AS clusters associated with prognosis and molecular subtypes
Our findings demonstrated that the expression of each DEAS varied considerably at the individual level and partly reflected the prognosis in patient with CRC. We wonder whether distinct patterns of AS could be discerned based on the 421 DEAS by performing consensus unsupervised analysis of all samples. The optimal number of clusters was determined by combining Elbow method and Gap statistic method, and the more balanced partition, as suggested, appeared to be for k = 4 (Fig. 8a). Consequently, four clusters of samples were determined as follow: C1 (n = 164, 26.1%), C2 (n = 161, 25.6%), C3 (n = 242, 38.5%), C4 (n = 57, 9.8%) (Fig. 8b). The consensus matrix heatmap revealed the identified four clusters with significant interconnectivity, among which C2, C3 and C4 appeared as well individualized clusters, whereas there was more classification overlap between C1 and C3 (Fig. 8b). The association of clusters with anatomoclinical and DNA alterations data are shown in Fig. 8c, which revealed that on the whole the distribution of different CMS, TNM stage, KRASm and survival status (OS and DFS) in CRC samples between clusters was not random. For example, tumors classified as C3 was more frequently KRAS mutant and enriched CMS4 and advanced TNM stage. As the previously defined consensus molecular subtypes of CRC, CMS4 has been proved to comprise the more mesenchymal-like cancers, with high stromal infiltration and poor patient prognosis [7,39]. Thus, Kaplan-Meier analysis was conducted to assess the relationship between cluster and prognosis. The results suggested that clusters were associated with distinct patterns of survival (Fig. 8d), among which C3 were both associated with poor outcome in OS and DFS analysis. In addition, no association between clusters and MSI was found, but CMS1 comprises most tumors with MSI (Fig. 8e). Collectively, these findings suggest that there is considerable variability in the nature of the AS across CRC —partly determined by molecular characteristics of a primary tumor—and that this influences clinical outcome.
Fig. 8
AS clusters associated with prognosis and molecular subtypes. (a) Elbow and Gap statistic analysis for different numbers of clusters (k = 2 to 8). (b) Consensus matrix heatmap defined four clusters of samples for which consensus values range from 0 (in white, samples never clustered together) to 1 (dark blue, samples always clustered together). (c) Heatmap of the 421 DEAS ordered by cluster, with annotations associated with each cluster, data analyzed using chi-squared test. (d) Kaplan-Meier survival analysis of patients within different clusters on both OS and DFS. Depicted P-values are from log-rank tests. (e) Spine plots of the relationship between molecular subtypes and MSI hypotype, AS cluster respectively. P-values are from Kruskal-Wallis tests.
Overview of AS events profiling in CRC. (a) Illustrations for seven types of AS events, including Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA). (b) A number of AS events and involved genes from the CRC patients were depicted according to the AS types. Color bar represents the preliminarily detected AS events and involved genes. Black bar represents the AS events and involved genes filtered by stringent criteria. (c) UpSet plot of interactions between the seven types of detected AS events (n = 34,334) in CRC. One gene may have up to four types of alternative splicing. (d) The detail of AS events and its parent genes in chromosome was shown in Circos Plots. The ribbons, in the circos plots, represent the potential interaction between the parent gene of DEAS, and the thickness of the ribbons indicating the extent of the interaction strength. The outer circle is composed of the polyline, each polyline represents an AS event and links to the location of the parent gene in chromosomes. Different color of the short line represent the number of AS events that the corresponding gene have: blue, 1–5 AS events; black, 5–10 AS events; red, greater than or equal to 10 AS events.Identification of CRC-related aberrant AS. (a) The difference of AS events between paired colorectal cancer and paracancerous tissue. Volcano Plot visualizing the DEAS identified in CRC. The red and blue points in the plot represent the differentially expressed alternative splicing with statistical significance (adj P value <.05, |logFC| ≥ 1). (b) Heat map of the DEAS. The horizontal axis shows the clustering information of samples which were divided into two major clusters, and the clusters were adjacent normal tissue(N = 50) and paired tumor tissue(N = 50), respectively; the left longitudinal axis showed the clustering information of DEAS. The gradual change of color from green to red represents the expression of DEAS altered from low to high. (c) Venn diagram demonstrated the intersection set of DEAS and DEG. (d) The splice graph of some representative DEAS. The thin exon sections represent untranslated regions (UTR) and the thick exon sections represent coding regions. Exons are drawn to scale and the connecting arcs represent splice paths. (e) The difference of PSI value of AS events between primary CRC and corresponding adjacent normal tissues. Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA). For Fig. 3a, t-test and BH correction was used for data analysis. For Fig. 3e, Student's t-test was used for data analysis. *: P < .0001 compared with adjacent normal tissue.Protein-protein interaction analysis of identified DEAS. Interactome of the 372 genes showing 249 nodes and 514 edges in the PPI network in CRC. Genes were denoted as nodes in the graph and the interactions between them were presented as edges. The shape, size and color of node respectively represent AS type, the value of logFC and change pattern. Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA).Splicing correlation network in CRC. (a) DEAS evens correlation network of splicing factors. The correlation analyses between expression levels of these 71 splicing factors and the PSI values of each DEAS event were performed in the CRC cohort with splicing regulatory network built among the significant correlations. The shape, size and color of node respectively represent a type (AS event or splicing factor), the value of logFC and change pattern (upregulated or downregulated). The breadth of the line represents the extent of interaction strength. (b-g) Representative dot plots of correlations between expression of splicing factors and PSI values of AS events. Spearman's rank correlation analysis was used for non-normal distribution data (Fig. 3d and e). Person correlation was used for continuous variables that meet normal distribution (Fig. 3b, c, f and g). All the absolute value of r coefficient > 0.5 and all P < .0001.The prognostic value of DEAS in CRC. (a)The DEAS events that simultaneously associated with OS and DFS. Univariate analysis of DEAS on overall and disease-free survival, respectively. Unadjusted HRs (boxes) and 95% confidence intervals (horizontal lines) limited to AS events with P-value <.05. Box size is inversely proportional to the width of the confidence interval. (b) The DEAS events that present with independent prognostic value on survival. Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA). Cox regression (univariate and multivariate) was used for data analysis. Likelihood-ratio test was used to determine the P value. All P < .05.Kaplan-Meier curves for OS and DFS according to PSI value of AT in CXCL12 (a) and RI in CSTF3 (b). Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA). Kaplan-Meier analysis and log-rank test were applied to compare the survival outcomes. All P < .05.The detailed information of the top 40 most different AS events.MeanT: the mean PSI value in colorectal cancer tissue; MeanN: the mean PSI value in paired normal tissue; logFC: log2 fold change. The Adj.P value was calculated by t-test and adjusted through BH correction.AS clusters associated with prognosis and molecular subtypes. (a) Elbow and Gap statistic analysis for different numbers of clusters (k = 2 to 8). (b) Consensus matrix heatmap defined four clusters of samples for which consensus values range from 0 (in white, samples never clustered together) to 1 (dark blue, samples always clustered together). (c) Heatmap of the 421 DEAS ordered by cluster, with annotations associated with each cluster, data analyzed using chi-squared test. (d) Kaplan-Meier survival analysis of patients within different clusters on both OS and DFS. Depicted P-values are from log-rank tests. (e) Spine plots of the relationship between molecular subtypes and MSI hypotype, AS cluster respectively. P-values are from Kruskal-Wallis tests.
Discussion
The majority of genes in the human genome consists of multiple exons interspersed with introns that undergo splicing to form mature mRNA and protein products [18]. Although AS provides cells with a method to diversify the proteome, increasing evidence suggests that AS plays critical roles in the initiation or maintenance of human disease, including CRC [37,40,41]. The cancerogenesis of CRC is a complex multistep process, involving the unbalanced expression of splicing variants or the failure to properly express the correct isoforms. The adenoma-carcinoma process classically illustrates the multistep etiology of CRC. As the frequently involved genes, APC, K-Ras, and TP53 are all express splice isoforms whose functional properties are distinct and even antagonistic. For example, several AS events of APC have been described that result in proteins with molecular weights ranging from 90 to 300 kDa [20]. Using nested 5′ and 3′ primer to examined 24 patients who with a germline mutation in the APC gene, De Rosa and colleagues identified nine novel splice isoforms [42]. One of the transcripts contained an additional exon termed exon 1A; its inclusion leads to a premature stop codon in exon 2. The inclusion of exon 1A was found to be 3.5-fold higher in CRC versus paired normal tissue. The potential significance is that greater inclusion of exon 1A could effectively result in less APC protein being expressed because less productive mRNAs are synthesized overall. Additionally, many previous studies also reported the biological effect of AS events in K-Ras, especially alternation splicing of the fourth coding exon [43]. Splicing variant contains exon 4A results in a proapoptotic protein, while the inclusion of exon 4B lead to produce an antiapoptotic K-Ras. Both the two K-Ras variants are coexpressed in many tissues, but their ratio is changed in sporadic CRC favoring the antiapoptotic 4B isoform [44]. Moreover, the TP53 gene was thought for a long time to encode a single protein, but it is now abundantly clear that it is extensively alternatively spliced [45]. A remarkably complex series of splice isoforms have been described, and several of these can regulate the transcriptional activity of p53 [46,47]. Besides the three hub genes (APC, K-Ras, and TP53), research of aberrant AS in CRC-related genes, such as CD44,KLF6,NFS2, VEGF and Bcl-x et al., have been frequently reported in recent years [48]. More importantly, increasing data from genome-wide studies suggest that >90% of human genes undergo AS. Taken together, these results confirmed that not only the expression of genes, but also the balance of its splice isoforms need to be better investigated. Because this may immensely expand the scope in the screening of potential biomarkers and therapeutic targets.With the advancement of high throughput technology in the last decades, great success has been made in the research of diversity of AS profiling in CRC. For example, Gardina et al. analyzed 20 paired tumor-normal colon cancer samples using microarray, and reported 189 putative AS events occurred in 168 genes. More recently, by using whole transcriptome analysis, Bisognin et al. revealed that 206 genes with probable AS events in CRC development and progression were identified, and that are involved in processes and pathways relevant to tumor biology, as cell-cell and cell-matrix interactions [22]. Similar to what we observed in the present study, 372 genes with 421 AS events were identified associated with CRC initiation or/and maintenance. Some specific GO categories and KEGG pathways, such as protein binding, cell morphogenesis and PI3K-Akt signaling pathway, were enriched in these genes that occurred DEAS.Despite a series of stringent filters (Percentage of Samples with PSI Value ≥75, Average of PSI value ≥0.05; |log2FC| >1 and adj.P.Value <0.05) were implemented in the process of screening, more AS events were identified in the present study (Supplementary Fig. S1). Additionally, when compared the results that obtained from microarray (Gardina's [37] and Bisognin's [22]), we found that there was little overlap between the gene that occurred DEAS in CRC (Supplementary Fig. S1). More importantly, through hand-curated screenings of literature, we retrieved 18 AS events that were experimentally validated. Among these AS events, we detected 12 (66.7%), and that this well above the result from Gardina et al. [37] (n = 8,44.4%) and Bisognin et al. [22] (n = 8,44.4%) (Supplementary Fig. S1). These findings suggested that RNA-seq may be a more suitable method to apply in AS study.Potential explanation for this results may be that RNA-Seq enabling the quantitative measure of known and novel AS isoforms, and provides better resolution, deeper coverage and higher accuracy. Recent comparisons of RNA-seq with qPCR for differential expression analysis have found good overall agreement [49]. However, measurement performance of RNA-seq depends in large part on analysis pipeline [19,49,50]. Primary results from the Sequencing Quality Control project revealed that appropriate choice of analysis pipeline could allow RNA-Seq with superior performance in many applications [49]. In this present study, a recent developed analysis pipeline (integration tool), known as SpliceSeq, which could facilitate accurate analysis of complex or low-frequency AS events, was applied to detect AS events. Collectively, although this present study is a computational work that solely based on RNA-seq data, we consider the results the most robust and reliable currently available for CRC.Actually, we preliminarily analysis detected a total of 82,411 AS events from 13,201 genes, which accounts for approximately 66% of human genes. More importantly, deep sequencing co-operates with reliable bioinformatics approach could provide more detail on the structural variation of the splice isoform. Based on these information, AS events can be roughly divided into seven types and clearly depicted by the splice graph. According to our results, the vast majority of detected AS events belong to ES (36.8%). However, ES events only account for 17.9% of DEAS. This results is similar to many previous studies [26,27,51] and revealed to some extent that alternative splicing is not caused by transcription errors. Aberrant expression of trans-acting factors is known to trigger differences in AS patterns [52]. Thus, integrating the expression of splicing factors and DEAS provided an approach to address the underlying mechanism of the splicing pathway involved in CRC. The splicing correlation networks in CRC showed distinguished interactions between splicing factors and DEAS events. It is worth noting that the majority of splicing factors (gray dots) were correlated with more than one DEAS events, among which some were playing opposite roles in the regulation of difference AS events (Fig. 5a). This interesting phenomenon brought the hypothesis that the regulation effect of some splicing factors, such as TRA2A in CRC, relies on the joint action of splicing factors itself and its cis-acting regulatory elements.Due to the potential significance of AS in cancer biology, its clinical relevance in malignancies has aroused increasing attention. The diagnostic implication of AS has been systematically demonstrated in cancers of ovarian and lung [26,27]. In addition, the prognostic value of CD44v6, one of the splice isoform of CD44, in CRC has also been revealed [53]. Moreover, it has been clarified recently that cancer-specific splice variants may potentially be used as diagnostic, prognostic, and predictive biomarkers as well as therapeutic targets. However, previous studies have mainly focused on single genes, survival associated AS in CRC remains largely unstudied.To the best of our knowledge, the current study is the first to perform a systematic identification and analysis of survival associated AS events in primary CRC tissues. As results, a total of 37 DEAS events were found to be associated with OS, and 68 DEAS events were found to be associated with DFS (P < .05, Supplementary Table S6). The genes with survival associated AS events, including CD44, CXCL12, EGFL7, IL4R and TCF7, which itself play critical roles in cancer biology [[54], [55], [56], [57]]. More importantly, this genome-wide approach provides numerous prognostic AS events which might have an underlying mechanism in CRC initiation or maintenance. For example, a survival (OS and DFS) associated AS event involving exon 5 of CXCL12 (CXCL12-AT, Fig. 3d) could directly change the nucleic acid sequences and the activity of its protein. This protein functions as the ligand for the G-protein coupled receptor and plays a critical role in determining the metastatic destination of CRC [58]. Despite six different isoforms of CXCL12 have been reported, most studies focus only on the CXCL12-ES (exon: 2.2:3.1:5.1; Supplementary Table S3) or do not distinguish among isoforms [59]. Interestingly, according to our results, only CXCL12-AT, instead of the most common AS events (CXCL12-ES), was prognostic for OS and showed the same trend for DFS. This finding suggested that the expression changes of low abundance AS events may also have important biological meaning in CRC, and need more attention. Another AS events that were identified as an independent prognostic for both OS and DFS is CSTF3-RI. The protein encoded by CSTF3 is one of three (including CSTF1 and CSTF2) cleavage stimulation factors that combine to form the cleavage stimulation factor complex (CSTF) [60]. The retained intron (CSTF3-RI) could directly change the nucleic acid sequences of 3′-untranslated region. This could influence the function of CSFT protein while result in profound effect on the downstream pathway through the mechanism of competitive endogenous [61,62]. As far as we know, the exact role of RI in CSTF3 and its potential biological value has not been reported so far. Our results suggested that deciphering the underlying mechanisms of CSTF3-RI may provide valuable clues for seeking the possible therapeutic targets of CRC.Another interesting finding of the present study was that the distribution of different CMS, TNM stage, KRASm and survival status (OS and DFS) between AS clusters was not random. CRC sample classified as C3 was more frequently KRAS mutant and enriched CMS4 and advanced TNM stage. Moreover, C3 were both associated with poor outcome in OS and DFS analysis. In the consensus molecular subtypes, CMS1-immune comprises most tumors with microsatellite instability (MSI) and is characterized by infiltration of activated immune cells [35]. CMS4 comprises the more mesenchymal-like cancers, with high stromal infiltration and poor patient prognosis [35]. These findings suggested that there is considerable variability in the nature of the DEAS across CRC —partly determined by molecular characteristics of the primary tumor—and that this influences clinical outcome. In addition, as shown in Fig. 7c and e, CMS1 comprises most tumors with MSI, which is in accordance with previously published conclusion [7] and partly validate the reliability of our results.Finally, despite the rigorous quality controls result in the exclusion of the vast majority of AS events, this allowed us to be confident that the screened splice events are ubiquitous in CRC patient. Based on that, 421 AS events were identified with significant difference between CRC and paired normal tissue. Further enrichment analysis confirmed that the parent genes of these AS events have great potential to play a vital role in CRC. In addition, interesting splicing correlation network provides novel insights into how these CRC-related AS events were potentially regulated by key splicing factors. More importantly, the survival associated AS events, which may be most valuable in deciphering the underlying mechanisms of AS in the oncogenesis of CRC, provide clues of therapeutic targets to further validations.The following are the supplementary data related to this article.UpSet and Venn diagram displaying the overlap of AS events. The overlap of the AS events that respectively obtained from the studies of Gardina, Bisognin et al., hand-curated screenings of literature (experimentally validated) and our own results were displayed by the UpSet and Venn diagram.Enrichment analysis of the parent gene of DEAS. (A-D) GO and KEGG pathway analyses of parent genes of identified DEAS. The vertical axis represents Go or KEGG pathway annotations. The horizontal axis represents the number of genes assigned to the corresponding annotation. (a) Biological process; (b) cellular components; (c) molecular function; and (d) KEGG pathway. (e) Enrichment plots for significant pathways.Kaplan-Meier estimates of OS and DFS stratified by the Tumor, Node, Metastasis system and TNM stage in the TCGA CRC cohort. (a) Overall survival. (b) Disease-free survival. Local invasion stage(T1, T2, T3, T4), lymph node metastasis (N0, N1, N2), Distant metastasis (M0, M1); The differences between the two curves were determined by the two-sided log-rank test. Overall log-rank test, p-value <.0001.
Supplementary Table S1
Clinical features for the CRC patients in the TCGA cohort.
Supplementary Table S2
The detailed information of all the detected AS events.
Supplementary Table S3
The detailed information of the detected differently expressed AS events.
Supplementary Table S4
The relationship between DEAS and DEG.
Supplementary Table S5
The detail of the 71 experimentally validated splicing factors.
Supplementary Table S6
Univariate analysis of clinicopathological features for OS and DFS.
Supplementary Table S7
Univariate survival analysis of DEAS for OS and DFS.
Supplementary Table S8
Multivariate analysis of CRC prognosis-related DEAS events.
Authors: Li Peng; Yuwei Liu; Jing Chen; Mengxin Cheng; Ying Wu; Min Chen; Ya Zhong; Dan Shen; Ling Chen; Xujun Ye Journal: BMC Med Genomics Date: 2022-07-02 Impact factor: 3.622