BACKGROUND: Colorectal cancer (CRC) has been divided into 4 consensus molecular subtypes (CMSs), of which CMS4 has the mesenchymal identity and the highest relapse rate. Our goal is to develop a prognostic signature by integrating the immune system and mesenchymal modalities involved in CMS4. METHODS: The gene expression profiles collected from 5 public datasets were applied to this study, including 1280 samples totally. Network analysis was applied to integrate the mesenchymal modalities and immune signature to establish an immune-based prognostic signature for CRC (IPSCRC). RESULTS: We identified 6 immune genes as key factors of CMS4 and established the IPSCRC. The IPSCRC could significantly divide patients into high- and low- risk groups in terms of relapse-free survival (RFS) and overall survival (OS) and in discovery (RFS: P < .0001) and 4 independent validation sets (RFS range: P = .01 to <.0001; OS range: P = .02-.0004). After stage stratification, the IPSCRC could still distinguish poor prognosis patients in discovery (RFS: P = .04) and validation cohorts (RFS range: P = .04-.007) within stage II in terms of RFS. Further, in multivariate analysis, the IPSCRC remained an independent predictor of prognosis. Moreover, Macrophage M2 was significantly enriched in the high-risk group, while plasma cells enriched in the low-risk group. CONCLUSION: We propose an immune-based signature identified by network analysis, which is a promising prognostic biomarker and help for the selection of CRC patients who might benefit from more rigorous therapies. Further prospective studies are warranted to test and validate its efficiency for clinical application.
BACKGROUND: Colorectal cancer (CRC) has been divided into 4 consensus molecular subtypes (CMSs), of which CMS4 has the mesenchymal identity and the highest relapse rate. Our goal is to develop a prognostic signature by integrating the immune system and mesenchymal modalities involved in CMS4. METHODS: The gene expression profiles collected from 5 public datasets were applied to this study, including 1280 samples totally. Network analysis was applied to integrate the mesenchymal modalities and immune signature to establish an immune-based prognostic signature for CRC (IPSCRC). RESULTS: We identified 6 immune genes as key factors of CMS4 and established the IPSCRC. The IPSCRC could significantly divide patients into high- and low- risk groups in terms of relapse-free survival (RFS) and overall survival (OS) and in discovery (RFS: P < .0001) and 4 independent validation sets (RFS range: P = .01 to <.0001; OS range: P = .02-.0004). After stage stratification, the IPSCRC could still distinguish poor prognosis patients in discovery (RFS: P = .04) and validation cohorts (RFS range: P = .04-.007) within stage II in terms of RFS. Further, in multivariate analysis, the IPSCRC remained an independent predictor of prognosis. Moreover, Macrophage M2 was significantly enriched in the high-risk group, while plasma cells enriched in the low-risk group. CONCLUSION: We propose an immune-based signature identified by network analysis, which is a promising prognostic biomarker and help for the selection of CRC patients who might benefit from more rigorous therapies. Further prospective studies are warranted to test and validate its efficiency for clinical application.
Colorectal cancer (CRC) is one of the most common types of cancer worldwide, has a high incidence in developed countries, and is the fourth leading cause of cancer-related death, with a mortality rate of about 608,000 per year.[ By 2030, there will be 2.2 million new cases of CRC worldwide, resulting in 1.1 million deaths.[ Despite new detection methods and improved treatment strategies for CRC, the 5-year survival rate is about 55%.[ Surgical is still the priority treatment. Some patients may have local recurrence or distant metastasis after surgery. At the same time, patients with similar clinical evaluations showed different clinical progression.[ The genetic heterogeneity of patients greatly affects intrinsic clinical diversity.[In the past, researchers have identified many prognostic signature genes, as well as constructed multigene-based prognostic signatures that can divide CRC into different risk groups.[ However, due to the genetic heterogeneity of CRC, the prediction effect of most markers is not as expected and cannot be widely used in clinical practice. Recently, CRC has been classified into 4 consensus molecular subtypes (CMSs) with different molecular characteristics and clinical features, of which CMS4 has mesenchymal modalities.[ Patients within CMS4 had the highest recurrence rate and the worst relapse-free survival (RFS). Therefore, the intrinsic characteristics of the more malignant CMS4 could potentially be used to assess the risk of recurrence in CRC patients, further guiding more accurate and targeted treatment.There is accelerating evidence that tumor microenvironment (TME) plays an important role in the development, progression, and metastasis of cancer.[ Immunotherapy for immune checkpoints is widely studied.[ At the same time, previous studies have revealed that the immune system can be used to assess the prognosis of CRC.[ Therefore, based on immune genes, it is possible to develop markers for the evaluation of CRC prognosis.In this study, we applied network analysis to integrate mesenchymal modalities and immune signature genes from the ImmPort database underlying CMS4. The master regulator analysis showed that 6 immune genes were the key factors of CMS4. We pooled and analyzed 5 public cohorts containing 1280 CRC patients to develop and validate an immune gene-based prognostic signature for CRC (IPSCRC) using these 6 immune genes. Although immune prognostic markers for CRC have been reported,[ no research has been done for risk stratification by integrating the characteristics of the mesenchymal subtype. The robustness and reliability of our model were proved by sufficient verification in 4 independent data sets. In addition, we conducted a comprehensive analysis to investigate the intrinsic biological and clinical relevance of IPSCRC. Our signature combines the mesenchymal modalities involved in CMS4 and would be used to screen CRC patients who may benefit from more rigorous treatment.
Materials and methods
Ethical approval
The researchers were authorized to conduct the study by the Ethics Committee of the Beilun People's Hospital, Ningbo, China. All procedures were implemented in accordance with the Declaration of Helsinki and relevant policies in China.
Patient series
The gene expression profiles (GEPs) from 5 independent datasets were comprehensively analyzed, containing 1280 cases. The complete lists of all GEPs are shown in the Supplemental Table 1. These datasets involved patients from the GSE14333 (n = 290),[ the GSE17538 (n = 232),[ the GSE39582 (n = 564),[ the GSE33113 (n = 90),[ and the GSE37892 (n = 104).[ The expression data of all cohorts together with the corresponding clinical parameters were downloaded from Gene Expression Omnibus (GEO). The molecular subtyping information for all cohorts was retrieved from Guinney study.[ The detailed clinical characteristics of the 5 datasets were described in Table 1. The design and workflow of this study were illustrated in Fig. 1A.
Table 1
Patients’ characteristics in public data sets.
Figure 1
Network inference identifies 6 immune signature genes as key regulators of the mesenchymal subtype in CRC. Study design (A). The integrated network displays the relationships between the 6 immune signature genes and the regulated EMT signature genes (B). CRC = colorectal cancer; EMT = epithelial-mesenchymal transition.
Patients’ characteristics in public data sets.Network inference identifies 6 immune signature genes as key regulators of the mesenchymal subtype in CRC. Study design (A). The integrated network displays the relationships between the 6 immune signature genes and the regulated EMT signature genes (B). CRC = colorectal cancer; EMT = epithelial-mesenchymal transition.
Expression data preprocessing
GEPs were downloaded from GEO by “GEOquery” (R package, version 1.0.7)[ and normalized with the robust multiarray analysis (RMA). For each cohort, the GEPs were collapsed from probe IDs to genes symbols, if multiple probe IDs correspond to the same genes symbol, the one with the highest mean value was kept as the representative of the corresponding gene.[
Integrated network analysis
Immune genes (IRGs) were downloaded from the ImmPort database.[ IRGs measured by all cohorts were kept. Network analysis was applied to integrate mesenchymal modalities and immune genes underlying CMS4. Together, we used the GSE14333 dataset as the training cohort. Fourty five immune genes (log2FC > 0.75, P < .05) and 1319 target genes (log2FC > 0.5, P < .05) differentially expressed by comparing CMS4 with the other 3 subtypes (CMS1, CMS2, and CMS3). Integrated network analysis was performed by the “RTN” (R package, version 2.10.0).[ Master regulator analysis (MRA) was done to examine significantly overrepresented epithelial-mesenchymal transition (EMT) genes[ within each immune gene's regulon. Six immune genes of top significance were kept as the key factors of CMS4.
Development of the immune-based prognostic signature for CRC (IPSCRC)
Six immune genes are differentially up-expressed in the poorest survival subtype and are the master regulatory factors of the mesenchymal subtype-specific genes (including EMT genes). The Cox proportional-hazards model was applied to test their association with relapse-free survival. Based on these 6 immune genes, we develop a Cox-model named the immune-based prognostic signature for CRC (IPSCRC) as follows: risk score = (0.2304 × PROK1) + (0.2989 × THBS1) + (0.3787 × FGF11) + (0.2587 × CRP) + (–0.1192 × S100A14) + (–0.3350 × CCL19).
Validation of the IPSCRC
The IPSCRC score was further evaluated in the 4 independent validation cohorts in terms of RFS and OS by the log-rank test, respectively. The IPSCRC then was evaluated with other clinical parameters in the uni- and multivariate Cox analysis. In the multivariable Cox regression, sex, tumor stages, tumor locations, microsatellite instability (MSI) status, and B-Raf proto-oncogene serine/threonine kinase (BRAF) mutation status were included as covariates.
Profiling of immune cells infiltration
To analyze the immunobiological characteristics of high- and low-risk groups, we used CIBERSORT,[ to characterize immune cells’ abundance of tumor tissue GEPs. Based on a set of reference immune cell GEPs, CIBERSORT used support vector regression[ to deconvolute tumor tissue gene expression profile with each type of immune cell enrichment. More specifically, standardized gene expression profiles were submitted to the CIBERSORT Web portal (http://cibersort.stanford.edu/) with 1000 permutations. For each sample, CIBERSORT quantified the relative proportions of 22 infiltrated immune cell types.
Gene set enrichment analysis (GSEA)
GSEA[ was conducted using “fgsea” (Bioconductor package, version 1.12.0) with 1000 permutations. Gene sets were retrieved from the Molecular Signature Database (MSigDB hallmark and kegg, version 7).[ A P-value below .05 was used to choose significant gene sets.
Statistical analysis
Continuous variables were compared using the Wilcoxon signed-rank test or Kruskal-Wallis rank-sum test. Kaplan-Meier analysis was performed using the log-rank test using “survival” (R package, version 2.41.3). Uni- and multivariable analyses were conducted by the Cox proportional hazards model. For all tests, a P-value below .05 was used to choose significant gene sets. Statistical significance is presented as following ∗P < .05, ∗∗P < .01, ∗∗∗P < .001. All the statistical tests were conducted using R (version 3.6.1).
Results
Integrative analysis reveals 6 immune genes as master regulators for CMS4 of CRC
CRC is a molecularly heterogeneous disease. In recent studies,[ 4 consensus molecular subtypes (CMSs) have been identified, among which CMS4 has the highest recurrence rate and the worst relapse-free survival (RFS) (Supplemental Fig. 1). To investigate the immune system role underlying CMS4, a total of 1280 patients with CRC from 5 independent public cohorts were included (Table 1). We applied network analysis to integrate mesenchymal modalities and immune genes in the GSE14333 cohort (Fig. 1A). The networks consist of immune genes that are significantly up expressed in CMS4 compared with the other 3 subtypes and were found to regulate most of the mesenchymal specific target genes (Fig. 1B). Master regulator analysis (MRA) identified 6 immune genes (PROK1, THBS1, FGF11, CRP, S100A14, CCL19) as the key factors of CMS4 (Supplemental Table 2). These 6 immune genes are significantly up-expressed in CMS4 in the training cohort and validation dataset containing molecular subtyping information (Supplemental Fig. 2). Results from the univariable Cox proportional analysis demonstrated strong prognostic values of the 6 immune genes for RFS (Supplemental Fig. 3). Therefore, these 6 immune genes are key factors of the mesenchymal modalities and can be potentially applied for risk assessment of CRC patients.Using the GSE14333 cohort as the training set, we defined the IPSCRC using Lasso Cox proportional hazards regression of these 6 immune genes: risk score = (0.2304 × PROK1) + (0.2989 × THBS1) + (0.3787 × FGF11) + (0.2587 × CRP) + (–0.1192 × S100A14) + (–0.3350 × CCL19). Risk scores were calculated in all the training and validation cohorts (Supplemental Table 3). For detecting CMS4, the IPSCRC achieved high AUC values for all validation cohorts (Fig. 2). The upper-quartile risk value was set as the cut-off to separate patients into high- and low-risk groups across all datasets. In the training set, the high-risk group displayed a worse RFS (hazard ratio [HR]: 3.65, 95% confidence interval [CI]: 2.09–6.36; P = 1.07 × 10−6) (Fig. 3A and Supplemental Table 3). When considering patients with stage II CRC only, the IPSCRC remained prognostic in terms of RFS for the training set (HR: 2.92, 95% CI: 1.01–8.43; P = 3.74 × 10−2) (Fig. 4A and Supplemental Table 3).
Figure 2
ROC curves of the IPSCRC. ROC curves for the detection efficiency of CMS4 in GSE39582 (A), GSE17538 (B), GSE33113 (C), and GSE37892 (D). ROC = receiver operating characteristic curve, IPSCRC = immune-based prognostic signature for CRC.
Figure 3
Kaplan-Meier plots showing differences in relapse-free survival among different risk groups within GSE14333 (A), GSE17538 (B), GSE39582 (C), GSE33113 (D), GSE37892 (E), and (F) the combination of all cohorts. P-values are based on log-rank tests.
Figure 4
Kaplan-Meier plots showing differences in relapse-free survival among different risk groups for stage II CRC within GSE14333 (A), GSE17538 (B), GSE39582 (C), GSE33113 (D), and (E) the combination of these 4 cohorts. P-values are based on log-rank tests. CRC = colorectal cancer.
ROC curves of the IPSCRC. ROC curves for the detection efficiency of CMS4 in GSE39582 (A), GSE17538 (B), GSE33113 (C), and GSE37892 (D). ROC = receiver operating characteristic curve, IPSCRC = immune-based prognostic signature for CRC.Kaplan-Meier plots showing differences in relapse-free survival among different risk groups within GSE14333 (A), GSE17538 (B), GSE39582 (C), GSE33113 (D), GSE37892 (E), and (F) the combination of all cohorts. P-values are based on log-rank tests.Kaplan-Meier plots showing differences in relapse-free survival among different risk groups for stage II CRC within GSE14333 (A), GSE17538 (B), GSE39582 (C), GSE33113 (D), and (E) the combination of these 4 cohorts. P-values are based on log-rank tests. CRC = colorectal cancer.To verify the prognostic power of the IPSCRC, we calculated the survival difference within the 2 risk groups in 4 validation cohorts. As expected, the IPSCRC significantly stratified patients into high- and low-risk groups in terms of RFS (HR range: 2.16 [95% CI: 1.59–2.93; P = 4.20 × 10−7] to 3.54 [95% CI: 2.08–6.00; P = 5.68 × 10−7]) (Fig. 3B–E and Supplemental Table 3) and OS (HR range: 1.64 [95% CI: 1.07–2.51; P = 2.19 × 10−2] to 1.71 [95% CI: 1.26–2.32; P = 4.49 × 10−4]) (Fig. 5A–B and Supplemental Table 3) in the 4 validation cohorts. When considering patients with stage II CRC only, the IPSCRC remained prognostic in terms of RFS for validation cohorts (HR range: 2.08 [95% CI: 1.20–3.59; P = 7.28 × 10−3] to 3.32 [95% CI: 1.00–11.0; P = 3.78 × 10−2]) (Fig. 4B–D and Supplemental Table 3). In the meta-analysis for all datasets, the prognostic effects of the IPSCRC are more obvious in terms of RFS (HR: 2.58, 95% CI: 2.08–3.21; P < 1.00 × 10−22) (Fig. 3F), stage II RFS (HR: 2.50, 95% CI: 1.70–3.68; P = 1.38 × 10−6) (Fig. 4E), and OS (HR: 1.71, 95% CI: 1.34–2.19; P = 1.56 × 10−5) (Fig. 5C). Furthermore, it remains an independent predictor of prognosis in the uni- and multivariate Cox model, after adjusting for sex, tumor stages, tumor locations, MSI status, and BRAF mutation status (Table 2).
Figure 5
Kaplan-Meier plots showing differences in overall survival among different risk groups within GSE17538 (A), GSE39582 (B), and (C) the combination of these 2 cohorts. P-values are based on log-rank tests.
Table 2
Univariate and multivariate analysis of immune signature and clinicopathological factors.
Kaplan-Meier plots showing differences in overall survival among different risk groups within GSE17538 (A), GSE39582 (B), and (C) the combination of these 2 cohorts. P-values are based on log-rank tests.Univariate and multivariate analysis of immune signature and clinicopathological factors.
In silico functional assessment of the IPSCRC
To gain insight into the biological differences between risk groups, we performed immune cell infiltration and GSEA analyses. We observed a significantly higher proportion of Macrophage M2 in the high-risk group and a significantly higher enrichment of plasma cells in the low-risk group (Fig. 6A–B). Furthermore, these 2 risk groups’ specific immune cell infiltration was also validated in validation cohorts (Supplemental Fig. 4). Enrichment analysis between high- and low- risk groups identified that many mesenchymal phenotype-related pathways, including the TGF-beta signaling, epithelial-mesenchymal transition, and focal adhesion, were positively enriched in the high-risk group (Supplemental Fig. 5 and Supplemental Table 4). When compared with a clinically applicable and commercialized biomarker, the IPSCRC achieved a higher C-index compared with Oncotype Dx Colon Cancer[ in training and validation cohorts (Fig. 7).
Figure 6
Immune infiltration status between the 2 immune risk groups. (A) Immune infiltration status for different immune risk groups. (B) The proportion level of plasma and Macrophage M2 for different immune risk groups. For every immune cell subset, the Wilcoxon test P-value comparing the high- versus low- immune risk groups are shown. ∗∗∗P < .001.
Figure 7
C-index for IPSCRC. C-index comparison between IPSCRC and Oncotype DX colon cancer (A). C-index values in different cohorts (B). IPSCRC = immune-based prognostic signature for CRC.
Immune infiltration status between the 2 immune risk groups. (A) Immune infiltration status for different immune risk groups. (B) The proportion level of plasma and Macrophage M2 for different immune risk groups. For every immune cell subset, the Wilcoxon test P-value comparing the high- versus low- immune risk groups are shown. ∗∗∗P < .001.C-index for IPSCRC. C-index comparison between IPSCRC and Oncotype DX colon cancer (A). C-index values in different cohorts (B). IPSCRC = immune-based prognostic signature for CRC.
Discussion
Colorectal cancer (CRC) is one of the most common types of cancer worldwide and the fourth leading cause of cancer-related death.[ In the past, researchers have constructed numerous multigene-based prognostic signatures that can divide CRC into different risk groups.[ However, due to the genetic heterogeneity of CRC, the prediction effect of most markers is not as expected. Therefore, a new signature that can accurately recognize patients with poor CRC prognosis is urgently needed to give more rigorous treatments.CRC has been classified into 4 consensus molecular subtypes (CMSs) with different molecular characteristics and clinical features, of which CMS4 has mesenchymal modalities.[ Prognostic signature screened based on molecular portraits specific to the worst prognosis subtype may be used for risk stratification of CRC patients.[ Recent studies have shown that the tumor microenvironment plays an important role in the occurrence and development of tumors.[ Previous studies have revealed that the immune system can be used to assess the prognosis of CRC.[ In this study, we established an immune gene-based prognostic signature for CRC (IPSCRC) by integrating mesenchymal modalities and the immune system underlying CMS4 and validated it in 4 independent validation cohorts. The large sample size provided sufficient validation for the IPSCRC and makes it more robust. To our knowledge, no research has been done for risk stratification by integrating the immune system and the characteristics of CMS4 in CRC.The IPSCRC was constructed by 6 immune genes as the key factors of CMS4 and could stratify patients into different risk groups. Within these 6 immune genes, positive expression PROK1 was significantly enriched CRC patients with high recurrence rate, lymphatic invasion, and lymph node metastasis.[ THBS1 over-expression was significantly associated with regional lymph node involvement and poor survival in esophageal squamous cell carcinoma (ESCC) patients.[ High expression of CRP is associated with poorer disease-specific survival in CRC patients.[ S100A14 promotes hepatocellular carcinoma invasion and migration.[ Elevated expression of CCL19 correlates with progression in cervical cancer.[ The defined high-risk group showed a worse RFS and OS than the low-risk group. The IPSCRC remained an independent prognostic predictor in multivariate Cox proportional hazards analysis after adjusting for other clinical factors. Previous studies have described evaluated plasma cells counts in CRC,[ while Macrophage M2 indicates a poor prognosis in ovarian cancer.[ We observed a significantly higher proportion of Macrophage M2 in the high-risk group and a significantly higher enrichment of plasma cells in the low-risk group. Moreover, some mesenchymal phenotype-related pathways, such as EMT, TGF-beta, and focal adhesion, were positively enriched in the high-risk group. Our findings inferred the important role of IPSCRC in tumor invasion and therefore, could sever as a robust prognostic signature in CRC.This study still has some limitations. First of all, the prognostic signature was screened from gene expression profiles generated from microarray platforms, which are expensive, difficult to operate and involves professional bioinformatics expertise, so it is difficult to be popularized in daily clinical application. Second, the training and validation data sets were all from retrospective studies in the study, including fresh frozen samples. Therefore, the efficiency and stability of FFPE samples are still in doubt. In the following improvement process, more datasets containing more clinical characteristics need to be included for more extensive screening and validation.Taken together, our network analysis established an immune gene-based signature, which could effectively predict CRC patients’ survival. Our study is the first attempt to integrate tumor heterogeneity and the immune system to develop the prognostic signature for CRC.
Author contributions
Conceptualization: Peng Shu, Bin Zhang.Data curation: Bin Zhang, Liangbin Wang, Wenliang Jiang.Formal analysis: Peng Shu, Bin Zhang, Liangbin Wang, Bin Shao.Funding acquisition: Bin Zhang.Investigation: Peng Shu, Bin Zhang.Methodology: Bin Zhang, Liangbin Wang, Zhixian Liu, Wenliang Jiang.Resources: Zhixian Liu, Bin Shao.Software: Bin Shao.Supervision: Peng Shu, Bin Zhang.Validation: Bin Zhang.Writing – original draft: Peng Shu, Bin Zhang.Writing – review & editing: Peng Shu, Bin Zhang, Liangbin Wang, Zhixian Liu, Bin Shao, Wenliang Jiang.
Authors: Christy Ralph; Eyad Elkord; Deborah J Burt; Jackie F O'Dwyer; Eric B Austin; Peter L Stern; Robert E Hawkins; Fiona C Thistlethwaite Journal: Clin Cancer Res Date: 2010-02-23 Impact factor: 12.531
Authors: Sanchita Bhattacharya; Sandra Andorf; Linda Gomes; Patrick Dunn; Henry Schaefer; Joan Pontius; Patty Berger; Vince Desborough; Tom Smith; John Campbell; Elizabeth Thomson; Ruth Monteiro; Patricia Guimaraes; Bryan Walters; Jeff Wiser; Atul J Butte Journal: Immunol Res Date: 2014-05 Impact factor: 2.829
Authors: Andrew J Gentles; Aaron M Newman; Chih Long Liu; Scott V Bratman; Weiguo Feng; Dongkyoon Kim; Viswam S Nair; Yue Xu; Amanda Khuong; Chuong D Hoang; Maximilian Diehn; Robert B West; Sylvia K Plevritis; Ash A Alizadeh Journal: Nat Med Date: 2015-07-20 Impact factor: 53.440
Authors: Greg Yothers; Michael J O'Connell; Mark Lee; Margarita Lopatin; Kim M Clark-Langone; Carl Millward; Soonmyung Paik; Saima Sharif; Steven Shak; Norman Wolmark Journal: J Clin Oncol Date: 2013-11-12 Impact factor: 44.544
Authors: E Fessler; M Jansen; F De Sousa E Melo; L Zhao; P R Prasetyanti; H Rodermond; R Kandimalla; J F Linnekamp; M Franitza; S R van Hooff; J H de Jong; S C Oppeneer; C J M van Noesel; E Dekker; G Stassi; X Wang; J P Medema; L Vermeulen Journal: Oncogene Date: 2016-05-09 Impact factor: 9.867
Authors: Michael N C Fletcher; Mauro A A Castro; Xin Wang; Ines de Santiago; Martin O'Reilly; Suet-Feung Chin; Oscar M Rueda; Carlos Caldas; Bruce A J Ponder; Florian Markowetz; Kerstin B Meyer Journal: Nat Commun Date: 2013 Impact factor: 14.919