Regulatory activity based risk model identifies survival of stage II and III colorectal carcinoma.

Gang Liu1, Chuanpeng Dong1, Xing Wang1, Guojun Hou2, Yu Zheng1, Huilin Xu1, Xiaohui Zhan3, Lei Liu1.   

Abstract

Clinical and pathological indicators are inadequate for the prognosis of stage II and III colorectal carcinoma (CRC). In this study, we combined the activity of regulatory factors with univariate Cox regression and random forest variable selection, and developed a multivariate Cox model to predict the overall survival of stage II/III CRC patients in the GSE39582 dataset (469 samples). Patients in the low-risk group showed significantly longer overall survival and recurrence-free survival than those in the high-risk group. This finding was further validated in five other independent datasets (GSE14333, GSE17536, GSE17537, GSE33113, and GSE37892). In addition, associations between clinicopathological information and the risk score were analyzed, and a nomogram including the risk score was plotted to facilitate its use. The risk score model was also effective in predicting both overall and recurrence-free survival of patients who received chemotherapy. Gene Set Enrichment Analysis (GSEA) between the high- and low-risk groups identified several cell-cell interaction KEGG pathways. Funnel plots showed no publication bias in these datasets. In summary, by utilizing regulatory activity in stage II and III colorectal carcinoma, the risk score successfully predicts the survival of 1021 stage II/III CRC patients in six independent datasets.

Keywords:  colorectal cancer; model; prognosis; transcription factor

Year:  2017        PMID: 29228695      PMCID: PMC5716735          DOI: 10.18632/oncotarget.21312

Source DB:  PubMed          Journal:  Oncotarget        ISSN: 1949-2553


INTRODUCTION

Colorectal carcinoma (CRC) is one of the most important causes of death worldwide [1]. According to recent reports, 376,300 new cases and 191,000 deaths due to CRC occurred in China in 2015 [2]. Currently, the prognosis of stage II and III colorectal carcinoma remains controversial [3]. Although the staging system is mature, some stage II colorectal adenocarcinoma patients have a poorer prognosis than stage III CRC patients. This indicates that clinical observations, including stage, cannot reliably distinguish good from poor prognosis in stage II/III CRC. In recent years, numerous molecular biomarkers have been reported to predict the survival of stage II and III colorectal carcinoma patients [4-8]. However, the prognostic value of a single biomarker is usually poor across datasets. To improve prognostic performance, multiple-gene models for predicting survival of carcinomas have been developed [9-12]. The expression of genes, especially cancer-related genes, is regulated by critical signaling pathways and transcription factors [13-15]. The transcription factor activity of core signaling pathways reflects cell status and cancer heterogeneity. In this article, we evaluated the activities of regulatory factors and then developed a multivariate Cox model to predict the survival of stage II and III colorectal carcinoma patients from the GSE39582 dataset. The risk score is significantly associated with overall and recurrence-free survival. The performance of the risk score model in predicting survival of stage II and III colorectal adenocarcinoma was further validated in five independent datasets. Association analysis showed that the risk score was independent of clinical information including age, stage and gender. A nomogram was plotted to facilitate the use of the risk score.
In conclusion, the transcription regulation activity based risk score successfully predicts the survival of stage II and III colorectal carcinoma.

RESULTS

Candidate gene selection and model development

Detailed information on the datasets used in this study is listed in Table 1. Regulators, including transcription factors and core pathway genes, are important for cancer development. However, the activity of these regulators cannot be assessed from their mRNA levels alone, because some regulators take effect through protein modifications. The regulatory activity of each regulator was therefore calculated from the expression levels of its downstream target genes. The survival significance of candidate regulators (based on their regulatory activity) was evaluated using univariate Cox regression (p<0.05). Forty-four regulator activities were found to be correlated with survival, and random forest was then applied for variable hunting. In total, the activities of ten regulators (EPAS1, TP73, TEAD1, DBP, NME2, GFI1, NR5A1, ELK1, NANOG and ETS2) were selected as candidate features. Multivariate Cox analysis was performed with these candidate regulators, and the coefficient of each regulator was assigned as its weight (Table 2). Hazard ratios <1 suggested that the corresponding regulators act as tumor suppressors, while regulators with hazard ratios >1 act as cancer-promoting genes.
Table 1

Sample size and survival information of datasets used in this article

| Dataset | Samples | Survival info provided |
|---|---|---|
| GSE14333 | 162 | Disease-free survival |
| GSE17536 | 115 | Overall survival |
| GSE17537 | 55 | Overall survival |
| GSE33113 | 90 | Progression-free survival |
| GSE37892 | 130 | Metastasis-free survival |
| GSE39582 | 469 | Overall survival |

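Table 2

Basic parameters of regulatory factors used for risk score

| TF | Genes downstream | Coefficient | Frequency | Hazard ratio | 95% CI | p value |
|---|---|---|---|---|---|---|
| DBP | 1 | 1.4199301 | 25 | 1.39 | 1.04-1.85 | 0.026 |
| ELK1 | 3 | 0.1978697 | 28 | 1.77 | 1.21-2.59 | 0.0036 |
| EPAS1 | 3 | 0.219671 | 24 | 1.59 | 1.09-2.31 | 0.015 |
| ETS2 | 2 | 0.3803902 | 34 | 1.93 | 1.16-3.22 | 0.011 |
| GFI1 | 5 | -0.237219 | 26 | 0.523 | 0.331-0.829 | 0.0057 |
| NANOG | 2 | -3.168149 | 33 | 0.642 | 0.415-0.993 | 0.046 |
| NME2 | 5 | -0.130288 | 26 | 0.736 | 0.607-0.893 | 0.0018 |
| NR5A1 | 2 | 0.0413388 | 28 | 3.83 | 1.09-13.5 | 0.037 |
| TEAD1 | 1 | 0.5805332 | 25 | 1.22 | 1.1-1.35 | 0.000091 |
| TP73 | 383 | -0.166284 | 25 | 34.9 | 1.33-920 | 0.032 |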

The columns are number of genes used for regulatory factor evaluation, Cox univariate regression p value, Cox multivariate regression beta values, and frequencies of regulatory factors of random forest variable hunting.


Risk score predicts survival in the training dataset

After developing the risk score staging model in the training dataset, the survival-predicting value of the risk score was evaluated. The patients were divided into a high-risk group (n = 235) and a low-risk group (n = 234) using the median risk score as the cutoff. The overall survival (OS) of the high-risk group was significantly shorter than that of the low-risk group (Figure 1A, p=0.00059). In addition, the recurrence-free survival (RFS) profile of the high-risk group resembled its overall survival profile (Figure 1B, p<0.05). Detailed survival information and risk scores are shown in Figure 1C. The regulatory activity pattern of the candidate genes was consistent with their coefficients. The predictive performance of the risk score for three-year survival of stage II and III CRC patients was compared with that of clinicopathological indicators (Figure 1D). The area under the receiver operating characteristic curve (AUROC) for three-year survival was 0.66 for the risk score and 0.66, 0.53 and 0.53 for age, gender and chemotherapy, respectively. This indicated that the risk score was an important survival indicator for stage II and III colorectal carcinoma.
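
To make the three-year AUROC comparison concrete, the following is a minimal Python sketch, not the authors' code (the study reports using the R package "pROC"). It assumes a simplified definition: patients who died within the horizon count as positives, patients followed beyond the horizon count as negatives, and patients censored before the horizon are dropped. The function name `three_year_auc` and the toy data are hypothetical.

```python
def three_year_auc(scores, times, events, horizon=36.0):
    """AUROC of a risk score for death within `horizon` (same unit as `times`).

    ASSUMPTION: simplified binary labelling, not a time-dependent ROC.
    """
    positives, negatives = [], []
    for score, time, event in zip(scores, times, events):
        if event and time <= horizon:
            positives.append(score)   # died within the horizon
        elif time > horizon:
            negatives.append(score)   # known to be alive at the horizon
        # censored before the horizon: status unknown, dropped in this sketch
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    # AUROC = probability that a positive outranks a negative (ties count 0.5).
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (invented): follow-up in months, event 1 = death, 0 = censored.
scores = [2.0, 0.1, 0.2, -0.5, 0.9]
times = [10, 30, 50, 60, 20]
events = [1, 1, 0, 1, 0]
auc = three_year_auc(scores, times, events)
```

Dropping early-censored patients is the simplest choice but can bias the estimate; a time-dependent ROC method handles censoring more carefully.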
Figure 1

Performance of regulatory factor activity based risk score

The high-risk group has significantly shorter overall survival (A) and recurrence-free survival (B) than the low-risk group. The detailed survival information and regulatory factor activity (C) and the three-year survival ROC (D) are shown.

Figure 2

Validation of survival-predicting performance of risk score

The performance of risk score was further validated in five independent datasets (A: GSE14333, B: GSE17536, C: GSE17537, D: GSE33113, E: GSE37892).

Risk score model is robust across the test datasets

The high performance of the risk score model in the training dataset may result from over-fitting. To assess its robustness, we evaluated the risk score on five independent public CRC cohorts after locking the coefficients of the model. The survival time of patients in the high-risk group was significantly shorter than that of patients in the low-risk group, which was consistent with the survival profile of the training dataset (Figure 2A-E). In addition, the regulatory activity of candidate genes in the test datasets resembled that in the training dataset (Supplementary Figure 1A-E). These results indicate that our risk score model was robust across datasets.

Risk score and clinical/pathological indicators

The relationship between the risk score and clinical/pathological information was measured as well (Figure 3A). The risk score was found to be independent of gender, age and stage (p>0.05). Multivariate Cox hazard ratio analysis showed that the risk score was an important indicator for predicting survival (Figure 3B). To facilitate the use of the risk score model, a nomogram including gender, age, stage, risk score and chemotherapy was plotted (Figure 3C). Univariate and multivariate Cox regression of the risk score and further clinical indicators showed that the risk score was the most important indicator for prognosis (Table 3). These results indicated that the risk score was an independent and critical indicator for prognosis.
Figure 3

Risk score and other clinical indicators

The risk score is independent from age, gender, and stage (A), and is an important clinical indicator for survival according to multivariate hazard analysis (B) and nomogram (C).

Table 3

Cox univariate and multivariate regression of clinical indicators in GSE39582

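| Indicator | Univariate HR | Univariate 95% CI | Univariate p value | Multivariate HR | Multivariate 95% CI | Multivariate p value |
|---|---|---|---|---|---|---|
| Risk score | 1.2 | 1.1-1.3 | 0 | 1.15 | 1.04-1.27 | 0.00692 |
| Sex (M) | 1.3 | 0.91-1.8 | 0.16885 | 1.35 | 0.91-2 | 0.13668 |
| Stage | 1.2 | 0.89-1.7 | 0.20707 | 1.17 | 0.79-1.73 | 0.44657 |
| Location | 1.1 | 0.81-1.6 | 0.472 | 1.04 | 0.67-1.6 | 0.86886 |
| CIMP | 0.93 | 0.57-1.5 | 0.76494 | 0.73 | 0.34-1.55 | 0.41108 |
| CIN | 0.99 | 0.62-1.6 | 0.9745 | 0.93 | 0.54-1.58 | 0.78014 |
| KRAS mut | 1.4 | 1-2 | 0.03751 | 1.47 | 0.97-2.24 | 0.07001 |
| BRAF mut | 0.89 | 0.46-1.7 | 0.71906 | 1.35 | 0.52-3.5 | 0.53881 |
| CDX2 | 0.83 | 0.7-0.98 | 0.02838 | 0.92 | 0.7-1.19 | 0.50933 |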

Risk score and chemotherapy

Chemotherapy is one of the most important adjuvant treatment strategies following surgery. Thus, the relationship between risk score and chemotherapy was evaluated. We used overall survival and recurrence-free survival information to assess whether our risk score model can predict the survival of patients who received chemotherapy. As expected, among patients who received chemotherapy, those with a high risk score had a worse prognosis in both overall survival (Figure 4A) and recurrence-free survival (Figure 4B) than the low-risk group. The prognostic value of the risk score was also evaluated in patients without chemotherapy, and it was similar to that in the chemotherapy group (not shown). These results indicated that the regulatory activity based risk score is also applicable to the prognosis of CRC patients treated with chemotherapy.
Figure 4

Risk score and chemotherapy

Overall survival (A) and recurrence-free survival (B) of patients who underwent chemotherapy are shorter in the high-risk group than in the low-risk group.

Identification of biological pathways associated with risk score

To investigate why the risk score can predict the survival of colorectal carcinoma, gene expression profiles were compared between the high-risk and low-risk groups, defined by the median risk score in the largest cohort, GSE39582. Altered KEGG pathways were evaluated using Gene Set Enrichment Analysis (Figure 5A). The most altered and enriched KEGG pathways were "complement and coagulation cascades" (Figure 5B), "ECM receptor interaction" (Figure 5C), "cell adhesion molecules" (Figure 5D), and "cytokine-cytokine receptor interaction" (Figure 5E). These results suggest a possible molecular mechanism for the clinical outcome in stage II and III colorectal adenocarcinoma reflected by the risk model.
Figure 5

KEGG pathways associated with risk score

Of the KEGG pathways significantly associated with risk score (A), complement and coagulation cascades (B), ECM receptor interaction (C), cell adhesion molecules (CAMs) (D), and cytokine-cytokine receptor interaction (E) were noted.

Publication bias evaluation

Publication bias inspection regarding basic clinical information, including age, gender, and events (relapse, metastasis, death) was performed. Funnel plots indicated that no publication bias for gender, age, or events was detected (Figure 6A, p>0.05). The forest plot showed that no data heterogeneity exists (Figure 6B). Publication bias was not investigated when the number of studies was less than 10 because of the low sensitivity of the qualitative and quantitative tests [16].
Figure 6

Publication bias of risk score and clinical indicator

Funnel plots of age (A, left), gender (A, middle), and events (A, right) show no bias. Forest plots suggest similar results (B, from top to bottom: age, gender, events).

DISCUSSION

Prognosis of stage II and III colorectal carcinoma remains a problem. Although single biomarkers have been reported for survival prediction [8, 17, 18], their robustness remains a major concern. One reason may be that a single biomarker fails to reflect the genomic heterogeneity of tumors. Regulatory factors control the expression of downstream genes and thereby determine the status of crucial pathways. The activity of multiple core regulatory and transcription factors may therefore reflect the genomic status of cancer cells. In this vein, we evaluated the activities of transcription and regulatory factors from the expression of their downstream target genes in stage II and III colorectal carcinoma. Using univariate Cox regression and random forest variable hunting, the activities of ten regulatory factors were selected to develop a risk score model for prognosis. The model successfully predicted survival of 1021 stage II and III colorectal carcinoma patients in six independent datasets. It is also independent of other clinical indicators and performs well in predicting survival. Although most of the 42 regulators are important for prognosis, combining ten regulators effectively reduced the panel while retaining the useful information.

Among the ten transcription regulators, overexpression of EPAS1 was associated with poor prognosis in colorectal carcinoma according to previous reports [19-21]. Polymorphism and expression of TP73 were associated with carcinogenesis and colorectal carcinoma development [22, 23]. TEAD1 was reported to enhance proliferation in colorectal carcinoma [24]. DBP and NME2 were associated with carcinogenesis and the development of several cancer types, including colorectal carcinoma [25-28]. The same holds for GFI1 [29-31], NR5A1 [32], and ELK1 [33-35]. NANOG was related to multiple aspects of colorectal tumor development, including liver metastasis [36], stemness maintenance [37] and prognosis [38]. ETS2 was shown to be associated with metastasis of colorectal carcinoma [39, 40]. These reports indicate that the regulatory factors included in the risk score model are essential prognostic genes, supporting the reliability of the model.

Metastasis is among the most serious events during colorectal carcinoma development [41]. Among the pathways and genes involved in CRC metastasis, cell-cell focal adhesion plays important roles [42, 43]. According to the GSEA analysis, most pathways involved in cell-cell interaction and focal adhesion were significantly enriched, which may explain why the risk score is associated with stage II/III CRC prognosis. In conclusion, our transcription activity based risk score model successfully predicts the survival of stage II and III colorectal carcinoma. To our knowledge, this is the first model using the activities of regulatory factors to predict survival of stage II/III colorectal carcinoma.

MATERIALS AND METHODS

Data preprocessing

The raw data of six datasets (GSE39582, GSE14333, GSE17536, GSE17537, GSE33113 and GSE37892) were downloaded in .CEL format. After background correction and normalization, the fold change between the expression value in each sample and the median expression value of each gene was calculated. Probes were matched to gene names; for genes matching more than one probe, the probe values were averaged and used as the expression of the corresponding gene. Duplicated values were excluded. The regulatory factor-downstream gene pairs were constructed according to the regulatory data provided by the HTRI database [44]. Suppose the downstream genes of regulator k (R_k) are Gene_1, Gene_2, ..., Gene_j, and the dataset consists of samples 1, 2, ..., i. The regulatory factor activity of regulator k in sample i, RFA_{k,i}, is calculated from the expression of its downstream genes relative to their medians, where Gene_{j,i} indicates the expression value of Gene_j in sample i and median(Gene_j) refers to the median expression value of Gene_j. A new matrix of regulator activities was then constructed, in which the rows represent the regulators and the columns represent the samples. All datasets included in this article were transformed using the same method.
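
The activity calculation described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes strictly positive expression values and aggregates the downstream genes by averaging their log2 fold changes relative to the per-gene median. The exact aggregation used in the study is not recoverable from the text, and the function name, gene names and regulator names are hypothetical.

```python
import math
from statistics import median


def regulatory_factor_activity(expression, targets):
    """Build a regulator-by-sample activity matrix.

    expression: dict gene -> list of positive expression values (one per sample)
    targets:    dict regulator -> list of downstream gene names
    Returns dict regulator -> list of activity values (one per sample).
    """
    n_samples = len(next(iter(expression.values())))
    activity = {}
    for regulator, genes in targets.items():
        # Keep only downstream genes that were actually measured.
        measured = [g for g in genes if g in expression]
        if not measured:
            continue
        totals = [0.0] * n_samples
        for gene in measured:
            gene_median = median(expression[gene])
            for i, value in enumerate(expression[gene]):
                # Fold change of sample i relative to the gene's median.
                totals[i] += math.log2(value / gene_median)
        # ASSUMPTION: average the log2 fold changes over downstream genes.
        activity[regulator] = [t / len(measured) for t in totals]
    return activity


# Toy data (invented): two genes, three samples.
expression = {"GeneA": [2.0, 4.0, 8.0], "GeneB": [1.0, 1.0, 4.0]}
targets = {"R1": ["GeneA", "GeneB"], "R2": ["GeneA", "GeneC"]}
rfa = regulatory_factor_activity(expression, targets)
```

Here `rfa["R2"]` is computed from GeneA alone, because the hypothetical GeneC is not in the expression table.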

Gene selection and model construction

Univariate Cox regression was performed on the GSE39582 dataset. Transcription factors that were significantly associated with overall survival in this dataset were retained. Random forest variable hunting was performed with 100 replications and 100 steps. Multivariate Cox regression was used to construct the risk score model from the candidate genes. The risk score (RS) of each sample was calculated as follows:

$$RS_i = \sum_{k} \beta_k \cdot RFA_{k,i}$$

where $RFA_{k,i}$ indicates the regulatory factor activity of regulator k in sample i, and $\beta_k$ refers to the coefficient of candidate regulator k. Coefficients were estimated using the training dataset, GSE39582, and locked to calculate the risk score in the other five datasets (GSE14333, GSE17536, GSE17537, GSE33113 and GSE37892). The median risk score in each dataset was used as the cutoff to define the high-risk and low-risk groups.
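
As an illustration of the scoring step, the sketch below computes a risk score as a weighted sum of regulator activities with fixed ("locked") coefficients and splits samples at the per-dataset median. It is a minimal sketch rather than the authors' code. The coefficients and activities are invented, and assigning samples exactly at the median to the high-risk group is an assumption (it would be consistent with the 235/234 split reported for 469 samples).

```python
from statistics import median


def risk_scores(activity, coefficients):
    """Weighted sum of regulator activities for each sample.

    activity:     dict regulator -> list of activity values (one per sample)
    coefficients: dict regulator -> locked Cox coefficient (beta)
    """
    n_samples = len(next(iter(activity.values())))
    return [
        sum(beta * activity[reg][i] for reg, beta in coefficients.items())
        for i in range(n_samples)
    ]


def split_by_median(scores):
    """Label each sample using the dataset's own median score as cutoff."""
    cutoff = median(scores)
    # ASSUMPTION: a score equal to the cutoff is labelled high risk.
    return ["high" if s >= cutoff else "low" for s in scores]


# Hypothetical locked coefficients and toy activities for three samples.
coefficients = {"R1": 0.5, "R2": -1.0}
activity = {"R1": [2.0, 0.0, -2.0], "R2": [-1.0, 0.0, 1.0]}
scores = risk_scores(activity, coefficients)
groups = split_by_median(scores)
```

Because the coefficients are passed in rather than re-estimated, the same `coefficients` dictionary can be reused unchanged on each validation dataset, while the cutoff is recomputed per dataset.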

Statistical analyses

All statistical analyses were performed in R using R packages. Microarray data preprocessing was performed with the R package "affy". Survival analysis, univariate Cox regression and multivariate Cox regression were carried out with the R package "survival" [45], and random forest variable hunting was implemented with the R package "randomForestSRC" [46]. Survival ROC curves were plotted with the R package "pROC" [47], and the nomogram was drawn with the R package "rms" [48]. Publication bias analysis was performed with the R package "meta". Gene Set Enrichment Analysis was carried out with the Java software "GSEA" [49].
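
For readers who want to see what a high-risk versus low-risk survival comparison involves, here is a minimal Python sketch of the standard two-group log-rank test. It is not the authors' code (the study used the R package "survival"); the function name `logrank_test` and the toy follow-up data are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value).

    times_*:  follow-up times; events_*: 1 = event observed, 0 = censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0.0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (invented): the "high" group has earlier events than the "low" group.
chi2, p_value = logrank_test([1, 2, 3, 4], [1, 1, 1, 1],
                             [5, 6, 7, 8], [1, 0, 1, 0])
```

The quadratic scan over `data` keeps the sketch short; for cohorts of a few hundred patients this is fast enough, but a sorted single pass would scale better.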

REFERENCES (49 in total, 10 listed)

1.  NME1 and NME2 as markers for myeloid leukemias.

Authors:  Jessica K Altman; Leonidas C Platanias
Journal:  Leuk Lymphoma       Date:  2012-04-30

2.  Expression of an ASCL2 related stem cell signature and IGF2 in colorectal cancer liver metastases with 11p15.5 gain.

Authors:  D E Stange; F Engel; T Longerich; B K Koo; M Koch; N Delhomme; M Aigner; G Toedt; P Schirmacher; P Lichter; J Weitz; B Radlwimmer
Journal:  Gut       Date:  2010-05-17       Impact factor: 23.059

3.  CARMA3 Represses Metastasis Suppressor NME2 to Promote Lung Cancer Stemness and Metastasis.

Authors:  Yi-Wen Chang; Ching-Feng Chiu; Kang-Yun Lee; Chih-Chen Hong; Yi-Yun Wang; Ching-Chia Cheng; Yi-Hua Jan; Ming-Shyan Huang; Michael Hsiao; Jui-Ti Ma; Jen-Liang Su
Journal:  Am J Respir Crit Care Med       Date:  2015-07-01       Impact factor: 21.405

4.  Identification of high-risk stage II and stage III colorectal cancer by analysis of MMP-21 expression.

Authors:  Tao Wu; Yi Li; Xiaohong Liu; Jianguo Lu; Xianli He; Qing Wang; Jipeng Li; Xilin Du
Journal:  J Surg Oncol       Date:  2011-06-07       Impact factor: 3.454

5.  Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors:  Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-30       Impact factor: 11.205

6.  The prognostic value of p73 overexpression in colorectal carcinoma: a clinicopathologic, immunohistochemical, and statistical study of 204 patients.

Authors:  Amira Arfaoui Toumi; Olfa El Amine El Hadj; Lilia Kriaa Ben Mahmoud; Abd El Majid Ben Hmida; Ines Chaar; Lasaad Gharbi; Sabeh Mzabi; Saadia Bouraoui
Journal:  Appl Immunohistochem Mol Morphol       Date:  2010-03

7.  High expression of lncRNA MALAT1 suggests a biomarker of poor prognosis in colorectal cancer.

Authors:  Hong-Tu Zheng; De-Bing Shi; Yu-Wei Wang; Xin-Xiang Li; Ye Xu; Pratik Tripathi; Wei-Lie Gu; Guo-Xiang Cai; San-Jun Cai
Journal:  Int J Clin Exp Pathol       Date:  2014-05-15

8.  Expression of osteopontin coregulators in primary colorectal cancer and associated liver metastases.

Authors:  D J Mole; C O'Neill; P Hamilton; B Olabi; V Robinson; L Williams; T Diamond; M El-Tanani; F C Campbell
Journal:  Br J Cancer       Date:  2011-02-22       Impact factor: 7.640

9.  HTRIdb: an open-access database for experimentally verified human transcriptional regulation interactions.

Authors:  Luiz A Bovolenta; Marcio L Acencio; Ney Lemke
Journal:  BMC Genomics       Date:  2012-08-17       Impact factor: 3.969

10.  SPAG9 is involved in hepatocarcinoma cell migration and invasion via modulation of ELK1 expression.

Authors:  Qiuyue Yan; Guohua Lou; Ying Qian; Bo Qin; Xiuping Xu; Yanan Wang; Yanning Liu; Xuejun Dong
Journal:  Onco Targets Ther       Date:  2016-03-01       Impact factor: 4.147
