
Integrative Gene Expression Profiling Analysis to Investigate Potential Prognostic Biomarkers for Colorectal Cancer.

Xinkui Liu¹, Zhitong Bing²,³,⁴, Jiarui Wu¹, Jingyuan Zhang¹, Wei Zhou¹, Mengwei Ni¹, Ziqi Meng¹, Shuyu Liu¹, Jinhui Tian²,³, Xiaomeng Zhang¹, Yingfei Li⁵, Shanshan Jia¹, Siyu Guo¹.

Abstract

BACKGROUND Despite noteworthy advances in the multidisciplinary treatment of colorectal cancer (CRC) and a deeper understanding of its molecular mechanisms, many CRC patients with histologically identical tumors differ in treatment response and prognosis. Thus, more evidence on novel predictive and prognostic biomarkers for CRC is urgently needed. This study aimed to identify potential prognostic biomarkers for CRC through integrative gene expression profiling analysis. MATERIAL AND METHODS Differential expression analysis of paired CRC and adjacent normal tissue samples was performed independently in 6 microarray datasets, and the 6 datasets were integrated by the robust rank aggregation method to detect consistent differentially expressed genes (DEGs). Aberrant expression patterns of these genes were further validated in RNA sequencing data. Then, gene set enrichment analysis (GSEA) was performed to investigate significantly dysregulated biological functions in CRC. Finally, univariate, LASSO and multivariate Cox regression models were built to identify key prognostic genes in CRC patients. RESULTS A total of 990 DEGs (495 downregulated and 495 upregulated genes) were obtained from the integrated analysis of the 6 microarray datasets, and 4131 DEGs (2050 downregulated and 2081 upregulated genes) were obtained from the RNA sequencing dataset. These DEGs were then intersected, and 885 consistent DEGs were identified, including 458 downregulated and 427 upregulated genes. Two risky prognostic genes (TIMP1 and LZTS3) and 5 protective prognostic genes (AXIN2, CXCL1, ITLN1, CPT2 and CLDN23) were significantly associated with the prognosis of CRC. CONCLUSIONS The 7 genes identified here provide further evidence for applying novel diagnostic and prognostic biomarkers in clinical practice to facilitate personalized treatment of CRC.

Substances:
Biomarkers, Tumor

Year: 2020 PMID: 31893510 PMCID: PMC6977628 DOI: 10.12659/MSM.918906

Source DB: PubMed Journal: Med Sci Monit ISSN: 1234-1010

Background

Colorectal cancer (CRC) is the third most commonly diagnosed malignancy and the second leading cause of cancer death worldwide [1]. The past decades have witnessed a remarkable overall decline in CRC incidence and mortality, and a dramatic rise in the median overall survival (OS) of patients with metastatic colorectal cancer [2-9]. This progress is ascribed to advances in comprehensive medical options, such as laparoscopic surgery, radiotherapy, neoadjuvant and palliative chemotherapies and targeted therapies, along with a deeper understanding of the epidemiology, pathology and molecular mechanisms of CRC [2,10]. Even so, CRC accounts for almost one-tenth of cancer cases and deaths (an estimated 1.8 million new cases and 881 000 deaths in 2018) and imposes a high medical burden worldwide [1]. It is well known that many CRC patients differ in treatment response and prognosis despite having histologically identical tumors. Personalized treatment based on biomarkers is therefore likely to deliver substantial clinical and public health benefit, because it can enhance therapeutic effectiveness while decreasing treatment-related injury and costs [10,11]. Although the many molecular characteristics, biological markers and therapeutic targets of CRC already discovered have greatly contributed to the diagnosis and treatment of this malignancy, more evidence on predictive and prognostic biomarkers is urgently needed in view of the biological complexity, poor outcome and high metastatic rate of this deadly disease [2,10,12]. Striking advances in microarray and high-throughput sequencing technologies have facilitated the discovery of crucial genetic and epigenetic alterations in carcinogenesis, tumor growth, metastasis and recurrence, as well as promising cancer biomarkers for diagnosis, prognosis and treatment prediction [12-14]. 
Nevertheless, inconsistent results often occur because of sample heterogeneity in individual experiments or differences between technological platforms [15]. Furthermore, relatively small sample sizes decrease statistical power, which hinders informative and useful findings [16-18]. To overcome these limitations and obtain convincing results, integrated bioinformatics analysis, a comprehensive strategy that increases sample size, standardizes expression profiles across platforms and discards invalid raw data, has been widely adopted to identify differentially expressed genes (DEGs) at the mRNA and non-coding RNA level in CRC [16,19]. This study performed an integrative analysis of the gene expression patterns of 6 microarray datasets in Gene Expression Omnibus (GEO) using the robust rank aggregation (RRA) method, aiming to discover the consistent DEGs between human CRC and paired adjacent normal tissue samples. We further validated the aberrant expression patterns of these genes in the RNA sequencing data of CRC patients from The Cancer Genome Atlas (TCGA). Additionally, we conducted gene set enrichment analysis (GSEA) to investigate significantly dysregulated biological functions in CRC. Finally, we constructed a gene signature with prognostic value in CRC patients by applying univariate, LASSO and multivariate Cox regression analyses.

Material and Methods

Data collection and preprocessing

Six microarray-based gene expression datasets (GSE21510, GSE22598, GSE37182, GSE39582, GSE44076 and GSE89076) were accessed from Gene Expression Omnibus (GEO) [20,21]. All the included datasets met the following inclusion criteria: 1) they used colorectal tissues of CRC patients; 2) they included paired tumor and adjacent normal tissue samples and 3) the sample size of each dataset was at least 30. The samples included in this study came from these datasets, and only the paired tumor and adjacent normal samples from colon tissues were used. The microarray data of GSE21510 (23 paired samples), GSE22598 (17 paired) and GSE39582 (17 paired) were generated on the Affymetrix Human Genome U133 Plus 2.0 Array. The platforms for GSE44076 (98 paired), GSE89076 (24 paired) and GSE37182 (82 paired) were the Affymetrix Human Genome U219 Array, the Agilent-039494 SurePrint G3 Human GE v2 8x60K Microarray 039381 and the Illumina HumanHT-12 V3.0 expression beadchip, respectively. In total, 261 CRC and 261 matched normal cases were chosen for integrated analysis. The RNA sequencing data containing 398 colon adenocarcinoma and 39 normal samples were downloaded from The Cancer Genome Atlas (TCGA) (up to December 18, 2018). Only protein-coding genes were retained for further study, and the corresponding annotation information was accessed from Ensembl. The AnnotationDbi [22] and org.Hs.eg.db [23] packages were used to convert among gene symbol, Entrez ID and Ensembl ID. Furthermore, clinical information of 385 colon adenocarcinoma patients in TCGA was also downloaded, of whom 349 patients were retained for further study. Thirty-six patients were excluded from our study for 3 reasons: 1) 12 patients lacked overall survival (OS) time, survival status, or pathological stage; 2) 20 patients had an overall survival time shorter than 30 days and 3) 4 patients lacked corresponding mRNA expression data. 
Background correction, normalization, and expression calculation for the raw data (.cel format) of GSE21510, GSE22598, GSE39582 and GSE44076 (based on the Affymetrix platform) were conducted by the Robust Multi-array Average (RMA) method [24,25] in the affy package [26]. The marray package [27] and the neqc function in the limma package [28,29] were used for preprocessing the raw data of the Agilent (GSE89076) and Illumina (GSE37182) microarray platforms, respectively. Annotation files for probes in the different datasets were downloaded from the GEO database. If multiple probes mapped to the same gene, the average expression value of those probes was taken as the final expression level of the gene. Conversion among gene symbol, Entrez ID and Ensembl ID was again performed with the AnnotationDbi and org.Hs.eg.db packages.
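The probe-to-gene averaging rule described above can be illustrated with a minimal sketch. It is written in plain Python rather than the R packages used in the study, and the function name `collapse_probes`, the probe identifiers and all expression values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

def collapse_probes(probe_values, probe_to_gene):
    """Average the expression of all probes that map to the same gene.

    probe_values: dict of probe ID -> list of expression values (one per sample)
    probe_to_gene: dict of probe ID -> gene symbol (unannotated probes are skipped)
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

# Hypothetical probe IDs and values, for illustration only
probe_values = {
    "probe_a": [8.0, 6.0],
    "probe_b": [10.0, 8.0],
    "probe_c": [5.0, 5.5],
    "probe_d": [1.0, 1.0],  # no gene annotation, so it is dropped
}
probe_to_gene = {"probe_a": "TIMP1", "probe_b": "TIMP1", "probe_c": "CPT2"}
gene_expr = collapse_probes(probe_values, probe_to_gene)
```

Here the two probes assigned to TIMP1 are averaged sample by sample, while CPT2 keeps the values of its single probe.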

Differentially expressed genes (DEGs) screening

For each of the 6 microarray datasets, gene expression differences between the tumor and adjacent noncancerous tissues were calculated with the limma package. The resulting gene lists were then integrated with the RobustRankAggreg package [30], which is based on the robust rank aggregation (RRA) method. This rank aggregation approach detects genes that are ranked consistently better than expected under the null hypothesis of randomly ordered input lists and assigns a P value to each gene. Bonferroni correction was also applied to limit false positive results, and genes meeting the criteria of |log2 fold change (FC)| >1 and adjusted P<0.05 were taken as DEGs. For the mRNA sequencing data from TCGA, protein-coding genes with counts >1 in more than 75% of samples were retained, and duplicate gene expression values were averaged. Expression calculation, normalization and DEG screening were carried out with edgeR [31], with |log2FC| >1 and false discovery rate (FDR) <0.05 as the threshold. The impute package [32] was used to fill missing values of the normalized expression data. The consistent DEGs in the 6 microarray profiles were intersected with the DEGs in the TCGA dataset by Entrez ID, and the DEGs consistent between the microarray and sequencing data were retained for further study. Moreover, the expression values of these consistent DEGs in the TCGA colon adenocarcinoma dataset were log2 transformed before the subsequent analyses.
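To make the rank aggregation idea concrete, the following is a simplified sketch of how an RRA-style P value can be computed for one gene, followed by the Bonferroni step and the two thresholds. It is not the RobustRankAggreg code itself. It assumes that each rank has been divided by the length of its list, that the order-statistic probabilities are evaluated under uniformly distributed ranks, and that a single summary log2FC per gene is available; the function names, the example ranks and the gene count of 18 000 are illustrative assumptions.

```python
from math import comb

def order_stat_pvalue(x, k, n):
    """P(the k-th smallest of n uniform(0, 1) draws is <= x)."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))

def rra_pvalue(normalized_ranks):
    """Rank-aggregation P value bound for one gene.

    normalized_ranks: the gene's rank in each list divided by the list length.
    """
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    # Smallest order-statistic probability, corrected for testing n positions
    rho = min(order_stat_pvalue(ranks[k - 1], k, n) for k in range(1, n + 1))
    return min(1.0, rho * n)

def is_deg(p_value, log2fc, n_genes, alpha=0.05, min_abs_log2fc=1.0):
    """Bonferroni-adjust the P value, then apply both thresholds."""
    adjusted = min(1.0, p_value * n_genes)
    return adjusted < alpha and abs(log2fc) > min_abs_log2fc

# Hypothetical normalized ranks of two genes across 6 datasets
consistent = [0.001, 0.002, 0.001, 0.003, 0.002, 0.001]
scattered = [0.10, 0.50, 0.90, 0.30, 0.70, 0.60]
p_consistent = rra_pvalue(consistent)
p_scattered = rra_pvalue(scattered)
```

A gene ranked near the top of every list receives a very small P value and survives the Bonferroni correction, whereas a gene whose ranks are scattered does not.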

Gene set enrichment analysis (GSEA)

To identify significantly dysregulated biological pathways in CRC, GSEA [33] was performed with clusterProfiler [34], using the functional annotations of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Entrez IDs and corresponding log2FC values of all the genes in each dataset were submitted to clusterProfiler, with the permutation number and the minimum gene set size set to 100 000 and 120, respectively. Activated and suppressed pathways with adjusted P<0.05 in each dataset were merged, and those with higher frequency (found in ≥3 datasets) were identified as dramatically changed biological functions in CRC.
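The merging step, in which pathways are kept only when they are significant in enough datasets, amounts to a frequency count. The sketch below assumes the per-dataset GSEA results have already been reduced to lists of significant pathway names; the function name, the dataset labels and the pathway assignments are toy values for illustration, not results from the study.

```python
from collections import Counter

def recurrent_pathways(significant_by_dataset, min_datasets=3):
    """Count how many datasets report each pathway and keep the frequent ones."""
    counts = Counter()
    for pathways in significant_by_dataset.values():
        counts.update(set(pathways))  # set() guards against duplicates within a dataset
    return {name: n for name, n in counts.items() if n >= min_datasets}

# Toy input: dataset labels and pathway assignments are illustrative only
significant_by_dataset = {
    "dataset_1": ["Cell cycle", "cAMP signaling pathway"],
    "dataset_2": ["Cell cycle", "cAMP signaling pathway", "RNA transport"],
    "dataset_3": ["Cell cycle", "RNA transport"],
    "dataset_4": ["cAMP signaling pathway"],
}
recurrent = recurrent_pathways(significant_by_dataset)
```

With a cutoff of 3 datasets, only the two pathways reported three times are retained in this toy input.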

Survival analysis

For the 349 included CRC patients, survival time, survival status, and mRNA expression levels of the consistent DEGs were used for survival analysis. Firstly, a univariate Cox proportional hazards regression model was built to preliminarily screen OS-related genes, and genes with P<0.05 were considered statistically significant. Secondly, a least absolute shrinkage and selection operator (LASSO) Cox regression model was adopted to further select key genes from those significant in the univariate analysis. The glmnet package [35] was used to perform the LASSO Cox analysis. The maximum number of replacements was set as 100 000 times, and a sequence of tuning parameters (lambdas, λs) was returned according to the expected generalization error estimated from 10-fold cross-validation. The lambda with the minimum mean cross-validated error (lambda.min) was employed. Finally, a multivariate Cox proportional hazards regression model was established to estimate the contribution of each gene as an independent prognostic factor for patient survival. The optimal model was selected by the Akaike information criterion (AIC) method, and a prognostic gene signature was thereby established. The univariate and multivariate Cox regression analyses were all conducted with the survival package [36]. A prognostic risk score was calculated as a linear combination of the expression value of each gene in the signature multiplied by its regression coefficient from the multivariate Cox model. The formula is as follows:

$$\text{Risk score} = \sum_{i=1}^{n} \left(\mathrm{exp}_i \times \mathrm{coef}_i\right)$$

where n is the number of genes, exp_i is the expression value of the ith variable and coef_i is the regression coefficient of the ith variable. The 349 patients were categorized into a low-risk or a high-risk group based on the median prognostic risk score. The Kaplan-Meier method with the log-rank test was used to assess the correlation between risk and OS, and the survival curve was generated with the survminer package [37]. 
The time-dependent receiver operating characteristic (ROC) curve analysis was conducted with the survivalROC package [38], and the area under the curve (AUC) was calculated to measure the predictive accuracy of the prognostic signature for time-dependent cancer death. All statistical analyses in this study were performed with R (version 3.5.2).
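As background for the Kaplan-Meier comparison of the risk groups, the product-limit estimator can be sketched in a few lines. This is a plain Python illustration, not the survival or survminer code used in the study; the function name and the follow-up data are hypothetical, and the log-rank test is not included.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per patient
    events: 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up times (months) and vital status for 5 patients
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

In this toy cohort the estimated survival drops at months 5, 8 and 12, and the two censored patients only reduce the number at risk. Running the estimator separately on the low-risk and high-risk groups gives the two curves that a log-rank test would then compare.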

Results

Identification of DEGs

The clinical information for the CRC patients included in the present study is shown in Table 1 and Supplementary Tables 1–3. We obtained 990 DEGs (495 downregulated and 495 upregulated genes) from the integrated analysis of the 6 microarray datasets (Figures 1A, 2A–2F, Supplementary Tables 4, 5), and 4131 DEGs (2050 downregulated and 2081 upregulated genes) from the TCGA colon cancer dataset (Figures 1B, 2G, and Supplementary Table 6). We then intersected these DEGs and identified 885 consistent DEGs, including 458 downregulated and 427 upregulated genes (Figure 1C, 1D, and Supplementary Table 7).

Table 1

Clinical information for the included 349 patients.

Characteristics	Number of cases (%)
Gender
Male	189 (54.2)
Female	160 (45.8)
Age
≤60	105 (30.1)
>60	244 (69.9)
TNM stage
Stage I	62 (17.8)
Stage II	138 (39.5)
Stage III	99 (28.4)
Stage IV	50 (14.3)
T stage
Tis	1 (0.3)
T1	7 (2.0)
T2	62 (17.8)
T3	242 (69.3)
T4	37 (10.6)
M stage
M0	266 (76.2)
M1	50 (14.3)
MX	31 (8.9)
Not reported	2 (0.6)
N stage
N0	207 (59.3)
N1	83 (23.8)
N2	59 (16.9)
Vital status
Alive	279 (79.9)
Dead	70 (20.1)

Figure 1

Identification of differentially expressed genes (DEGs). (A) The heatmap of top 20 downregulated and upregulated DEGs identified by the integrated analysis of the 6 microarray datasets. Each column represents 1 dataset and each row represents 1 gene. The number in each rectangle represents the value of log2FC. The gradual color ranging from blue to red represents the changing process from downregulation to upregulation. (B) The heatmap of the 4131 DEGs in The Cancer Genome Atlas (TCGA) colorectal cancer (CRC) dataset. Each column represents 1 sample and each row represents 1 gene. The gradual color ranging from green to red represents the changing process from downregulation to upregulation. (C) The Venn diagram of the DEGs between the integrated Gene Expression Omnibus (GEO) dataset and the TCGA CRC dataset. (D) The heatmap of the 885 consistent DEGs (using the TCGA dataset). Each column represents 1 sample and each row represents 1 gene. The gradual color ranging from green to red represents the changing process from downregulation to upregulation.

Figure 2

Volcano plots of the genes in the 7 datasets. (A) The volcano plot of the genes in GSE21510. (B) The volcano plot of the genes in GSE22598. (C) The volcano plot of the genes in GSE37182. (D) The volcano plot of the genes in GSE39582. (E) The volcano plot of the genes in GSE44076. (F) The volcano plot of the genes in GSE89076. (G) The volcano plot of the genes in The Cancer Genome Atlas (TCGA) dataset. Red dots represent genes with adjusted P<0.05 and log2FC >1, and green dots represent genes with adjusted P<0.05 and log2FC <−1.

Identification of dysregulated pathways

According to the results of GSEA (Figures 3, 4 and Supplementary Tables 8, 9), 24 pathways (5 activated and 19 suppressed) found in at least 4 datasets were identified as significantly dysregulated biological pathways in CRC. Eight suppressed pathways were present in all 7 datasets, namely, adrenergic signaling in cardiomyocytes, apelin signaling pathway, calcium signaling pathway, cAMP signaling pathway, cGMP-PKG signaling pathway, neuroactive ligand-receptor interaction, Rap1 signaling pathway, and regulation of actin cytoskeleton. The top 3 activated pathways were cell cycle, RNA transport, and Wnt signaling pathway, which were found in 7, 7, and 5 datasets, respectively. Among the 24 significantly changed pathways, the cell cycle, Ras signaling pathway and Wnt signaling pathway have long been known to play important roles in the initiation and progression of CRC [12,39].

Figure 3

The gene set enrichment analysis (GSEA) for the 7 colorectal cancer (CRC) datasets. (A) The GSEA for GSE21510. (B) The GSEA for GSE22598. (C) The GSEA for GSE37182. (D) The GSEA for GSE39582. (E) The GSEA for GSE44076. (F) The GSEA for GSE89076. (G) The GSEA for The Cancer Genome Atlas (TCGA) dataset. The top 20 suppressed and activated Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in each dataset are shown. The y-axis shows the KEGG pathway terms, and the x-axis shows the gene ratio of each term.

Figure 4

The enrichment plot of the gene set enrichment analysis (GSEA) (using GSE89076). (A) The enrichment plot of 9 suppressed pathways. (B) The enrichment plot of 5 activated pathways. (C) The UpSet plot for the GSEA.

We performed univariate Cox regression to investigate the correlation of the DEGs with OS of CRC patients, and identified 101 OS-related genes with P<0.05 (Supplementary Table 10). Then, to narrow the gene list further, we employed the LASSO Cox model with 10-fold cross-validation and 100 000 repetitions to acquire optimal penalty parameters. As a result, 22 genes were identified when we chose the minimum criteria, where log(λ)=−3.52 with λ=0.02957 (Figure 5). Finally, the multivariate Cox analysis yielded a 7-gene prognostic signature composed of TIMP metallopeptidase inhibitor 1 (TIMP1), Axin 2 (AXIN2), C-X-C motif chemokine ligand 1 (CXCL1), leucine zipper tumor suppressor family member 3 (LZTS3), intelectin 1 (ITLN1), carnitine palmitoyltransferase 2 (CPT2) and claudin 23 (CLDN23) (Figures 6A, 7A). As shown in Figure 7B, TIMP1, AXIN2, CXCL1 and LZTS3 were upregulated, whereas ITLN1, CPT2 and CLDN23 were downregulated in CRC compared with normal groups. Moreover, lower expression of CXCL1 and CPT2 was associated with advanced tumor stage (Kruskal-Wallis test P<0.05, Figure 7C, 7D), while the correlation of the other 5 genes with pathological stage was not statistically significant. Among these 7 genes, AXIN2, CXCL1, ITLN1, CPT2, and CLDN23, with HR<1, were identified as protective prognostic genes, whereas TIMP1 and LZTS3, with HR>1, were identified as risky prognostic genes. The regression coefficient for each gene was also generated, and the survival risk score was calculated as follows: risk score=(0.3259×expression level of TIMP1)+(−0.2607×expression level of AXIN2)+(−0.1289×expression level of CXCL1)+(0.4504×expression level of LZTS3)+(−0.0619×expression level of ITLN1)+(−0.7526×expression level of CPT2)+(−0.4304×expression level of CLDN23). 
The 174 patients with risk scores higher than the median risk score (1.0048) were assigned to the high-risk group, whereas the remaining 175 patients were assigned to the low-risk group (Figure 6B). The Kaplan-Meier survival analysis showed that patients in the high-risk group had shorter survival time and more deaths than patients in the low-risk group (log-rank test P<0.0001), suggesting that the expression levels of these 7 genes could effectively distinguish high-risk from low-risk colon cancer patients (Figure 6C). The AUC of the time-dependent ROC curve was 0.738, 0.769, and 0.851 for 1-year, 3-year, and 5-year OS, respectively, indicating good prediction accuracy of this prognostic gene signature (Figure 6D). The nomogram for survival time prediction of CRC patients is shown in Figure 8.
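The risk score and the median split reported above can be written out as a short sketch. The coefficients and the cutoff of 1.0048 are the values stated in the text; the function names and the example expression profiles are hypothetical. The cutoff is the median of this particular cohort, so it would not necessarily carry over to expression values produced by a different pipeline.

```python
# Regression coefficients of the 7-gene signature, as reported in the text
COEFFICIENTS = {
    "TIMP1": 0.3259,
    "AXIN2": -0.2607,
    "CXCL1": -0.1289,
    "LZTS3": 0.4504,
    "ITLN1": -0.0619,
    "CPT2": -0.7526,
    "CLDN23": -0.4304,
}
MEDIAN_RISK_SCORE = 1.0048  # cohort median reported in the text

def risk_score(expression):
    """Sum of coefficient times expression level over the 7 signature genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())

def risk_group(score, cutoff=MEDIAN_RISK_SCORE):
    """Scores above the cutoff are high-risk; all others are low-risk."""
    return "high" if score > cutoff else "low"

# Hypothetical expression profiles, for illustration only
baseline = {gene: 0.0 for gene in COEFFICIENTS}
timp1_high = dict(baseline, TIMP1=10.0)
score_baseline = risk_score(baseline)
score_timp1_high = risk_score(timp1_high)
```

Because TIMP1 and LZTS3 carry positive coefficients, higher expression of either gene raises the score, while higher expression of the 5 protective genes lowers it.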

Figure 5

Gene selection through the least absolute shrinkage and selection operator (LASSO) Cox regression model. (A) The heatmap of the 22 differentially expressed genes (DEGs) identified by the LASSO Cox regression model. (B) Ten-fold cross-validation for tuning parameter (λ) selection in the LASSO Cox regression model. The vertical lines were drawn at the optimal values by the minimum criteria and the 1-SE criteria. (C) The LASSO coefficient profiles of the 101 DEGs.

Figure 6

Construction of the 7-gene signature with prognostic value. (A) The forest plot of the 7 genes identified by the multivariate Cox regression analysis. (B) The characteristics of the patients ordered by their risk score. Dotted line: the median risk score (1.0048). From top to bottom: the risk score, the distribution of patients’ survival status and the heatmap of the 7 genes for patients in the low- and high-risk groups. (C) The Kaplan-Meier survival curve for patients in the low- and high-risk groups. (D) The time-dependent receiver operating characteristic (ROC) curve for predicting overall survival (OS) in colorectal cancer (CRC) patients by the risk score.

Figure 7

The expression level distribution of the 7 genes. (A) The expression level of the 7 genes between the low-risk and high-risk groups. (B) The expression level of the 7 genes between the normal and tumor groups. (C) The correlation of CXCL1 expression with pathological stage. (D) The correlation of CPT2 expression with pathological stage.

Discussion

Integrated bioinformatics analysis of CRC gene expression profiles and construction of gene signatures associated with CRC prognosis have attracted extensive attention recently. For example, Sun et al. identified 352 overlapping DEGs in 5 GEO datasets that together included 207 CRC and matched normal samples, and proposed a 5-gene prognostic signature using Cox regression models [40]. Chen et al. detected a 7-gene signature that can predict OS of CRC patients by employing Cox regression analysis combined with a robust likelihood-based survival modeling approach [11]. Xiong et al. extracted expression data of mRNAs, miRNAs, and lncRNAs in TCGA, and built a multi-RNA-based classifier for CRC patient stratification by Cox survival analysis and LASSO regression [41]. Dai et al. also used LASSO Cox regression modeling and developed a robust 15-mRNA prognostic signature from GSE39582 for predicting early relapse in stage I–III colon cancer patients [42]. In the present study, we used the raw data of 6 whole genome platform-based microarray datasets with paired tumor and noncancerous samples and normalized each of them to make these data more comparable. We also applied the RRA approach to integrate the shared DEGs across the 6 datasets, which makes the results more reliable than simply intersecting the DEGs of different expression profiles. Moreover, to detect significantly changed biological functions in CRC, we performed GSEA for each CRC dataset and considered the pathways found in at least 4 datasets. Ultimately, we integrated univariate, LASSO and multivariate Cox regression models to identify key prognostic genes in CRC patients. In the current study, we detected 990 common DEGs between 261 CRC and matched normal tissues in 6 microarray datasets, 885 of which were validated through TCGA. When conducting the GSEA, we identified 24 significantly dysregulated biological pathways in CRC. 
The univariate and LASSO Cox regression models selected 22 survival-related genes, and a 7-gene signature with prognostic value in CRC was finally established by the multivariate Cox analysis. The 7-gene prognostic signature consisted of 2 risky prognostic genes (TIMP1 and LZTS3) and 5 protective prognostic genes (AXIN2, CXCL1, ITLN1, CPT2, and CLDN23). Among them, TIMP1, AXIN2, CXCL1 and LZTS3 were upregulated, whereas ITLN1, CPT2, and CLDN23 were downregulated in CRC compared with normal groups according to our bioinformatics analysis. For the 2 risky prognostic genes, the prognostic value of TIMP1 in CRC has been confirmed in earlier work, while that of LZTS3 has not. TIMP-1 is one of the natural endogenous human inhibitors of matrix metalloproteinases (MMPs). It is acknowledged that MMPs, a group of proteolytic enzymes, play an important role in the degradation of extracellular matrix (ECM) components, which is critical for tumor growth, invasion and metastasis [43]. In addition to its function as an inhibitor of MMPs, TIMP-1 can stimulate cell proliferation, induce anti-apoptotic signaling and influence angiogenesis in an MMP-independent manner [44-47]. Increasing evidence, especially from meta-analysis, has shown that TIMP-1 has potential diagnostic and prognostic value in CRC, and elevated TIMP-1 may predict shorter OS among patients with no systemic inflammatory response [48-54]. Consistent with these reports, our study also found that TIMP1 is upregulated in CRC patients and serves as a risky prognostic gene. Members of the leucine zipper tumor suppressor (LZTS) protein family are thought to play roles in cell growth modulation [55]. A previous in silico study reported that LZTS3, a member of this protein family, served as a potential tumor suppressor [55]. A recent study showed that highly expressed miR-1275 could promote proliferation and metastasis of non-small cell lung cancer through targeting LZTS3 [56]. 
However, much less is known about the function of LZTS3 in CRC. Regarding the 5 protective prognostic genes, the prognostic value of CXCL1, ITLN1, CPT2 and CLDN23 in CRC has been reported, while that of AXIN2 has not been fully elucidated. CXCL1, a chemotactic cytokine, is involved in cancer progression and invasion [57]. Highly elevated CXCL1 expression is found in CRC, promoting tumorigenicity, progression and metastasis [57-61], and higher CXCL1 expression is correlated with larger tumor size and later tumor stage [57]. Recent studies showed that CXCL1 serves as an independent adverse prognostic biomarker in CRC patients, and it might be a novel biomarker and potential therapeutic target for CRC treatment [57,61]. In contrast, our results showed that higher CXCL1 expression correlates with lower tumor stage in CRC and that a high level of CXCL1 predicts better outcome in CRC. The difference may derive from population heterogeneity and small sample size. Because evidence on the prognostic value of CXCL1 in patients with CRC is limited, large-scale multi-center clinical studies are needed. Intelectin-1 (also known as omentin-1), encoded by the ITLN1 gene, is reported to be a protein that possesses metabolic, inflammatory, and immune-related properties, and might thereby be correlated with CRC risk [62-66]. A previous study reported that high intelectin-1 expression is closely associated with favorable prognosis in gastric cancer patients [67]. As for CRC, our findings identified ITLN1 as a protective prognostic gene. Likewise, Kim et al. reported that intelectin-1 predicts better prognosis in stage IV CRC [68]. These findings support a role for ITLN1 as a potential tumor suppressor in gastrointestinal cancers. Conversely, a prospective cohort study reported that higher circulating intelectin-1 concentrations were related to a higher CRC risk [62]. 
Because it remains unclear whether ITLN1 acts as a tumor suppressor or promoter in colorectal carcinogenesis [69], the prognostic value of ITLN1 deserves deeper investigation. CPT2, the key enzyme in fatty acid oxidation, is located on the mitochondrial membrane [70]. Consistent with our study, decreased expression of CPT2 has been detected in CRC tissue [70,71], and higher expression of CPT2 in cancer tissue is an independent prognostic factor that predicts better prognosis in CRC patients [70]. The CLDN23 gene encodes a member of the claudin family, and claudins are known to be crucial in cancer growth and progression [72,73]. It has been reported that CLDN23 expression is significantly reduced in CRC tissue and that lower expression of this gene correlates with shorter OS in CRC patients [74-76], which is consistent with our finding that CLDN23 could serve as a protective prognostic factor. Furthermore, CLDN23 expression has been shown to be epigenetically regulated, and disruption of bivalent histone modifications at the CLDN23 locus probably results in remarkably reduced CLDN23 expression in CRC tissue [74]. As for the AXIN2 gene, both germline and somatic mutations in this gene have been found in CRC [77]. The AXIN2 protein, acting as an essential scaffold that helps assemble the β-catenin destruction complex, negatively regulates β-catenin-dependent Wnt signaling, the well-known pathway that is critical in the initiation and progression of CRC and is characterized by the accumulation of genetic and epigenetic changes [12,77,78]. Meanwhile, AXIN2 is a transcriptional target of β-catenin-dependent Wnt signaling [79-81], and highly expressed AXIN2 is found in malignancies with activating Wnt pathway mutations [77]. Given that AXIN2 is not only a β-catenin downstream target but also a key negative feedback regulator of Wnt signaling that induces β-catenin degradation, AXIN2 has long been hypothesized to be a potential tumor suppressor [77,82]. 
However, the prognostic value of AXIN2 in CRC has hardly been reported. In the current study, the gene expression data we used for the integrative analysis were generated by different institutions and accessed from publicly available databases, so we cannot guarantee the quality of these data. Furthermore, the influence of detailed features such as age, gender, race, tumor grade and stage on gene expression patterns was not considered, because our study focused solely on genes consistently identified as significantly altered across different studies; some biological information may therefore have been overlooked. Finally, given that our findings came from comprehensive in silico research, additional results from biological experiments and large-scale multi-center clinical studies will be pivotal for supporting our findings.

Conclusions

In conclusion, we identified 7 potential prognostic biomarkers for CRC through an integrative analysis of microarray and RNA sequencing gene expression profiles. Our findings provide further evidence for applying novel diagnostic and prognostic biomarkers in clinical practice to facilitate the personalized treatment of CRC. Because our study was based solely on data analysis, further biological experiments and large-scale multi-center clinical studies are required to validate our results.
