Xinkui Liu, Zhitong Bing, Jiarui Wu, Jingyuan Zhang, Wei Zhou, Mengwei Ni, Ziqi Meng, Shuyu Liu, Jinhui Tian, Xiaomeng Zhang, Yingfei Li, Shanshan Jia, Siyu Guo.
Abstract
BACKGROUND Despite noteworthy advances in the multidisciplinary treatment of colorectal cancer (CRC) and a deeper understanding of its molecular mechanisms, many CRC patients with histologically identical tumors differ in treatment response and prognosis. Thus, more evidence on novel predictive and prognostic biomarkers for CRC is urgently needed. This study aims to identify potential prognostic biomarkers for CRC through integrative gene expression profiling analysis.

MATERIAL AND METHODS Differential expression analysis of paired CRC and adjacent normal tissue samples was performed independently in each of 6 microarray datasets, and the 6 datasets were integrated by the robust rank aggregation method to detect consistent differentially expressed genes (DEGs). Aberrant expression patterns of these genes were further validated in RNA sequencing data. Then, gene set enrichment analysis (GSEA) was performed to investigate significantly dysregulated biological functions in CRC. Finally, univariate, LASSO, and multivariate Cox regression models were built to identify key prognostic genes in CRC patients.

RESULTS A total of 990 DEGs (495 downregulated and 495 upregulated genes) were obtained from the integrated analysis of the 6 microarray datasets, and 4131 DEGs (2050 downregulated and 2081 upregulated genes) were obtained from the RNA sequencing dataset. These DEGs were then intersected, and 885 consistent DEGs were finally identified, including 458 downregulated and 427 upregulated genes. Two risky prognostic genes (TIMP1 and LZTS3) and 5 protective prognostic genes (AXIN2, CXCL1, ITLN1, CPT2, and CLDN23) were identified, all significantly associated with the prognosis of CRC.

CONCLUSIONS The 7 genes identified here provide further evidence to support applying novel diagnostic and prognostic biomarkers in clinical practice to facilitate personalized treatment of CRC.
Year: 2020 PMID: 31893510 PMCID: PMC6977628 DOI: 10.12659/MSM.918906
Source DB: PubMed Journal: Med Sci Monit ISSN: 1234-1010
Clinical information for the included 349 patients.
| Characteristics | Number of cases (%) |
|---|---|
| Gender | |
| Male | 189 (54.2) |
| Female | 160 (45.8) |
| Age | |
| ≤60 | 105 (30.1) |
| >60 | 244 (69.9) |
| TNM stage | |
| Stage I | 62 (17.8) |
| Stage II | 138 (39.5) |
| Stage III | 99 (28.4) |
| Stage IV | 50 (14.3) |
| T stage | |
| Tis | 1 (0.3) |
| T1 | 7 (2.0) |
| T2 | 62 (17.8) |
| T3 | 242 (69.3) |
| T4 | 37 (10.6) |
| M stage | |
| M0 | 266 (76.2) |
| M1 | 50 (14.3) |
| MX | 31 (8.9) |
| Not reported | 2 (0.6) |
| N stage | |
| N0 | 207 (59.3) |
| N1 | 83 (23.8) |
| N2 | 59 (16.9) |
| Vital status | |
| Alive | 279 (79.9) |
| Dead | 70 (20.1) |
Figure 1. Identification of differentially expressed genes (DEGs). (A) The heatmap of the top 20 downregulated and upregulated DEGs identified by the integrated analysis of the 6 microarray datasets. Each column represents 1 dataset and each row represents 1 gene. The number in each rectangle is the log2FC value. The color gradient from blue to red represents the change from downregulation to upregulation. (B) The heatmap of the 4131 DEGs in The Cancer Genome Atlas (TCGA) colorectal cancer (CRC) dataset. Each column represents 1 sample and each row represents 1 gene. The color gradient from green to red represents the change from downregulation to upregulation. (C) The Venn diagram of the DEGs shared between the integrated Gene Expression Omnibus (GEO) dataset and the TCGA CRC dataset. (D) The heatmap of the 885 consistent DEGs (using the TCGA dataset). Each column represents 1 sample and each row represents 1 gene. The color gradient from green to red represents the change from downregulation to upregulation.
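The intersection step behind panel (C) can be illustrated with a minimal Python sketch. It assumes that "consistent" means a gene is called differentially expressed in both sources and changes in the same direction; the record does not spell out this rule, and the function name and placeholder gene labels below are hypothetical.

```python
def consistent_degs(microarray_calls, rnaseq_calls):
    """Return genes called in both sources with the same direction.

    Each argument maps a gene symbol to "up" or "down".
    Assumption: agreement in direction is what "consistent" means here.
    """
    shared = microarray_calls.keys() & rnaseq_calls.keys()
    return {
        gene: microarray_calls[gene]
        for gene in shared
        if microarray_calls[gene] == rnaseq_calls[gene]
    }


# Placeholder labels only, not genes or results from the study.
geo = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up"}
tcga = {"GENE_A": "up", "GENE_B": "up", "GENE_D": "down"}
print(consistent_degs(geo, tcga))
```

GENE_B is dropped because the two sources disagree on direction, and GENE_C and GENE_D are dropped because each appears in only one source.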
Figure 2. Volcano plots of the genes in the 7 datasets. (A) The volcano plot of the genes in GSE21510. (B) The volcano plot of the genes in GSE22598. (C) The volcano plot of the genes in GSE37182. (D) The volcano plot of the genes in GSE39582. (E) The volcano plot of the genes in GSE44076. (F) The volcano plot of the genes in GSE89076. (G) The volcano plot of the genes in The Cancer Genome Atlas (TCGA) dataset. Red dots represent genes with adjusted P<0.05 and log2FC >1, and green dots represent genes with adjusted P<0.05 and log2FC <−1.
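The coloring rule in this caption amounts to a simple threshold test. The sketch below applies the stated cutoffs (adjusted P<0.05 and log2FC above 1 or below −1); the function name, labels, and default parameters are illustrative choices rather than anything taken from the study's code.

```python
def classify_gene(log2fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene using the volcano-plot thresholds from the caption.

    Returns "up" (red dot), "down" (green dot), or "ns" (not significant).
    """
    if adj_p < p_cutoff and log2fc > fc_cutoff:
        return "up"
    if adj_p < p_cutoff and log2fc < -fc_cutoff:
        return "down"
    return "ns"


# Made-up values for illustration only.
print(classify_gene(2.3, 0.001))
print(classify_gene(-1.7, 0.01))
print(classify_gene(2.3, 0.20))
```

Both inequalities are strict, matching the caption, so a gene with log2FC of exactly 1 is not labeled as upregulated.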
Figure 3. The gene set enrichment analysis (GSEA) for the 7 colorectal cancer (CRC) datasets. (A) The GSEA for GSE21510. (B) The GSEA for GSE22598. (C) The GSEA for GSE37182. (D) The GSEA for GSE39582. (E) The GSEA for GSE44076. (F) The GSEA for GSE89076. (G) The GSEA for The Cancer Genome Atlas (TCGA) dataset. The top 20 suppressed and activated Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in each dataset are shown. The y-axis shows the KEGG pathway terms, and the x-axis shows the gene ratio of each term.
Figure 4. The enrichment plot of the gene set enrichment analysis (GSEA) (using GSE89076). (A) The enrichment plot of 9 suppressed pathways. (B) The enrichment plot of 5 activated pathways. (C) The UpSet plot for the GSEA.
Figure 5. Gene selection through the least absolute shrinkage and selection operator (LASSO) Cox regression model. (A) The heatmap of the 22 differentially expressed genes (DEGs) identified by the LASSO Cox regression model. (B) Ten-fold cross-validation for tuning parameter (λ) selection in the LASSO Cox regression model. The vertical lines are drawn at the optimal values by the minimum criteria and the 1-SE criteria. (C) The LASSO coefficient profiles of the 101 DEGs.
Figure 6. Construction of the 7-gene signature with prognostic value. (A) The forest plot of the 7 genes identified by the multivariate Cox regression analysis. (B) The characteristics of the patients ordered by their risk score. Dotted line: the median risk score (1.0048). From top to bottom: the risk score, the distribution of patients' survival status, and the heatmap of the 7 genes for patients in the low- and high-risk groups. (C) The Kaplan-Meier survival curve for patients in the low- and high-risk groups. (D) The time-dependent receiver operating characteristic (ROC) curve for predicting overall survival (OS) in colorectal cancer (CRC) patients by the risk score.
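The median split in panel (B) can be sketched as follows. The scoring formula is an assumption: the sketch uses the common form in which each patient's risk score is the sum of gene expression values weighted by Cox regression coefficients, which this record does not confirm. The coefficients and expression values are invented for illustration, and only the gene symbols and their risky or protective direction come from the abstract. Assigning patients at exactly the median to the low-risk group is also an arbitrary choice.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Weighted sum of expression values per patient (assumed formula)."""
    return {
        patient: sum(coefficients[g] * values[g] for g in coefficients)
        for patient, values in expression.items()
    }


def split_by_median(scores):
    """Assign "high" to scores above the median, "low" otherwise."""
    cutoff = median(scores.values())
    groups = {p: ("high" if s > cutoff else "low") for p, s in scores.items()}
    return cutoff, groups


# Hypothetical coefficients: positive for a risky gene, negative for a protective one.
coefficients = {"TIMP1": 0.5, "AXIN2": -0.25}
# Hypothetical expression values for three made-up patients.
expression = {
    "P1": {"TIMP1": 2.0, "AXIN2": 4.0},
    "P2": {"TIMP1": 4.0, "AXIN2": 2.0},
    "P3": {"TIMP1": 3.0, "AXIN2": 2.0},
}

scores = risk_scores(expression, coefficients)
cutoff, groups = split_by_median(scores)
print(cutoff, groups)
```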
Figure 7. The expression level distribution of the 7 genes. (A) The expression level of the 7 genes between the low-risk and high-risk groups. (B) The expression level of the 7 genes between the normal and tumor groups. (C) The correlation of CXCL1 expression with pathological stage. (D) The correlation of CPT2 expression with pathological stage.