Benjiao Gong, Yanlei Kao, Chenglin Zhang, Fudong Sun, Zhaohua Gong, Jian Chen.
Abstract
The high mortality of colorectal cancer (CRC) patients and the limitations of conventional tumor-node-metastasis (TNM) staging underscore the need to identify hub genes closely related to carcinogenesis and prognosis in CRC. This study aimed to identify such hub genes. We identified and validated 212 differentially expressed genes (DEGs) from six Gene Expression Omnibus (GEO) datasets and The Cancer Genome Atlas (TCGA) database, and performed functional enrichment analysis on them. A protein-protein interaction (PPI) network was constructed, and hub modules and genes in CRC carcinogenesis were extracted. A prognostic signature was developed and validated using Cox proportional hazards regression analysis. The DEGs mainly regulated biological processes including response to stimulus and metabolic process, and affected molecular functions including protein binding and catalytic activity. The DEGs played important roles in CRC-related pathways involved in preneoplastic lesions, carcinogenesis, metastasis, and poor prognosis. The hub genes closely related to CRC carcinogenesis comprised six genes in module 1 (CXCL1, CXCL3, CXCL8, CXCL11, NMU, and PPBP) and, in module 2, two genes (SLC26A3 and SLC30A10) together with metallothioneins (MTs). Among them, CXCL8 was also related to prognosis. An eight-gene signature was proposed comprising AMH, WBSCR28, SFTA2, MYH2, POU4F1, SIX4, PGPEP1L, and PAX5. The study identified hub genes in CRC carcinogenesis and proposed an eight-gene signature with good reproducibility and robustness at the molecular level for CRC, which might help guide treatment selection and survival prediction.
Year: 2020 PMID: 32351322 PMCID: PMC7171686 DOI: 10.1155/2020/5934821
Source DB: PubMed Journal: Mediators Inflamm ISSN: 0962-9351 Impact factor: 4.711
Information for six GEO datasets in the study.
| Dataset | Platform | Number of samples (tumor/control) |
|---|---|---|
| GSE21510 | [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 148 (104/44) |
| GSE24514 | [HG-U133A] Affymetrix Human Genome U133A Array | 49 (34/15) |
| GSE32323 | [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 44 (22/22) |
| GSE89076 | Agilent-039494 SurePrint G3 Human GE v2 8x60K Microarray 039381 | 80 (41/39) |
| GSE110225 | [HG-U133A] Affymetrix Human Genome U133A Array; [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 60 (30/30) |
| GSE113513 | [PrimeView] Affymetrix Human Gene Expression Array | 28 (14/14) |
Figure 1. DEG identification from GEO and validation from TCGA. (a) The top 20 up- and downregulated genes in six GEO datasets based on the RRA package. (b) Overlapping DEGs between the GEO and TCGA databases.
Figure 2. Functional enrichment analysis for DEGs. (a) The top 10 terms of biological process enrichment for up- and downregulated DEGs. (b) The top 10 terms of molecular function enrichment for up- and downregulated DEGs. (c) The top 10 terms of Reactome pathway enrichment for upregulated DEGs. (d) The top 10 terms of Reactome pathway enrichment for downregulated DEGs.
Figure 3. Construction of PPI network and module analysis. (a) The PPI network with red nodes for upregulated genes and green nodes for downregulated genes. (b) Module 1 of the PPI network. (c) Module 2 of the PPI network. (d) Reactome pathway enrichment for module 1. (e) Reactome pathway enrichment for module 2. (f) Survival curve of CXCL8. (g) Survival curve of CXCL13. (h) Survival curve of CLCA1.
Figure 4. LASSO regression analysis for the train group. (a) LASSO coefficient profiles of prognostic genes with P < 0.001. (b) Selection of the optimal value of lambda via 10-fold cross-validation.
Prognostic information for the eight genes in the train group.
| Gene symbol | Univariate HR (95% CI) | Univariate P value | Multivariate HR (95% CI) | Multivariate P value | Coefficient |
|---|---|---|---|---|---|
| AMH | 1.001 (1.000-1.02) | 0.000297 | 1.001 (1.000-1.001) | 0.011546 | 0.000842 |
| WBSCR28 | 1.022 (1.010-1.033) | 0.000139 | 1.012 (0.999-1.026) | 0.080719 | 0.012188 |
| SFTA2 | 1.001 (1.001-1.002) | 1.61 | 1.001 (1.001-1.002) | 0.000137 | 0.001245 |
| MYH2 | 1.061 (1.029-1.095) | 0.000162 | 1.067 (1.027-1.108) | 0.00076 | 0.064845 |
| POU4F1 | 1.005 (1.003-1.008) | 5.65 | 1.004 (1.002-1.007) | 0.002323 | 0.004278 |
| SIX4 | 1.003 (1.002-1.004) | 6.33 | 1.003 (1.002-1.005) | 1.79 | 0.003124 |
| PGPEP1L | 1.061 (1.032-1.090) | 2.46 | 1.070 (1.038-1.103) | 1.43 | 0.067637 |
| PAX5 | 1.001 (1.000-1.001) | 1.53 | 1.001 (1.000-1.001) | 0.000106 | 0.000774 |
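
The multivariate coefficients in the table above can be turned into a per-patient risk score. The following is a minimal sketch, assuming the score is the usual Cox linear predictor (the sum of coefficient times expression value over the eight genes) and that patients are split into high- and low-risk groups at the median score. Neither the exact formula nor the cutoff is stated in this record, and the function names and the expression values in the example are hypothetical.

```python
import math
import statistics

# Multivariate Cox coefficients copied from the table above.
COEFFICIENTS = {
    "AMH": 0.000842,
    "WBSCR28": 0.012188,
    "SFTA2": 0.001245,
    "MYH2": 0.064845,
    "POU4F1": 0.004278,
    "SIX4": 0.003124,
    "PGPEP1L": 0.067637,
    "PAX5": 0.000774,
}


def risk_score(expression):
    """Assumed formula: sum of coefficient * expression over the eight genes."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def hazard_ratio(gene):
    """Hazard ratio per unit of expression, exp(coefficient)."""
    return math.exp(COEFFICIENTS[gene])


def split_by_median(scores):
    """Label each score 'high' or 'low' against the median (assumed cutoff)."""
    median = statistics.median(scores)
    return ["high" if score > median else "low" for score in scores]


if __name__ == "__main__":
    # Hypothetical expression values for two patients (not data from the study).
    patients = [
        {gene: 1.0 for gene in COEFFICIENTS},
        {gene: 5.0 for gene in COEFFICIENTS},
    ]
    scores = [risk_score(p) for p in patients]
    for score, group in zip(scores, split_by_median(scores)):
        print(f"risk score {score:.6f} -> {group} risk")
    print(f"MYH2 hazard ratio: {hazard_ratio('MYH2'):.3f}")
```

Exponentiating a coefficient reproduces the multivariate hazard ratio listed for that gene, for example 1.067 for MYH2 and 1.070 for PGPEP1L, which is a quick consistency check on the table.
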
Figure 5. The evaluation and confirmation of the eight-gene signature. (a) The risk score distribution for the train group. (b) The risk score distribution for the test group. (c) The survival time statistic for the train group. (d) The survival time statistic for the test group. (e) Survival curve for the train group. (f) Survival curve for the test group. (g) ROC curve for the train group. (h) ROC curve for the test group. (i) Gene expression pattern for the train group. (j) Gene expression pattern for the test group.