Yida Pan, Hongyang Zhang, Mingming Zhang, Jie Zhu, Jianghong Yu, Bangting Wang, Jigang Qiu, Jun Zhang.
Abstract
Colorectal cancer (CRC) is one of the most frequently occurring malignancies worldwide. The outcomes of patients with similar clinical symptoms or at similar pathological stages remain unpredictable. This clinical diversity is most likely due to genetic heterogeneity. The present study aimed to create a prediction tool to evaluate patient survival based on genetic profile. First, three Gene Expression Omnibus (GEO) datasets (GSE9348, GSE44076 and GSE44861) were used to identify and validate differentially expressed genes (DEGs) in CRC. The GSE14333 dataset, which contains survival information, was then used to screen and verify prognosis-associated genes. Of the 66 DEGs, 46 were identified as biomarkers closely associated with patient overall survival. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis showed that these genes participate in multiple biological processes highly associated with cancer proliferation, drug resistance and metastasis, and thereby affect patient survival. The five most important genes, MET proto-oncogene, receptor tyrosine kinase (MET), carboxypeptidase M (CPM), serine hydroxymethyltransferase 2 (SHMT2), guanylate cyclase activator 2B (GUCA2B) and sodium voltage-gated channel alpha subunit 9 (SCN9A), were selected by a random survival forests algorithm and combined into a linear risk score formula by multivariable Cox regression. Finally, the risk score was tested and verified in three independent GEO datasets (GSE14333, GSE17536 and GSE29621); patients with a high risk score had lower overall survival (P<0.05). Furthermore, in the model the risk score was the most significant predictor compared with other factors, including age and American Joint Committee on Cancer stage, and was able to predict patient survival independently and directly. The findings suggest that this risk score, based on survival-associated DEGs, is a powerful and accurate prognostic tool with promise for implementation in a clinical setting.
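The linear risk score described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' formula: the coefficients are hypothetical placeholders (the abstract does not report the fitted Cox coefficients), the expression values are made up, and the median cutoff used to split patients into low- and high-risk groups is an assumption.

```python
from statistics import median

# Hypothetical coefficients for illustration only; the fitted multivariable
# Cox coefficients are not given in the abstract.
COEFFICIENTS = {
    "MET": 0.4,
    "CPM": -0.3,
    "SHMT2": -0.2,
    "GUCA2B": 0.1,
    "SCN9A": -0.5,
}


def risk_score(expression):
    """Linear risk score: sum of coefficient * expression over the five genes."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the cohort median
    (the cutoff rule is an assumption, not taken from the source)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Made-up expression values for three patients.
patients = [
    {"MET": 8.0, "CPM": 6.0, "SHMT2": 9.0, "GUCA2B": 5.0, "SCN9A": 4.0},
    {"MET": 10.0, "CPM": 4.0, "SHMT2": 8.0, "GUCA2B": 7.0, "SCN9A": 3.0},
    {"MET": 7.0, "CPM": 7.0, "SHMT2": 10.0, "GUCA2B": 4.0, "SCN9A": 6.0},
]
scores = [risk_score(p) for p in patients]
labels = split_by_median(scores)
```

With real data, the coefficients would come from the fitted multivariable Cox model and the resulting groups would then be compared by survival analysis.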
Keywords: DEG; colorectal cancer; microarray; overall survival; risk score
Year: 2017 PMID: 29344121 PMCID: PMC5754913 DOI: 10.3892/ol.2017.7097
Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967
Figure 1. DEGs in colorectal cancer. (A) Heatmap of the expression of 66 DEGs in cancerous and non-cancerous tissue in GSE9348. More detailed information can be obtained by contacting the corresponding author. (B) Immunohistochemistry (IHC) images of SCN9A, UGP2 and CWH43, as downregulated DEGs, obtained from the Human Protein Atlas (HPA) database. (C) IHC images of MET, MYC and SHMT2, as upregulated DEGs, from the HPA. (D) ROC curves of the three linear classifiers CC, DLDA and SVM in the training set GSE9348. FPR, false positive rate; TPR, true positive rate; AUC, area under the curve; DEGs, differentially expressed genes; T, tumor; N, normal.
Survival-related DEGs identified by univariable Cox proportional hazards regression analysis.
| Gene | P-value | HR | Gene | P-value | HR |
|---|---|---|---|---|---|
| LOC339166 | <1e-07 | 7.748 | MYC | 7E-07 | 0.581 |
| SCN9A | <1e-07 | 0.154 | SQRDL | 7E-07 | 0.513 |
| LGI1 | <1e-07 | 0.115 | SHMT2 | 0.000001 | 0.509 |
| P2RY1 | <1e-07 | 3.592 | PDE6A | 2.1E-06 | 2.229 |
| PRPF4 | <1e-07 | 0.245 | UGDH | 2.3E-06 | 1.792 |
| GUCA2B | <1e-07 | 1.688 | PTPRH | 2.5E-06 | 1.733 |
| ENOX2 | <1e-07 | 0.193 | PPP2R3A | 8.4E-06 | 2.19 |
| NPY | <1e-07 | 4.787 | HSPH1 | 2.62E-05 | 1.61 |
| SCGN | <1e-07 | 2.266 | NR5A2 | 3.16E-05 | 0.585 |
| TMEM9B | <1e-07 | 3.445 | TRIP13 | 3.21E-05 | 0.631 |
| RNASEH2A | <1e-07 | 0.438 | CPM | 6.06E-05 | 0.498 |
| HSD11B2 | <1e-07 | 0.647 | DUSP14 | 0.000183 | 0.54 |
| DENND2A | <1e-07 | 0.299 | RCL1 | 0.000274 | 0.415 |
| ASPA | <1e-07 | 3.507 | ETV4 | 0.000396 | 0.672 |
| CA7 | <1e-07 | 2.626 | SEMA6D | 0.000472 | 1.9 |
| LPHN3 | <1e-07 | 0.247 | HOMER1 | 0.000475 | 0.666 |
| ABCG2 | <1e-07 | 1.497 | CCND1 | 0.000522 | 1.584 |
| GALNT6 | <1e-07 | 0.588 | METTL7A | 0.000543 | 2.012 |
| PTGDR | <1e-07 | 0.336 | MET | 0.000577 | 1.528 |
| TST | <1e-07 | 0.497 | CWH43 | 0.0006 | 0.699 |
| SMPDL3A | 1E-07 | 0.428 | DHRS11 | 0.000607 | 0.748 |
| HSD17B11 | 1E-07 | 2.087 | UGP2 | 0.000701 | 1.977 |
| ETFDH | 3E-07 | 0.549 | SLC22A18AS | 0.000812 | 0.558 |
HR, hazard ratio.
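As a reading aid for the table above: in a Cox proportional hazards model the hazard ratio is the exponential of the regression coefficient, so HR > 1 corresponds to a positive coefficient and HR < 1 to a negative one. The sketch below applies this to a few rows copied from the table. It assumes each HR is expressed per unit increase in the gene's expression value, which the table does not state.

```python
import math

# A few (gene, HR) pairs copied from the table above.
hazard_ratios = {
    "SCN9A": 0.154,
    "GUCA2B": 1.688,
    "SHMT2": 0.509,
    "CPM": 0.498,
    "MET": 1.528,
}


def cox_coefficient(hr):
    """Cox regression coefficient (log hazard ratio) for a given HR."""
    return math.log(hr)


def direction(hr):
    """Describe the association implied by the HR, assuming it is reported
    per unit increase in expression (an assumption, not stated in the table)."""
    return "higher hazard" if hr > 1 else "lower hazard"


for gene, hr in hazard_ratios.items():
    print(f"{gene}: beta = {cox_coefficient(hr):+.3f} ({direction(hr)})")
```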
GO analysis and KEGG pathway analysis of 46 survival-related DEGs (partial data).
| Genes | Hyp | Corrected Hyp | Annotations |
|---|---|---|---|
| Biological process | |||
| 5 | 4.7E-05 | 0.00408 | GO:0042493: Response to drug (BP) |
| 4 | 0.00079 | 0.02956 | GO:0008152: Metabolic process (BP) |
| 3 | 0.00804 | 0.03142 | GO:0008283: Cell proliferation (BP) |
| 3 | 0.00769 | 0.03249 | GO:0007411: Axon guidance (BP) |
| 3 | 0.01192 | 0.0359 | GO:0008284: Positive regulation of cell proliferation (BP) |
| 3 | 0.02362 | 0.04835 | GO:0045893: Positive regulation of transcription, DNA-dependent (BP) |
| Molecular function | |||
| 13 | 0.00391 | 0.02429 | GO:0005515: Protein binding (MF) |
| 7 | 1.6E-06 | 0.00019 | GO:0016491: Oxidoreductase activity (MF) |
| 7 | 0.01988 | 0.04888 | GO:0000166: Nucleotide binding (MF) |
| 6 | 0.01607 | 0.0431 | GO:0004872: Receptor activity (MF) |
| 5 | 0.00871 | 0.03213 | GO:0016787: Hydrolase activity (MF) |
| 4 | 0.00865 | 0.03294 | GO:0016740: Transferase activity (MF) |
| 4 | 0.01535 | 0.04312 | GO:0004930: G-protein coupled receptor activity (MF) |
| Cellular component | |||
| 15 | 0.00234 | 0.03334 | GO:0005737: Cytoplasm (CC) |
| 13 | 0.00169 | 0.03219 | GO:0016020: Membrane (CC) |
| 11 | 0.00562 | 0.03205 | GO:0005886: Plasma membrane (CC) |
| 9 | 0.00075 | 0.02125 | GO:0005576: Extracellular region (CC) |
| 7 | 0.0028 | 0.02656 | GO:0005730: Nucleolus (CC) |
| 6 | 0.00948 | 0.04156 | GO:0005739: Mitochondrion (CC) |
| 5 | 0.00357 | 0.02911 | GO:0005615: Extracellular space (CC) |
| 4 | 0.00072 | 0.04092 | GO:0005743: Mitochondrial inner membrane (CC) |
| 3 | 0.00236 | 0.02696 | GO:0005759: Mitochondrial matrix (CC) |
| KEGG pathway | |||
| 3 | 0.0089 | 0.0411 | (KEGG) 05200: Pathways in cancer |
| 2 | 0.00215 | 0.02152 | (KEGG) 05213: Endometrial cancer |
| 2 | 0.00258 | 0.02211 | (KEGG) 05221: Acute myeloid leukemia |
| 2 | 0.00304 | 0.02282 | (KEGG) 05210: Colorectal cancer |
| 2 | 0.00077 | 0.02304 | (KEGG) 00040: Pentose and glucuronate interconversions |
| 2 | 0.00207 | 0.02485 | (KEGG) 00500: Starch and sucrose metabolism |
| 2 | 0.00419 | 0.02514 | (KEGG) 05220: Chronic myeloid leukemia |
| 2 | 0.00386 | 0.02574 | (KEGG) 05218: Melanoma |
| 2 | 0.00184 | 0.02755 | (KEGG) 00520: Amino sugar and nucleotide sugar metabolism |
| 2 | 0.00141 | 0.02818 | (KEGG) 05219: Bladder cancer |
| 2 | 0.00551 | 0.03004 | (KEGG) 05222: Small cell lung cancer |
| 2 | 0.00063 | 0.03755 | (KEGG) 05216: Thyroid cancer |
| 2 | 0.01148 | 0.04919 | (KEGG) 04110: Cell cycle |
| 2 | 0.01238 | 0.04953 | (KEGG) 04360: Axon guidance |
Partial data: GO terms involving ≥3 genes and KEGG pathways involving ≥2 genes are shown. Genes involved in all KEGG pathways above were MYC and CCND1.
Corrected Hyp, corrected hypergeometric P-value; Hyp, hypergeometric P-value; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; BP, biological process; MF, molecular function; CC, cellular component.
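The Hyp column is a hypergeometric enrichment P-value. A minimal sketch of that calculation follows. The background size and annotation counts are invented for illustration because the table does not report them, so only the shape of the computation is meant to carry over, and no multiple-testing correction is applied here.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement from
    `population` genes, of which `annotated` carry the annotation."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Invented numbers: 20,000 background genes, 300 annotated with a term,
# 46 survival-related genes in the list, 5 of them annotated.
p = hypergeom_pvalue(population=20000, annotated=300, sample=46, hits=5)
print(f"enrichment P = {p:.3g}")
```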
Figure 2. ROC curves of the linear classifiers CC, DLDA and SVM in the validation sets (A) GSE44076 and (B) GSE44861. FPR, false positive rate; TPR, true positive rate; AUC, area under the curve.
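The AUC summarized by these ROC curves can be computed directly from classifier scores. The sketch below uses the rank interpretation of AUC: the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample, with ties counting one half. The scores and labels are made up, and none of the classifiers in the figure is reproduced.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: fraction of (positive, negative) pairs
    in which the positive sample has the higher score; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = tumor (T), 0 = normal (N).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch. A rank-based formulation would be the usual choice for large datasets.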
Multivariable and univariable model tests of risk score and other factors.
| Variables | Multivariable HR | Multivariable 95% CI, lower | Multivariable 95% CI, upper | Multivariable P-value | Univariable HR | Univariable 95% CI, lower | Univariable 95% CI, upper | Univariable P-value |
|---|---|---|---|---|---|---|---|---|
| A, GSE14333 | ||||||||
| Risk score | 2.346 | 1.298 | 4.241 | 0.005 | 2.718 | 1.523 | 4.851 | 0.001 |
| Location | 0.965 | 0.814 | 1.144 | 0.683 | 0.892 | 0.76 | 1.047 | 0.163 |
| Dukes stage | 1.18 | 0.926 | 1.503 | 0.18 | 1.044 | 0.86 | 1.266 | 0.666 |
| Age of diagnosis | 1.008 | 0.994 | 1.023 | 0.257 | 1.105 | 1.002 | 1.028 | 0.02 |
| Sex | 0.926 | 0.683 | 1.255 | 0.62 | 0.877 | 0.651 | 1.182 | 0.39 |
| Adj XRT | 0.463 | 0.218 | 0.984 | 0.045 | 0.433 | 0.212 | 0.884 | 0.021 |
| Adj CTX | 0.867 | 0.568 | 1.325 | 0.51 | 0.847 | 0.618 | 1.16 | 0.3 |
| B, GSE17536 | ||||||||
| Risk score | 2.745 | 1.204 | 6.262 | 0.016 | 3.283 | 1.489 | 7.236 | 0.003 |
| Age | 1.015 | 0.999 | 1.031 | 0.061 | 1.018 | 1.003 | 1.034 | 0.016 |
| Sex | 1.084 | 0.747 | 1.572 | 0.672 | 0.953 | 0.666 | 1.362 | 0.79 |
| Ethnicity | 0.967 | 0.728 | 1.284 | 0.817 | 0.915 | 0.685 | 1.221 | 0.545 |
| AJCC stage | 1.107 | 0.892 | 1.373 | 0.357 | 1.051 | 0.861 | 1.284 | 0.625 |
| Grade | 1.254 | 0.828 | 1.898 | 0.285 | 1.375 | 0.924 | 2.045 | 0.116 |
| C, GSE29621 | ||||||||
| Risk score | 9.03 | 1.425 | 57.223 | 0.019 | 2.526 | 0.481 | 13.269 | 2.73E-05 |
| Sex | 1.243 | 0.513 | 3.014 | 0.63 | 1.508 | 0.649 | 3.505 | 0.34 |
| T stage | 0.449 | 0.091 | 2.209 | 0.325 | 1.048 | 0.438 | 2.509 | 0.915 |
| N stage | 1.583 | 0.604 | 4.143 | 0.35 | 2.688 | 1.526 | 4.734 | 0.001 |
| M stage | 2.065 | 0.368 | 11.592 | 0.41 | 4.934 | 2.188 | 11.124 | 1.19E-04 |
| Histology grade | 0.849 | 0.325 | 2.219 | 0.738 | 0.665 | 0.284 | 1.558 | 0.348 |
| AJCC stage | 1.965 | 0.518 | 7.45 | 0.321 | 2.708 | 1.615 | 4.542 | 1.59E-04 |
HR, hazard ratio; CI, confidence interval; Adj XRT, adjuvant radiation therapy; Adj CTX, adjuvant chemotherapy; AJCC, American Joint Committee on Cancer.
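The HR, confidence interval and P-value columns are linked when the interval is a Wald interval on the log scale. Under that assumption (the table does not state how the intervals were computed), the standard error and a two-sided P-value can be recovered from a reported HR and its 95% CI, which gives a quick consistency check on a row:

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover log-scale standard error, z statistic and two-sided P-value
    from a hazard ratio and its 95% CI, assuming a Wald interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Risk score, multivariable model, GSE14333 (values from the table above).
se, z, p = wald_from_ci(2.346, 1.298, 4.241)
print(f"SE = {se:.3f}, z = {z:.2f}, P = {p:.3f}")
```

For this row the recovered P-value rounds to 0.005, which agrees with the value reported in the table.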
Figure 3. Survival-related DEGs ranked by variable importance. (A) Error rate of the random survival forests algorithm (Ntree = 1,000; default parameters of the Hemant Ishwaran algorithm). (B) Variable importance of the 46 survival-related DEGs. DEGs, differentially expressed genes.
Figure 4. Testing and validation of the risk score in independent GEO datasets. (A) Kaplan-Meier survival curves of low- and high-risk patients in the training set GSE14333 (P=0.001, Mantel-Cox log-rank test). (B) Scatter plot of survival status (alive or dead) against risk score in GSE14333. (C and D) Kaplan-Meier survival curves of low- and high-risk patients in the validation sets (C) GSE17536 (P=0.001) and (D) GSE29621 (P=0.038). (E) Expression distribution of the five most important biomarkers in low- and high-risk patients in GSE14333, GSE17536 and GSE29621. Genes in GSE14333 were, from top to bottom, SCN9A, CPM, GUCA2B, MET and SHMT2. Genes in GSE17536 and GSE29621 were, from top to bottom, SHMT2, MET, CPM, GUCA2B and SCN9A. GEO, Gene Expression Omnibus.
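The Kaplan-Meier curves in panels A, C and D are product-limit estimates of survival. A minimal sketch of the estimator is shown below with invented follow-up times. It is not the authors' analysis code, and the log-rank comparison between risk groups is not included.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Invented follow-up data (months); 0 marks a censored patient.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Running the estimator separately on the low- and high-risk groups gives the two step functions that such a figure overlays.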