Wenbo Zheng1, Yijia Lu2, Xiaochuang Feng1, Chunzhao Yang1, Ling Qiu1, Haijun Deng1, Qi Xue3, Kai Sun1.
Abstract
Colorectal cancer (CRC) is a malignant tumor whose morbidity rates are among the highest in the world. The variation in CRC patients' prognosis creates an urgent need for new molecular biomarkers, either to improve the accuracy of prognosis prediction or to complement traditional TNM staging in clinical practice. HTSeq-FPKM gene expression data and matching clinical information for CRC patients were downloaded from The Cancer Genome Atlas (TCGA). Patients were randomly divided into a training dataset and a test dataset. Using univariate and multivariate Cox regression survival analyses and Lasso regression analysis, a prediction model that assigned each patient to a high- or low-risk group was constructed. Survival time in the two groups was compared by the Kaplan-Meier method and the log-rank test. Weighted gene co-expression network analysis (WGCNA) was used to explore the relationships among all the survival-related genes. Overall survival (OS) time was significantly shorter in the high-risk group than in the low-risk group in both the training and test datasets. The area under the ROC curve (AUC) of our 9-gene signature reached 0.823 in the training dataset and 0.806 in the test dataset. A nomogram combining the 9-gene signature with TNM stage and age was constructed for clinical practice to evaluate the survival time of patients with CRC, and the C-index increased from 0.739 to 0.794. In conclusion, we identified nine novel biomarkers that are not only independent prognostic indexes for CRC patients but can also serve as a good supplement to traditional clinicopathological factors to more accurately evaluate the survival of CRC patients.
Keywords: CRC; WGCNA; nomogram; prognosis; survival-related genes
Year: 2021 PMID: 34346563 PMCID: PMC8419765 DOI: 10.1002/cam4.4104
Source DB: PubMed Journal: Cancer Med ISSN: 2045-7634 Impact factor: 4.452
Information of CRC patients in the training dataset and test dataset from TCGA database and validation dataset from the GEO database
| Variables | TCGA training dataset | TCGA test dataset | GEO validation dataset |
|---|---|---|---|
| Age (mean, range) | 66.8 (31‐90) | 66.3 (33‐90) | 69.7 (36.6‐94) |
| Gender | | | |
| Male | 130 | 111 | 87 |
| Female | 105 | 121 | 67 |
| Stage | | | |
| Stage I | 43 | 39 | |
| Stage II | 92 | 85 | 83 |
| Stage III | 56 | 69 | 71 |
| Stage IV | 39 | 30 | |
| Unknown | 5 | 9 | |
| T stage | |||
| Tis | 0 | 1 | |
| T1 | 5 | 9 | 1 |
| T2 | 44 | 39 | 6 |
| T3 | 162 | 156 | 109 |
| T4 | 24 | 27 | 38 |
| M stage | |||
| M0 | 172 | 175 | 86 |
| M1 | 38 | 30 | |
| Mx | 20 | 25 | 68 |
| Unknown | 5 | 2 | |
| N stage | |||
| N0 | 141 | 134 | 83 |
| N1 | 57 | 55 | 50 |
| N2 | 36 | 43 | 21 |
| Nx | 1 | 0 | |
| Tumor location | |||
| Colon | 196 | 183 | 127 |
| Rectum | 39 | 49 | 27 |
| Total | 235 | 232 | 154 |
Abbreviations: CRC, colorectal cancer; GEO, Gene Expression Omnibus; TCGA, The Cancer Genome Atlas.
FIGURE 1 The workflow of the identification of the CRC OS‐related 9‐gene signature
FIGURE 2 Seventeen genes were selected by Lasso regression analysis of the 62 genes in the training dataset. (A) The vertical solid lines represent the partial likelihood deviance ± standard error, and the vertical dotted lines mark the best values of lambda, the tuning parameter, chosen by the minimum criterion (left) and the 1‐SE criterion (right). (B) The y axis represents the coefficients. Each curve shows how one feature's regression coefficient varies with log(lambda)
Overall information of the nine prognostic genes in our signature
| Ensembl ID | Gene symbol | Location | Coefficient | P value | Description |
|---|---|---|---|---|---|
| ENSG00000104892 | KLC3 | Chr 19: 45,333,434‐45,351,520 (+) | 0.417 | 0.037 | Kinesin light chain 3 |
| ENSG00000205704 | LINC00634 | Chr 22: 41,952,174‐41,958,933 (+) | 1.131 | 0.029 | Long intergenic non‐protein coding RNA 634 |
| ENSG00000257108 | NHLRC4 | Chr 16: 567,005‐569,495 (+) | 2.155 | 0.0007 | NHL repeat containing 4 |
| ENSG00000174370 | C11orf45 | Chr 11: 128,899,565‐128,906,069 (−) | 0.979 | 0.005 | Chromosome 11 open reading frame 45 |
| ENSG00000155592 | ZKSCAN2 | Chr 16: 25,236,001‐25,257,845 (−) | 0.417 | 0.009 | Zinc finger with KRAB and SCAN domains 2 |
| ENSG00000088727 | KIF9 | Chr 3: 47,228,026‐47,283,451 (−) | −0.462 | 0.022 | Kinesin family member 9 |
| ENSG00000166813 | KIF7 | Chr 15: 89,608,789‐89,655,467 (−) | 0.522 | 0.002 | Kinesin family member 7 |
| ENSG00000103449 | SALL1 | Chr 16: 51,135,975‐51,151,367 (−) | −0.741 | 0.119 | Spalt‐like transcription factor 1 |
| ENSG00000181781 | ODF3L2 | Chr 19: 463,346‐474,983 (−) | 0.71 | 0.084 | Outer dense fiber of sperm tails 3 like 2 |
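As a minimal sketch of how a signature like this is typically applied, a patient's risk score can be computed as the coefficient-weighted sum of the nine genes' expression values, using the coefficients listed in the table above. The linear form, the `risk_score` and `split_by_median` helpers, the example expression values, and the median cutoff are all assumptions for illustration; the source does not state the exact formula or cutoff.

```python
from statistics import median

# Coefficients copied from the table above (gene symbol -> coefficient).
COEFFICIENTS = {
    "KLC3": 0.417,
    "LINC00634": 1.131,
    "NHLRC4": 2.155,
    "C11orf45": 0.979,
    "ZKSCAN2": 0.417,
    "KIF9": -0.462,
    "KIF7": 0.522,
    "SALL1": -0.741,
    "ODF3L2": 0.71,
}


def risk_score(expression):
    """Weighted sum of expression values (assumed linear-predictor form)."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label patients 'high' or 'low' relative to the cohort median (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values for three made-up patients.
patients = {
    "P1": {g: 1.0 for g in COEFFICIENTS},
    "P2": {g: 0.0 for g in COEFFICIENTS},
    "P3": {**{g: 0.5 for g in COEFFICIENTS}, "NHLRC4": 2.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

Genes with positive coefficients raise the score as their expression increases, while KIF9 and SALL1, which have negative coefficients, lower it.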
FIGURE 3 The risk score distribution, gene expression, and CRC patients' survival status in the training (A–C) and test (D–F) datasets, based on the risk score of the 9‐gene signature
FIGURE 4 The OS time of patients in the high‐ and low‐risk groups in the training (A,B) and test (C,D) datasets and the corresponding ROC curves, with AUC values of 0.823 in the training dataset and 0.806 in the test dataset
FIGURE 5 Differences in the expression of the nine genes between CRC tumor tissues and normal tissues (A) and between the high‐ and low‐risk groups (B)
FIGURE 6 Validation of the 9‐gene signature and comparison with another group's prognosis prediction model and TNM stage in the GEO dataset. (A) Discrimination of CRC patients' OS time by Zhou's 5‐gene signature. (B) A multi‐ROC curve comparing the AUC values of the 9‐gene signature with those of other factors, including Zhou's signature and TNM stage. (C) The difference in OS time between the high‐risk and low‐risk groups based on the 9‐gene signature
Univariate and multivariate Cox regression analyses of OS in the entire dataset
| Variables | Univariate HR | Univariate 95% CI | Univariate P value | Multivariate HR | Multivariate 95% CI | Multivariate P value |
|---|---|---|---|---|---|---|
| Age | 1.733 | 1.138–2.641 | 0.01 | 2.206 | 1.432–3.401 | 3.36E−04 |
| Gender | 1.039 | 0.643–1.68 | 0.875 | | | |
| Stage | 2.511 | 1.904–3.311 | 7.09E−11 | 2.442 | 1.845–3.231 | 4.17E−10 |
| T | 2.915 | 1.818–4.675 | 9.00E−06 | | | |
| M | 5.196 | 3.192–8.458 | 3.37E−11 | | | |
| N | 2.205 | 1.664–2.921 | 3.64E−08 | | | |
| Risk (high vs. low) | 5.114 | 2.730–9.581 | 3.48E−07 | 4.393 | 2.314–8.339 | 6.01E−06 |
Abbreviations: CI, confidence interval; HR, hazard ratio; OS, overall survival.
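The hazard ratios, confidence intervals, and significance values in this table are linked by the usual Wald relationship for a Cox coefficient. The sketch below, which assumes a normal approximation and a 1.96 critical value (the source does not say how its intervals were computed), recovers the log-hazard coefficient, its standard error, and a two-sided p-value from the multivariate row for Risk (HR 4.393, 95% CI 2.314 to 8.339). The helper name `wald_from_hr` is hypothetical.

```python
import math


def wald_from_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Recover coefficient, standard error, z and two-sided p from an HR and its 95% CI."""
    beta = math.log(hr)  # HR = exp(beta)
    # The CI spans beta +/- z_crit * se on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


# Multivariate row "Risk (high vs. low)" from the table above.
beta, se, z, p = wald_from_hr(4.393, 2.314, 8.339)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e}")
```

Under these assumptions the recovered p-value comes out at roughly 6E−06, in line with the 6.01E−06 reported for that row.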
FIGURE 7 (A–I) The expression levels of the nine genes in CRC patients across different TNM stages
FIGURE 8 (A) Comparison of the C‐index before and after adding the risk grade based on the 9‐gene signature to TNM stage, age, and gender. (B–C) Calibration curves showing the agreement between predicted and actual 3‐ and 5‐year survival in all patients. (D) A nomogram constructed by multivariate Cox regression analysis on risk, TNM stage, age, and gender to apply the 9‐gene signature in clinical practice
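For readers unfamiliar with the C‐index compared in panel A, the following is a minimal sketch of Harrell's concordance index. It is an illustrative implementation with a made-up toy cohort, not the authors' code; the function name `concordance_index` and the handling of tied risk scores (counted as 0.5) are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable patient pairs in which the patient with
    the higher risk score has the shorter observed survival time."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event before patient j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # assumed convention for tied risk scores
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: survival time (months), event flag (1 = death), risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [3.0, 1.0, 2.0, 0.5]
c = concordance_index(times, events, risks)
print(c)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported increase from 0.739 to 0.794 should be read.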
FIGURE 9 WGCNA was used to identify the co‐expression of survival‐related genes with these nine genes. (A) The soft threshold of a scale‐free topology model. (B–C) The hierarchical clustering tree and its 19 corresponding gene modules