Literature DB >> 35421268

Identification and construction of a 13-gene risk model for prognosis prediction in hepatocellular carcinoma patients.

Daming Cheng1, Libing Wang1, Fengzhi Qu1, Jingkun Yu1, Zhaoyuan Tang1, Xiaogang Liu1.   

Abstract

We attempted to screen out the feature genes associated with the prognosis of hepatocellular carcinoma (HCC) patients through bioinformatics methods, to generate a risk model to predict the survival rate of patients. Gene expression information of HCC was accessed from GEO database, and differentially expressed genes (DEGs) were obtained through the joint analysis of multi-chip. Functional and pathway enrichment analyses of DEGs indicated that the enrichment was mainly displayed in biological processes such as nuclear division. Based on TCGA-LIHC data set, univariate, LASSO, and multivariate Cox regression analyses were conducted on the DEGs. Then, 13 feature genes were screened for the risk model. Also, the hub genes were examined in our collected clinical samples and GEPIA database. The performance of the risk model was validated by Kaplan-Meier survival analysis and receiver operation characteristic (ROC) curves. While its universality was verified in GSE76427 and ICGC (LIRI-JP) validation cohorts. Besides, through combining patients' clinical features (age, gender, T staging, and stage) and risk scores, univariate and multivariate Cox regression analyses revealed that the risk score was an effective independent prognostic factor. Finally, a nomogram was implemented for 3-year and 5-year overall survival prediction of patients. Our findings aid precision prediction for prognosis of HCC patients.
© 2022 The Authors. Journal of Clinical Laboratory Analysis published by Wiley Periodicals LLC.

Entities:  

Keywords:  bioinformatics methods; feature genes; hepatocellular carcinoma; prognosis; risk model

Mesh:

Substances:

Year:  2022        PMID: 35421268      PMCID: PMC9102505          DOI: 10.1002/jcla.24377

Source DB:  PubMed          Journal:  J Clin Lab Anal        ISSN: 0887-8013            Impact factor:   3.124


INTRODUCTION

As a malignancy, hepatocellular carcinoma (HCC) has high fatality. HCC occurrence and progression are complex, involving changes in gene expression profiles and intracellular signaling pathways. The survival rate of HCC patients in the early stage can be elevated to 60–80% through surgical resection, local ablation, liver transplantation, and other therapeutic options. Therefore, there is an urgent need to construct new biomarkers and models for prognostic prediction of HCC patients, thus providing patients with more effective and individual treatments. As one of the main components of the gene expression network, mRNA affects the biological processes of cells, and it is vital in cancer progression that can regulate cell proliferation, apoptosis, differentiation, metabolism, and metastasis. Recent studies manifested that several mRNAs can be used as potential prognostic biomarkers in varying cancer types. , Li et al. disclosed that based on RNA‐seq data, risk models can precisely predict cancer patients’ survival rate. Through bioinformatics methods, HCC data in The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases were comprehensively analyzed to unveil the mechanism of HCC occurrence and progression, thus identifying novel biomarkers for HCC diagnosis and prognosis. The application of tumor prognostic risk models obtained through bioinformatics analysis is widely used. For instance, Li et al. exhibited that the risk score is a valuable indicator for recurrence prediction of prostate cancer patients. Bai et al. revealed that a novel 14‐gene immune‐related signature determined by bioinformatics analysis may serve as a prognostic predictor for colorectal cancer, thereby contributing to patient individual treatment. However, there remains a paucity of evidence that mRNA serves as a biomarker to systematically evaluate HCC patients’ prognoses. Joint analysis of multi‐chip was conducted on HCC data to screen differentially expressed genes (DEGs) in GEO database. Based on the TCGA‐LIHC training cohort, we established a risk model and a nomogram for precisely prognostic prediction through univariate, LASSO and multivariate Cox regression analyses. This study contributes to precisely prognostic prediction and improvement of the survival rate of HCC patients.

MATERIALS AND METHODS

Data resources

Clinical information and mRNA expression data of HCC were downloaded from TCGA database on June 1, 2020. Four sets of chip data of HCC (GSE19665, GSE25097, GSE54236, and GSE84402) were obtained from GEO database. The mRNA expression and clinical information of HCC were obtained from GSE76427 and ICGC (LIRI‐JP) data sets (Table 1). The data were sorted to joint analysis of multi‐chip cohort (GSE19665, GSE25097, GSE54236, and GSE84402), training cohort (TCGA‐LIHC), and validation cohort (GSE76427, ICGC).
TABLE 1

Information of HCC related data set in the study

Data setData typePlatformNormalTumorFollow‐upCohort
GSE19665 mRNAGPL5701010NoStudy
GSE25097 mRNAGPL10687243268NoStudy
GSE54236 mRNAGPL64808180NoStudy
GSE84402 mRNAGPL5701414NoStudy
TCGAmRNAIllumina50374YesTraining
GSE76427 mRNAGPL10558/115YesValidation
ICGC (LIRI‐JP)mRNA//232YesValidation
Information of HCC related data set in the study

DEGs acquisition

Differential analysis was carried out on the tumor and the normal groups (|logFC|>2, FDR<0.05) using R package “limma” in the HCC gene expression data sets (GSE19665, GSE25097, GSE54236, and GSE84402) from GEO database. A volcano map was plotted using “ggplot” package. DEGs in four sets of chips were screened out using “RobustRankAggreg” (RRA) package (log|FC|>2, FDR<0.05) for subsequent analysis. The top 20 increased and decreased DEGs were selected to draw heat maps, respectively.

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses

The obtained DEGs were subjected to GO and KEGG enrichment analyses using R package “clusterprofiler,” so as to analyze the involved biological functions and pathways. Statistical significance was set at q.value<0.05 and p.adjust<0.05.

Construction and evaluation of the risk model

After overlapping genes in the training cohort with DEGs got through multi‐chip joint analysis, “survival” package was used to perform univariate Cox regression analysis (p < 0.01). The DEGs that were conspicuously related to HCC patient's survival were screened out. To prevent the model from over‐fitting, R package “glmnet” was used to select the candidate feature genes by LASSO Cox regression analysis. Finally, multivariate Cox regression analysis was done to select the optimal feature genes for risk model construction. The risk scores were calculated following the formula: Here, Coef is the synergy coefficient, and x is the relative gene expression standardized by Z‐score. The patients were classified into high‐ and low‐risk groups with the median risk score as the cutoff value. In TCGA training cohort, the R package “survival” was implemented to draw survival curves of the two groups. The “survivalROC” package was used to plot receiver operation characteristic (ROC) curves. The area under the curve (AUC) of the 3‐year and 5‐year overall survival (OS) was calculated for efficacy evaluation of the risk model. The same method was undertaken in the GSE76427 and ICGC (LIRI‐JP) validation cohorts to assess the universality of the risk model. To evaluate whether the risk score is instrumental as an independent prognosticator of patients, univariate and multivariate Cox regression analyses were done, with risk score as a prognostic factor combined with clinical features of patients (age, gender, T staging, and stage).

Correlation analysis of expression of optimal feature genes and patient's prognosis

On account of gene expression data of HCC and normal liver tissue samples in TCGA database, along with the normal liver tissue samples in the GTEx database, box plots of expression of optimal feature genes in normal and tumor tissues were drawn on GEPIA website. Tissue samples were sorted into high‐ and low‐expression groups with the median expression of 13 optimal feature genes as the threshold. Later, survival curves were drawn on GEPIA website to validate the effect of gene expression on patient's prognosis.

Nomogram

R package “rms” was implemented to draw nomogram based on TCGA‐LIHC data for 3‐year and 5‐year OS prediction of HCC patients by combining clinical features (age, gender, T staging, and stage) and risk scores.

qRT‐PCR

Total RNA was isolated with miRNA Isolation Kit (Bioteke). Reverse transcription was processed to collect cDNA by QuantiTect Reverse Transcription kit (QIAGEN). mRNA expression level was measured by qRT‐PCR using QuantStudio TM 3(ThemorFisher). 2−ddCT method was used to calculate relative expressions of the genes. GAPDH was used as a reference gene for qRT‐PCR assay. Primers were listed in Table S1.

Clinical sample collection

For the clinical cohort, 20 HCC and 20 corresponding adjacent samples were collected from the HCC patients who received surgical treatment in Tangshan Gongren Hospital (Tangshan City, Hebei, China) from 2019.10 to 2021.11. All the patients were processed with the informed contents, and the experiments involved the clinical samples were accessed by the ethics committee of Tangshan Gongren Hospital.

RESULTS

Identification of DEGs in HCC

DEGs were selected by analyzing microarray data. A total of 194 DEGs were got by joint analysis of 4 sets of chips, containing 62 increased DEGs and 132 decreased DEGs (Figure 1A–D). The top 20 up‐regulated and down‐regulated DEGs were selected, respectively, to draw a heat map (Figure 1E).
FIGURE 1

Identification of DEGs in HCC. (A) Volcano map of genes in GSE19665 data set; (B) Volcano map of genes in GSE25097 data set; (C) Volcano map of genes in GSE54236 data set; (D) Volcano map of genes in GSE84402 data set; (E) Heat map of DEGs by joint analysis of multi‐chip. Green: decreased gene expression; red: increased gene expression

Identification of DEGs in HCC. (A) Volcano map of genes in GSE19665 data set; (B) Volcano map of genes in GSE25097 data set; (C) Volcano map of genes in GSE54236 data set; (D) Volcano map of genes in GSE84402 data set; (E) Heat map of DEGs by joint analysis of multi‐chip. Green: decreased gene expression; red: increased gene expression

GO and KEGG enrichment analyses

GO function annotation and KEGG pathway enrichment analyses were done on 194 DEGs obtained by joint analysis of multi‐chip. GO annotation exhibited that DEGs displayed enrichment in mitosis, nuclear division and chromosome segregation, and other biological processes (Figure 2A). KEGG enrichment analysis unraveled that DEGs mainly presented enrichment in signaling pathways like cell division and cell cycle (Figure 2B). These results demonstrated that DEGs affected HCC malignant progression by regulating cell proliferation.
FIGURE 2

GO and KEGG enrichment analyses. (A–B) Results of GO and KEGG enrichment analysis

GO and KEGG enrichment analyses. (A–B) Results of GO and KEGG enrichment analysis

Generation and evaluation of the risk model

Based on TCGA‐LIHC data, 185 DEGs obtained by joint analysis of multi‐chip in TCGA database were subjected to univariate Cox regression analysis (p < 0.01) (Table S2), wherein 83 DEGs that had significant correlations with patient's prognosis were got (Table S3). To reduce the complexity of the risk model, 23 candidate feature genes were selected by LASSO Cox regression analysis (Figure 3A, B) and were then subjected to multivariate Cox regression analysis. Then, 13 optimal feature genes were determined to establish a risk model (Figure 3C). To confirm its efficacy, the Kaplan–Meier method was used to perform survival analysis in the training cohort. The results exhibited that patients with low‐risk scores presented longer survival time than those with high‐risk scores (Figure 3D). Afterward, ROC curves were plotted. As shown in Figure 3E, AUC values of the patient's 3‐year and 5‐year OS were 0.783 and 0.828, respectively, suggesting that the risk model was effective. In addition, in the validation cohorts GSE76427 and ICGC (LIRI‐JP), the Kaplan–Meier survival curves presented that survival time of patients in the low‐risk score group was dramatically longer than that in the high‐risk score group (Figure 3F–G). The AUC values of the 3‐year and 5‐year OS were 0.698 and 0.694, respectively, in the GSE76427 validation cohort, (Figure 3H), while those in the ICGC (LIRI‐JP) validation cohort were 0.808 and 0.898, respectively (Figure 3I), indicating that the risk model was universal.
FIGURE 3

Construction and evaluation of risk model. (A) Coefficient spectrum of 83 genes got by LASSO Cox regression analysis; (B) Selection of the best penalty parameter of LASSO analysis; (C) Forest plot of the 13 optimal feature genes obtained by multivariate analysis; (D) The survival curves of high‐ and low‐risk groups of 13‐gene risk model in training cohort; (E) The ROC curves of 13‐gene risk model in the training cohort; (F) Survival curves of the high‐ and low‐risk groups of the 13‐gene risk model in GSE76427 validation cohort; (G) Survival curves of the two groups of the 13‐gene risk model in the ICGC (LIRI‐JP) validation cohort; (H0 The ROC curves of the 13‐gene risk model in the GSE76427 validation cohort. (I) The ROC curve of the 13‐gene risk model in the ICGC (LIRI‐JP) validation cohort; (J–K) Forest maps of univariate and multivariate analyses by combining clinical features (age, gender, T staging, and stage) and risk scores

Construction and evaluation of risk model. (A) Coefficient spectrum of 83 genes got by LASSO Cox regression analysis; (B) Selection of the best penalty parameter of LASSO analysis; (C) Forest plot of the 13 optimal feature genes obtained by multivariate analysis; (D) The survival curves of high‐ and low‐risk groups of 13‐gene risk model in training cohort; (E) The ROC curves of 13‐gene risk model in the training cohort; (F) Survival curves of the high‐ and low‐risk groups of the 13‐gene risk model in GSE76427 validation cohort; (G) Survival curves of the two groups of the 13‐gene risk model in the ICGC (LIRI‐JP) validation cohort; (H0 The ROC curves of the 13‐gene risk model in the GSE76427 validation cohort. (I) The ROC curve of the 13‐gene risk model in the ICGC (LIRI‐JP) validation cohort; (J–K) Forest maps of univariate and multivariate analyses by combining clinical features (age, gender, T staging, and stage) and risk scores To validate the independence of risk model, univariate and multivariate Cox regression analyses were completed on data in the training cohort combining the patient's clinical features (age, gender, T staging, and stage) and risk scores. Univariate Cox regression analysis manifested that: T staging, stage, and risk score dramatically affected patient's prognosis (p < 0.001) (Figure 3J). Multivariate Cox regression analysis disclosed that only risk score remarkably affected patient's prognosis (HR =1.145, 95% CI =1.105–1.185, p < 0.001) (Figure 3K). Thus, the risk model was potential to become a prognostic indicator independent of patient's clinical features.

Correlation between feature gene expression and prognosis

Box plots of levels of 13 genes in normal and tumor tissue were plotted on the GEPIA website. Relative to normal tissue, ANXA10, CHST4, PZP, RDH16, and RSPO3 levels were lower in tumor tissue. No remarkable difference in GZMK expression was noted in normal tissue and tumor tissue. The other seven genes were relatively upregulated in tumor tissue (Figure 4A). To identify their expression conditions in the collected cancer and normal tissues (20 cancer vs. 20 normal), we applied qRT‐PCR to measure the expression levels of the hub genes, where the expression conditions were consistent with the above bioinformatic prediction, except for CHST4 (Figure 4B). Moreover, tumor samples from TCGA database were sorted into high/low‐expression groups. Survival curves were used to probe whether gene expression levels influence patients’ prognosis. Results denoted that CCNB2, PZP, and RSPO3 levels were not notably associated with OS rate of patients, while the other 10 genes exhibited high correlations with OS rate of patients (Figure 5).
FIGURE 4

The expressions of the 13 features in the normal and tumor tissues. (A) The expression analyses were conducted based on the public database. The red box: gene expression in tumor tissue; gray box: gene expression in normal tissue; (B) The expression analyses were introduced using qRT‐PCR based on the collected samples

FIGURE 5

Correlation analysis of levels of 13 feature genes and the OS rate of patients

The expressions of the 13 features in the normal and tumor tissues. (A) The expression analyses were conducted based on the public database. The red box: gene expression in tumor tissue; gray box: gene expression in normal tissue; (B) The expression analyses were introduced using qRT‐PCR based on the collected samples Correlation analysis of levels of 13 feature genes and the OS rate of patients

Establishment of the nomogram

To facilitate precise determination of the 3‐year or 5‐year OS of patients with HCC, a nomogram was generated by combining the patient's clinical features (age, gender, T staging, and stage) and risk scores (Figure 6).
FIGURE 6

Nomogram of the 3‐year and 5‐year OS of patients with HCC

Nomogram of the 3‐year and 5‐year OS of patients with HCC

DISCUSSION

HCC is a heterogeneous disease with a considerably variable prognosis and high fatality rate. Although treatment for HCC has improved, clinical outcomes for HCC patients are unfavorable due to tumor recurrence and metastasis after hepatectomy. Regarding the great heterogeneity of HCC, the predictive efficacy of usual models is still unsatisfactory. Therefore, new prognostic biomarkers and prognostic models with higher accuracy are indispensable to contribute to precise clinical guidance for HCC. Joint analysis of multi‐chip was carried out on the microarray data in the GEO database to obtain 194 DEGs, which were for GO and KEGG enrichment analyses. GO enrichment analysis pointed out that DEGs showed enrichment in mitosis, nuclear division, and chromosome duplication. KEGG analysis clarified that DEGs displayed enrichment in cell division and cell cycle. Valery et al. reviewed that targeting mitosis could aid anti‐cancer therapy. Zhou et al.  manifested that LIMD1 is a master modulator of mitotic progression, and its dysregulation fosters tumorigenesis. These investigations demonstrated that DEGs may regulate HCC growth by affecting cancer cell proliferation. At present, studies revealed that combining multiple feature genes related to prognosis may exert favorable predictive efficacy than a single one. For example, Zuo et al. revealed a mighty 6‐gene signature for predicting disease‐free and OS in NSCLC. Sun et al. identified a macrophage‐associated gene signature capable of predicting therapeutic response and patient's prognosis of gliomas through multicellular gene network analysis. Zhang et al. exhibited a relation of 9‐gene signature and glycolysis, and the 9‐gene signature can predict prognosis and metastasis in LUAD patients. To establish a risk model with high accuracy and low complexity, this study identified 13 genes (GZMK, PBK, RDH16, HMMR, CHST4, DTL, FAM83D, PZP, CDCA8, PRC1, RSPO3, CCNB2, and ANXA10) with prognostic value of HCC through bioinformatics analysis. PBK is a new type of serine‐threonine kinase. Yang et al. reported that forced expression of PBK hastens HCC metastasis via ETV4‐uPAR pathway. RDH16 restrains HCC cell growth, clonogenicity, and cell motility. HMMR is a regulator for homeostasis, mitosis, and meiosis. Zhang et al. pointed out that HMMR fosters HCC proliferation, and it is a valuable target for therapy and prognostic prediction in HCC patients. The expression of MECA‐79 is associated with CHST4, and MECA‐79 is a valuable marker for the prognosis of gastric cancer. DTL depletion suppresses cell growth and induces senescence, thus reducing tumorigenesis. FAM83D triggered MEK/ERK pathway and hastens cell proliferation in HCC by stimulating cells to enter into the S phase. CDCA8 gene is a member of CDCA gene family, which has underlying diagnostic and prognostic values for HCC. Reducing PRC1 can prominently attenuate the malignant progression of HCC and liver damage. Down‐regulation of ANXA10 is correlated with malignant phenotype of HCC cells, vascular invasion, and tumor progression, leading to dismal prognosis. PZP is associated with HCC recurrence. Forced expression of CCNB2 is implicated in dismal prognosis of HCC patients. GZMK is increased following the administration of programmed death 1 receptor (PD‐1) checkpoint inhibitor MEDI0680, thereby affecting progression of cancer cells. , RSPO3 is a prognostic marker for bladder cancer and prostate cancer. , Up to now, these two genes are poorly reported in HCC. Collectively, genes screened by bioinformatics means in this study are implicated in HCC patients’ prognosis and are possible targets for HCC therapy. In summary, our research manifested that the risk assessment model on the basis of 13 optimal features genes was a stable instrument for predicting prognosis in HCC patients. The nomogram constructed by the risk assessment model sheds light on precise prognosis prediction of HCC patients. Since this article is only a pure bioinformatics analysis, further cell experiments and in vivo experiments are warranted to explore the effects of the 13 optimal feature gene expression on the malignant progression of HCC in detail, thereby providing molecular targets for HCC targeted therapy.

CONFLICTS OF INTEREST

All authors declare that they have no potential conflicts of interest.

AUTHOR CONTRIBUTIONS

DC and LW contributed to the study design. FQ conducted the literature search. JY acquired the data. DC and ZT wrote the article. FQ and XL performed data analysis and drafted. LW revised the article. XL gave the final approval of the version to be submitted.

CONSENT FOR PUBLICATION

Not applicable. Table S1 Click here for additional data file. Table S2 Click here for additional data file. Table S3 Click here for additional data file.
  33 in total

1.  LIMD1 phosphorylation in mitosis is required for mitotic progression and its tumor-suppressing activity.

Authors:  Jiuli Zhou; Lin Zhang; Wei Zhou; Yuanhong Chen; Yufeng Cheng; Jixin Dong
Journal:  FEBS J       Date:  2019-01-16       Impact factor: 5.542

2.  Identification of a six-gene signature predicting overall survival for hepatocellular carcinoma.

Authors:  Gao-Min Liu; Hua-Dong Zeng; Cai-Yun Zhang; Ji-Wei Xu
Journal:  Cancer Cell Int       Date:  2019-05-21       Impact factor: 5.722

3.  A robust six-gene prognostic signature for prediction of both disease-free and overall survival in non-small cell lung cancer.

Authors:  Shuguang Zuo; Min Wei; Hailin Zhang; Anxian Chen; Junhua Wu; Jiwu Wei; Jie Dong
Journal:  J Transl Med       Date:  2019-05-14       Impact factor: 5.531

4.  RSPO3 is a prognostic biomarker and mediator of invasiveness in prostate cancer.

Authors:  Aruz Mesci; Fabrice Lucien; Xiaoyong Huang; Eric H Wang; David Shin; Michelle Meringer; Christianne Hoey; Jessica Ray; Paul C Boutros; Hon S Leong; Stanley K Liu
Journal:  J Transl Med       Date:  2019-04-15       Impact factor: 5.531

5.  Anti-PD-1 monoclonal antibody MEDI0680 in a phase I study of patients with advanced solid malignancies.

Authors:  Aung Naing; Jeffrey Infante; Sanjay Goel; Howard Burris; Chelsea Black; Shannon Marshall; Ikbel Achour; Susannah Barbee; Rena May; Chris Morehouse; Kristen Pollizzi; Xuyang Song; Keith Steele; Nairouz Elgeioushi; Farzana Walcott; Joyson Karakunnel; Patricia LoRusso; Amy Weise; Joseph Eder; Brendan Curti; Michael Oberst
Journal:  J Immunother Cancer       Date:  2019-08-22       Impact factor: 13.751

6.  Identification of potential hub genes related to the progression and prognosis of hepatocellular carcinoma through integrated bioinformatics analysis.

Authors:  Xiudao Song; Rao Du; Huan Gui; Mi Zhou; Wen Zhong; Chenmei Mao; Jin Ma
Journal:  Oncol Rep       Date:  2019-11-06       Impact factor: 3.906

7.  Significant association between high serum CCL5 levels and better disease-free survival of patients with early breast cancer.

Authors:  Yukie Fujimoto; Natsuko Inoue; Koji Morimoto; Takahiro Watanabe; Seiichi Hirota; Michiko Imamura; Yosuke Matsushita; Toyomasa Katagiri; Haruki Okamura; Yasuo Miyoshi
Journal:  Cancer Sci       Date:  2019-12-16       Impact factor: 6.716

8.  Identification and construction of a 13-gene risk model for prognosis prediction in hepatocellular carcinoma patients.

Authors:  Daming Cheng; Libing Wang; Fengzhi Qu; Jingkun Yu; Zhaoyuan Tang; Xiaogang Liu
Journal:  J Clin Lab Anal       Date:  2022-04-14       Impact factor: 3.124

Review 9.  Hyaluronan Mediated Motility Receptor (HMMR) Encodes an Evolutionarily Conserved Homeostasis, Mitosis, and Meiosis Regulator Rather than a Hyaluronan Receptor.

Authors:  Zhengcheng He; Lin Mei; Marisa Connell; Christopher A Maxwell
Journal:  Cells       Date:  2020-03-28       Impact factor: 6.600

View more
  1 in total

1.  Identification and construction of a 13-gene risk model for prognosis prediction in hepatocellular carcinoma patients.

Authors:  Daming Cheng; Libing Wang; Fengzhi Qu; Jingkun Yu; Zhaoyuan Tang; Xiaogang Liu
Journal:  J Clin Lab Anal       Date:  2022-04-14       Impact factor: 3.124

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.