Literature DB >> 35118360

Field cancerization profile-based prognosis signatures lead to more robust risk evaluation in hepatocellular carcinoma.

Lu Huang¹, Zhou Songyang^1,2, Zhiming Dai³, Yuanyan Xiong¹.

Abstract

The development of reliable biomarkers has been an urgent issue as well as a hot spot of research on the diagnosis, treatment, and prognostic evaluation of hepatocellular carcinoma (HCC). Here, we established and validated two field cancerization profile-based prognostic signatures (gene expression score [GES] and immune score [IS]) for HCC. Our study confirmed that field cancerization profile-based models outperform conventional models on risk evaluation, offering insights for further studies on prognostic model construction. The nomogram constructed by combining GES, IS, and TNM stage was proved effective in improving the individualized prediction of the overall risk of patients. Distinct peritumoral characteristics were observed in several immune cells (e.g., CD8 T cells and dendritic cells), which might explain the diversified prognosis and clinical benefit of immunotherapy. Moreover, a series of drug targets, prognosis-associated genes, and pathways were identified, which may contribute to molecular mechanism studies as well as therapeutic target development of HCC.

Entities: Chemical

Keywords: Bioinformatics; Expression study; Omics

Year: 2022 PMID： 35118360 PMCID： PMC8800113 DOI： 10.1016/j.isci.2022.103747

Source DB: PubMed Journal: iScience ISSN： 2589-0042

Introduction

Hepatocellular carcinoma (HCC) is the fourth most common cause of cancer-related death worldwide, and its incidence is steadily increasing throughout the world (Bray et al., 2018; Yang et al., 2019). Unlike other solid malignancies, most HCC arises in a background of chronic liver disease, variably complicated with hepatitis and fibrosis, which makes HCC an occult, highly malignant disease with a high recurrence rate and poor prognosis (Zhu et al., 2013). Therefore, early screening and prevention, as well as outcome monitoring are of great importance to patient survival. The development of reliable biomarkers has been an urgent issue as well as a hot spot of research on the diagnosis, treatment, and prognostic evaluation of HCC. With advances in global molecular profiling based on high-throughput methods, gene expression profiling came as a powerful diagnostic and prognostic tool in molecular medicine. Comparing to traditional staging system, the access to gene expression features enabled the development of specific biomarkers, thus providing new opportunities to disentangle the diversity and heterogeneity of cancers (Dakubo et al., 2007). Recently, gene expression profiling has been widely applied to the prediction of prognosis and response to therapy in a range of cancers, e.g., prostate cancer and breast cancer (Hofmann et al., 2002; Sorlie et al., 2003). In addition to gene expression profile, the immune microenvironment, which indicates the infiltration level of immune cells (e.g., T cells, B cells, and macrophages) was also found to be an indicator of clinical outcome and has been applied as biomarkers for multiple cancers (Bremnes et al., 2016; Song et al., 2019; Wang et al., 2019). However, despite that a multitude of prognostic signatures have been reported in HCC, none of these has become a tangible tool in clinical decision making owing to poor clinical reproducibility (Li et al., 2017; Zucman-Rossi et al., 2015). Therefore, it is crucial to identify new prognostic markers for HCC, thereby promoting individualized therapeutics for patients. Owing to an increasing interest on tumor-host interactions, in-depth studies on the rewiring of biogenesis and metabolic and signaling pathways in an integrated level is arousing more and more interest in the past decade. The concept “field cancerization” was first proposed by Slaughter et al. by referring to the phenomenon that the normal tissue peripheral to malignant cells was histologically abnormal and therefore was part of the transformed cells in a particular tumor field, which may consequently lead to the recurrence of the local tumor (Dakubo et al., 2007; Slaughter et al., 1953). According to field cancerization theory, cancerogenesis is characterized as a multi-step and multi-stage process. Before the appearance of cancerous cells, the surrounding tissue might have suffered from dysregulation at the molecular or phenotypic level, resulting in cancerization fields. Consistent with this, related evidence in HCC had verified the independent prognostic value of peritumor tissue and its contribution to HCC, suggesting its implication in de novo multi-centric occurrence of HCC in cirrhotic tissue (Hoshida et al., 2008). Therefore, the characteristic of peritumor tissue is pivotal for the comprehensive understanding of HCC, and the long neglect of peritumor characteristics may contribute to the misclassification of patients with HCC as well as the lack of reproducibility in conventional prognostic signatures. In this study, we conducted prognostic model construction in HCC. We analyzed the gene expression and immune cell infiltration landscape from the perspective of field cancerization, with the profile from both tumor and its corresponding peritumor combined for feature selection. We attempt to acquire the following results: (1) develop robust signatures that are applicable in the clinical application; (2) prove the advantage of field cancerization profile in prognostic model construction; (3) compare the features (gene expression, immune cell infiltration) identified in tumor and peritumor and analyze their association with patient survival; (4) identify peritumoral genes and pathways that play a role in tumorigenesis and therapy of HCC.

Results

Construction of gene expression score signature

The experimental design of our study is shown in Figure 1. There were a total of 457 non-metastatic HCC specimens from three independent studies, with the GSE14520 cohort as the training group, whereas the other two (LIHC cohort and LIRI cohort) served as the validation group (Figure 1). According to the result of univariate cox regression, we identified 3,816 genes (1,876 in tumor and 1,940 in peritumor) associated with the survival of patients from the GSE14520 cohort (p < 0.05). Then we adopted a LASSO Cox regression model to build a prognosis signature, with which we can establish a regression model with an optimal subset of variables from high-dimensional data, avoiding the multi-collinearity of the model interference and improving the robustness of the model. Finally, we obtained a gene expression score (GES) model with 70 genes (44 in tumor and 26 in peritumor), and the GES risk level can be estimated from the expression level and its corresponding coefficients (Figure 2A, Table S1). The GES was associated with worse overall survival (OS). Based on the result of maxstat, we classified the GSE14520 cohort into a risk high and a risk low group with an optimum cutoff value of 0.067 (Figures 2B–2D). The Kaplan-Meier survival analysis demonstrated that patients in the GES-low group had significantly longer OS than those in the GES-high group (hazard ratio [HR] = 0.08, 95% confidence interval [CI] = 0.04–0.14, log rank test p < 0.001; Figure 2E). Examining the accuracy of the GES model with receiver operating characteristic (ROC) curves analysis, we found it had an area under curve (AUC) = 0.90 and 0.92 in predicting OS at 24 and 36 months, respectively, higher than popular clinical prognosis markers, such as AFP levels (24 months AUC = 0.64 (0.56–0.71), 36 months AUC = 0.59 (0.52–0.67)) and TNM staging (24 months AUC = 0.77 (0.70–0.85), 36 months AUC = 0.74 (0.66–0.81)) (Figure 2F).

Figure 1

Study design

Our main objective was to explore the role of field cancerization profile in the risk evaluation of HCC. Prognostic models were established using a two-step procedure. Gene expression and immune cell fraction profile were applied to the model construction separately. The training cohort was derived from GSE14520. Two independent cohorts, the LIHC cohort from TCGA and the LIRI cohort from ICGA, were used to evaluate the external performance of each model. Only the patients with expression information profiles from both tumor and peritumor were included in the study. GES, gene expression score; IS, immune score; LASSO, least absolute shrinkage and selection operator; ROC, receiver operating characteristic; PEC, prediction error curve; DCA, decision curve analysis; GSVA, gene set variation analysis.

Figure 2

Establishment of GES signature based on field cancerization profile of GSE14520 cohort

(A) Expression profiles of the 70 genes across GSE14520 cohort;

(B) GES signature risk score distribution;

(D) Standardized log rank statistic across different cutpoints; the dotted line indicates the selected optimum GES cutpoint;

(E) Kaplan-Meier estimates of survival based on the GES signature in the GSE14520 cohort;

(F) ROC curves indicating the performance of GES in predicting 24- and 36-month OS of the patient in the GSE14520 cohort.

Study design Our main objective was to explore the role of field cancerization profile in the risk evaluation of HCC. Prognostic models were established using a two-step procedure. Gene expression and immune cell fraction profile were applied to the model construction separately. The training cohort was derived from GSE14520. Two independent cohorts, the LIHC cohort from TCGA and the LIRI cohort from ICGA, were used to evaluate the external performance of each model. Only the patients with expression information profiles from both tumor and peritumor were included in the study. GES, gene expression score; IS, immune score; LASSO, least absolute shrinkage and selection operator; ROC, receiver operating characteristic; PEC, prediction error curve; DCA, decision curve analysis; GSVA, gene set variation analysis. Establishment of GES signature based on field cancerization profile of GSE14520 cohort (A) Expression profiles of the 70 genes across GSE14520 cohort; (B) GES signature risk score distribution; (C) Relationship between overall survival days and survival status of each patient as sorted by GES; (D) Standardized log rank statistic across different cutpoints; the dotted line indicates the selected optimum GES cutpoint; (E) Kaplan-Meier estimates of survival based on the GES signature in the GSE14520 cohort; (F) ROC curves indicating the performance of GES in predicting 24- and 36-month OS of the patient in the GSE14520 cohort. The final GES signature includes 44 tumoral genes and 26 peritumoral genes. The heatmap combining the expression of selected genes and the tumor/peritumor origin of each gene indicated these genes in the GSE14520 cohort grouped into two clusters (Figure 2A). Cluster I showed a negative correlation between gene expression and GES score, including 18 tumoral genes and 6 peritumoral genes. Cluster II showed a positive correlation between gene expression and GES score, including 26 tumoral genes and 20 peritumoral genes. The GES signature includes three immune-associated genes (CYLD, IRF1, and CXCL10) in the tumor, and five (PIBF1, IDO1, NCAM1, ICOS, and CEBPD) in peritumor, which was consistent with the immunogenic characteristic of HCC.

GES gives more robust prediction on patient survival

To confirm the robustness of the GES model in different populations, we validated its performance in two validation cohorts with the same cutpoint. Our 70 gene-based GES model could effectively predict the OS of patients from each of the three cohorts (GSE14520 cohort: HR = 0.08, 95% CI = 0.04-0.14, p < 0.001; LIHC cohort: HR = 0.23, 95% CI = 0.10-0.49, p < 0.001; LIRI cohort: HR = 0.33, 95% CI = 0.14-0.78, log rank test p = 0.012) (Figures 3A and 3B).

Figure 3

Validation of GES signature and comparison with other signatures

(A and B) Kaplan-Meier estimates of survival based on the GES signature in the (A) LIHC cohort and (B) LIRI cohort;

(C and D) ROC curves indicating the performance of GES in predicting 24- and 36-month OS of the patient in the (C) LIHC cohort and (D) LIRI cohort;

(E) Performance in the prognostic classification of patients from the LIRI cohort using models constructed from 1,000 times replicated sampling of the GSE14520 cohort;

(F) PEC analysis of GES and other HCC prognostic signatures in the LIHC cohort. Apparent error (AE) and 10-fold cross-validated cumulative prediction error (PE) were computed using Kaplan-Meier estimation as reference. GES signature was compared with that from Yuan et al., Kim et al., Jiang et al., Kong et al., Liu et al., and four models constructed in this study.

Validation of GES signature and comparison with other signatures (A and B) Kaplan-Meier estimates of survival based on the GES signature in the (A) LIHC cohort and (B) LIRI cohort; (C and D) ROC curves indicating the performance of GES in predicting 24- and 36-month OS of the patient in the (C) LIHC cohort and (D) LIRI cohort; (E) Performance in the prognostic classification of patients from the LIRI cohort using models constructed from 1,000 times replicated sampling of the GSE14520 cohort; (F) PEC analysis of GES and other HCC prognostic signatures in the LIHC cohort. Apparent error (AE) and 10-fold cross-validated cumulative prediction error (PE) were computed using Kaplan-Meier estimation as reference. GES signature was compared with that from Yuan et al., Kim et al., Jiang et al., Kong et al., Liu et al., and four models constructed in this study. ROC analyses were performed to study the sensitivity and specificity of survival prediction of the GES model. The AUC indicates that, similar to the performance in the GSE14520 cohort (Figure 2F), the performance of the GES signature was also fully verified in two validation cohorts (LIHC cohort, 24-month survival prediction AUC = 0.70 (95% CI 0.54–0.87), 36-month survival prediction AUC = 0.79 (95% CI 0.64–0.93); LIRI cohort, 24-month survival prediction AUC = 0.65 (95% CI 0.54–0.76), 36-month survival prediction AUC = 0.59 (95% CI 0.46–0.71)) (Figures 3C and 3D). To validate the superiority of the GES model over models constructed from tumor-only or peritumor-only profiles, we constructed two types of control models. Type I control model was the tumoral or peritumoral part of GES (namely, TGS1 and PGS1), consisting of the tumor or peritumoral genes from GES with the same coefficient. Type II control model was built from the tumor or peritumoral expression profile (namely, TGS2 and PGS2), with marker genes selected using the same two-step procedure as GES. These two types of control models were necessary as they test the robustness of the field cancerization method in a complementary way. According to the result, although four models could significantly predict the survival of patients in the GSE14520 cohort (log rank test p < 0.001), none of them succeeded in splitting each validation cohort into groups with a significant difference in OS (Figure S1). As for PGS1, its optimal cutpoint determined by maxstat was unable to separate the patients from the LIHC cohort into two datasets (Figure S1). We further validate the superiority of field cancerization profile in model construction in an unbiased way. With 1,000 training sets subsetted from the GSE14520 cohort, we reconstructed the five models (GES, TGS1, TGS2, PGS1, and PGS2) and each was validated in two validation cohorts. During each validation, we reassessed the optimum cutpoint of each model and calculated the p value between the survival of two groups separated by the cutpoint. The p values of the GES model were compared with that of others using the pairwise Wilcoxon rank-sum test. According to the result in the LIRI cohort, the p values of GES models were significantly lower than that of other models (GES versus TGS1 p = 5.82e-19; GES versus TGS2 p = 5.78e-39; GES versus PGS1 p = 1.09e-05; GES versus PGS2 p = 2.67e-14, Figure 3E). Similar results were also obtained in the LIHC cohort (GES versus TGS1 p = 5e-24; GES versus TGS2 p = 8.58e-12; GES versus PGS1 p = 0.0182; GES versus PGS2 p = 0.361, Figure S2). These results proved that the GES model has better performance over models constructed from tumor-only or peritumor-only profiles. With the development of microarray and RNA-seq technologies over the last decade, increasing prognostic gene expression signature of HCC had been published. The GES signature developed in our study was further compared against five reported HCC gene expression signatures. Through literature search, five HCC gene expression signatures that contained validation cohorts were selected, including the 6-gene signature from Yuan et al. (2017), 65-gene signature from Kim et al. (2012), 5-gene signature from Jiang et al. (2019), 3-gene signature from Kong et al. (2019), and 7-gene signature from Li et al. (2020). The prediction accuracy was analyzed on the LIHC cohort. Brier score, which indicates the prediction error over time was calculated in each signature (Mogensen et al., 2012). Among all these signatures, the GES had the lowest apparent error (AE) (0.195) and 10-fold cross-validated 40 months' cumulative prediction error (PE) (0.211), suggesting a higher prediction accuracy in GES signature than the others. Hence, the GES signature seems to be more promising in predicting the prognostic OS outcome of patients with HCC (Figure 3F). We obtained, in addition to gene expression information, a series of clinical indices for each patient, e.g., age, gender, TNM stage, hepatitis B virus (HBV), alanine aminotransferase (ALT), multinodular, cirrhosis, alpha-fetoprotein (AFP), and tumor size. Each clinical variable was applied for univariate Cox regression analysis, and we found that patients from the TNM stage II and TNM stage III subgroups got a significantly shorter OS than those from the TNM stage I subgroup (TNM stage II: HR = 2.04, 95% CI = 1.15–3.60, log-rank test p = 0.015; TNM stage III: HR = 5.89, 95% CI = 3.30–10.51, log-rank test p < 0.001) (Figure 4). Several clinical features (e.g., TNM stage and AFP level) were significantly different between the GES-high group and GES-low group (Table S2). Furthermore, GES levels were included in multivariate Cox regression to test the independence of the GES score over clinical indices. After correcting for TNM staging, age, gender, and other clinicopathological characteristics, the GES score remained a significant prognostic risk factor (GSE14520 cohort, HR = 0.07, 95% CI = 0.04–0.13, log-rank test p < 0.001; LIHC cohort, HR = 0.14, 95% CI = 0.04-0.46, log rank test p = 0.001; LIRI cohort, HR = 0.41, 95% CI = 0.17–0.99, log rank test p = 0.049), indicating the independence of GES signature in predicting patient survival.

Figure 4

Forest plots of univariate and multivariate Cox regression based on GES classifier and clinical risk factors on three cohorts

Solid squares represent the HR (HR) of death, and horizontal lines represent the 95% CIs (CI). All p values were calculated using Cox regression hazards analysis. An AFP cutoff of 300 ng/mL, ALT of 50 U/L, and tumor size of 5 cm were used in Cox regression analysis and are clinically relevant values used to distinguish patient survival. HBV, hepatitis B virus; ALT, alanine aminotransferase; AFP, alpha fetoprotein; CC, chronic carrier; AVR-CC, active viral replication chronic carrier.

Forest plots of univariate and multivariate Cox regression based on GES classifier and clinical risk factors on three cohorts Solid squares represent the HR (HR) of death, and horizontal lines represent the 95% CIs (CI). All p values were calculated using Cox regression hazards analysis. An AFP cutoff of 300 ng/mL, ALT of 50 U/L, and tumor size of 5 cm were used in Cox regression analysis and are clinically relevant values used to distinguish patient survival. HBV, hepatitis B virus; ALT, alanine aminotransferase; AFP, alpha fetoprotein; CC, chronic carrier; AVR-CC, active viral replication chronic carrier.

Immune cell infiltration spectrum across tumor and peritumor of HCC

With the CIBERSORTx algorithm, we systematically evaluated the fractions of 22 immune cells in both tumor and peritumor samples (Figure 5). (Newman et al., 2019) Of the 22 immune cell components, 16 immune cells got a biased immune fraction (p < 0.05, Wilcoxon test) between the tumor group and peritumor group. Nine immune cells (plasma cells, naive CD4 T cells, activated memory CD4 T cells, regulatory T cells, macrophages [M0], activated dendritic cells, resting mast cells, neutrophils, and resting NK cells) got a significantly higher infiltration level in the tumor, and seven immune cells (CD8 T cells, T cells follicular helper, resting dendritic cells, macrophages (M1), activated NK cells, activated mast cells, and monocytes) got a significantly higher infiltration level in peritumor. According to the correlation analysis, the pairwise association of infiltration level between immune cells was generally consistent in tumor and peritumor, with several exceptions (Figure 5A). For example, the infiltration level of macrophages M0 in tumor was negatively associated with tumoral activated dendritic cells, and it turned out to be a positive association in peritumor. Together these results revealed the distinct immuno-characteristics in peritumor tissue, which might contribute to a more precise prognosis prediction.

Figure 5

Immune cell infiltration profiles in tumor and peritumor and their association with patient prognosis

(A) Related fraction profile of 22 infiltrated immune cells in tumor and peritumor GSE14520 cohort. For each group, the immune cell fraction in each patient was estimated by CIBERSORTx, and patients were sorted by their GES level. Significant differences in cell type fractions between tumor and peritumor are marked with an ∗ indicating p < 0.05 (Wilcoxon rank-sum test). The color of circles in each grid indicates the significance for the pairwise test of Spearman correlation, insignificant pairs were left empty;

(B) Forest plots showing the HRs (HR) and p values of each immune cell associated with patient prognosis in tumor and peritumor. HRs were calculated by univariate Cox regression, with grouping based on the optimum cutpoint determined by R package maxstat. Solid squares represent the HR of beneficial effect, and horizontal lines represent the 95% CIs (CI).

Immune cell infiltration profiles in tumor and peritumor and their association with patient prognosis (A) Related fraction profile of 22 infiltrated immune cells in tumor and peritumor GSE14520 cohort. For each group, the immune cell fraction in each patient was estimated by CIBERSORTx, and patients were sorted by their GES level. Significant differences in cell type fractions between tumor and peritumor are marked with an ∗ indicating p < 0.05 (Wilcoxon rank-sum test). The color of circles in each grid indicates the significance for the pairwise test of Spearman correlation, insignificant pairs were left empty; (B) Forest plots showing the HRs (HR) and p values of each immune cell associated with patient prognosis in tumor and peritumor. HRs were calculated by univariate Cox regression, with grouping based on the optimum cutpoint determined by R package maxstat. Solid squares represent the HR of beneficial effect, and horizontal lines represent the 95% CIs (CI). According to the analysis of association with clinical outcomes, 21 immune cells (13 in tumor and 8 in peritumor) appear to be significantly correlated with patient survival in univariate Cox regression (Figure 5B). For the 13 tumor prognosis-associated immune cells, the high infiltration level of 8 cells (B cells naive, T cells CD8, T cells follicular helper, T cells regulatory, NK cells activated, macrophages M1, and macrophages M2) is associated with better prognosis, as opposed to the performance of plasma cells, T cells CD4 naive, macrophages M0, dendritic cells resting, eosinophils, and neutrophils. For the eight peritumor prognosis-associated immune cells, five cells (B cells naive, T cells CD8, dendritic cells resting, dendritic cells activated, and mast cells resting) got a positive association between infiltration level and prognosis, as opposed to the performance of macrophages M0, mast cells activated, and eosinophils. By comparison, several distinctive features were observed in peritumoral immune cells. For example, macrophages M1 and macrophages M2 in tumor were both significantly associated with high OS of patients (p < 0.05); however, this was not observed in peritumor immune cells. T cells CD8 were observed to have a beneficial effect on patient survival (p = 0.02), and a higher significance was observed in peritumor (p < 0.001), suggesting that T cells CD8 may play a greater role in peritumor (Figure 5B). Moreover, the fraction of tumoral dendritic cells component was significantly (highly significant in dendritic cells resting (HR = 0.4 (0.23–0.69), p < 0.001) and marginally significant in dendritic cells activated (HR = 0.56 (0.30–1.04), p value = 0.067)) and positively associated with poor prognosis, suggesting a deleterious effect on clinical outcome, whereas a totally opposite beneficial effect was observed in peritumor samples (dendritic cells resting [HR = 1.97 (1.27–3.06), p value = 0.003]; dendritic cells activated [HR = 2.92 (1.07–7.98), p value = 0.037]) (Figure 5B). Furthermore, we explored the interactions between gene expression and prognosis-associated immune cells ratios. By GO enrichment analysis (Figure S3–S5), the gene sets that were significantly associated with the level of infiltration of each cell type were analyzed. For T cells CD8, most enriched GO terms were shared between tumor and peritumor (e.g., T cell activation, T cell receptor signaling pathway, and cell chemotaxis), which may explain their similar role in HCC. In contrast, several unique features (e.g., gene sets associated with neutrophil activation and neutrophil degranulation) were observed in peritumoral dendritic cells (resting). These results demonstrated that aberrant immune infiltration in not only the tumor itself, but also in the peritumor, might play important roles in the development of HCC. By using a LASSO Cox regression, an immune score (IS) prognostic model was built from 22 survival-associated immune cells at the field cancerization level (Figure 6A, Table S3). Of the 22 candidate immune cells, 8 tumoral immune cells and 4 peritumor immune cells were selected for model construction. Immune scores were then calculated for each patient using the infiltration level of each cell weighted by their regression coefficient (Table S3). The IS was associated with worse OS. We used R package maxstat to assess the optimum cutoff for immune scores in the GSE14520 cohort, which results in the highest log-rank statistic at 1.96 (Figures 6B–6D). Patients with HCC in the GSE14520 cohort were then divided into risk low and risk high immune score groups. Of the 12 immune cells, the higher fraction of 4 was beneficial to clinical outcome and deleterious effect was observed in 7 immune cells. Kaplan-Meier survival analysis demonstrated that patients in the IS-low group had significantly longer OS than those in the IS-high group (HR = 0.28, 95% CI = 0.18–0.45, log-rank test p < 0.001; Figures 6E and 6F).

Figure 6

Establishment of IS signature based on field cancerization profile of the GSE14520 cohort

(A) Infiltration level of the 12 immune cells across the GSE14520 cohort;

(B) IS signature risk score distribution;

(D) Standardized log rank statistic across different cutpoint, the dotted line indicates the selected optimum IS cutpoint;

(E) Kaplan-Meier estimates of survival based on the IS signature in the GSE14520 cohort;

(F) ROC curves indicating the performance of IS in predicting 24- and 36-month OS of the patient in the GSE14520 cohort.

Establishment of IS signature based on field cancerization profile of the GSE14520 cohort (A) Infiltration level of the 12 immune cells across the GSE14520 cohort; (B) IS signature risk score distribution; (C) Relationship between survival days and survival status of each patient as sorted by IS; (D) Standardized log rank statistic across different cutpoint, the dotted line indicates the selected optimum IS cutpoint; (E) Kaplan-Meier estimates of survival based on the IS signature in the GSE14520 cohort; (F) ROC curves indicating the performance of IS in predicting 24- and 36-month OS of the patient in the GSE14520 cohort.

Validation of the IS signature

As with GES signature, we sought to validate the robustness of IS signature from several aspects. In the LIHC cohort, the patients could be stratified according to their IS score and OS time was markedly elongated in the low-risk group (HR = 0.28, 95% CI = 0.18–0.45, log-rank test p = 0.001, Figure 7A). The risk score was further validated in the LIRI cohort (HR = 0.33, 95% CI = 0.18–0.62, log-rank test p < 0.001, Figure 7B). We next carried out ROC analysis to assess the performance of IS signature in predicting 24-, and 36-month OS of patients from each cohort. The AUC of IS signature was high in all three cohorts (GSE14520 cohort, 24-month AUC = 0.75 (0.67–0.83), 36-month AUC = 0.76 (0.68–0.83); LIHC cohort, 24-month AUC = 0.67 (0.50-0.83), 36-month AUC = 0.83 (0.70–0.96); LIRI cohort, 24-month AUC = 0.66 (0.55–0.77), 36-month AUC = 0.65 (0.53–0.77). Figure 6F, Figure 7C and 7D). These results proved the feasibility to determine patient prognosis based on immune cell infiltration. More importantly, our work demonstrated the diagnostic potential of IS signature for survival prediction in clinical application.

Figure 7

Performance validation of IS signature

(A and B) Kaplan-Meier estimates of survival based on the IS signature in the (A) LIHC cohort and (B) LIRI cohort; (C-D) ROC curves indicating the performance of IS in predicting 24- and 36-month OS of the patient in the (C) LIHC cohort and (D) LIRI cohort. ROC, receiver operating characteristic; OS, overall survival.

Performance validation of IS signature (A and B) Kaplan-Meier estimates of survival based on the IS signature in the (A) LIHC cohort and (B) LIRI cohort; (C-D) ROC curves indicating the performance of IS in predicting 24- and 36-month OS of the patient in the (C) LIHC cohort and (D) LIRI cohort. ROC, receiver operating characteristic; OS, overall survival. We further compared the performance of IS model with four alternative models (TIS1, TIS2, PIS1, and PIS2). Four models with their corresponding cutpoints were able to stratify the GSE14520 cohort into two groups, and the patients from the low-risk group all got a significantly higher OS (p < 0.001). However, they failed to make proper stratification on the OS of patients from the LIHC and LIRI cohorts, with no significant difference in OS in groups divided by each model (TIS1, LIHC p = 0.097, LIRI p = 0.051; TIS2, LIHC p = 0.11, LIRI p = 0.11; PIS1, LIHC p = 0.86, LIRI p = 0.82; PIS2, LIHC p = 0.36, LIRI p = 0.5, Figure S6), indicating the insufficiency of model construction from tumor-only and peritumor-only profiles. We next assessed the prognostic association between IS signature and other known clinical risk factors. In the analysis of univariate Cox regression, the IS signature was found to be a significant indicator for OS, together with several already well-known risk factors (e.g., TNM stage, AFP, tumor size). Furthermore, we further analyzed the independence of IS by multivariate Cox regression. After correcting for TNM staging, age, gender, and other clinicopathological characteristics, the IS score remained a significant prognostic risk factor (GSE14520 cohort, HR = 0.33, 95% CI = 0.20–0.55, log-rank test p < 0.001; LIHC cohort, HR = 0.31, 95% CI = 0.12–0.82, p = 0.018; LIRI cohort, HR = 0.38, 95% CI = 0.20–0.75, p = 0.005, Figure 8), indicating the independence of IS signature in predicting patient survival. Taken together, these findings suggest that the risk score retains its prognostic relevance even after the classical clinicopathological prognostic features have been taken into account.23

Figure 8

Forest plots of univariate and multivariate Cox regression based on IS classifier and clinical risk factors on three cohorts

Forest plots of univariate and multivariate Cox regression based on IS classifier and clinical risk factors on three cohorts Solid squares represent the HR (HR) of death, and horizontal lines represent the 95% CIs (CI). All p values were calculated using Cox regression hazards analysis. An AFP cutoff of 300 ng/mL, ALT of 50 U/L, and tumor size of 5 cm were used in Cox regression analysis and are clinically relevant values used to distinguish patient survival. HBV, hepatitis B virus; ALT, alanine aminotransferase; AFP, alpha fetoprotein; CC, chronic carrier; AVR-CC, active viral replication chronic carrier.

Construction and evaluation of a prognostic nomogram

To provide a clinically relevant quantitative method to predict the probability of 24- and 36-month OS in patients with HCC, we constructed a nomogram that integrated the GES, IS, and TNM stages (Figure 9A). Calibration plots showed that the nomograms performed well compared with the performance of an ideal model (Figure 9B). The C-index of the nomogram was 0.834 (SE = 0.02).

Figure 9

Construction and validation of a nomogram to predict 24- and 36-month survival probability in HCC

(A) The clinico-molecular nomogram integrated the GES classifier, IS classifier, and TNM stage. The sum of the points calculated in each component gives a linear predictor to 24- and 36-month survival.

(B) The calibration curve shows the agreement between predicted and observed outcomes; the 45-degree line represents perfect prediction.

(C and D) Decision curve analysis (DCA) comparing the 24- (C) and 36- (D) month survival prediction performance of the nomogram with that of GES classifier, IS classifier, and TNM stage alone;

(E–J) Time-dependent ROC curves by nomograms for 24- and 36-month overall survival probability in the GSE14520 cohort (E, F), LIHC cohort (G, H), and LIRI cohort (I, J). ROC, receiver operating characteristic; OS, overall survival.

Construction and validation of a nomogram to predict 24- and 36-month survival probability in HCC (A) The clinico-molecular nomogram integrated the GES classifier, IS classifier, and TNM stage. The sum of the points calculated in each component gives a linear predictor to 24- and 36-month survival. (B) The calibration curve shows the agreement between predicted and observed outcomes; the 45-degree line represents perfect prediction. (C and D) Decision curve analysis (DCA) comparing the 24- (C) and 36- (D) month survival prediction performance of the nomogram with that of GES classifier, IS classifier, and TNM stage alone; (E–J) Time-dependent ROC curves by nomograms for 24- and 36-month overall survival probability in the GSE14520 cohort (E, F), LIHC cohort (G, H), and LIRI cohort (I, J). ROC, receiver operating characteristic; OS, overall survival. As expected, the result of decision curve analysis showed that the nomogram got a higher accuracy of prediction on 24- and 36-month OS, indicating that the nomogram was more clinically useful than using GES classifier, IS classifier, and TNM stage alone (Figures 9C and 9D). ROC analysis to compare the sensitivity and specificity of the nomogram comprising the IS signature with clinicopathological risk factors showed that overall survival was more accurately predicted by the nomogram than by the risk factors in all three cohorts (Figures 9E–9J).

Functional relevance and clinical potential of diagnostic markers for HCC

To explore the clinical potential of peritumor genes, we searched drugs that target peritumoral GES genes in GeneCards (https://www.genecards.org/). GeneCards integrated information of drugs from DrugBank, ApexBio, DGIdb, FDA Approved Drugs, ClinicalTrials.gov, and PharmGKB; drugs that are marked with Approved, Investigational, and Experimental were selected. With our search, eight targetable peritumoral genes were identified (Table S5). Of these drugs, melatonin was selected as a full agonist of IDO1. Many studies have shown that melatonin’s co-administration improves the sensitivity of cancers to inhibition by conventional drugs. For example, the combination of melatonin and an IDO inhibitor DL-1MT improves the efficacy of an immunotherapy (gDE7) targeting HPV-associated tumors (Moreno et al., 2018). Moreover, besides its effects on cancer cells, the protective effect of melatonin was observed in cells in the liver and the immune systems (Calvo et al., 2013; Carrillo-Vico et al., 2013; Molpeceres et al., 2007; Ren et al., 2017; Sardo et al., 2017), which further suggests the peritumoral effectiveness of melatonin on HCC. To identify the pathophysiological mechanisms underlying the difference in survival time in patients with HCC, we performed gene set variation analysis (GSVA) on both tumor and peritumor. From GSE14520 we selected 141 patients with consistently high or low risk levels in GES and IS classifiers, including 50 high-risk patients and 91 low-risk patients. From the tumor, we identified 36 gene sets with higher enrichment score (ES) in low-risk patients and four gene sets with higher ES in high-risk patients. These gene sets covered several proliferation-, angiogenesis-, and metastasis-associated pathways, e.g., WNT signaling pathway, MDA5 signaling pathway, blood coagulation, and fatty acid metabolism-associated pathways (Figure S7). As for peritumor, we observed 33 gene sets with higher ES in high-risk patients and 7 gene sets with higher ES in low-risk patients (Figure 10). Two significantly enriched gene sets, IL-6-mediated signaling pathway (log2 fold change = -0.50; padj = 2.91e-3) and G-protein-coupled receptor (GPCR) signaling pathway (log2 fold change = 0.46; padj = 1.34e-3, Figure 10), were identified in peritumor, whereas these two pathways were marginally significant in gene set variation analysis in tumor (padj = 5.8e-2 for IL-6-mediated signaling pathway and padj = 9.6e-2 for adenylate cyclase activating GPCR signaling pathway, respectively). This indicates some similarities between tumor and peritumor during tumorigenesis. Moreover, these two pathways got a higher log2 fold change than that in tumor (IL-6-mediated signaling pathway, -0.34 in tumor versus -0.50 in peritumor; adenylate cyclase activating GPCR signaling pathway, 0.30 in tumor versus 0.46 in peritumor), suggesting a stronger effect of these pathways in peritumor. The Toll-like receptor 2 (TLR2) signaling pathway, which plays an essential role in the innate immune response, got significantly higher expression in patients with high risk (log2 fold change = -0.38, padj = 1.7e-4, Figure 10). Notably, such biased expression TLR2 pathway gene sets were not observed in tumor (log2 fold change = -0.005, padj = 0.98). To some extent, the above-mentioned molecular characteristics may provide a reasonable interpretation of the prognostic value of the two signatures in our study and may further give us insights into the underlying mechanisms of tumor prognosis from the perspective of both tumor and peritumor.

Figure 10

Differentially enriched GO and KEGG pathway gene sets between the high-risk group and the low-risk group in peritumor

Significantly enriched gene sets were selected as adjusted p value >0.05, and the 40 top fold change gene sets were plotted in the heatmap (A) and the volcano plot (B).

Differentially enriched GO and KEGG pathway gene sets between the high-risk group and the low-risk group in peritumor Significantly enriched gene sets were selected as adjusted p value >0.05, and the 40 top fold change gene sets were plotted in the heatmap (A) and the volcano plot (B).

Discussion

HCC is a leading cause of cancer-related death in many parts of the world. It is characterized as an occult, highly malignant disease with a high recurrence rate and poor prognosis. Unlike other solid malignancies, the coexistence of inflammation and cirrhosis makes the early diagnosis and prognostic assessment of HCC much more difficult. Despite recent progress on the diagnosis of and therapy for HCC, its morbidity and mortality is still rising in many countries, which indicates an urgent need for valuable biomarkers for the diagnosis and treatment of HCC (Akoad and Pomfret, 2015; Zhu et al., 2013). With the emergence of high-throughput technique as a powerful method for the profiling of gene expression, numerous genomic studies tried to provide gene expression-based prognostic markers for HCC (He et al., 2020; Liang et al., 2020; Zhang et al., 2020). However, recent prognostic markers do not inform the diagnosis, prognosis, or treatment decisions for HCC patients. Limitations of these markers include the lack of large-scale, independent validation and, more importantly, the neglect of peritumor features. In the current work, based on the theory of field cancerization, we selected candidate features from tumor and peritumor simultaneously. Through univariate Cox regression, we were able to uncover the detail of prognosis-associated gene and immune cell subsets in HCC. A series of prognosis-associated features were captured, including a significant association between the fraction of CD8 T cells and patient survival in both tumor and peritumor tissue. More notably, a stronger association was observed between peritumor CD8 T cells and patient survival (tumor, HR = 1.73, p = 0.02; peritumor, HR = 3.27, p < 0.001, Figure 5B), suggesting a greater role of peripheral originated T cells CD8. Recent evidence showed that the clonotypic expansion of effector-like T cells within not only the tumor but also in normal adjacent tissue, and patients with gene signatures of such clonotypic expansion respond best to anti-PDL1 therapy, indicating that non-exhausted T cells and T cell clones supplied from the periphery may be key factors in explaining patient variability and clinical benefit from cancer immunotherapy (Wu et al., 2020). Put these together, a similar mechanism is likely shared in a range of solid malignancies, including HCC, which may hold promise for further therapies. As an effort to overcome the underlying risk of overfitting, we applied a Cox regression model with a LASSO penalty for the selection of prognosis-associated features. Based on the cancerization field features from the GSE14520 cohort, the expression of 70 genes and infiltration levels of 22 immune cells were selected for the construction of GES and IS, respectively. Both signatures were able to perform robust prognosis-based classification in not only the training cohort but also two validation cohorts, indicating good reproducibility of the two signatures in HCC. Further analysis uncovered that the prognostic value of signature in our study was independent of the main prognostic factors in HCC. For example, AFP is most commonly employed in the clinic for HCC screening and as an important predictor for patient survival (Peng et al., 2004). The TNM staging system reports the pathological characteristics of patients (e.g., tumor grade, nodule, and metastasis), which had been widely applied in various cancers (Marrero et al., 2018). By performing multivariable Cox regression analysis, we identified that the prognostic values GES and IS are independent of AFP and TNM stage, which strongly demonstrated that each model could act as an independent prognostic factor for HCC. Furthermore, our study confirmed that field cancerization profile-based models could better predict a patient’s prognosis. In this study, a series of tumor or peritumor profile-based models were trained and compared with GES and IS, which confirmed the advantage of field cancerization profile-based models. We also observed a lower prediction error in GES compared with other published HCC prognosis models (Figures 3E and 3F). Considering that recent efforts on prognostic model development are mostly based on tumor features alone, the neglect of the peritumor profile might lead to their lack of robustness and limited clinical application. To provide a convenient and accurate tool for the evaluation of overall risk (including tumor recurrence and metastasis) at the individual level, we constructed a nomogram by integrating GES, IS, and TNM stage information. Although such methods generally use traditional prognostic factors, such as TNM stage and sex, we propose including our field cancerization profile-based model, which reflects the molecular heterogeneity of these tumors. The performance of the nomogram was verified in all validation cohorts. Thus, our nomogram might provide a simple and accurate method for predicting prognoses in HCC. Cancer is a heterogeneous disease, and exploring the dysregulated genes involved in carcinogenesis and development might help to improve prognostic and therapeutic strategies. According to MalaCards (Rappaport et al., 2017), 28 genes from the GES signature are associated with human cancer, including 17 tumoral genes (CYLD, TGM1, IL2RB, NEDD9, VEGFA, IRF1, MFGE8, CDH13, DAB2, MSX1, DAPK3, CXCL10, CD52, LSM1, SMTN, NMB, and ZNF652) and 11 peritumoral genes (PIBF1, BAX, GPATCH2, CDKN3, ACE2, CUEDC2, DNMT3A, EGR2, IDO1, NCAM1, and ICOS). Among the 11 peritumoral genes, Bax (Bcl2-associated X protein) encoded by gene BAX was reported as a central apoptosis-inducing protein that works through the induction of permeabilization of the outer mitochondrial membrane (Benard et al., 2010). Bax interacts with Bcl-2 and their ratio determines survival or death following an apoptotic stimulus (Oltvai et al., 1993). Specifically, the overexpression of BAX had been reported to induce autophagy and enhance drug sensitivity of HCC in cell experiments. The CDKN3 gene encodes a cyclin-dependent kinase inhibitor, which dephosphorylates CDK2 kinase to prevent its activation. It has been reported to be deleted, mutated, or overexpressed in several kinds of cancers. It has been reported that the overexpression of CDKN3 in HCC could promote cell proliferation by stimulating G1-S transition. In this study, the high expression of peritumoral CDKN3 was associated with poor prognosis (coefficient = 0.466). Two CDKN3 antagonists, AT7519 and Kenpaullone, were identified through the search in GeneCards (Table S5). Our results suggest that these drugs may take effect in peritumor as well. In our research, a limited number of drugs were identified in peritumor genes, which is mainly due to the lack of study on the role of peritumor in cancerogenesis. Although the biological function of some of the marker genes has not been reported in HCC, their underlying effects as targets for HCC might be addressed in further biological and mechanistic investigations. According to the GSVA, a series of pathways associated with patient outcomes were uncovered. Some cancer-associated pathways (e.g., the IL-6 signal pathway and the adenylate cyclase activating GPCR signaling pathway) were identified in both tumor and peritumor with varying degrees of enrichment. Moreover, several pathways with clinical potential were identified. For example, high expression of gene sets from the 3′-phosphoadenosine 5′-phosphosulfate (PAPS) biosynthetic process was observed in the peritumor of risk-high patients (Figure 10). A previous family-based study in 240 families revealed that the 3′-phosphoadenosine 5′-phosphosulfate synthase 1 (PAPSS1) gene was a candidate HCC-susceptibility gene, which was associated with elevated serum α-fetoprotein and with poor survival in familial cases. Studies based on immunohistochemical analysis, western blot, and qRT-PCR revealed a significant change in the expression of PAPSS1 protein in patients under the treatment of IFN-α, and its relation to relapse and prognosis was also reported. Our study indicates that genes involved in this pathway (e.g., PAPSS1) may be applicable in clinical practice. In this multicenter, retrospective cohort study, we developed and validated two weighted prognostic models (GES and IS) based on cancerization-field profile. Our study confirmed that field cancerization profile-based models could better predict the patient’s prognosis, which offers insights for works on prognostic model construction. The nomogram constructed by combining GES, IS, and TNM stage has been proved effective in improving the individualized prediction of the overall risk (e.g., recurrence and metastasis) of patients. According to the peritumoral immune cell infiltration features, CD8 T cells in peritumor tissues showed a stronger correlation with the patient’s prognosis than that in the tumor, suggesting that T cells provided by peritumor tissues may be a key factor explaining the diversified prognosis and clinical benefit of immunotherapy. Moreover, a series of prognosis-associated genes and pathways were identified in peritumor tissue, which may contribute to the study on the molecular mechanism of HCC as well as the development of therapeutic targets. With the prevailing of these prognostic signatures applied in the further classification of patients with HCC, they hold promise to reflect different biologic backgrounds with potential implications in patient selection for therapies and prediction of clinical outcomes.

Limitations of the study

We acknowledge several limitations that could be addressed in our study. First, as a retrospective study based on publicly available data, only a limited number of qualified studies were collected. As the role of peritumor has attracted a rising interest in carcinogenesis, further validation on a greater scale would be expected. In addition, the signatures were constructed based on a limited number of features, which include only 12,749 protein-coding genes, and the immune cell component involved did not represent all the HCC-associated immune features. As a growing number of researches have revealed the carcinogenesis-associated role of various features (e.g., peptides, long noncoding RNA, microRNA, and circle RNA), the inclusion of a wider range of features would not only enable the construction of a more robust model but also bring us closer to the underlying mechanisms of HCC. Finally, the mechanisms of the signature genes have not been clearly identified here, and experimental studies on these genes may provide important information to facilitate our understanding of their functional roles.

STAR★Methods

Key resources table

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Yuanyan Xiong (xyyan@mail.sysu.edu.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

This paper analyzes existing, publicly available data. These accession numbers for the datasets are listed in the key resources table. This paper does not report original code. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Method details

HCC datasets preparation

Gene expression profiles for HCC were searched in publicly accessible databases. Datasets that i) from paired tumor and non-tumor specimens, ii) include enough number of patients, and iii) with detailed clinical data were included in our study. Finally, gene expression data from three centers were selected, namely the GSE14520 cohort, the LIHC cohort, and the LIRI cohort. The GSE14520 cohort was selected from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/gds) under the accession number GSE14520, with samples of 247 patients collected by Liver Cancer Institute, Fudan University, Shanghai, China (Roessler et al., 2010). Patients with expression data from paired tumor and non-tumor samples were selected, which result in a set of 209 patients in the GSE14520 cohort. For the LIHC cohort, TCGA Level III gene expression data of 377 patients with HCC were downloaded from Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/) using R package TCGAbiolinks (Colaprico et al., 2016), from which 49 patients with paired expression profile were assigned to the LIHC cohort. Additionally, we obtained the gene expression data from the LIRI-JP project in ICGC (https://icgc.org/). Of the 232 patients included in the LIRI-JP project, paired expression profile was available in 199 patients, which were assigned to the LIRI cohort. For each cohort, the clinical data of patients were obtained for further analysis. As our analyses were based on deidentified data, institutional review board approval and informed consent were not required.

In silico decomposition of the immune microenvironment

We adopted CIBERSORTx, a gene expression-based deconvolution approach to estimate the relative immune cell infiltration levels of tumor and peritumor of each patient. According to the CIBERSORTx algorithm, Monte Carlo sampling was applied to derive the deconvoluted p value for each sample (Newman et al., 2019). The proportions of 22 immune cells in both HCC tumor samples and peritumor samples were quantified using the LM22 signature (consisting of 547 marker genes) under 1,000 permutations. Significant cases (p < 0.05) which indicate accurate deconvolution were kept for further analysis. The fraction distribution of each immune cell in tumor and peritumor was compared using Wilcoxon rank-sum test. Spearman correlation test was applied to study the pairwise correlation in the 22 immune cells of tumor samples and peritumor samples, respectively. We further evaluated the association between immune cell infiltration and patient prognosis. Immune cells significantly associated with patient survival (p < 0.05) were selected by univariate cox regression. For each cell, we select infiltration level-associated gene sets for GO enrichment analysis using R package clusterProfiler (Yu et al., 2012).

Cancerization-field level GES prognosis signature

Unlike the previously published prognosis model constructed from tumor-only characteristics, we adopted the expression profile of both tumor and peritumor into model construction. A two-step procedure was used to select optimal marker genes for distinguishing risk-low patients from risk-high patients. In the first step, a univariate Cox proportional hazards model was applied on each of the profiled 12,749 genes in GSE14520 cohort to explore the association between gene expression and patient survival, significant genes (p < 0.05) were selected as candidates to allow more efficient feature selection. The number of significant genes was 1,876 and 1,940 in tumor and peritumor, respectively. In the second step, candidate genes were subjected to further feature selection using the least absolute shrinkage and selection operator (LASSO) regression. LASSO uses L1 regularization to make the variables in the Cox regression collapsible. By changing the value of regularized parameter λ, we are able to control the number of the final model (Goeman, 2010). By this method, we further screen out the most useful prognostic markers among the candidate genes. The R package glmnet was applied to perform the LASSO Cox regression model analysis (Friedman et al., 2010). Finally, a gene expression score (GES) model was constructed and normalized to predict the prognosis of HCC patients, the risk score for each patient was calculated based on the regression coefficient of each gene in the signature. To make GES comparable between different profiling platforms, e.g., the signal intensity of microarray expression data and the FPKM value of RNAseq expression data, GES was calculated using the following formula:where n is the number of genes in the model, x is the expression of the gene, and w is the gene-specific weight, i.e., the coefficient for each gene.

Cancerization-field level IS prognosis signature

For the construction of a cancerization-field level immune score (IS) prognosis signature, a similar two-step procedure as GES was applied. Briefly, in the first step, univariate cox regression was performed on each of the 44 infiltrated immune cells (22 in tumor and 22 in peritumor), and immune cells that were significantly associated patient survival (p < 0.05) were selected as candidates. In the second step, the infiltration level of the candidate immune cells was subjected to LASSO Cox regression. Finally, an immune score (IS) model was constructed for the prediction of patient prognosis. IS was calculated using the following formula:where n is the number of immune cells in the model, x is the infiltration level of the immune cell, and w is the coefficient for each immune cell.

Assessment of the predictive value of the prognostic models

To figure out whether models constructed from cancerization-field profile (GES and IS) were more accurate on the prediction of patient survival, additional models were constructed, and their capability of risk stratification was compared. 1) For comparison with GES, four models were constructed, including tumor gene score 1 (TGS1), tumor gene score 2 (TGS2), peritumor gene score 1 (PGS1), and peritumor gene score 2 (PGS2). Model TGS1 is the tumoral part of GES, consisting of the tumor genes from GES with the same coefficient. Model TGS2 was built from tumor expression profile, with marker genes selected using the same two-step procedure as GES. For model TGS2, regularized parameter λ was adjusted to keep the number of marker genes identical or similar to GES, ensuring that the prognosis prediction power did not result from a different gene number. Similar to that of TGS1 and TGS2, PGS1 and PGS2 were constructed using peritumor expression profile. 2) For comparison with IS, we constructed four models, tumor immune score 1 (TIS1), tumor immune score 1 (TIS1), tumor immune score 2 (TIS2), peritumor immune score 1 (PIS1), and peritumor immune score 2 (PIS2). Model TIS1 is the tumoral part of IS, consisting of the tumor immune cells from IS with the same coefficient. Model TIS2 was built by multivariate Cox regression on the profile of tumor immune cells. Similar to that of TIS1 and TIS2, PIS1 and PIS2 were constructed using peritumor immune cell infiltration profile. To stratify patients of different risk scores into risk-high and risk-low groups, optimal cutpoint for each model was detected using the maximally selected rank statistics implemented in the R package maxstat (Hothorn and Lausen, 2003). The standardized log-rank statistic across different risk scores was calculated, thus providing a value of a cutpoint that corresponds to the most significant relation with outcome. HCC patients were then dichotomized into a risk-high and risk-low group according to the optimal cutpoint. Each model with its corresponding cutpoint was then validated in validation cohorts, with the survival of the high-risk and low-risk group estimated by the Kaplan-Meier method, and compared using the log-rank test. We further designed a random sampling approach to validate the superiority of cancerization-field profile-derived models. For gene expression profile, 75% of patients in the GSE14520 cohort (156 individuals) were randomly sampled for 1,000 trials, which makes up 1,000 sets of training sets. For each training set, five prognostic models (GES, TGS1, TGS2, PGS1, and PGS2) were constructed, with corresponding optimal cutpoint determined based on the aforementioned methods. Each model was validated in LIHC and LIRI cohort, and log-rank p values were compared across five models using the pairwise Wilcoxon rank-sum test. In an effort to compare GES and previously published models, the performance of prognosis prediction over time was compared. We used R package pec to perform the inverse probability of censoring weighting (IPCW) estimation of time-dependent Brier score based on ten-fold cross-validation, as well the computation of prediction error curves of each model (Mogensen et al., 2012). We performed Cox multivariate regression to check whether GES and IS signature were independent of conventional clinicopathological factors (age, gender, TNM stage, HBV, ALT, cirrhosis, AFP, and tumor size). Furthermore, receiver operating characteristic (ROC) analysis was used in validating the GES and IS signature of their performance in predicting 24- and 36- month OS. Using R package forestplot and survivalROC, the results were visualized by forest plots and ROC curves, respectively (Heagerty et al., 2000).

Development of nomogram and clinical use validation

We formulated a nomogram using the R package rms, which included GES classifier, IS classifier, and TNM stage in the nomogram (Núñez et al., 2011). We use calibration curves to assess the performance of the nomogram, which use the bootstrap method with 1000 resamples to get bias-corrected estimates of predicted versus Observed values, with a concordance index (C-index) calculated. Decision curve analysis was used to assess the clinical practicability of the nomogram, which applied the net benefits of threshold probabilities to quantitatively evaluate the clinical value of the nomogram (Vickers and Elkin, 2006). We further adopted the receiver operating characteristic (ROC) analysis to compare the predictive accuracy of TNM stage, GES classifier, IS classifier, and the nomogram on each of the three cohorts in our study.

Functional relevance exploration of diagnostic gene sets

To explore potential mechanisms underlying groups of distinct risk scores, patients with consistent high- or low- risk in both GES signature and IS signature were selected to analyze their enrichment for gene sets and pathways. The enrichment analysis was carried out by R package gsva, using gene sets from KEGG and GO (Ashburner et al., 2000; Hanzelmann et al., 2013; Kanehisa and Goto, 2000; Kanehisa et al., 2016). The gene sets were obtained from MSigDB database version 7.0 (Subramanian et al., 2005). Gene set enrichment scores (ES) of individual samples were calculated from the gene expression data using a Kolmogorov-Smirnov (KS) like random walk statistic with classical maximum deviation method (Subramanian et al., 2005). The GSVA ES of selected samples were then applied to fit the same linear model using R package limma and moderated t-statistics were estimated for each gene set (Ritchie et al., 2015). Significant gene sets were filtered with FDR adjusted p value < 0.05, and the top 40 gene sets with the highest fold change between groups were then visualized by heatmap and volcano plot, respectively.

Quantification and statistical analysis

Statistical analyses were performed using R (version 4.0.5). Kaplan-Meier analysis or multivariate cox analysis was performed for survival analysis. The spearman coefficient was applied to analyze correlations. Wilcoxon rank-sum test was used to compare the difference between the value from two groups. All reported p Values are two-tailed, and for all analyses, p < 0.05 was considered statistically significant. FDR correction was used for multiple test correction.

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Deposited data

mRNA microarray of 209 HCC patients	Gene Expression Omnibus	GEO: GSE14520
mRNA-seq of 49 HCC patients	The Cancer Genome Atlas	LIHC
mRNA-seq of 199 HCC patients	International Cancer Genome Consortium	LIRI-JP

Software and algorithms

TCGAbiolinks 2.14.1	(Colaprico et al., 2016)	https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html
CIBERSORTx	(Newman et al., 2019)	https://cibersortx.stanford.edu/
clusterProfiler 3.14.3	(Yu et al., 2012)	https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html
glmnet 4.1-2	(Friedman et al., 2010)	https://cran.r-project.org/package=glmnet
maxstat 0.7-25	(Hothorn and Lausen, 2003)	https://cran.r-project.org/package=maxstat
pec 2020.11.17	(Mogensen et al., 2012)	https://cran.r-project.org/package=pec
survivalROC 1.0.3	(Heagerty et al., 2000)	https://cran.r-project.org/package=survivalROC
rms 6.2-0	(Núñez et al., 2011)	https://cran.r-project.org/package=rms
gsva 1.34.0	(Hanzelmann et al., 2013)	https://bioconductor.org/packages/release/bioc/html/GSVA.html
limma 3.42.2	(Ritchie et al., 2015)	https://bioconductor.org/packages/release/bioc/html/limma.html

50 in total

1. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal: Nat Genet Date: 2000-05 Impact factor: 38.330

2. Time-dependent ROC curves for censored survival data and a diagnostic marker.

Authors: P J Heagerty; T Lumley; M S Pepe
Journal: Biometrics Date: 2000-06 Impact factor: 2.571

3. Evaluating Random Forests for Survival Analysis using Prediction Error Curves.

Authors: Ulla B Mogensen; Hemant Ishwaran; Thomas A Gerds
Journal: J Stat Softw Date: 2012-09 Impact factor: 6.440

4. Sixty-five gene-based risk score classifier predicts overall survival in hepatocellular carcinoma.

Authors: Soo Mi Kim; Sun-Hee Leem; In-Sun Chu; Yun-Yong Park; Sang Cheol Kim; Sang-Bae Kim; Eun Sung Park; Jae Yun Lim; Jeonghoon Heo; Yoon Jun Kim; Dae-Ghon Kim; Ahmed Kaseb; Young Nyun Park; Xin Wei Wang; Snorri S Thorgeirsson; Ju-Seog Lee
Journal: Hepatology Date: 2012-03-18 Impact factor: 17.425

5. Decision curve analysis: a novel method for evaluating prediction models.

Authors: Andrew J Vickers; Elena B Elkin
Journal: Med Decis Making Date: 2006 Nov-Dec Impact factor: 2.583

6. Melatonin is able to reduce the apoptotic liver changes induced by aging via inhibition of the intrinsic pathway of apoptosis.

Authors: Virginia Molpeceres; José L Mauriz; María V García-Mediavilla; Paquita González; Juan P Barrio; Javier González-Gallego
Journal: J Gerontol A Biol Sci Med Sci Date: 2007-07 Impact factor: 6.053

7. Determining cell type abundance and expression from bulk tissues with digital cytometry.

Authors: Aaron M Newman; Chloé B Steen; Chih Long Liu; Andrew J Gentles; Aadel A Chaudhuri; Florian Scherer; Michael S Khodadoust; Mohammad S Esfahani; Bogdan A Luca; David Steiner; Maximilian Diehn; Ash A Alizadeh
Journal: Nat Biotechnol Date: 2019-05-06 Impact factor: 54.908

8. A genomic-clinical nomogram predicting recurrence-free survival for patients diagnosed with hepatocellular carcinoma.

Authors: Junjie Kong; Tao Wang; Shu Shen; Zifei Zhang; Xianwei Yang; Wentao Wang
Journal: PeerJ Date: 2019-10-31 Impact factor: 2.984

9. Repeated observation of breast tumor subtypes in independent gene expression data sets.

Authors: Therese Sorlie; Robert Tibshirani; Joel Parker; Trevor Hastie; J S Marron; Andrew Nobel; Shibing Deng; Hilde Johnsen; Robert Pesich; Stephanie Geisler; Janos Demeter; Charles M Perou; Per E Lønning; Patrick O Brown; Anne-Lise Børresen-Dale; David Botstein
Journal: Proc Natl Acad Sci U S A Date: 2003-06-26 Impact factor: 12.779

10. Patterns of Immune Infiltration in HNC and Their Clinical Implications: A Gene Expression-Based Study.

Authors: Jukun Song; Zhenghao Deng; Jiaming Su; Dongbo Yuan; Jianguo Liu; Jianguo Zhu
Journal: Front Oncol Date: 2019-12-04 Impact factor: 6.244

1 in total

1. Identification of Immune-Related Subtypes and Characterization of Tumor Microenvironment Infiltration in Kidney Renal Clear Cell Carcinoma.

Authors: Huisheng Qin; Tiancheng Wang; Hui Zhang
Journal: Front Genet Date: 2022-06-29 Impact factor: 4.772

1 in total