BACKGROUND: Precise prediction of survival after treatment is of great importance for patients with diseases with high mortality. RNA sequencing data and deep learning (DL) methods are expected to become promising approaches in the development of prediction models in the future. We aimed to evaluate the optimal covariates and methodology for patients with hepatocellular carcinoma (HCC) undergoing surgical resection. METHODS: The Cox proportional hazards regression model and the DL approach were used to develop prediction models incorporating clinical, genetic, and combined clinical and genetic variables for survival prediction in patients with HCC after resection. A total of 1,114 patients and 184 patients were enrolled in the present study from 2,163 and 601 patients from Eastern Hepatobiliary Surgery Hospital and Renji Hospital, respectively. The models were internally validated through random sampling and externally validated in clinical cohorts. Between-model comparisons were carried out in terms of the integrated discrimination improvement and net reclassification index. RESULTS: The Cox and DL clinical models were developed by adopting 7 independent prognostic factors (total bilirubin, prothrombin time, tumor size, tumor number, lymph node metastasis, and vascular invasion) and 22 clinical factors, respectively. Both the Cox clinical model and the DL clinical model showed excellent performances in the derivation [area under the curve (AUC): 0.75 vs. 0.77] and validation (AUC: 0.83 vs. 0.80) sets. The derived Cox genetic model with 6 significant prognostic genes was not as effective as the DL approach involving 686 genes. A combined clinical and genetic approach modified the performances of both the Cox and DL models. The integrated discrimination improvement and net reclassification index of the DL clinical model were generally better than those of the Cox clinical model. 
CONCLUSIONS: Our Cox clinical model sufficiently provided precise survival prediction in patients with HCC after resection. It may serve as an accurate and cost-effective tool for predicting survival in such patients. 2021 Annals of Translational Medicine. All rights reserved.
Individualized calculation of mortality risk has gained attention in the precision medicine era due to its supportive guidance for treatment selection and the estimation of survival outcomes (1,2). Among the predictive models for cancer, the Cox proportional hazards regression model has been widely applied for both the identification of significant prognostic factors and the prediction of patient survival outcomes. The results of the Cox proportional hazards regression model are frequently visualized as nomograms for clinical application (3,4). In recent years, the deep learning (DL) approach, which allows computational models composed of multiple processing layers to learn data representations with multilevel abstraction, has been applied in some medical fields, including drug discovery, image evaluation and diagnosis, and genomics (5,6).
Hepatocellular carcinoma (HCC) is the most common primary hepatic tumor, accounting for the majority of primary liver cancers. The global incidence and mortality of HCC are rapidly increasing (7). Most HCCs arise from viral hepatitis, non-alcoholic fatty liver disease, and liver cirrhosis. Thus, close surveillance of patients with these conditions would contribute to the early detection of HCC, which in turn could expand the proportion of eligible candidates for surgical resection (8,9). Recently, it has been reported that T1 stage HCC accounts for more than 40% of the total cases (10). Although surgical resection and orthotopic liver transplantation are the standard of care and provide an opportunity for curative treatment of tumors without extrahepatic metastasis, owing to a shortage of organ donors, surgical resection is recommended for resectable cases (11).
Along with advances in the identification of risk factors for the development of HCC and surveillance systems, the effectiveness of surgical resection and the identification of appropriate candidates have become crucial to improving the prognosis of patients with HCC.
In the present study, we investigated the derivation of predictive systems composed of clinical and genetic factors using the Cox regression model and DL approaches, with the aim of evaluating the optimal covariates and methodology for patients with HCC undergoing surgical resection. We present the following article in accordance with the TRIPOD reporting checklist (available at http://dx.doi.org/10.21037/atm-20-4828).
Methods
Patients
This was a retrospective, two-center study. The clinical models were derived from patients with HCC who underwent surgical resection at the Eastern Hepatobiliary Surgery Hospital (EHBH), Second Military Medical University (Shanghai, China) between January 2005 and December 2011. The models were validated in patients with HCC who underwent resection at Renji Hospital, School of Medicine, Shanghai Jiao Tong University (Shanghai, China) between January 2004 and December 2012. The enrolled patients had a diagnosis of HCC based on histopathological examination. To be eligible, patients also needed to have an Eastern Cooperative Oncology Group (ECOG) performance status score of 0 or 1, and to have undergone only surgical resection as the initial treatment. Patients who died perioperatively or who had incomplete follow-up or clinical data were excluded from the analysis. Of 2,163 and 601 patients treated at the Eastern Hepatobiliary Surgery Hospital and Renji Hospital, respectively, 1,114 patients and 184 patients were enrolled into the present study (Figure 1). The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the ethics committee of the Eastern Hepatobiliary Surgery Hospital (No. 2020024), and individual consent for this retrospective analysis was waived.
Figure 1
Flow chart of HCC patients enrolled and analyzed in this study. HCC, hepatocellular carcinoma.
Data sources
For the construction of the genetic and combined clinical and genetic models, gene expression data from 377 patients with HCC were retrieved from The Cancer Genome Atlas (TCGA; https://www.cancer.gov/tcga) Research Network. After screening the available data, 374 patients were enrolled into the analyses for the development of the genetic and combined clinical and genetic models. For validation, we retrieved clinical and RNA sequencing (RNA-seq) data of patients with HCC from the International Cancer Genome Consortium (ICGC; ICGC-LIRI-JP, n=193; validation group 1) and Gene Expression Omnibus (GEO; GSE116174, n=64; validation group 2), respectively ().
Model construction and variables
Two methodologies, Cox regression and DL, were applied in the derivation of the models. For Cox models, all variables were tested for statistical significance through univariate analyses, and multivariate analysis was carried out for factors with statistical significance. Only the significant factors identified in the multivariate analysis were selected for the Cox clinical model. For the development of the genetic nomogram, only covariates found to have a significant prognostic impact in the Cox univariate analysis as well as |log2 (fold change)|>0.6 and P<0.05 were considered eligible for inclusion. Multivariate analysis was not carried out for gene expression variables because the large number of variables limited the evaluation of independent prognostic impact.
For the development of the DL models, 22 demographic and clinical variables were adopted, including sex, age, alpha fetoprotein (AFP), carcinoembryonic antigen (CEA), hepatitis B virus (HBV) and hepatitis C virus (HCV) infection, total bilirubin (TB), albumin, prothrombin time (PT), alanine aminotransferase (ALT), aspartate aminotransferase (AST), liver cirrhosis, tumor size, tumor number, lymph node metastasis, vascular invasion, capsule formation, TNM stage, tumor location, and diabetes mellitus. Among the variables, tumor size and number, vascular invasion, lymph node metastasis, capsule formation, tumor location, and TNM stage were collected from postoperative pathology reports. For the DL genetic model, RNA-seq identified 686 expressed genes that were approved by the HUGO Gene Nomenclature Committee (HGNC; https://www.genenames.org) and were subjected to analysis. The full list of the included genes is shown in the supplementary file (https://cdn.amegroups.cn/static/application/3f120d22dea23635dbe86366eed76a7f/atm-20-4828-1.pdf).
In the construction and validation of the combined clinical and genetic models, 7 variables (age, sex, HBV infection, TNM stage, vascular invasion, alcohol consumption, and smoking) overlapped among the databases; thus, these variables were included in the analyses for the Cox combined model and in the development of the DL combined model along with the 686 genes. The other, non-overlapping variables were excluded from the analyses.
Statistical analysis
All models were developed for the prediction of overall survival (OS) in HCC patients, which was defined as the time from surgery to death. Continuous variables were not categorized in the development of any of the models and were presented as median [interquartile range (IQR)]. The analyzed data contained no missing values because any patient with missing data was excluded from the analyses. Cumulative events were estimated by the Kaplan-Meier method and compared using the log-rank test. Internal validation was performed by randomly sampling 100 patients 4 times per model. The performances of the models were assessed by receiver operating characteristic (ROC) curve analysis with area under the ROC curve (AUC) and calibration plots. Between-model comparisons were carried out by calculating the net reclassification index (NRI) and integrated discrimination improvement (IDI). The models were developed with 2 major aims: discrimination and individualized provision of probability. Discrimination was assessed by dividing patients into 2 risk groups at the median predicted risk. P values <0.05 were considered to be statistically significant. All statistical analyses were performed using the R Project for Statistical Computing (v3.5.3; https://www.r-project.org). The DL models were derived using TensorFlow (v1.2.1) and Python (v3.7.3; https://www.python.org) on servers equipped with a dual-core Intel(R) Core(TM) i7-4650U CPU @ 1.70 GHz 2.30 GHz, 8 GB RAM, and Intel(R) HD Graphics 5000.
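The AUC and the repeated random-sampling check described above can be illustrated with a minimal sketch. It is not the authors' code: it treats the outcome as a binary event status and ignores censoring, the function names `auc` and `resampled_aucs` are hypothetical, and the cohort is synthetic.

```python
import random


def auc(scores, events):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    scores: predicted risk per patient; events: 1 if the event occurred, else 0.
    A tie between an event and a non-event counts as half a concordant pair.
    """
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def resampled_aucs(scores, events, n_draws=4, sample_size=100, seed=0):
    """AUC on n_draws random subsamples of sample_size patients each."""
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    results = []
    for _ in range(n_draws):
        chosen = rng.sample(indices, sample_size)  # sampling without replacement
        results.append(auc([scores[i] for i in chosen],
                           [events[i] for i in chosen]))
    return results


# Synthetic cohort (hypothetical data): events tend to receive higher scores.
rng = random.Random(42)
events = [1 if rng.random() < 0.4 else 0 for _ in range(500)]
scores = [rng.gauss(0.6 if e else 0.4, 0.15) for e in events]

full_auc = auc(scores, events)
subsample_aucs = resampled_aucs(scores, events)
print(round(full_auc, 3), [round(a, 3) for a in subsample_aucs])
```

The spread of the 4 subsample AUCs around the full-cohort value corresponds to the AUC ranges reported for internal validation in the Results.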
Results
Patient characteristics
All derivation and validation patients were Chinese patients with a median age of 53 (IQR, 45–59) and 51 (IQR, 45–59) years, respectively (Table S1). Of the patients, 15% were female, and 10% had diabetes mellitus. Both HBV infection (88.2% in the derivation cohort; 90.2% in the validation cohort) and liver cirrhosis (69.4% in the derivation cohort; 86.4% in the validation cohort) were prevalent, consistent with the theory of three-step development of HCC from HBV infection to liver cirrhosis to HCC. However, the prevalence of HCV infection was 0.9% and 2.2% in the derivation cohort and validation cohort, respectively.
In terms of patient characteristics, the cohort (TCGA-LIHC, n=374) used for construction of the genetic models was 32.4% female and had a median age of 61 years (IQR, 52–69 years). Of the patients in this cohort, 42.5% had HBV infection, and there was a predominance of TNM stages I–II (74.3%; Table S2). Validation cohort 1 (ICGC-LIRI-JP, n=193) had a relatively high age (median, 69 years; IQR, 62–74 years) and the majority of patients were male (74.6%). In this cohort, 72.0% of patients did not have HBV infection, and there was a high rate of alcohol consumption (59.6%), smoking (59.1%), and vascular invasion (31.6%). Validation cohort 2 (GSE116174, n=64) had a median age of 54 years (IQR, 49–62 years), and 9.4% of patients were female. This cohort showed a high prevalence of HBV infection (73.4%), but a low rate of alcohol consumption (20.3%). Collectively, the characteristics of the training cohort and the 2 validation cohorts differed, which was intended to challenge the generalization of the models.
Cox clinical model
For the development of the Cox clinical model, univariate and multivariate analyses were carried out for the derivation set (EHBH, n=1,114). Univariate analyses identified ALT, AST, TB, PT, albumin, AFP, tumor size, tumor number, tumor location (left lobe), vascular invasion, lymph node metastasis, and TNM stage to be significant prognostic factors for OS (Table S3). Among these significant prognostic factors, TB [hazard ratio (HR), 1.01; 95% CI, 1.00–1.01; P=0.001], PT (HR, 1.14; 95% CI, 1.04–1.24; P=0.003), tumor size (HR, 1.16; 95% CI, 1.12–1.19; P<0.001), tumor number (HR, 1.48; 95% CI, 1.18–1.86; P=0.001), vascular invasion (HR, 2.22; 95% CI, 1.65–2.99; P<0.001), and lymph node metastasis (HR, 1.99; 95% CI, 1.22–3.24; P=0.006) remained independent prognostic factors in the multivariate analysis. A nomogram was derived for probability estimation and the discrimination of risk groups (Figure 2A). The consistency between the predicted probability and the actual proportion of survival in the training set was confirmed by the calibration plot (Figure 2B). The sensitivity and specificity of the training performance were evaluated by ROC curve analysis (AUC: 0.75; Figure 2C).
Figure 2
Development and validation of the Cox clinical model with 7 independent prognostic factors for overall survival. (A) The derived nomogram for probabilistic ratiocination and the discrimination of risk groups. (B) Calibration plot evaluating consistency between predicted probability and actual proportion of survival in training. (C) ROC curve for the evaluation of training performance in terms of sensitivity and specificity. (D) Internal validation through ROC curve analysis using internal random sampling (n=100) 4 times. (E) ROC curve for evaluation of validation performance. (F) Calibration plot evaluating consistency between the predicted probability and actual proportion of survival in the validation set. (G) Kaplan-Meier estimation of the risk bisection in the training set. (H) Kaplan-Meier estimation of the risk bisection in the validation set. ROC, receiver operating characteristic.
When internal validation by random sampling was carried out, the model’s performance remained significantly predictive (AUC: 0.74–0.76; Figure 2D). In the external validation cohort (patients from Renji Hospital, n=184), both ROC analysis (AUC: 0.83; Figure 2E) and the calibration plot (Figure 2F) revealed an excellent predictive performance of the model. Kaplan-Meier estimation of high- and low-risk groups stratified according to the median risk revealed the HR to be 0.262 (95% CI, 0.216–0.317; P<0.001) in the training set (Figure 2G). In the validation set, the HR was 0.207 (95% CI, 0.135–0.318; P<0.001; Figure 2H). The between-group OS differed by 37% and 46% at 1 year and 5 years, respectively.
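The risk bisection and Kaplan-Meier estimation reported here can be sketched as follows. This is an illustrative implementation of the standard product-limit estimator under the assumption that the cut-off is the median predicted risk; the helper names and the 8-patient data set are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each distinct event time.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def split_by_median_risk(risks):
    """True marks the high-risk half (predicted risk above the median)."""
    cutoff = median(risks)
    return [r > cutoff for r in risks]


# Hypothetical toy cohort: follow-up in months, event status, predicted risk.
times = [6, 12, 12, 20, 30, 36, 48, 60]
events = [1, 1, 0, 1, 0, 1, 0, 0]
risks = [0.9, 0.8, 0.3, 0.7, 0.2, 0.6, 0.4, 0.1]

is_high = split_by_median_risk(risks)
high_curve = kaplan_meier([t for t, h in zip(times, is_high) if h],
                          [e for e, h in zip(events, is_high) if h])
low_curve = kaplan_meier([t for t, h in zip(times, is_high) if not h],
                         [e for e, h in zip(events, is_high) if not h])
print(high_curve)  # survival steps for the high-risk group
print(low_curve)   # empty: no deaths observed, so survival stays at 1.0
```

A log-rank test or a Cox model on the group indicator would then quantify the separation between the two curves; that step is omitted here.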
DL clinical model
The DL clinical model was developed by applying a DL neural network, composed of 1 input, 4 hidden, and 1 output layers, to the 22 clinical factors listed in the Methods section (Figure 3A). The derivation performance was comparable to that of the Cox clinical model in terms of the calibration plot (Figure 3B) and AUC (0.77; Figure 3C).
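To make the layer structure concrete, the following sketch runs a forward pass through a fully connected network with 22 inputs, 4 hidden layers, and 1 output. Only the layer count and the number of inputs come from the text; the hidden-layer widths, the ReLU and sigmoid activations, and the random untrained weights are assumptions chosen for illustration, and the code does not use TensorFlow.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# 22 clinical inputs, 4 hidden layers, 1 output.
# The hidden widths (16, 16, 8, 8) are hypothetical; the paper does not state them.
LAYER_SIZES = [22, 16, 16, 8, 8, 1]


def init_network(sizes, seed=0):
    """Random, untrained weights; a real model would be fitted to data."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.3, 0.3) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def predict_risk(network, features):
    """Forward pass: ReLU in hidden layers, sigmoid on the single output."""
    activations = features
    last = len(network) - 1
    for idx, (weights, biases) in enumerate(network):
        activation = sigmoid if idx == last else relu
        activations = dense(activations, weights, biases, activation)
    return activations[0]


network = init_network(LAYER_SIZES)
features = [0.1 * i for i in range(22)]  # hypothetical, already scaled inputs
risk = predict_risk(network, features)
print(risk)
```

The same structure with 5 hidden layers and a wider input would correspond to the genetic and combined DL models described later.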
Figure 3
Development and validation of the DL clinical model with 22 clinical variables. (A) The derived DL model consisted of 6 layers (1 input, 4 hidden, and 1 output). (B) Calibration plot evaluating consistency between predicted probability and actual proportion of survival in training. (C) ROC curve for the evaluation of training performance in terms of sensitivity and specificity. (D) Internal validation through ROC curve analysis using the internal random sampling (n=100) 4 times. (E) ROC curve for evaluation of validation performance. (F) Calibration plot evaluating consistency between predicted probability and actual proportion of survival in the validation set. (G) Kaplan-Meier estimation of the risk bisection in the training set. (H) Kaplan-Meier estimation of the risk bisection in the validation set. DL, deep learning; ROC, receiver operating characteristic.
Internal validation by random sampling 4 times (n=100) revealed an AUC of 0.73–0.79 (Figure 3D). In the external validation set, an AUC of 0.80 (Figure 3E) and the calibration plot (Figure 3F) indicated an excellent performance. Evaluation of the cumulative events among probability-bisected risk groups demonstrated the model to have significant discriminatory power in both the training (HR, 0.247; 95% CI, 0.204–0.299; P<0.001; Figure 3G) and validation (HR, 0.186; 95% CI, 0.121–0.287; P<0.001; Figure 3H) sets. The differences in the probability of 1-year and 5-year OS between the high- and low-risk groups in the validation set were 41% and 50%, respectively, each 4 percentage points larger than with the Cox clinical model.
Cox genetic model
To develop a Cox-based genetic nomogram, the 686 RNA-seq-based genes were evaluated using Cox univariate analysis. The inclusion criteria for the selection of covariate genes were set as |log2 (fold change)|>0.6 and P<0.05. Of the 686 genes, the following 6 significantly prognostic genes met the inclusion criteria: NLRP5 [HR, 1.41; 95% CI, 1.24–1.59; P<0.001; log2 (fold change) =0.81], MAGEB6 [HR, 1.17; 95% CI, 1.07–1.28; P=0.001; log2 (fold change) =0.81], SGCZ [HR, 1.16; 95% CI, 1.06–1.26; P=0.001; log2 (fold change) =0.78], STARD6 [HR, 1.32; 95% CI, 1.18–1.47; P<0.001; log2 (fold change) =0.70], ZNF560 [HR, 1.09; 95% CI, 1.01–1.17; P=0.026; log2 (fold change) =0.65], and AKNAD1 [HR, 1.44; 95% CI, 1.23–1.68; P<0.001; log2 (fold change) =0.61]. The selected genes were included in the development of the Cox genetic nomogram (Figure 4A). However, the derived model generally predicted a higher probability of survival compared to the actual proportion of survival (Figure 4B). The model had acceptable sensitivity and specificity, with an AUC of 0.65 (Figure 4C).
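The inclusion rule can be written as a simple filter. The sketch below applies the two thresholds to the 6 genes using the values reported above; P values reported as <0.001 are encoded as 0.001 (an upper bound), and the 2 extra rows are hypothetical genes added only to show the rejection cases.

```python
# (symbol, hazard ratio, P value, log2 fold change)
# The first six rows use the values reported in the text; P<0.001 is encoded
# as 0.001, an upper bound. The last two rows are hypothetical examples.
GENES = [
    ("NLRP5", 1.41, 0.001, 0.81),
    ("MAGEB6", 1.17, 0.001, 0.81),
    ("SGCZ", 1.16, 0.001, 0.78),
    ("STARD6", 1.32, 0.001, 0.70),
    ("ZNF560", 1.09, 0.026, 0.65),
    ("AKNAD1", 1.44, 0.001, 0.61),
    ("HYPOTHETICAL_A", 1.20, 0.200, 0.90),  # fails the P value threshold
    ("HYPOTHETICAL_B", 1.05, 0.010, 0.30),  # fails the fold-change threshold
]


def passes_filter(p_value, log2_fc, fc_cutoff=0.6, alpha=0.05):
    """Inclusion rule from the text: |log2 (fold change)| > 0.6 and P < 0.05."""
    return abs(log2_fc) > fc_cutoff and p_value < alpha


selected = [symbol for symbol, _hr, p, fc in GENES if passes_filter(p, fc)]
print(selected)
```

Because the rule uses the absolute fold change, strongly down-regulated genes would also qualify, although all 6 reported genes have positive log2 fold changes.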
Figure 4
Development and validation of the Cox genetic model with 6 significant prognostic genes for the overall survival stratified by the univariate analyses, log2 (fold change), and P value. (A) The derived nomogram for probabilistic ratiocination and discrimination of risk groups. (B) Calibration plot evaluating consistency between predicted probability and actual proportion of survival in training. (C) ROC curve for the evaluation of training performance in terms of sensitivity and specificity. (D) Internal validation through ROC curve analysis using internal random sampling (n=100) 4 times. (E) ROC curves and calibration plots for evaluation of validation performance in the two external clinical cohorts. (F) Kaplan-Meier estimation of the risk bisection in the training and two validation groups. ROC, receiver operating characteristic.
Internal validation showed that the AUC values of the model ranged between 0.59 and 0.69 (Figure 4D). As in the derivation set, the predicted probability of survival was higher than the actual proportion of survival in both the validation 1 and validation 2 datasets, and the AUC was found to be 0.56 and 0.31, respectively (Figure 4E). Furthermore, Kaplan-Meier analysis of the 2 validation cohorts indicated that the model did not significantly discriminate between risk groups (Figure 4F).
DL genetic model
Because gene selection by Cox regression and fold change failed to yield a significantly predictive model, we generated a DL genetic model based on all 686 genes with 7 layers, including 1 input, 5 hidden, and 1 output layer (Figure 5A). The use of numerous gene covariates resulted in a markedly improved derivation performance, as confirmed by ROC analysis (AUC: 0.95; Figure 5B) and the calibration plot (Figure 5C).
Figure 5
Development and validation of the DL genetic model with 686 genes. (A) The derived DL model consisted of 7 layers (1 input, 5 hidden, and 1 output). (B) ROC curve for the evaluation of training performance in terms of sensitivity and specificity. (C) Calibration plot evaluating consistency between predicted probability and actual proportion of survival in training. (D) Internal validation through ROC curve analysis using internal random sampling (n=100) 4 times. (E) ROC curves and calibration plots for the evaluation of validation performance in two external clinical cohorts. (F) Kaplan-Meier estimation of the risk bisection in the training and two validation groups. DL, deep learning; ROC, receiver operating characteristic.
Internal validation by random sampling showed the DL genetic model to be highly effective (AUC: 0.95–0.99; Figure 5D). In both external validation cohort 1 (AUC: 0.65) and cohort 2 (AUC: 0.61), the model performed better than the Cox genetic model (Figure 5E). Discrimination of the training set was significant (HR, 0.037; 95% CI, 0.027–0.053; P<0.001; Figure 5F). The DL genetic model could also significantly stratify patients into high- and low-risk groups in the 2 external validation cohorts.
Cox combined model
Considering recent reports that simultaneous evaluation of clinical and genetic factors may be promising for achieving precise prediction of survival, a combined clinical and genetic model was developed using the genes selected for the Cox genetic model and the independent clinical prognostic factors (Figure 6). To identify independent clinical prognostic factors, univariate and multivariate analyses were carried out in the TCGA-LIHC dataset (n=374) for the 7 variables that overlapped between the TCGA-LIHC, ICGC-LIRI-JP, and GSE116174 datasets: age, sex, HBV infection, TNM stage, vascular invasion, alcohol consumption, and smoking. TNM stage (HR, 1.52; 95% CI, 1.19–1.94; P=0.001) was found to be an independent prognostic factor (Table S4). Therefore, the Cox combined model was generated with the 6 pre-identified genes (NLRP5, MAGEB6, SGCZ, STARD6, ZNF560, and AKNAD1) and TNM stage (Figure 6A). Despite the addition of a clinical factor, the predicted probability of survival remained higher than the actual proportion of survival (Figure 6B). In addition, the ROC analysis revealed an AUC of 0.67 (Figure 6C).
Figure 6
Development and validation of the Cox combined clinical and genetic model with 6 significant prognostic genes and one independent prognostic factor. (A) The derived nomogram for probabilistic ratiocination and discrimination of risk groups. (B) Calibration plot evaluating the consistency between predicted probability and actual proportion of survival in training. (C) ROC curve for the evaluation of training performance in terms of sensitivity and specificity. (D) Internal validation through ROC curve analysis using internal random sampling (n=100) 4 times. (E) ROC curves and calibration plots for the evaluation of validation performance in the two external clinical cohorts. (F) Kaplan-Meier estimation of the risk bisection in the training and two validation groups. ROC, receiver operating characteristic.
In the internal validation set, the AUC ranged from 0.63 to 0.70 (Figure 6D). However, the performance of the model in one of the validation cohorts was poor, as shown by the calibration plots and ROC analysis (AUC: 0.45; Figure 6E). The model showed significant power to discriminate between risk groups in validation group 1 (HR, 0.421; 95% CI, 0.216–0.819; P=0.012); however, its performance in validation group 2 was not significant (HR, 1.682; 95% CI, 0.790–3.584; P=0.176; Figure 6F).
DL combined model
The DL-based combined clinical and genetic model was developed using 7 overlapping clinical variables and 686 genes (Figure 7A). ROC analysis (Figure 7B) and the calibration plot (Figure 7C) showed the model to have excellent precision and discrimination.
Figure 7
Development and validation of the DL combined clinical and genetic model with 686 genes and 7 clinical variables. (A) The derived DL model consisted of 7 layers (1 input, 5 hidden, and 1 output). (B) ROC curve for the evaluation of training performance in terms of sensitivity and specificity. (C) Calibration plot evaluating the consistency between predicted probability and actual proportion of survival in training. (D) Internal validation through ROC curve analysis using internal random sampling (n=100) 4 times. (E) ROC curves and calibration plots for the evaluation of validation performance in the two external clinical cohorts. (F) Kaplan-Meier estimation of the risk bisection in the training and two validation groups. DL, deep learning; ROC, receiver operating characteristic.
In the internal validation set, the AUCs ranged from 0.89 to 0.97 (Figure 7D). Unexpectedly, in the external validation sets, the model’s performance was shown to be effective (AUC: 0.68 and 0.64; Figure 7E). When the survival curves were drawn and evaluated using the log-rank test, the patients could be significantly stratified into high- and low-risk groups in both validation group 1 (HR, 0.338; 95% CI, 0.174–0.658; P=0.002) and validation group 2 (HR, 0.437; 95% CI, 0.204–0.937; P=0.031; Figure 7F).
Between-model comparison
For between-model comparison, the IDI and NRI were evaluated for each model and compared between the DL and Cox approaches (Table 1). The DL approach comprehensively improved model performance compared to the Cox approach, except in validation group 2, which could be due to the limited sample size. However, DL still improved risk reclassification in validation group 2 by 61%. The IDI for DL vs. Cox was 0.35 to 0.41 in the derivation sets. In the validation sets, the largest improvement in both IDI and NRI was found for the clinical factor-based models. Improvements in discrimination and risk reclassification were larger for the combined models than for the genetic models. Collectively, the DL approach had better IDI and NRI than the Cox approach in both model training and validation.
Table 1
Model performance in terms of discrimination and reclassification for predictive models in patients with HCC after resection
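| Model | Performance | Discrimination: change in χ² | Discrimination: P value | Discrimination: IDI (95% CI) | Risk reclassification: event, risk ↑ | Risk reclassification: event, risk ↓ | Risk reclassification: non-event, risk ↑ | Risk reclassification: non-event, risk ↓ | Risk reclassification: NRI (95% CI) |
|---|---|---|---|---|---|---|---|---|---|
| Clinical (DL vs. Cox) | Derivation (EHBH) | 557.4 | <0.001 | 0.35 (0.35–0.36) | 0.73 | 0.27 | 0.35 | 0.65 | 73.5 (72.8–74.2) |
| Clinical (DL vs. Cox) | Validation (Renji) | 86.3 | <0.001 | 0.49 (0.47–0.51) | 0.76 | 0.24 | 0.26 | 0.74 | 97.6 (93.1–102.2) |
| Genetic (DL vs. Cox) | Derivation (TCGA-LIHC) | 9.6 | 0.002 | 0.41 (0.39–0.42) | 0.95 | 0.05 | 0.26 | 0.74 | 89.3 (86.7–91.9) |
| Genetic (DL vs. Cox) | Validation 1 (ICGC-LIRI-JP) | 8.7 | 0.003 | 0.10 (0.08–0.12) | 0.66 | 0.34 | 0.46 | 0.54 | 30.2 (20.3–40.1) |
| Genetic (DL vs. Cox) | Validation 2 (GSE116174) | 1.6 | 0.211 | 0.06 (0.02–0.11) | 0.67 | 0.33 | 0.38 | 0.62 | 12.8 (3.7–21.8) |
| Combined (DL vs. Cox) | Derivation (TCGA-LIHC) | 33.3 | <0.001 | 0.40 (0.39–0.42) | 0.92 | 0.08 | 0.29 | 0.71 | 90.3 (87.4–93.3) |
| Combined (DL vs. Cox) | Validation 1 (ICGC-LIRI-JP) | 8.7 | 0.003 | 0.12 (0.10–0.13) | 0.69 | 0.31 | 0.46 | 0.54 | 41.9 (38.5–45.2) |
| Combined (DL vs. Cox) | Validation 2 (GSE116174) | 0.1 | 0.803 | 0.05 (0.00–0.10) | 0.63 | 0.37 | 0.41 | 0.59 | 9.32 (−0.5 to 19.1) |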
IDI, integrated discrimination improvement; NRI, net reclassification index; DL, deep learning.
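For readers unfamiliar with the two measures in Table 1, the sketch below shows the standard category-free definitions: the NRI sums the net proportion of events moved to higher risk and the net proportion of non-events moved to lower risk, and the IDI is the change in the difference of mean predicted risk between events and non-events. This is an illustration under stated assumptions, not the authors' procedure: it treats the outcome as binary and uncensored, the function name `nri_idi` and the toy data are hypothetical, and the exact variant used in the study (for example, a survival-time-based estimator) is not specified here, so the table values should not be expected to follow from this formula exactly.

```python
def nri_idi(events, p_old, p_new):
    """Category-free NRI and IDI of a new model versus a reference model.

    events: 1 if the patient had the event, 0 otherwise.
    p_old:  predicted event probabilities from the reference model (e.g. Cox).
    p_new:  predicted event probabilities from the new model (e.g. DL).
    Assumption: binary, uncensored outcomes.
    """
    ev = [(o, n) for y, o, n in zip(events, p_old, p_new) if y == 1]
    ne = [(o, n) for y, o, n in zip(events, p_old, p_new) if y == 0]
    if not ev or not ne:
        raise ValueError("need at least one event and one non-event")

    def frac_up(pairs):
        # Proportion whose predicted risk increased under the new model
        return sum(n > o for o, n in pairs) / len(pairs)

    def frac_down(pairs):
        # Proportion whose predicted risk decreased under the new model
        return sum(n < o for o, n in pairs) / len(pairs)

    def mean(values):
        return sum(values) / len(values)

    # NRI: events should move up, non-events should move down.
    nri = (frac_up(ev) - frac_down(ev)) + (frac_down(ne) - frac_up(ne))

    # IDI: improvement in the discrimination slope
    # (mean risk in events minus mean risk in non-events).
    slope_new = mean([n for _, n in ev]) - mean([n for _, n in ne])
    slope_old = mean([o for o, _ in ev]) - mean([o for o, _ in ne])
    idi = slope_new - slope_old
    return nri, idi


# Toy data (hypothetical, for illustration only)
events = [1, 1, 1, 0, 0, 0]
p_old = [0.5, 0.6, 0.4, 0.5, 0.4, 0.3]
p_new = [0.7, 0.8, 0.3, 0.2, 0.5, 0.1]
nri, idi = nri_idi(events, p_old, p_new)
print(round(nri, 3), round(idi, 3))  # 0.667 0.233
```

In this toy example, two of three events move up and two of three non-events move down, so each group contributes a net 1/3 and the NRI is 2/3. The category-free NRI ranges from −2 to 2 and is often reported as a percentage.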
Discussion
Clinical, genetic, and combined clinical and genetic models were developed using Cox regression and DL. Model validation demonstrated significant differences in predictive performance depending on the selection of covariates and methodology. The Cox model, which consisted of TB, PT, tumor size and number, lymph node metastasis, vascular invasion, and TNM stage, and the DL clinical model, which consisted of 22 clinical factors, effectively achieved precise survival prediction in patients with HCC after resection.
In recent years, a number of gene signatures have been developed and reported to be predictive of prognosis in various cancers, suggesting their potential value in clinical practice (12-15). In contrast to the previous literature, selecting significant prognostic genes by Cox regression and expression fold change had no significant impact on survival prediction in patients with HCC after resection. Instead, the inclusion of 686 genes was highly effective in training the DL model, which was also validated to be significantly predictive in two different cohorts. From this point of view, previous models whose excellent performance has been confirmed in only one validation dataset may require further validation before general application. Furthermore, the accuracy of the DL genetic model in survival prediction increased when it was trained with additional clinical factors, suggesting that simultaneous evaluation of clinical and genetic factors may be promising for the precise prediction of survival. Therefore, comprehensive inclusion of clinical and genetic covariates using the DL approach may be promising for the implementation of precise survival prediction.
Generalization of predictive models to real-world practice is challenging because diverse factors are not incorporated into the prediction models, such as the proficiency of the surgeon, the general level of medical care, and lifestyle and socio-environmental factors. These factors may contribute to the disparity in the identification of prognostic factors. Indeed, independent prognostic factors vary significantly across hospitals even for the same disease and treatment setting. For example, numerous studies have reported that tumor size, which is commonly included in staging systems for HCC, is not an independent prognostic factor for HCC after resection (16,17). Therefore, considering that prognostic factors are influenced by such external factors, a model is likely to perform best in the center from which it was derived. In the present study, the Cox clinical and DL clinical models were developed and validated in patients from the same region, while the genetic models were developed and validated in different cohorts from different regions. The generalizability of the clinical factor-derived models has not been evaluated. Future studies are needed to confirm the applicability of the Cox clinical and DL clinical models and to compare their generalizability.
Prediction models can provide guidance in many ways, including the identification of patients who require preventative interventions, early detection of disease, assessment of treatment effectiveness, stratification of patients at risk of recurrence or death, and the estimation of risk probabilities (18-22). The derived models are capable of time-dependent risk probability estimation for the prediction of survival and resection effectiveness in patients with HCC after resection. In this way, individuals who are at high risk of short-term or long-term mortality can be identified, and more intensive follow-up, preventative treatment, and more advanced examination at intervals can be considered.
This study has some limitations that should be addressed. The training and validation datasets for the clinical and genetic models were different; thus, the comparison of covariate selection between clinical factors and gene expression requires further confirmation. Future prospective studies are needed to evaluate the predictive effectiveness of gene expression and clinical factors in the same study cohort. Also, an assessment of the cost-effectiveness of RNA-seq for providing gene expression data is necessary for clinical practice, but this was not evaluated in the present study. A web-based tool for the DL model was not developed due to insufficient precision and prediction, which limits external access. However, despite these limitations, this study is the first to evaluate DL approaches and compare them with a conventional methodology (Cox regression) while also examining clinical and genetic factors.
Conclusions
In recent years, with the continuous development of genome sequencing, genetic markers have proven effective in predicting the prognosis of a variety of tumors. In clinical practice, the Cox model is mature and accurate in identifying clinical variables that are predictive of prognosis. However, the Cox model is suboptimal for identifying genetic variables for predicting prognosis. By contrast, the DL approach seems promising for achieving general application of the prediction model. In addition, the performance of the DL genetic model for survival prediction was enhanced when it was additionally trained with clinical factors, suggesting that precise survival prediction may be achieved through simultaneous evaluation of clinical and genetic factors. Thus, a comprehensive approach that includes both clinical and genetic covariates using the DL technique may be promising for implementing precise survival prediction. Certainly, given the cost of obtaining genetic variables, choosing a reasonable prediction model is of great importance.