Hyunsuk Kim, Taesung Park, Jinyoung Jang, Seungyeoun Lee.
Abstract
A survival prediction model was recently developed to evaluate the prognosis of resected nonmetastatic pancreatic ductal adenocarcinoma. It is based on a Cox model and uses two nationwide databases: Surveillance, Epidemiology and End Results (SEER) and Korea Tumor Registry System-Biliary Pancreas (KOTUS-BP). In this study, we applied two machine learning methods for survival analysis, random survival forests (RSF) and support vector machines (SVM), and compared their prediction performance using the SEER and KOTUS-BP datasets. Three schemes were used for model development and evaluation. First, we used data from SEER for model development and data from KOTUS-BP for external evaluation. Second, the two datasets were swapped: data from KOTUS-BP were used for model development and data from SEER for external evaluation. Finally, we mixed the two datasets half and half and used the mixed datasets for model development and validation. We used 9,624 patients from SEER and 3,281 patients from KOTUS-BP to construct a prediction model with seven covariates: age, sex, histologic differentiation, adjuvant treatment, resection margin status, and the American Joint Committee on Cancer 8th edition T-stage and N-stage. Comparing the three schemes, the Cox model, RSF, and SVM all performed better with the mixed datasets than with the unmixed datasets. With the mixed datasets, the C-index and the 1-year, 2-year, and 3-year time-dependent areas under the curve for the Cox model were 0.644, 0.698, 0.680, and 0.687, respectively. The Cox model performed slightly better than RSF and SVM.
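The three development and evaluation schemes described in the abstract can be sketched as a small data-splitting routine. This is a minimal illustration, not the authors' code: the function name `build_schemes`, the toy records, the random seed, and the reading of "half and half" as "a random half of each cohort goes to training and the remaining half to testing" are all assumptions.

```python
import random


def build_schemes(seer, kotus, seed=0):
    """Return {scheme_name: (train_rows, test_rows)} for the three schemes.

    Assumption: "half and half" means each cohort is split at random into
    two halves, with one half of each pooled for training and the rest
    pooled for testing.
    """
    rng = random.Random(seed)
    seer_shuffled = seer[:]
    kotus_shuffled = kotus[:]
    rng.shuffle(seer_shuffled)
    rng.shuffle(kotus_shuffled)

    s_half = len(seer_shuffled) // 2
    k_half = len(kotus_shuffled) // 2
    mixed_train = seer_shuffled[:s_half] + kotus_shuffled[:k_half]
    mixed_test = seer_shuffled[s_half:] + kotus_shuffled[k_half:]

    return {
        "seer_to_kotus": (seer, kotus),        # scheme 1: external test on KOTUS-BP
        "kotus_to_seer": (kotus, seer),        # scheme 2: external test on SEER
        "mixed": (mixed_train, mixed_test),    # scheme 3: pooled halves
    }


# Toy stand-ins for patient records, sized like the two cohorts.
seer = [("SEER", i) for i in range(9624)]
kotus = [("KOTUS-BP", i) for i in range(3281)]
schemes = build_schemes(seer, kotus)
for name, (train, test) in schemes.items():
    print(name, len(train), len(test))
```

In the first two schemes the test cohort is never seen during model development, so the test metrics reflect transfer across registries; in the mixed scheme both cohorts contribute to training and testing.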
Keywords: Cox model; random survival forests; support vector machines; survival prediction model
Year: 2022 PMID: 35794703 PMCID: PMC9299568 DOI: 10.5808/gi.22036
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
Fig. 1. Flowchart of model development and external validation process for the Cox, random survival forests, and support vector machines models. SEER, Surveillance, Epidemiology and End Results; Cox PH, Cox proportional hazard; KOTUS-BP, Korea Tumor Registry System-Biliary Pancreas; CV, cross-validation.
Basic statistics and 5-year overall survival rates for seven variables in the SEER and KOTUS-BP databases
| Variable | SEER (n=9,624): Patients | SEER: 5-year OS (%) | SEER: p-value | KOTUS-BP (n=3,281): Patients | KOTUS-BP: 5-year OS (%) | KOTUS-BP: p-value |
|---|---|---|---|---|---|---|
| Age (yr) | 65.6±10.4 | 20.1 | | 63.8±10.1 | 32.2 | |
| Female | 4,755 (49.4) | 21.3 | | 1,381 (42.1) | 36.2 | |
| Male | 4,869 (50.6) | 18.9 | 0.006 | 1,900 (57.9) | 29.2 | 0.146 |
| Head | 8,079 (83.9) | 19.2 | | 2,046 (62.4) | 28.5 | |
| Body/Tail | 1,545 (16.1) | 25.0 | 0.002 | 1,235 (37.6) | 37.8 | <0.001 |
| No adjuvant treatment | 2,948 (30.6) | 17.3 | | 2,006 (61.1) | 29.5 | |
| Adjuvant treatment | 6,676 (69.4) | 21.3 | <0.001 | 1,275 (38.9) | 36.1 | <0.001 |
| Well differentiated | 1,013 (10.5) | 37.4 | | 376 (11.5) | 44.9 | |
| Moderately differentiated | 5,055 (52.5) | 20.5 | <0.001 | 2,362 (72.0) | 32.9 | <0.001 |
| Poorly differentiated | 3,556 (37.0) | 14.6 | <0.001 | 543 (16.5) | 20.8 | <0.001 |
| T1 | 1,603 (16.7) | 32.7 | | 672 (20.5) | 45.3 | |
| T2 | 5,830 (60.6) | 18.8 | <0.001 | 2,007 (61.2) | 29.7 | <0.001 |
| T3 | 2,191 (22.7) | 14.3 | <0.001 | 602 (18.3) | 24.5 | <0.001 |
| N0 | 3,155 (32.8) | 32.4 | | 1,313 (40.0) | 42.6 | |
| N1 | 4,030 (41.9) | 20.5 | <0.001 | 1,347 (41.1) | 28.5 | <0.001 |
| N2 | 2,439 (25.3) | 14.6 | <0.001 | 621 (18.9) | 16.4 | <0.001 |
Values are presented as mean±SD or number (%).
SEER, Surveillance, Epidemiology and End Results; KOTUS-BP, Korea Tumor Registry System-Biliary Pancreas; OS, overall survival.
p-values were obtained with the log-rank test.
Fig. 2. Kaplan-Meier survival curves with 5-year overall survival (OS) rates and median survival times for the Surveillance, Epidemiology and End Results (SEER) and Korea Tumor Registry System-Biliary Pancreas (KOTUS-BP) datasets.
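For readers unfamiliar with how a Kaplan-Meier curve, a fixed-time survival rate, and a median survival time are read off censored data, the following is a minimal pure-Python sketch of the product-limit estimator. The function names and the ten-patient toy data are hypothetical and are not drawn from SEER or KOTUS-BP.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


def median_survival(curve):
    """First time at which the estimate drops to 0.5 or below, else None."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None


# Hypothetical follow-up in months for ten patients.
times = [6, 12, 12, 20, 30, 45, 60, 60, 72, 80]
events = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(survival_at(curve, 60), median_survival(curve))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate at 60 months differs from the raw fraction of patients still alive.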
Fig. 3. Hazard ratios and 95% confidence intervals of seven variables in the Surveillance, Epidemiology and End Results (SEER) and Korea Tumor Registry System-Biliary Pancreas (KOTUS-BP) datasets. AJCC, American Joint Committee on Cancer.
Hyperparameters for random survival forests
| Hyperparameter | Value |
|---|---|
| No. of trees | 50, 100, 200, 500, 1,000 |
| Max. variables used in split | 1‒10 |
| Splitting rule | log-rank/bs.gradient/logrankscore |
One-hot encoded variables: differentiation, AJCC 8th edition T and N staging.
AJCC, American Joint Committee on Cancer.
Hyperparameters for support vector machines for survival analysis
| Hyperparameter | Value |
|---|---|
| SVM type | SVRC, RankSVMs |
| Kernel | Linear, clinical |
| Distance matrix | Makediff1, makediff3 |
| Regularization constant | 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10 |
SVM, support vector machines; SVRC, support vector regression for censored data.
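To give a sense of the size of the search spaces listed in the two hyperparameter tables above, the sketch below enumerates every combination of the listed values. It assumes a full Cartesian grid, which the source does not state explicitly; the dictionary keys and the helper `expand` are illustrative names, not arguments of any particular library.

```python
from itertools import product

# Values copied from the RSF hyperparameter table.
rsf_grid = {
    "n_trees": [50, 100, 200, 500, 1000],
    "max_split_vars": list(range(1, 11)),
    "split_rule": ["log-rank", "bs.gradient", "logrankscore"],
}

# Values copied from the survival SVM hyperparameter table.
svm_grid = {
    "svm_type": ["SVRC", "RankSVMs"],
    "kernel": ["linear", "clinical"],
    "distance_matrix": ["makediff1", "makediff3"],
    "regularization": [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10],
}


def expand(grid):
    """Return one dict per combination of the listed values (full grid)."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]


rsf_candidates = expand(rsf_grid)
svm_candidates = expand(svm_grid)
print(len(rsf_candidates), len(svm_candidates))
```

Under the full-grid assumption this gives 150 RSF configurations and 80 SVM configurations. Some combinations may be redundant in practice (for example, if the distance matrix only affects the ranking-based SVM), so the number of distinct models actually fitted could be smaller.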
C-index and 1-year, 2-year, 3-year time-dependent AUCs for the Cox, RSF, and SVM models according to three schemes
| Model | Training: C-index | Training: Td1 AUC | Training: Td2 AUC | Training: Td3 AUC | Test: C-index | Test: Td1 AUC | Test: Td2 AUC | Test: Td3 AUC |
|---|---|---|---|---|---|---|---|---|
| Training (SEER), test (KOTUS) | | | | | | | | |
| Cox | 0.65417 | 0.72545 | 0.68776 | 0.68765 | 0.62792 | 0.65489 | 0.66759 | 0.68153 |
| RSF | 0.66520 | 0.72960 | 0.70807 | 0.71722 | 0.63344 | 0.66660 | 0.67675 | 0.69104 |
| SVM | 0.64218 | 0.72258 | 0.65812 | 0.64074 | 0.59956 | 0.61514 | 0.62619 | 0.63458 |
| Training (KOTUS), test (SEER) | | | | | | | | |
| Cox | 0.65074 | 0.69346 | 0.69524 | 0.70095 | 0.62932 | 0.68365 | 0.67008 | 0.67426 |
| RSF | 0.66293 | 0.70624 | 0.71295 | 0.71676 | 0.62189 | 0.67445 | 0.65885 | 0.66058 |
| SVM | 0.62668 | 0.66973 | 0.66769 | 0.66072 | 0.60061 | 0.64794 | 0.63057 | 0.62372 |
| Training (SEER + KOTUS), test (SEER + KOTUS) | | | | | | | | |
| Cox | 0.64890 | 0.70718 | 0.69108 | 0.69327 | 0.64361 | 0.69764 | 0.67953 | 0.68726 |
| RSF | 0.66396 | 0.71328 | 0.72110 | 0.73110 | 0.63363 | 0.68239 | 0.66810 | 0.67806 |
| SVM | 0.62538 | 0.69700 | 0.64029 | 0.61994 | 0.62333 | 0.68489 | 0.63515 | 0.62643 |
AUC, area under the receiver operating characteristic curve; RSF, random survival forests; SVM, support vector machines; C-index, Harrell’s concordance index; Td1, 1-year time-dependent; Td2, 2-year time-dependent; Td3, 3-year time-dependent; KOTUS, Korea Tumor Registry System-Biliary Pancreas; SEER, Surveillance, Epidemiology and End Results.
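The C-index reported in the table above is Harrell's concordance index. As a rough guide to what the number measures, here is a minimal pure-Python sketch of the standard pairwise definition; the function name, the handling of tied risk scores (counted as half concordant), and the five-patient toy data are assumptions for illustration and are not taken from the paper's software.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time. It is concordant when
    the model assigns i the higher risk. Tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up times, event indicators, predicted risks.
times = [5, 10, 15, 20, 25]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.6, 0.5, 0.1]
c_index = harrell_c_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the test values of roughly 0.60 to 0.64 in the table into context.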
Fig. 4. Overlaid predicted survival curves of the Cox model and random survival forests (RSF) method for three patients. Cox PH, Cox proportional hazard.