Jiaxi Lin, Minyue Yin, Lu Liu, Jingwen Gao, Chenyan Yu, Xiaolin Liu, Chunfang Xu, Jinzhou Zhu.
Abstract
Accurate prediction of the prognosis of patients with pancreatic cancer (PC) is an emerging task. We aimed to develop survival models for postoperative PC patients based on a novel algorithm, random survival forest (RSF), as well as traditional Cox regression and a neural network (Deepsurv), using the Surveillance, Epidemiology, and End Results Program (SEER) database. A total of 3988 patients were included in this study. Eight clinicopathological features were selected using least absolute shrinkage and selection operator (LASSO) regression analysis and were used to develop the RSF model. The model was evaluated on three dimensions: discrimination, calibration, and clinical benefit. The RSF model predicted the cancer-specific survival (CSS) of postoperative PC patients with a c-index of 0.723, which was higher than that of the models built with Cox regression (0.670) and Deepsurv (0.700). The Brier scores of the RSF model at 1, 3, and 5 years (0.188, 0.177, and 0.131) demonstrated favorable calibration, and decision curve analysis illustrated the model's value for clinical implementation. Moreover, the roles of the key variables were visualized with Shapley Additive Explanations (SHAP) plots. Lastly, the prediction model demonstrated value in risk stratification and individual prognosis prediction. In summary, a high-performance prediction model for the postoperative prognosis of PC was developed based on RSF.
Keywords: machine learning; pancreatic cancer; random survival forest; surgery; the Surveillance, Epidemiology, and End Results Program (SEER); visualization
Year: 2022 PMID: 36230593 PMCID: PMC9563591 DOI: 10.3390/cancers14194667
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. The flowchart of model development.
Characteristics of postoperative pancreatic cancer patients in the training set and the test set.
| Characteristic | Training Set | Test Set | p-Value |
|---|---|---|---|
| Age | 65 (58, 73) | 65 (57, 72) | 0.5 |
| Race | | | 0.9 |
| White | 2215 (79%) | 629 (79%) | |
| Black | 293 (10%) | 82 (10%) | |
| Asian or Pacific Islander | 292 (10%) | 84 (11%) | |
| Other | 4 (0.1%) | 0 (0%) | |
| Sex | | | 0.5 |
| Male | 1432 (51%) | 396 (50%) | |
| Female | 1372 (49%) | 399 (50%) | |
| Marital status | | | 0.5 |
| Married | 2067 (74%) | 577 (73%) | |
| Single | 737 (26%) | 218 (27%) | |
| Radiation | | | 0.4 |
| Yes | 998 (36%) | 271 (34%) | |
| No | 1806 (64%) | 524 (66%) | |
| Chemotherapy | | | 0.2 |
| Yes | 1862 (66%) | 508 (64%) | |
| No | 942 (34%) | 287 (36%) | |
| Histological type | | | 0.5 |
| Epithelial neoplasms | 56 (2.0%) | 9 (1.1%) | |
| Adenomas and adenocarcinomas | 1435 (51%) | 413 (52%) | |
| Cystic, mucinous, and serous | 130 (4.6%) | 32 (4.0%) | |
| Ductal and lobular neoplasms | 1144 (41%) | 332 (42%) | |
| Complex epithelial neoplasms | 39 (1.4%) | 9 (1.1%) | |
| Surgery | | | 0.1 |
| Local excision | 5 (0.2%) | 5 (0.6%) | |
| Partial pancreatectomy | 464 (17%) | 118 (15%) | |
| Local or partial pancreatectomy and duodenectomy | 1882 (67%) | 537 (68%) | |
| Total pancreatectomy | 79 (2.8%) | 32 (4.0%) | |
| Total pancreatectomy and subtotal gastrectomy or duodenectomy | 221 (7.9%) | 68 (8.6%) | |
| Extended pancreatoduodenectomy | 132 (4.7%) | 33 (4.2%) | |
| Pancreatectomy | 21 (0.7%) | 2 (0.3%) | |
| AJCC stage | | | >0.9 |
| I | 357 (13%) | 103 (13%) | |
| II | 2185 (78%) | 621 (78%) | |
| III | 102 (3.6%) | 26 (3.3%) | |
| IV | 160 (5.7%) | 45 (5.7%) | |
| T stage | | | 0.7 |
| T1 | 204 (7.3%) | 55 (6.9%) | |
| T2 | 449 (16%) | 115 (14%) | |
| T3 | 2039 (73%) | 594 (75%) | |
| T4 | 112 (4.0%) | 31 (3.9%) | |
| N stage | | | 0.1 |
| N0 | 942 (34%) | 294 (37%) | |
| N1 | 1862 (66%) | 501 (63%) | |
| M stage | | | >0.9 |
| M0 | 2644 (94.3%) | 750 (94.3%) | |
| M1 | 160 (5.7%) | 45 (5.7%) | |
| Site | | | 0.4 |
| Pancreas Head | 1969 (70%) | 572 (72%) | |
| Pancreas Body Tail | 566 (20%) | 158 (20%) | |
| Other | 269 (9.6%) | 65 (8.2%) | |
| Clinical grade | | | 0.2 |
| I | 449 (16%) | 108 (14%) | |
| II | 1315 (47%) | 394 (50%) | |
| III | 985 (35%) | 283 (36%) | |
| IV | 55 (2.0%) | 10 (1.3%) | |
| Tumor size (mm) | 32 (25, 45) | 32 (25, 42) | 0.5 |
| Examined lymph nodes | 14 (9, 21) | 14 (9, 21) | 0.6 |
| Positive lymph nodes | 1 (0, 3) | 1 (0, 4) | 0.2 |
| Positive lymph nodes rate (%) | 0.10 (0.00, 0.25) | 0.08 (0.00, 0.25) | 0.2 |
Notes: AJCC: American Joint Committee on Cancer
Figure 2. The results of the LASSO regression analysis for the RSF model. (A) LASSO coefficient profiles of the 21 variables. (B) Selection of λ in the LASSO regression analysis via 10-fold cross-validation. The dotted vertical lines are plotted at the optimal values following the minimum criterion (right) and the “one standard error” criterion (left).
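For reference, the λ selection described in this caption can be written out as a penalized objective. The form below is only a sketch: it assumes a log-likelihood (or Cox partial log-likelihood) loss, written here as ℓ(β), which the source does not specify. The symbols CV(λ) for the mean 10-fold cross-validated error and SE(λ) for its standard error are notation introduced here.

```latex
\begin{aligned}
\hat{\beta}(\lambda) &= \arg\min_{\beta}\; \Bigl\{ -\tfrac{1}{n}\,\ell(\beta) + \lambda \sum_{j=1}^{21} \lvert \beta_j \rvert \Bigr\} \\
\lambda_{\min} &= \arg\min_{\lambda}\; \mathrm{CV}(\lambda) \\
\lambda_{1\mathrm{se}} &= \max \bigl\{ \lambda : \mathrm{CV}(\lambda) \le \mathrm{CV}(\lambda_{\min}) + \mathrm{SE}(\lambda_{\min}) \bigr\}
\end{aligned}
```

Variables whose coefficients remain nonzero at the chosen λ are the ones retained; a larger λ shrinks more coefficients to exactly zero.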
The models’ performance in the test set.
| Model | AUC (1-Year) | AUC (3-Year) | AUC (5-Year) | Brier Score (1-Year) | Brier Score (3-Year) | Brier Score (5-Year) | C-Index |
|---|---|---|---|---|---|---|---|
| RSF model | 0.753 | 0.744 | 0.759 | 0.188 | 0.177 | 0.131 | 0.723 |
| Cox model | 0.736 | 0.737 | 0.760 | 0.193 | 0.181 | 0.132 | 0.670 |
| Deepsurv model | 0.744 | 0.742 | 0.749 | 0.202 | 0.175 | 0.122 | 0.700 |
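The C-index in this table measures discrimination: among comparable patient pairs, how often the patient with the higher predicted risk experiences the event first. The following is a minimal sketch of Harrell's concordance index in plain Python, assuming right-censored data and one risk score per patient. The function name `harrell_c_index` and the toy values are hypothetical illustrations, not the authors' code or SEER data.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data.

    times: follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk failed first
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): months of follow-up, event flags, predicted risk.
toy_times = [5, 10, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.5, 0.6, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
print(round(toy_c, 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.723, 0.670, and 0.700 should be read.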
Figure 3. Decision curve analysis (DCA) of the RSF model. (A) The one-year decision curve. (B) The three-year decision curve. (C) The five-year decision curve. In each plot, the x-axis represents the threshold probability and the y-axis represents the clinical net benefit. The blue line reflects the strategy of “assume all patients have received the assessment of the RSF model”, while the horizontal black line reflects the strategy of “assume no patient has received the assessment of the RSF model”.
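The net benefit on the y-axis of a decision curve is conventionally computed as true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below assumes a binary outcome at a fixed time horizon and ignores censoring, which a survival-specific DCA would need to handle. The function names and toy values are hypothetical and are not taken from the study.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcome)
    pairs = list(zip(predicted_risk, outcome))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy: act on every patient regardless of the model."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): predicted event risk and observed binary outcome.
toy_risk = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
toy_outcome = [1, 1, 0, 1, 0, 0, 0, 1]
nb_model = net_benefit(toy_risk, toy_outcome, 0.5)
nb_all = net_benefit_treat_all(toy_outcome, 0.5)
print(nb_model, nb_all)
```

Evaluating these functions over a grid of thresholds and plotting the results gives the curve; the act-on-nobody strategy has a net benefit of zero at every threshold, which is why it appears as a horizontal line.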
Figure 4. The SHAP plot of the RSF model. The horizontal extent of each variable's points represents that variable's contribution to the outcome, and the color of each dot indicates the numerical value of the variable. For example, the positive lymph node rate is the most significant risk factor: the higher the rate, the higher the probability of a poor prognosis.
Figure 5. The RSF risk stratification of patients. (A) The RSF risk stratification of patients in the training set. (B) The RSF risk stratification of patients in the test set.
The median survival time (months) of the different risk strata in the training set.
| RSF Risk Stratification | Number | Events | Median | 0.95 LCL | 0.95 UCL |
|---|---|---|---|---|---|
| Low-risk | 1332 | 644 | 73 | 63 | 85 |
| Medium-risk | 766 | 719 | 19 | 18 | 20 |
| High-risk | 706 | 686 | 10 | 9 | 11 |
Note: LCL: lower confidence limit; UCL: upper confidence limit.
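Median survival times of this kind are usually read off a Kaplan-Meier curve as the first time at which the estimated survival probability falls to 0.5 or below. The sketch below illustrates that calculation in plain Python; the function names and toy data are hypothetical, the confidence limits are not computed, and the source does not state which software produced the reported medians.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve


def median_survival(times, events):
    """First time at which the Kaplan-Meier estimate is <= 0.5, else None."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Toy data (hypothetical): months of follow-up and event flags (1 = death).
toy_times = [3, 5, 5, 8, 10, 12, 15, 20]
toy_events = [1, 1, 0, 1, 1, 0, 1, 0]
toy_median = median_survival(toy_times, toy_events)
print(toy_median)
```

Applying the same function separately to the low-, medium-, and high-risk groups would yield one median per stratum, as in the table above.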
The median survival time (months) of the different risk strata in the test set.
| RSF Risk Stratification | Number | Events | Median | 0.95 LCL | 0.95 UCL |
|---|---|---|---|---|---|
| Low-risk | 218 | 41 | 41 | 35 | 53 |
| Medium-risk | 156 | 20 | 20 | 17 | 21 |
| High-risk | 212 | 14 | 14 | 12 | 18 |
Note: LCL: lower confidence limit; UCL: upper confidence limit.
The area under the curve of RSF risk stratification vs. AJCC Stage.
| | 1-Year | 3-Year | 5-Year |
|---|---|---|---|
| RSF risk stratification | 0.667 | 0.693 | 0.688 |
| AJCC stage | 0.568 | 0.603 | 0.622 |
| p-Value | <0.001 | <0.001 | 0.012 |
Figure 6. Individual postoperative prognostic prediction. (A) The estimated survival functions of three patients. The green line represents patient A, the yellow line represents patient B, and the blue line represents patient C. (B) The local SHAP plot of patient #1. (C) The local SHAP plot of patient #2. (D) The local SHAP plot of patient #3. In the local SHAP plots, the red ribbons represent risk factors, which contribute to a poor prognosis, whereas the blue ribbons represent relatively protective factors.