Mei Chen1,2, Zeming Wang2, Shengpeng Jiang2,3, Jian Sun3,4, Li Wang5, Narayan Sahoo2, G Brandon Gunn4, Steven J Frank4, Cheng Xu1, Jiayi Chen1, Quynh-Nhu Nguyen4, Joe Y Chang4, Zhongxing Liao4, X Ronald Zhu2, Xiaodong Zhang6.
Abstract
This study aimed to compare the predictive performance of different modeling methods for developing normal tissue complication probability (NTCP) models that predict radiation-induced esophagitis (RE) in patients with non-small cell lung cancer (NSCLC) receiving proton radiotherapy. The dataset comprised 328 NSCLC patients treated with passive-scattering proton therapy, 41.6% of whom experienced grade ≥ 2 RE. Five modeling methods were used to build NTCP models: the standard Lyman–Kutcher–Burman model (sLKB), the generalized LKB model (gLKB), multivariable logistic regression (MLR) with two variable-selection procedures, namely stepwise forward selection (Stepwise-MLR) and the least absolute shrinkage and selection operator (LASSO-MLR), and support vector machines (SVM). The predictive performance of each modeling method was internally validated with a bootstrap approach. Overall performance, discriminative ability, and calibration were assessed with the Nagelkerke R2, the area under the receiver operating characteristic curve (AUC), and the Hosmer–Lemeshow test, respectively. The LASSO-MLR model showed the best discriminative ability, with an AUC of 0.799 (95% confidence interval (CI): 0.763–0.854), and the best overall performance, with a Nagelkerke R2 of 0.332 (95% CI: 0.266–0.486). The optimism-corrected Nagelkerke R2 was 0.301 for both the SVM and sLKB models. The optimism-corrected AUC of the gLKB model (0.796) was higher than that of the SVM model (0.784). The sLKB model had the smallest optimism in both explained variation (Nagelkerke R2) and discriminative ability (AUC). For classification and probability estimation of the NTCP for radiation-induced esophagitis, the MLR model developed with LASSO gave the best predictive results. The simplest approach, LKB modeling, had similar or even better predictive performance than the most complex approach, SVM modeling, and was the least likely to overfit the training data. Advanced machine learning approaches may have limited applicability in clinical settings with relatively small amounts of data.
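Both the abstract and the performance table describe bootstrap internal validation in terms of apparent performance, optimism, and optimism-corrected performance. One common way to compute these is Harrell's bootstrap optimism correction, sketched below in pure Python. This is an assumption-laden illustration, not the authors' implementation: it uses a single-predictor logistic model fitted by Newton–Raphson, AUC as the metric, and synthetic data, and the helper names (`fit_logistic_1d`, `optimism_corrected_auc`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (avoids math.exp overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(x, y, iters=25):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                  # Fisher information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        b0 += (h11 * g0 - h01 * g1) / det   # inverse 2x2 information times score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def auc(scores, labels):
    """AUC as the Mann-Whitney probability that a case outranks a non-case."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both outcome classes")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected_auc(x, y, n_boot=100, seed=0):
    """Return (apparent AUC, mean bootstrap optimism, corrected AUC)."""
    rng = random.Random(seed)
    n = len(x)
    b0, b1 = fit_logistic_1d(x, y)
    apparent = auc([b0 + b1 * v for v in x], y)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        c0, c1 = fit_logistic_1d(xb, yb)            # refit on the bootstrap sample
        boot = auc([c0 + c1 * v for v in xb], yb)   # performance on its own sample
        test = auc([c0 + c1 * v for v in x], y)     # performance on the original data
        optimism += (boot - test) / n_boot
    return apparent, optimism, apparent - optimism


# Synthetic data (hypothetical): one predictor, 200 patients.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [1 if rng.random() < sigmoid(-0.3 + 1.2 * v) else 0 for v in x]
apparent, optimism, corrected = optimism_corrected_auc(x, y)
print(f"apparent {apparent:.3f}  optimism {optimism:.3f}  corrected {corrected:.3f}")
```

Because AUC depends only on how patients are ranked, a one-predictor model like this is expected to show very little optimism; models that choose among many candidate variables have more room to overfit and tend to show larger optimism.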
Year: 2022 PMID: 35655073 PMCID: PMC9163134 DOI: 10.1038/s41598-022-12898-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Candidate clinical variables and EUD values of the data set.
| Index | Variables | Range/classification | Mean/frequency |
|---|---|---|---|
| 1 | Sex | 0 = male, 1 = female | 185, 143 |
| 2 | Age, years | 33–95 | 68.6 |
| 3 | Stage | 1, 2, 3, 4 | 59, 47, 208, 14 |
| 4 | CCRT | 0 = no, 1 = yes | 130, 198 |
| 5 | EUD, Gy(RBE) | 0–68.02 | 37.63 |
| 6 | Dmax, Gy(RBE) | 0–86.00 | 62.91 |
| 7 | Dmean, Gy(RBE) | 0–56.70 | 18.53 |
| 8 | V10, % | 0–93.54 | 37.44 |
| 9 | V15, % | 0–91.70 | 34.93 |
| 10 | V20, % | 0–90.23 | 32.48 |
| 11 | V25, % | 0–88.98 | 30.35 |
| 12 | V30, % | 0–87.89 | 28.28 |
| 13 | V35, % | 0–86.93 | 26.14 |
| 14 | V40, % | 0–86.06 | 23.93 |
| 15 | V45, % | 0–85.03 | 21.95 |
| 16 | V50, % | 0–83.15 | 19.82 |
| 17 | V55, % | 0–80.42 | 16.78 |
| 18 | V60, % | 0–73.00 | 14.01 |
| 19 | V65, % | 0–61.99 | 10.32 |
| 20 | V70, % | 0–59.12 | 7.12 |
| 21 | V75, % | 0–47.33 | 3.08 |
CCRT concurrent chemotherapy, EUD equivalent uniform dose, RBE relative biological effectiveness, Dmax maximum dose, Dmean mean dose, Vx percentage of volume receiving dose higher than x Gy (RBE).
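The dosimetric candidates in this table are standard summaries of the esophageal dose distribution. A minimal sketch of how they can be computed from per-voxel doses follows, assuming equal voxel volumes and an inclusive threshold for Vx. The function `dvh_metrics` and the example doses are hypothetical, and the volume parameter n = 0.24 is borrowed from the sLKB fit purely for illustration. EUD here is the generalized EUD used in the LKB model, (Σ v_i D_i^(1/n))^n with fractional volumes v_i.

```python
def dvh_metrics(doses, n, thresholds=(10, 20, 30, 40, 50, 60, 70)):
    """Dosimetric summaries for one organ from per-voxel doses in Gy(RBE).

    Assumes every voxel has the same volume, so each voxel is a fraction
    1/len(doses) of the organ.
    """
    k = len(doses)
    d_mean = sum(doses) / k
    d_max = max(doses)
    # Generalized EUD with LKB volume parameter n: (mean of D**(1/n))**n.
    eud = (sum(d ** (1.0 / n) for d in doses) / k) ** n
    # Vx: percentage of the organ receiving at least x Gy(RBE)
    # (an inclusive threshold is an assumption).
    v = {x: 100.0 * sum(1 for d in doses if d >= x) / k for x in thresholds}
    return {"Dmean": d_mean, "Dmax": d_max, "EUD": eud, "V": v}


# Hypothetical esophagus voxel doses in Gy(RBE).
esophagus = [0.0, 2.5, 8.0, 15.0, 22.0, 31.0, 44.0, 52.0, 60.0, 66.0]
m = dvh_metrics(esophagus, n=0.24)
print(m["Dmean"], m["Dmax"], round(m["EUD"], 2), m["V"][20])
```

With n = 1 the EUD reduces to the mean dose; as n shrinks toward 0 it moves toward the maximum dose, which is why a small fitted n indicates a serial-type organ response.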
Figure 1. Selection frequencies of the candidate variables in 1000 bootstrap samples by forward stepwise selection and LASSO for the MLR model (a) and by forward stepwise selection for the SVM model (b).
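Figure 1 summarizes how often each candidate variable survived selection across bootstrap resamples. The LASSO half of that procedure can be sketched in pure Python as an L1-penalized logistic regression fitted by proximal gradient descent (ISTA) and refitted on each resample. Assumptions: standardized features, a fixed penalty `lam` rather than one tuned by cross-validation, 30 resamples instead of 1000 to keep the run short, and hypothetical data. `lasso_logistic` and `selection_frequency` are illustrative names, not the authors' code.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrink toward zero, exactly zero inside [-t, t].
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_logistic(X, y, lam, step=0.1, iters=200):
    """L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept is not penalized.
    Features are assumed to be standardized beforehand.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(iters):
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= step * sum(resid) / n                       # plain gradient step
        w = [soft_threshold(wj - step * gj, step * lam)  # gradient step + L1 prox
             for wj, gj in zip(w, grad_w)]
    return b, w


def selection_frequency(X, y, lam, n_boot=30, seed=0):
    """Fraction of bootstrap refits in which each coefficient is non-zero."""
    rng = random.Random(seed)
    n = len(X)
    counts = [0] * len(X[0])
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        _, w = lasso_logistic([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, wj in enumerate(w):
            if wj != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Hypothetical data: feature 0 drives the outcome, features 1 and 2 are noise.
rng = random.Random(7)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(100)]
y = [1 if rng.random() < sigmoid(2.0 * row[0]) else 0 for row in X]
freq = selection_frequency(X, y, lam=0.05)
print(freq)
```

The soft-thresholding step is what sets weak coefficients exactly to zero, so counting non-zero coefficients across resamples gives a selection frequency directly.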
Feature selection results and parameter values for sLKB, gLKB, Stepwise-MLR, LASSO-MLR, and SVM models.
| Models | Parameters/features | Coefficients/formula |
|---|---|---|
| sLKB | n, m, TD50 | n = 0.24, m = 0.51, TD50 = 44.83 Gy (RBE) |
| gLKB | n, m, TD50y, TD50n | n = 0.23, m = 0.54, TD50y = 42.17 Gy (RBE), TD50n = 57.84 Gy (RBE) |
| Stepwise-MLR | CCRT, EUD | |
| LASSO-MLR | CCRT, EUD, V75 | |
| SVM | CCRT, EUD | C = 2^15, δ = 2^−13 |
sLKB standard Lyman–Kutcher–Burman, gLKB generalized Lyman–Kutcher–Burman, Stepwise-MLR multivariable logistic regression using stepwise feature selection, LASSO-MLR multivariable logistic regression using least absolute shrinkage and selection operator for feature selection, SVM support vector machine, CCRT concurrent chemotherapy, EUD equivalent uniform dose, V75 percentage of volume receiving dose higher than 75 Gy (RBE).
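The two LKB rows can be read as a probit dose–response in EUD. A minimal sketch, assuming the standard Lyman form NTCP = Φ((EUD − TD50)/(m·TD50)) with the parameter values from this table. Reading TD50y and TD50n as the TD50 for patients with and without CCRT is an assumption, since the table does not define the subscripts; the dictionary keys and function names are hypothetical.

```python
import math

# Parameters from the model table; mapping TD50y/TD50n to CCRT yes/no is assumed.
SLKB = {"n": 0.24, "m": 0.51, "td50": 44.83}
GLKB = {"n": 0.23, "m": 0.54, "td50_ccrt": 42.17, "td50_no_ccrt": 57.84}


def lkb_ntcp(eud, td50, m):
    """Lyman NTCP = Phi(t), with t = (EUD - TD50) / (m * TD50)."""
    t = (eud - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))  # standard normal CDF


def slkb_ntcp(eud):
    # eud must be computed from the DVH with the sLKB volume parameter n = 0.24.
    return lkb_ntcp(eud, SLKB["td50"], SLKB["m"])


def glkb_ntcp(eud, ccrt):
    # eud must be computed with the gLKB volume parameter n = 0.23;
    # TD50 switches on the (assumed) CCRT indicator.
    td50 = GLKB["td50_ccrt"] if ccrt else GLKB["td50_no_ccrt"]
    return lkb_ntcp(eud, td50, GLKB["m"])


for eud in (30.0, 40.0, 50.0):
    print(eud, round(slkb_ntcp(eud), 3),
          round(glkb_ntcp(eud, True), 3), round(glkb_ntcp(eud, False), 3))
```

Each model's EUD must be computed with its own volume parameter n (0.24 for sLKB, 0.23 for gLKB), so the same dose distribution gives slightly different EUD inputs to the two models.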
Apparent, bootstrap performance and optimism of sLKB, gLKB, Stepwise-MLR, LASSO-MLR, and SVM models.
| Performance | sLKB | gLKB | Stepwise-MLR | LASSO-MLR | SVM |
|---|---|---|---|---|---|
| Nagelkerke R2, apparent | 0.315 (0.301)* | 0.342 (0.323)* | 0.344 (0.329)* | 0.354 (0.332)* | 0.340 (0.301)* |
| AUC, apparent | 0.785 (0.783)* | 0.799 (0.796)* | 0.800 (0.797)* | 0.803 (0.799)* | 0.799 (0.784)* |
| HL test | χ2 = 12.01 (…) | χ2 = 5.08 (…) | χ2 = 5.18 (…) | χ2 = 3.84 (…) | χ2 = 5.60 (…) |
| LL | −178.55 | −174.46 | −174.20 | −172.48 | −174.75 |
| Nagelkerke R2, bootstrap | 0.318 (0.210–0.427) | 0.349 (0.244–0.454) | 0.349 (0.246–0.452) | 0.363 (0.260–0.465) | 0.334 (0.207–0.460) |
| AUC, bootstrap | 0.787 (0.737–0.836) | 0.802 (0.753–0.850) | 0.802 (0.753–0.850) | 0.805 (0.757–0.853) | 0.807 (0.756–0.858) |
| Nagelkerke R2, optimism | 0.014 (−0.096 to 0.124) | 0.020 (−0.091 to 0.130) | 0.015 (−0.093 to 0.123) | 0.022 (−0.088 to 0.132) | 0.039 (−0.091 to 0.168) |
| AUC, optimism | 0.002 (−0.048 to 0.052) | 0.004 (−0.045 to 0.052) | 0.003 (−0.046 to 0.051) | 0.004 (−0.044 to 0.051) | 0.015 (−0.037 to 0.068) |
*Apparent performance (optimism-corrected).
sLKB standard Lyman–Kutcher–Burman, gLKB generalized Lyman–Kutcher–Burman, Stepwise-MLR multivariable logistic regression using stepwise feature selection, LASSO-MLR multivariable logistic regression using least absolute shrinkage and selection operator for feature selection, SVM support vector machine, AUC area under the receiver operating characteristic curve, HL Hosmer–Lemeshow, LL log likelihood, AIC Akaike information criterion.
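The table reports overall performance as the Nagelkerke R2 and corrected performance as apparent minus optimism. A short sketch of both calculations follows, assuming the usual definition of the Nagelkerke R2 as the Cox–Snell R2 divided by its maximum attainable value. The log-likelihood inputs in the first example are hypothetical; the apparent and optimism values are copied from the table, and subtracting them reproduces the reported optimism-corrected values to within 0.001, consistent with rounding.

```python
import math


def null_log_likelihood(n_events, n):
    """Log-likelihood of the intercept-only model (constant event rate)."""
    p = n_events / n
    return n_events * math.log(p) + (n - n_events) * math.log(1.0 - p)


def nagelkerke_r2(ll_model, ll_null, n):
    """Cox-Snell R^2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Hypothetical example: 100 patients, 40 events, fitted model LL = -55.0.
ll0 = null_log_likelihood(40, 100)
print(round(nagelkerke_r2(-55.0, ll0, 100), 3))

# Optimism correction: corrected = apparent - mean bootstrap optimism.
# (Nagelkerke R2, AUC) pairs copied from the table above.
apparent = {"sLKB": (0.315, 0.785), "gLKB": (0.342, 0.799),
            "Stepwise-MLR": (0.344, 0.800), "LASSO-MLR": (0.354, 0.803),
            "SVM": (0.340, 0.799)}
optimism = {"sLKB": (0.014, 0.002), "gLKB": (0.020, 0.004),
            "Stepwise-MLR": (0.015, 0.003), "LASSO-MLR": (0.022, 0.004),
            "SVM": (0.039, 0.015)}
corrected = {k: tuple(a - o for a, o in zip(apparent[k], optimism[k]))
             for k in apparent}
for model, (r2, auc) in corrected.items():
    print(f"{model:13s} R2 {r2:.3f}  AUC {auc:.3f}")
```

The rescaling matters because the Cox–Snell R2 cannot reach 1 for a binary outcome; dividing by its maximum puts models on a 0 to 1 scale.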
Figure 2. Receiver operating characteristic curves (a) and calibration plots of observed outcome vs. predicted probability for the sLKB (b), gLKB (c), Stepwise-MLR (d), LASSO-MLR (e), and SVM (f) models. Results of the Hosmer–Lemeshow test are shown in the lower right of each plot. The open circles are the observed frequencies in each group, with groups defined by deciles of the mean predicted probability. The dashed line represents ideal prediction, and the solid line is a loess fit to the model output.
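The Hosmer–Lemeshow statistics in the figure and table compare observed and expected event counts within groups of predicted risk. A minimal sketch follows, assuming ten equal-size groups formed by sorting predicted probabilities (so df = groups − 2 = 8) and using the closed-form chi-square tail probability that holds for even degrees of freedom. The function names and the synthetic example are hypothetical, and the paper's exact grouping rule may differ.

```python
import math
import random


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, df, p-value) using equal-size groups of predicted risk."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        expected = sum(probs[i] for i in idx)     # expected events in the group
        observed = sum(outcomes[i] for i in idx)  # observed events in the group
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


# Synthetic, well-calibrated example (hypothetical data).
rng = random.Random(3)
probs = [rng.uniform(0.05, 0.95) for _ in range(328)]
outcomes = [1 if rng.random() < p else 0 for p in probs]
stat, df, p_value = hosmer_lemeshow(probs, outcomes)
print(f"HL chi2 = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

A larger statistic (smaller p-value) signals poorer agreement between predicted and observed event rates across risk groups; the statistic is sensitive to how the groups are formed.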