Sumeet Hindocha, Thomas G Charlton, Kristofer Linton-Reid, Benjamin Hunter, Charleen Chan, Merina Ahmed, Emily J Robinson, Matthew Orton, Shahreen Ahmad, Fiona McDonald, Imogen Locke, Danielle Power, Matthew Blackledge, Richard W Lee, Eric O Aboagye.
Abstract
BACKGROUND: Surveillance is universally recommended for non-small cell lung cancer (NSCLC) patients treated with curative-intent radiotherapy, but high-quality evidence to inform optimal surveillance strategies is lacking. Machine learning has shown promise for accurate outcome prediction in a variety of health conditions. The purpose of this study was to utilise readily available patient, tumour, and treatment data to develop, validate, and externally test machine learning models for predicting recurrence, recurrence-free survival (RFS), and overall survival (OS) at 2 years from treatment.
Keywords: Early detection; Machine learning; Non-small cell lung cancer; Overall survival; Prediction; Radiotherapy; Recurrence
Year: 2022 PMID: 35248997 PMCID: PMC8897583 DOI: 10.1016/j.ebiom.2022.103911
Source DB: PubMed Journal: EBioMedicine ISSN: 2352-3964 Impact factor: 8.143
Demographic and clinical parameters for the combined training-validation and external test sets (prior to imputation). Features not used for modelling are not shown. Categorical data are summarised as counts and percentages, with p-values from Fisher's exact test. Continuous data are summarised as median and inter-quartile range (IQR), with p-values from the Wilcoxon rank-sum test.
| Parameter | Combined Training & Validation Sets | External Test Set | p-value |
|---|---|---|---|
| Age (IQR) | 74 (14) | 72 (14) | ·054 |
| Sex (%): Male / Female | | | ·907 |
| WHO Performance Status (%): 0 / 1 / 2 / Missing | | | ·023 |
| Body Mass Index (IQR); Missing, n (%) | 25·1 (6·5) | 26·22 (7·1) | ·044 |
| Smoking Status (%): Never / Ever / Missing | | | ·165 |
| TNM8 T stage (%): 1 / 2 / 3 / 4 | | | ·161 |
| TNM8 N stage (%): 0 / 1 / 2 / 3 | | | ·041 |
| FEV1, percent predicted (IQR); Missing, n (%) | 76 (33·2) | 68·5 (34·5) | ·004 |
| TLCO, percent predicted (IQR); Missing, n (%) | 60 (25) | 57 (25·8) | ·092 |
| Days from planning scan to first fraction (IQR) | 18 (7) | 18 (6·0) | ·856 |
| Size of primary (IQR); Missing, n (%) | 33 (28) | 30 (28·5) | ·050 |
| SUV primary (IQR); Missing, n (%) | 10·35 (8·9) | 9·3 (9·8) | ·829 |
| Max nodal SUV (IQR); Missing, n (%) | 7·35 (6·8) | 5·7 (4·7) | <·001 |
| Nodal avidity (%): Yes / No / Missing | | | ·024 |
| Nodal sampling (%): Yes / No / Missing | | | ·380 |
| Histology (%): Adenocarcinoma / Squamous / Other / No pathology | | | <·001 |
| Treatment type (%): SBRT / Conventional RT / Chemo + RT | | | ·045 |
| Number of fractions (IQR) | 20 (27) | 20 (27) | ·402 |
| Total Dose, Gy (IQR) | 55 (9) | 55 (9) | ·137 |
| Biologically Effective Dose, Gy (IQR) | 76·8 (45·4) | 76·8 (38·7) | ·023 |
| Planning Target Volume, cm3 (IQR) | 218·39 (343·2) | 126·62 (314·3) | ·097 |
| Recurrence at 2 years (%) | 214 (43·0) | 54 (34·0) | ·051 |
| Recurrence or death at 2 years (%) | 267 (53·6) | 74 (46·5) | ·122 |
| Death at 2 years (%) | 185 (37·2) | 54 (34·0) | ·508 |
| Median length of follow-up (range) | 836 (0–2462) | 868 (0–1442) | |
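The table caption names two standard group-comparison tests. As a minimal standard-library sketch (not the authors' code, which this record does not include), the Python below implements a two-sided Fisher's exact test for a 2×2 table and a Wilcoxon rank-sum test using a large-sample normal approximation with tie and continuity corrections. The function names `fisher_exact_2x2` and `rank_sum_test` are hypothetical, and no study data are used.

```python
import math


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c

    def weight(x: int) -> int:
        # Hypergeometric numerator for the table whose top-left cell is x (margins fixed).
        return math.comb(row1, x) * math.comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one (exact integers).
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / math.comb(row1 + row2, col1)


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Wilcoxon rank-sum test: returns (U for x, two-sided p) via a normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    # Variance of U, reduced to account for tied values.
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    diff = u - mean
    # Continuity correction: shift U by 0.5 towards its null mean.
    z = (diff - math.copysign(0.5, diff)) / math.sqrt(var) if diff else 0.0
    return u, math.erfc(abs(z) / math.sqrt(2))
```

For example, the textbook table `fisher_exact_2x2(8, 2, 1, 5)` evaluates to 400/11440 ≈ 0.035. The 2×2 version only fits binary rows such as Sex; multi-level rows such as WHO Performance Status or Histology need the r×c generalisation of Fisher's test, which this sketch does not implement.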
Figure 1. Heatmaps illustrating the performance of each machine learning algorithm (rows) with each feature reduction method (columns), measured by validation set AUC. No FR: No feature reduction (full feature set used), LASSO: Least Absolute Shrinkage and Selection Operator, E Net: Elastic-Net, RFE: Recursive Feature Elimination, Univariate LR: Univariate Logistic Regression, XGB: Extreme Gradient Boosting machine, NB: Naïve-Bayes, PSL: Partial Least Squares, L-SVM: Linear Support Vector Machine, NL-SVM: Non-linear (radial) SVM, RF: Random Forest, MDA: Mixture Discriminant Analysis, KNN: K-Nearest Neighbours, GLM: Generalised Linear Model, NNET: Neural Network.
Figure 2. ROC curves for the validation and external test sets for each prediction model.
AUC with 95% confidence intervals for the validation and external test set for each prediction model, benchmarked against models based on TNM-stage and performance status.
| Outcome | Model | Validation AUC | Validation 95% CI | External Test AUC | External Test 95% CI |
|---|---|---|---|---|---|
| RFS | Machine learning | 0·682 | 0·575–0·788 | 0·681 | 0·597–0·766 |
| | Benchmark | 0·650 | 0·541–0·760 | 0·695 | 0·616–0·774 |
| | Benchmark | 0·464 | 0·363–0·565 | 0·499 | 0·418–0·580 |
| Recurrence | Machine learning | 0·687 | 0·582–0·793 | 0·722 | 0·635–0·810 |
| | Benchmark | 0·670 | 0·563–0·777 | 0·707 | 0·622–0·791 |
| | Benchmark | 0·506 | 0·402–0·609 | 0·584 | 0·503–0·665 |
| OS | Machine learning | 0·759 | 0·663–0·855 | 0·717 | 0·634–0·800 |
| | Benchmark | 0·649 | 0·541–0·756 | 0·665 | 0·579–0·751 |
| | Benchmark | 0·459 | 0·357–0·561 | 0·531 | 0·447–0·615 |
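This record does not say how the AUC confidence intervals were obtained. As one possible approach, stated as an assumption rather than the authors' method, the sketch below computes AUC as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count one half) and attaches an approximate 95% CI from the Hanley-McNeil standard error; DeLong's method and bootstrapping are common alternatives. The function name `auc_with_ci` is hypothetical.

```python
import math


def auc_with_ci(pos_scores: list[float], neg_scores: list[float], z: float = 1.96):
    """Return (AUC, lower, upper) using a Hanley-McNeil normal-approximation interval."""
    n1, n2 = len(pos_scores), len(neg_scores)
    # AUC = fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    auc = wins / (n1 * n2)
    # Hanley-McNeil terms; both (q1 - auc**2) and (q2 - auc**2) are non-negative for auc in [0, 1].
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    se = math.sqrt(
        (auc * (1 - auc) + (n1 - 1) * (q1 - auc**2) + (n2 - 1) * (q2 - auc**2)) / (n1 * n2)
    )
    # Clip the interval to the valid AUC range.
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)
```

On this reading, an AUC close to 0·5, such as the 0·464 and 0·459 benchmark values in the validation set, means that model ranks patients little better than chance.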
Figure 3. Kaplan-Meier survival curves for low- and high-risk groups in both the validation and external test sets for each prediction model. P-values correspond to log-rank tests.
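Figure 3 compares survival between low- and high-risk groups using Kaplan-Meier curves and log-rank tests. The standard-library sketch below shows a Kaplan-Meier estimator and a two-group log-rank test (a chi-square statistic with one degree of freedom). It is illustrative only: the function names are hypothetical, events are assumed to be coded 1 for an event and 0 for censoring, and the record does not describe how the risk groups were defined.

```python
import math


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        # Count events and all exits (events plus censored) at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b) -> tuple[float, float]:
    """Two-group log-rank test: returns (chi-square statistic, two-sided p, 1 df)."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e} | {t for t, e in zip(times_b, events_b) if e}
    )
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # group A at risk just before t
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = o_minus_e**2 / variance
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

A low-risk group whose events occur later than the high-risk group's yields a larger statistic and a smaller p-value; identical groups give a statistic of 0 and p = 1.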