
Machine learning highlights the deficiency of conventional dosimetric constraints for prevention of high-grade radiation esophagitis in non-small cell lung cancer treated with chemoradiation.

José Marcio Luna¹, Hann-Hsiang Chao², Russel T Shinohara³, Lyle H Ungar⁴, Keith A Cengel¹, Daniel A Pryma⁵, Chidambaram Chinniah⁶, Abigail T Berman¹, Sharyn I Katz⁵, Despina Kontos⁵, Charles B Simone⁷, Eric S Diffenderfer¹.

Abstract

BACKGROUND AND PURPOSE: Radiation esophagitis is a clinically important toxicity seen with treatment for locally-advanced non-small cell lung cancer. There is considerable disagreement among prior studies in identifying predictors of radiation esophagitis. We apply machine learning algorithms to identify factors contributing to the development of radiation esophagitis to uncover previously unidentified criteria and more robust dosimetric factors.
MATERIALS AND METHODS: We used machine learning approaches to identify predictors of grade ≥ 3 radiation esophagitis in a cohort of 202 consecutive locally-advanced non-small cell lung cancer patients treated with definitive chemoradiation from 2008 to 2016. We evaluated 35 clinical features per patient grouped into risk factors, comorbidities, imaging, stage, histology, radiotherapy, chemotherapy and dosimetry. Univariate and multivariate analyses were performed using a panel of 11 machine learning algorithms combined with predictive power assessments.
RESULTS: All patients were treated to a median dose of 66.6 Gy at 1.8 Gy per fraction using photon (89.6%) and proton (10.4%) beam therapy, most often with concurrent chemotherapy (86.6%). 11.4% of patients developed grade ≥ 3 radiation esophagitis. On univariate analysis, no individual feature was found to predict radiation esophagitis (AUC range 0.45-0.55, p ≥ 0.07). In multivariate analysis, all machine learning algorithms exhibited poor predictive performance (AUC range 0.46-0.56, p ≥ 0.07).
CONCLUSIONS: Contemporary machine learning algorithms applied to our modern, relatively large institutional cohort could not identify any reliable predictors of grade ≥ 3 radiation esophagitis. Additional patients are needed, and novel patient-specific and treatment characteristics should be investigated to develop clinically meaningful methods to mitigate this survival altering toxicity.


Keywords: Chemoradiation; Intensity-modulated radiation therapy; Machine learning; Non-small cell lung cancer; Proton beam therapy; Radiation esophagitis; Radiation-induced toxicity

Year: 2020 PMID: 32274426 PMCID: PMC7132156 DOI: 10.1016/j.ctro.2020.03.007

Source DB: PubMed Journal: Clin Transl Radiat Oncol ISSN: 2405-6308

Introduction

Severe radiation esophagitis is a clinically important toxicity that frequently arises during the treatment of locally advanced non-small cell lung cancer (LA-NSCLC) [1], [2], [3]. Radiation esophagitis can present acutely as dysphagia, odynophagia, sternal or epigastric chest pain, or spasms, which can directly influence patient quality of life [4], or as acute or late esophageal bleeding, perforation, or fistulas, which can be life threatening [5]. The estimated incidence of this toxicity ranges from 7 to 25% in patients receiving standard of care definitive chemoradiation [6], [7], [8], [9], and development of high-grade (≥3) radiation esophagitis can necessitate interventions such as analgesic medications, treatment delays/breaks, hospitalizations, and permanent feeding tube dependence [10]. Prior efforts aimed at improving the survival of LA-NSCLC patients using radiation dose-escalation were unsuccessful, likely in part due to the dose-limiting toxicity of radiation esophagitis. In prior multi-institutional randomized clinical trials, high-grade radiation esophagitis has been shown to negatively affect overall survival (OS) [7], [11], highlighting the importance of mitigating this toxicity. Prior attempts at identifying predictors of esophagitis have identified the importance of factors such as concurrent chemotherapy, radiation dose intensification, and dosimetric/volumetric factors related to the esophagus itself, but there are conflicting data on the predictors, especially in terms of dose constraints [1], [9], [12], [13], [14], [15], [16], [17], [18]. As such, there is currently no consensus for predicting and preventing radiation esophagitis regarding optimal thresholds for volume criteria, dose-volume criteria, radiation treatment modality, or the comparative importance of these factors.
This study aims to employ machine learning techniques to identify the critical predictors of esophageal toxicity and their comparative importance in order to inform clinical decision making. Here, we analyze 35 continuous and categorical variables drawn from previous literature as predictors of grade ≥ 3 radiation esophagitis on a large institutional cohort of 202 consecutively treated LA-NSCLC patients. We apply three variants of a panel of 11 machine learning techniques to robustly identify the important predictive factors in the development of grade ≥ 3 radiation esophagitis.

Methods and materials

Patient cohort

With institutional review board approval (Penn IRB protocol #832329), we identified a cohort of 202 consecutive patients with histologically confirmed Stage II-III LA-NSCLC (AJCC 7th Edition) treated at our institution with sequential or concurrent chemoradiation with platinum-containing regimens between 2008 and 2016. Patients received treatment using either proton beam therapy (PBT) or intensity-modulated radiation treatment (IMRT) with x-rays. Radiation esophagitis was graded according to the common terminology criteria for adverse events (CTCAEv4.0).

Feature definition

In this study, we analyzed a set of 35 predefined continuous and categorical features, including variables previously reported in the literature as strong predictors of grade ≥ 3 radiation esophagitis. The categorical features were ethnicity, pre-treatment ECOG performance status, 3-month post-radiotherapy (RT) ECOG performance status, AJCC clinical stage grouping, T Stage, N Stage, radiation treatment modality (photon or proton), concurrent vs. sequential chemotherapy, specific chemotherapy agents used, tumor grade, and sex. The continuous features were smoking pack-years (pack-year), body mass index (BMI), age at diagnosis, primary tumor size, pulmonary function test (PFT) pre-bronchodilator DLCO (% predicted), PFT pre-bronchodilator FEV1 (L), radiation total dose (total dose), radiation fraction size, number of radiation fractions (nr. fractions), mean esophagus dose (eso mean), maximum esophagus dose (eso max), eso V40, eso V50, eso V60, mean lung dose (lung mean), lung V5, lung V10, lung V20, mean heart dose (heart mean), heart V5, heart V30, heart V50 and heart V60. For PBT, the dosimetric indices were calculated using the proton convolution superposition algorithm (Varian Medical Systems, Palo Alto, CA, USA), and for IMRT, dose calculations with heterogeneity corrections were performed using the analytical anisotropic algorithm (photons). This set of dose parameters and clinical features was thoroughly discussed and selected by three highly experienced board-certified thoracic radiation oncologists (CBS, ATB, KAC) at our institution based on their expertise, best clinical practice, and current national treatment guidelines.

Missing values imputation

Missing values were imputed using trimmed scores regression (TSR), a method that fits principal component analysis (PCA) models iteratively, thus exploiting the statistical relationship among features [19]. This imputation is based on the first four principal components, which for this cohort explain 95.84% of the variance of the data. More details about the selection of TSR imputation are given in Section S3 of the SI Appendix.
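To make the idea of PCA-based imputation concrete, the sketch below alternates between fitting a low-rank PCA model and re-estimating the missing entries from it. It is a generic iterative PCA imputation written with NumPy, not the TSR algorithm of [19] and not the Matlab code used in the study; the function name `iterative_pca_impute`, its parameters, and the toy matrix are assumptions made only for illustration.

```python
import numpy as np

def iterative_pca_impute(X, n_components=4, n_iter=500, tol=1e-8):
    """Fill NaNs by alternating a rank-k PCA fit with re-estimation of the gaps.

    Illustrative only: a generic iterative PCA scheme, not TSR itself.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = min(n_components, len(s))
        # Reconstruction from the k leading principal components.
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mu
        # Observed entries are never altered; only the gaps are updated.
        updated = np.where(missing, recon, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled

# Toy example (hypothetical data): rows are multiples of [1, 2, 3],
# so one principal component describes the centered matrix exactly.
X_demo = np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
X_demo[0, 1] = np.nan          # the hidden true value is 2.0
X_filled = iterative_pca_impute(X_demo, n_components=1)
print(X_filled[0, 1])
```

In the toy matrix the features are perfectly correlated, so the imputed entry moves from the column-mean starting guess toward the value implied by the other two columns. The study's choice of four components would replace `n_components=1` on real data.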

Univariate analysis

Based on the two labeled classes (esophagitis/non-esophagitis) in our cohort, we performed a Wilcoxon rank-sum test for each continuous predictor and a χ² test for each categorical predictor. The statistical significance (p-value) of the separation between the two classes by each predictor was estimated. Applying a Bonferroni correction to a 5% family-wise error rate across the 35 features, a significance level of α = 0.05/35 ≈ 0.0014 was used for multiple comparisons. The average performance indexes, specifically balanced accuracy (BACC) [20], the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC), were estimated. BACC is defined as the average of sensitivity and specificity and is commonly used to measure performance in two-class imbalanced domains. The 95% confidence intervals of the average performance measurements (i.e., BACC and AUC) were calculated using cross-validated estimates as bootstrapped samples and the standard t-distribution-based approximation. A total of 500,000 bootstrap replicates were used to estimate the confidence intervals. All the analyses were implemented using the Statistics and Machine Learning Toolbox of Matlab R2018b® (MathWorks, Santa Clara, CA, USA) [21]. Additionally, Pearson correlation coefficients were calculated to assess possible confounders associated with radiation esophagitis.
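The quantities in this subsection can be illustrated with a few lines of plain Python. The sketch below computes the AUC from rank sums (the identity that links the AUC to the Wilcoxon rank-sum statistic), the balanced accuracy, and the Bonferroni-adjusted significance level for 35 features at a 5% family-wise error rate. The helper names and the four-patient toy data are hypothetical, and the study itself used Matlab rather than this code.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def auc_from_ranks(scores, labels):
    """AUC = U / (n_pos * n_neg), with U taken from the positive-class rank sum."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)

def balanced_accuracy(predicted, labels):
    """Average of sensitivity and specificity."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (tp / n_pos + tn / n_neg) / 2

# Toy data (hypothetical): one feature value per patient, 1 = esophagitis.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
auc = auc_from_ranks(scores, labels)                    # 0.75
predicted = [1 if s > 0.3 else 0 for s in scores]
bacc = balanced_accuracy(predicted, labels)             # 0.75

# Bonferroni: 5% family-wise error rate shared across 35 univariate tests.
alpha_univariate = 0.05 / 35
print(auc, bacc, round(alpha_univariate, 4))
```

An AUC or BACC of 0.5 corresponds to chance-level separation, which is the reference point for the results reported later.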

Multivariate analysis

To assess the combined predictive capacity of the features, we used a set of diverse statistical tools, ranging from long-existing methods such as logistic regression [22], elastic net, k-nearest neighbors (k-NN) and linear and quadratic discriminants [23] to more sophisticated methods such as linear, quadratic and Gaussian support vector machines (SVM) [24], classification and regression trees (CART) [25], Random Forest [26] and boosted trees (RUSBoost) [27]. All experiments were performed using nested resampling as shown in Fig. S1. We implemented stratified 5-fold cross-validation for the internal resampling, where the validation set was used for hyperparameter tuning and feature selection using grid search to maximize BACC. We also used stratified 5-fold cross-validation for the external resampling. The test set in the external resampling (Fig. S1), also known as the hold-out set, was used for performance estimation of the model, i.e., BACC and AUC. As in the univariate analysis, 95% confidence intervals of the average performance measurements were calculated using cross-validated estimates as bootstrapped samples and the standard t-distribution-based approximation with 500,000 bootstrap replicates. We ensured that the observations used in hyperparameter optimization never appear in the external resampling test set, thus reducing model overfitting. The list of hyperparameters tuned for each implemented algorithm is shown in Table S1. This analysis corresponds to the development and validation of a predictive model using resampling, or analysis type 1b as specified in Collins et al., 2015 [28]. The implemented stages of nested resampling are illustrated in the workflow shown in Fig. 1. In the internal resampling for each machine learning algorithm, one model is built per fold, for a total of five models. Then, the hyperparameters of the model with the highest BACC, calculated through grid search, are selected, and the performance of that model (i.e., BACC and AUC) is subsequently assessed on the respective test set during external resampling.
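The nested resampling described above can be sketched in plain Python. The example below uses a one-dimensional k-NN classifier whose only hyperparameter is k, tunes k on inner validation folds, and scores the tuned model once on each outer test fold. It is a simplified illustration under stated assumptions: the function names, the synthetic data, and the choice to pick k by mean validation BACC and then refit on all development data are all assumptions of this sketch, not the study's Matlab workflow.

```python
import random

def stratified_folds(labels, k, seed):
    """Partition indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return folds

def balanced_accuracy(pred, truth):
    """Average of sensitivity and specificity."""
    sens = sum(p for p, y in zip(pred, truth) if y == 1) / truth.count(1)
    spec = sum(1 - p for p, y in zip(pred, truth) if y == 0) / truth.count(0)
    return (sens + spec) / 2

def knn_predict(train_x, train_y, query_x, k):
    """1-D k-nearest-neighbour majority vote (k odd, so no ties)."""
    preds = []
    for q in query_x:
        nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - q))[:k]
        votes = sum(train_y[i] for i in nearest)
        preds.append(1 if 2 * votes > k else 0)
    return preds

def score(x, y, train_idx, eval_idx, k):
    pred = knn_predict([x[i] for i in train_idx], [y[i] for i in train_idx],
                       [x[i] for i in eval_idx], k)
    return balanced_accuracy(pred, [y[i] for i in eval_idx])

def nested_cv(x, y, grid=(1, 3, 5), n_outer=5, n_inner=5):
    outer_scores = []
    for test_idx in stratified_folds(y, n_outer, seed=1):
        held_out = set(test_idx)
        dev_idx = [i for i in range(len(y)) if i not in held_out]
        # Inner resampling: tune k using development data only.
        inner = stratified_folds([y[i] for i in dev_idx], n_inner, seed=2)
        inner = [[dev_idx[p] for p in fold] for fold in inner]

        def mean_validation_bacc(k):
            total = 0.0
            for val_idx in inner:
                val = set(val_idx)
                train_idx = [i for i in dev_idx if i not in val]
                total += score(x, y, train_idx, val_idx, k)
            return total / n_inner

        best_k = max(grid, key=mean_validation_bacc)
        # Outer resampling: the test fold is used once, after tuning is done.
        outer_scores.append(score(x, y, dev_idx, test_idx, best_k))
    return outer_scores

# Synthetic, imbalanced toy data (hypothetical): 10 positives, 30 negatives.
rng = random.Random(0)
labels = [1] * 10 + [0] * 30
feature = [rng.gauss(2.0 if y == 1 else 0.0, 1.0) for y in labels]
fold_bacc = nested_cv(feature, labels)
print([round(s, 2) for s in fold_bacc])
```

The key property is that the indices in each outer test fold never enter the inner loop, so the hyperparameter is chosen without seeing the data on which performance is reported.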

Fig. 1

Multivariate analysis workflow. Diagram illustrating the workflow by which the input data undergoes stepwise resampling to estimate model performance for prediction of radiation esophagitis.

For further exploration of the predictive power of the algorithms, three variants of the experiments were proposed: (1) evaluation of predictive power using all 35 predictors; (2) predictive power assessment using backward sequential feature selection (BSFS); and (3) predictive power assessment using synthetic minority oversampling technique (SMOTE) [29] and BSFS. SMOTE is a method that combines the under-sampling of the majority class and the over-sampling of the minority class by creating synthetic minority class examples. This increases the sensitivity of the classifiers to the minority class [29]. In the experiments where oversampling was implemented, SMOTE was performed in the internal resampling only, specifically in the training set of each internal fold. Moreover, in the experiments in which feature selection was implemented, BSFS was performed in the internal resampling. The p-values associated with the predictive performance of the different algorithms were calculated using the Wilcoxon rank-sum test with a significance level of α = 0.05/33 ≈ 0.0015, due to Bonferroni correction of a 5% error rate, considering the three variants of the 11 algorithms. For each of the machine learning algorithms, we calculated the predictions using the test sets in the outer resampling. Following the Wilcoxon rank-sum test procedure, we compared the distribution of the estimated predictions of each algorithm between the two labeled classes (esophagitis/non-esophagitis).
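As an illustration of how SMOTE creates synthetic minority examples, the sketch below implements only its over-sampling step: each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy points are assumptions for this example; it is not the implementation used in the study, and in a nested design it would be applied to the training portion of each internal fold only. The last line reproduces the Bonferroni arithmetic for three variants of 11 algorithms.

```python
import random

def smote_oversample(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Other minority points, ordered by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class (hypothetical): four points on the unit square.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority_points, n_synthetic=6)
print(new_points)

# Bonferroni: 5% error rate shared across 3 variants x 11 algorithms.
alpha_multivariate = 0.05 / (3 * 11)
print(round(alpha_multivariate, 4))
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already occupied by the minority class.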

Results

Patient characteristics and outcomes

Characteristics of the 202 consecutive patients with adenocarcinoma who were treated at our center with chemoradiation and included in the current analysis are provided in Tables 1 and 2. Patients were treated homogeneously to a median dose of 66.6 Gy at 1.8 Gy per fraction (interquartile range 64.8–66.6 Gy). The median age of the cohort was 64 years (interquartile range 56–73). Radiation was mainly delivered with IMRT (89.6%), with a minority receiving proton beam therapy (10.4%). Overall, 86.6% of patients received concurrent chemotherapy, with a carboplatin-based doublet combination (51.5%) being the most common regimen, followed by cisplatin-based doublet (34.7%).

Table 1

Summary of categorical patient characteristics. Description of clinical characteristics of the cohort with their respective categorization and percentages.

Categorical Predictors	Classes	Number of Patients	(%)
Sex	Male	90	44.6
	Female	112	55.4
Smoking History	Former	136	67.3
	Current	26	12.9
	Never	17	8.4
	Not Available	23	11.4
Ethnicity	White	137	67.8
	Black	47	23.3
	Asian	4	2.0
	Other	14	6.9
Pre-Treatment ECOG Performance Status	0	77	38.1
	1	55	27.2
	2	14	6.9
	3	2	1.0
	4	2	1.0
	Not Recorded	52	25.7
Stage Grouping	IIB	1	0.5
	IIIA	120	58.9
	IIIB	81	40.6
Tumor Stage	Tx	15	7.4
	T1	51	25.2
	T2	63	31.2
	T3	32	15.8
	T4	41	20.3
Nodal Stage	Nx	7	3.5
	N0	9	4.5
	N1	12	5.9
	N2	126	62.4
	N3	48	23.8
Histology	Adenocarcinoma	202	100.0
Radiation Modality	Photon (IMRT)	181	89.6
	Proton	21	10.4
Chemotherapy	Concurrent	176	86.6
	Sequential	21	10.4
	None	5	3.0
Chemotherapy Agents	Carboplatin-based Doublet	104	51.5
	Cisplatin-based Doublet	70	34.7
	Platinum-based Triplet	6	3.0
	Single Agent	2	1.0
	Other	20	9.9

Table 2

Summary of numerical patient characteristics. Description of numerical characteristics of the cohort with their respective median and interquartile ranges.

Continuous Predictors	Median	Range ‡
Age (yr)	64	(56–73)
Pack-Year (current/former smokers)	35	(14.5–50)
BMI (kg/m²)	26.0	(23.0–30.0)
Radiation Dose Delivered (Gy)	66.6	(64.8–66.6)
Dose per fraction (Gy)	1.8	(1.8–1.8)
Esophagus Mean Dose (Gy)	24.5	(18.3–31.9)
Esophagus Maximum Dose (Gy)	69.4	(65.4–72.4)

‡ Interquartile range.

At a median follow-up of 22.6 months (range 1–88 months), patients had a median OS of 23.5 months, 1-year OS of 75.0%, 2-year OS of 49.0%, and 5-year OS of 12.0%, all calculated from a Kaplan-Meier plot. Within the cohort, 23 patients (11.4%) developed grade ≥ 3 radiation esophagitis. Dosimetric parameters of the same organ at risk (e.g. heart, lung, and esophagus) were found to be strongly positively correlated with each other and also showed weaker positive correlations with those of neighboring anatomic organs (e.g. lung-heart, lung-esophagus) (Fig. 2). The univariate analysis showed that no individual feature could predict grade ≥ 3 radiation esophagitis, with median AUC = 0.49 (range 0.45–0.55) and p ≥ 0.07 across all 35 features (Table 3).

Fig. 2

Feature correlation heat map. Heat map, illustrating the Pearson correlation between the continuous features under study.

Table 3

Univariate analysis. Predictive performance for individual features using AUC analysis with their respective significance using the Wilcoxon rank-sum test (continuous features) and the χ² test (categorical features). None of the features can predict grade ≥ 3 radiation esophagitis (RE) using Bonferroni correction (α = 0.05/35 ≈ 0.0014) for multiple comparisons.

Feature	AUC §	P-value
T Stage	0.41 (0.28,0.55)	0.07
Lung V20	0.39 (0.27,0.52)	0.09
BMI	0.61 (0.48,0.72)	0.09
Pack Years	0.40 (0.29,0.52)	0.12
Concurrent v Sequential	0.57 (0.55,0.60)	0.15
Eso V60	0.59 (0.47,0.70)	0.16
Lung Mean	0.41 (0.30,0.54)	0.16
Total Dose	0.42 (0.31,0.55)	0.18
Heart Mean	0.58 (0.43,0.71)	0.23
Agents Drugs	0.43 (0.33,0.54)	0.25
Heart V30	0.57 (0.43,0.70)	0.28
Pre Treatment ECOG	0.56 (0.43,0.68)	0.31
Sex	0.56 (0.44,0.65)	0.32
Eso Max	0.44 (0.33,0.56)	0.33
Eso V50	0.56 (0.44,0.67)	0.34
Ethnicity	0.45 (0.37,0.58)	0.34
Eso V40	0.56 (0.43,0.67)	0.38
N Stage	0.55 (0.45,0.62)	0.38
Heart V5	0.55 (0.41,0.69)	0.42
Heart V60	0.55 (0.41,0.67)	0.43
Best CS AJCC Stage	0.43 (0.32,0.54)	0.44
Heart V50	0.55 (0.41,0.67)	0.48
Age at Diagnosis	0.46 (0.36,0.57)	0.52
Lung V10	0.46 (0.33,0.58)	0.52
Nr of Fractions	0.46 (0.37,0.59)	0.55
Eso Mean	0.54 (0.41,0.66)	0.55
Grade Differentiation	0.45 (0.36,0.56)	0.56
Fraction Size	0.53 (0.41,0.61)	0.57
PFT DLCO pred	0.47 (0.36,0.59)	0.63
Linac	0.51 (0.46,0.62)	0.66
Proton	0.49 (0.38,0.54)	0.66
Lung V5	0.47 (0.34,0.60)	0.67
PFT Pre Bronch Actual FEV1 L	0.52 (0.40,0.63)	0.76
Primary Tumor Long Dim cm	0.48 (0.35,0.61)	0.78
ECOG 3 mo Post-RT	0.49 (0.37,0.61)	0.85

§ Estimate with 95% confidence interval.

The predictive power using AUCs, as well as the associated p-values and the optimal BACC for all classifiers, are summarized in Table 4. In the first experiment, where we trained all 11 algorithms using the complete set of 35 predictors, none of the algorithms combining the effect of all available features could predict grade ≥ 3 radiation esophagitis, with median AUC = 0.50 (range 0.45–0.54) and p ≥ 0.09 across all the algorithms (upper third of Table 4). In the second experiment, where BSFS was implemented in the internal resampling, a median AUC = 0.52 (range 0.49–0.56) and p ≥ 0.25 across all machine learning algorithms show that the algorithms are unable to perform better than a random classifier (middle third of Table 4). This demonstrates that the current combined features lack the capacity to separate grade ≥ 3 radiation esophagitis even when counteracting confounding through feature selection. BSFS was chosen over forward sequential feature selection (FSFS) due to its superior predictive performance using logistic regression in our cohort (Fig. S2). In the last set of experiments, where SMOTE was implemented, a median AUC = 0.49 (range 0.45–0.52) and p ≥ 0.07 show that no algorithm significantly separated the two classes.

Table 4

Multivariate analysis. Combined predictive performance of features using 11 statistical models with three variants, namely a) 35 handcrafted features, b) BSFS and c) BSFS and SMOTE. None of the models can predict grade ≥ 3 radiation esophagitis (RE) using Bonferroni correction (α = 0.05/33 ≈ 0.0015) for multiple comparisons.

Experiment	Algorithm	AUC §	BACC §	P-value
All 35 Features	Logistic Regression	0.58 (0.27,0.88)	0.58 (0.29,0.87)	0.09
	Linear Discriminant	0.57 (0.21,0.93)	0.57 (0.25,0.89)	0.30
	Linear SVM	0.56 (0.22,0.90)	0.49 (0.38,0.60)	0.50
	Elastic Net	0.52 (0.17,0.87)	0.47 (0.28,0.66)	0.88
	RUSBoost	0.52 (0.07,0.96)	0.56 (0.23,0.88)	0.62
	k-NN	0.50 (0.20,0.79)	0.53 (0.27,0.79)	0.72
	Quadratic SVM	0.49 (0.08,0.89)	0.53 (0.19,0.88)	0.74
	Random Forest	0.46 (0.10,0.82)	0.50 (0.50,0.50)	0.55
	Quadratic Discriminant	0.45 (0.09,0.80)	0.48 (0.14,0.82)	0.27
	CART	0.44 (0.17,0.71)	0.47 (0.33,0.60)	0.32
	Gaussian SVM	0.40 (0.03,0.77)	0.50 (0.48,0.51)	0.12
BSFS	Logistic Regression	0.61 (0.41,0.81)	0.54 (0.31,0.78)	0.26
	Linear Discriminant	0.59 (0.30,0.88)	0.52 (0.34,0.70)	0.25
	Linear SVM	0.57 (0.50,0.64)	0.50 (0.46,0.54)	0.54
	Random Forest	0.56 (0.22,0.90)	0.53 (0.35,0.71)	0.35
	k-NN	0.53 (0.19,0.86)	0.56 (0.25,0.87)	0.71
	Elastic Net	0.52 (0.17,0.87)	0.47 (0.28,0.66)	0.88
	Quadratic SVM	0.50 (0.14,0.85)	0.50 (0.23,0.78)	0.84
	RUSBoost	0.49 (0.05,0.93)	0.49 (0.18,0.81)	0.76
	Quadratic Discriminant	0.48 (0.09,0.88)	0.50 (0.23,0.77)	0.66
	Gaussian SVM	0.46 (0.13,0.78)	0.52 (0.43,0.61)	0.73
	CART	0.40 (0.20,0.60)	0.46 (0.41,0.52)	0.25
BSFS and SMOTE	Elastic Net	0.61 (0.16,1.00)	0.63 (0.24,1.00)	0.07
	Linear Discriminant	0.58 (0.17,0.98)	0.55 (0.21,0.89)	0.39
	Logistic Regression	0.55 (0.15,0.95)	0.54 (0.18,0.90)	0.42
	Linear SVM	0.50 (0.14,0.86)	0.51 (0.34,0.68)	0.85
	Quadratic SVM	0.49 (0.12,0.86)	0.52 (0.13,0.90)	0.92
	RUSBoost	0.49 (0.12,0.85)	0.50 (0.18,0.82)	0.73
	Random Forest	0.48 (0.12,0.83)	0.53 (0.26,0.80)	0.73
	k-NN	0.47 (0.10,0.85)	0.51 (0.19,0.83)	0.61
	Gaussian SVM	0.43 (0.02,0.84)	0.49 (0.22,0.76)	0.33
	CART	0.42 (0.01,0.83)	0.48 (0.21,0.76)	0.35
	Quadratic Discriminant	0.41 (0.12,0.69)	0.48 (0.45,0.51)	0.54

§ Estimate with 95% confidence interval.

Discussion

Prior attempts to identify predictors of radiation esophagitis have utilized various methodologies, with many of these reports conducted in a pre-IMRT era, resulting in conflicting data regarding the relative importance of these factors [9], [12], [13], [14], [16], [17], [18], [30], [31] as illustrated in Table S1. We sought to apply machine learning algorithms to a contemporary, curated patient cohort treated relatively homogeneously with modern radiotherapy techniques in order to examine, validate, and rank factors suggested to predict high-grade esophagitis. Here, we specifically analyzed predictors of grade ≥ 3 radiation esophagitis, a particularly important toxicity and grade given its association with considerably worse OS as reported in the Radiation Therapy Oncology Group (RTOG) 0617 clinical trial [7]. Interestingly, we found that when using a combination of machine learning methodologies coupled with resampling techniques to reduce confounding from overfitting, no single feature reliably predicted grade ≥ 3 esophagitis in our analysis. Although we found that eso mean and dosimetric factors that correlated strongly with esophageal dose (Heart Mean, Heart V30) ranked highly in feature importance (Table 3), none crossed the AUC threshold in our study to be deemed a reliable predictor. This is in contrast to previously published retrospective [12], [32], prospective [16], [33] and randomized [34] studies showing associations between esophageal toxicity and predictors including age, tumor nodal stage, concurrent chemotherapy and BMI. It also contrasts with reports on dosimetric factors including eso mean, eso max, as well as eso V20, eso V35, and eso V60, which have been analyzed retrospectively in [35] and using a mixture of retrospectively and prospectively collected datasets in [9].
It is important to note that we specifically examined grade ≥ 3 radiation esophagitis, whereas much of the prior literature has examined grade ≥ 2 esophagitis, a less clinically impactful toxicity [36], [37]. In our dataset, we observed grade ≥ 3 radiation esophagitis in 23 of 202 patients (11.4%), which is lower than historically observed rates of >15% [3], [10], [35], [37]. The comparatively low radiation esophagitis rates we report here may reflect treatment improvements over time, including improved symptom prevention and proactive care, as well as advanced radiation treatment modalities. Importantly, to our knowledge, our series is the first machine learning analysis to include a cohort of patients treated with proton beam therapy, which may also result in lower than expected radiation esophagitis rates, as has been reported in a prospective population of locally advanced lung cancer patients treated with proton therapy [38]. Additionally, the lower than expected toxicity rates may contribute to a lack of reliable predictors in our models due to insufficient radiation esophagitis events. By comparison, other studies reporting robust predictors for grade ≥ 2 radiation esophagitis observed toxicity rates upwards of 50% [9], [10], [35]. Another inherent challenge in using machine learning tools to identify predictive factors is falsely identifying significant factors due to overfitting. We therefore sought to enhance the robustness of our machine learning models through the use of sequential feature selection and resampling using BSFS and SMOTE. We further attribute the difference in results between our current models and the prior literature to a combination of the different toxicity endpoint assessed (grade ≥ 3 vs. grade ≥ 2), variance in radiation techniques, and implementation of resampling.
Previously published reports, with models developed on a median cohort size of 141 (well below our cohort of 202) and a median radiation esophagitis rate of 11.1%, used logistic regression approaches [1], [9], [12], [17], [31], [32], [34], [36], [37], [39], while some other studies fit Lyman-Kutcher-Burman (LKB) models, both using pre-IMRT era cohorts [14], [33], which may not reflect the current risks of radiation-induced toxicity in a contemporary setting. A more recent study used lasso regularization in a smaller cohort of 94 patients, of whom 16% developed radiation esophagitis [35]. Finally, some studies with considerably larger cohorts are limited to conformal radiotherapy patients and/or a rather small number of IMRT patients [9], [14]. To the best of our knowledge, this is the first study with a relatively large cohort using sophisticated machine learning techniques to identify predictors of grade ≥ 3 radiation esophagitis. Radiation esophagitis has been a challenging entity to predict, which is reflected in the variance in esophageal dosimetric constraints employed in recent national prospective randomized trials on LA-NSCLC. Among the two most recent NRG Oncology phase III randomized trials for LA-NSCLC, RTOG 0617 [7] recommended constraining the eso mean to below 34 Gy and recording the eso V60 without a required dose constraint, whereas RTOG 1308 [40], the currently enrolling prospective trial comparing proton vs. photon radiation therapy, set a per-protocol constraint of an eso max of 74 Gy to ≤1 cc of the partial circumference while retaining none of the earlier constraints from RTOG 0617. This is in contrast to the pulmonary constraints, which have remained relatively constant over these trials. An earlier meta-analysis focusing on radiation esophagitis development [9] illustrates some of the potential challenges in identifying reliable predictors.
Similar to our analysis, many clinical and dosimetric factors are found to be associated with radiation esophagitis toxicity, with more features associated with grade ≥ 2 than grade ≥ 3 radiation esophagitis. However, when these features are used as predictors, they by and large perform poorly, with C-statistics below 0.6. Interestingly, eso V60 was identified as a reliable predictor of grade ≥ 3 radiation esophagitis in [9]. That series represents an older cohort of patients treated from 1993 to 2011, which may contribute to some of the discrepant findings. In contrast, none of eso V40, V50, or V60 was identified as an important predictor in our study (see Table 3). The result from our machine learning analysis that none of the 35 features analyzed performed better than a random classifier suggests that our currently utilized clinical, demographic, and dosimetric features could be inadequate to reliably predict radiation esophagitis. One can conclude that we are not currently collecting and capturing the appropriate features to allow a machine learning workflow to predict grade ≥ 3 radiation esophagitis, as we were successfully able to do when using machine learning to predict pneumonitis in LA-NSCLC [41] and chest wall toxicity in early stage NSCLC [42]. As such, we encourage other investigators to explore and develop new markers directed at this toxicity. This may include more widespread utilization of biomarkers or composite features to generate the appropriate power and granularity to adequately capture radiation esophagitis. In the recent work of Bahn and Alber [43], the authors assume a unimodal beta distribution of the output of the normal tissue complication probability (NTCP) and, using Monte Carlo simulations, state that a cohort size of N = 300 is necessary to be powered to detect a small difference of 0.1 between two AUCs.
Our current cohort size of N = 202 does not fulfill this suggested sample size for AUC comparison in weak model settings. It does, however, meet the sample size recommendations for detecting models with medium predictive performance (as defined in [43], AUC = 0.69). In summary, our findings encourage the incorporation of novel predictors of acute esophagitis in our future research agenda, as well as the prospective increase of the cohort size as more patient information becomes available at our institution. This single-institution analysis, however, does allow us comparative uniformity in the patient population and minimizes potential heterogeneity in the assessed population. It is also worth noting that our cohort is the largest used in a modern IMRT analysis of grade ≥ 3 radiation esophagitis in stage II-III NSCLC patients performed to date.

Conclusions

From our analysis, we conclude that current predictors for high-grade radiation esophagitis are unreliable and that continued investigation is necessary to develop clinically useful metrics for prevention of this detrimental toxicity, which is associated with worse overall survival. Reporting and identifying more robust variables will be critically important for future study. Clinicians should employ individualized patient-centered decision making in terms of treatment regimens and toxicity mitigation until reliable radiation esophagitis predictors can be identified.

34 in total

1. Predictors of severe esophagitis include use of concurrent chemotherapy, but not the length of irradiated esophagus: a multivariate analysis of patients with lung cancer treated with nonoperative therapy.

Authors: M Werner-Wasik; E Pequignot; D Leeper; W Hauck; W Curran
Journal: Int J Radiat Oncol Biol Phys Date: 2000-10-01 Impact factor: 7.038

Review 2. Meta-analysis of concomitant versus sequential radiochemotherapy in locally advanced non-small-cell lung cancer.

Authors: Anne Aupérin; Cecile Le Péchoux; Estelle Rolland; Walter J Curran; Kiyoyuki Furuse; Pierre Fournel; Jose Belderbos; Gerald Clamon; Hakki Cuneyt Ulutin; Rebecca Paulus; Takeharu Yamanaka; Marie-Cecile Bozonnat; Apollonia Uitterhoeve; Xiaofei Wang; Lesley Stewart; Rodrigo Arriagada; Sarah Burdett; Jean-Pierre Pignon
Journal: J Clin Oncol Date: 2010-03-29 Impact factor: 44.544

Review 3. Systematic review of dose-volume parameters in the prediction of esophagitis in thoracic radiotherapy.

Authors: Jim Rose; George Rodrigues; Brian Yaremko; Michael Lock; David D'Souza
Journal: Radiother Oncol Date: 2008-10-22 Impact factor: 6.280

4. Dosimetric correlates for acute esophagitis in patients treated with radiotherapy for lung carcinoma.

Authors: Jeffrey Bradley; Joseph O Deasy; Soeren Bentzen; Issam El-Naqa
Journal: Int J Radiat Oncol Biol Phys Date: 2004-03-15 Impact factor: 7.038

5. Simultaneously modulated accelerated radiation therapy reduces severe oesophageal toxicity in concomitant chemoradiotherapy of locally advanced non-small-cell lung cancer.

Authors: Enrique Chajon; Julien Bellec; Joël Castelli; Romain Corre; Mallorie Kerjouan; Elisabeth Le Prisé; Renaud De Crevoisier
Journal: Br J Radiol Date: 2015-09-28 Impact factor: 3.039

6. Quality of Life Analysis of a Radiation Dose-Escalation Study of Patients With Non-Small-Cell Lung Cancer: A Secondary Analysis of the Radiation Therapy Oncology Group 0617 Randomized Clinical Trial.

Authors: Benjamin Movsas; Chen Hu; Jeffrey Sloan; Jeffrey Bradley; Ritsuko Komaki; Gregory Masters; Vivek Kavadi; Samir Narayan; Jeff Michalski; Douglas W Johnson; Christopher Koprowski; Walter J Curran; Yolanda I Garces; Rakesh Gaur; Raymond B Wynn; John Schallenkamp; Daphna Y Gelblum; Robert M MacRae; Rebecca Paulus; Hak Choy
Journal: JAMA Oncol Date: 2016-03 Impact factor: 31.777

7. Predictors of radiation-induced esophageal toxicity in patients with non-small-cell lung cancer treated with three-dimensional conformal radiotherapy.

Authors: Anurag K Singh; Mary Ann Lockett; Jeffrey D Bradley
Journal: Int J Radiat Oncol Biol Phys Date: 2003-02-01 Impact factor: 7.038

8. Predictors of Acute Radiation Esophagitis in Non-small Cell Lung Cancer Patients Treated With Accelerated Hyperfractionated Chemoradiotherapy.

Authors: Kentaro Wada; Noriko Kishi; Naoyuki Kanayama; Takero Hirata; Yoshihiro Ueda; Yoshifumi Kawaguchi; Masahiro Morimoto; Koji Konishi; Fumio Imamura; Kazuhiko Ogawa; Teruki Teshima
Journal: Anticancer Res Date: 2019-01 Impact factor: 2.480

9. Independent test of a model to predict severe acute esophagitis.

Authors: Ellen X Huang; Clifford G Robinson; Alerson Molotievschi; Jeffrey D Bradley; Joseph O Deasy; Jung Hun Oh
Journal: Adv Radiat Oncol Date: 2016-11-16

10. Exploratory analysis using machine learning to predict for chest wall pain in patients with stage I non-small-cell lung cancer treated with stereotactic body radiation therapy.

Authors: Hann-Hsiang Chao; Gilmer Valdes; Jose M Luna; Marina Heskel; Abigail T Berman; Timothy D Solberg; Charles B Simone
Journal: J Appl Clin Med Phys Date: 2018-07-10 Impact factor: 2.102
