Incorporating spatial dose metrics in machine learning-based normal tissue complication probability (NTCP) models of severe acute dysphagia resulting from head and neck radiotherapy.

Jamie Dean¹, Kee Wong², Hiram Gay³, Liam Welsh², Ann-Britt Jones², Ulricke Schick², Jung Hun Oh⁴, Aditya Apte⁴, Kate Newbold^2,5, Shreerang Bhide^2,5, Kevin Harrington^2,5, Joseph Deasy⁴, Christopher Nutting^2,5, Sarah Gulliford¹.

Abstract

Severe acute dysphagia commonly results from head and neck radiotherapy (RT). A model enabling prediction of severity of acute dysphagia for individual patients could guide clinical decision-making. Statistical associations between RT dose distributions and dysphagia could inform RT planning protocols aiming to reduce the incidence of severe dysphagia. We aimed to establish such a model and associations incorporating spatial dose metrics. Models of severe acute dysphagia were developed using pharyngeal mucosa (PM) RT dose (dose-volume and spatial dose metrics) and clinical data. Penalized logistic regression (PLR), support vector classification and random forest classification (RFC) models were generated and internally (173 patients) and externally (90 patients) validated. These were compared using area under the receiver operating characteristic curve (AUC) to assess performance. Associations between treatment features and dysphagia were explored using RFC models. The PLR model using dose-volume metrics (PLRstandard) performed as well as the more complex models and had very good discrimination (AUC = 0.82) on external validation. The features with the highest RFC importance values were the volume, length and circumference of PM receiving 1 Gy/fraction and higher. The volumes of PM receiving 1 Gy/fraction or higher should be minimized to reduce the incidence of severe acute dysphagia.

Year: 2017 PMID: 29399642 PMCID: PMC5796681 DOI: 10.1016/j.ctro.2017.11.009

Source DB: PubMed Journal: Clin Transl Radiat Oncol ISSN: 2405-6308

Introduction

Acute dysphagia is a common toxicity resulting from head and neck (chemo)radiotherapy (RT), having a substantial impact on patients’ quality of life [1] and personal relationships [2]. Around half of patients experience significant acute swallowing dysfunction [3]. Moreover, severe acute reactions have been implicated in the development of “late” radiation toxicities [4], [5], including late dysphagia [6]. Clinicians are unable to accurately predict which patients will experience severe acute dysphagia [7]. A normal tissue complication probability (NTCP) model with good predictive ability would, therefore, represent a highly useful tool for clinical decision-support, treatment plan comparison, treatment modality selection [8] and isotoxic dose escalation (as is being evaluated in lung RT [9]). Recently, NTCP models of dysphagia six months following RT [10], [11] were successfully validated [12], [13], [14]. However, as many patients suffer severe acute dysphagia that resolves by six months following RT, these models do not capture the substantial early toxicity burden. The currently existing NTCP models for severe acute dysphagia, whilst promising and providing useful insights, [15], [16], [17], [18], [19], [20], [21] possess suboptimal discriminative ability and, hence, are not routinely used to guide clinical decision-making. In addition to the prediction of individual patient toxicity outcomes, there is substantial interest in determining statistical associations between RT dose metrics and toxicity to inform the optimal design of RT treatment planning techniques attempting to reduce the incidence of toxicity. A large number of studies, summarized in [22], [23], with conflicting findings, have sought to establish substructures within the head and neck region that are radiosensitive for late dysphagia. 
However, the apparent differential radiosensitivity of substructures within the pharyngeal musculature is likely to be an artefact of the positions of the primary disease sites relative to those substructures in these study cohorts [24]. To overcome this bias, we combined multiple spatial dose metrics, which are sensitive to both the extent of the dose distribution and regional variations in radiosensitivity, to “tease apart” these effects. Additionally, we hypothesized that the addition of spatial dose metrics would increase the discriminative performance of NTCP models, compared with dose-volume metrics, as has previously been demonstrated for xerostomia [25] and rectal toxicities [26]. The first aim of this study was to determine whether the addition of novel spatial dose metrics would improve the predictive performance of NTCP models for severe acute dysphagia. The second aim was to establish statistical associations between the RT dose distribution and severe acute dysphagia that could be used to inform RT planning techniques aiming to reduce the incidence of severe dysphagia. This study built upon previous acute dysphagia models [27], [28] by introducing novel spatial dose metrics and using machine learning approaches.

Material and methods

Patient data

Severe acute dysphagia models were generated and internally validated using a training dataset of 335 patients with DICOM RT data available, enrolled in one of six different clinical trials [29], [30], [31], [32], [33], with institutional review board approval and signed patient consent (Table 1). Patients for whom clinical data (age, sex, primary disease site, use of chemotherapy) were unavailable (13 patients) were excluded from the analyses. The cohort includes a diverse range of primary disease sites and RT delivery techniques, ensuring a large variation in the dose distributions across the cohort. This increases the generalizability of the models and reduces the chance of introducing biases, for example, due to the primary tumour location. An independent external validation dataset was provided by Washington University School of Medicine in Saint Louis (Table 1). This consisted of 90 patients with a range of head and neck primary tumour sites.

Table 1

Patient cohorts making up the dataset.

Trial	Patients available	Primary disease site	Radiotherapy technique	Radiotherapy dose-fractionation*	Concurrent chemotherapy
COSTAR (Phase III, multicentre; NCT01216800)	72	Parotid gland	Unilateral; 3D conformal RT, IMRT	65 Gy/30 # (definitive RT), 60 Gy/30 # (post-operative RT)	No
PARSPORT (Phase III, multicentre) [25]	67	Oropharynx, hypopharynx	Bilateral; 3D conformal RT, IMRT	65 Gy/30 # (definitive RT), 60 Gy/30 # (post-operative RT)	No
Dose Escalation (Phase II, single centre) [26]	26	Larynx, hypopharynx	Bilateral; IMRT	67.2 Gy/28 #, 63 Gy/28 #	Yes
Midline (Phase II, single centre) [27]	116	Oropharynx	Bilateral; IMRT	65 Gy/30 # (definitive RT), 60 Gy/30 # (post-operative RT)	Yes
Nasopharynx (Phase II, single centre) [28]	36	Nasopharynx	Bilateral; IMRT	65 Gy/30 # (definitive RT), 60 Gy/30 # (post-operative RT)	Yes
Unknown Primary (Phase II, single centre) [29]	18	Unknown primary	Bilateral; IMRT	65 Gy/30 # (definitive RT), 60 Gy/30 # (post-operative RT)	Yes
Washington University School of Medicine in Saint Louis (Independent external validation)	90	Oral cavity, nasal cavity, nasopharynx, oropharynx, hypopharynx, larynx, parotid gland, unknown primary	Bilateral, unilateral; IMRT	70 Gy/35 #, 66 Gy/33 #, 60 Gy/30 #	Both concurrent and no concurrent chemotherapy

The first six trials were used for model training and internal validation. The last trial was used for independent external validation. IMRT - intensity-modulated radiotherapy; # – fractions; RT – radiotherapy; Unilateral – treatment delivered to ipsilateral parotid bed only; Bilateral – treatment delivered to ipsilateral and contralateral mucosa of relevant subsite (e.g. nasopharynx, oropharynx or larynx). * All fractionation regimens used 5 fractions per week with 1 fraction per day from Monday to Friday. Where multiple fractionation schedules are listed for a single trial this means that multiple fractionation schedules were employed in those trials.

Toxicity data for the patients included in the training dataset were recorded prospectively, by experienced head and neck cancer specialists working according to standard trial protocols, prior to the start of RT, weekly during RT, weekly from 1–4 weeks following RT and at 8 weeks following RT using the Common Terminology Criteria for Adverse Events (CTCAE) version 3 [34] dysphagia instrument. The toxicity endpoint of interest chosen for analysis was the peak grade of dysphagia, dichotomized into severe (grade 3 or worse) and non-severe (less than grade 3) dysphagia. Patients with grade 1 or higher baseline toxicity (14 patients) or missing baseline toxicity (9 patients) were excluded from the analysis. Patients with missing toxicity measurements and peak grade less than 3 were excluded from the analysis as these patients may have experienced unreported grade 3 or worse dysphagia (126 patients). The rationale for this strategy for handling missing toxicity data is described in Appendix A. For the external validation cohort, severe acute dysphagia was defined as the patient requiring percutaneous endoscopic gastrostomy tube (PEG) insertion. It should be noted that there was a slight difference in the scoring systems due to the data available.
All institutions treating patients used in this study, including the training and external validation cohorts, employed a reactive and conservative approach to PEG insertion. After removing patients with missing data, 173 patients were available for training and 90 patients available for external validation. The incidences of severe acute dysphagia were 66% in the training dataset and 48% in the external validation dataset. The training dataset incidence is artificially inflated by the strategy for handling missing toxicity data. Induction chemotherapy, concurrent chemotherapy regimen (cisplatin, carboplatin, one cycle of cisplatin then one cycle of carboplatin or none), definitive versus post-operative RT, primary disease site (nasopharynx/nasal cavity, oropharynx/oral cavity, hypopharynx/larynx, parotid gland and unknown primary), sex and age were also included as covariates in the models. These clinical covariate data are given in Appendix B.
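The endpoint definition and exclusion rules described above can be illustrated with a minimal Python sketch. The function name `label_patient` and the use of `None` to represent a missing assessment are assumptions made for illustration; the original analysis pipeline is not reproduced here.

```python
from typing import Optional, Sequence


def label_patient(baseline: Optional[int],
                  grades: Sequence[Optional[int]]) -> Optional[int]:
    """Return 1 (severe), 0 (non-severe) or None (patient excluded).

    baseline: CTCAE dysphagia grade before RT, or None if missing.
    grades:   follow-up CTCAE grades; None marks a missing assessment.
    """
    # Missing baseline, or baseline toxicity of grade 1 or higher: excluded.
    if baseline is None or baseline >= 1:
        return None

    recorded = [g for g in grades if g is not None]
    peak = max(recorded, default=0)

    # A recorded grade 3 or worse is severe, even if other visits are missing.
    if peak >= 3:
        return 1

    # Peak below grade 3 with any missing assessment: an unreported grade 3
    # cannot be ruled out, so the patient is excluded.
    if len(recorded) < len(grades):
        return None

    return 0
```

Because patients with missing assessments are kept only when a grade 3 or worse event was recorded, this rule removes non-severe cases preferentially, which is why the training incidence is inflated.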

Calculations

Radiotherapy dose metrics

The pharyngeal mucosa (PM) was considered as the organ-at-risk for acute dysphagia. The PM was delineated, by clinical oncologists, from the roof of the nasopharynx to the level of the suprasternal notch (Appendix C). The physical dose distribution was converted to the fractional dose distribution (physical dose delivered in each fraction), which was described by the dose-volume histogram (DVH) in 20 cGy intervals from 20 (V20) to 260 (V260) cGy per fraction. The use of the fractional DVH is appropriate as nearly all patients who developed severe acute dysphagia developed it before the full course of RT had been delivered (data not shown) and follows recommendations for acute toxicity modelling by Tucker et al. [35]. Using the biologically effective dose in place of the fractional dose made very little difference to the results due to the fractionation regimens employed (data not shown). The dose distribution was also described spatially, using novel dose-length (DLH; L20 – L260) and dose-circumference histograms (DCH; C20 – C260) and 3D moment invariants describing the centre of mass (η001, η010, η100, η011, η101, η110, η111), spread (η002, η020, η200) and skewness (η003, η030, η300) of the dose distribution in the left-right, anterior-posterior and superior-inferior directions [25], [36], detailed in Appendix D.
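As an illustration of the fractional DVH described above, the following Python sketch converts total voxel doses to dose per fraction and evaluates V20 to V260 in 20 cGy steps. It rests on several assumptions that are not stated in the text: all voxels have equal volume, "receiving x cGy" means at least x cGy, and volumes are expressed as a percentage of the structure (the means in Table 3 suggest percentages). The function name `fractional_dvh` is hypothetical.

```python
from typing import Dict, Iterable, Sequence


def fractional_dvh(total_dose_gy: Sequence[float],
                   n_fractions: int,
                   levels_cgy: Iterable[int] = range(20, 261, 20)
                   ) -> Dict[str, float]:
    """Relative volume (%) of a structure receiving >= each dose level per fraction.

    total_dose_gy: total physical dose in Gy for each voxel of the structure
                   (equal voxel volumes are assumed).
    n_fractions:   number of fractions in which the total dose is delivered.
    """
    if n_fractions <= 0:
        raise ValueError("n_fractions must be positive")
    if len(total_dose_gy) == 0:
        raise ValueError("the structure contains no voxels")

    # Convert total dose in Gy to dose per fraction in cGy.
    per_fraction_cgy = [100.0 * d / n_fractions for d in total_dose_gy]
    n_voxels = len(per_fraction_cgy)

    # Cumulative histogram: percentage of voxels at or above each level.
    return {
        f"V{level:03d}": 100.0 * sum(d >= level for d in per_fraction_cgy) / n_voxels
        for level in levels_cgy
    }
```

For example, four voxels receiving 65, 60, 30 and 3 Gy in 30 fractions correspond to roughly 217, 200, 100 and 10 cGy per fraction, so V100 is 75% and V200 is 50% under these assumptions.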

Statistical modelling

Statistical analysis was performed using a machine learning pipeline specifically designed for NTCP modelling [36]. Three types of model were compared: penalised logistic regression (PLR), support vector classification (SVC) and random forest classification (RFC). For each, a version with dose-volume metrics (“standard”) and a version with the spatial dose metrics (“spatial”) were trained and validated. This is described in Appendix E.

Results

The DVH, DLH and DCH data are summarized in Fig. 1.

Fig. 1

Summary of the pharyngeal mucosa (a) DVH, (b) DLH and (c) DCH data grouped by severe or non-severe peak dysphagia. The lines represent the group medians and the error bars represent the 95 percentile confidence intervals.

A correlation matrix of the data is shown in Appendix F. Regarding the first aim, the predictive performances of the models are shown in Table 2.

Table 2

Predictive performance of models.

All metrics are given as internal validation mean (standard deviation)/external validation (standard deviation).

Model	Hyper-parameters	AUC	Log loss	Brier score	Calibration slope	Calibration intercept
PLR_standard	penalty = l2, C = 0.001	0.76 (0.08)/0.82 (0.04)	0.62 (0.04)/0.61 (0.02)	0.21 (0.02)/0.21 (0.01)	14.9 (13.5)/17.6 (3.9)	−6.8 (6.8)/−8.3 (1.9)
SVC_standard	kernel = radial basis function, C = 0.0001, gamma = 0.001	0.75 (0.08)/0.82 (0.04)	–	–	–	–
RFC_standard	max depth = 5, max features = square root	0.71 (0.08)/0.78 (0.05)	0.61 (0.09)/0.57 (0.04)	0.20 (0.03)/0.19 (0.02)	3.5 (1.6)/5.7 (1.3)	−1.5 (1.0)/−3.0 (0.8)
PLR_spatial	penalty = l2, C = 10.0	0.75 (0.08)/0.73 (0.05)	0.64 (0.04)/0.62 (0.02)	0.22 (0.02)/0.22 (0.01)	13.7 (11.1)/11.2 (3.6)	−6.2 (5.6)/−4.9 (1.6)
SVC_spatial	kernel = radial basis function, C = 0.0001, gamma = 0.001	0.74 (0.08)/0.73 (0.05)	–	–	–	–
RFC_spatial	max depth = 5, max features = square root	0.74 (0.07)/0.75 (0.05)	0.58 (0.07)/0.61 (0.02)	0.19 (0.03)/0.21 (0.01)	4.5 (2.4)/8.6 (2.3)	−2.2 (1.6)/−4.1 (1.1)

PLR – penalized logistic regression; SVC – support vector classification; RFC – random forest classification; l2 – ridge regularisation; C – inverse of regularisation strength; gamma – kernel coefficient for radial basis function.
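For readers less familiar with the metrics in Table 2, the following minimal Python sketch shows how AUC, Brier score and log loss are conventionally defined. It uses only the standard library and is not the authors' pipeline; the function names are illustrative, and the calibration slope and intercept are not covered.

```python
import math
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random severe case scores higher than a random
    non-severe case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def log_loss(y_true: Sequence[int], probs: Sequence[float],
             eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the outcomes under the predictions."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

As a point of reference when reading the table, a model that predicts a probability of 0.5 for every patient has an AUC of 0.5, a Brier score of 0.25 and a log loss of ln 2 (about 0.69).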

The discrimination of the PLRstandard model was not outperformed by any of the more complex models, on internal (AUC = 0.76, s.d. = 0.08) or external validation (AUC = 0.82, s.d. = 0.04). The log loss and Brier score were similar between all PLR and RFC models on internal and external validation. SVC models do not provide probability estimates; hence, only discrimination could be assessed. Platt scaling was employed to convert the SVC model outputs to probability estimates [37]. However, this led to substantial reductions in AUC related to the algorithm used (data not shown) so the non-scaled SVC models were preferred. The RFC models had better calibration (calibration slope closer to 1 and intercept closer to 0) than the PLR models on internal and external validation. The discriminative ability of the PLRstandard model was good on internal validation and very good on external validation. The calibration curve, of the predicted probabilities of severe dysphagia against the actual toxicity outcomes, for this model applied to the external validation data is displayed in Fig. 2a.

Fig. 2

(a) Calibration of the probabilities of severe dysphagia, as predicted by the PLRstandard model (x-axis), against the observed fraction of severe dysphagia in the external validation dataset (y-axis). The curve shows a logistic regression model of the predicted probabilities (independent variable) against the observed fraction of patients with severe dysphagia (dependent variable). The inset figure shows the histogram of the predicted probabilities and the observed toxicity outcomes (1 = severe dysphagia; 0 = no severe acute dysphagia). (b) Median dose-volume histograms (error bars show 95% confidence intervals) for external validation patients grouped by probability estimate quintiles using the recalibrated PLRstandard model.

The model calibration assessed on the external validation dataset was modest. However, the limitations of model calibration assessment, particularly on a small dataset, should be considered [38]. Fig. 2b indicates how the predicted probability of severe dysphagia in the external validation is related to the DVH. The regression coefficients, and covariate means and standard deviations required to standardize the covariates, necessary to use the model are provided in Table 3.

Table 3

Regression coefficients and covariate transformation values for the PLRstandard model required to use the model for clinical decision-support.

Covariate	Regression coefficient	Mean	Standard deviation
intercept	0.002	–	–
definitiveRT	−0.003	0.86	0.35
male	0.015	0.66	0.47
age	−0.007	57.9	12.0
indChemo	0.023	0.54	0.50
noConChemo	−0.029	0.47	0.50
cisplatin	0.024	0.38	0.49
carboplatin	0.009	0.08	0.27
cisCarbo	0.002	0.06	0.24
hypopharynx/larynx	0.014	0.14	0.35
oropharynx/oral cavity	0.015	0.50	0.50
nasopharynx/nasal cavity	−0.003	0.10	0.31
unknown primary	0.001	0.06	0.23
parotid	−0.029	0.20	0.40
V020	0.019	95.5	9.4
V040	0.020	93.5	10.8
V060	0.021	92.2	11.9
V080	0.024	90.3	13.7
V100	0.026	87.7	16.3
V120	0.028	83.8	19.3
V140	0.027	77.5	20.2
V160	0.024	66.4	18.7
V180	0.024	57.0	17.2
V200	0.023	47.0	20.8
V220	0.025	20.0	16.2
V240	0.013	2.3	8.4
V260	0.011	0.0	0.0

definitiveRT – definitive radiotherapy (versus post-operative radiotherapy); indChemo – induction chemotherapy; noConChemo – no concurrent chemotherapy; cisCarbo – one cycle of cisplatin followed by one cycle of carboplatin; Vx – volume of organ receiving x cGy of radiation per fraction.
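The way the Table 3 values are meant to be used can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes the usual logistic form, NTCP = 1/(1 + exp(−f)), with f equal to the intercept plus the sum of each coefficient times the standardized covariate. Only three rows of Table 3 are copied here, so the excerpt must be completed with every remaining row before any real use. Skipping covariates with zero standard deviation (V260) is an assumption, as is the recalibration step, which takes the predicted probability as the independent variable, in line with the Fig. 2 caption, together with the external validation intercept and slope from Table 2. All names are hypothetical.

```python
import math
from typing import Dict, Tuple

INTERCEPT = 0.002

# Excerpt of Table 3: covariate -> (regression coefficient, mean, SD).
# Incomplete on purpose; add the remaining rows before any real use.
TABLE3_EXCERPT: Dict[str, Tuple[float, float, float]] = {
    "age": (-0.007, 57.9, 12.0),
    "V100": (0.026, 87.7, 16.3),
    "V120": (0.028, 83.8, 19.3),
}


def linear_predictor(patient: Dict[str, float],
                     table: Dict[str, Tuple[float, float, float]],
                     intercept: float = INTERCEPT) -> float:
    """Intercept plus the sum of coefficient * standardized covariate."""
    f = intercept
    for name, (coef, mean, sd) in table.items():
        # Assumption: a covariate with zero training SD cannot be scaled
        # and carries no information, so it is skipped.
        if sd == 0:
            continue
        f += coef * (patient[name] - mean) / sd
    return f


def ntcp(f: float) -> float:
    """Logistic transform of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-f))


def recalibrated_ntcp(p: float,
                      intercept_ext: float = -8.3,
                      slope_ext: float = 17.6) -> float:
    """Assumed recalibration: logistic model on the predicted probability p,
    using the PLR_standard external validation intercept and slope (Table 2)."""
    return ntcp(intercept_ext + slope_ext * p)
```

For a hypothetical patient whose covariates all equal the training means, every standardized value is zero, so f equals the intercept (0.002) and the uncalibrated NTCP is approximately 0.50.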

The model is given by: $\mathrm{NTCP} = \dfrac{1}{1 + e^{-f}}$, where $f = \beta_0 + \sum_i \beta_i x_i$, where $\beta_0$ is the intercept, $\beta_i$ is the regression coefficient for covariate i and $x_i$ is the, centred and scaled, value of covariate i. To use the recalibrated version of the model f is instead given by $f = \beta_0^{\mathrm{ext}} + \beta_1^{\mathrm{ext}} \left[1 + e^{-(\beta_0 + \sum_i \beta_i x_i)}\right]^{-1}$, where $\beta_0^{\mathrm{ext}}$ and $\beta_1^{\mathrm{ext}}$ are the external validation intercept and slope (Table 2).

Concerning the second aim, the feature importance values for the RFC models are displayed in Fig. 3.

Fig. 3

Bootstrapped feature importance values for the covariates included in the (a) RFCstandard and (b) RFCspatial models. The whiskers indicate the 95 percentile confidence intervals (data non-normally distributed). Note that the y-axis scales are different in (a) and (b).

The feature importance values indicate increasing importance of the DVH, DLH and DCH metrics, in terms of predicting severe dysphagia in the models, with increasing dose level up to a fractional dose of 180 cGy, for RFCstandard, or 220 cGy, for RFCspatial. There is a decrease in importance at higher doses in this, data-driven, analysis. In the RFCstandard and RFCspatial models, the V140 and C220 were the covariates most strongly associated with severe dysphagia, respectively. The 3D moment invariant with the highest feature importance was η002, describing the spread of the dose in the superior-inferior direction. For completeness, the RFC feature importance values were calculated for a model including both dose-volume and spatial dose metrics (Appendix G). In both RFC models, the clinical covariates with the highest feature importance were parotid gland primary disease site, no concurrent chemotherapy and age. Parotid gland primary disease site correlated strongly with the dose metrics (Appendix F) as patients with parotid gland primaries received unilateral irradiation and, hence, had a smaller volume of PM irradiated. No concurrent chemotherapy was correlated with parotid gland primary disease site and the dose metrics (Appendix F) as the parotid gland cancer patients, treated in the COSTAR trial, did not receive concurrent chemotherapy. These correlations should be considered when interpreting the results. When interpreting the apparent importance of age it is important to consider that it may have been artificially inflated due to the larger number of possible values than the other clinical covariates [39]. The RFC model feature importance results agreed with the PLRstandard model regression coefficients (Table 3).

Discussion

We met our first aim of determining whether the addition of novel spatial dose metrics could improve the predictive performance of NTCP models of severe acute dysphagia. We suggest that the PLRstandard model should be preferred over the other models, for prediction, on the grounds of at least as good discrimination as the other models, similar log loss and Brier score and greater simplicity. The good discriminative ability of this model, on internal and external validation, makes it a suitable aid for supporting clinical decision-making. The “spatial” models trained in this study did not have better discriminative ability than the “standard” models so we do not recommend their use. This may have been due to the DLH and DCH metrics being highly correlated with the DVH metrics (Appendix F). Hence, the spatial variations in the dose distributions across the cohort were captured by the DVHs. It is important to note that we cannot rule out the possibility that using different spatial dose metrics, combinations of features, models or datasets would improve model performance compared with dose-volume based acute dysphagia models. Potential uses of the model are discussed in Appendix H. We also achieved our second aim of establishing associations between the RT dose distribution and acute dysphagia. The decrease in feature importance for the highest dose levels was due to a lack of variation in these metrics between patients, as they are either 0 or close to 0 for all patients, rather than indicating reduced biological effects at these dose levels. Our results do not support the existence of regional variations in radiosensitivity of the PM for severe acute dysphagia. The fact that η002 was the 3D moment invariant with the highest feature importance suggests that the length, which is correlated with the volume, of the PM irradiated is more important for toxicity than the irradiation of any sub-region of the structure. 
Other studies suggested that different pharyngeal muscles were more radiosensitive [19], [21], [22], [23]. However, this is likely related to the primary disease sites of the patients used in those studies [24]. The inclusion of multiple spatial dose metrics, sensitive to different spatial aspects of the dose distribution, and a cohort with a wide variety of dose distributions allowed us to explore regional variations in radiosensitivity more thoroughly than has previously been performed. However, we cannot exclude the possibility that different spatial dose metrics [19], combinations of features, models or datasets could support the existence of spatially dependent radiosensitivity for severe acute dysphagia. The feature importance measures (Fig. 3) indicate that the volumes of PM receiving intermediate and high doses are most strongly associated with severe acute dysphagia. This is in agreement with another study using the same data, but a different approach to statistical modelling [28]. RFC feature importance does not provide information on whether the correlations between features and outcome are positive or negative. However, the regression coefficients for the PLRstandard model (Table 3) indicate that the higher the value of the dose metrics the greater the probability of severe dysphagia. There is a relatively large increase in feature importance between V80 and V100 (Fig. 3a). A pragmatic recommendation for RT planning techniques aimed at reducing the incidence of severe acute dysphagia, based on these findings, would be to reduce the volume of the entire PM receiving greater than 1 Gy/fraction as much as possible without compromising other aspects of the treatment plan.
A previous model of severe acute dysphagia, without the novel spatial dose metrics, but with a different statistical modelling approach, functional data analysis, had similar discriminative ability to the models trained in this study, but superior performance in terms of the probability calibration [28]. Hence, we recommend that the model presented in [28] be preferred over the models presented here for clinical decision-support. The Groningen group have produced and validated models of dysphagia measured six months following RT [10], [11], [12], [13], [40], [41]. Models of severe dysphagia at earlier time points focused on establishing associations between covariates and outcome and, hence, either did not optimize or measure discrimination [15], [18], [20], included much smaller numbers of patients [19], [21] or had lower discriminative ability than the PLRstandard model [16], [17]. In addition, with the exception of one study [42], no external validation has been performed. We did not have access to data pertaining to all the covariates, for example genetic polymorphisms, in those published models and, so, were unable to validate them. Moreover, our study featured a more thorough exploration of RT dose-response associations for severe acute dysphagia, including multiple dose levels and different types of spatial dose metric, than previous studies. This resulted in novel insights that could inform RT planning. Our study possesses several limitations. Firstly, the scoring systems used to assess dysphagia severity differed between the training data and external validation data. The threshold for “severe” dysphagia in the external validation data is higher than in the training data. However, the models generated using the training data generalized well to the external validation data.
Whilst the limitations of the CTCAE dysphagia scoring system, which was almost exclusively used when the trials incorporated in this study were conducted, have been demonstrated [43], it has been shown to correlate well with multiple patient-reported quality of life measures [44]. As CTCAE grade 3 and PEG-dependence indicate clinical interventions, these are relevant endpoints. The slight difference in the dysphagia scoring systems between the training and external validation cohorts may have reduced the performances of the models on external validation. However, the models performed at least as well on external validation as internal validation. Moreover, it is believed that severe acute dysphagia is a highly complex, multifactorial toxicity with a range of different factors having been implicated. These include tobacco and alcohol use, a patient’s pain tolerance and genetic predispositions to severe (chemo)radiation-induced toxicity. Tobacco and alcohol use were not collected in the PARSPORT or COSTAR trials. Therefore, these factors could not be included in the analysis. It is also likely that chemotherapy is insufficiently characterized, using binary variables, in our analysis. Finally, as in most radiotherapy outcomes modelling studies, the training and validation cohorts are smaller than recommended for clinical decision-support tools [45], [46]. We suggest that investigators should strive to collect larger datasets for future development and validation of radiotherapy clinical decision-support tools.

Conclusions

In conclusion, we have trained and externally validated an NTCP model of severe acute dysphagia with very good discriminative ability (external validation AUC = 0.82). We suggest that this model may be suitable for clinical decision-support. Additionally, we established that the volumes of the PM receiving intermediate and high doses, greater than 1 Gy/fraction, are most strongly associated with severe acute dysphagia. These should be minimized in RT planning, where possible, to reduce the incidence of severe acute dysphagia. Our data did not support a regional variation in radiosensitivity for the PM.

Conflicts of interest

None.

Table B1

Clinical covariate data in the training and external validation data sets.

Covariate	n_training (%)	n_validation (%)
Definitive RT	148 (86)	44 (49)
Male	114 (66)	68 (76)
Induction chemotherapy	94 (54)	21 (23)
No concurrent chemotherapy	82 (47)	46 (51)
Cisplatin	66 (38)	28 (31)
Carboplatin	14 (8)	0 (0)
Cisplatin/Carboplatin	11 (6)	0 (0)
Hypopharynx/Larynx	24 (14)	25 (28)
Oropharynx/Oral cavity	87 (50)	41 (46)
Nasopharynx/Nasal cavity	18 (10)	15 (17)
Unknown primary	10 (6)	3 (3)
Parotid gland	34 (20)	6 (7)
Covariate	median_training (range)	median_validation (range)
Age	59 (23–88)	58 (21–87)

Concurrent chemotherapy was administered in two cycles, on days 1 and 29 of RT, in the training data cohort and in three cycles on days 1, 22 and 43 of RT for platinum chemotherapy or weekly during RT with the first dose 1 week before day 1 of RT for cetuximab in the external validation cohort.
