Machine Learning-Guided Adjuvant Treatment of Head and Neck Cancer.

Frederick Matthew Howard¹, Sara Kochanny¹, Matthew Koshy^2,3, Michael Spiotto^2,3,4, Alexander T Pearson¹.

Abstract

Importance: Postoperative chemoradiation is the standard of care for cancers with positive margins or extracapsular extension, but the benefit of chemotherapy is unclear for patients with other intermediate risk features. Objective: To evaluate whether machine learning models could identify patients with intermediate-risk head and neck squamous cell carcinoma who would benefit from chemoradiation. Design, Setting, and Participants: This cohort study included patients diagnosed with squamous cell carcinoma of the oral cavity, oropharynx, hypopharynx, or larynx from January 1, 2004, through December 31, 2016. Patients had resected disease and underwent adjuvant radiotherapy. Analysis was performed from October 1, 2019, through September 1, 2020. Patients were selected from the National Cancer Database, a hospital-based registry that captures data from more than 70% of newly diagnosed cancers in the United States. Three machine learning survival models were trained using 80% of the cohort, with the remaining 20% used to assess model performance. Exposures: Receipt of adjuvant chemoradiation or radiation alone. Main Outcomes and Measures: Patients who received treatment recommended by machine learning models were compared with those who did not. Overall survival for treatment according to model recommendations was the primary outcome. Secondary outcomes included frequency of recommendation for chemotherapy and chemotherapy benefit in patients recommended for chemoradiation vs radiation alone.
Results: A total of 33 527 patients (24 189 [72%] men; 28 036 [84%] aged ≤70 years) met the inclusion criteria. Median follow-up in the validation data set was 43.2 (interquartile range, 19.8-65.5) months. DeepSurv, neural multitask logistic regression, and survival forest models recommended chemoradiation for 17 589 (52%), 15 917 (47%), and 14 912 patients (44%), respectively. Treatment according to model recommendations was associated with a survival benefit, with a hazard ratio of 0.79 (95% CI, 0.72-0.85; P < .001) for DeepSurv, 0.83 (95% CI, 0.77-0.90; P < .001) for neural multitask logistic regression, and 0.90 (95% CI, 0.83-0.98; P = .01) for random survival forest models. No survival benefit for chemotherapy was seen for patients recommended to receive radiotherapy alone. Conclusions and Relevance: These findings suggest that machine learning models may identify patients with intermediate risk who could benefit from chemoradiation. These models predicted that approximately half of such patients have no added benefit from chemotherapy.

Year: 2020 PMID: 33211108 PMCID: PMC7677764 DOI: 10.1001/jamanetworkopen.2020.25881

Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805

Introduction

The adjuvant treatment of resected head and neck squamous cell carcinoma (HNSCC) is guided by the European Organisation for Research and Treatment of Cancer (EORTC) 22931 and Radiation Therapy Oncology Group (RTOG) 95-01 randomized clinical trials.[1,2] Both trials randomized patients with at least 1 adverse prognostic factor after definitive surgery to adjuvant radiotherapy (RT) or cisplatin-based chemoradiotherapy (CRT), and both demonstrated a progression-free survival (PFS) benefit with CRT. The specific adverse risk features qualifying patients for each trial were variable, but an exploratory combined analysis[3] suggested that extracapsular extension (ECE) and positive surgical margins were the most significant prognostic factors, and patients with 1 of these 2 features derived benefit from CRT. Consequently, current clinical practice guidelines recommend CRT for patients with positive margins or ECE and list both RT and CRT as treatment options for patients with the other intermediate-risk features studied in the EORTC and RTOG trials.[4] Real-world practice reflects the equipoise in treatment of these patients with intermediate risk: a retrospective analysis of the National Cancer Database (NCDB) found that approximately half of patients with resected stage III to stage IV disease without positive margins or ECE received CRT, and CRT was associated with better overall survival (OS).[5] In a subgroup analysis, an increasing magnitude of benefit of chemotherapy was seen with increasing number of involved lymph nodes. Several other studies using NCDB data have further clarified the benefit of adjuvant therapy in patients with intermediate risk. 
In oral tongue cancers without positive margins or ECE, patients with involvement of 2 or more lymph nodes and/or a pathologic tumor stage 3 to 4 demonstrated a survival benefit with adjuvant CRT.[6] An examination of patients aged 70 years or younger found a survival benefit for adjuvant CRT for patients with stage III or IV disease without positive margins or ECE, although the benefit was not statistically significant after propensity matching.[7] Analysis of real-world practice suggests that younger patients with multiple risk factors are more likely to receive CRT,[5] but the exact patient and disease characteristics that determine the benefit from treatment intensification remain uncertain. In traditional survival analysis, the group of patients who benefit most from a therapy can be identified through the use of interaction terms or by splitting data into multiple subgroups. For example, the Meta-analysis of Chemotherapy in Head and Neck Cancer[8] examined the benefit of adding chemotherapy to locoregional therapy in various age groups, finding no significant benefit in the group aged 71 years or older. However, there can be significant heterogeneity within subgroups; a patient aged 71 years with T4N3 disease will certainly have more benefit from chemotherapy than a patient aged 91 years with T2N0 disease. Thus, novel approaches for survival analysis are needed. Machine learning is a rapidly evolving field of data analysis that has the ability to account for the interaction between numerous features without explicitly specifying interaction terms or analyzing multiple subgroups.[9] As a form of artificial intelligence, machine learning refers to a broad range of algorithms that can iteratively improve their performance, making predictions that can mimic human decisions. Deep learning is a subset of machine learning wherein increasingly complex features are identified within each level of a multilayered model.
Both traditional machine learning and deep learning have been increasingly used to solve complex problems in medicine using large clinical data sets.[10] Several machine learning models have been specifically developed to analyze right censored survival data. An early approach is the random survival forest (RSF), which uses the ensemble prediction of multiple decision trees to estimate a hazard function.[11] A recently developed Cox proportional hazards deep learning model, DeepSurv, was shown to improve on personalized treatment recommendations for the RSF model.[12] The multitask logistic regression is a proportional hazards model that allows for hazards to vary with time, and a deep learning extension of the model was developed that outperforms standard linear survival models.[13] As compared with traditional survival analysis, these 3 models allow for the prediction of a unique hazard ratio (HR) for a specific treatment for each patient based on individual disease characteristics. We planned a study to assess the ability of these machine learning models to predict survival in the adjuvant treatment of patients with intermediate risk factors and to identify which patients benefit from CRT.

Methods

Study Design and Data Source

This was a retrospective cohort study examining the ability of deep learning models to predict outcomes in patients with resected HNSCC without positive margins or ECE, undergoing adjuvant RT or CRT. We used abstracted patient data from the NCDB, the largest clinical cancer registry in the world, which includes approximately 70% of all new invasive cancer diagnoses in the United States.[14] This study was determined to be exempt from review by the University of Illinois at Chicago institutional review board. No informed consent was obtained from patients because our team did not obtain any patient-level data; we used the anonymized NCDB database curated by the American College of Surgeons and American Cancer Society. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Study Population and Covariates

We included patients diagnosed from January 1, 2004, to December 31, 2016, with squamous cell carcinoma of the oral cavity, oropharynx, hypopharynx, or larynx treated with definitive surgery and adjuvant RT or CRT. We excluded patients with metastatic disease at baseline or those who received immunotherapy. Cohort selection is illustrated in Figure 1.

Figure 1.

Cohort Selection Criteria

CRT indicates chemoradiotherapy; ECE, extracapsular extension; NCDB, National Cancer Database; and RT, radiotherapy.

Feature selection was informed by identifying parameters predictive for survival in a multivariable Cox proportional hazards model. Demographic variables included life expectancy (calculated using the Social Security actuarial life table from 2016[15]), race, sex, treatment at an academic center, year of treatment, and Charlson/Deyo comorbidity index. Disease-specific factors included T stage; N stage; number of lymph nodes involved; involvement of cervical, retropharyngeal, or parapharyngeal lymph nodes; depth of invasion for oral cavity cancers; tumor size; lymphovascular invasion (LVI); grade; HPV status; and primary site/subsite of tumor. Treatment-specific factors included receipt of chemotherapy, use of multiagent chemotherapy, number of lymph nodes dissected, radiation dose, and time from start of adjuvant therapy to completion. Missing categorical values were imputed with the most common category; missing numerical values were imputed with the mean. HPV status was imputed using an extra trees regressor, which predicted the HPV status of known cases with an area under the curve of 0.842.[16]
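The simple imputation rule described above (most common category for categorical fields, mean for numerical fields) can be illustrated with a minimal sketch. This is not the authors' code: the function name `impute_column` and the toy values are assumptions made for illustration, and the extra trees imputation of HPV status is not reproduced here.

```python
from collections import Counter
from statistics import mean


def impute_column(values, kind):
    """Fill None entries with the mode (categorical) or the mean (numerical)."""
    observed = [v for v in values if v is not None]
    if kind == "categorical":
        # Most common observed category
        fill = Counter(observed).most_common(1)[0][0]
    else:
        # Mean of the observed numerical values
        fill = mean(observed)
    return [fill if v is None else v for v in values]


# Toy columns (hypothetical values, not NCDB data)
grade = ["moderate", None, "poor", "moderate"]
tumor_size = [2.0, 4.0, None, 3.0]

grade_filled = impute_column(grade, "categorical")
size_filled = impute_column(tumor_size, "numerical")
```

Mode and mean imputation keep every case in the cohort but shrink the variance of the imputed feature, which is one reason a model-based approach may be preferred for a heavily missing variable such as HPV status.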

Model Development

DeepSurv,[12] RSF,[11] and neural network multitask logistic regression (N-MTLR)[13] models were constructed in Python version 3.7 (Python Software Foundation) using the PySurvival package.[17] We used an 80:20 split of data for training and validation: 20% of patients were chosen at random to be held out for model validation, whereas all training was done on the remaining 80% of data.[18] Intrinsic to each model are multiple hyperparameters that may affect predictive accuracy, such as the rate at which the model learns from the data set. To determine the optimal hyperparameters for each model, a random hyperparameter search was performed with 5-fold cross-validation within the training data set, resulting in model parameters (eTable 1 and eTable 2 in the Supplement).
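The hold-out split and the cross-validated random search described above can be sketched in plain Python. This is a hedged illustration rather than the authors' pipeline: the helper names, the search space in `sample_hyperparameters`, and the `fit_and_score` callback are all hypothetical, and rounding the hold-out size up is an assumption.

```python
import math
import random


def holdout_split(n_patients, validation_fraction=0.2, seed=0):
    """Randomly hold out a fraction of patient indices for validation."""
    rng = random.Random(seed)
    indices = list(range(n_patients))
    rng.shuffle(indices)
    n_val = math.ceil(n_patients * validation_fraction)  # assumption: round up
    return indices[n_val:], indices[:n_val]  # (training, validation)


def k_folds(indices, k=5):
    """Yield (fit, check) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, folds[i]


def sample_hyperparameters(rng):
    # Hypothetical search space, for illustration only
    return {
        "learning_rate": 10 ** rng.uniform(-4, -2),
        "hidden_units": rng.choice([32, 64, 128]),
    }


def random_search(train_indices, fit_and_score, n_trials=10, seed=0):
    """Return (best mean score, best hyperparameters) over random trials."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = sample_hyperparameters(rng)
        scores = [fit_and_score(params, fit, check)
                  for fit, check in k_folds(train_indices)]
        avg = sum(scores) / len(scores)
        if best is None or avg > best[0]:
            best = (avg, params)
    return best


train_idx, val_idx = holdout_split(33527)
```

In practice `fit_and_score` would train a survival model on the `fit` indices and return its concordance index on the `check` indices; the validation indices are never touched during the search.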

Statistical Analysis

Statistical analysis was performed in Python version 3.7.5 (Python Software Foundation), and code used for model development and evaluation as well as the trained models generated for this analysis are available online.[19] The primary outcome was OS benefit associated with treatment according to model recommendations. We considered CRT recommended by a model if predicted survival was longer with CRT than RT. We calculated the HR, median OS, and significance via log-rank test for receipt of treatment in line with model recommendations. The HR was also calculated with inverse probability of treatment weighting (IPTW).[20] We expected our data set to be imbalanced between those receiving CRT and those receiving RT, with those with higher risk of recurrence likely receiving more intense therapy. IPTW adjusts for this imbalance by weighting cases according to the probability that they received a treatment. We considered multiple secondary outcomes to assess the performance characteristics of these models. Model accuracy was assessed using the concordance index (C index),[21] with confidence intervals assessed using bootstrapping with 1000 iterations[22] and differences between model C indices assessed with analysis of variance. As a comparator for our deep learning models, we used the intermediate risk factors from the EORTC 22931 (T3-4 except T3N0 larynx, N2-3, LVI, deep nodes with oral/oropharynx cancer) or RTOG 95-01 (2 or more involved nodes) trials as a decision rule. In other words, patients who met the inclusion criteria for these trials would be recommended for chemotherapy, and we calculated the association with OS for such a treatment pattern. Given ongoing attempts to deintensify treatment of HPV-positive cancers as well as the questionable utility of CRT in older patients, the benefit of treatment was assessed in an exploratory analysis of subgroups of patients with HPV-positive cancer and older patients.
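The recommendation rule stated above (CRT is recommended when predicted survival is longer with CRT than with RT) amounts to querying a trained model twice per patient, once with the chemotherapy indicator set and once with it cleared. The sketch below illustrates that logic under stated assumptions: `predict_survival` stands in for any trained survival model, and `toy_model`, the `chemo` key, and the `positive_nodes` key are hypothetical names, not part of the published code.

```python
def recommend_crt(predict_survival, patient):
    """Recommend CRT if predicted survival is longer with chemotherapy."""
    with_chemo = predict_survival({**patient, "chemo": 1})
    without_chemo = predict_survival({**patient, "chemo": 0})
    return with_chemo > without_chemo


def received_recommended(predict_survival, patient):
    """True if the treatment actually received matches the recommendation."""
    return recommend_crt(predict_survival, patient) == bool(patient["chemo"])


def toy_model(p):
    # Hypothetical stand-in for a trained model: chemotherapy helps only
    # when 2 or more lymph nodes are involved.
    baseline = 100 - 5 * p["positive_nodes"]
    chemo_effect = 10 if p["positive_nodes"] >= 2 else -2
    return baseline + chemo_effect * p["chemo"]


patient_a = {"positive_nodes": 3, "chemo": 0}  # received RT alone
patient_b = {"positive_nodes": 0, "chemo": 0}  # received RT alone

a_recommended_crt = recommend_crt(toy_model, patient_a)
a_concordant = received_recommended(toy_model, patient_a)
b_concordant = received_recommended(toy_model, patient_b)
```

Patients are then grouped by whether the treatment received matched the recommendation, and survival is compared between the two groups.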
We also assessed model performance in the subgroups of patients recommended to receive CRT vs RT alone. All comparisons were performed at the α = .05 significance level, and all statistical tests were 2-sided when applicable. Because secondary end points were considered exploratory, further adjustment of significance levels for multiple comparisons was not performed. To assess the association of individual features with model accuracy, we calculated the importance of each feature by permuting the data for the feature within the test set. Model accuracy, as indicated by the concordance index, was then recalculated with the permuted data to determine feature importance.[23] All data analysis was performed from October 1, 2019, through September 1, 2020.
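Permutation feature importance, as described above, compares the concordance index before and after shuffling a single feature in the test set. The following is a minimal sketch assuming Harrell's pairwise definition of the C index and a generic `risk_fn` that maps a patient record to a risk score; the function names and toy data are illustrative assumptions, and the importance is returned as an absolute rather than percentage decrease.

```python
import random


def concordance_index(times, events, risks):
    """Harrell's C index: fraction of comparable pairs ranked correctly."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def permutation_importance(risk_fn, rows, times, events, feature, seed=0):
    """Decrease in C index after shuffling one feature across patients."""
    rng = random.Random(seed)
    baseline = concordance_index(times, events, [risk_fn(r) for r in rows])
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    permuted = concordance_index(times, events,
                                 [risk_fn(r) for r in permuted_rows])
    return baseline - permuted


# Toy test set (hypothetical): risk depends on T stage only
rows = [{"t_stage": 4, "noise": 1}, {"t_stage": 3, "noise": 7},
        {"t_stage": 2, "noise": 3}, {"t_stage": 1, "noise": 5}]
times = [10, 20, 30, 40]
events = [1, 1, 1, 0]
risk_fn = lambda r: r["t_stage"]

baseline_c = concordance_index(times, events, [risk_fn(r) for r in rows])
noise_importance = permutation_importance(risk_fn, rows, times, events, "noise")
stage_importance = permutation_importance(risk_fn, rows, times, events, "t_stage")
```

A feature the model ignores has an importance of zero, whereas shuffling an informative feature can only leave the C index unchanged or lower it in this toy setting.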

Results

A total of 33 527 patients (24 189 [72%] men; 28 036 [84%] aged ≤70 years) met our inclusion criteria, of whom 14 775 (44%) received CRT. Nearly one-tenth of patients (2945 [9%]) received multiagent chemotherapy. Most patients received chemotherapy and radiation concurrently; chemotherapy began more than 30 days prior to radiation in 894 patients (3%). Demographic characteristics of the patients, stratified by receipt of RT or CRT, are summarized in Table 1. Most cancers were of the oral cavity (15 814 [47%]) or oropharynx (11 162 [33%]); nearly one-half had T3 to T4 disease (14 734 [44%]), and nearly three-quarters of patients had lymph node involvement (24 384 [73%]). Distribution of imputed HPV status closely approximated the distribution for cases with known HPV status. Median OS was 97.4 months (95% CI, 94.1-100.7 months) in the RT-only group and 111.2 months (95% CI, 106.8-114.5 months) in the CRT group. Receipt of chemotherapy was significantly associated with male sex, earlier year of diagnosis, lower Charlson/Deyo score, higher T and N stages, greater degree of nodal involvement, and LVI (Table 1). Features with missing data in more than 15% of cases included tumor thickness, measured tumor size, LVI, and HPV status (eTable 3 in the Supplement).

Table 1.

Patient Demographic, Disease, and Treatment Characteristics

No. (%)	Total population	Received CRT	Received RT	P value^a
Sex
Men	24 189 (72.1)	11 165 (75.6)	13 024 (69.5)	<.001
Women	9338 (27.9)	3610 (24.4)	5728 (30.5)	<.001
Age, y
>70	5491 (16.4)	1517 (10.3)	3974 (21.2)	<.001
≤70	28 036 (83.6)	13 258 (89.7)	14 778 (78.8)	<.001
Year of diagnosis
2004-2008	9697 (28.9)	4461 (30.2)	5236 (27.9)	<.001
2009-2012	10 750 (32.1)	5126 (34.7)	5624 (30.0)
2013-2016	13 080 (39.0)	5188 (35.1)	7892 (42.1)
Race
White	29 261 (87.3)	12 940 (87.6)	16 321 (87.0)	.10
Black	2801 (8.4)	1229 (8.3)	1572 (8.4)
Other or unknown	1465 (4.4)	606 (4.1)	859 (4.6)
Academic center	18 842 (56.2)	7857 (53.2)	10 985 (58.6)	<.001
Charlson/Deyo Score
0-1	31 752 (94.7)	14 105 (95.5)	17 647 (94.1)	<.001
2-3	1775 (5.3)	670 (4.5)	1105 (5.9)	<.001
Status at last contact
Alive	21 615 (64.5)	9423 (63.8)	12 192 (65.0)	.06
Dead	11 912 (35.5)	5352 (36.2)	6560 (35.0)	.06
Primary site
Oral cavity	15 814 (47.2)	5967 (40.4)	9847 (52.5)	<.001
Buccal mucosa	1082 (3.2)	369 (2.5)	713 (3.8)
Alveolar ridge	2296 (6.8)	714 (4.8)	1582 (8.4)
Retromolar trigone	1118 (3.3)	439 (3.0)	679 (3.6)
Tongue	7207 (21.5)	2822 (19.1)	4385 (23.4)
Oropharynx	11 162 (33.3)	6086 (41.2)	5076 (27.1)
Tonsils	7347 (21.9)	4209 (28.5)	3138 (16.7)
Base of tongue	2997 (8.9)	1488 (10.1)	1509 (8.0)
Hypopharynx	920 (2.7)	500 (3.4)	420 (2.2)
Larynx	5631 (16.8)	2222 (15.0)	3409 (18.2)
Tumor stage
T1	8544 (25.5)	3859 (26.1)	4685 (25.0)	.02
T2	10 249 (30.6)	4408 (29.8)	5841 (31.1)
T3	4711 (14.1)	2104 (14.2)	2607 (13.9)
T4	10 023 (29.9)	4404 (29.8)	5619 (30.0)
Nodal stage
N0	11 038 (32.9)	2596 (17.6)	8442 (45.0)	<.001
N1	6674 (19.9)	2578 (17.4)	4096 (21.8)
N2	15 290 (45.6)	9201 (62.3)	6089 (32.5)
N3	525 (1.6)	400 (2.7)	125 (0.7)
Measured tumor size, median (IQR), mm	3.3 (2.0-3.7)	3.3 (2.0-4.0)	3.2 (2.0-3.5)	<.001
Tumor thickness, median (IQR), mm^b	62.7 (10.0-90.0)	6.46 (11.0-90.0)	6.18 (10.0-90.0)	.01
Differentiation
Well differentiated	3309 (9.9)	1086 (7.4)	2223 (11.9)	<.001
Moderately differentiated	17 645 (52.6)	7461 (50.5)	10 184 (54.3)
Poorly differentiated or anaplastic	10 055 (30.0)	4929 (33.4)	5126 (27.3)
LVI	4654 (13.9)	2449 (16.6)	2205 (11.8)	<.001
Lymph nodes positive
0	9143 (27.3)	2074 (14.0)	7069 (37.7)	<.001
1	7916 (23.6)	3351 (22.7)	4565 (24.3)
2-4	13 184 (39.3)	6870 (46.5)	6314 (33.7)
5-9	2499 (7.5)	1849 (12.5)	650 (3.5)
≥10	785 (2.3)	631 (4.3)	154 (0.8)
Lymph node levels involved
I	6404 (19.1)	3666 (24.8)	2738 (14.6)	<.001
II	14 212 (42.4)	8240 (55.8)	5972 (31.8)	<.001
III	7936 (23.7)	4902 (33.2)	3034 (16.2)	<.001
IV	3047 (9.1)	2086 (14.1)	961 (5.1)	<.001
V	1153 (3.4)	853 (5.8)	300 (1.6)	<.001
Retropharyngeal	265 (0.8)	182 (1.2)	83 (0.4)	<.001
Parapharyngeal	275 (0.8)	165 (1.1)	110 (0.6)	<.001
HPV positivity^c
Oropharynx	4038 (77.8)	1994 (77.3)	2044 (78.3)	.77
Nonoropharynx	669 (16.0)	297 (16.9)	372 (15.3)	.13
HPV positivity, imputed^d
Oropharynx, %	73.3	74.2	72.4	NA
Nonoropharynx, %	16.7	17.8	16.1	NA
Multiagent chemotherapy	2945 (8.8)	2945 (19.9)	0 (0.0)
Time from surgery to completion of RT
>100 d	15 744 (47.0)	7017 (47.5)	8727 (46.5)	.004
≤100 d	17 783 (53.0)	7758 (52.5)	10 025 (53.5)	.004
RT dose
50-59 Gy	4734 (14.1)	2535 (17.2)	2199 (11.7)	<.001
60-69 Gy	5848 (17.4)	2230 (15.1)	3618 (19.3)
≥70 Gy	22 948 (68.4)	9501 (64.3)	13 447 (71.7)
Adequate lymph node dissection, ie, ≥18 nodes examined	4731 (14.1)	3044 (20.6)	1687 (9.0)	<.001

Abbreviations: CRT, chemoradiotherapy; HPV, human papillomavirus; IQR, interquartile range; LVI, lymphovascular invasion; NA, not applicable; RT, radiotherapy.

^a For categorical values, the P value for a χ2 test comparing the CRT and RT groups is provided; for numerical values, the P value for an unpaired 2-sided t test is provided.

^b Values listed for the oral cavity subgroup prior to imputation, given that tumor thickness is not available and/or not applicable for other primary tumor sites.

^c Percentage listed indicates percentage positive out of all patients for whom HPV status is reported.

^d The regressor used to impute HPV status assigns a likelihood from 0 to 1 of HPV positivity for each case. Mean imputed HPV status is listed as a percentage to allow comparison with true HPV rates.

Our models were trained on 26 821 cases, with the remaining 6706 reserved for validation. Results of the hyperparameter search for each model are described in eTable 1 and eTable 2 in the Supplement. With a median follow-up in the validation set of 43.2 (interquartile range, 19.8-65.5) months, treatment according to machine learning model recommendations was associated with significantly improved survival for all models (Table 2), with an HR of 0.79 (95% CI, 0.72-0.85; P < .001) for DeepSurv, 0.83 (95% CI, 0.77-0.90; P < .001) for N-MTLR, and 0.90 (95% CI, 0.83-0.98; P = .01) for RSF. No survival benefit was seen with recommending chemotherapy only for patients with intermediate risk who met inclusion criteria for the EORTC 22931 trial (HR, 0.93; 95% CI, 0.86-1.01; P = .07) or the RTOG 95-01 trial (HR, 0.96; 95% CI, 0.89-1.05; P = .38).

Table 2.

Model Accuracy and Survival Predictions for Treatment According to Model Recommendations

Model	C index	OS, median (IQR), months: patients receiving recommended treatment	OS, median (IQR), months: patients not receiving recommended treatment	HR (95% CI)^a	P value	HR, IPTW (95% CI)^a	P value
RTOG 95-01, ≥2 lymph nodes	NA	111.8 (102.1-118.9)	98.1 (91.8-108.9)	0.96 (0.89-1.05)	.38	0.89 (0.81-0.98)	.02
EORTC 22931, T3-4 disease, except T3N0 larynx, N2-3, LVI, deep nodes with oral/oropharynx cancer	NA	111.3 (105.4-117.6)	95.3 (86.9-108.2)	0.93 (0.86-1.01)	.07	0.90 (0.82-0.99)	.03
DeepSurv	0.693 (0.675-0.711)	118.1 (111.5-126.5)	90.6 (79.8-98.1)	0.79 (0.72-0.85)	<.001	0.76 (0.69-0.84)	<.001
N-MTLR	0.691 (0.673-0.709)	116.4 (109.7-123.3)	93.5 (85.9-101.1)	0.83 (0.77-0.90)	<.001	0.80 (0.72-0.88)	<.001
RSF	0.695 (0.676-0.713)	111.4 (101.1-120.3)	99.5 (91.7-110.1)	0.90 (0.83-0.98)	.01	0.96 (0.87-1.06)	.41

Abbreviations: EORTC, European Organisation for Research and Treatment of Cancer; HR, hazard ratio; IPTW, inverse probability of treatment weighting; IQR, interquartile range; LVI, lymphovascular invasion; N-MTLR, neural multitask logistic regression; OS, overall survival; RSF, random survival forest; RTOG, Radiation Therapy Oncology Group.

^a HRs are given for the patients who received a recommended treatment, compared with those who did not. Results are compared with decision rules derived from the RTOG and EORTC trials, wherein patients were recommended to receive chemoradiation if they met any of the intermediate risk criteria specified by the trials and radiation alone if not.

Given the expected differences in patient demographic characteristics among those treated with or without chemotherapy, we repeated these comparisons with IPTW, weighting patients more heavily if they were underrepresented in a given treatment group. Survival benefit remained significant with IPTW for treatment according to the DeepSurv (HR, 0.76; 95% CI, 0.69-0.84; P < .001) and N-MTLR (HR, 0.80; 95% CI, 0.72-0.88; P < .001) models. Notably, no survival benefit for CRT was seen for patients recommended to receive RT alone in any of the models, suggesting that the models identified a subgroup of patients for whom RT alone is sufficient (Figure 2; eFigure in the Supplement). Within the entire data set, 24 862 patients (74%) met the EORTC 22931 inclusion criteria, and 16 468 (49%) met RTOG 95-01 inclusion criteria. Chemotherapy would be recommended for 17 589 (52%), 15 917 (47%), and 14 912 (44%) of patients in the DeepSurv, N-MTLR, and RSF models, respectively. No significant difference between model accuracy for prognosis was seen among the machine learning models (DeepSurv: C index, 0.693; 95% CI, 0.675-0.711; N-MTLR: C index, 0.691; 95% CI, 0.673-0.709; RSF: C index, 0.695; 95% CI, 0.676-0.713; P = .95) (Table 2).
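The IPTW step described above, in which patients underrepresented in a treatment group receive more weight, can be illustrated with a small sketch. It assumes the propensity score is estimated as the empirical proportion treated within each stratum, which is a simplification chosen for illustration; the source does not state how propensities were estimated. The function name `iptw_weights` and the toy strata are hypothetical, and every stratum is assumed to contain both treated and untreated patients.

```python
from collections import defaultdict


def iptw_weights(treated, strata):
    """Weight each patient by 1 / P(treatment actually received | stratum)."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_total, n_treated]
    for t, s in zip(treated, strata):
        counts[s][0] += 1
        counts[s][1] += t
    weights = []
    for t, s in zip(treated, strata):
        n_total, n_treated = counts[s]
        propensity = n_treated / n_total  # estimated P(CRT | stratum)
        weights.append(1 / propensity if t else 1 / (1 - propensity))
    return weights


# Toy cohort (hypothetical): CRT is rare in N0 and common in N2
treated = [1, 0, 0, 0, 1, 1, 1, 0]
strata = ["N0", "N0", "N0", "N0", "N2", "N2", "N2", "N2"]

weights = iptw_weights(treated, strata)
```

In this toy cohort the single N0 patient who received CRT and the single N2 patient who received RT alone each get a weight of 4, so the weighted treatment groups have the same nodal-stage mix.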

Figure 2.

Survival Outcomes for the Subgroup Recommended for Radiotherapy (RT) and Chemoradiotherapy (CRT) in the Test Cohort

Results illustrated for DeepSurv (A, B), neural network multitask logistic regression (N-MTLR) (C, D), and random survival forest (RSF) (E, F) models. Panels on the left examine the subgroup of patients who were recommended to receive radiation alone by each of the 3 models, with no survival difference (per log-rank test) seen between patients who did or did not receive chemotherapy. A benefit to chemotherapy is seen for patients recommended for CRT by the DeepSurv and N-MTLR models (B, D).

Assessing variable permutation importance (Figure 3) identified features important to model accuracy for prognosis, with a more than 1% mean reduction in concordance index with permutation of life expectancy, year of diagnosis, T4 tumor stage, HPV positivity, and tonsillar subsite (eTable 4 in the Supplement). In a post hoc analysis, patients in the test cohort were stratified according to HPV status (excluding those for whom HPV status was imputed) and age, and the HR associated with treatment according to model recommendations was measured (eTable 5 in the Supplement). These results suggest that treatment according to DeepSurv recommendations demonstrated a persistent survival benefit in both patients older than 70 years (HR, 0.74; 95% CI, 0.62-0.89; P = .002) and those aged 70 years or younger (HR, 0.87; 95% CI, 0.79-0.95; P = .003).

Figure 3.

Heatmap for Permutation Feature Importance for DeepSurv, Neural Network Multitask Logistic Regression (N-MTLR), and Random Survival Forest (RSF) Models

Values are given as the percentage decrease in C index after permutation. Higher values indicate greater importance to the predictive accuracy of the respective model. A log scale is used to highlight the magnitude of each feature’s contribution to overall model accuracy, identifying the most important (>1%), modestly important (>0.1%), less important (>0), and uninformative (<0) features. HPV indicates human papillomavirus.

Discussion

Machine learning algorithms are increasingly prominent within health care, with applications in head and neck cancer including detection of HPV status from pathology slides,[24] prospective identification of patients at risk for delay in adjuvant radiotherapy,[25] and prediction of ECE from imaging.[26] Deep learning has traditionally excelled at classification tasks, such as identifying objects present in a picture, but the recent development of novel survival models enables new approaches to outcomes data.[12] Here we demonstrate that adjuvant treatment according to 3 machine learning models can improve patient survival, and all models identified patients who can be safely managed with RT alone. However, only the survival benefit associated with the N-MTLR and DeepSurv models remained significant with IPTW, and only DeepSurv provided robust predictions in both older and younger patients. Since the results of the EORTC 22931 and RTOG 95-01 trials, there have been numerous attempts to identify subgroups of patients with intermediate risk factors who would benefit from adjuvant CRT. For oral cavity cancer, an observational study[27] found that patients with 3 intermediate risk factors had a survival benefit from adjuvant CRT. No difference was seen between the RT and CRT groups for patients with only 2 risk factors, but the study was limited because of its observational nature and lack of propensity matching. Indeed, real-world practice would suggest that a significant number of patients with a single intermediate risk factor are not receiving chemotherapy, given that nearly three-quarters of patients in this data set would meet inclusion criteria for EORTC 22931, but only 44% received CRT. All the machine learning models presented here selected a smaller subset of the population than the EORTC 22931 trial for the addition of chemotherapy, and yet they performed at least as well in identifying who would benefit from chemotherapy.
There is particular interest in deescalating treatment for patients who have HPV-positive cancers. Several small studies have suggested a lack of association of chemotherapy with outcome in even high-risk patients with HPV-positive cancer.[28,29] Our models found HPV to be 1 of the most important factors with regard to predictive accuracy. Several ongoing and recently completed randomized clinical trials will help further clarify the management of patients with HPV-positive cancer. Preliminary reports from the phase II ECOG 3311 trial describe favorable outcomes for patients with minimal (ie, <1 mm) ECE, fewer than 5 involved lymph nodes, and clear surgical margins treated with postoperative radiation without chemotherapy.[30,31] The phase III ADEPT trial will examine whether RT alone is adequate for the treatment of patients with HPV-positive cancer with ECE and negative resection margins.[32] The landscape of HPV-positive cancer is moving toward deintensification, but the lack of randomized data in the intermediate-risk group is a perfect opportunity for machine learning models to identify patients who can forgo chemotherapy. The benefit of CRT has also been called into question for older patients. Two separate retrospective studies of the NCDB revealed conflicting results regarding even high-risk patients, with 1 analysis finding a survival benefit with CRT and the other failing to find a benefit.[33,34] However, a clinically relevant trend toward benefit was seen in both studies, suggesting chemotherapy may still have a role in the adjuvant treatment of select older patients. The utility of CRT in older patients with intermediate risk is also unclear, but again, the benefit seen in our DeepSurv model persisted in both young and older patients and could serve as a decision aid in this patient group.

Limitations

There are several limitations to our current study. The NCDB lacks several important clinical variables, such as smoking history[35] and perineural invasion,[36] that could add to our predictive accuracy. Additionally, chemotherapy regimen and dose are not specified in the NCDB beyond the administration of multiagent vs single-agent treatments. The database also has incomplete data for some variables, most prominently HPV status and lymphovascular invasion; undoubtedly a more accurate model could be created if these features were known instead of imputed. Although we controlled for confounders in our data set with IPTW, it is recommended that machine learning models be validated in an independent data set.[37] Identifying an appropriate external data set is challenging, given that most cases at our center are already represented in the NCDB. Nonetheless, with further study, machine learning models could provide the basis for the prudent use of chemotherapy in patients with intermediate risk.

Conclusions

In this cohort study, 3 machine learning models predicted which patients with resected HNSCC and intermediate risk would benefit from receiving CRT. While such models are naturally opaque, they excel at identifying novel interactions between data. Future studies will need to confirm the validity of these models, and further analysis with more comprehensive clinical data not captured in the NCDB may result in predictions that are even more accurate. Machine learning has the potential to distill the complex heterogeneity of real-world practice into meaningful recommendations for true precision medicine.

23 in total

1. Postoperative concurrent radiotherapy and chemotherapy for high-risk squamous-cell carcinoma of the head and neck.

Authors: Jay S Cooper; Thomas F Pajak; Arlene A Forastiere; John Jacobs; Bruce H Campbell; Scott B Saxman; Julie A Kish; Harold E Kim; Anthony J Cmelak; Marvin Rotman; Mitchell Machtay; John F Ensley; K S Clifford Chao; Christopher J Schultz; Nancy Lee; Karen K Fu
Journal: N Engl J Med Date: 2004-05-06 Impact factor: 91.245

2. Machine Learning to Predict Delays in Adjuvant Radiation following Surgery for Head and Neck Cancer.

Authors: Matthew Shew; Jacob New; Andrés M Bur
Journal: Otolaryngol Head Neck Surg Date: 2019-01-29 Impact factor: 3.497

3. Extracapsular spread and adjuvant therapy in human papillomavirus-related, p16-positive oropharyngeal carcinoma.

Authors: Parul Sinha; James S Lewis; Jay F Piccirillo; Dorina Kallogjeri; Bruce H Haughey
Journal: Cancer Date: 2011-11-15 Impact factor: 6.860

4. Defining risk levels in locally advanced head and neck cancers: a comparative analysis of concurrent postoperative radiation plus chemotherapy trials of the EORTC (#22931) and RTOG (# 9501).

Authors: Jacques Bernier; Jay S Cooper; T F Pajak; M van Glabbeke; J Bourhis; Arlene Forastiere; Esat Mahmut Ozsahin; John R Jacobs; J Jassem; Kie-Kian Ang; J L Lefèbvre
Journal: Head Neck Date: 2005-10 Impact factor: 3.147

5. Adjuvant Chemoradiation After Surgical Resection in Elderly Patients With High-Risk Squamous Cell Carcinoma of the Head and Neck: A National Cancer Database Analysis.

Authors: Neil M Woody; Matthew C Ward; Shlomo A Koyfman; Chandana A Reddy; Jessica Geiger; Nikhil Joshi; Brian Burkey; Joseph Scharpf; Eric Lamarre; Brandon Prendes; David J Adelstein
Journal: Int J Radiat Oncol Biol Phys Date: 2017-03-18 Impact factor: 7.038

6. Statistics versus machine learning.

Authors: Danilo Bzdok; Naomi Altman; Martin Krzywinski
Journal: Nat Methods Date: 2018-04-03 Impact factor: 28.547

7. Association Between Head and Neck Squamous Cell Carcinoma Survival, Smoking at Diagnosis, and Marital Status.

Authors: Nosayaba Osazuwa-Peters; Eric Adjei Boakye; Betty Y Chen; Betelihem B Tobo; Mark A Varvares
Journal: JAMA Otolaryngol Head Neck Surg Date: 2018-01-01 Impact factor: 6.223

Review 8. Perineural invasion in head and neck cancers - a review.

Authors: A Frunza; D Slavescu; I Lascar
Journal: J Med Life Date: 2014-06-25

9. The National Cancer Data Base: a powerful initiative to improve cancer care in the United States.

Authors: Karl Y Bilimoria; Andrew K Stewart; David P Winchester; Clifford Y Ko
Journal: Ann Surg Oncol Date: 2008-01-09 Impact factor: 5.344

Review 10. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies.

Authors: Peter C Austin; Elizabeth A Stuart
Journal: Stat Med Date: 2015-08-03 Impact factor: 2.373
