
Prospective study of artificial intelligence-based decision support to improve head and neck radiotherapy plan quality.

David J Sher¹, Andrew Godley¹, Yang Park¹, Colin Carpenter², Marc Nash², Hasti Hesami¹, Xinran Zhong¹, Mu-Han Lin¹.

Abstract

BACKGROUND AND PURPOSE: Volumetric modulated arc therapy (VMAT) planning for head and neck cancer is a complex process. While the lowest achievable dose for each individual organ-at-risk (OAR) is unknown a priori, artificial intelligence (AI) holds promise as a tool to accurately estimate the expected dose distribution for OARs. We prospectively investigated the benefits of incorporating an AI-based decision support tool (DST) into the clinical workflow to improve OAR sparing.
MATERIALS AND METHODS: The DST dose prediction model was based on 276 institutional VMAT plans. Under an IRB-approved prospective trial, the physician first generated a custom OAR directive for 50 consecutive patients (physician directive, PD). The DST then estimated OAR doses (AI directive, AD). For each OAR, the treating physician used the lower directive to form a hybrid directive (HD). The final plan metrics were compared to each directive. A dose difference of 3 Gray (Gy) was considered clinically significant.
RESULTS: Compared to the AD and PD, the HD reduced OAR dose objectives by more than 3 Gy in 22% to 75% of cases, depending on OAR. The resulting clinical plan typically met these lower constraints and achieved mean dose reductions between 4.3 and 16 Gy over the PD, and 5.6 to 9.1 Gy over the AD alone. Dose metrics achieved using the HD were significantly better than institutional historical plans for most OARs and NRG constraints for all OARs.
CONCLUSIONS: The DST facilitated a significantly improved treatment directive across all OARs for this generalized H&N patient cohort, with neither the AD nor PD alone sufficient to optimally direct planning.


Keywords: Artificial intelligence; Decision-support tools; Head and neck cancer; IMRT

Year: 2021 PMID: 34159264 PMCID: PMC8196054 DOI: 10.1016/j.ctro.2021.05.006

Source DB: PubMed Journal: Clin Transl Radiat Oncol ISSN: 2405-6308

Introduction

Head and neck radiation treatment planning has undergone a significant revolution over the past 20 years. Treatment techniques transitioned from opposed laterals and an anterior field to 3-dimensional conformal radiotherapy (3D-CRT) to intensity modulated radiation therapy (IMRT) over a relatively brief period of time [1]. Given the routine implementation of volumetric modulated arc therapy (VMAT), increased conformality and reduced organ-at-risk (OAR) doses are achievable with a relatively brief treatment delivery [2]. Yet with increasing complexity comes a lengthier treatment planning process and the need for physicians to make more tradeoff decisions. While physicians may use standard, nationally-accepted organ-at-risk constraints to direct the planning process, the configuration of each patient’s targets and normal tissue anatomy is different. When a rigid set of constraints is applied to every patient, some OARs may receive an unnecessarily high dose that is still technically within tolerance, or the planner may spend excessive optimization time trying to meet a constraint that is unachievable. Thus, the optimal dose distribution may not be achieved without a more personalized treatment directive that incorporates the patient’s anatomy.
Research harnessing artificial intelligence (AI) has led to algorithms that can estimate the expected dose distribution either based on the intrinsic relationship between targets and normal structures (i.e. knowledge-based methods) [3], [4], [5] or on mapping the given patient to a library of similar patients (i.e. atlas-based methods) [6]. Preliminary work has suggested that such algorithms may lead to clinically reasonable and accurate estimates of an achievable treatment plan [7], [8], [9]. These predictions can be utilized in several ways, including informing physician decision-making on acceptable OAR target doses or automatically driving a treatment planning optimizer towards a clinically optimal plan [4], [7], [10], [11].
Clinical decision support tools (DST) have been frequently implemented in oncology, such as the use of clinical pathways for chemotherapy regimen or radiotherapy prescription dose in medical and radiation oncology, respectively [12], [13]. The incorporation of AI into a treatment planning DST is promising not only from an accuracy standpoint but also as a means to avoid potential pitfalls of relying exclusively on AI, including lack of clinical context or unintended bias from the underlying data. Further, the improved confidence in delivering the right treatment to the patient, provided by a DST, should result in improved efficiency for the clinical team. It is within this framework that we have evaluated the application of a commercially-available DST (QuickMatch, Siris Medical) [14] to augment clinical expertise by incorporating AI-based dose prediction into the head and neck treatment directive process.

Methods

Decision-support tool

Two hundred seventy-six patients from 2015 to 2018 with standard dose-fractionation regimens were used for AI model training. The DST engine uses a machine learning approach to model dose, although the output to the user is a list of de-identified patients selected as the closest matches to the expected doses to the OARs and planning target volumes (PTV). Once a new patient’s contours were finalized, they were exported to the DST, and following processing, several matched patients in the database were shown to the physician. This study focused on dose metrics to the key OARs that typically drive planning trade-offs. Based on our historical practice, the key OARs were: contralateral parotid gland, ipsilateral superficial parotid gland, contralateral submandibular gland, oral cavity, middle pharyngeal constrictor, inferior pharyngeal constrictor, cervical esophagus, and larynx.

Clinical implementation

One physician (DJS) used the DST on 50 consecutive patients with oropharynx, oral cavity, larynx, or hypopharynx cancer prior to submitting the treatment directive to the dosimetrist. Prior to sending target and OAR structures to the DST for evaluation, the physician prepared the treatment directive (physician directive, PD) for the dosimetrist. The physician’s routine practice is to evaluate the patient’s unique PTV and OAR anatomic relationships and provide a custom estimate for the OARs of that case based on experience. Once the initial constraints were recorded, the DST was engaged, and the physician then identified the preferred matched patient produced by the DST. For each OAR, if the PD constraint was lower than the corresponding OAR in the DST matched patient (AI directive, AD), the PD constraint was kept. If the AD constraint was lower than the PD constraint, the AD constraint was submitted instead. Thus, the final directive used for treatment planning was a hybrid of the initial physician and AI constraints, called a Hybrid Directive (HD). This study was reviewed by the IRB and considered exempt.
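The per-OAR selection rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the DST's or the clinic's software: the function name `hybrid_directive`, the OAR keys, and the example doses are hypothetical, and the handling of ties or missing AD values (keep the PD) is an assumption, since the text does not specify it.

```python
from typing import Dict, Tuple


def hybrid_directive(
    pd: Dict[str, float], ad: Dict[str, float]
) -> Dict[str, Tuple[float, str]]:
    """Return, per OAR, the lower of the two dose constraints (Gy) and its source."""
    hd = {}
    for oar, pd_dose in pd.items():
        ad_dose = ad.get(oar)
        # Assumption: if the AD has no value for an OAR, or the two values
        # are tied, the physician directive (PD) is kept.
        if ad_dose is not None and ad_dose < pd_dose:
            hd[oar] = (ad_dose, "AD")
        else:
            hd[oar] = (pd_dose, "PD")
    return hd


# Hypothetical constraints for one patient (illustrative numbers, not study data).
pd_example = {"larynx": 30.0, "cl_parotid": 18.0, "oral_cavity": 20.0}
ad_example = {"larynx": 27.5, "cl_parotid": 21.0, "oral_cavity": 20.0}

hd_example = hybrid_directive(pd_example, ad_example)
n_from_ad = sum(1 for _, source in hd_example.values() if source == "AD")
print(hd_example)
print(f"OARs taken from the AD: {n_from_ad}")
```

Recording the source alongside each constraint makes it straightforward to tally, per patient or per OAR, how often the AD or the PD supplied the value used for planning.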

Treatment planning

The treatment plans were created using the Eclipse treatment planning system V15.5 (Varian, Palo Alto, CA) with the volumetric modulated arc therapy (VMAT) technique. The general target coverage requirements were: (1) more than 95% of the target receiving the prescribed dose, (2) a hot spot of less than 105%, and (3) optimal dose conformality (i.e. minimal dose leakage between individual dose-level PTVs). The planner generated the VMAT plan with three to five 6 MV arcs to achieve the optimal dose distribution fulfilling both the HD and the target coverage requirements. When an HD constraint was unachievable, deviation from it was allowed in order to meet coverage criteria.
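As a rough illustration of the first two coverage requirements, the check below evaluates a single PTV from a list of voxel doses. It is a sketch under stated assumptions rather than the Eclipse evaluation itself: it assumes equal-volume voxels, it assumes the hot spot is the maximum voxel dose relative to the prescription (the text does not define it), and the function name `check_target_coverage` and the example doses are hypothetical. Conformality, the third requirement, is not evaluated here.

```python
from typing import Dict, Sequence


def check_target_coverage(
    ptv_voxel_doses_gy: Sequence[float],
    prescription_gy: float,
    min_coverage: float = 0.95,
    max_hot_spot: float = 1.05,
) -> Dict[str, object]:
    """Evaluate coverage and hot spot for one PTV from its voxel doses."""
    if not ptv_voxel_doses_gy:
        raise ValueError("PTV has no voxels")
    # Fraction of (equal-volume) voxels receiving at least the prescribed dose.
    n_covered = sum(1 for d in ptv_voxel_doses_gy if d >= prescription_gy)
    coverage = n_covered / len(ptv_voxel_doses_gy)
    # Assumption: hot spot = maximum voxel dose as a fraction of the prescription.
    hot_spot = max(ptv_voxel_doses_gy) / prescription_gy
    return {
        "coverage": coverage,
        "hot_spot": hot_spot,
        "coverage_ok": coverage > min_coverage,
        "hot_spot_ok": hot_spot < max_hot_spot,
    }


# Hypothetical PTV: 96 voxels at 70.5 Gy and 4 voxels at 69 Gy, prescribed 70 Gy.
example_doses = [70.5] * 96 + [69.0] * 4
result = check_target_coverage(example_doses, prescription_gy=70.0)
print(result)
```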

Analysis

The primary aim of the analysis was to assess the benefit of using the hybrid directive over the physician or decision-support directives alone. First, we compared the AD and PD as well as the frequency with which each was used in the HD for planning. We assumed that planning directives and achieved doses within 3 Gy of each other are clinically comparable, and we report the frequency at which the AD and PD differed by at least 3 Gy. We also report the proportion of achieved plans that were more than 3 Gy lower than either directive. The achieved doses based on the HD were compared with the initial AD or PD using the Wilcoxon signed rank test. We also compared the OAR metrics from the achieved plans using the HD with both NRG national standards and our prior institutional experience using a two-tailed t-test. In order to determine if the planning results based on the HD were achieved because the planner/optimizer was simply able to improve on the given directive, we compared the original directive for the historical plans matched by the DST with the doses those plans achieved; 39 original directives were accessible from the record. The differences between the original directive and original achieved plan were compared against zero using a one-sample t-test, as were the differences between the HD and its achieved plan.
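The 3 Gy comparison between a directive and the achieved dose can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names and example values are hypothetical, the sign convention (directive minus achieved, so a positive value means the plan was lower) is chosen to match the signs reported in Table 3, and a difference of exactly 3 Gy is treated as "within", following the "more than 3 Gy" wording.

```python
from typing import Dict, List, Tuple

THRESHOLD_GY = 3.0  # difference regarded as clinically significant in the study


def categorize(directive_gy: float, achieved_gy: float,
               threshold: float = THRESHOLD_GY) -> str:
    """Classify one OAR as 'lower', 'higher', or 'within' relative to its directive."""
    diff = directive_gy - achieved_gy  # positive: plan is lower than the directive
    if diff > threshold:
        return "lower"
    if diff < -threshold:
        return "higher"
    return "within"


def summarize(pairs: List[Tuple[float, float]]) -> Dict[str, float]:
    """Return the fraction of (directive, achieved) pairs in each category."""
    counts = {"within": 0, "lower": 0, "higher": 0}
    for directive, achieved in pairs:
        counts[categorize(directive, achieved)] += 1
    return {name: count / len(pairs) for name, count in counts.items()}


# Hypothetical (directive, achieved) mean doses in Gy for one OAR across four patients.
example_pairs = [(30.0, 25.0), (30.0, 29.0), (30.0, 34.5), (30.0, 27.0)]
fractions = summarize(example_pairs)
print(fractions)
```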

Results

Fifty cases were accrued from January 2019 to June 2019. Baseline patient characteristics are shown in Table 1. Approximately 50% of patients had oropharynx cancer, and the remainder had larynx/hypopharynx or oral cavity cancer. The mean OAR constraints from the AD and PD estimates are shown in Table 2, as well as the frequency with which each was used in the HD. For some structures, the AD was used more frequently, while for others, the PD was more often used for planning. Overall, at least one OAR prediction from the AI DST was incorporated into the final directive for 90% (45/50) of patients; further, at least two OAR predictions from the DST were incorporated for 76% of the patients (38/50).

Table 1

Patient characteristics. T and N stage are based on the AJCC (American Joint Committee on Cancer) 7th edition staging manual.

Characteristic	Number (%)
Disease site
Oropharynx	26 (52%)
Larynx	14 (28%)
Oral cavity	7 (14%)
Other	3 (6%)

T stage
T1	10 (20%)
T2	17 (34%)
T3	10 (20%)
T4	13 (26%)

N stage
N0	12 (24%)
N1	7 (14%)
N2	30 (60%)
N3	1 (2%)

Table 2

Mean AI directives (AD) and physician directives (PD) and the proportion each was used for the hybrid directive (HD). Abbreviations: Sup = superior; Mid = middle; Inf = inferior; CL = contralateral; IL = ipsilateral; SMG = submandibular gland; Gy = Gray.

OAR	AD mean (SD), Gy	AD % used in HD	PD mean (SD), Gy	PD % used in HD
Esophagus	20.8 (1.3)	33%	19.9 (1.2)	67%
Larynx	29.4 (0.6)	60%	30.1 (0.7)	40%
Sup Constrictor	35.4 (1.1)	19%	36.2 (1.2)	81%
Mid Constrictor	33.9 (1.1)	21%	34.1 (1.0)	79%
Inf Constrictor	21.8 (1.3)	48%	22.4 (1.3)	52%
CL Parotid	19.2 (0.9)	56%	18.5 (0.9)	44%
IL Sup_Parotid	24.7 (0.6)	39%	24.8 (0.2)	61%
CL SMG	37.6 (2.3)	38%	28.0 (1.8)	62%
Oral Cavity	17.9 (0.5)	49%	18.2 (0.7)	51%

With respect to the initial predictions from the physician and DST, neither estimate was consistently higher than the other (Fig. 1a); while there was reasonably high concordance in some OARs such as the inferior constrictor and contralateral parotid, there was also significant disagreement in some structures, such as the esophagus and middle constrictor. While the directives were frequently comparable, when there was a discrepancy greater than 3 Gy, the mean difference was quite large (greater than 5.7 Gy in Supplementary Table 1). Moreover, the percentage of directives derived from the AD or PD that made up the HD varied from patient to patient (Fig. 1b).

Fig. 1

(A) Comparison of the original physician directive (PD) and artificial intelligence directive (AD) estimates. (B) Frequency of PD vs AD use in forming the hybrid directive (HD). Abbreviations: Sup = superior; Mid = middle; Inf = inferior; CL = contralateral; IL = ipsilateral; Gy = Gray.

The final achieved plan was often significantly lower than the original estimates from the AD or PD alone (Fig. 2, Table 3). In fact, depending on the OAR, the final plan was more than 3 Gy lower than the AD and PD in 17%-53% and 13%-53% of the cases, respectively. When comparing either the AD or PD to the final achieved doses, the achieved plans were more likely to be within 3 Gy of the PD than the AD prediction, but the achieved doses were often more than 3 Gy less than the PD estimate as well. Whereas the final plan metrics were highly unlikely to be more than 3 Gy higher than the PD, the AD overestimated the potential sparing of several OARs for 4%-14% of cases. In these cases, if a planner strictly relied on the AD, unnecessary effort would have been spent trying to (unsuccessfully) meet the proposed constraints.

Fig. 2

Comparison of hybrid directive (HD) achieved plan doses to the artificial intelligence directive (AD, Panel A) and original physician directive (PD, Panel B).

Table 3

Comparison of achieved organ-at-risk doses in the treatment plan relative to the AI directive (AD) and the physician directive (PD), respectively. Abbreviations: Sup = superior; Mid = middle; Inf = inferior; CL = contralateral; IL = ipsilateral; Gy = Gray; N/A = not applicable; * = comparison versus no dose difference, p<0.05.

AD prediction
OAR	Plan within 3 Gy of AD (%)	Plan > 3 Gy lower than AD (%)	Mean dose difference (Gy)	SD (Gy)	Plan > 3 Gy higher than AD (%)	Mean dose difference (Gy)	SD (Gy)
Esophagus	34%	53%	10.7*	7.1	13%	−6.1	1.3
Larynx	71%	29%	5.0*	2.0	0%	N/A	N/A
Sup Constrictor	50%	44%	7.9	8.7	6%	−7.5	N/A
Mid Constrictor	44%	52%	9.8*	6.0	4%	−3.9	N/A
Inf Constrictor	83%	17%	4.3*	1.4	0%	N/A	N/A
CL Parotid	62%	24%	7.2*	3.9	14%	−5.4	1.3
IL Sup Parotid	70%	23%	7.4*	4.7	7%	−3.6	0.4
CL SMG	54%	35%	16.0*	14.9	11%	−8.7	4.1
Oral Cavity	57%	30%	5.9*	3.5	13%	−4.7	1.8

PD prediction
OAR	Plan within 3 Gy of PD (%)	Plan > 3 Gy lower than PD (%)	Mean dose difference (Gy)	SD (Gy)	Plan > 3 Gy higher than PD (%)	Mean dose difference (Gy)	SD (Gy)
Esophagus	71%	25%	9.1*	7.1	4%	−4.5	0.1
Larynx	60%	37%	5.6*	2.4	3%	−6.2	N/A
Sup Constrictor	77%	20%	8.0*	5.3	3%	−7.6	N/A
Mid Constrictor	47%	53%	8.5*	4.9	0%	N/A	N/A
Inf Constrictor	85%	13%	6.4*	3.7	4%	−8.0	N/A
CL Parotid	81%	17%	6.2*	2.6	2%	−3.4	N/A
IL Sup Parotid	81%	19%	6.0*	2.3	0%	N/A	N/A
CL SMG	82%	14%	5.8*	3.3	4%	−8.2	N/A
Oral Cavity	71%	23%	7.7*	3.2	6%	−5.4	2.1

We compared the achieved metrics in this study with both historical institutional treatments (effectively using the PD alone) and RTOG/NRG standards (Fig. 3). The OAR metrics from the prior institutional plans were typically significantly lower than the cooperative group standards. Despite this favorable baseline plan quality, the hybrid directive approach further reduced the mean OAR dose by an average of 2 to 9 Gy in comparison to historical plans, and many of these reductions were statistically significant.

Fig. 3

Comparison of mean doses from national protocols (RTOG/NRG), UT Southwestern historical plans (traditional directive), and the achieved doses in this study using the hybrid directive (HD). * represents p < 0.05.

Finally, we found that the improvement gained using the HD was not due to overachievement by the dosimetrists, as there was no significant difference between the historical PD and its respective achieved plan (Supplementary Fig. 1, blue), nor were there significant differences between the HD estimation and the achieved plan (Supplementary Fig. 1, purple). In fact, out of all OAR directives in the 39 original plans, the achieved dose was more than 3 Gy lower than the directive for only 13%. The larynx, superior and inferior constrictor planned doses were more than 3 Gy less than the directive in 11%, 22% and 16% of cases, with all other OARs within 3 Gy of the directive at least 93% of the time. In other words, the improvement achieved with the HD was due to the planners’ attempt to meet the more aggressive directive; had the higher initial directive been used instead, one would expect higher final plan metrics.
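The comparison of directive-minus-achieved differences against zero can be illustrated with a one-sample t statistic computed from first principles. This sketch is not the authors' analysis code: the function name `one_sample_t` and the example differences are hypothetical, and only the statistic and degrees of freedom are computed. A p-value would additionally require the t distribution, for example via `scipy.stats.ttest_1samp` if SciPy is available.

```python
import math
import statistics
from typing import Sequence, Tuple


def one_sample_t(differences: Sequence[float],
                 popmean: float = 0.0) -> Tuple[float, int]:
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two differences")
    mean = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t_stat = (mean - popmean) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical directive-minus-achieved differences in Gy (not study data).
example_diffs = [1.0, 2.0, 3.0]
t_stat, dof = one_sample_t(example_diffs)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A statistic close to zero would be consistent with achieved doses that track their directive, which is the pattern the text reports for both the historical directives and the HD.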

Discussion

In this study, we have shown that the use of an AI-based decision-support tool facilitated significantly improved treatment plans in comparison to institutional historical plans and RTOG/NRG standards. Depending on the OAR, the DST provided a significantly lower (i.e. at least 3 Gy less) estimate in up to 39% of cases, and in most of these patients, this lower directive led to improved OAR dosimetry. Yet solely relying on the DST estimates would be suboptimal, since the PD was often more ambitious than the DST estimate, and in a marked number of cases, the plans were better than the AD estimate. Thus while the core technology of the DST was artificial intelligence, we posit that its use in radiation planning is best characterized by the term augmented intelligence, as the hybrid directive was a combination of physician experience and the AI.
Despite its technical complexities, head and neck treatment planning is still an art. Dosimetrists need to identify which dose constraints are reasonable and should be pushed further, and which are unachievable. Without a feasible treatment directive, though, the planner may either arrive at a plan that is inferior to the optimal solution or proceed through a large number of iterations and optimizations without converging on an achievable set of tradeoffs. This augmented intelligence approach attempts to guide both the physician and planner to the optimal tradeoff.
Prior studies have highlighted the potential benefits as well as pitfalls in dose prediction and automatic planning for head and neck radiotherapy. Wang et al. developed their own knowledge-based dose prediction model for oropharynx IMRT but noted prediction errors greater than 4 Gy in 17% of cases. The authors attributed this higher-than-anticipated rate to the challenge of inter-organ dependency and inconsistent tradeoffs in their training dataset [15].
Commercial automatic planning solutions that are not AI-driven have also been interrogated for their ability to automatically guide head and neck IMRT. Tol et al. developed RapidPlan (Varian Medical Systems, Palo Alto, CA) models based on 60 head and neck patients and then applied them to re-plan 15 older patients and 15 recent patients, finding that the clinical plans were generally, but not consistently, comparable [11]. In a recent provocative study, radiation oncologists at an academic center were presented in a blinded fashion with both dosimetry-driven plans and unadjusted knowledge-based plans generated with RapidPlan [16]; among the 36 head and neck plans, the knowledge-based plan was considered superior or equivalent in 67% of cases, a statistically significant result. When comparing the quality of the delivered plans before and after the integration of knowledge-based planning into their clinic, the authors found clear dosimetric improvements in the more recent plans. With the possibility of even more refined dose predictions using AI [8], [17], further work is needed on optimizing human- and computer-based interactions that lead to the optimal treatment plan.
This study has several limitations. First, only one physician was involved in the directive process, so the impact of the DST on other radiation oncologists was untested. Other physicians may be more likely to provide achievable dose constraints for the directive, minimizing the benefit of the DST. This physician was the only head and neck radiation oncologist in the department at the time of the study, and thus this limitation was unavoidable. In the future, we hope to generalize the use of this tool with other practitioners in the department. Moreover, the DST was compared against our routine directive process, in which personalized constraints are submitted for each patient, rather than referring to a more standard list of OAR metrics. Since the PD constraints were generally stricter than published guidelines such as in NRG trials, we may have underestimated the potential benefit of the DST in other practices. Also, this analysis is not a randomized trial, and we cannot exclude the possibility that the improved performance was due to extra effort on the part of planners; a more robust conclusion could be made by generating three independent plans based on the AD, PD and HD and then comparing the results. However, we did not have the bandwidth to generate approximately 100 additional high-quality plans. Instead, when we compared the original treatment directive with its delivered plan, we found that the final dose metrics were very similar to their directive. This result suggests that it is the improved estimate, rather than planner initiative or overachievement, that led to an improved dose distribution. Of course, the ultimate endpoint of improved planning is superior clinical outcomes, and we did not assess patient outcomes as part of this study. Finally, an updated patient model customized to active physicians in the practice would presumably be more likely to accurately estimate an achievable and acceptable dose distribution.
In conclusion, this study has shown that the addition of an AI-based decision-support tool for head and neck radiotherapy planning significantly reduced OAR doses in the clinically-approved plan. Instead of using either the DST or physician intuition alone to drive the planning process, using this augmented-intelligence solution reaps the benefits of both physician experience and the knowledge gained from our departmental planning history. Of course, every new case that benefits from the hybrid directive approach can be incorporated into an updated AI model, and future research will assess whether progressively improving performance of the DST further improves plan quality.

Declaration of Competing Interest

Colin Carpenter: Payment and financial relationship with Siris Medical, including ownership of a patent related to work. Marc Nash: Payment and financial relationship with Siris Medical.

17 in total

1. Knowledge-based dose prediction models for head and neck cancer are strongly affected by interorgan dependency and dataset inconsistency.

Authors: Yibing Wang; Ben J M Heijmen; Steven F Petit
Journal: Med Phys Date: 2018-12-24 Impact factor: 4.071

2. Cost and Survival Analysis Before and After Implementation of Dana-Farber Clinical Pathways for Patients With Stage IV Non-Small-Cell Lung Cancer.

Authors: David M Jackman; Yichen Zhang; Carole Dalby; Tom Nguyen; Julia Nagle; Christine A Lydon; Michael S Rabin; Kristen K McNiff; Belen Fraile; Joseph O Jacobson
Journal: J Oncol Pract Date: 2017-03-04 Impact factor: 3.840

3. Predicting dose-volume histograms for organs-at-risk in IMRT planning.

Authors: Lindsey M Appenzoller; Jeff M Michalski; Wade L Thorstad; Sasa Mutic; Kevin L Moore
Journal: Med Phys Date: 2012-12 Impact factor: 4.071

4. An overlap-volume-histogram based method for rectal dose prediction and automated treatment planning in the external beam prostate radiotherapy following hydrogel injection.

Authors: Yidong Yang; Eric C Ford; Binbin Wu; Michael Pinkawa; Baukelien van Triest; Patrick Campbell; Danny Y Song; Todd R McNutt
Journal: Med Phys Date: 2013-01 Impact factor: 4.071

5. A knowledge-based approach to improving and homogenizing intensity modulated radiation therapy planning quality among treatment centers: an example application to prostate cancer planning.

Authors: David Good; Joseph Lo; W Robert Lee; Q Jackie Wu; Fang-Fang Yin; Shiva K Das
Journal: Int J Radiat Oncol Biol Phys Date: 2013-04-25 Impact factor: 7.038

6. Highly Efficient Training, Refinement, and Validation of a Knowledge-based Planning Quality-Control System for Radiation Therapy Clinical Trials.

Authors: Nan Li; Ruben Carmona; Igor Sirak; Linda Kasaova; David Followill; Jeff Michalski; Walter Bosch; William Straube; Loren K Mell; Kevin L Moore
Journal: Int J Radiat Oncol Biol Phys Date: 2016-10-13 Impact factor: 7.038

7. A peer review process as part of the implementation of clinical pathways in radiation oncology: Does it improve compliance?

Authors: Brian J Gebhardt; Dwight E Heron; Sushil Beriwal
Journal: Pract Radiat Oncol Date: 2017-01-19

8. Fully automated treatment planning for head and neck radiotherapy using a voxel-based dose prediction and dose mimicking method.

Authors: Chris McIntosh; Mattea Welch; Andrea McNiven; David A Jaffray; Thomas G Purdie
Journal: Phys Med Biol Date: 2017-07-06 Impact factor: 3.609

9. Dose prediction with deep learning for prostate cancer radiation therapy: Model adaptation to different treatment planning practices.

Authors: Roya Norouzi Kandalan; Dan Nguyen; Nima Hassan Rezaeian; Ana M Barragán-Montero; Sebastiaan Breedveld; Kamesh Namuduri; Steve Jiang; Mu-Han Lin
Journal: Radiother Oncol Date: 2020-10-22 Impact factor: 6.280

10. Noninferiority Study of Automated Knowledge-Based Planning Versus Human-Driven Optimization Across Multiple Disease Sites.

Authors: Mariel Cornell; Robert Kaderka; Sebastian J Hild; Xenia J Ray; James D Murphy; Todd F Atwood; Kevin L Moore
Journal: Int J Radiat Oncol Biol Phys Date: 2019-10-31 Impact factor: 7.038