A recurrent neural network for rapid detection of delivery errors during real-time portal dosimetry.

Abstract

Background and purpose: Real-time portal dosimetry compares measured images with predicted images to detect delivery errors as the radiotherapy treatment proceeds. This work aimed to investigate the performance of a recurrent neural network for processing image metrics so as to detect delivery errors as early as possible in the treatment.
Materials and methods: Volumetric modulated arc therapy (VMAT) plans of six prostate patients were used to generate sequences of predicted portal images. Errors were introduced into the treatment plans and the modified plans were delivered to a water-equivalent phantom. Four different metrics were used to detect errors. These metrics were applied to a threshold-based method to detect the errors as soon as possible during the delivery, and also to a recurrent neural network consisting of four layers. A leave-two-out approach was used to set thresholds and train the neural network, and then to test the resulting systems.
Results: When using a combination of metrics in conjunction with optimal thresholds, the median segment index at which the errors were detected was 107 out of 180. When using the neural network, the median segment index for error detection was 66 out of 180, with no false positives. The neural network reduced the rate of false negative results from 0.36 to 0.24.
Conclusions: The recurrent neural network allowed the detection of errors around 30% earlier than when using conventional threshold techniques. By appropriate training of the network, false positive alerts could be prevented, thereby avoiding unnecessary disruption to the patient workflow.

Keywords: Artificial neural network; Electronic portal imaging device; In vivo dosimetry; Volumetric modulated arc therapy

Year: 2022 PMID: 35493850 PMCID: PMC9048084 DOI: 10.1016/j.phro.2022.03.004

Source DB: PubMed Journal: Phys Imaging Radiat Oncol ISSN: 2405-6316

Introduction

Portal dosimetry is widely used to ensure the dosimetric accuracy of radiotherapy delivery [1], [2], [3], [4]. In the case of forward-projection, portal images are predicted at the time of treatment planning, and then measured images are compared with these [5], [6], [7], and in the case of back-projection, measured images are projected onto the CT scan of the patient and converted into a dose distribution, which is then compared with the planned dose distribution [8], [9], [10], [11], [12]. Groups of images are selected to represent the segments of volumetric modulated arc therapy (VMAT) [13], [14]. Usually, images for completed fractions of treatment are analysed. However, there is growing interest in analysing the measured images as the treatment fraction proceeds. In this way, it is possible to identify errors before significant dosimetric impact occurs for the patient [15], [16], [17], [18], [19], particularly for hypofractionated treatments [20], which are becoming increasingly commonplace [21], [22], [23]. The real-time method is time-resolved, which also has its own advantages in giving a more thorough analysis than when using integrated images or dose [24], [25]. Typically, errors are detected by setting a series of thresholds for a number of image features or measures, and then watching for the measures to exceed the thresholds [26], preferably avoiding false positives, which are disruptive in the real-time context [27]. Use of an accurate prediction model is an important means of providing sensitivity to errors while avoiding false positives. However, another possible means of increasing reliability is to use an artificial neural network. 
Simple neural networks have been used in the radiotherapy context before, such as for prediction of biological outcomes [28] and for pre-treatment quality assurance [29], and more complex neural networks are increasingly used in radiotherapy for deep learning in structure delineation and treatment planning [30], [31], [32], [33]. However, they have so far not been used in the context of error detection in portal dosimetry. This study therefore investigated the training of a simple artificial neural network to detect errors based on the supplied image measures at each time point. The study was a proof of principle of a recurrent neural network (RNN) approach, using VMAT treatment of the prostate as an illustration.

Materials and methods

There were several types of neural network that could be used for this application, but the RNN was used in this study because it could not only learn from training data, but also had the ability to learn from, and adapt to, a temporal series of inputs, such as the image measures at each segment of a VMAT arc. The study used the forward-projection method of portal dosimetry and a variety of deliberate errors. The differences between the measured and predicted images were investigated firstly using multiple separate metrics (MSM) and related thresholds and then with the use of an RNN, so as to quantify the timeliness with which each method was able to detect the errors.

Patients and treatment plans

Treatment plans for radiotherapy of the prostate were created using AutoBeam v5.8 [34] for 60 Gy in 20 fractions with the 6 MV beam of a VersaHD linear accelerator (Elekta AB, Stockholm, Sweden) [35], [36]. For six patients who gave their consent for their images to be used for research, predicted portal images were retrospectively produced for each segment of the VMAT arcs and input to AutoDose v1.1 software for comparison with real-time images [19] (Fig. 1). AutoBeam was also used to recalculate the plans and predicted images on a water-equivalent phantom of dimensions 300 mm long (G-T direction) × 300 mm wide (A-B direction) × 200 mm high, with the isocentre located at the centre of the phantom.

Fig. 1

An analysis of a volumetric modulated arc therapy treatment plan for a patient delivery, seen in AutoDose v1.1. The main panel shows the mean image difference as a percentage of local image intensity for sections of arc consisting of 10 segments. The inset (lower right) shows the expected and actual images for a single section of arc, together with horizontal and vertical profiles through the central axis (Data 1 – expected image, Data 2 – actual image).

Measured images

Errors were deliberately introduced into all 180 segments of the treatment plans and both the normal and erroneous plans were then delivered to a Solid Water phantom (Radiation Measurements, Inc., Middleton, WI). The errors consisted of a 2–10% increase in monitor units in 2% steps, a retraction of 2–10 mm in 2 mm steps of all multileaf collimator (MLC) leaves, a shift of 2–10 mm in 2 mm steps of all MLC leaves, and introduction of an air space of 10–50 mm width in 10 mm steps into the phantom to simulate rectal gas [37]. In three patients, all error cases were simulated, and in a further three patients, only the error-free case and 4% increase in monitor units, 4 mm MLC retraction, 4 mm MLC shift and 20 mm air space were simulated. Portal images were recorded using an iViewGT imaging panel (Elekta) and analysed using AutoDose, which allocated the images to control points of the treatment plan [19].

Image metrics and selection of thresholds

At each segment of the VMAT plan, four measures of agreement between predicted and measured images were calculated: central axis signal, mean image value, root-mean-square difference as a percentage of global maximum and root-mean-square difference as a percentage of local prediction. These simple difference measures were used in preference to more complex difference measures as the intention was to identify differences, however small spatially or temporally, and then to let the error detection stage work with these. The first 10% of segments were neglected as the images were not stable in this period. The startup of the linear accelerator, estimated to affect the first 1% of segments, may have contributed to this instability. After the first 10% of segments, a running sum of 10 segments was used. For comparison purposes, MSM was applied: for each metric, the threshold was taken as the median + 2 × range of the maximum values of that metric over the cases under consideration, and image metrics exceeding these thresholds signified errors.
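The thresholding scheme described above can be sketched in a few lines of Python. This is an illustration only, not the AutoDose implementation: the function names are hypothetical, the running-sum window is assumed to lie wholly after the neglected segments, and "exceeding" is assumed to mean a simple greater-than comparison on one metric.

```python
from statistics import median


def running_sum(values, window=10, skip_fraction=0.10):
    """Running sum over `window` segments, ignoring the unstable start.

    Assumption: the window only uses segments after the skipped fraction.
    Returns a dict mapping the (0-based) last segment of each window to its sum.
    """
    start = int(round(skip_fraction * len(values)))
    sums = {}
    for i in range(start + window - 1, len(values)):
        sums[i] = sum(values[i - window + 1 : i + 1])
    return sums


def msm_threshold(case_maxima):
    """Threshold = median + 2 x range of the per-case maxima of one metric."""
    return median(case_maxima) + 2 * (max(case_maxima) - min(case_maxima))


def first_detection(values, threshold, window=10, skip_fraction=0.10):
    """Index of the first segment whose running sum exceeds the threshold."""
    for index, total in running_sum(values, window, skip_fraction).items():
        if total > threshold:
            return index
    return None  # error never detected (a false negative for an error case)


# Synthetic example: a metric that steps up at segment 100 of 180.
threshold = msm_threshold([1.0, 2.0, 4.0])
detected_at = first_detection([0.5] * 100 + [1.0] * 80, threshold)
```

In the multi-metric case, one such threshold would be held per metric and the earliest detection across the four metrics reported; that combination step is left out here because the source does not spell it out.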

Recurrent neural network

The four measures were applied to an RNN [38] consisting of four layers of gated recurrent units (GRUs), with four nodes in the first layer, eight in the second layer, four in the third layer and one in the final layer. The function of the GRU was exactly as defined by Cho et al. [39]. For training and testing, a leave-two-out cross-validation strategy was used [40], [41]. Four of the patients were used to train the network, and the remaining two patients were used to test the result. Of the four patients used for training, two were from patients 1–3, for which a full set of error cases was available, and the other two were from patients 4–6, for which only representative errors were available (see the Measured images section). There were therefore nine unique combinations of patients for testing, so the RNN was trained and tested nine times. For example, firstly patients 1 and 4 were retained for testing, so patients 2, 3, 5, and 6 were used for training. Then patients 1 and 5 were retained for testing, so patients 2, 3, 4 and 6 were used for training, etc.

Using p to index the P training patients, e to index the E + 1 error types (e = 0 representing no error), s to index segments after exclusion of the first 19 segments and the vector w to represent the W weights of the RNN, the objective function for training was defined in equation (1) in terms of four quantities and a regularisation term. The first quantity was an importance factor to avoid false positives. The second was an error-specific factor to ensure that the larger errors were detected; it depended on M, the physical ranking of the error, i.e. 1 to 5 according to a monitor unit increase of 2% to 10% etc. The third was a segment-specific factor, emphasising the importance of early segments in normal cases and late segments in error cases. Finally, the fourth provided a quadratic penalty from the “off” state for normal cases and from the “on” state for error cases, expressed in terms of y, the output of the network, with y > 0 signifying an error and y < 0 signifying normal delivery. 
The final term in equation (1) was an L2 norm to prevent overfitting to the training data. This was applied to the W primary weights of the network, excluding the hidden state, update and reset weights, using an empirically-determined value of 40 for the regularisation parameter, λ. To further avoid false positives, indices of e for which M = 1, i.e. 2% increase in monitor units, 2 mm aperture opening etc, were also defined as normal (no-error) cases. Due to the non-convexity of the objective function, a random search algorithm was used for training. The software was run on a SPARC T4-2 server with 128 hyper-threads (Oracle Corporation) using a separate execution thread for each of the nine combinations of training and testing. To visualise real-time performance, the network trained on patients 2, 3, 5, and 6 was applied to errors for patient 1. The final validation was to apply the RNN to actual patient images for four patients (A-D) different to those used for the phantom study. All of these treatments were considered to be normal deliveries, but the images for patient D were re-acquired on further occasions (in a non-real-time workflow) and were taken as an example of images that the medical physicist was not satisfied with.
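The nine train/test combinations of the leave-two-out strategy can be enumerated as follows. This is a minimal sketch for illustration; the function and constant names are hypothetical and are not taken from the study software.

```python
from itertools import product

FULL_ERROR_SET = [1, 2, 3]   # patients with every error case simulated
REPRESENTATIVE = [4, 5, 6]   # patients with representative errors only


def leave_two_out_splits(group_a=FULL_ERROR_SET, group_b=REPRESENTATIVE):
    """Hold out one patient from each group for testing; train on the other four."""
    splits = []
    for test_a, test_b in product(group_a, group_b):
        test = (test_a, test_b)
        train = tuple(p for p in group_a + group_b if p not in test)
        splits.append((train, test))
    return splits


for train, test in leave_two_out_splits():
    print(f"train on {train}, test on {test}")
```

The first two splits produced are training on patients 2, 3, 5, 6 with testing on 1 and 4, then training on 2, 3, 4, 6 with testing on 1 and 5, matching the examples given in the text.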

Results

Training the recurrent neural network

Training and testing of the network required around 50 h. Over this time, the training progressed steadily, with the objective function converging to a similar value for the nine data sets (Fig. 2). Benefits were observed in timeliness of error detection with the RNN for monitor unit, aperture shift and air gap errors. Importantly, there were no false positives in any of the error-free cases. For the training cases as a whole, the median segment index at which errors were detected was 105 (range 97 – 120) for MSM and 68 (range 52 – 75) for the RNN, with a median relative reduction of 0.57 (range 0.49 – 0.72). The delivery time was approximately 180 s for the 180 segments of these treatment plans, so in terms of time, each segment equated to approximately 1 s of delivery time. Thus, finding the error at segment 68 meant that approximately 68 s of delivery was completed when the error was detected. There were 186 false negatives, in which the error was not detected at all during the 180 segments, out of 432 errors for MSM, representing a ratio of 0.43. There were 100 false negatives out of 432 errors for the RNN, a ratio of 0.23.

Fig. 2

Training the recurrent neural network. (a) Network topology, (b) abstraction of one layer of the network, (c) training progress for the nine data sets, (d)-(g) Median index of the first segment at which each error is detected, as a function of error type and magnitude. White cross-hatching indicates that the error is not detected. C: central image signal, M: mean image value, G: root-mean-square error as a percentage of global maximum, L: root-mean-square error as a percentage of local signal, E: error, MSM: multiple separate metrics, RNN: recurrent neural network.

Testing the recurrent neural network

Testing showed that the RNN was most beneficial for errors in monitor units, aperture position and path length (Fig. 3). MSM were already effective in detecting errors in aperture opening, so in this case the RNN was less beneficial. The thresholds for central image signal and mean image value were exceeded in several instances for an aperture shift of 2 mm (Fig. 3c) but not for 4 mm, unrelated to the errors being introduced. The slightly worse performance of the RNN for larger aperture opening and aperture shift errors (Fig. 3b and 3c) was due to the L2 norm. This prevented overfitting, but meant that some of the obvious errors were not found until several segments after the MSM method.

Fig. 3

Median index of the first segment at which each error is detected, as a function of error type and magnitude, during testing. White cross-hatching indicates that the error is not detected. MSM: multiple separate metrics; RNN: recurrent neural network.

Testing results for a specific level of error were found to be broadly similar between patients (Fig. 4), although overall, there was some variation in the nine test samples (Table 1). Again, there were no false positives in any of the test results for error-free cases. There were 77 false negatives out of 216 errors for MSM, representing a ratio of 0.36. There were 52 false negatives out of 216 errors for the RNN, a ratio of 0.24.
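The false negative ratios quoted for training and testing follow directly from the reported counts. The short check below uses only the counts stated in the text; the dictionary layout and function name are illustrative choices.

```python
# (missed errors, total errors) as reported in the text
COUNTS = {
    ("training", "MSM"): (186, 432),
    ("training", "RNN"): (100, 432),
    ("testing", "MSM"): (77, 216),
    ("testing", "RNN"): (52, 216),
}


def false_negative_ratio(missed, total):
    """Fraction of introduced errors that were never detected in the 180 segments."""
    return missed / total


RATIOS = {key: round(false_negative_ratio(*value), 2) for key, value in COUNTS.items()}

for (phase, method), ratio in RATIOS.items():
    print(f"{phase:8s} {method}: {ratio:.2f}")
```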

Fig. 4

Table 1

Mean segment index at which errors are detected for multiple separate metrics with threshold and for a recurrent neural network, during testing.

PatientA	PatientB	Error Size*	MSM	RNN	Relative benefit†
1	4	Small	159	181	1.14
		Medium	129	38	0.29
		Large	78	23	0.29
		Overall	117	57	0.49
1	5	Small	159	105	0.66
		Medium	120	51	0.43
		Large	78	23	0.29
		Overall	113	51	0.45
1	6	Small	159	142	0.89
		Medium	130	60	0.46
		Large	78	23	0.29
		Overall	117	62	0.53
2	4	Small	114	181	1.59
		Medium	84	84	1.00
		Large	40	33	0.83
		Overall	74	83	1.12
2	5	Small	114	151	1.32
		Medium	92	61	0.66
		Large	38	32	0.84
		Overall	78	66	0.85
2	6	Small	115	103	0.90
		Medium	78	77	0.99
		Large	42	24	0.57
		Overall	72	63	0.88
3	4	Small	129	181	1.40
		Medium	131	72	0.55
		Large	59	74	1.25
		Overall	107	74	0.69
3	5	Small	129	181	1.40
		Medium	122	66	0.54
		Large	58	24	0.41
		Overall	102	71	0.70
3	6	Small	129	181	1.40
		Medium	131	80	0.61
		Large	59	24	0.41
		Overall	107	78	0.73
MEDIAN		Overall	107	66	0.70

MSM: multiple separate metrics; RNN: recurrent neural network.

* Small: 2% monitor unit increase, 2 mm aperture opening, 2 mm aperture shift, 10 mm air gap; medium: 4–6% monitor unit increase, 4–6 mm aperture opening, 4–6 mm aperture shift, 20–30 mm air gap; large: 8–10% monitor unit increase, 8–10 mm aperture opening, 8–10 mm aperture shift, 40–50 mm air gap.

† Relative benefit defined as quotient of RNN and MSM.
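The summary row of Table 1 can be reproduced from the nine "Overall" rows. The sketch below copies those values from the table and recomputes the medians and the relative benefit (RNN divided by MSM); the variable names are illustrative.

```python
from statistics import median

# (test patient A, test patient B): (MSM, RNN) mean segment index, "Overall" rows
OVERALL = {
    (1, 4): (117, 57), (1, 5): (113, 51), (1, 6): (117, 62),
    (2, 4): (74, 83),  (2, 5): (78, 66),  (2, 6): (72, 63),
    (3, 4): (107, 74), (3, 5): (102, 71), (3, 6): (107, 78),
}

median_msm = median(msm for msm, _ in OVERALL.values())
median_rnn = median(rnn for _, rnn in OVERALL.values())

# Relative benefit per test pair, then its median across the nine pairs
benefit = {pair: rnn / msm for pair, (msm, rnn) in OVERALL.items()}
median_benefit = median(benefit.values())

print(median_msm, median_rnn, round(median_benefit, 2))
```

A median relative benefit of 0.70 corresponds to detection around 30% earlier with the RNN, which is the figure quoted in the abstract and discussion. Only the pair (2, 4) has a ratio above 1, meaning the RNN was slower than MSM overall for that test pair.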

Fig. 4: Index of the first segment at which each error is detected, in the six patients separately, for a fixed level of error, during testing. White cross-hatching indicates that the error is not detected. MSM: multiple separate metrics; RNN: recurrent neural network.

In the real-time context, the RNN was found to be most active initially in the treatment delivery for the case of moderate errors (Fig. 5). The network failed to detect a 4% increase in monitor units (Fig. 4a), but successfully detected the other errors rapidly (Fig. 4b-d). After error detection, the signal did not change appreciably.

Fig. 5

Network output for patient 1 for several error cases. Results less than or equal to zero indicate absence of an error and results greater than zero indicate an error. The output in the grey region at the left is disregarded due to instability of the raw signals.

For the real patient images, deliveries for patients A-C were classified as normal, with a network output of close to −1. Those for patient D were identified very rapidly as abnormal, with the network output quickly moving to approach +1.

Discussion

The results show that in the context of forward-projection real-time portal dosimetry for prostate treatment delivery, the RNN is able to improve the timeliness of error detection by around 30%, compared to MSM. There is some variability in effectiveness of the RNN between error types and between patients. Implicitly, the thresholds of MSM are built in to the RNN in the form of the biases, but the more complex connectivity of the RNN is shown to provide a more effective result, similar to dose-volume histogram prediction [42]. The RNN is trained to detect particular types of errors for a particular treatment site, and there is no guarantee that it operates correctly for other errors or treatment sites. In other words, although the L2 norm prevents overfitting within the patients used, the model as a whole may be over-fitted to certain types of error and treatment site. However, by using general image difference measures, the present study gives an indication of what is likely to be achieved in a larger study using treatment plans of similar complexity.

There are relatively few studies focusing on real-time EPID dosimetry for VMAT, but it is possible to make some comparisons with other studies. The method behaves similarly to that of Woodruff et al. [17], except for the use of section images rather than integrated images. Compared to real-time MSM using site-specific control limits [15], which is able to detect monitor unit errors of 5% in static gantry intensity-modulated radiotherapy after about 23% of the delivery, the detection speed in the present study is slower, but the thresholds must be higher with VMAT due to the gantry rotation, which explains this effect. Monitor unit changes and aperture shifts of a similar magnitude to those in the present study can also be detected by back-projection in a non-real-time context [43], [44]. In the real-time situation, Spreeuw et al. [18] show that a 20 cGy dosimetric difference in the patient can be detected after around 10% of the delivery time for deliberately introduced serious errors in prostate radiotherapy. This is faster than either MSM or RNN in this study, but is expected to be so because of the magnitude of the errors. The study presented here is in agreement with Schyns et al. [25] that the time-resolved element is valuable in the forward-projection approach but that interpretation of any errors detected in terms of dose to the patient is not straightforward.

As with all studies using deliberate errors, the results must either be based on phantom studies or simulated measurements. For the former, used in this study, the anatomy is somewhat simplified, but the measurements include real variations in quality of panel output and calibration. Other uncertainties are the start-up of the accelerator, the initial instability of the images and the allocation of images to segments of the treatment plan. The method of using a running sum of images for a limited number of treatment plan segments is able to detect errors for parts of the VMAT arc, but this has not been fully demonstrated in this study as the introduced errors are present for the whole arc. However, the method of detecting errors in the whole plan does have the advantage that the timeliness of the detection can be quantified in an analogue manner, such as using segment number at which the error is detected, whereas the introduction of short errors means that the detection is binary, for example detected or not, which is then difficult to analyse in small data sets. It is also more important to detect and act upon persistent errors. Simulated measurements are easier to obtain, by taking predictions and applying noise, e.g. [45], but it is very difficult to ensure that the noise accurately represents the random and systematic errors that typically occur during operation of a portal dosimetry service [46], [47], [48].
In addition, the effectiveness of the portal dosimetry method depends on how accurate the prediction method is [43], [44]. The study does not address patient positioning errors, for which a method such as conebeam CT is more suitable, either separately from the portal dosimetry, or included within it [7], [44], [49]. However, it is likely that anatomical changes can be detected with improved accuracy using the RNN, particularly as this type of change may only impact on the portal images at particular gantry angles [24], [25].

Avoidance of false positive results is an important part of this approach, as a false positive error in the real-time context means that the patient’s treatment is paused while the error is investigated. False positives also add to the operator workload and encourage a lax attitude towards real errors when they occur. There are some false negative results in the study, mostly for the small error cases where the clinical impact is relatively small, but these are reduced in number by appropriate training of the RNN [50].

A logical progression of this work is to use a deep learning approach [30], [31], [51], [52] to analyse the predicted and measured images as a whole. Either the pixels of a difference map between the predicted and measured images, or the pixels of both of the images separately could be applied to the inputs. A convolutional stage could detect specific image features which might be indicative of errors.

The RNN presented in this study, taking as input several measures of difference between predicted and measured images, can be used to provide timely indication of errors during real-time portal dosimetry. In this simulation study of forward-projection portal dosimetry for prostate VMAT, a variety of errors are detected around 30% earlier than when using the image difference measures alone in a threshold-based approach. The leave-two-out strategy used in this feasibility study gives an indication of the benefit likely to be observed in a larger cohort of similarly complex VMAT treatments.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

46 in total

1. Conventional versus hypofractionated high-dose intensity-modulated radiotherapy for prostate cancer: preliminary safety results from the CHHiP randomised controlled trial.

Authors: David Dearnaley; Isabel Syndikus; Georges Sumo; Margaret Bidmead; David Bloomfield; Catharine Clark; Annie Gao; Shama Hassan; Alan Horwich; Robert Huddart; Vincent Khoo; Peter Kirkbride; Helen Mayles; Philip Mayles; Olivia Naismith; Chris Parker; Helen Patterson; Martin Russell; Christopher Scrase; Chris South; John Staffurth; Emma Hall
Journal: Lancet Oncol Date: 2011-12-12 Impact factor: 41.316

2. Validation of a method for in vivo 3D dose reconstruction for IMRT and VMAT treatments using on-treatment EPID images and a model-based forward-calculation algorithm.

Authors: Eric Van Uytven; Timothy Van Beek; Peter M McCowan; Krista Chytyk-Praznik; Peter B Greer; Boyd M C McCurdy
Journal: Med Phys Date: 2015-12 Impact factor: 4.071

3. Quantifying the performance of in vivo portal dosimetry in detecting four types of treatment parameter variations.

Authors: C Bojechko; E C Ford
Journal: Med Phys Date: 2015-12 Impact factor: 4.071

4. DoseNet: a volumetric dose prediction algorithm using 3D fully-convolutional neural networks.

Authors: Vasant Kearney; Jason W Chan; Samuel Haaf; Martina Descovich; Timothy D Solberg
Journal: Phys Med Biol Date: 2018-12-04 Impact factor: 3.609

Review 5. Deep learning in medical imaging and radiation therapy.

Authors: Berkman Sahiner; Aria Pezeshk; Lubomir M Hadjiiski; Xiaosong Wang; Karen Drukker; Kenny H Cha; Ronald M Summers; Maryellen L Giger
Journal: Med Phys Date: 2018-11-20 Impact factor: 4.071

6. Safety and Efficacy of a Five-Fraction Stereotactic Body Radiotherapy Schedule for Centrally Located Non-Small-Cell Lung Cancer: NRG Oncology/RTOG 0813 Trial.

Authors: Andrea Bezjak; Rebecca Paulus; Laurie E Gaspar; Robert D Timmerman; William L Straube; William F Ryan; Yolanda I Garces; Anthony T Pu; Anurag K Singh; Gregory M Videtic; Ronald C McGarry; Puneeth Iyengar; Jason R Pantarotto; James J Urbanic; Alexander Y Sun; Megan E Daly; Inga S Grills; Paul Sperduto; Daniel P Normolle; Jeffrey D Bradley; Hak Choy
Journal: J Clin Oncol Date: 2019-04-03 Impact factor: 44.544

7. Clinical implementation and rapid commissioning of an EPID based in-vivo dosimetry system.

Authors: Ian M Hanson; Vibeke N Hansen; Igor Olaciregui-Ruiz; Marcel van Herk
Journal: Phys Med Biol Date: 2014-09-11 Impact factor: 3.609

8. Site-specific alert criteria to detect patient-related errors with 3D EPID transit dosimetry.

Authors: Igor Olaciregui-Ruiz; Roel Rozendaal; Ben Mijnheer; Anton Mans
Journal: Med Phys Date: 2018-11-16 Impact factor: 4.071

9. Optimisation of a composite difference metric for prompt error detection in real-time portal dosimetry of simulated volumetric modulated arc therapy.

Authors: James L Bedford; Ian M Hanson
Journal: Br J Radiol Date: 2021-03-18 Impact factor: 3.039

10. Investigation of a real-time EPID-based patient dose monitoring safety system using site-specific control limits.

Authors: Todsaporn Fuangrod; Peter B Greer; Henry C Woodruff; John Simpson; Shashank Bhatia; Benjamin Zwan; Timothy A vanBeek; Boyd M C McCurdy; Richard H Middleton
Journal: Radiat Oncol Date: 2016-08-12 Impact factor: 3.481
