Validation and Comparison of Radiograph-Based Organ Dose Reconstruction Approaches for Wilms Tumor Radiation Treatment Plans.

Ziyuan Wang¹, Marco Virgolin², Brian V Balgobind¹, Irma W E M van Dijk¹, Susan A Smith³, Rebecca M Howell³, Matthew M Mille⁴, Choonsik Lee⁴, Choonik Lee⁵, Cécile M Ronckers⁶,⁷, Peter A N Bosman², Arjan Bel¹, Tanja Alderliesten¹,⁸.

Abstract

Purpose: Our purpose was to validate and compare the performance of 4 organ dose reconstruction approaches for historical radiation treatment planning based on 2-dimensional radiographs. Methods and Materials: We considered 10 patients with Wilms tumor with planning computed tomography images for whom we developed typical historic Wilms tumor radiation treatment plans, using anteroposterior and posteroanterior parallel-opposed 6 MV flank fields, normalized to 14.4 Gy. Two plans were created for each patient, with and without corner blocking. Regions of interest (lungs, heart, nipples, liver, spleen, contralateral kidney, and spinal cord) were delineated, and dose-volume metrics including organ mean and minimum dose (Dmean and Dmin) were computed as the reference baseline for comparison. Dosimetry for the 20 plans was then independently reconstructed using 4 different approaches. Three approaches involved surrogate anatomy, among which 2 used demographic-matching criteria for phantom selection/building, and 1 used machine learning. The fourth approach was also machine learning-based, but used no surrogate anatomies. Absolute differences in organ dose-volume metrics between the reconstructed and the reference values were calculated.
Results: For Dmean and Dmin (average and minimum point dose) all 4 dose reconstruction approaches performed within 10% of the prescribed dose (≤1.4 Gy). The machine learning-based approaches showed a slight advantage for several of the considered regions of interest. For Dmax (maximum point dose), the absolute differences were much higher, that is, exceeding 14% (2 Gy), with the poorest agreement observed for near-beam and out-of-beam organs for all approaches. Conclusions: The studied approaches give comparable dose reconstruction results, and the choice of approach for cohort dosimetry for late effects studies should still be largely driven by the available resources (data, time, expertise, and funding).

Year: 2022 PMID: 36060631 PMCID: PMC9429523 DOI: 10.1016/j.adro.2022.101015

Source DB: PubMed Journal: Adv Radiat Oncol ISSN: 2452-1094

Introduction

Childhood cancer survivors often experience treatment-related late adverse effects (LAEs), which have been linked to an increased risk of chronic morbidity and mortality.1, 2, 3, 4 In particular, radiation treatment (RT) is a component of approximately half of all cancer treatments and is associated with a number of LAEs, such as second malignant neoplasms and cardiovascular disease.6, 7, 8 Dose-response relationships can be translated into contemporary RT planning by providing dose constraints to organs-at-risk to help mitigate RT-related LAEs in future childhood cancer survivors. To date, most large retrospective cohorts of childhood cancer survivors include individuals treated in the pre-computed tomography (CT) era of RT (before approximately 1999), for whom treatment was primarily based on conventional 2-dimensional (2D) radiographs. For these cohorts, doses to organs of interest are not directly available, so methods must be applied to reconstruct the radiation dose; these methods often involve the use of physical or computational phantoms (3D models of human anatomy) or recent planning CTs of other patients as surrogate anatomies to overcome the lack of 3D anatomic imaging. The mean organ dose and prescribed dose to individual body regions are the most commonly reported dose metrics.13, 14, 15 As LAEs are found to be related to dose in specific organs (eg, second malignant neoplasms) or their subvolumes (eg, coronary artery), there is a growing interest in using more refined dose and dose-volume metrics for dose-response modeling of LAEs. To provide more refined dose information, dose reconstruction is performed from the limited information available.
Before dose reconstruction, patients’ historic RT records must be abstracted for patient demographics (sex, age at RT, height, and weight) and specific treatment details, including beam energy, dose fractionation, field geometry, blocking details, field weighting, field location, and anatomic field borders. In some cases, field geometry superimposed on 2D radiographs is available. Figure 1a is an example of a historical 2D radiograph used for field placement. The surrogate anatomy is chosen based on the available patient demographics, and the historical RT plan is then simulated on the surrogate anatomy using the available treatment details. The resulting dose distribution can then be used to derive 3D organ dose-volume metrics for dose-response analysis.

Fig. 1

A, An example of a historical 2-dimensional radiograph with field geometry (anteroposterior; AP) is shown. The white corners, made by 4 thin lead wires placed on the patient's body, depict the field boundaries. The white cross indicates the field isocenter, and the ruler indicates the field size. B, An example of a digitally reconstructed radiograph (derived from computed tomography) with field geometry (AP) is shown. The solid yellow lines depict the effective field boundary. The red point, which is the cross of the dashed yellow lines, indicates the field isocenter.

There exist several options for the surrogate anatomy, each with its pros and cons. Stylized computational phantoms have a simplified geometric representation of human anatomy but can be easily scaled to different sizes and adapted to include additional organs or organ substructures. Stylized computational phantoms are commonly used for dosimetry in large retrospective childhood cancer survivor cohorts. Voxel computational phantoms, on the other hand, are created from 3D medical images of real patients of specific ages and thus are more realistic but are rigid and cannot be flexibly scaled or repositioned. Advanced boundary representation modeling methods offer a hybrid approach, which enables organ reshaping and repositioning while still allowing for realistic anatomic representation. Such an approach has been used to develop a number of computational phantoms to represent average anatomies of the adult and pediatric populations20, 21, 22 based on CT images and to represent different weight percentiles, but these phantoms are typically only available for discrete ages. Patient-specific CT images with organ delineations provide an alternative type of surrogate anatomy. Due to the routine use of CT in RT planning, it is relatively easy to prepare and import a surrogate CT scan into a treatment planning system (TPS) for dose calculation compared with the computational phantoms.
However, the planning CT scans typically only include anatomy near the site of treatment, whereas LAEs such as second solid tumors may occur throughout the entire body. There are different sources of uncertainty in RT dose reconstruction. One source of uncertainty comes from the difference between the surrogate anatomy and the historical patient's unknown anatomy during treatment. The most commonly used patient-to-surrogate matching criteria are age and sex. Different patient-to-surrogate matching criteria have also been investigated based on available patient demographics, such as height and weight percentiles, and water equivalent diameter (defined as the average over the scanned range of the body). When a 2D radiograph of the historical patient is available, more features can be used for dose reconstruction (eg, using 2D radiographs to guide 3D organ deformation). Recently, machine learning (ML) has been leveraged in dose reconstruction approaches based on data sets of CT scans to predict 3D information (anatomy or dose) from the available 2D features. Another source of uncertainty comes from the characterization of radiation beams used in the dose reconstruction calculations. Different dose calculation algorithms, such as the model-based algorithms encountered in TPSs, Monte Carlo radiation transport simulation, and measurement-based algorithms, are used to estimate dose distributions in the surrogate anatomies. However, studies have shown that the dose calculation algorithms in commercial TPSs underestimate out-of-field doses, while Monte Carlo simulations are more accurate. Because reconstructed doses are used for dose-response modeling, the uncertainty in the reconstructed dose should be reported and incorporated into the models. This is crucial for developing robust dose-response models that can be directly translated into contemporary RT planning, that is, used to define objectives for organs-at-risk.
However, only a few study groups have reported a validation of their dose reconstruction approaches. In this study, we carried out a collaborative effort among several institutions to validate and compare 4 different dose reconstruction approaches. To simulate ground truth organ doses for the validation, we used CT scans of recently treated patients with Wilms tumor and their clinical plans (with small adaptations) as reference data. For each of the contemporary patients, we extracted data analogous to what would be available in typical historic RT records and asked the institutions to independently reconstruct the RT dose according to their previously published approaches.

Methods and Materials

Patient cohort, plan design, and organ delineations

We investigated Wilms tumor dose reconstructions because this type of kidney cancer is one of the most common types of childhood cancer in the abdominal region and the RT flank fields have not changed significantly over several decades.32, 33, 34 We considered all pediatric patients (18 patients in total) treated for Wilms tumor between 2004 and 2016 in our hospital with a planning CT and complete treatment record. By creating an overview of a set of characteristics associated with these patients (ie, age, sex, height, weight, tumor laterality, treatment field) and considering the representativeness of these characteristics for the Wilms tumor patient population, we selected 10 patients with Wilms tumor. For these 10 patients, the age at the time of RT planning CT ranged from 2.5 to 5.5 years. A common longitudinal field-of-view shared by these CTs was between the 10th thoracic vertebra (T10) and the 1st sacral vertebra (S1). Detailed patient information can be found in Table 1.

Table 1

Characteristics of the 10 patients and the associated RT plans (2 plans per patient)

Patient	Age (y)	Sex	Height (cm)	Weight (kg)	Tumor laterality	Field cranial/caudal borders	Plan ID	Shielding (yes/no)	Shielded region border
P1*	3.0	M	92	13.0	Right	T9/S1	P1_R	No
							P1_RB	Yes	Right body contour
P2	3.1	F	99	16.0	Right	T8/L4	P2_R	No
							P2_RB	Yes	T8
P3	3.9	F	108	17.5	Right	T8/L4	P3_R	No
							P3_RB	Yes	T8, T9, part of liver
P4	4.7	M	123	27.0	Right	T11/L4	P4_R	No
							P4_RB	Yes	Right rib 9-10
P5	4.8	F	110	15.0	Right	T10/S1	P5_R	No
							P5_RB	Yes	Right rib 9-10
P6*	5.1	F	122	22.0	Right	T10/S2	P6_R	No
							P6_RB	Yes	Right rib 10
P7	2.5	F	93	14.0	Left	T9/S2	P7_L	No
							P7_LB	Yes	S2
P8	4.2	M	106	15.5	Left	T11/L4	P8_L	No
							P8_LB	Yes	Left rib 10
P9	4.2	F	115	20.0	Left	T10/L4	P9_L	No
							P9_LB	Yes	Left rib 9-10
P10	5.5	M	116	18.0	Left	T10/S1	P10_L	No
							P10_LB	Yes	Left rib 10

Abbreviations: F = female; L = left-sided plan; LB = left-sided plan with a block; M = male; R = right-sided plan; RB = right-sided plan with a block; RT = radiation treatment.

*The prescribed dose of P1 and P6 was rescaled to 14.4 Gy in the dose reconstruction analysis.

T1 through T12 represent the 12 thoracic vertebrae; L1 through L5 represent the 5 lumbar vertebrae; S1 through S5 represent the 5 sacral vertebrae.

For each patient (P1-10) we created 2 typical Wilms tumor RT plans with 6 MV anteroposterior and posteroanterior parallel-opposed flank fields. The first plan involved open fields (P1-10R or L, where R or L refers to a right-sided or left-sided field) and the second (P1-10RB or LB) included small corner blocks (see block information in Table 1). The plans were developed by a pediatric radiation oncologist using the Oncentra TPS (version 4.3; Elekta AB, Stockholm, Sweden). We delineated regions of interest on the CT including lungs, heart, nipples, liver, spleen, contralateral kidney (ipsilateral kidneys surgically removed before RT), and a subvolume of the spinal cord from T10 to S1 using Velocity (version 3.2.0; Varian Medical Systems, Inc, Palo Alto, Calif, USA). For the lungs and heart, we delineated the portions of these organs that were imaged in the CT scans (for 8 and 5 out of 10 patients, the CT scans did not include complete lungs or heart, respectively).

Reference dose calculation

For validation purposes, reference dose values were extracted from the 3D dose distributions of the designed RT plans calculated on the CTs by the Oncentra TPS. All plans were designed assuming an Elekta LINAC treatment machine using 6 MV photons. A collapsed cone algorithm was used to calculate the dose, which was reported to achieve good in-field and near-field dose calculation accuracy. The organ dose-volume metrics that were considered included mean dose (Dmean), minimum dose (Dmin), maximum dose (Dmax), and the percentage of organ volume receiving at least 5 Gy and 10 Gy dose (V5 and V10, respectively). Here, minimum dose and maximum dose refer to the minimum and maximum point dose to a region of interest in the reconstructed dose matrix. For the nipples, only Dmean was considered (as the nipple volume was small). For those patients where the volume of the lungs or heart was truncated (8/10 and 5/10, respectively), only Dmax values representing the highest dose were used in the analysis.
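The dose-volume metrics defined above can be illustrated with a minimal sketch. It assumes that a region of interest is available as a flat list of point doses (in Gy) for equally sized voxels; the function name `dose_volume_metrics` and the example values are hypothetical and are not taken from the study or from any TPS export format.

```python
from statistics import mean


def dose_volume_metrics(voxel_doses_gy, thresholds_gy=(5.0, 10.0)):
    """Summarize point doses (Gy) of equally sized voxels in one region of interest.

    Assumption: every voxel has the same volume, so the volume fraction
    receiving at least a threshold dose equals the fraction of voxels.
    """
    if not voxel_doses_gy:
        raise ValueError("region of interest contains no voxels")
    n_voxels = len(voxel_doses_gy)
    metrics = {
        "Dmean": mean(voxel_doses_gy),  # mean dose over the organ
        "Dmin": min(voxel_doses_gy),    # minimum point dose
        "Dmax": max(voxel_doses_gy),    # maximum point dose
    }
    for threshold in thresholds_gy:
        covered = sum(1 for dose in voxel_doses_gy if dose >= threshold)
        # V5 / V10: percentage of organ volume receiving at least 5 / 10 Gy
        metrics[f"V{threshold:g}"] = 100.0 * covered / n_voxels
    return metrics


# Illustrative (made-up) point doses for a five-voxel organ.
example = dose_volume_metrics([0.5, 2.0, 6.0, 11.0, 14.4])
print(example)
```

For a truncated organ, only the `Dmax` entry of such a summary would be meaningful, which matches the restriction applied to the lungs and heart in this study.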

Preparation of input data for dose reconstruction

The following sections summarize the instructions shared among the participating institutes performing independent dose reconstructions according to their particular approach.

Digitally reconstructed radiographs with plotted field

Digitally reconstructed radiographs were generated from the CTs using the built-in module in the TPS for each beam's eye view (ie, anteroposterior and posteroanterior) with the field geometry plotted on top of it to simulate historical radiographs (Fig 1b). We selected an enhancement setting (min/max CT data threshold -300/3095, center 1500, width 3000, bone threshold 100, and bone enhancement factor 2.5) in Oncentra TPS that gave similar contrast (based on a visual check) as historical radiographs.

Data coding forms to describe patient and plan information

The RT record abstraction was prepared according to the data coding forms proposed by Stovall et al. Patient information such as name and date of birth were anonymized. The RT details were abstracted and then checked by an experienced dosimetrist. Abstraction followed the methods described in Howell et al and is briefly summarized here. In total, the data coding forms consisted of 3 pages. The first page was used to collect basic information such as the maximum target dose to each body region, which is defined as the sum of the prescribed dose from all overlapping fields, that is, the anteroposterior and posteroanterior fields. The second page included details on prescription(s) and treatment field parameters, including dose, orientation, energy, field size, weighting, shielding, and anatomic borders. The third page was used to collect information about the proximity of organs of interest to the treatment fields, solely based on visually checking the prepared 2D radiographs. Proximity to the treatment field was specified as in-beam, at beam edge, near-beam, out-of-beam, or shielded.

Dose reconstruction approaches

The 4 dose reconstruction approaches included in this study are listed below; the processes are summarized in Figure 2 and described in detail in the following paragraphs. Examples of the surrogate anatomies used by approaches 1 to 3 are illustrated in Figure 3.

Fig. 2

An illustration of steps taken by the 4 different dose reconstruction approaches given the same input data, that is, data coding forms and historical-like radiographs of the 2 beams.

Fig. 3

A, An illustration of the stylized computational phantom showing the organs (represented by 3-dimensional grids of points) used by approach 1. Note that since the time of this study, the heart model in this phantom has been updated. B, A coronal view of an example of a computational phantom used by approach 2. The colored regions are representations of the segmented organs. C, A front sectional view of a patient-specific surrogate anatomy constructed by approach 3. The colored regions are representations of the “implanted” organs.

Approach 1: An age-scaled stylized computational phantom-based approach.9, 18
Approach 2: A multiple-feature matched computational phantom-based approach.10, 31
Approach 3: A surrogate anatomy ML-based approach.
Approach 4: A surrogate-free ML-based approach.42, 44

Approach 1

The details of the age-scalable stylized computational phantom-based approach and its use in dose reconstructions for late effects studies are described in the literature.9, 18 The computational phantom consists of rectangular cuboids for the head, neck, trunk, arms, and legs; organs are specified by 3D grids of evenly spaced points. For each of the 10 patients, the phantom was scaled to their age at RT by applying 3D scaling functions that account for nonuniform growth of different body regions. Organs for each phantom were also scaled to age at RT according to the scaling functions that were applied to each of the respective body regions. RT plans were then reconstructed on the age-scaled phantoms based on the field parameters in the coding forms and a visual check of field placements compared with the radiographs. Doses to all points in each organ were calculated using analytical dose models, from which Dmax, Dmean, and Dmin were reported. For the right and left nipple, doses were reported for a single point on each side. For the spinal cord, doses were the average of the central point in each vertebra (T10, T11, L1 to L5, and S1). For the spleen, no dose was reported because a dose grid that represents the spleen is not available in the computational phantom. For this study, only the RT plans with rectangular open-beam fields (n = 10) were reconstructed using approach 1; however, in principle, blocking is also possible.

Approach 2

The multiple-feature matched computational phantom-based approach, based on a library (n = 351) of whole-body computational phantoms covering a large portion of the population in the United States in terms of age, sex, height, and weight, was previously developed.20, 22 The approach considers multiple features (eg, age, sex, height, and weight) when selecting the surrogate phantom, depending on what is available for a particular study. In this study, patient height, weight, and sex were provided in the data coding forms and were used to select the closest matched phantom from the phantom library as the surrogate anatomy. Next, the phantom (in the format of Digital Imaging and Communications in Medicine files) was imported into a commercial TPS, Eclipse (Varian Medical Systems, Palo Alto, Calif, USA). RT plans were reconstructed based on the data coding forms and radiographs. Once the RT plans were created, the plan data were exported from the TPS for organ dose calculation using an RT-dedicated Monte Carlo transport code. Additional details on this approach are available in the literature.10, 31 Right and left nipple doses were not reported as these structures were not explicitly defined in the phantom. Point doses in T10-S1 vertebrae were reported as a surrogate for the dose to the spinal cord.
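The phantom selection step can be sketched as a nearest-neighbor lookup. The text does not state how "closest matched" is defined, so the scaled Euclidean distance, the scale factors, the `Phantom` record, and the three library entries below are all assumptions made for illustration; they do not describe the actual phantom library or its matching rule.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Phantom:
    """Hypothetical library entry; real phantoms carry full 3D anatomy."""
    phantom_id: str
    sex: str          # "F" or "M"
    height_cm: float
    weight_kg: float


def closest_phantom(library, sex, height_cm, weight_kg,
                    height_scale_cm=10.0, weight_scale_kg=5.0):
    """Return the same-sex phantom nearest in (height, weight).

    Assumption: distance is Euclidean after dividing each difference by a
    scale factor, so that centimeters and kilograms become comparable.
    """
    candidates = [p for p in library if p.sex == sex]
    if not candidates:
        raise ValueError(f"no phantom of sex {sex!r} in the library")

    def distance(phantom):
        return hypot((phantom.height_cm - height_cm) / height_scale_cm,
                     (phantom.weight_kg - weight_kg) / weight_scale_kg)

    return min(candidates, key=distance)


# Made-up three-entry library; the library in the study contains 351 phantoms.
library = [
    Phantom("F_095cm_15kg", "F", 95.0, 15.0),
    Phantom("F_105cm_15kg", "F", 105.0, 15.0),
    Phantom("M_100cm_15kg", "M", 100.0, 15.0),
]

# Patient P2 from Table 1: female, 99 cm, 16.0 kg.
match = closest_phantom(library, sex="F", height_cm=99.0, weight_kg=16.0)
print(match.phantom_id)
```

Filtering by sex before computing the distance treats sex as a hard constraint rather than as one more term in the distance, which is one of several plausible design choices.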

Approach 3

Approach 3 is the latest extension of earlier work. The approach incorporates ML to automatically construct patient-specific phantoms. Among the several ML models we tested, we selected the model resulting from the gene-pool optimal mixing evolutionary algorithm for genetic programming (GP-GOMEA) for this study. GP-GOMEA is a state-of-the-art algorithm for learning interpretable ML models in the form of mathematical expressions, and, in particular, GP-GOMEA was recently shown to have better prediction performance than several other models (eg, least absolute shrinkage and selection operator and random forest) in the task of constructing individualized phantoms. The training data set was similar to that used in previous work, with some enhancements. Specifically, the training data set included a larger database of 136 CTs of pediatric patients with cancer, in the age range of 1 to 8 years, with more organs delineated. For each of the 136 CT scans, various features analogous to those available in historical radiographs were extracted from digitally reconstructed radiographs. Multiple ML models (1 per ROI) were trained to separately predict the most similar organs and body contours, and the most likely location of each organ's center of mass, based on the extracted features.12, 38 Next, the predicted best-matching organs (which may belong to different surrogate CTs) were automatically “virtually implanted” at the predicted locations within the predicted body contour, forming a composite patient-specific phantom. A list of the features of the 10 patients, extracted from the coding forms and 2D radiographs, was used as input for this approach. The result was 10 patient-specific phantoms, which were then imported into the Oncentra TPS for manual reconstruction of the RT plans on the phantom, as described previously. Doses were calculated by the TPS based on the surrogate anatomy using the same collapsed cone algorithm as was used for the reference dose calculation.

Approach 4

Approach 4 is a dose prediction approach based on ML, which does not require any surrogate anatomy. For this approach, we also used the ML algorithm GP-GOMEA to build the models. In the implementation of the approach for this study, ML models were generated to directly predict organ dose-volume metrics given a list of available 2D patient and plan features. The training data included 136 abdominal CTs of patients between 1 and 8 years old and 300 artificial Wilms tumor RT plans. The artificial RT plans were automatically generated by sampling within plan border ranges defined by an experienced clinical oncologist. Each plan was simulated on each CT, resulting in a total of 40,800 dose distributions. The calculated organ dose-volume metrics were then used to train the ML models as response variables, whereas patients’ features available in historical records and detectable on digitally reconstructed radiographs, as well as features of the RT plan, were used as explanatory variables. Separate ML models were then generated for each organ dose-volume metric. The features of the 10 patients and 20 plans were then put into the trained ML models, from which organ dose-volume metrics were obtained.
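The size of the training set follows from pairing every artificial plan with every CT. The sketch below only enumerates those pairings to confirm the count; the identifiers `ct_001` and `plan_001` are hypothetical placeholders, and no dose calculation is performed.

```python
from itertools import product

N_CTS = 136    # abdominal CTs in the training data
N_PLANS = 300  # artificial Wilms tumor RT plans

# Hypothetical identifiers standing in for the actual CTs and plans.
ct_ids = [f"ct_{i:03d}" for i in range(1, N_CTS + 1)]
plan_ids = [f"plan_{j:03d}" for j in range(1, N_PLANS + 1)]

# Each plan is simulated on each CT: one dose distribution per (plan, CT) pair.
simulation_jobs = list(product(plan_ids, ct_ids))
print(len(simulation_jobs))  # 40800
```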

Dose evaluation

To assess the level of agreement between the reference doses and the reconstructed doses obtained by the 4 approaches, we computed the absolute difference (subtracting the reconstructed value from the reference value and taking the absolute value) for organ mean, minimum, maximum dose, V5, and V10 (denoted by DEmean, DEmin, DEmax, DEV5, and DEV10, respectively). To make results comparable, all the plans were normalized to a prescribed dose of 14.4 Gy. In addition to providing the average and range of the differences for each of the organ dose-volume metrics, Wilcoxon rank-sum testing was performed to check whether differences between deviation distributions obtained by the various approaches were statistically significant (P < .05).
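The deviation metric described above can be sketched as follows, assuming the reference and reconstructed values for one organ and one metric are given as paired lists in plan order. The function names and the numbers are made up for illustration and are not study data.

```python
PRESCRIBED_DOSE_GY = 14.4  # all plans were normalized to this prescription


def absolute_differences(reference, reconstructed):
    """Per-plan absolute difference |reference - reconstructed|."""
    if len(reference) != len(reconstructed):
        raise ValueError("reference and reconstructed lists must be paired")
    return [abs(ref - rec) for ref, rec in zip(reference, reconstructed)]


def summarize(differences_gy, prescribed_gy=PRESCRIBED_DOSE_GY):
    """Average and range of the differences, as reported per organ and metric."""
    average = sum(differences_gy) / len(differences_gy)
    return {
        "average_gy": average,
        "min_gy": min(differences_gy),
        "max_gy": max(differences_gy),
        # Average deviation expressed as a percentage of the prescribed dose.
        "average_pct_of_prescription": 100.0 * average / prescribed_gy,
    }


# Made-up Dmean values (Gy) for one organ over three plans.
reference_dmean = [3.0, 4.5, 6.0]
reconstructed_dmean = [2.5, 5.5, 6.0]

de_mean = absolute_differences(reference_dmean, reconstructed_dmean)
summary = summarize(de_mean)
print(de_mean, summary)
```

Given two such lists of deviations from different approaches, a rank-sum comparison could then be run with a statistics library, for example `scipy.stats.ranksums`, which is not used here to keep the sketch dependency-free.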

Results

The average and range of the magnitude of the organ dose-volume metric differences obtained by the 4 approaches compared with the reference doses are summarized in Table 2 (for DEmean, DEV5, and DEV10) and Table 3 (for DEmin and DEmax). The values of Dmean calculated by the 4 approaches along with the reference values for the 20 cases are presented in the Supplementary Materials. Most of the organs considered in this study are in-field or near-field organs (the contralateral kidney, spleen, liver, and spinal cord) for which the reference dose metrics calculated from the TPS can be considered to be ground truth for comparison purposes.

Table 2

Average (range) of DEmean (in Gy), DEV5, and DEV10 (in %) of organ dose reconstructions (for a subset of the organs) obtained for the 20 reconstruction cases

	DE_mean (Gy)				DE_V5 (%)			DE_V10 (%)
Organ	Approach 1*	Approach 2	Approach 3	Approach 4	Approach 2	Approach 3	Approach 4	Approach 2	Approach 3	Approach 4
R Nipple	1.0 (0.2-5.1)		0.6 (0.0-2.8)	0.6 (0.0-3.6)
L Nipple	0.6 (0.0-1.5)		0.3 (0.0-2.6)	0.4 (0.0-1.5)
Liver	1.4 (0.2-3.0)	1.4 (0.2-3.3)	1.1 (0.1-2.3)	1.4 (0.1-3.6)
Spleen		0.5 (0.0-1.3)	0.9 (0.1-4.7)	0.9 (0.0-2.9)	15 (0-39)	7 (1-17)	10 (3-25)	13 (0-27)	8 (2-17)	10 (2-26)
R Kidney	0.6 (0.3-0.9)	1.3 (0.6-2.2)	1.3 (0.1-2.5)	1.2 (0.5-2.3)	3 (0-10)	7 (0-34)	6 (0-27)	3 (0-11)	6 (0-35)	7 (0-23)
L Kidney	0.8 (0.2-1.4)	0.7 (0.0-2.0)	0.7 (0.3-1.0)	0.4 (0.0-0.8)	11 (4-20)	11 (1-20)	11 (5-22)	6 (1-10)	9 (1-18)	7 (2-14)
Spinal cord	0.8 (0.2-1.4)	0.7 (0.0-2.0)	0.7 (0.3-1.0)	0.4 (0.0-0.8)	6 (0-15)	6 (2-9)	3 (1-6)	4 (2-11)	4 (1-6)	3 (0-4)

Abbreviations: DEmean = organ mean dose; DEV5 = organ volume receiving at least 5 Gy dose; DEV10 = organ volume receiving at least 10 Gy dose; L = left; R = right.

*For approach 1, the reconstruction for plans with a corner block applied was not performed. The statistics of the results are based on reconstruction outcomes of plans with open fields only.

The doses for all plans were scaled to a prescribed dose of 14.4 Gy before comparison. For each dose-volume metric and for each organ, the smallest average of the deviation values among the approaches is indicated in bold. Similarly, the smallest range of the deviation values is formatted in bold in parentheses. Empty fields in the table indicate that the organ dose-volume metric is not available for the respective approach.

Table 3

Average (range) of DEmin and DEmax values (in Gy) of dose reconstructions obtained for the 20 reconstruction cases.

	DE_min (Gy)				DE_max (Gy)
Organ	Approach 1*	Approach 2	Approach 3	Approach 4	Approach 1*	Approach 2	Approach 3	Approach 4
R Lung					3.0 (0.2-13.2)	3.5 (0.1-12.3)	2.9 (0.0-13.0)	3.5 (0.5-8.6)
L Lung					3.4 (0.2-10.9)	3.3 (0.0-11.0)	4.4 (0.1-11.9)	3.7 (0.9-14.3)
Heart					2.0 (0.2-7.4)	3.6 (0.0-10.3)	2.4 (0.0-12.8)	3.2 (0.0-5.8)
Liver	3.4 (0.2-14.2)	0.3 (0.0-1.0)	0.3 (0.0-0.5)	0.1 (0.0-0.3)	0.6 (0.0-1.8)	0.3 (0.0-0.9)	0.5 (0.0-1.6)	0.4 (0.0-1.2)
Spleen		0.8 (0.0-10.4)	0.8 (0.0-12.2)	0.7 (0.0-6.5)		6.1 (0.0-14.0)	3.1 (0.0-12.3)	3.0 (0.1-10.1)
R Kidney	0.9 (0.7-1.1)	0.2 (0.0-0.4)	0.1 (0.0-0.2)	0.1 (0.0-0.1)	9.9 (9.3-10.6)	1.6 (0.2-6.5)	0.5 (0.1-1.4)	5.0 (2.7-6.1)
L Kidney	1.0 (0.4-1.5)	0.2 (0.0-0.4)	0.1 (0.0-0.2)	0.1 (0.0-0.2)	10.2 (8.7-12.2)	1.1 (0.0-2.9)	3.0 (0.2-7.5)	1.5 (0.2-3.6)
Spinal cord	0.7 (0.0-2.5)	0.4 (0.0-1.7)	0.1 (0.0-1.3)	0.9 (0.0-7.2)	0.5 (0.0-2.7)	0.6 (0.0-2.9)	0.4 (0.0-1.8)	0.5 (0.0-2.6)

The bolded numbers indicate the results with the smallest average deviation or smallest deviation range in brackets.

Abbreviations: DEmax = maximum dose; DEmin = minimum dose; L = left; R = right.

DEmin is available for a subset of organs.

*For approach 1, the reconstruction for plans with a corner block applied was not performed. The statistics of the results are based on reconstruction outcomes of plans with open fields only.

For DEmean, an average deviation ≤1.4 Gy (10% of the prescribed dose) was found for most of the organs. Among the 7 organs in Table 2, approach 3 was found to have the smallest average DEmean for 4 organs. However, for none of the organs considered were these differences found to be significantly smaller than those of the other 3 approaches (P > .05).
The largest DEmean values among organs for the 4 approaches were 5.1 Gy (right nipple) for approach 1, 3.3 Gy (liver) for approach 2, 4.7 Gy (spleen) for approach 3, and 3.6 Gy (right nipple and liver) for approach 4. For DEV5 and DEV10, an average deviation ≤15% of the volume was found for all the organs. Except for the values reported by approach 1 for the liver, an average DEmin ≤1.0 Gy was found for all organs by all approaches. Approach 4 was found to achieve the smallest DEmin in both average and largest values for 4 out of 5 organs. For 4 out of 8 organs (both lungs, heart, and spleen), DEmax was found to be on average ≥2.0 Gy for all approaches. Among the 8 organs, approach 2 and approach 3 each had the smallest average DEmax for 3 organs, while approach 4 obtained the smallest average DEmax for 2 organs. The largest deviation (ie, worst case) reported for each organ's DEmin and DEmax over all approaches was 7.2 Gy and 14.3 Gy, respectively. Approach 3 was found to have significantly smaller DEmin than the other 3 approaches for the spinal cord (P = .01). No other distributions of DEmin or DEmax were found to be statistically different (P > .05). We observed that the reconstructed Dmean of the liver by all approaches had similar variations among the plans with the same laterality (see Supplementary Materials). The differences between Dmean for left-sided plans were smaller than for the right-sided plans. On average, the reconstructed Dmean values of the kidneys showed small differences for all approaches (≤1.3 Gy). The reconstructed Dmean of the spinal cord had similar values across all plans for any of the approaches.
When the Dmean of an organ for plans with a corner block had a different value than for plans with a rectangular open field, approaches 2, 3, and 4 were found to be able to capture the trend (ie, decrease) of the differences but not always accurately (ie, the magnitude of decrease; eg, see P9L vs P9LB in the Supplementary Materials). Furthermore, no significant differences in DEmean between plans with corner blocks applied and plans with open fields were found for these 3 approaches.
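As an illustration of how the "average (range)" deviation statistics reported above could be computed, the following is a minimal sketch. The function names and the example dose values are hypothetical and are not taken from the study; only the 14.4 Gy prescription comes from the text.

```python
from statistics import mean

PRESCRIBED_DOSE_GY = 14.4  # prescription to which all plans were scaled


def absolute_deviations(reconstructed, reference):
    """Absolute difference between reconstructed and reference values, per case."""
    if len(reconstructed) != len(reference):
        raise ValueError("need one reference value per reconstructed value")
    return [abs(r - t) for r, t in zip(reconstructed, reference)]


def summarize(deviations):
    """Return (average, smallest, largest) deviation, ie, 'average (range)'."""
    return mean(deviations), min(deviations), max(deviations)


def percent_of_prescription(dose_gy, prescribed_gy=PRESCRIBED_DOSE_GY):
    """Express a dose in Gy as a percentage of the prescribed dose."""
    return 100.0 * dose_gy / prescribed_gy


# Hypothetical Dmean values (Gy) of one organ over four plans; NOT study data.
reference_dmean = [3.0, 7.5, 1.2, 10.0]
reconstructed_dmean = [3.5, 6.0, 1.0, 11.0]

de_mean = absolute_deviations(reconstructed_dmean, reference_dmean)
average, smallest, largest = summarize(de_mean)
print(f"DEmean: {average:.2f} ({smallest:.2f}-{largest:.2f}) Gy")
print(f"1.4 Gy is {percent_of_prescription(1.4):.1f}% of the prescribed dose")
```

With a 14.4 Gy prescription, 1.4 Gy corresponds to roughly 10% of the prescribed dose and 2 Gy to roughly 14%, which matches the thresholds quoted in the text.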

Discussion

In this study, the performance of 4 organ dose reconstruction approaches was validated and compared for the same RT plans. Multiple institutes participated and independently performed the dose reconstruction using their own approach based on the same input data. A comprehensive analysis was performed to assess and compare the dosimetry results. The results indicate that on average, the approaches achieved agreement within 10% of the prescribed dose for Dmean and Dmin and within 15% of the organ volume for V5 and V10. Lower agreement was observed for reconstructed Dmax doses for all approaches for near-field and out-of-field organs (eg, kidneys and spleen for right-sided plans). For near-field organs, this is mainly attributed to the high dose gradient at the edge of the field. For out-of-field organs, an additional factor is that the reference dose values were calculated by the collapsed cone algorithm in the Oncentra TPS. This algorithm is known to underestimate dose compared with Monte Carlo simulations, as used in approach 2, and compared with analytical models based on physical measurements, as applied in approach 1. Across all surrogate-based dose reconstruction approaches considered here, for in-beam/near-beam organs, the mismatch between the anatomy of the surrogate phantom and that of the patient represents the main cause of reconstructed dose inaccuracies. For approach 1, the largest DEmin values for all organs were obtained, except for the spinal cord (largest for approach 4). This can potentially be due to the rough geometric modeling of the human anatomy in the approach 1 phantom (Fig 3). Our study has several limitations. First, the small number of patients and plans included, as well as the sole focus on Wilms tumor plans in the abdominal region, provided a limited spread of anatomic patient and geometric plan variations. 
Second, in this study we did not consider the uncertainty introduced by using a different dose calculation algorithm in the reference case compared with the dose calculation algorithms used by approach 1 and approach 2. Approach 3 and approach 4 (training stage) used the same dose calculation (collapsed cone algorithm) as was used for the reference dose calculation. Thus, a bias toward smaller dose differences for out-of-field organs may exist for the 2 ML-based approaches versus approach 1 and approach 2. Furthermore, some organ dose metrics were not reported for all approaches, such as the spleen dose of approach 1 and the right and left nipple dose of approach 2. Doses for these organs could be added in the future. In addition, for approach 1, dose reconstruction was not performed for plans involving corner blocks. In general, compared with approaches 1 and 2, approach 3 achieved slightly better average values for Dmean (up to 0.3 Gy) and Dmax (up to 1.1 Gy) for in-field and near-field organs, indicating promising applications of leveraging ML for individualized phantom construction. However, this slight advantage does not apply to all organs and was not found to be statistically significant. Taking into account the relatively small number of patient anatomies (136 CTs) used in the training stage, it is likely that the ML approach can perform better as more data become available to train the ML models; however, this remains to be seen. A disadvantage of approach 3 is the possibly unrealistic 3D anatomy (eg, overlapping organs), as organ shapes and locations were predicted independently. In this study, all 10 automatically assembled patient-specific phantoms had overlapping organ contours to some extent. In contrast, approach 2 uses more anatomically consistent phantoms containing a complete set of organs (we refer to for available organs). Overall better results were obtained by approach 2 compared with approach 1 for Dmin and Dmax. 
This may indicate that more realistic phantoms (approach 2) are a better surrogate anatomy than stylized phantoms or that height and weight (used to select the representative phantom in approach 2) are important features to consider for patient-phantom matching (approach 1 only considers age). However, for a subset of the organs, approach 2 and approach 3 still had larger dose differences than approach 1; for example, DEmean for the right kidney was found to be 0.7 Gy smaller for approach 1 than for approaches 2 and 3. So, even if approach 1 is arguably simpler than the others in that it uses stylized phantoms and relies on age alone, it remains competitive on some organs. This suggests that the position and shape of some organs (like the right kidney) remain very hard to predict well even with more complex matching criteria (approach 2) or ML (approach 3). The predicted dose metric values of approach 4 were found to be comparable to those of the investigated surrogate-based approaches or even better (eg, for DEmin). Furthermore, in general a smaller range (or largest value) of differences between the reconstructed organ dose-volume metrics and the reference metrics was observed for approach 4 compared with the other 3 approaches. This indicates that approach 4 has fewer outliers (ie, large deviations in dose reconstruction) and could be considered more robust. The downside of approach 4 is the absence of the entire 3D dose distribution, as it can only predict dose metrics for which the ML models are trained. Nevertheless, the promising results of the 2 ML-based approaches indicate that leveraging ML on a data set where variability of patient anatomy is captured can benefit the accuracy, efficiency, and robustness of an ML-based dose reconstruction approach. 
In epidemiologic studies of late effects, the level of detail of dosimetry that is required remains largely unknown. The required dose accuracy for a study greatly depends on the type of tumor, organs of interest, available outcome data, and study design, such as cohort or case-control. Some existing modeling studies use dose bins of 2 Gy, which indicates that the granularity of dose-effect relationships is limited to effects that can be distinguished at the level of 1 Gy dose difference. One Gy dose bins are almost never used because there would be too few people in each category. However, in future studies, if enough data are available and better dose reconstruction accuracy is desirable, smaller dose bins should be used to obtain finer dose-effect relationship models, or the dose might be modeled as a continuous variable instead of a categorical variable. The question of what level of accuracy is needed can be more solidly answered after the models using finer dose bins or continuous variables are evaluated. We chose several dose metrics for this study. The mean organ dose represents the average of the dose distributed in an organ and is commonly used to model organ-specific effects in published dose-effect studies. The minimum organ dose is considered an indication of whether the organ is located near the irradiated region. The maximum organ dose was considered because some organs (eg, the spinal cord) have a serial functionality, and thus an understanding of the maximum dose to the organ and the LAEs is required. 
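The organ dose metrics described above, together with the V5 and V10 volume metrics reported in the tables, can be derived from the per-voxel doses inside an organ contour. The sketch below is illustrative only: it assumes equal voxel volumes, uses a hypothetical function name and made-up voxel doses, and is not the dose engine of any of the 4 approaches.

```python
def dose_volume_metrics(voxel_doses_gy, thresholds_gy=(5.0, 10.0)):
    """Compute Dmean, Dmin, Dmax, and V_x (in %) for one organ.

    Assumes every voxel has the same volume, so a volume fraction
    is simply a fraction of the voxel count.
    """
    if not voxel_doses_gy:
        raise ValueError("organ has no voxels")
    n = len(voxel_doses_gy)
    metrics = {
        "Dmean": sum(voxel_doses_gy) / n,
        "Dmin": min(voxel_doses_gy),
        "Dmax": max(voxel_doses_gy),
    }
    for threshold in thresholds_gy:
        # V_x: percentage of the organ volume receiving at least x Gy.
        count = sum(1 for dose in voxel_doses_gy if dose >= threshold)
        metrics[f"V{threshold:g}"] = 100.0 * count / n
    return metrics


# Hypothetical voxel doses (Gy) for a small organ; NOT study data.
organ_doses = [0.5, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
m = dose_volume_metrics(organ_doses)
print(m)
```

Dmin and Dmax depend on single voxels, whereas Dmean and V_x aggregate over the whole organ, which is one way to see why point-dose metrics can be more sensitive to anatomical mismatch near a field edge.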
We further considered DEV5 and DEV10, as dose-volume histogram metrics are more commonly used as dose toxicity predictors and are used in current clinical practice of treatment planning for dose optimization and evaluation. Due to scarcity of data and relevant studies, it is currently difficult to establish the clinical relevance of different dose metrics and to determine what level of inaccuracy in reconstruction is acceptable to still obtain good dose-response modeling. In turn, this is exactly why more research on the validation of dose reconstruction approaches is needed, as well as studies on their use in dose-response modeling and the associated robustness of the dose-response modeling to deviations in the input (ie, the dose reconstruction values). From a practical point of view, considering our results, we conclude that the selection of a dose reconstruction approach should be primarily based on the available historical data and the amount of time/effort and funding available for dose reconstruction. Approaches that require height and weight or 2D radiographs may not be appropriate when these data are not available, which is often the case. For the 4 approaches we compared, 2D radiographs are necessary for approach 3 and approach 4, but not for approach 1 and approach 2 (although 2D radiographs were used in this study by approach 1 and approach 2 to accurately position a plan on the phantom, which almost certainly benefitted the performance of these approaches). When height and weight information of the patient is not available, approach 2 will only use the age information and impute height and weight from standard growth tables (eg, using growth charts of the USA: https://www.cdc.gov/growthcharts/index.htm). When limited patient information is available, approaches 1 and 2 are more applicable than the ML-based approaches, because age is the only patient feature needed to scale the phantoms. 
In terms of efficiency, approach 4 is a fully automatic pipeline to generate organ dose-volume metrics. Once the models are trained, the pipeline will automatically generate the required organ dose-volume metrics given the historical features of a patient and 2D radiographs in seconds. Approach 3 takes longer to run (minutes) as it handles 3D imaging to assemble a phantom according to ML predictions, but it is handy in that this is done automatically. The subsequent plan emulation and dose calculation step can also be carried out by an automatic plan emulation pipeline. In terms of time, if a cohort includes a large number of patients treated with similar types of RT plans and 2D radiographs are available, approach 4 is recommended as an efficient solution with competitive performance. However, if a study cohort includes a smaller amount of patient data associated with RT plans of large variability, the 3 surrogate-based approaches that provide individualized manual plan emulations are better choices.

Conclusions

We compared the performance of 4 different dose reconstruction approaches for 2D radiograph-based organ dose reconstruction by using the same patient data set. On average all dose reconstruction approaches obtained Dmean with similar accuracy (deviation ≤1.4 Gy) for the investigated organs, which can provide reliable input for dose modeling using 2 Gy dose bins. Conversely, predictions for Dmax were found to have much larger deviations, irrespective of the approach, suggesting that their use should be discouraged. A voxel phantom approach using multifeature matching (approach 2) provided the most realistic anatomy. An age-scaled phantom approach (approach 1) used the least amount of patient information while providing comparable dose reconstruction outcome for most of the investigated organs. Finally, ML-based individualized approaches (approach 3/4) achieved competitive results with the other 2 approaches, which are likely to further improve with more data, while employing automatic dose reconstruction procedures, which increases efficiency especially for larger cohorts.
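To make the notion of categorical dose bins concrete, the following minimal sketch assigns organ doses to 2 Gy bins and counts how many reconstructed doses land in the same bin as their reference dose. Bin edges starting at 0 Gy, the function names, and the example doses are all assumptions made for illustration; they are not taken from the study or from any specific dose-effect model.

```python
import math


def dose_bin(dose_gy, bin_width_gy=2.0):
    """Return the (lower, upper) edges of the half-open bin [lower, upper)."""
    if dose_gy < 0:
        raise ValueError("dose must be non-negative")
    index = math.floor(dose_gy / bin_width_gy)
    lower = index * bin_width_gy
    return lower, lower + bin_width_gy


def same_bin(reference_gy, reconstructed_gy, bin_width_gy=2.0):
    """True when both doses fall in the same dose bin."""
    return dose_bin(reference_gy, bin_width_gy) == dose_bin(
        reconstructed_gy, bin_width_gy
    )


# Hypothetical reference and reconstructed Dmean values (Gy); NOT study data.
reference = [3.1, 7.8, 0.4]
reconstructed = [3.9, 8.3, 1.2]

unchanged = sum(same_bin(t, r) for t, r in zip(reference, reconstructed))
print(f"{unchanged} of {len(reference)} cases stay in the same 2 Gy bin")
```

Passing a smaller `bin_width_gy`, or skipping the binning step and using the dose directly, corresponds to the finer-bin and continuous-variable modeling options mentioned in the Discussion.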
