Dong Joo Rhee, Chidinma P Anakwenze Akinfenwa, Bastien Rigaud, Anuja Jhingran, Carlos E Cardenas, Lifei Zhang, Surendra Prajapati, Stephen F Kry, Kristy K Brock, Beth M Beadle, William Shaw, Frederika O'Reilly, Jeannette Parkes, Hester Burger, Nazia Fakie, Chris Trauernicht, Hannah Simonds, Laurence E Court.
Abstract
PURPOSE: To determine the most accurate similarity metric when using an independent system to verify automatically generated contours.
Keywords: auto-contour; deep learning; similarity metrics
Year: 2022 PMID: 35580067 PMCID: PMC9359039 DOI: 10.1002/acm2.13647
Source DB: PubMed Journal: J Appl Clin Med Phys ISSN: 1526-9914 Impact factor: 2.243
FIGURE 1 Examples of manually generated, clinically acceptable (green) and unacceptable (red) contours for the (a) UteroCervix, (b) bladder, (c) right kidney, and (d) rectum. (e) The reference autocontour (yellow) was clinically unacceptable when the verification autocontour (blue) was clinically acceptable. (f) Both the reference and the verification autocontours were clinically unacceptable.
FIGURE 2 (a) Data acquisition process for automatic contour QA model development. (b) Each set was split equally into three for threefold cross‐validation. QA, quality assurance
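The threefold split described in Figure 2(b) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `threefold_splits`, the shuffling step, and the seed are assumptions introduced here.

```python
import random

def threefold_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) pairs for threefold cross-validation.

    Hypothetical helper: shuffling and the seed are assumptions,
    not details taken from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # A stride of 3 keeps the three fold sizes within one sample of each other.
    folds = [indices[i::3] for i in range(3)]
    for k in range(3):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx

splits = list(threefold_splits(10))
```

Each sample lands in exactly one test fold, so the mean and standard deviation over the three folds give error bars of the kind reported in the figures.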
List of the combined metrics used in the multi‐metric analysis
| Name | Metrics used | Description |
|---|---|---|
| DSC_HD | DSC, HD_100 | Most used quantitative metrics |
| Three_SDSC | SDSC 1, 2, 3 mm | Top three SDSC from single‐metric analysis |
| Five_SDSC | SDSC 1, 2, 3, 4, 5 mm | Top five SDSC from single‐metric analysis |
| Four_metrics | DSC, HD_100, HD_95, MSD | Four conventional quantitative metrics |
| Five_metrics | DSC, MSD, SDSC 1, 2, 3 mm | Two most effective conventional metrics + three most effective SDSCs |
| Seven_metrics | DSC, MSD, SDSC 1, 2, 3, 4, 5 mm | Two most effective conventional metrics + five most effective SDSCs |
| Nine_metrics | DSC, MSD, SDSC 1, 2, 3, 4, 5, 7, 10 mm | Two most effective conventional metrics + all SDSCs |
| All_metrics | DSC, HD_100, HD_95, MSD, SDSC 1, 2, 3, 4, 5, 7, 10 mm | All available metrics |
Abbreviations: DSC, Dice similarity coefficient; HD, Hausdorff distance; MSD, mean surface distance; SDSC, surface Dice similarity coefficient.
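To make the metrics in the table concrete, the sketch below computes DSC, HD_100, MSD, and a surface DSC on toy point sets. It rests on simplifying assumptions that the source does not confirm: surfaces are given as lists of points, MSD pools the nearest-neighbour distances from both directions, and the surface DSC is approximated by the fraction of surface points lying within the tolerance of the other surface (area weighting is ignored). The function names are hypothetical.

```python
import math

def dice(voxels_a, voxels_b):
    """Dice similarity coefficient between two sets of voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))

def nearest_distances(src, dst):
    """Distance from every point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def surface_metrics(surf_a, surf_b, tolerance_mm):
    """Return (HD_100, MSD, surface DSC) for two surface point lists.

    Assumption: distances from both directions are pooled, and the
    surface DSC is a point-count approximation.
    """
    distances = nearest_distances(surf_a, surf_b) + nearest_distances(surf_b, surf_a)
    hd_100 = max(distances)                      # maximum surface distance
    msd = sum(distances) / len(distances)        # mean surface distance
    sdsc = sum(d <= tolerance_mm for d in distances) / len(distances)
    return hd_100, msd, sdsc

# Toy data: two overlapping voxel sets and two four-point "surfaces" (mm).
dsc = dice({1, 2, 3, 4}, {3, 4, 5, 6})
surf_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
surf_b = [(0, 0), (0, 1), (1, 0), (1, 3)]
hd_100, msd, sdsc_1mm = surface_metrics(surf_a, surf_b, tolerance_mm=1.0)
```

On this toy input the values are DSC 0.5, HD_100 2.0 mm, MSD 0.375 mm, and SDSC (1 mm) 0.875. A single outlying point dominates HD_100 while barely moving the surface DSC, which is one reason the two metrics can rank contours differently.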
FIGURE 3 Average accuracies of the contour QA model with an individual metric for each structure with various penalty parameters, C. The error bar represents ±1 standard deviation from threefold cross‐validation. QA, quality assurance
Changes in accuracy when applying the average threshold of various structures instead of optimal thresholds for each structure
| Change in accuracy (∆Threshold) (%) | UteroCervix | CTVn | PAN | Bladder | Rectum | Kidneys |
|---|---|---|---|---|---|---|
| DSC | 0.82 → 0.79 (4.5%) | 0.87 → 0.88 (1.5%) | 0.87 → 0.76 (17.0%) | 0.88 → 0.86 (3.7%) | 0.88 → 0.88 (9.8%) | 0.94 → 0.87 (11.4%) |
| SDSC_1 | 0.91 → 0.91 (9.5%) | 0.90 → 0.91 (2.0%) | 0.83 → 0.81 (18.3%) | 0.89 → 0.90 (7.5%) | 0.94 → 0.93 (8.8%) | 0.93 → 0.92 (2.9%) |
| SDSC_2 | 0.89 → 0.89 (0.2%) | 0.90 → 0.89 (0.2%) | 0.86 → 0.79 (20.2%) | 0.91 → 0.90 (0.4%) | 0.94 → 0.93 (10.8%) | 0.97 → 0.93 (20.5%) |
| SDSC_3 | 0.88 → 0.88 (0.0%) | 0.74 → 0.74 (1.2%) | 0.88 → 0.80 (21.7%) | 0.92 → 0.90 (8.1%) | 0.92 → 0.92 (8.9%) | 0.96 → 0.93 (15.4%) |
Abbreviations: CTVn, nodal CTV; DSC, Dice similarity coefficient; PAN, para‐aortic lymph nodes; SDSC, surface Dice similarity coefficient.
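The comparison in the table above, a per-structure optimal threshold versus one threshold averaged across structures, can be illustrated with a small sketch. A plain exhaustive search stands in here for the SVM used in the paper, and the scores, labels, and function names are made-up assumptions for illustration only.

```python
def accuracy_at(threshold, scores, acceptable):
    """Accuracy when contours scoring >= threshold are predicted acceptable."""
    correct = sum((s >= threshold) == ok for s, ok in zip(scores, acceptable))
    return correct / len(scores)

def best_threshold(scores, acceptable):
    """Accuracy-maximizing threshold among midpoints between sorted scores."""
    pts = sorted(set(scores))
    candidates = (
        [pts[0] - 1e-9]
        + [(lo + hi) / 2 for lo, hi in zip(pts, pts[1:])]
        + [pts[-1] + 1e-9]
    )
    return max(candidates, key=lambda t: accuracy_at(t, scores, acceptable))

# Hypothetical similarity scores for two structures (True = clinically acceptable).
scores_a = [0.60, 0.70, 0.80, 0.90]
labels_a = [False, False, True, True]
scores_b = [0.80, 0.85, 0.92, 0.96]
labels_b = [False, False, True, True]

thr_a = best_threshold(scores_a, labels_a)
thr_b = best_threshold(scores_b, labels_b)
thr_avg = (thr_a + thr_b) / 2

acc_a_opt = accuracy_at(thr_a, scores_a, labels_a)
acc_a_avg = accuracy_at(thr_avg, scores_a, labels_a)
acc_b_avg = accuracy_at(thr_avg, scores_b, labels_b)
```

In this toy case each structure is classified perfectly at its own threshold (0.75 and 0.885), while the averaged threshold drops the accuracy of both to 0.75. The further apart the optimal thresholds are, the larger this loss tends to be.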
Overall accuracies, sensitivities, and specificities with maximized accuracy through the SVM, fixed sensitivity of 0.90, and fixed sensitivity of 0.95 when surface DSC with a thickness of 2 mm was used
| SDSC_2 | Maximize accuracy: Accuracy | Maximize accuracy: Sensitivity | Maximize accuracy: Specificity | Sensitivity ≥0.90: Accuracy | Sensitivity ≥0.90: Sensitivity | Sensitivity ≥0.90: Specificity | Sensitivity ≥0.95: Accuracy | Sensitivity ≥0.95: Sensitivity | Sensitivity ≥0.95: Specificity |
|---|---|---|---|---|---|---|---|---|---|
| UteroCervix | 0.89 | 0.79 | 0.94 | 0.90 | 0.90 | 0.90 | 0.86 | 0.95 | 0.81 |
| CTVn | 0.90 | 0.78 | 0.97 | 0.80 | 0.91 | 0.74 | 0.72 | 0.96 | 0.59 |
| PAN | 0.86 | 0.68 | 0.95 | 0.67 | 0.90 | 0.56 | 0.62 | 0.95 | 0.46 |
| Bladder | 0.91 | 0.79 | 0.97 | 0.85 | 0.90 | 0.83 | 0.79 | 0.95 | 0.72 |
| Rectum | 0.94 | 0.86 | 0.97 | 0.89 | 0.90 | 0.88 | 0.79 | 0.96 | 0.72 |
| Kidney | 0.97 | 0.90 | 0.99 | 0.97 | 0.90 | 0.99 | 0.97 | 0.95 | 0.97 |
Abbreviations: CTVn, nodal CTV; DSC, Dice similarity coefficient; PAN, para‐aortic lymph nodes; SDSC, surface Dice similarity coefficient; SVM, support vector machine.
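The trade-off in the table above, where raising the required sensitivity lowers specificity, can be reproduced on toy data. The sketch assumes that the positive class is a clinically unacceptable contour and that a contour is flagged when its score falls below the threshold; neither convention is stated in this extract, and the function names and data are hypothetical.

```python
def confusion_rates(threshold, scores, unacceptable):
    """Flag contours with score < threshold; return (accuracy, sensitivity, specificity).

    Assumption: 'positive' means clinically unacceptable.
    """
    tp = sum(s < threshold and u for s, u in zip(scores, unacceptable))
    tn = sum(s >= threshold and not u for s, u in zip(scores, unacceptable))
    n_pos = sum(unacceptable)
    n_neg = len(unacceptable) - n_pos
    return (tp + tn) / len(scores), tp / n_pos, tn / n_neg

def threshold_for_sensitivity(scores, unacceptable, target):
    """Lowest candidate threshold whose sensitivity reaches the target."""
    for t in sorted(set(scores)) + [max(scores) + 1e-9]:
        if confusion_rates(t, scores, unacceptable)[1] >= target:
            return t
    raise ValueError("target sensitivity cannot be reached")

# Hypothetical surface DSC values; True marks a clinically unacceptable contour.
scores = [0.50, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 0.97, 0.99]
unacceptable = [True, True, True, False, True, False, False, False, False, False]

thr_75 = threshold_for_sensitivity(scores, unacceptable, 0.75)
thr_90 = threshold_for_sensitivity(scores, unacceptable, 0.90)
rates_75 = confusion_rates(thr_75, scores, unacceptable)
rates_90 = confusion_rates(thr_90, scores, unacceptable)
```

Choosing the lowest threshold that meets the sensitivity target keeps specificity as high as the constraint allows. Here the threshold moves from 0.75 to 0.85 and specificity falls from 1.0 to 5/6 as the target rises from 0.75 to 0.90.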
AUCs of each structure and each metric
| AUC (95% CI) | UteroCervix | CTVn | PAN | Bladder | Rectum | Kidneys |
|---|---|---|---|---|---|---|
| DSC | 0.92 (0.89–0.94) | 0.92 (0.89–0.95) | 0.86 (0.82–0.89) | 0.92 (0.90–0.94) | 0.92 (0.89–0.94) | 0.97 (0.95–0.99) |
| HD_100 | 0.85 (0.81–0.88) | 0.75 (0.71–0.79) | 0.75 (0.70–0.80) | 0.93 (0.90–0.95) | 0.81 (0.76–0.84) | 0.91 (0.88–0.93) |
| HD_95 | 0.87 (0.83–0.89) | 0.83 (0.79–0.86) | 0.70 (0.65–0.74) | 0.96 (0.94–0.97) | 0.83 (0.80–0.86) | 0.95 (0.92–0.97) |
| MSD | 0.93 (0.91–0.95) | 0.92 (0.89–0.94) | 0.84 (0.80–0.88) | 0.97 (0.96–0.98) | 0.92 (0.89–0.94) | 0.96 (0.93–0.98) |
| SDSC 1 mm | 0.96 (0.94–0.97) | 0.93 (0.90–0.95) | 0.90 (0.87–0.93) | 0.95 (0.93–0.97) | 0.96 (0.94–0.98) | 0.95 (0.92–0.97) |
| SDSC 2 mm | 0.96 (0.94–0.97) | 0.93 (0.91–0.95) | 0.89 (0.86–0.92) | 0.96 (0.94–0.97) | 0.96 (0.95–0.98) | 0.97 (0.95–0.99) |
| SDSC 3 mm | 0.95 (0.93–0.96) | 0.93 (0.90–0.95) | 0.87 (0.83–0.91) | 0.97 (0.96–0.98) | 0.95 (0.92–0.97) | 0.97 (0.95–0.99) |
| SDSC 4 mm | 0.93 (0.91–0.95) | 0.92 (0.89–0.94) | 0.85 (0.80–0.89) | 0.97 (0.95–0.98) | 0.93 (0.90–0.96) | 0.96 (0.94–0.98) |
| SDSC 5 mm | 0.92 (0.89–0.94) | 0.91 (0.88–0.94) | 0.83 (0.79–0.88) | 0.97 (0.95–0.98) | 0.92 (0.88–0.94) | 0.95 (0.93–0.97) |
| SDSC 7 mm | 0.90 (0.87–0.93) | 0.89 (0.86–0.92) | 0.81 (0.76–0.85) | 0.96 (0.94–0.97) | 0.89 (0.85–0.92) | 0.94 (0.92–0.96) |
| SDSC 10 mm | 0.88 (0.85–0.92) | 0.85 (0.81–0.88) | 0.80 (0.75–0.84) | 0.91 (0.88–0.94) | 0.85 (0.81–0.89) | 0.91 (0.88–0.94) |
Note: 95% CI for AUCs were derived with the bootstrapping method with n = 2000.
Abbreviations: AUC, area under the ROC curve; CI, confidence interval; CTVn, nodal CTV; DSC, Dice similarity coefficient; HD, Hausdorff distance; MSD, mean surface distance; PAN, para‐aortic lymph nodes; SDSC, surface Dice similarity coefficient.
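The note above states that the 95% CIs were derived by bootstrapping with n = 2000. One common way to do this is the percentile bootstrap, sketched below. Whether the authors used the percentile variant is an assumption, as are the function names, the seed, and the toy data.

```python
import random

def auc(scores, labels):
    """AUC as the chance that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap confidence interval (assumed variant)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # AUC is undefined without both classes; draw again
        stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[round(alpha / 2 * n_boot)]
    hi = stats[round((1 - alpha / 2) * n_boot) - 1]
    return auc(scores, labels), lo, hi

# Hypothetical metric values; True marks a clinically acceptable contour.
scores = [0.95, 0.90, 0.88, 0.80, 0.75, 0.70, 0.65, 0.60]
labels = [True, True, True, False, True, False, False, False]
point_auc, ci_lo, ci_hi = bootstrap_auc_ci(scores, labels)
```

The point estimate for this toy data is 15/16 = 0.9375. With only eight samples the interval is wide; intervals as tight as those in the table require far more contours per structure.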
FIGURE 4 The ROC curves with a surface DSC with a tolerance of 2 mm, the best metric to predict the clinical acceptability of the automatically generated contours. DSC, Dice similarity coefficient
FIGURE 5 Average accuracies of the SVM model with multiple metrics for each structure. The error bar represents ±1 standard deviation. Four different kernels (linear, polynomial, rbf, and sigmoid) were tested. rbf, radial basis function; SVM, support vector machine
Overall accuracies from the single‐metric and multi‐metric analyses, when SVM was used with the linear kernel
| Single‐metric | UteroCervix | CTVn | PAN | Bladder | Rectum | Kidneys |
|---|---|---|---|---|---|---|
| DSC | 0.82 | 0.87 | 0.87 | 0.88 | 0.88 | 0.94 |
| HD_100 | 0.78 | 0.72 | 0.71 | 0.85 | 0.75 | 0.86 |
| HD_95 | 0.77 | 0.76 | 0.66 | 0.87 | 0.76 | 0.91 |
| MSD | 0.87 | 0.88 | 0.85 | 0.91 | 0.89 | 0.95 |
| SDSC 1 mm | 0.91 | 0.90 | 0.83 | 0.89 | 0.94 | 0.93 |
| SDSC 2 mm | 0.89 | 0.90 | 0.86 | 0.91 | 0.94 | 0.97 |
| SDSC 3 mm | 0.88 | 0.88 | 0.89 | 0.92 | 0.92 | 0.96 |
| SDSC 4 mm | 0.89 | 0.88 | 0.87 | 0.91 | 0.91 | 0.95 |
| SDSC 5 mm | 0.88 | 0.88 | 0.87 | 0.89 | 0.89 | 0.94 |
| SDSC 7 mm | 0.85 | 0.87 | 0.87 | 0.89 | 0.86 | 0.92 |
| SDSC 10 mm | 0.80 | 0.79 | 0.83 | 0.86 | 0.78 | 0.86 |
Abbreviations: CTVn, nodal CTV; DSC, Dice similarity coefficient; HD, Hausdorff distance; MSD, mean surface distance; PAN, para‐aortic lymph nodes; SDSC, surface Dice similarity coefficient; SVM, support vector machine.
Overall sensitivities from the single‐ and multi‐metric analyses, when SVM was used with the linear kernel
| Single‐metric | UteroCervix | CTVn | PAN | Bladder | Rectum | Kidneys |
|---|---|---|---|---|---|---|
| DSC | 0.59 | 0.67 | 0.68 | 0.69 | 0.70 | 0.71 |
| HD_100 | 0.46 | 0.33 | 0.32 | 0.66 | 0.22 | 0.47 |
| HD_95 | 0.46 | 0.53 | 0.00 | 0.72 | 0.33 | 0.65 |
| MSD | 0.74 | 0.73 | 0.65 | 0.78 | 0.78 | 0.82 |
| SDSC 1 mm | 0.82 | 0.76 | 0.72 | 0.76 | 0.87 | 0.73 |
| SDSC 2 mm | 0.79 | 0.78 | 0.68 | 0.79 | 0.86 | 0.90 |
| SDSC 3 mm | 0.74 | 0.74 | 0.71 | 0.78 | 0.81 | 0.82 |
| SDSC 4 mm | 0.77 | 0.73 | 0.69 | 0.76 | 0.77 | 0.74 |
| SDSC 5 mm | 0.74 | 0.70 | 0.67 | 0.69 | 0.69 | 0.71 |
| SDSC 7 mm | 0.67 | 0.64 | 0.68 | 0.68 | 0.59 | 0.61 |
| SDSC 10 mm | 0.53 | 0.41 | 0.58 | 0.57 | 0.31 | 0.29 |
Abbreviations: CTVn, nodal CTV; DSC, Dice similarity coefficient; HD, Hausdorff distance; MSD, mean surface distance; PAN, para‐aortic lymph nodes; SDSC, surface Dice similarity coefficient; SVM, support vector machine.
Overall specificities from the single‐ and multi‐metric analyses, when SVM was used with the linear kernel
| Single‐metric | UteroCervix | CTVn | PAN | Bladder | Rectum | Kidneys |
|---|---|---|---|---|---|---|
| DSC | 0.94 | 0.98 | 0.97 | 0.96 | 0.94 | 1.00 |
| HD_100 | 0.94 | 0.93 | 0.90 | 0.93 | 0.96 | 0.95 |
| HD_95 | 0.92 | 0.89 | 0.93 | 0.94 | 0.93 | 0.97 |
| MSD | 0.93 | 0.96 | 0.95 | 0.98 | 0.93 | 0.98 |
| SDSC 1 mm | 0.95 | 0.97 | 0.89 | 0.95 | 0.97 | 0.97 |
| SDSC 2 mm | 0.94 | 0.97 | 0.95 | 0.97 | 0.97 | 0.99 |
| SDSC 3 mm | 0.94 | 0.96 | 0.98 | 0.99 | 0.97 | 0.99 |
| SDSC 4 mm | 0.94 | 0.96 | 0.97 | 0.99 | 0.97 | 1.00 |
| SDSC 5 mm | 0.94 | 0.98 | 0.98 | 0.99 | 0.97 | 1.00 |
| SDSC 7 mm | 0.94 | 0.99 | 0.98 | 0.99 | 0.96 | 1.00 |
| SDSC 10 mm | 0.94 | 1.00 | 0.96 | 1.00 | 0.96 | 1.00 |
Abbreviations: CTVn, nodal CTV; DSC, Dice similarity coefficient; HD, Hausdorff distance; MSD, mean surface distance; PAN, para‐aortic lymph nodes; SDSC, surface Dice similarity coefficient; SVM, support vector machine.
FIGURE 6 False positives can make the thresholds more generous (blue dashed lines) than the desired thresholds (brown dashed lines) and result in more false negatives in clinical situations.
FIGURE 7 The surface DSC distributions of the clinically acceptable and unacceptable kidney contours with (left) and without (right) the manually generated contours. The thresholds can be confidently determined with the manual contours, whereas without them the threshold can lie anywhere between the blue and red dashed lines because of an insufficient amount of data. DSC, Dice similarity coefficient