Ward van Rooij, Max Dahele, Hanne Nijhuis, Berend J Slotman, Wilko F Verbakel.
Abstract
BACKGROUND: Deep learning-based delineation of organs-at-risk for radiotherapy purposes has been investigated to reduce the time-intensiveness and inter-/intra-observer variability associated with manual delineation. We systematically evaluated ways to improve the performance and reliability of deep learning for organ-at-risk segmentation, with the salivary glands as the paradigm. Improving deep learning performance is clinically relevant, with applications ranging from the initial contouring process to on-line adaptive radiotherapy.
Keywords: Artificial intelligence; Deep learning; Salivary glands; Segmentation
Year: 2020 PMID: 33261620 PMCID: PMC7709305 DOI: 10.1186/s13014-020-01721-1
Source DB: PubMed Journal: Radiat Oncol ISSN: 1748-717X Impact factor: 3.481
Fig. 1 Overview of the preprocessing steps for the base set-up for a random parotid gland (PG). The clinical contour is depicted in red. The corresponding value ranges are given below the images. The steps are identical for the submandibular gland (SMG)
Fig. 2 Example of an original image (a/c) and its domain-specific augmented version (b/d) for an SMG (a/b) and a PG (c/d), and examples of clinical (red) vs. curated (yellow) contours for SMG/PG (e/f)
Fig. 3 Model performance/reliability measured by Sørensen-Dice coefficient (SDC)/Hausdorff distance (HD) per set size (a/b), per number of augmented images with traditional data augmentation (c/d) and per number of augmented images with domain-specific augmentation (e/f)
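The two metrics used throughout the figures and the table can be illustrated with a minimal sketch. It assumes SDC is the Sørensen-Dice overlap and HD the maximum symmetric Hausdorff distance between two binary masks; the record does not state which HD variant (maximum or a percentile) was used, nor the voxel spacing, so distances here are in voxel units. The function names and the toy masks are hypothetical.

```python
from math import dist


def mask_to_points(mask):
    """Return the set of (row, col) coordinates where a 2D binary mask is non-zero."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice(a, b):
    """Sørensen-Dice coefficient between two sets of voxel coordinates."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric (maximum) Hausdorff distance, brute force, in voxel units."""
    a, b = list(a), list(b)
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty mask")

    def directed(p, q):
        # Largest distance from any point in p to its nearest point in q.
        return max(min(dist(x, y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))


# Toy 2D example (assumed data, not from the study).
reference = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
predicted = [[0, 1, 1, 1],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]

ref_pts = mask_to_points(reference)
pred_pts = mask_to_points(predicted)
print(dice(ref_pts, pred_pts))       # 0.75
print(hausdorff(ref_pts, pred_pts))  # 1.0
```

A higher SDC and a lower HD both indicate closer agreement with the reference contour. In practice the coordinates would be scaled by the voxel spacing before computing distances.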
Model performance/reliability measured by SDC/HD with/without curation, per cost function and for set window versus patient-specific window
| Experiment | Train set size (SMG/PG) | Validation set size (SMG/PG) | Test set size (SMG/PG) | SDC, SMG | SDC, PG | HD, SMG | HD, PG |
|---|---|---|---|---|---|---|---|
| Data quality (train data / test data) | | | | | | | |
| Clinical / Clinical | 18/18 | 2/2 | 4/4 | .68 ± .06 | .68 ± .05 | 17.6 ± 1.5 | 24.7 ± 5.1 |
| Curated / Curated | 18/18 | 2/2 | 4/4 | .66 ± .07 | .68 ± .04 | 23.4 ± 1.3 | 28.1 ± 4.3 |
| Clinical / Clinical | 90/90 | 2/2 | 4/4 | .67 ± .06 | .69 ± .06 | 13.6 ± 1.3 | 24.8 ± 5.3 |
| Curated / Curated | 90/90 | 2/2 | 4/4 | .67 ± .07 | .69 ± .04 | 12.0 ± 1.5 | 21.8 ± 4.6 |
| Cost functions | | | | | | | |
| SDC | 90/90 | 10/10 | 20/20 | .71 ± .06 | .71 ± .06 | 6.9 ± 1.5 | 17.3 ± 6.4 |
| SDC(0.5) | 90/90 | 10/10 | 20/20 | .71 ± .06 | .71 ± .06 | 9.0 ± 3.0 | 17.4 ± 6.8 |
| SDC(0.05) | 90/90 | 10/10 | 20/20 | .70 ± .06 | .71 ± .06 | 6.6 ± 1.6 | 16.6 ± 7.0 |
| SDC + HD | 90/90 | 10/10 | 20/20 | .70 ± .05 | | 7.6 ± 2.4 | |
| Patient-specific windowing | | | | | | | |
| Set window | 940/1024 | 94/114 | 188/227 | .86 ± .07 | .85 ± .05 | 4.5 ± 1.9 | 8.1 ± 3.8 |
| Patient-specific window | 940/1024 | 94/114 | 188/227 | .87 ± .05 | .87 ± .04 | 4.1 ± 1.6 | 7.7 ± 3.8 |
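The difference between a set window and a patient-specific window can be sketched as follows. This record does not say how the patient-specific window was derived, so the percentile rule below is purely an assumption for illustration, as are the function names, the window bounds and the intensity values.

```python
def apply_window(values, low, high):
    """Clip intensities to [low, high] and rescale them linearly to [0, 1]."""
    if high <= low:
        raise ValueError("window must satisfy low < high")
    span = high - low
    return [(min(max(v, low), high) - low) / span for v in values]


def patient_window(values, lower_pct=10.0, upper_pct=90.0):
    """Hypothetical per-patient window taken from intensity percentiles.

    Uses a simple nearest-rank percentile; this is an assumed rule,
    not the method of the study.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one intensity value")

    def pct(p):
        idx = min(n - 1, max(0, round(p / 100 * (n - 1))))
        return ordered[idx]

    return pct(lower_pct), pct(upper_pct)


# Set window: the same (assumed) bounds for every patient.
print(apply_window([-300, 50, 400], -200, 300))  # [0.0, 0.5, 1.0]

# Patient-specific window: bounds derived from this patient's own intensities.
voxels = [-100, -20, 0, 30, 40, 50, 60, 70, 90, 300]
low, high = patient_window(voxels)
print(low, high)  # -20 90
normalised = apply_window(voxels, low, high)
```

The intent of a per-patient window is to spend the normalised intensity range on the values actually present in that patient's scan rather than on a range fixed in advance.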
Fig. 4 Model performance/reliability measured by SDC/HD (a/b) with ensemble methods, including the effect of different cut-offs. Grey lines show average (-)/standard deviation (–) of SDC/HD for all stand-alone models in this experiment
Fig. 5 Model trained with the maximum set size (MSS; blue) versus an ensemble of models trained with the maximum set size, doubled in size by domain-specific data augmentation and with patient-specific windowing applied (CE; red) for SMG/PG
Fig. 6 a Illustrative example of the effect of using an ensemble for an oddly shaped PG; different cut-off levels are depicted by shades of blue (low = light, high = dark), and the clinical delineation is in red. b An inaccuracy (indicated by the yellow arrow) in the clinical contour (red) causes the SDC to be lower than it would have been had the DL contour (blue) been compared with the actual ground truth
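The ensemble cut-off referred to in Figs. 4 and 6 can be illustrated with a small voting sketch. It assumes that each model outputs a binary mask and that a voxel is kept when the fraction of models marking it reaches the cut-off; the record does not describe the exact combination rule, so `ensemble_mask` and the toy masks are hypothetical.

```python
def ensemble_mask(masks, cutoff):
    """Combine binary 2D masks by voting.

    A voxel is set in the output when the fraction of models that
    mark it is at least `cutoff` (assumed rule, 0 < cutoff <= 1).
    """
    if not masks:
        raise ValueError("need at least one mask")
    if not 0 < cutoff <= 1:
        raise ValueError("cutoff must be in (0, 1]")
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if sum(m[r][c] for m in masks) / n >= cutoff else 0 for c in range(cols)]
        for r in range(rows)
    ]


# Three toy model outputs (assumed data).
m1 = [[1, 1, 0],
      [0, 1, 0]]
m2 = [[1, 0, 0],
      [0, 1, 1]]
m3 = [[1, 1, 0],
      [0, 0, 0]]
models = [m1, m2, m3]

print(ensemble_mask(models, 0.3))  # [[1, 1, 0], [0, 1, 1]]  any model votes
print(ensemble_mask(models, 0.5))  # [[1, 1, 0], [0, 1, 0]]  majority
print(ensemble_mask(models, 1.0))  # [[1, 0, 0], [0, 0, 0]]  unanimous
```

A low cut-off yields a larger, more inclusive contour and a high cut-off a smaller, more conservative one, which matches the light-to-dark shading described for Fig. 6a.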