Edward G A Henderson1, Eliana M Vasquez Osorio1,2, Marcel van Herk1,2, Andrew F Green1,2.
Abstract
Background and purpose: Convolutional neural networks (CNNs) are increasingly used to automate segmentation for radiotherapy planning, where accurate segmentation of organs-at-risk (OARs) is crucial. Training CNNs often requires large amounts of data. However, large, high quality datasets are scarce. The aim of this study was to develop a CNN capable of accurate head and neck (HN) 3D auto-segmentation of planning CT scans using a small training dataset (34 CTs). Materials and Method: Elements of our custom CNN architecture were varied to optimise segmentation performance. We tested and evaluated the impact of: using multiple contrast channels for the CT scan input at specific soft tissue and bony anatomy windows, resize vs. transpose convolutions, and loss functions based on overlap metrics and cross-entropy in different combinations. Model segmentation performance was compared with the inter-observer deviation of two doctors' gold standard segmentations using the 95th percentile Hausdorff distance and mean distance-to-agreement (mDTA). The best performing configuration was further validated on a popular public dataset to compare with state-of-the-art (SOTA) auto-segmentation methods.Entities:
Keywords: 3D convolutional neural network; CT scan auto-segmentation; Limited data
Year: 2022 PMID: 35514528 PMCID: PMC9065428 DOI: 10.1016/j.phro.2022.04.003
Source DB: PubMed Journal: Phys Imaging Radiat Oncol ISSN: 2405-6316
Fig. 1The CNN architecture used in this study. The base model used was a 3D Res-UNet with deep supervision. In this figure we highlight the three modifications that form the presented experiments. We compared using multiple contrast settings for the model input (1), resize or transpose convolutions in the decoder portion (2) and three different loss functions (3). When using transpose convolutions (orange), we did not perform tri-linear up-sampling.
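Modification (2) above, resize vs. transpose convolutions in the decoder, can be sketched as follows. This is an illustrative PyTorch fragment, not the authors' implementation; the channel counts and kernel sizes are assumptions chosen only to show the two up-sampling styles.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransposeUp(nn.Module):
    """Up-sample with a learned transpose convolution (doubles each spatial dim)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(x)


class ResizeUp(nn.Module):
    """Up-sample with tri-linear interpolation followed by a plain convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="trilinear", align_corners=False)
        return self.conv(x)


x = torch.randn(1, 32, 8, 8, 8)           # (batch, channels, D, H, W)
print(TransposeUp(32, 16)(x).shape)       # torch.Size([1, 16, 16, 16, 16])
print(ResizeUp(32, 16)(x).shape)          # torch.Size([1, 16, 16, 16, 16])
```

Both paths double the spatial resolution; the resize path avoids the checkerboard artefacts transpose convolutions can introduce, which is why the two are worth comparing.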
Fig. 2a) The windowing ramp function to map CT image intensities. b) Contrast settings for the full-width window baseline approach. c-e) Window width and level contrast settings selected for our multiple input channels approach.
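The windowing ramp of Fig. 2a can be written as a clip-and-rescale of Hounsfield units into [0, 1], parameterised by a window level and width. A minimal numpy sketch (the level/width values below are illustrative, not the windows used in the paper):

```python
import numpy as np


def window_ct(hu, level, width):
    """Map CT intensities (HU) to [0, 1] with a linear ramp.

    Values below level - width/2 clip to 0, values above level + width/2
    clip to 1, and intensities in between ramp linearly.
    """
    lo, hi = level - width / 2.0, level + width / 2.0
    return np.clip((hu - lo) / (hi - lo), 0.0, 1.0)


# Example: a soft-tissue-style window (level 40 HU, width 400 HU)
hu = np.array([-1000.0, -160.0, 40.0, 240.0, 1000.0])
print(window_ct(hu, level=40, width=400))   # [0.  0.  0.5 1.  1. ]
```

The multi-channel input of Fig. 2c-e would then stack several such windowed copies of the same scan, e.g. `np.stack([window_ct(hu, l, w) for l, w in settings])`.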
Median values of the HD95 and mDTA metrics for every model configuration. Lower values show closer agreement between the CNN-predicted segmentations and the gold standard. In this table, T and R indicate models using Transpose or Resize convolutions in the decoder portion, respectively. These results are summarised by the median and standard deviation of the metrics for each OAR across all patients in the five test-set folds. The best-performing configurations for each OAR are highlighted in bold font and were determined with more significant figures than shown. For the HD95, the majority of the spinal cord results reflect the CT image slice thickness (2.50 mm), suggesting most models erred in the spinal cord length by a single slice. Model configurations trained with the ExpLogLoss function consistently produced better segmentations.
| Loss | Conv. | In-ch. | Brainstem | Mandible | L Parotid | R Parotid | Spinal cord |
|---|---|---|---|---|---|---|---|
| **HD95 (mm)** | | | | | | | |
| wSD | T | 1 | | | | | |
| | T | 3 | | | | | |
| | R | 1 | | | | | |
| | R | 3 | | | | | |
| wSD + XE | T | 1 | | | | | |
| | T | 3 | | | | | |
| | R | 1 | | | | | |
| | R | 3 | | | | | |
| ExpLogLoss | T | 1 | | | | | |
| | T | 3 | | | | | |
| | R | 1 | | | | | |
| | R | 3 | | | | | |
| Doctor comparison | | | | | | | |
| **mDTA (mm)** | | | | | | | |
| wSD | T | 1 | | | | | |
| | T | 3 | | | | | |
| | R | 1 | | | | | |
| | R | 3 | | | | | |
| wSD + XE | T | 1 | | | | | |
| | T | 3 | | | | | |
| | R | 1 | | | | | |
| | R | 3 | | | | | |
| ExpLogLoss | T | 1 | | | | | |
| | T | 3 | | | | | |
| | R | 1 | | | | | |
| | R | 3 | | | | | |
| Doctor comparison | | | | | | | |
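The two metrics in the table above can be sketched for a pair of surface point sets. This is an illustrative numpy version (the paper evaluates on segmentation surfaces; the symmetrisation over both directed distances is a common convention, assumed here):

```python
import numpy as np


def surface_metrics(a, b):
    """HD95 and mean distance-to-agreement between point sets a (N,3) and b (M,3).

    Computes all pairwise Euclidean distances, takes the nearest-neighbour
    distance in each direction, then the 95th percentile (HD95) and the
    mean (mDTA) over the pooled directed distances.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    d_ab = d.min(axis=1)           # nearest distance from each point of a to b
    d_ba = d.min(axis=0)           # and from each point of b to a
    all_d = np.concatenate([d_ab, d_ba])
    return np.percentile(all_d, 95), all_d.mean()


a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
b = a + np.array([0.0, 0.5, 0.0])   # same contour shifted 0.5 mm
hd95, mdta = surface_metrics(a, b)
print(hd95, mdta)                   # both 0.5 for a uniform shift
```

A uniform 0.5 mm shift gives HD95 = mDTA = 0.5 mm, matching the intuition that lower values mean closer agreement.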
Fig. 3Boxplots comparing the mDTA for the four model configurations trained using the best performing loss function, ExpLogLoss, and the deviation between doctors for reference (blue boxes). For this figure, lower values indicate better segmentations. Configurations using 3-channel input (3 R&T) outperform the single-channel counterparts (1 R&T) in all soft tissue OARs. Models with traditional transpose convolutions (T) produce marginally better segmentations, with the best-performing model highlighted in green.
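The ExpLogLoss named above is, in the literature (Wong et al., 2018), a weighted sum of exponential-logarithmic Dice and cross-entropy terms. A numpy sketch of that form follows; the weights and gamma below are illustrative assumptions, not necessarily the values used in this paper:

```python
import numpy as np


def exp_log_loss(probs, target, w_dice=0.8, w_cross=0.2, gamma=0.3, eps=1e-7):
    """Exponential logarithmic loss (after Wong et al. 2018):
    w_dice * mean((-log Dice_c)^gamma) + w_cross * mean((-log p_true)^gamma).

    probs:  (C, N) soft class probabilities per voxel.
    target: (C, N) one-hot ground-truth labels.
    """
    inter = (probs * target).sum(axis=1)
    dice = (2 * inter + eps) / (probs.sum(axis=1) + target.sum(axis=1) + eps)
    l_dice = np.mean((-np.log(dice)) ** gamma)
    p_true = np.clip((probs * target).sum(axis=0), eps, 1.0)   # prob. of true class
    l_cross = np.mean((-np.log(p_true)) ** gamma)
    return w_dice * l_dice + w_cross * l_cross


target = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])           # 2 classes, 3 voxels
probs = np.array([[0.9, 0.8, 0.2], [0.1, 0.2, 0.8]])
print(exp_log_loss(target, target))   # 0.0 for a perfect prediction
print(exp_log_loss(probs, target))    # > 0 for an imperfect one
```

The exponent gamma < 1 flattens the loss surface near small errors, which is one reason this family of losses behaves well on small, imbalanced datasets.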
HD95, ASD/mDTA and DSC comparison results on the MICCAI’15 set. Bold font indicates the best performing model. Dashes indicate that results for the OAR are not reported. *Kawahara et al. reported a single DSC for the parotids.
| Method | Brainstem | Mandible | Left Parotid | Right Parotid |
|---|---|---|---|---|
| **HD95 (mm)** | | | | |
| Gao et al. | 1.08 ± 0.45 | | | |
| Gou et al. | 2.98 ± 0.61 | 1.40 ± 0.02 | 3.48 ± 1.28 | 3.15 ± 0.67 |
| Ours | 2.83 ± 1.05 | 2.87 ± 0.89 | 3.55 ± 1.35 | |
| **ASD/mDTA (mm)** | | | | |
| Huang et al. | 1.28 ± 0.45 | 0.56 ± 0.27 | 0.86 ± 0.24 | 1.02 ± 0.38 |
| Gou et al. | 1.19 ± 0.16 | 0.47 ± 0.11 | 1.21 ± 0.34 | 1.14 ± 0.22 |
| Ours | | | | |
| **DSC (%)** | | | | |
| Huang et al. | 87.9 ± 2.4 | 91.6 ± 2.1 | 88.4 ± 1.5 | 87.8 ± 2.0 |
| Zhang et al. | 87 ± 3 | 87 ± 7 | | |
| Gao et al. | 88.2 ± 2.5 | 94.7 ± 1.1 | | |
| Gou et al. | 88 ± 2 | 94 ± 1 | 87 ± 3 | 86 ± 5 |
| Kawahara et al. | 88 | - | 81* | 81* |
| Ours | 88.3 ± 3.6 | 93.4 ± 1.9 | 88.6 ± 1.6 | 87.2 ± 3.1 |
Fig. 4Example segmentations produced by our CNN model (green). In the top row, a) and b), we show 2D axial and sagittal views of a patient from the dataset we used for model development. This dataset contained segmentations produced by two doctors, which are shown in red and blue. On the bottom row, c) and d), we show axial and sagittal 2D slices of a patient from the MICCAI’15 set. The gold-standard segmentations for this set are shown in purple.