Geng Yang1,2,3, Zhenhui Dai3, Yiwen Zhang1,2, Lin Zhu3, Junwen Tan4, Zefeiyun Chen1,2, Bailin Zhang3, Chunya Cai3, Qiang He3, Fei Li3, Xuetao Wang3, Wei Yang1,2.
Abstract
Purpose: Accurate segmentation of the gross target volume (GTV) from computed tomography (CT) images is a prerequisite of radiotherapy for nasopharyngeal carcinoma (NPC). However, this task is very challenging due to the low contrast at the tumor boundary and the great variety of tumor sizes and morphologies across stages. Meanwhile, the data source also seriously affects the segmentation results. In this paper, we propose a novel three-dimensional (3D) automatic segmentation algorithm that adopts cascaded multiscale local enhancement of convolutional neural networks (CNNs), and we conduct experiments on multi-institutional datasets to address the above problems. Materials and
Keywords: CT images; deep learning; nasopharyngeal carcinoma; radiotherapy; segmentation
Year: 2022 PMID: 35387126 PMCID: PMC8979212 DOI: 10.3389/fonc.2022.827991
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Details of the multi-institutional datasets.
| Dataset | A | B | C |
|---|---|---|---|
| Source | Our department | Institution B | MICCAI 2019 |
| CT protocol | pCT + CE-CT | pCT | pCT |
| Number of cases | 257 | 40 | 50 |
| T stage (T1:T2:T3:T4) | T1-T4 (30:93:87:47) | NA | NA |
| Image size | 512 × 512 | 512 × 512 | 512 × 512 |
| Slice thickness | 3 mm | 3 mm | 3 mm |
Figure 1 Overview of the workflow of the proposed method.
Figure 2 Network architecture of the proposed CNN model.
Quantitative comparison of different backbone models for GTV segmentation: mean ± standard deviation of DSC, PPR, SEN, ASSD, and HD95.
| Method / P-value | DSC (%) | PPR (%) | SEN (%) | ASSD (mm) | HD95 (mm) |
|---|---|---|---|---|---|
| ① 3D CNN | 73.67 ± 7.88 | 76.74 ± 14.71 | 75.27 ± 14.31 | 1.84 ± 3.91 | 6.32 ± 13.77 |
| ② 3D Attention-UNet | 73.54 ± 7.16 | 75.95 ± 14.59 |  | 1.80 ± 1.64 | 6.74 ± 11.91 |
| ③ 3D UNet++ | 73.87 ± 7.07 | 77.73 ± 14.40 | 74.82 ± 14.26 | 1.53 ± 0.60 | 5.17 ± 3.04 |
| ④ Proposed 3D Res-UNet | 74.49 ± 7.81 |  | 73.90 ± 14.58 | 1.49 ± 0.65 | 5.06 ± 3.30 |
| P-value ④ vs ① | 0.026* | <0.001* | 0.041* | 0.179 | 0.190 |
| P-value ④ vs ② | 0.007* | <0.001* | 0.001* | 0.005* | 0.043* |
| P-value ④ vs ③ | 0.027* | <0.001* | 0.107 | 0.213 | 0.520 |
Asterisks (∗) indicate that the difference between the proposed 3D Res-UNet method and the competing method is statistically significant (p < 0.05) using a paired t-test. The best result is highlighted in bold.
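For reference, the overlap metrics reported above can be computed directly from binary masks: DSC is the Dice similarity coefficient, PPR the positive predictive rate (precision), and SEN the sensitivity (recall). The following is a minimal NumPy sketch on toy 2D masks, not the authors' evaluation code (the paper evaluates 3D volumes):

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient (DSC), in percent."""
    inter = np.logical_and(pred, gt).sum()
    return 100.0 * 2.0 * inter / (pred.sum() + gt.sum())

def ppr(pred, gt):
    """Positive predictive rate (precision), in percent."""
    tp = np.logical_and(pred, gt).sum()
    return 100.0 * tp / pred.sum()

def sen(pred, gt):
    """Sensitivity (recall), in percent."""
    tp = np.logical_and(pred, gt).sum()
    return 100.0 * tp / gt.sum()

# Toy 2D masks standing in for 3D GTV volumes.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True    # 16 ground-truth voxels
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 3:7] = True  # 16 predicted voxels, 9 of them overlapping

print(dice(pred, gt))  # 2 * 9 / (16 + 16) * 100 = 56.25
```

The same three functions apply unchanged to 3D arrays, since the reductions are over all voxels.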
Figure 3 Visual comparison of different networks for GTV segmentation. The red arrows denote false positives and poorly segmented areas. Note that these results come from models trained on pCT and CE-CT data and are shown on pCT.
Comparison of the effects of adding the ASPP blocks and the proposed cascade architecture.
| Method / P-value | DSC (%) | PPR (%) | SEN (%) | ASSD (mm) | HD95 (mm) |
|---|---|---|---|---|---|
| ① Res-UNet | 74.49 ± 7.81 |  | 73.90 ± 14.58 | 1.49 ± 0.65 | 5.06 ± 3.30 |
| ② MDR-UNet | 75.16 ± 6.76 | 78.73 ± 13.50 | 75.81 ± 13.65 | 1.68 ± 3.44 | 5.82 ± 13.57 |
| ③ CMDR-UNet |  | 77.34 ± 14.05 |  |  |  |
| P-value ② vs ① | 0.065 | 0.002* | <0.001* | 0.409 | 0.418 |
| P-value ③ vs ① | <0.001* | <0.001* | <0.001* | 0.233 | 0.244 |
| P-value ③ vs ② | <0.001* | <0.001* | <0.001* | <0.001* | 0.002* |
Two-tailed p-values <0.05 were considered statistically significant between the proposed different models using paired t-tests. The best result is highlighted in bold.
∗p < 0.05 was considered significant. Values are presented as mean ± standard deviation. MDR-UNet: the backbone with the multiscale dilated CNN (ASPP) module added. CMDR-UNet: the backbone with our proposed cascade architecture added.
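The paired t-tests used throughout these tables compare two models on the same test cases, so the statistic is computed on the per-case differences. A minimal sketch of the paired t statistic (the p-value would then come from the t distribution with n−1 degrees of freedom); the per-patient DSC values below are hypothetical, not taken from the paper:

```python
import math

def paired_t(x, y):
    """Paired t statistic for per-case metric pairs (e.g. the DSC of
    two models evaluated on the same patients)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    # Sample variance of the per-case differences (Bessel-corrected).
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical per-patient DSC values for two models on five shared cases.
model_a = [74.0, 75.5, 73.2, 76.1, 74.8]
model_b = [73.0, 74.0, 72.5, 74.9, 73.6]
t = paired_t(model_a, model_b)
```

Pairing on the same patients removes between-patient variability, which is why it is preferred over an unpaired test when both models are evaluated on an identical test set.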
Figure 4 Boxplot of the DSCs of the different models. In each box, the plotted symbols mark the mean, the 1st and 99th percentiles, and the minimum and maximum.
Figure 5 Visual comparison of boundary feature maps from the Res-UNet backbone alone and with the ASPP block added. Map columns show output feature maps and boundary feature maps, where warmer colors indicate higher attention. Outline columns: red and blue denote the ground truth and the automatic segmentation result, respectively. The red arrows denote boundary areas attended to by the ASPP block, which yields better boundary segmentation. Note that the two rows are from different patients and are shown on pCT.
Figure 6 Example pCT images showing the consistency between the GTVnx delineated automatically by our method and the ground truth. Red lines denote the human expert-delineated ground truth; the other lines denote the automatically delineated contours.
Segmentation results of models pretrained, fine-tuned, and validated on datasets A, B, C, and B+C.
| Training dataset | Test dataset | DSC (%) | ASSD (mm) |
|---|---|---|---|
| A (pCT) | A |  |  |
|  | B | 68.21 ± 5.51 | 2.09 ± 0.74 |
|  | C | 61.64 ± 13.55 | 2.13 ± 1.02 |
|  | B+C | 64.26 ± 11.54 | 2.12 ± 0.93 |
| A pretrained + B (70%) fine-tuning | ① B (30%) |  |  |
| A pretrained + C (70%) fine-tuning | ① C (30%) |  |  |
| A pretrained + (B+C) (70%) fine-tuning | ② B (30%) | 73.95 ± 8.66 | 1.89 ± 0.78 |
|  | ② C (30%) | 67.43 ± 12.35 | 1.60 ± 0.49 |
| P-value | ② B vs ① B | 0.810 | 0.893 |
|  | ② C vs ① C | 0.180 | 0.041* |
Two-tailed p-values <0.05 were considered statistically significant. The best result is highlighted in bold.
Values are presented as mean ± standard deviation. ∗p < 0.05 was considered significant. A: our institution; B: institution B; C: MICCAI 2019.
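The distance metrics in the tables (ASSD and HD95) are defined on the surfaces of the predicted and ground-truth masks: ASSD averages the point-to-surface distances in both directions, and HD95 takes the 95th percentile of them. Below is a brute-force 2D NumPy sketch assuming isotropic voxel spacing; the paper's own 3D implementation is not shown, so this is only illustrative:

```python
import numpy as np

def surface_points(mask):
    """Boundary pixels: foreground with at least one background 4-neighbour."""
    pad = np.pad(mask, 1)
    interior = (pad[:-2, 1:-1] & pad[2:, 1:-1] &
                pad[1:-1, :-2] & pad[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def surface_distances(a, b, spacing=1.0):
    """Distance from each surface point of `a` to the nearest surface point of `b`."""
    pa, pb = surface_points(a), surface_points(b)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1) * spacing
    return d.min(axis=1)

def assd(a, b, spacing=1.0):
    """Average symmetric surface distance (in mm, given spacing in mm)."""
    da = surface_distances(a, b, spacing)
    db = surface_distances(b, a, spacing)
    return (da.sum() + db.sum()) / (len(da) + len(db))

def hd95(a, b, spacing=1.0):
    """95th-percentile symmetric Hausdorff distance."""
    da = surface_distances(a, b, spacing)
    db = surface_distances(b, a, spacing)
    return np.percentile(np.concatenate([da, db]), 95)

# Toy masks: a 3×3 square and the same square shifted one pixel right.
a = np.zeros((8, 8), dtype=bool); a[2:5, 2:5] = True
b = np.zeros((8, 8), dtype=bool); b[2:5, 3:6] = True
print(assd(a, b), hd95(a, b))
```

With the 3 mm slice thickness reported for these datasets, a real 3D implementation would use anisotropic per-axis spacing rather than the single scalar used here.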