Xing-Rui Wang1, Xi Ma1, Liu-Xu Jin1, Yan-Jun Gao1, Yong-Jie Xue1, Jing-Long Li1, Wei-Xian Bai1, Miao-Fei Han2, Qing Zhou2, Feng Shi2, Jing Wang3.
Abstract
Objective: To explore the feasibility of using a deep learning three-dimensional (3D) V-Net convolutional neural network to construct high-resolution computed tomography (HRCT)-based auditory ossicle structure recognition and segmentation models.
Keywords: auditory ossicles; automatic segmentation; computed tomography; convolutional neural network; deep learning
Year: 2022 PMID: 36120083 PMCID: PMC9470864 DOI: 10.3389/fninf.2022.937891
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 3.739
FIGURE 1. An anatomical illustration of auditory ossicles located in the middle ear, including the malleus, incus, and stapes.
FIGURE 2. The VB-Net network structure.
FIGURE 3. Algorithm flow chart.
FIGURE 4. (A–C) The manual segmentation of the auditory ossicles; (D–F) the automatic segmentation of the auditory ossicles using the V-Net method.
Detailed configuration of the coarse-to-fine segmentation network.
| Configuration | Coarse network | Fine network |
| Resample | [1, 1, 1] mm | [0.2, 0.2, 0.335] mm |
| Patch size | [96, 96, 64] | [96, 96, 96] |
| Normalize | | |
| Learning rate | Step learning rate schedule (initial, 1e-2) | Step learning rate schedule (initial, 1e-2) |
| Optimizer | Adam [momentum = 0.9, decay = 1e-4, betas = (0.9, 0.999)] | Adam [momentum = 0.9, decay = 1e-4, betas = (0.9, 0.999)] |
| Software | PyTorch | PyTorch |
| Hardware | Nvidia Tesla V100 GPU | Nvidia Tesla V100 GPU |
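The "step learning rate schedule" row reports only the initial rate (1e-2). As a minimal sketch of what such a schedule computes, the snippet below multiplies the rate by a fixed factor after every fixed number of epochs, which is the same rule that PyTorch's `torch.optim.lr_scheduler.StepLR` applies. The `step_size` and `gamma` values are hypothetical and chosen only for illustration; the table does not report them.

```python
def step_lr(epoch, initial_lr=1e-2, step_size=100, gamma=0.1):
    """Learning rate at a given epoch under a step schedule.

    initial_lr comes from the configuration table; step_size and gamma
    are illustrative assumptions, not values reported in the source.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # The rate drops by a factor of gamma once per completed step_size epochs.
    return initial_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    for epoch in (0, 99, 100, 250):
        print(epoch, step_lr(epoch))
```

With these assumed values the rate stays at 1e-2 for the first 100 epochs and then falls by a factor of ten at each subsequent step.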
Comparison of segmentation performance between different experiment settings (patch size, patch spacing) under 3D V-Net.
| | DSC | | |
| | Malleus | Incus | Stapes |
| Es1 | 0.800 ± 0.018 | 0.787 ± 0.051 | 0.528 ± 0.103 |
| Es2 | | | |
| Es3 | 0.836 ± 0.242 | 0.846 ± 0.237 | 0.799 ± 0.236 |
| Es4 | 0.809 ± 0.048 | 0.797 ± 0.057 | 0.609 ± 0.102 |
| Es5 | 0.817 ± 0.045 | 0.797 ± 0.059 | 0.618 ± 0.103 |
| | | | |
| Es1 | 0.654 ± 0.047 | 0.642 ± 0.040 | 0.580 ± 0.108 |
| Es2 | | | |
| Es3 | 28.990 ± 110.4 | 31.8 ± 120.1 | 26.2 ± 102.4 |
| Es4 | 0.636 ± 0.170 | 0.700 ± 0.210 | 0.649 ± 0.168 |
| Es5 | 0.638 ± 0.170 | 0.706 ± 0.217 | 0.652 ± 0.168 |
| | | | |
| Es1 | 1.439 ± 0.170 | 1.594 ± 0.664 | 4.393 ± 2.368 |
| Es2 | | | |
| Es3 | 31.3 ± 113.7 | 33.9 ± 123.6 | 28.2 ± 104.7 |
| Es4 | 1.740 ± 1.965 | 1.332 ± 0.188 | 1.475 ± 0.0383 |
| Es5 | 1.332 ± 0.236 | 1.351 ± 0.208 | 1.334 ± 0.237 |
Es1: In the coarse model training configuration, patch size and spacing were set as [96, 96, 64] and [2.5, 2.5, 2.5] mm, respectively. In the fine model training configuration, patch size and spacing were set as [96, 96, 96] and [0.5, 0.5, 0.75] mm, respectively.
Es2: In the coarse model training configuration, patch size and spacing were set as [96, 96, 64] and [1, 1, 1] mm, respectively. In the fine model training configuration, patch size and spacing were set as [96, 96, 96] and [0.2, 0.2, 0.335] mm, respectively.
Es3: In the coarse model training configuration, patch size and spacing were set as [96, 96, 64] and [0.5, 0.5, 0.5] mm, respectively. In the fine model training configuration, patch size and spacing were set as [96, 96, 96] and [0.1, 0.1, 0.2] mm, respectively.
Es4: In the coarse model training configuration, patch size and spacing were set as [128, 128, 96] and [1, 1, 1] mm, respectively. In the fine model training configuration, patch size and spacing were set as [128, 128, 128] and [0.2, 0.2, 0.335] mm, respectively.
Es5: In the coarse model training configuration, patch size and spacing were set as [64, 64, 48] and [1, 1, 1] mm, respectively. In the fine model training configuration, patch size and spacing were set as [64, 64, 64] and [0.2, 0.2, 0.335] mm, respectively.
Bold fonts represent the best performance among all training configurations.
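The five experiment settings differ only in patch size and voxel spacing, so one way to compare them is by the physical extent each patch covers (voxels multiplied by spacing along each axis). The sketch below transcribes the Es1 to Es5 notes into a dictionary and computes that extent; the names `SETTINGS` and `field_of_view_mm` are hypothetical and introduced only for this illustration.

```python
# Patch size (voxels) and spacing (mm) for each experiment setting,
# transcribed from the Es1-Es5 notes; each entry is (coarse, fine).
SETTINGS = {
    "Es1": (((96, 96, 64), (2.5, 2.5, 2.5)), ((96, 96, 96), (0.5, 0.5, 0.75))),
    "Es2": (((96, 96, 64), (1.0, 1.0, 1.0)), ((96, 96, 96), (0.2, 0.2, 0.335))),
    "Es3": (((96, 96, 64), (0.5, 0.5, 0.5)), ((96, 96, 96), (0.1, 0.1, 0.2))),
    "Es4": (((128, 128, 96), (1.0, 1.0, 1.0)), ((128, 128, 128), (0.2, 0.2, 0.335))),
    "Es5": (((64, 64, 48), (1.0, 1.0, 1.0)), ((64, 64, 64), (0.2, 0.2, 0.335))),
}


def field_of_view_mm(patch_size, spacing):
    """Physical extent of a patch along each axis: voxels x spacing (mm)."""
    # Rounding to 3 decimals only hides floating-point noise such as 19.200000000000003.
    return tuple(round(n * s, 3) for n, s in zip(patch_size, spacing))


if __name__ == "__main__":
    for name, (coarse, fine) in SETTINGS.items():
        print(name, "coarse:", field_of_view_mm(*coarse), "fine:", field_of_view_mm(*fine))
```

For example, the Es2 fine patch of [96, 96, 96] voxels at [0.2, 0.2, 0.335] mm spans 19.2 × 19.2 × 32.16 mm, whereas the Es3 fine patch at [0.1, 0.1, 0.2] mm spans only 9.6 × 9.6 × 19.2 mm.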
Comparison of DSC between automatic segmentation and manual segmentation under two neural networks.
| Network structure | DSC | | |
| | Malleus | Incus | Stapes |
| 3D V-Net | 0.920 ± 0.014 | 0.925 ± 0.014 | 0.835 ± 0.035 |
| U-Net | 0.876 ± 0.025 | 0.889 ± 0.023 | 0.758 ± 0.044 |
| | –13.602 | –11.762 | –11.727 |
| P | < 0.001 | < 0.001 | < 0.001 |
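For readers unfamiliar with the metric, the Dice Similarity Coefficient between two segmentations A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The following is a minimal sketch of that formula on sets of voxel indices; it is not the authors' evaluation code, and the function name and the empty-mask convention are assumptions made for this example.

```python
def dice_coefficient(pred_voxels, truth_voxels):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|).

    Both arguments are iterables of (x, y, z) voxel indices labelled as
    foreground. Returns 1.0 when both masks are empty (a convention
    assumed here, not taken from the source).
    """
    pred, truth = set(pred_voxels), set(truth_voxels)
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


if __name__ == "__main__":
    # Toy masks: two 2x2x1 blocks offset by one voxel along x.
    manual = {(x, y, 0) for x in range(2) for y in range(2)}
    automatic = {(x, y, 0) for x in range(1, 3) for y in range(2)}
    print(dice_coefficient(automatic, manual))  # 2*2 / (4+4) = 0.5
```

Because the denominator is the total number of foreground voxels, a small structure loses more DSC per misplaced voxel than a large one does.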
Comparison of HD95 between automatic segmentation and manual segmentation under two neural networks.
| Network structure | HD95 | | |
| | Malleus | Incus | Stapes |
| 3D V-Net | 1.016 ± 0.080 | 1.000 ± 0.000 | 1.027 ± 0.102 |
| U-Net | 1.361 ± 0.872 | 1.174 ± 0.350 | 1.455 ± 0.618 |
| | 3.4559 | 4.3487 | 5.4810 |
| P | < 0.001 | < 0.001 | < 0.001 |
FIGURE 5. (A) A three-dimensional display of the segmentation results. The left side shows the segmentation results of the ground truth and the 3D V-Net method; the right side shows the difference in surface distance between the two results. The auditory ossicles comprise the malleus, incus, and stapes. Positive values represent over-segmentation, and negative values represent under-segmentation. The left side of the figure corresponds to the left auditory ossicles, and the right corresponds to the right auditory ossicles. (B) Comparison of the segmentation results. The segmentation results of the two methods are compared by their degree of surface coincidence with the ground truth. Green is the ground truth reconstruction, and red is the segmentation result reconstruction.
Average time for manual segmentation and model segmentation of auditory ossicles.
| | Average segmentation time (s) |
| Senior Physician 1 | 220.31 |
| Senior Physician 2 | 225.64 |
| Intermediate Physician 1 | 253.53 |
| Intermediate Physician 2 | 258.12 |
| Junior Physician 1 | 300.78 |
| Junior Physician 2 | 310.56 |
| Resident Physician 1 | 375.31 |
| Resident Physician 2 | 387.42 |
| 3D U-Net | 2.00 |
| 3D V-Net | 1.66 |
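The size of the gap between manual and automatic segmentation can be worked out directly from the timing table. The sketch below averages the eight physicians' times and divides by each model's time; the variable and function names are hypothetical, and the speedup ratio is a derived illustration rather than a figure reported in the source.

```python
# Average segmentation times in seconds, transcribed from the timing table.
MANUAL_TIMES = {
    "Senior Physician 1": 220.31,
    "Senior Physician 2": 225.64,
    "Intermediate Physician 1": 253.53,
    "Intermediate Physician 2": 258.12,
    "Junior Physician 1": 300.78,
    "Junior Physician 2": 310.56,
    "Resident Physician 1": 375.31,
    "Resident Physician 2": 387.42,
}
MODEL_TIMES = {"3D U-Net": 2.00, "3D V-Net": 1.66}


def mean_manual_time():
    """Unweighted mean of the eight physicians' average times (s)."""
    return sum(MANUAL_TIMES.values()) / len(MANUAL_TIMES)


def speedup(model):
    """Ratio of the mean manual time to the given model's time."""
    return mean_manual_time() / MODEL_TIMES[model]


if __name__ == "__main__":
    print(f"mean manual time: {mean_manual_time():.2f} s")
    for model in MODEL_TIMES:
        print(f"{model}: about {speedup(model):.0f}x faster than the manual mean")
```

The unweighted mean treats every physician equally; comparing against a single seniority group would only require filtering `MANUAL_TIMES` first.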
Comparison of DSC values with other related studies.
| | DSC | | |
| | Malleus | Incus | Stapes |
| Atlas-based segmentation | 0.80 | 0.83 | 0.58 (L), 0.48 (R) |
| U-Net | 0.75 | | |
| 3D-DSD | 0.82 | 0.81 | |
| W-Net | 0.85 | | |
| 3D U-Net | 0.84 | | |
| V-Net | 0.83 | | |
| Atlas-based segmentation | 0.83 | 0.84 | 0.36 |
| Multi-view fusion algorithm | 0.94 | 0.95 | 0.76 |
| PWD-3DNet | 0.89 | 0.89 | 0.82 |
| U-Net | 0.88 | 0.89 | 0.76 |
| | | | |
DSC, Dice Similarity Coefficient; L, left; R, right. *DSC of the whole ossicle chain.
Bold fonts represent the results of this study.
Comparison of ASD between automatic segmentation and manual segmentation under two neural networks.
| Network structure | ASD | | |
| | Malleus | Incus | Stapes |
| 3D V-Net | 0.257 ± 0.054 | 0.236 ± 0.047 | 0.258 ± 0.077 |
| U-Net | 0.439 ± 0.208 | 0.361 ± 0.076 | 0.433 ± 0.108 |
| | 7.4500 | 12.1940 | 11.965 |
| P | < 0.001 | < 0.001 | < 0.001 |
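HD95 and ASD are both built from nearest-neighbour distances between the two segmentation surfaces: HD95 takes a high percentile of those distances, while ASD takes their mean. The sketch below shows one common convention for each on small lists of surface points. It is an illustration under stated assumptions, not the authors' implementation: the source does not specify which variant or percentile rule was used, and the brute-force search here is only practical for small point sets.

```python
import math


def directed_distances(source, target):
    """Distance from each point in source to its nearest point in target."""
    return [min(math.dist(p, q) for q in target) for p in source]


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def hd95(surface_a, surface_b):
    """95th-percentile Hausdorff distance: the larger of the two directed
    95th-percentile surface distances (one common convention, assumed here)."""
    return max(
        percentile(directed_distances(surface_a, surface_b), 95),
        percentile(directed_distances(surface_b, surface_a), 95),
    )


def asd(surface_a, surface_b):
    """Average surface distance, pooled over both directions
    (the symmetric variant, assumed here)."""
    d = directed_distances(surface_a, surface_b) + directed_distances(surface_b, surface_a)
    return sum(d) / len(d)


if __name__ == "__main__":
    # Two parallel two-point "surfaces" one unit apart along z.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(hd95(a, b), asd(a, b))
```

Both functions expect non-empty point lists given in physical coordinates, so results come out in whatever unit those coordinates use. Taking the 95th percentile instead of the maximum makes HD95 less sensitive to a handful of outlier surface points.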