Ai Dozen1,2, Masaaki Komatsu1,3, Akira Sakai4,5,6, Reina Komatsu5,7, Kanto Shozu1, Hidenori Machino1,3, Suguru Yasutomi4,5, Tatsuya Arakaki7, Ken Asada1,3, Syuzo Kaneko1,3, Ryu Matsuoka5,7, Daisuke Aoki2, Akihiko Sekizawa7, Ryuji Hamamoto1,3,6.
Abstract
Image segmentation is the pixel-by-pixel detection of objects; among the fundamental tasks of machine learning, which also include image classification and object detection, it is the most challenging but also the most informative. Pixel-by-pixel segmentation is required to apply machine learning to fetal cardiac ultrasound screening, because cardiac substructures such as the ventricular septum, which are small and change shape dynamically with the fetal heartbeat, must be detected precisely. This task is difficult for general segmentation methods such as DeepLab v3+ and U-net. Hence, in this study we propose a novel segmentation method named Cropping-Segmentation-Calibration (CSC) that is specific to the ventricular septum in ultrasound videos. CSC employs the time-series information of videos and specific section information to calibrate the output of U-net. The actual sections of the ventricular septum were annotated in 615 frames from 421 normal fetal cardiac ultrasound videos of 211 pregnant women who were screened. The dataset was split into training and test data at a ratio of 2:1, and three-fold cross-validation was conducted. The segmentation results of DeepLab v3+, U-net, and CSC were evaluated using the mean intersection over union (mIoU), which was 0.0224, 0.1519, and 0.5543, respectively. The results reveal the superior performance of CSC.
Keywords: congenital heart disease; deep learning; fetal cardiac ultrasound video; segmentation; ventricular septum
Year: 2020 PMID: 33171658 PMCID: PMC7695246 DOI: 10.3390/biom10111526
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. Cropped image and time-series image. (a) The “original image” is cropped to give the “cropped image”, which is segmented to give the “segmented cropped image”. The segmented cropped image is then restored to its original size to give the “segmented original image”. (b) A segmentation target image is labelled “t”, and the preceding and following time-series images are labelled “t ± 1, 2, 3”. All of them were cropped.
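The crop-and-restore flow in panel (a) can be sketched as below. This is a minimal illustration, not the authors' implementation: the `crop` and `restore` helpers, the `(top, left, height, width)` box format, and the threshold that stands in for the segmentation network are all hypothetical, and the sketch assumes the crop is pasted back without rescaling.

```python
import numpy as np

def crop(image, box):
    """Return the sub-image inside box = (top, left, height, width)."""
    top, left, h, w = box
    return image[top:top + h, left:left + w]

def restore(cropped_mask, box, original_shape):
    """Paste a cropped mask onto a zero canvas of the original image size."""
    top, left, h, w = box
    canvas = np.zeros(original_shape, dtype=cropped_mask.dtype)
    canvas[top:top + h, left:left + w] = cropped_mask
    return canvas

# Toy "original image" and a hypothetical bounding box around the target.
original = np.arange(36, dtype=np.float32).reshape(6, 6)
box = (1, 2, 3, 3)

cropped = crop(original, box)
# Stand-in for the segmentation network: a simple threshold (assumption).
segmented_cropped = (cropped > cropped.mean()).astype(np.float32)
segmented_original = restore(segmented_cropped, box, original.shape)
```

Pixels outside the box are set to zero in the restored mask, so any prediction is confined to the cropped region.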
Figure 2. Overview of Cropping-Segmentation-Calibration (CSC). A “cropping module” and a “calibration module” were added to improve the U-net-based segmentation results from the “segmentation module”. The cropping module crops out the area around the ventricular septum. The calibration module, which consists of an encoder–decoder (ED) and a Visual Geometry Group-backbone module (VGG), calibrates the output of the segmentation module. The ED utilizes time-series information, and the VGG utilizes original image information.
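The time-series input labelled “t ± 1, 2, 3” in Figure 1(b) amounts to a window of seven frames centred on the target frame. The following sketch shows one way to collect such a window; the function name `temporal_window` and the choice to clamp indices at the start and end of a video are assumptions, since the source does not state how boundary frames are handled.

```python
import numpy as np

def temporal_window(frames, t, radius=3):
    """Stack frames t-radius..t+radius; indices are clamped at the video ends."""
    idx = np.clip(np.arange(t - radius, t + radius + 1), 0, len(frames) - 1)
    return frames[idx]

# Toy video: 10 frames of 4 x 4 pixels.
video = np.random.default_rng(0).random((10, 4, 4))

window = temporal_window(video, t=5)     # frames 2..8, target frame in the middle
edge_window = temporal_window(video, 0)  # frame 0 is repeated for t-3..t-1
```

The target frame always sits at index `radius` of the returned stack, which keeps the layout identical for every frame of a video.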
Figure A1. Pretraining for YOLO. (a) Datasets for YOLO’s pretraining. The ultrasound images were split into 6122 training, 1009 validation, and 1051 test images. (b) YOLO’s training flow. YOLO was trained by processing the image to calculate the bounding box and comparing it to the ground truth label.
Distribution of test data classified by cardiac axis orientation and ventricular systolic state.
| Ventricular State | Apical | Non-Apical | Total |
|---|---|---|---|
| Systole | 183 | 118 | 301 |
| Diastole | 114 | 200 | 314 |
| Total | 297 | 318 | 615 |
Figure 3. Representative examples of the ventricular septum segmentation images in test data for the existing methods (DeepLab v3+ and U-net) and CSC (Cropping-Segmentation-Calibration). One horizontal row presents the segmentation results with respect to each method for the same case. The white pixels are estimated as the ventricular septum, and the degree of whiteness indicates the confidence level. Among the three methods, the segmentation results of CSC were most in accordance with the ground truth.
Evaluation of segmentation results obtained using existing methods (DeepLab v3+ and U-net) and CSC (Cropping-Segmentation-Calibration) with respect to the mIoU and mean Dice coefficient (mDice).
| Method | mIoU (Original Image) | mIoU (Cropped Image) | mDice (Original Image) | mDice (Cropped Image) |
|---|---|---|---|---|
| DeepLab v3+ | 0.0224 ± 0.0085 | - | 0.0382 ± 0.0140 | - |
| U-net | 0.1519 ± 0.0596 | - | 0.2238 ± 0.0777 | - |
| CSC (Ours) | 0.5543 ± 0.0081 | 0.5598 ± 0.0067 | 0.6891 ± 0.0104 | 0.6950 ± 0.0074 |
The values are the mean ± standard deviation of the three datasets for cross-validation. CSC yielded the highest values. Moreover, the cropped images yielded slightly higher values than the original images in CSC.
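For reference, the two metrics reported above can be computed for binary masks as in the sketch below. It is an illustration under stated assumptions rather than the evaluation code used in the study: the helper names, the convention of returning 1.0 when both masks are empty, and the averaging of per-image scores are assumptions, and a confidence map would first need to be thresholded into a binary mask.

```python
import numpy as np

def iou(pred, truth):
    """Intersection over union of two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:  # both masks empty: treated as perfect agreement (assumption)
        return 1.0
    return np.logical_and(pred, truth).sum() / union

def dice(pred, truth):
    """Dice coefficient of two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:  # both masks empty: treated as perfect agreement (assumption)
        return 1.0
    return 2 * np.logical_and(pred, truth).sum() / total

# Toy example: two 4-pixel masks that overlap in 2 pixels.
truth = np.zeros((4, 4), dtype=int)
truth[1, 0:4] = 1
pred = np.zeros((4, 4), dtype=int)
pred[1, 2:4] = 1
pred[2, 2:4] = 1

pairs = [(pred, truth), (truth, truth)]
m_iou = np.mean([iou(p, t) for p, t in pairs])    # mean IoU over images
m_dice = np.mean([dice(p, t) for p, t in pairs])  # mean Dice over images
```

For a single image the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. That identity does not carry over exactly to the averaged values mIoU and mDice.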
Figure 4. Representative examples of the ventricular septum segmentation in test data for each module combination. One horizontal row presents the segmentation results obtained using each method for the same case. The white pixels are estimated as the ventricular septum, and the degree of whiteness indicates the confidence level. Among the various module combinations, the segmentation results of U-net + YOLO (You Only Look Once) + ED + VGG (CSC) were most in accordance with the ground truth. YOLO contributed significantly to the improvement of the segmentation accuracy, and ED improved the segmentation further. Moreover, the addition of VGG slightly narrowed the prediction section.
Evaluation of segmentation results for each combination of modules with respect to the mIoU and mDice.
| U-Net | YOLO | ED | VGG | mIoU (Original Image) | mIoU (Cropped Image) | mDice (Original Image) | mDice (Cropped Image) |
|---|---|---|---|---|---|---|---|
| ✓ | | | | 0.1519 ± 0.0596 | - | 0.2238 ± 0.0777 | - |
| ✓ | | ✓ | | 0.0633 ± 0.0372 | - | 0.0996 ± 0.0538 | - |
| ✓ | | ✓ | ✓ | 0.0902 ± 0.0304 | - | 0.1400 ± 0.0442 | - |
| ✓ | ✓ | | | 0.5373 ± 0.0134 | 0.5424 ± 0.0107 | 0.6724 ± 0.0188 | 0.6782 ± 0.0153 |
| ✓ | ✓ | ✓ | | 0.5533 ± 0.0139 | 0.5587 ± 0.0138 | 0.6885 ± 0.0141 | 0.6944 ± 0.0123 |
| ✓ | ✓ | ✓ | ✓ | 0.5543 ± 0.0081 | 0.5598 ± 0.0067 | 0.6891 ± 0.0104 | 0.6950 ± 0.0074 |
The values are the mean ± standard deviation of the three datasets for cross-validation. CSC, which combines all modules, yielded the highest values. YOLO contributed significantly to the improvement of segmentation accuracy. The cropped images yielded slightly higher values than the original images for all module combinations that included YOLO. With YOLO, the addition of the ED improved the segmentation accuracy. In contrast, without YOLO, the addition of the ED decreased the segmentation accuracy. Upon the addition of VGG, a slight upward trend was observed.
Figure 5. Representative examples of the ventricular septum segmentation images classified by the cardiac axis orientation and ventricular systolic state, from the test data, for each module combination. One horizontal row presents the segmentation results obtained using each method for the same case. The white pixels are estimated as the ventricular septum, and the degree of whiteness indicates the confidence level. The segmentation results were more accurate for the apical group than for the non-apical group, and more accurate for the diastolic group than for the systolic group, irrespective of the module combination. The addition of YOLO significantly improved the segmentation accuracy, and the addition of the ED further improved it, irrespective of cardiac axis orientation and ventricular systolic state. Moreover, the addition of VGG slightly improved the segmentation accuracy for the systolic and non-apical groups.
Segmentation evaluation by mIoU and mDice for each module combination when divided by the orientation of the heart axis.
| U-Net | YOLO | ED | VGG | mIoU (Apical) | mIoU (Non-Apical) | mDice (Apical) | mDice (Non-Apical) |
|---|---|---|---|---|---|---|---|
| ✓ | | | | 0.1878 ± 0.1097 | 0.1213 ± 0.0186 | 0.2697 ± 0.1410 | 0.1845 ± 0.0261 |
| ✓ | ✓ | | | 0.5793 ± 0.0315 | 0.4990 ± 0.0058 | 0.7064 ± 0.0405 | 0.6417 ± 0.0086 |
| ✓ | ✓ | ✓ | | 0.5889 ± 0.0265 | 0.5210 ± 0.0160 | 0.7146 ± 0.0351 | 0.6653 ± 0.0140 |
| ✓ | ✓ | ✓ | ✓ | 0.5855 ± 0.0167 | 0.5255 ± 0.0016 | 0.7114 ± 0.0264 | 0.6688 ± 0.0026 |
The values are the mean ± standard deviation of the three datasets for cross-validation. The apical group yielded higher values than the non-apical group. The addition of YOLO significantly improved the segmentation accuracy, and the addition of the ED further improved it, irrespective of the cardiac axis orientation. The addition of the VGG contributed to the higher values in the non-apical group.
Segmentation evaluation by mIoU and mDice for each combination of modules when divided by the ventricular systolic state.
| U-Net | YOLO | ED | VGG | mIoU (Systole) | mIoU (Diastole) | mDice (Systole) | mDice (Diastole) |
|---|---|---|---|---|---|---|---|
| ✓ | | | | 0.1397 ± 0.0686 | 0.1631 ± 0.0528 | 0.2072 ± 0.0914 | 0.2388 ± 0.0677 |
| ✓ | ✓ | | | 0.5255 ± 0.0158 | 0.5491 ± 0.0114 | 0.6567 ± 0.0235 | 0.6882 ± 0.0146 |
| ✓ | ✓ | ✓ | | 0.5413 ± 0.0196 | 0.5655 ± 0.0065 | 0.6733 ± 0.0186 | 0.7037 ± 0.0067 |
| ✓ | ✓ | ✓ | ✓ | 0.5435 ± 0.0102 | 0.5648 ± 0.0073 | 0.6755 ± 0.0127 | 0.7026 ± 0.0073 |
The values are the mean ± standard deviation of the three datasets for cross-validation. The diastolic group yielded higher values than the systolic group. The addition of the YOLO significantly improved the segmentation accuracy, and the addition of the ED further improved it, irrespective of the ventricular systolic state. The addition of the VGG contributed to the higher values in the systolic group.
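The “mean ± standard deviation of the three datasets” reported in the tables can be illustrated with the sketch below, which builds three folds with a 2:1 training-to-test ratio and summarizes one score per fold. The helper names are hypothetical, and two details are assumptions because the source does not specify them: items are shuffled individually (not grouped by video or patient), and the population standard deviation (`ddof=0`) is used.

```python
import numpy as np

def three_fold_splits(n_items, seed=0):
    """Yield (train, test) index arrays; each fold uses 2/3 for training, 1/3 for testing."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_items), 3)
    for k in range(3):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(3) if j != k])
        yield train, test

def summarize(fold_scores):
    """Return the mean and population standard deviation of per-fold scores."""
    scores = np.asarray(fold_scores, dtype=float)
    return scores.mean(), scores.std()

# Toy usage with 9 hypothetical items and made-up per-fold scores.
splits = list(three_fold_splits(9))
mean, std = summarize([0.5, 0.6, 0.7])
print(f"{mean:.4f} ± {std:.4f}")
```

Every item appears in exactly one test fold, so the three test sets together cover the whole dataset once.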