Jeong-Woo Ju1,2, Heechul Jung3, Yeoun Joo Lee1,4, Sang-Wook Mun1,4, Jong-Hyuck Lee5.
Abstract
Keywords: capsule endoscopy; deep learning; semantic segmentation; small bowel cleanliness; visualization scale
Year: 2022 PMID: 35334573 PMCID: PMC8954405 DOI: 10.3390/medicina58030397
Source DB: PubMed Journal: Medicina (Kaunas) ISSN: 1010-660X Impact factor: 2.430
Figure 1. Overall flowchart of our semantic segmentation dataset construction. (A) We collected a set of raw images from CE. (B) Two gastroenterologists split the images into 10 stages according to the size of the clean area. (C) Each image was annotated by five well-trained annotators. (D) The quality of the constructed dataset was checked by training the CNN designed for the semantic segmentation task.
Figure 2. Schemes for each capsule endoscopy (CE) image class. Each color represents an individual region in a CE image. (a) 3-class: separate clean, dark, and floats/bubbles regions. (b) 2-class: dark and floats/bubbles regions merged into a single class.
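Collapsing the 3-class labels of Figure 2 into the 2-class scheme is a simple relabeling. A minimal sketch, assuming integer label ids (0 = clean, 1 = dark, 2 = floats/bubbles); the paper does not specify an encoding:

```python
import numpy as np

# Assumed label ids for the 3-class scheme (not given in the paper).
CLEAN, DARK, FLOATS = 0, 1, 2

def to_two_class(mask_3c: np.ndarray) -> np.ndarray:
    """Merge dark and floats/bubbles into one non-clean class (label 1)."""
    mask_2c = np.zeros_like(mask_3c)
    mask_2c[(mask_3c == DARK) | (mask_3c == FLOATS)] = 1
    return mask_2c

mask = np.array([[0, 1], [2, 0]])
print(to_two_class(mask))  # [[0 1] [1 0]]
```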
Description of the detailed configuration of each CNN architecture (layer sequences; convolution layers specified by kernel size, channels, and dilation rate, e.g., channels: 256, dilation rate: 6; the decoder consists of an up-sampling layer followed by a convolution).

| Architecture | Encoder |  | Decoder |
|---|---|---|---|
|  | Backbone | ASPP |  |
| DeepLab v3 | ResNet 50 |  |  |
| FCN | VGG16 |  |  |
| U-Net |  |  |  |
Figure 3. Class balance for each stage in our dataset. We calculated the percentage (vertical axis) by averaging the ratio of each class's pixel count to the image size.
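The class-balance statistic of Figure 3 (the per-class pixel fraction, averaged over the images of a stage) can be sketched as follows; the label ids are assumptions:

```python
import numpy as np

def class_balance(masks, num_classes=3):
    """Mean per-class pixel fraction over a list of integer label masks.

    Assumes labels 0 = clean, 1 = dark, 2 = floats/bubbles (an assumption;
    the paper does not give an encoding).
    """
    fractions = np.zeros(num_classes)
    for m in masks:
        counts = np.bincount(m.ravel(), minlength=num_classes)
        fractions += counts / m.size  # this image's class ratios
    return fractions / len(masks)     # average over images

masks = [np.array([[0, 0], [1, 2]]), np.array([[0, 1], [1, 1]])]
print(class_balance(masks))  # [0.375 0.5   0.125]
```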
Training samples and GT images for each visualization scale stage and the corresponding statistics. Images from capsule endoscopy (top row); 3-class ground truth region images (second row), where black, green, and blue denote clean, dark, and floats/bubbles regions, respectively; 2-class ground truth region images (third row), where black and red denote clean and non-clean regions, respectively. The statistics of training/testing samples for each stage (bottom three rows). Measuring unit: images; GT: ground truth.
|  | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 | Stage 6 | Stage 7 | Stage 8 | Stage 9 | Stage 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Training Examples | 744 | 775 | 845 | 832 | 695 | 660 | 820 | 855 | 846 | 916 |
| Testing Examples | 256 | 234 | 173 | 179 | 286 | 327 | 186 | 147 | 154 | 103 |
| Total Examples | 1000 | 1009 | 1018 | 1011 | 981 | 987 | 1006 | 1002 | 1000 | 1019 |
The 3-class semantic segmentation performance, measured by mIoU and the Dice index, for DeepLab v3, FCN, and U-Net.
| Architecture | mIoU | Dice |
|---|---|---|
| DeepLab v3 | 0.7716 | 0.8627 |
| FCN | 0.7667 | 0.8602 |
| U-Net | 0.7594 | 0.8567 |
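The mIoU and Dice scores above can be computed from label maps as in this minimal sketch (per-class IoU and Dice averaged over the classes present; the exact averaging protocol used in the paper is an assumption):

```python
import numpy as np

def miou_and_dice(pred, gt, num_classes=3):
    """Class-averaged IoU and Dice between two flat integer label arrays."""
    ious, dices = [], []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        if union == 0:  # class absent in both prediction and GT: skip it
            continue
        ious.append(inter / union)
        dices.append(2 * inter / (p.sum() + g.sum()))
    return np.mean(ious), np.mean(dices)

pred = np.array([0, 0, 1, 2])
gt = np.array([0, 1, 1, 2])
miou, dice = miou_and_dice(pred, gt)
print(miou, dice)  # 0.666... 0.777...
```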
Our semantic segmentation performance and the corresponding visualization scale (VS) for each test patient, based on the results from DeepLab v3. An overall VS is not reported since VS is defined per patient. mIoU: mean intersection over union; GT VS: ground-truth visualization scale.
| Patient | mIoU (3-class) | Dice (3-class) | GT VS (3-class) | Predicted VS (3-class) | mIoU (2-class) | Dice (2-class) | GT VS (2-class) | Predicted VS (2-class) |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.7743 | 0.8615 | 0.6355 | 0.6442 | 0.9070 | 0.9511 | 0.6355 | 0.6547 |
| 2 | 0.7964 | 0.8807 | 0.3598 | 0.3482 | 0.8988 | 0.9465 | 0.3598 | 0.3549 |
| 3 | 0.7097 | 0.8251 | 0.4471 | 0.4141 | 0.8562 | 0.9225 | 0.4471 | 0.4386 |
| 4 | 0.7031 | 0.7846 | 0.7498 | 0.7395 | 0.9260 | 0.9613 | 0.7498 | 0.7450 |
| 5 | 0.7303 | 0.8305 | 0.6069 | 0.5759 | 0.8718 | 0.9313 | 0.6069 | 0.5860 |
| 6 | 0.7655 | 0.8584 | 0.6743 | 0.6824 | 0.8783 | 0.9347 | 0.6743 | 0.6950 |
| 7 | 0.7608 | 0.8559 | 0.7374 | 0.7273 | 0.8826 | 0.9368 | 0.7374 | 0.7423 |
| 8 | 0.8054 | 0.8828 | 0.5007 | 0.4869 | 0.9330 | 0.9654 | 0.5007 | 0.4940 |
| 9 | 0.8071 | 0.8891 | 0.5190 | 0.5181 | 0.8928 | 0.9434 | 0.5190 | 0.5237 |
| 10 | 0.7990 | 0.8838 | 0.7179 | 0.7089 | 0.8924 | 0.9425 | 0.7179 | 0.7258 |
| Overall | 0.7716 | 0.8627 | N/A | N/A | 0.8972 | 0.9457 | N/A | N/A |
Failure cases for Patient 3 and Patient 4 for the 3-class case. Confusion type (first row). Center-cropped images (second row). GT region annotated images (third row). Our prediction results (bottom row).
|  | Patient 3 | Patient 4 |
|---|---|---|
| Confusion Type | dark → floats/bubbles | floats/bubbles → dark |
Black, green, and blue colors represent clean, dark, and floats/bubbles regions, respectively. GT: ground truth.
Examples of test Patient 5. Input images (first row). Center-cropped images (second row). GT region annotated images and our prediction results for the 3-class case (third and fourth rows) and 2-class case (fifth and bottom rows).
(Image grid for Stages 1–10: input images; 3-class GT and prediction; 2-class GT and prediction.)
Black, green, blue, and red colors represent clean, dark, floats/bubbles, and non-clean regions, respectively. GT: ground truth.
Figure 4. Pixel-level confusion matrix of our semantic segmentation algorithm based on DeepLab v3. (a) 3-class: the clean region was classified more accurately than the other regions. (b) 2-class: both clean and non-clean regions were classified fairly accurately.
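A pixel-level confusion matrix like the one in Figure 4 can be accumulated with a standard bincount trick; this is a generic sketch, not the authors' code:

```python
import numpy as np

def pixel_confusion(gt, pred, num_classes):
    """Pixel-level confusion matrix: rows = GT class, columns = prediction."""
    # Encode each (gt, pred) pair as a single index, then count occurrences.
    idx = gt.ravel() * num_classes + pred.ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

gt = np.array([0, 0, 1, 2])
pred = np.array([0, 1, 1, 2])
print(pixel_confusion(gt, pred, 3))
# [[1 1 0]
#  [0 1 0]
#  [0 0 1]]
```

Normalizing each row by its sum turns the counts into the per-class accuracies shown in such figures.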