Ranit Karmakar, Saeid Nooshabadi.
Abstract
Colon polyps, small clumps of cells on the lining of the colon, can lead to colorectal cancer (CRC), one of the leading types of cancer globally. Hence, automatic early detection of these polyps is crucial for the prevention of CRC. The deep learning models proposed for the detection and segmentation of colorectal polyps are resource-intensive. This paper proposes a lightweight deep learning model for colorectal polyp segmentation that achieves state-of-the-art accuracy while significantly reducing the model size and complexity. The proposed deep learning autoencoder model employs a set of state-of-the-art architectural blocks and optimization objective functions to achieve the desired efficiency. The model is trained and tested on five publicly available colorectal polyp segmentation datasets (CVC-ClinicDB, CVC-ColonDB, EndoScene, Kvasir, and ETIS). We also performed ablation testing on the model to evaluate various aspects of the autoencoder architecture. The model was evaluated using most of the common image-segmentation metrics. The backbone model achieved a DICE score of 0.935 on the Kvasir dataset and 0.945 on the CVC-ClinicDB dataset, improving the accuracy by 4.12% and 5.12%, respectively, over the current state-of-the-art network, while using 88 times fewer parameters, 40 times less storage space, and being computationally 17 times more efficient. Our ablation study showed that the addition of ConvSkip in the autoencoder slightly improves the model's performance, but the improvement was not statistically significant (p-value = 0.815).
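The headline evaluation metric in the abstract is the DICE score. As a point of reference, the sketch below shows a common way to compute the Dice coefficient between a predicted probability map and a ground-truth binary mask; the threshold and smoothing term are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def dice_score(pred, target, threshold=0.5, eps=1e-7):
    """Dice coefficient between a predicted probability map and a binary mask.

    The threshold and epsilon smoothing term are illustrative choices,
    not values taken from the paper.
    """
    pred_bin = (np.asarray(pred) >= threshold).astype(np.float32)
    target_bin = (np.asarray(target) > 0).astype(np.float32)
    intersection = np.sum(pred_bin * target_bin)
    return (2.0 * intersection + eps) / (pred_bin.sum() + target_bin.sum() + eps)

# A perfect prediction yields a score of 1.0.
mask = np.array([[0, 1], [1, 1]], dtype=np.float32)
print(dice_score(mask, mask))  # 1.0
```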
Keywords: colorectal cancer; deep learning; polyp segmentation
Year: 2022 PMID: 35735968 PMCID: PMC9225047 DOI: 10.3390/jimaging8060169
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. Mobile-PolypNet model backbone architecture with the bottleneck residual blocks and skip connections, where x, e, and c in each residual block represent the number of bottleneck residual blocks at each resolution level, the number of filters in the expansion phase, and the number of filters in the contraction phase, respectively.
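Figure 1 names the bottleneck residual blocks only by their counts (x) and their expansion (e) and contraction (c) filter widths; the record does not give the internal layer sequence. Below is a minimal Keras sketch of one plausible block of this kind, assuming a MobileNetV2-style expand, depthwise, contract layout; the kernel size, activation, and normalization choices are assumptions, not the paper's exact design.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_residual_block(x, e, c, stride=1):
    """One plausible bottleneck residual block with `e` expansion filters and
    `c` contraction filters (names follow Figure 1). The layer ordering,
    kernel size, and ReLU6 activation are assumptions, not the paper's design."""
    shortcut = x
    y = layers.Conv2D(e, 1, padding="same", use_bias=False)(x)   # expansion phase
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    y = layers.Conv2D(c, 1, padding="same", use_bias=False)(y)   # contraction phase
    y = layers.BatchNormalization()(y)
    if stride == 1 and x.shape[-1] == c:
        y = layers.Add()([shortcut, y])   # residual skip when shapes match
    return y

# Example: one block applied to a dummy feature map.
inp = tf.keras.Input(shape=(56, 56, 16))
out = bottleneck_residual_block(inp, e=64, c=16)
model = tf.keras.Model(inp, out)
```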
Figure 2. Model performance on test images from different datasets (from left): Kvasir, CVC-ClinicDB, CVC-300, Colon-DB, and ETIS, where the first two are seen datasets and the last three are unseen datasets.
Model performance compared with other models on the test datasets. Results are taken from the PraNet [4] paper and have not been independently verified. Bold identifies the best result in each column.
| Models | Kvasir DICE | Kvasir mIoU | Kvasir F_β^w | Kvasir MAE | CVC-ClinicDB DICE | CVC-ClinicDB mIoU | CVC-ClinicDB F_β^w | CVC-ClinicDB MAE |
|---|---|---|---|---|---|---|---|---|
| U-Net | 0.818 | 0.746 | 0.794 | 0.055 | 0.823 | 0.755 | 0.811 | 0.019 |
| U-Net++ | 0.821 | 0.743 | 0.808 | 0.048 | 0.794 | 0.729 | 0.785 | 0.022 |
| ResUNet-mod | 0.791 | n/a | n/a | n/a | 0.779 | n/a | n/a | n/a |
| ResUNet++ | 0.813 | 0.793 | n/a | n/a | 0.796 | 0.796 | n/a | n/a |
| SFA | 0.723 | 0.611 | 0.670 | 0.075 | 0.700 | 0.607 | 0.647 | 0.042 |
| PraNet | 0.898 | 0.840 | 0.885 |  | 0.899 | 0.849 | 0.896 | 0.009 |
| Mobile-PolypNet | 0.935 |  |  | 0.031 | 0.945 |  |  |  |
Model accuracy comparison on the unseen test datasets CVC-300, Colon-DB, and ETIS. Bold identifies the best result in each column.
| Models | CVC-300 DICE | CVC-300 mIoU | CVC-300 MAE | Colon-DB DICE | Colon-DB mIoU | Colon-DB MAE | ETIS DICE | ETIS mIoU | ETIS MAE |
|---|---|---|---|---|---|---|---|---|---|
| U-Net | 0.710 | 0.627 | 0.022 | 0.512 | 0.044 | 0.061 | 0.398 | 0.335 | 0.036 |
| U-Net++ | 0.707 | 0.624 | 0.018 | 0.483 | 0.410 | 0.064 | 0.401 | 0.344 | 0.035 |
| SFA | 0.467 | 0.329 | 0.065 | 0.469 | 0.347 | 0.094 | 0.297 | 0.217 | 0.109 |
| PraNet | 0.871 | 0.797 |  | 0.709 | 0.640 | 0.045 | 0.628 | 0.567 | 0.031 |
| Mobile-PolypNet |  |  | 0.016 |  |  |  |  |  |  |
Model efficiency is measured in terms of the number of parameters required by the model and the number of FLOPs performed to process a single image of fixed dimension (the image size was used only for the FLOPs count). The FLOPs count was obtained with TensorFlow, and the accuracy metrics comparison was made on the Kvasir dataset. Bold identifies the best result in each column. A sketch of one way to obtain such parameter and FLOPs counts appears after the table.
| Models | Number of Parameters | Disk Space | FLOPs | DICE | mIoU | MAE |
|---|---|---|---|---|---|---|
| U-Net (MICCAI’15) | 7.85 M | 30 MB | 52.6 G | 0.818 | 0.746 | 0.055 |
| U-Net++ (TMI’19) | 9.04 M | 34.6 MB | 112.6 G | 0.821 | 0.743 | 0.048 |
| ResUNet-mod | 7.85 M | 30 MB | 52.6 G | 0.791 | n/a | n/a |
| ResUNet++ | 9.04 M | 34.6 MB | 112.6 G | 0.813 | 0.793 | n/a |
| SFA (MICCAI’19) | 25.59 M | 97.7 MB | 222.4 G | 0.723 | 0.611 | 0.075 |
| PraNet (MICCAI’20) | 20.52 M | 78.4 MB | 81.9 G | 0.898 | 0.840 |  |
| Mobile-PolypNet |  |  |  | 0.935 |  | 0.031 |
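The caption above notes that the FLOPs counts were obtained with TensorFlow. The sketch below shows one common way to report parameter and FLOPs counts for a Keras model using the TF1-style profiler on a frozen graph; the 224 × 224 input size and the profiler-based approach are illustrative assumptions, not the paper's exact measurement procedure.

```python
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2,
)

def count_params_and_flops(model, input_shape=(1, 224, 224, 3)):
    """Return (parameter count, approximate FLOPs) for a Keras model.

    The input size and the profiler-based estimate are illustrative assumptions.
    """
    params = model.count_params()

    # Freeze the model into a constant graph so the profiler can walk it.
    concrete = tf.function(model).get_concrete_function(
        tf.TensorSpec(input_shape, tf.float32)
    )
    frozen = convert_variables_to_constants_v2(concrete)

    run_meta = tf.compat.v1.RunMetadata()
    opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()
    info = tf.compat.v1.profiler.profile(
        graph=frozen.graph, run_meta=run_meta, cmd="op", options=opts
    )
    return params, info.total_float_ops

# Example with a small stand-in model (not the Mobile-PolypNet architecture).
demo = tf.keras.Sequential(
    [tf.keras.layers.Conv2D(8, 3, padding="same", input_shape=(224, 224, 3))]
)
print(count_params_and_flops(demo))
```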
Computation and accuracy performance comparison of different modified models based on the same Mobile-PolypNet backbone architecture on the Kvasir dataset. FLOPs were calculated for an image dimension of 224 × 224. Bold identifies the best result in each column.
| Mobile-PolypNet Model | Number of Trainable Parameters | Number of Non-Trainable Parameters | FLOPs Count | Number of Epochs to Converge | DICE | MAE |
|---|---|---|---|---|---|---|
| Mobile-PolypNet | 233,001 | 13,616 | 2.0 G | 145 | 0.935 | 0.031 |
| Mobile-PolypNet + MaxPool | 223,913 | 13,616 | 1.8 G | 217 | 0.900 | 0.047 |
| Mobile-PolypNet + ConvSkip | 250,601 | 13,616 | 2.2 G | 186 |  |  |
| Mobile-PolypNet + PT | 234,618 | 2,495,257 |  |  | 0.912 | 0.037 |
| Mobile-PolypNet + Dropout | 233,001 | 13,616 | 2.0 G | 110 | 0.928 | 0.035 |
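The abstract reports that the ConvSkip variant's improvement over the backbone was not statistically significant (p-value = 0.815), but the record does not state which test was used. The sketch below assumes a paired t-test over per-image Dice scores as one plausible way such a p-value could be obtained; the score arrays are placeholders, not the paper's data.

```python
import numpy as np
from scipy import stats

# Hypothetical per-image Dice scores for the backbone and the ConvSkip variant
# on the same test images; placeholder values, not the paper's measurements.
dice_backbone = np.array([0.93, 0.91, 0.95, 0.92, 0.94])
dice_convskip = np.array([0.94, 0.91, 0.95, 0.93, 0.94])

# Paired test: both variants are evaluated on the same images.
t_stat, p_value = stats.ttest_rel(dice_convskip, dice_backbone)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```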