Siwei Chen, Gregor Urban, Pierre Baldi.
Abstract
Colorectal cancer (CRC) is a leading cause of mortality worldwide, and preventive screening modalities such as colonoscopy have been shown to noticeably decrease CRC incidence and mortality. Improving colonoscopy quality remains a challenging task due to limiting factors including the training levels of colonoscopists and the variability in polyp sizes, morphologies, and locations. Deep learning methods have led to state-of-the-art systems for the identification of polyps in colonoscopy videos. In this study, we show that deep learning can also be applied to the segmentation of polyps in real time, and that the underlying models can be trained using mostly weakly labeled data, in the form of bounding box annotations that do not contain precise contour information. A novel dataset, Polyp-Box-Seg, of 4070 colonoscopy images with polyps from over 2000 patients is collected, and a subset of 1300 images is manually annotated with segmentation masks. A series of models is trained to evaluate various strategies that utilize bounding box annotations for segmentation tasks. A model trained on the 1300 polyp images with segmentation masks achieves a Dice coefficient of 81.52%, which improves significantly to 85.53% when using a weakly supervised strategy leveraging bounding box images. The Polyp-Box-Seg dataset and a real-time video demonstration of the segmentation system are publicly available.
Keywords: colonoscopy quality improvement; colorectal cancer; convolutional neural networks; deep learning; machine learning
Year: 2022 PMID: 35621885 PMCID: PMC9144698 DOI: 10.3390/jimaging8050121
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. U-Net architecture: each block in the down-sampling path consists of convolution and max pooling operations, and each block in the up-sampling path consists of up-convolution and convolution operations. Each blue box corresponds to a multi-channel feature map. The number of features at the end of each block is denoted on top of the box. Arrows of different colors denote the different operations.
Figure 2. The model trains in an iterative scheme: it first generates an initial mask (Step 1) from the input polyp image (note that the real bounding box is not shown to the network; only the box coordinates are used); the network then trains and updates its parameters based on the initial mask (Step 2). The updated network in turn predicts a refined mask, which replaces the previous mask (Step 3). The network then trains on the updated mask (Step 4) and updates the model parameters. The masks and the network are updated iteratively until training terminates.
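The iterative scheme in this caption can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: `ThresholdModel` is a hypothetical stand-in for the segmentation network, the initial mask is an ellipse inscribed in the box (in the spirit of the circle initialization mentioned for Figure 4), and clipping predictions to the box is only one assumed way the box coordinates might be used. Boxes are assumed to be non-empty.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]         # grayscale intensities, row-major
Mask = List[List[int]]            # 1 = polyp pixel, 0 = background
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), x1 and y1 exclusive


def ellipse_mask(height: int, width: int, box: Box) -> Mask:
    """Step 1 (assumed initialization): an ellipse inscribed in the box."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1 - 1) / 2.0, (y0 + y1 - 1) / 2.0
    rx, ry = (x1 - x0) / 2.0, (y1 - y0) / 2.0
    return [[1 if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0 else 0
             for x in range(width)] for y in range(height)]


def clip_to_box(mask: Mask, box: Box) -> Mask:
    """Zero predictions outside the box (an assumed use of the coordinates)."""
    x0, y0, x1, y1 = box
    return [[v if x0 <= x < x1 and y0 <= y < y1 else 0
             for x, v in enumerate(row)] for y, row in enumerate(mask)]


class ThresholdModel:
    """Hypothetical toy model: a single learnable intensity threshold."""

    def __init__(self) -> None:
        self.threshold = 0.5

    def fit(self, images: Sequence[Image], masks: Sequence[Mask]) -> None:
        pairs = [(p, v) for img, m in zip(images, masks)
                 for row, mrow in zip(img, m) for p, v in zip(row, mrow)]
        fg = [p for p, v in pairs if v]
        bg = [p for p, v in pairs if not v]
        if fg and bg:  # keep the old threshold if either class is empty
            self.threshold = (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2.0

    def predict(self, image: Image) -> Mask:
        return [[1 if p > self.threshold else 0 for p in row] for row in image]


def refine_masks(model: ThresholdModel, images: Sequence[Image],
                 boxes: Sequence[Box], rounds: int = 3) -> List[Mask]:
    masks = [ellipse_mask(len(img), len(img[0]), box)
             for img, box in zip(images, boxes)]              # Step 1
    for _ in range(rounds):
        model.fit(images, masks)                              # Steps 2 and 4
        masks = [clip_to_box(model.predict(img), box)
                 for img, box in zip(images, boxes)]          # Step 3
    return masks


# Toy example: a bright 2x2 "polyp" on a dark 6x6 background.
image = [[0.9 if 2 <= x <= 3 and 2 <= y <= 3 else 0.1 for x in range(6)]
         for y in range(6)]
final_masks = refine_masks(ThresholdModel(), [image], [(1, 1, 5, 5)])
```

In this toy case the coarse elliptical mask shrinks to the bright block after the first round, which mirrors the qualitative behavior the caption describes; a real network would of course be trained with gradient updates rather than a closed-form threshold.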
Histology information for 4070 polyp images from the Polyp-Box-Seg dataset.
| Histology | Count |
|---|---|
| Tubular adenoma | 2102 |
| Hyperplastic | 909 |
| Sessile serrated adenoma | 349 |
| Non-serrated sessile | 446 |
| Tubulovillous adenoma | 64 |
| Inflammatory | 42 |
| Traditional serrated adenoma | 33 |
| Lymphoid nodule | 19 |
| CA adenocarcinoma | 14 |
| Hamartomatous | 11 |
| Juvenile polyp | 6 |
| CA lymphoma | 5 |
| Sessile serrated adenoma w dysplasia | 3 |
| Mucosal prolapse | 3 |
| CA squamous/epidermoid | 1 |
| Other | 63 |
Figure 3. A breakdown of the 4070 polyp-containing images in training and testing: all 4070 images contain bounding box annotations of polyps, and a subset of 1300 images additionally contains human-labeled ground truth (GT) segmentation masks. Among the 1300 images, 100 are used for hyperparameter (HP) selection and 1200 are used for model evaluation using cross-validation; 90 out of the 1200 images are sessile serrated adenomas (SSA).
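The counts in this breakdown can be reproduced with a small index-splitting sketch. The function name `make_splits`, the random assignment of images to the mask-annotated subset, and the interleaved fold construction are all assumptions made for illustration; the source only gives the subset sizes, not how images were assigned.

```python
import random
from typing import Dict, List, Union

Split = Dict[str, Union[List[int], List[List[int]]]]


def make_splits(n_total: int = 4070, n_masked: int = 1300, n_hp: int = 100,
                k: int = 10, seed: int = 0) -> Split:
    """Partition image indices into HP, cross-validation, and box-only sets."""
    rng = random.Random(seed)
    ids = list(range(n_total))
    rng.shuffle(ids)  # assumption: the masked subset is chosen at random

    masked, box_only = ids[:n_masked], ids[n_masked:]
    hp_ids, cv_ids = masked[:n_hp], masked[n_hp:]

    # Interleaved slicing yields k folds of equal size when k divides len(cv_ids).
    folds = [cv_ids[i::k] for i in range(k)]
    return {"hp": hp_ids, "folds": folds, "box_only": box_only}


splits = make_splits()
test_ids = splits["folds"][0]  # held-out fold for the first CV round
train_mask_ids = [i for fold in splits["folds"][1:] for i in fold]
```

With the defaults this gives 100 hyperparameter-selection images, ten folds of 120 mask-annotated images each, and 2770 images that carry bounding boxes only.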
Average test-set scores and standard deviation using 10-fold cross-validation on the Polyp-Box-Seg images. The neural networks in this table are pre-trained on the public CVC-ClinicDB dataset and then further trained as described in the text.
| | Full-Sup-2 | Weak-Sup-Box-CI | Weak-Sup-Box-PI | Weak-Sup-Box-HI | Weak-Sup-Mix |
|---|---|---|---|---|---|
| Dice Coefficient | 81.52 ± 0.41% | 77.58 ± 0.66% | 77.78 ± 0.87% | 81.36 ± 0.43% | |
| Accuracy | 98.76 ± 0.09% | 98.42 ± 0.10% | 98.50 ± 0.11% | 98.67 ± 0.09% | |
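As a rough illustration of how a "mean ± standard deviation" entry over cross-validation folds can be produced, the sketch below summarizes a list of per-fold scores. The fold scores are hypothetical placeholders, not values from the study, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the source does not say which estimator was used.

```python
import statistics
from typing import Sequence, Tuple


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def format_entry(mean: float, std: float) -> str:
    """Format fractions in [0, 1] as a percentage table entry."""
    return f"{mean * 100:.2f} ± {std * 100:.2f}%"


# Hypothetical per-fold Dice scores, for illustration only.
fold_scores = [0.80, 0.82, 0.84]
entry = format_entry(*summarize(fold_scores))
```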
Figure 4. Evolution of the segmentations starting from their initialization state over the course of training. Masks gradually approach the ground truth label as training progresses. The first column shows images with the bounding box annotations that are used to guide the learning process. The second column shows the initial set of masks generated with different approaches: images (b,f) use model prediction as initial masks, while images (j,n) use circles as initial masks.
Confusion matrices from weakly supervised models’ predictions on one split of the validation set.
| | Weak-Sup-Box-CI | Weak-Sup-Box-PI | Weak-Sup-Box-HI | Weak-Sup-Mix |
|---|---|---|---|---|
| TP | 5039.01 | 5947.08 | 6180.07 | |
| TN | 139,604.96 | 139,414.03 | 139,341.57 | |
| FP | 512.68 | 703.62 | 776.07 | |
| FN | 2299.32 | 1391.26 | 1158.26 | |
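For readers who want to relate confusion-matrix counts to the reported metrics, the following sketch applies the standard definitions Dice = 2TP / (2TP + FP + FN) and accuracy = (TP + TN) / (TP + TN + FP + FN). The function names are illustrative. Scores computed from the pooled counts of a single split need not reproduce the cross-validated averages in the score tables, since the source does not state how per-image and per-fold values were aggregated.

```python
def dice_from_counts(tp: float, fp: float, fn: float) -> float:
    """Dice coefficient from pixel counts; defined as 1.0 when all counts are zero."""
    denom = 2.0 * tp + fp + fn
    return 2.0 * tp / denom if denom else 1.0


def accuracy_from_counts(tp: float, tn: float, fp: float, fn: float) -> float:
    """Fraction of pixels classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Counts taken from the Weak-Sup-Box-CI column of the table above.
ci_dice = dice_from_counts(tp=5039.01, fp=512.68, fn=2299.32)
ci_accuracy = accuracy_from_counts(tp=5039.01, tn=139604.96, fp=512.68, fn=2299.32)
```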
Average test-set scores and standard deviation using 10-fold cross-validation on the Polyp-Box-Seg images. Models are initialized with VGG16 weights and then trained as described in the text.
| | Full-Sup-2-VGG | Weak-Sup-Box-CI-VGG | Weak-Sup-Mix-VGG |
|---|---|---|---|
| Dice Coefficient | 79.11 ± 0.93% | 76.14 ± 0.67% | |
| Accuracy | 98.56 ± 0.14% | 98.31 ± 0.08% | |
Average test-set scores and standard deviation using 10-fold cross-validation on the 90 sessile serrated adenoma images with human-labeled segmentation masks.
| | Weak-Sup-Mix on SSA | Weak-Sup-Mix-VGG on SSA |
|---|---|---|
| Dice Coefficient | | 83.66 ± 1.95% |
| Accuracy | | 98.05 ± 0.39% |
Average test-set scores and standard deviation using 10-fold cross-validation on CVC-ClinicDB from the Weak-Sup-Mix-VGG model and from the Full-Sup-3 model, as described in the text.
| | Weak-Sup-Mix-VGG on CVC | Full-Sup-3 on CVC |
|---|---|---|
| Dice Coefficient | 90.43 ± 0.43% | |
| Accuracy | 98.87 ± 0.08% | |
Test-set scores on Kvasir-SEG from the Weak-Sup-Box-HI model.
| | Weak-Sup-Box-HI on Kvasir-SEG |
|---|---|
| Dice Coefficient | 82.81% |
| Accuracy | 95.41% |