Alba Nogueira-Rodríguez, Miguel Reboiro-Jato, Daniel Glez-Peña, Hugo López-Fernández.
Abstract
Colorectal cancer is one of the most frequent malignancies. Colonoscopy is the de facto standard for detecting precancerous lesions in the colon, i.e., polyps, during screening studies or after a physician's recommendation. In recent years, artificial intelligence, and especially deep learning techniques such as convolutional neural networks, have been applied to polyp detection and localization in order to develop real-time CADe systems. However, the performance of machine learning models is very sensitive to changes in the nature of the testing instances, especially when trying to reproduce results on datasets totally different from those used for model development, i.e., inter-dataset testing. Here, we report the results of testing our previously published polyp detection model on ten public colonoscopy image datasets and analyze them in the context of the results of 20 other state-of-the-art publications using the same datasets. The F1-score of our recently published model was 0.88 when evaluated on a private test partition, i.e., intra-dataset testing, but it decayed, on average, by 13.65% when tested on the ten public datasets. In the published research, the average intra-dataset F1-score is 0.91, and we observed that it also decays in the inter-dataset setting, to an average F1-score of 0.83.
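The relative decay figures quoted above follow from a simple percentage drop. A minimal sketch (the function name is illustrative, not from the paper):

```python
def f1_decay(intra_f1: float, inter_f1: float) -> float:
    """Relative F1-score decay (%) when moving from intra- to inter-dataset testing."""
    return 100.0 * (intra_f1 - inter_f1) / intra_f1

# Averages reported in the abstract for the 20 surveyed studies:
print(round(f1_decay(0.91, 0.83), 1))  # 8.8 (% decay)
```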
Keywords: colorectal cancer; convolutional neural network (CNN); deep learning; polyp detection; polyp localization
Year: 2022 PMID: 35453946 PMCID: PMC9027927 DOI: 10.3390/diagnostics12040898
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Criteria for selecting polyp localization studies and public colonoscopy image datasets.
Table 1. Descriptions of the ten public colonoscopy image datasets for polyp localization.
| Dataset | Publication Year | Description | Resolution | Ground Truth | Multiple-Polyp Images | Non-Polyp Images |
|---|---|---|---|---|---|---|
| CVC-ClinicDB | 2015 | 612 sequential WL images with polyps, extracted from 31 sequences (23 patients) with 31 different polyps | 384 × 288 | Binary mask to locate the polyp | yes | no |
| CVC-ColonDB | 2012 | 300 sequential WL images with polyps, extracted from 13 sequences (13 patients) | 574 × 500 | Binary mask to locate the polyp | no | no |
| CVC-PolypHD | 2018 | 56 WL images | 1920 × 1080 | Binary mask to locate the polyp | yes | no |
| ETIS-Larib | 2014 | 196 WL images with polyps, extracted from 34 sequences with 44 different polyps | 1225 × 966 | Binary mask to locate the polyp | yes | no |
| Kvasir-SEG | 2020 | 1000 polyp images | 332 × 487 | Binary mask and bounding box to locate the polyp | yes | no |
| CVC-ClinicVideoDB | 2017 | 11,954 images in total, with 10,025 images of polyps | 384 × 288 | Binary mask to locate the polyp | no | yes |
| PICCOLO | 2020 | 3433 images (2131 WL and 1302 NBI) from 76 lesions from 40 patients | 854 × 480 | Binary mask to locate the polyp | yes | yes |
| KUMC dataset | 2021 | 37,899 images in total, including the CVC-ColonDB, ASU-Mayo Clinic Colonoscopy Video, and Colonoscopic Dataset datasets | Various resolutions | Bounding box to locate the polyp | no | yes |
| SUN | 2021 | 49,136 polyp images from 100 cases | 1240 × 1080 | Bounding box to locate the polyp | no | no * |
| LDPolypVideo | 2021 | 160 videos (40,187 frames: 33,876 polyp and 6311 non-polyp images) with 200 labeled polyps | 560 × 480 | Bounding box to locate the polyp | yes | yes |
* The SUN dataset contains 109,554 non-polyp frames that were not downloaded for our experiments.
Figure 2. Examples of polyp images from the included datasets. Upper row (left to right): CVC-ClinicDB, CVC-ColonDB, CVC-PolypHD, ETIS-Larib, and Kvasir-SEG. Bottom row (left to right): CVC-ClinicVideoDB, PICCOLO, KUMC, SUN, and LDPolypVideo.
Figure 3. Conversion from binary mask annotations to bounding boxes. First column: original polyp images. Second column: binary mask annotations. Third column: obtained bounding box annotations over the original polyp images.
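The mask-to-box conversion illustrated in Figure 3 can be sketched with NumPy. This is a minimal single-polyp version; masks containing several polyps would first need connected-component labeling (e.g., `scipy.ndimage.label`) so each polyp gets its own box:

```python
import numpy as np

def mask_to_bbox(mask):
    """Convert a binary polyp mask to an (x_min, y_min, x_max, y_max) bounding box.

    Returns None if the mask contains no foreground pixels.
    """
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of mask pixels
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy mask at CVC-ClinicDB resolution with a rectangular "polyp" region:
mask = np.zeros((288, 384), dtype=np.uint8)
mask[50:100, 120:200] = 1
print(mask_to_bbox(mask))  # (120, 50, 199, 99)
```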
Table 2. Performance results of studies evaluating DL models for polyp localization in at least one of the selected public colonoscopy image datasets.

| Paper | Train | Test | Recall | Precision | F1-Score | F2-Score |
|---|---|---|---|---|---|---|
| Brandao et al., 2018 | CVC-ClinicDB + ASU-Mayo | ETIS-Larib | 0.90 | 0.73 | 0.81 | 0.86 |
| | | CVC-ColonDB | 0.90 | 0.80 | 0.85 | 0.88 |
| Zheng Y. et al., 2018 | CVC-ClinicDB + CVC-ColonDB | ETIS-Larib | 0.74 | 0.77 | 0.76 | 0.75 |
| Shin Y. et al., 2018 | CVC-ClinicDB | ETIS-Larib | 0.80 | 0.87 | 0.83 | 0.82 |
| | | CVC-ClinicVideoDB | 0.84 | 0.90 | 0.87 | 0.85 |
| Wang et al., 2018 | Private | CVC-ClinicDB | 0.88 | 0.93 | 0.91 | 0.89 |
| | | Private * | 0.94 | 0.96 | 0.95 | 0.95 |
| Qadir et al., 2019 | CVC-ClinicDB | CVC-ClinicVideoDB | 0.84 | 0.90 | 0.87 | 0.85 |
| Tian Y. et al., 2019 | Private | ETIS-Larib | 0.64 | 0.74 | 0.69 | 0.66 |
| Ahmad et al., 2019 | Private | ETIS-Larib | 0.92 | 0.75 | 0.83 | 0.88 |
| Sornapudi et al., 2019 | CVC-ClinicDB | ETIS-Larib | 0.80 | 0.73 | 0.76 | 0.79 |
| | | CVC-ColonDB | 0.92 | 0.90 | 0.91 | 0.91 |
| | | CVC-PolypHD | 0.78 | 0.83 | 0.81 | 0.79 |
| Wittenberg et al., 2019 | Private | ETIS-Larib | 0.83 | 0.74 | 0.79 | 0.81 |
| | | CVC-ClinicDB | 0.86 | 0.80 | 0.82 | 0.85 |
| | | Private | 0.93 | 0.86 | 0.89 | 0.92 |
| Jia X. et al., 2020 | CVC-ColonDB | CVC-ClinicDB | 0.92 | 0.85 | 0.88 | 0.91 |
| | CVC-ClinicDB | ETIS-Larib | 0.82 | 0.64 | 0.72 | 0.77 |
| Ma Y. et al., 2020 | CVC-ClinicDB | CVC-ClinicVideoDB | 0.92 | 0.88 | 0.90 | 0.91 |
| Young Lee J. et al., 2020 | Private | CVC-ClinicDB | 0.90 | 0.98 | 0.94 | 0.96 |
| | | Private | 0.97 | 0.97 | 0.97 | 0.97 |
| Podlasek J. et al., 2020 | Private | ETIS-Larib | 0.67 | 0.79 | 0.73 | 0.69 |
| | | CVC-ClinicDB | 0.91 | 0.97 | 0.94 | 0.92 |
| | | CVC-ColonDB | 0.74 | 0.92 | 0.82 | 0.77 |
| | | Hyper-Kvasir | 0.88 | 0.98 | 0.93 | 0.90 |
| Qadir et al., 2021 | CVC-ClinicDB | ETIS-Larib | 0.87 | 0.86 | 0.86 | 0.86 |
| | | CVC-ColonDB | 0.91 | 0.88 | 0.90 | 0.90 |
| Xu J. et al., 2021 | CVC-ClinicDB | ETIS-Larib | 0.72 | 0.83 | 0.77 | 0.74 |
| | | CVC-ClinicVideoDB | 0.66 | 0.89 | 0.76 | 0.70 |
| Pacal et al., 2021 | CVC-ClinicDB | ETIS-Larib | 0.83 | 0.92 | 0.87 | 0.84 |
| | | CVC-ColonDB | 0.97 | 0.96 | 0.96 | 0.97 |
| Liu et al., 2021 | CVC-ClinicDB | ETIS-Larib | 0.88 | 0.78 | 0.82 | 0.85 |
| Li K. et al., 2021 | KUMC | KUMC-Test ** | 0.86 | 0.91 | 0.89 | 0.87 |
| Ma Y. et al., 2021 | CVC-ClinicDB | CVC-ClinicVideoDB | 0.64 | 0.85 | 0.73 | 0.67 |
| | | LDPolypVideo | 0.47 | 0.65 | 0.55 | 0.50 |
| Pacal et al., 2022 | SUN + PICCOLO + CVC-ClinicDB | ETIS-Larib | 0.91 | 0.91 | 0.91 | 0.91 |
| | SUN | SUN *** | 0.86 | 0.96 | 0.91 | 0.88 |
| | PICCOLO | PICCOLO | 0.80 | 0.93 | 0.86 | 0.82 |
* Wang et al., 2018 evaluated test performance using a private dataset different from the one used for model training. However, we consider this an intra-dataset experiment: the model development dataset was collected at the Endoscopy Center of Sichuan Provincial People’s Hospital between January 2007 and December 2015, and the private test dataset was collected at the same center, using the same devices, between January and December 2016, so the two distributions should be very similar. ** Li K. et al., 2021 used a partition of the KUMC dataset as the testing set in their experiments (KUMC-Test in the table). *** Pacal et al., 2022 used a partition of the SUN dataset that includes “non-polyp” images; it is therefore not comparable to our performance on the SUN dataset, which includes all polyp images.
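The F1- and F2-scores in the table follow directly from each study's precision and recall via the general F-beta formula (beta > 1 weights recall more heavily). A quick sketch reproducing the first row (Brandao et al., 2018 on ETIS-Larib):

```python
def f_beta(precision, recall, beta):
    """F-beta score computed from precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Brandao et al., 2018 on ETIS-Larib: recall 0.90, precision 0.73
print(round(f_beta(0.73, 0.90, 1), 2))  # 0.81 (F1-score)
print(round(f_beta(0.73, 0.90, 2), 2))  # 0.86 (F2-score)
```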
Figure 4. Usage of datasets for model evaluation among the studies in Table 2. Each study using several datasets contributes one point per testing dataset used.
Table 3. Performance results of our model when evaluated on the ten selected public colonoscopy image datasets.

| Dataset | Number of Test Images | Recall | Precision | F1-Score | F2-Score | AP |
|---|---|---|---|---|---|---|
| CVC-ClinicDB | 612 | 0.82 | 0.87 | 0.85 | 0.83 | 0.82 |
| CVC-ColonDB | 300 | 0.84 | 0.81 | 0.83 | 0.83 | 0.85 |
| CVC-PolypHD | 56 | 0.75 | 0.86 | 0.80 | 0.77 | 0.79 |
| ETIS-Larib | 196 | 0.72 | 0.71 | 0.72 | 0.72 | 0.69 |
| Kvasir-SEG | 1000 | 0.78 | 0.84 | 0.81 | 0.82 | 0.79 |
| PICCOLO | 3433 | 0.60 | 0.76 | 0.67 | 0.62 | 0.63 |
| CVC-ClinicVideoDB | 11,954 | 0.80 | 0.75 | 0.77 | 0.79 | 0.77 |
| KUMC dataset | 37,899 | 0.81 | 0.83 | 0.82 | 0.81 | 0.83 |
| KUMC dataset–Test | 4872 | 0.76 | 0.81 | 0.78 | 0.77 | 0.79 |
| SUN | 49,136 | 0.78 | 0.83 | 0.81 | 0.79 | 0.81 |
| LDPolypVideo | 40,186 | 0.49 | 0.56 | 0.52 | 0.50 | 0.44 |
Figure 5. F1-score decay (%) on public colonoscopy image datasets relative to the performance of our model on our private dataset (0.88), as reported in Nogueira-Rodríguez et al., 2021 [26].
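The per-dataset decay in Figure 5 can be reproduced from the F1-scores in Table 3, taking decay as the relative drop from the intra-dataset F1-score of 0.88 (KUMC-Test is excluded below, since it is a partition of the KUMC dataset). A sketch:

```python
intra_f1 = 0.88  # private test partition (Nogueira-Rodríguez et al., 2021)

# Inter-dataset F1-scores of our model on the ten public datasets (Table 3):
inter_f1 = {
    "CVC-ClinicDB": 0.85, "CVC-ColonDB": 0.83, "CVC-PolypHD": 0.80,
    "ETIS-Larib": 0.72, "Kvasir-SEG": 0.81, "PICCOLO": 0.67,
    "CVC-ClinicVideoDB": 0.77, "KUMC": 0.82, "SUN": 0.81,
    "LDPolypVideo": 0.52,
}

decay = {d: 100 * (intra_f1 - f1) / intra_f1 for d, f1 in inter_f1.items()}
mean_decay = sum(decay.values()) / len(decay)
print(round(mean_decay, 2))  # 13.64, close to the reported 13.65% (which likely uses unrounded F1-scores)
```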
Figure 6. Incorrect prediction examples of our detection model on the different splits of the PICCOLO dataset. From left to right: examples taken from the train, validation, and test splits. Predicted boxes are depicted in green; ground-truth boxes in white.
Figure 7. Comparison of F1-score decay on public colonoscopy image datasets for those studies that report performance on a private test partition.
Figure 8. (A) Comparison of intra-dataset and inter-dataset performances of the 20 selected studies. (B) Same as (A), with the inter-dataset performances disaggregated by dataset.
Figure 9. Correlation between the median inter-dataset F1-score of published studies and that of our model on seven public colonoscopy image datasets.