Vajira Thambawita1,2, Inga Strümke1, Steven A Hicks1,2, Pål Halvorsen1,2, Sravanthi Parasa3, Michael A Riegler1.
Abstract
Recent trials have evaluated the efficacy of deep convolutional neural network (CNN)-based AI systems to improve lesion detection and characterization in endoscopy. Impressive results have been achieved, but many medical studies use a very small image resolution to save computing resources at the cost of losing details. Today, no conventions relating resolution to performance exist, and monitoring the performance of various CNN architectures as a function of image resolution provides insights into how subtleties of different lesions on endoscopy affect performance. This can help set standards for image or video characteristics for future CNN-based models in gastrointestinal (GI) endoscopy. This study examines the performance of CNNs on the HyperKvasir dataset, consisting of 10,662 images from 23 different findings. We evaluate two CNN models for endoscopic image classification under quality distortions with image resolutions ranging from 32 × 32 to 512 × 512 pixels. The performance is evaluated using two-fold cross-validation with F1-score, Matthews correlation coefficient (MCC), precision, and sensitivity as metrics. Increased performance was observed with higher image resolution for all findings in the dataset. The maximum MCC for classification over the entire dataset, including all subclasses, was achieved at an image resolution of 512 × 512 pixels. The highest performance was an MCC of 0.9002, obtained when the models were trained on the highest resolution and tested on the same resolution. Different resolutions and their effect on CNNs are explored. We show that image resolution has a clear influence on performance, which calls for standards in the field.
Keywords: convolutional neural networks; endoscopic images; image resolution
Year: 2021 PMID: 34943421 PMCID: PMC8700246 DOI: 10.3390/diagnostics11122183
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Examples of an image with the different resolutions used for the experiments in this article. Clear differences in the level of details that are detectable can be observed. Note that for this figure all resolutions are re-scaled to the same size to show quality differences.
Statistics of the dataset used for the experiments. Split 0 and Split 1 represent the two folds used in our experiments.
| Class | Split 0 | Split 1 | Total |
|---|---|---|---|
| Barrett’s Esophagus | 20 | 21 | 41 |
| BBPS-0-1 | 323 | 323 | 646 |
| BBPS-2-3 | 574 | 574 | 1148 |
| Dyed-lifted-polyps | 501 | 501 | 1002 |
| Dyed-resection-margins | 494 | 495 | 989 |
| Hemorrhoids | 3 | 3 | 6 |
| Ileum | 4 | 5 | 9 |
| Impacted-stool | 65 | 66 | 131 |
| Normal-cecum | 504 | 505 | 1009 |
| Normal-pylorus | 499 | 500 | 999 |
| Normal-z-line | 466 | 466 | 932 |
| Esophagitis-LA grade A | 201 | 202 | 403 |
| Esophagitis-LA grade B-D | 130 | 130 | 260 |
| Colon Polyp | 514 | 514 | 1028 |
| Retroflex-rectum | 195 | 196 | 391 |
| Retroflex-stomach | 382 | 382 | 764 |
| Short-segment-Barrett’s | 26 | 27 | 53 |
| Ulcerative colitis-Mayo score 0–1 | 17 | 18 | 35 |
| Ulcerative colitis 1–2 | 5 | 6 | 11 |
| Ulcerative-colitis-Mayo 2–3 | 14 | 14 | 28 |
| Ulcerative-colitis-grade-1 | 100 | 101 | 201 |
| Ulcerative-colitis-grade-2 | 221 | 222 | 443 |
| Ulcerative-colitis-grade-3 | 66 | 67 | 133 |
| Total | 5324 | 5338 | 10662 |
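The split sizes above are consistent with each class being divided roughly in half, with the extra image of odd-sized classes going to Split 1. The following is a minimal sketch of such a stratified two-fold split. It is not the authors' procedure, which this record does not describe; the function name `stratified_two_fold`, the `seed` parameter, and the shuffling step are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_two_fold(labels, seed=0):
    """Split sample indices into two folds, class by class.

    Hypothetical helper: each class is shuffled and cut in half, so both
    folds keep roughly the same class proportions. For a class with an odd
    number of samples, the extra sample goes to the second fold.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    fold_0, fold_1 = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        half = len(indices) // 2
        fold_0.extend(indices[:half])
        fold_1.extend(indices[half:])
    return fold_0, fold_1


# Toy labels mirroring two rows of the table (41 and 6 images).
toy_labels = ["barretts"] * 41 + ["hemorrhoids"] * 6
split_0, split_1 = stratified_two_fold(toy_labels)
print(len(split_0), len(split_1))
```

Splitting per class rather than over the whole dataset matters here because several classes have fewer than 20 images; a plain random split could leave one fold with no examples of a rare finding.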
Average DenseNet-161 and ResNet-152 results for both cross-validation splits. Best MCC score in bold.
| Network | Resolution | MCC (Rk) | F1-Score | Precision | Sensitivity |
|---|---|---|---|---|---|
| DenseNet-161 | 32 × 32 | 0.8241 | 0.5366 | 0.5414 | 0.5399 |
| | 64 × 64 | 0.8554 | 0.5701 | 0.5721 | 0.5748 |
| | 128 × 128 | 0.8865 | 0.6004 | 0.6033 | 0.6012 |
| | 256 × 256 | 0.8995 | 0.6149 | 0.6230 | 0.6141 |
| | 512 × 512 | | 0.6351 | 0.6446 | 0.6344 |
| ResNet-152 | 32 × 32 | 0.8076 | 0.5108 | 0.5247 | 0.5137 |
| | 64 × 64 | 0.8556 | 0.5727 | 0.5725 | 0.5756 |
| | 128 × 128 | 0.8866 | 0.6112 | 0.6136 | 0.6137 |
| | 256 × 256 | 0.8965 | 0.6175 | 0.6182 | 0.6193 |
| | 512 × 512 | | 0.6115 | 0.6329 | 0.6171 |
Figure 2. Comparison of MCC, macro F1, macro precision, and macro sensitivity when the models are trained and tested with the same input resolution.
Figure 3. Averaged MCC from two-fold cross-validation as confusion matrices. Left is from DenseNet-161 and right is from ResNet-152.
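For reference, the multi-class MCC (the Rk statistic named in the results table) can be computed directly from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes. The function name `multiclass_mcc` is hypothetical, the zero-denominator fallback is a convention chosen here, and this is not the authors' evaluation code.

```python
from math import sqrt


def multiclass_mcc(confusion):
    """Multi-class Matthews correlation coefficient (Rk statistic).

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j (an assumed layout).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    # How often each class truly occurs (row sums).
    true_counts = [sum(confusion[k]) for k in range(n)]
    # How often each class is predicted (column sums).
    pred_counts = [sum(confusion[i][k] for i in range(n)) for k in range(n)]

    numerator = correct * total - sum(
        p * t for p, t in zip(pred_counts, true_counts)
    )
    denominator = sqrt(
        (total * total - sum(p * p for p in pred_counts))
        * (total * total - sum(t * t for t in true_counts))
    )
    # Assumed convention: return 0.0 when the score is undefined.
    return numerator / denominator if denominator else 0.0


# A perfect classifier on a toy two-class problem scores 1.0.
print(multiclass_mcc([[5, 0], [0, 5]]))
```

For two classes this reduces to the familiar binary MCC, which makes it easy to sanity-check against a hand calculation.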
Average time for predicting output using DenseNet-161 and ResNet-152 in the inference stage.
| Resolution | DenseNet-161 time (ms) per image | ResNet-152 time (ms) per image |
|---|---|---|
| 32 × 32 | 19.875 | 17.190 |
| 64 × 64 | 20.248 | 15.148 |
| 128 × 128 | 21.606 | 15.450 |
| 256 × 256 | 20.246 | 14.986 |
| 512 × 512 | 20.422 | 16.690 |
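A per-image average like the one tabulated above can be obtained by timing a batch of predictions and dividing by the number of images. The sketch below shows one way to do this under stated assumptions: `predict` stands for any callable that runs a model on one input, and the name `average_inference_ms` and the `warmup` parameter are hypothetical. How the authors measured their timings is not described in this record.

```python
import time


def average_inference_ms(predict, inputs, warmup=2):
    """Return the mean wall-clock time per input, in milliseconds.

    `predict` is any callable taking one input (a stand-in for a model's
    forward pass). A few untimed warm-up calls are made first so that
    one-off setup costs do not distort the average.
    """
    if not inputs:
        raise ValueError("inputs must not be empty")

    for item in inputs[:warmup]:
        predict(item)

    start = time.perf_counter()
    for item in inputs:
        predict(item)
    elapsed_seconds = time.perf_counter() - start
    return 1000.0 * elapsed_seconds / len(inputs)


# Usage with a dummy "model" that just sums a list of pixel values.
dummy_images = [[0.5] * 1024 for _ in range(50)]
print(average_inference_ms(sum, dummy_images))
```

When a model runs on a GPU, execution is typically asynchronous, so the device would need to be synchronized before each clock reading for the numbers to be meaningful.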
Figure 4. A complete overview of all obtained results including the macro and micro average for precision, sensitivity, and F1-score.
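The difference between macro and micro averaging can be made concrete with a small sketch. Macro averaging computes each metric per class and then takes the unweighted mean, so a class with 6 images counts as much as one with over 1000. Micro averaging pools all decisions first; for single-label classification, micro precision, sensitivity, and F1 all equal the overall accuracy. The function name `averaged_metrics` and the choice to score undefined ratios as 0.0 are assumptions, and this is not the authors' code.

```python
def averaged_metrics(confusion):
    """Macro and micro averages of (precision, sensitivity, F1).

    confusion[i][j] counts samples of true class i predicted as class j
    (an assumed layout). Undefined ratios are scored as 0.0 here.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class = []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column sum
        actual = sum(confusion[k])                          # row sum
        precision = true_positive / predicted if predicted else 0.0
        sensitivity = true_positive / actual if actual else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        per_class.append((precision, sensitivity, f1))

    # Macro: unweighted mean of the per-class values.
    macro = tuple(sum(m[i] for m in per_class) / n for i in range(3))
    # Micro: pooled counts; all three metrics reduce to accuracy.
    accuracy = sum(confusion[k][k] for k in range(n)) / total if total else 0.0
    micro = (accuracy, accuracy, accuracy)
    return {"macro": macro, "micro": micro}


# Toy case: two large classes classified perfectly, one tiny class missed.
toy = [[50, 0, 0], [0, 50, 0], [2, 1, 0]]
print(averaged_metrics(toy))
```

In the toy case the tiny class is never predicted correctly, which lowers the macro averages far more than the micro averages.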