Luisa F Sánchez-Peralta, Artzai Picón, Francisco M Sánchez-Margallo, J Blas Pagador.
Abstract
PURPOSE: Data augmentation is a common technique to overcome the lack of large annotated databases, a usual situation when applying deep learning to medical imaging problems. Nevertheless, there is no consensus on which transformations to apply for a particular field. This work aims to identify the effect of different transformations on polyp segmentation using deep learning.
Keywords: Data augmentation; Deep learning; Polyp segmentation; Semantic segmentation; Transformations
Year: 2020 PMID: 32989680 PMCID: PMC7671995 DOI: 10.1007/s11548-020-02262-4
Source DB: PubMed Journal: Int J Comput Assist Radiol Surg ISSN: 1861-6410 Impact factor: 2.924
Transformations used for data augmentation in polyp segmentation
| Work | Year | Rotation | Width shift | Height shift | Shear | Zoom | Flip | Warp | Gaussian noise | Contrast | Brightness | Patch selection |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jha [ | 2020 | – | – | – | ✓ | ✓ | – | – | – | ✓ | – | |
| Guo [ | 2019 | ✓ | – | – | – | ✓ | ✓ | – | – | – | ✓ | – |
| Kang [ | 2019 | (− 45°, 45°) | – | – | (− 16°, 16°) | (0.5, 1.5) | ✓ | – | – | (0.5, 1.5) | (0.8, 1.5) | ✓ |
| Akbari [ | 2018 | 10° interval, between 0°–290° | – | – | – | – | ✓ | – | – | – | – | 15 patches/image |
| Brandao [ | 2018 | – | – | – | – | – | ✓ | – | – | – | – | 224 × 224 patches |
| Wichakam [ | 2018 | up to 180° | (0, 20%) | (0, 20%) | up to 20% | (–0.8, 1.2) | ✓ | – | – | – | – | – |
| Wickstrom [ | 2018 | (–90°, 90°) | – | – | (0, 0.4) | (0.8, 1.2) | – | – | – | – | – | 224 × 224 patches |
| Bardhi [ | 2017 | ✓ | ✓ | ✓ | – | ✓ | – | ✓ | – | – | ✓ | |
| Li [ | 2017 | ✓ | ✓ | ✓ | – | – | – | – | ✓ | ✓ | – | – |
| Vázquez [ | 2017 | (0°, 180°) | – | – | (0, 0.4) | (0.9, 1.1) | – | (0, 10) | – | – | – | – |
Transformations and ranges analysed in this study
| Transformation | Parameter definition | Ranges | Total cases |
|---|---|---|---|
| Image-based transformations | |||
| Width shift | % of the image displaced to the right or to the left | 0–90%, with 10% intervals | 9 cases |
| Height shift | % of the image displaced up or down | 0–90%, with 10% intervals | 9 cases |
| Rotation | ± Degrees that the image is rotated | 0–180°, with up to 45° intervals | 8 cases |
| Shear | ± Shear angle in counter-clockwise direction | 0–180°, with up to 45° intervals | 8 cases |
| Zoom out | Factor by which the image size is multiplied | 1 − | 9 cases |
| Zoom in | Factor by which the image size is multiplied | 1 + | 10 cases |
| Flip | Vertically and horizontally flip the image | True | 2 cases |
| Elastic deformation | Parameters as indicated in [ | | 8 cases |
| Pixel-based transformations | |||
| Brightness | ± value to be added to the actual pixel value for all RGB channels equally | [25, 175], with 25 intervals | 7 cases |
| Brightness | Value to be added to the actual pixel value for each RGB channel independently | [25, 175], with 25 intervals | 7 cases |
| Contrast | Value to multiply the actual pixel value for all RGB channels equally | [1 − | 5 cases |
| Contrast | Value to multiply the actual pixel value for each RGB channel independently | [1 − | 4 cases |
| Application-based transformations | |||
| Specular lights | Overexposed light ellipses simulating the effect of bright points | True | 1 case |
| Blurry images | Window size of a mean filter | [1, 15], only odd integers | 7 cases |
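The pixel-based transformations in the table above can be illustrated with a minimal NumPy sketch. It assumes 8-bit RGB images stored as `(height, width, 3)` arrays and assumes that the offset or factor is drawn uniformly from the stated range; the function names `adjust_brightness` and `adjust_contrast` are hypothetical and are not taken from the study.

```python
import numpy as np


def adjust_brightness(image, max_delta, per_channel=False, rng=None):
    """Add a random offset in [-max_delta, max_delta] to an 8-bit RGB image."""
    rng = np.random.default_rng() if rng is None else rng
    # One offset per channel, or a single offset shared by all three channels.
    size = 3 if per_channel else 1
    delta = rng.uniform(-max_delta, max_delta, size=size)
    out = image.astype(np.float32) + delta  # broadcasts over the channel axis
    return np.clip(out, 0, 255).astype(np.uint8)


def adjust_contrast(image, low, high, per_channel=False, rng=None):
    """Multiply pixel values by a random factor drawn from [low, high]."""
    rng = np.random.default_rng() if rng is None else rng
    # One factor per channel, or a single factor shared by all three channels.
    size = 3 if per_channel else 1
    factor = rng.uniform(low, high, size=size)
    out = image.astype(np.float32) * factor
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applying the same transformation with `per_channel=True` changes the colour balance as well as the intensity, which is the distinction the table draws between "all RGB channels equally" and "each RGB channel independently".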
Fig. 1 Original and transformed images
Details for the datasets used in this study
| | CVC-EndoSceneStill | Kvasir-SEG |
|---|---|---|
| Void area (%) | 23.73 ± 5.57 (27.83–14.62) | 15.23 ± 4.82 (28.44–6.16) |
| Polyp area relative to the valid area (%) | 12.50 ± 11.49 (66.15–0.75) | 17.36 ± 15.65 (83.66–0.61) |
| Mean value of brightness channel in HSV [ | 0.560 ± 0.006 (1.000–0.000) | 0.622 ± 0.003 (1.000–0.000) |
| Histogram flatness measure [ | 0.858 ± 0.121 (0.959–0.000) | 0.419 ± 0.443 (0.962–0.000) |
| Histogram spread [ | 0.252 ± 0.088 (0.520–0.076) | 0.218 ± 0.070 (0.432–0.075) |
Results are reported as mean ± standard deviation, with the maximum and minimum values given in parentheses. The void area is the black area in the images; the remaining area is considered the valid area
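As a rough illustration of the first three rows of the table, the following sketch computes the void area, the polyp area relative to the valid area, and the mean of the HSV value channel for one image. The black-pixel threshold, the function names, and the choice to average brightness over the whole image are assumptions made for this example; the source does not state how these quantities were computed.

```python
import numpy as np


def area_statistics(image, polyp_mask, black_threshold=10):
    """Return (void area %, polyp area as % of the valid area)."""
    # Assumption: a pixel is 'void' when all RGB channels are nearly black.
    void = (image <= black_threshold).all(axis=-1)
    valid = ~void
    void_pct = 100.0 * void.mean()
    # Only polyp pixels that fall inside the valid area are counted.
    polyp_in_valid = np.logical_and(polyp_mask.astype(bool), valid)
    polyp_pct = 100.0 * polyp_in_valid.sum() / max(valid.sum(), 1)
    return void_pct, polyp_pct


def mean_brightness(image):
    """Mean of the HSV value channel, V = max(R, G, B), scaled to [0, 1]."""
    return float(image.max(axis=-1).mean() / 255.0)
```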
Fig. 2 Network architecture. Figure based on [36]
Mean and standard deviation of the mean for transformations and ranges analysed in both datasets
| Transformation | Range | IoU on test set CVC-EndoSceneStill | IoU on test set Kvasir-SEG |
|---|---|---|---|
| None | N/A | 59.10 ± 9.35 | 66.45 ± 8.08 |
| Image-based transformations | | | |
| Width shift | ± 10% | 60.78 ± 8.99 | 67.09 ± 7.96 |
| | ± 20% | 59.45 ± 9.80 | |
| | ± 30% | 59.31 ± 9.08 | 66.28 ± 8.22 |
| | ± 40% | 62.70 ± 8.57 | 65.94 ± 8.22 |
| | ± 50% | 62.80 ± 8.84 | 66.23 ± 8.09 |
| | ± 60% | 63.02 ± 8.78 | 66.90 ± 7.86 |
| | ± 70% | 63.03 ± 8.67 | 66.82 ± 7.87 |
| | ± 80% | 61.34 ± 8.62 | 65.41 ± 7.92 |
| | ± 90% | | 65.82 ± 7.72 |
| Height shift | ± 10% | 58.82 ± 8.97 | 67.00 ± 7.98 |
| | ± 20% | 58.94 ± 8.80 | 67.12 ± 8.08 |
| | ± 30% | 61.81 ± 8.74 | |
| | ± 40% | | 67.23 ± 7.80 |
| | ± 50% | 61.78 ± 8.42 | 67.17 ± 7.89 |
| | ± 60% | 60.21 ± 8.64 | 66.97 ± 7.94 |
| | ± 70% | 61.55 ± 8.46 | 66.69 ± 7.98 |
| | ± 80% | 60.42 ± 8.19 | 66.26 ± 7.94 |
| | ± 90% | 61.52 ± 8.27 | 67.06 ± 7.58 |
| Rotation | ± 3° | 57.74 ± 9.37 | 66.41 ± 8.09 |
| | ± 6° | | 65.61 ± 8.16 |
| | ± 10° | 55.40 ± 9.75 | 65.74 ± 8.15 |
| | ± 15° | 55.50 ± 9.65 | 67.03 ± 8.10 |
| | ± 45° | 54.66 ± 9.62 | 68.38 ± 8.00 |
| | ± 90° | 57.62 ± 9.37 | |
| | ± 135° | 58.60 ± 9.49 | 68.22 ± 8.07 |
| | ± 180° | 58.19 ± 9.35 | 68.78 ± 8.10 |
| Shear | ± 3° | 59.62 ± 9.05 | 66.24 ± 8.11 |
| | ± 6° | | 67.00 ± 8.02 |
| | ± 10° | 59.42 ± 9.00 | 67.32 ± 7.90 |
| | ± 15° | 57.91 ± 9.10 | 67.11 ± 7.97 |
| | ± 45° | 59.07 ± 9.80 | |
| | ± 90° | 56.38 ± 9.3 | 67.84 ± 7.85 |
| | ± 135° | 55.22 ± 9.37 | 67.53 ± 7.91 |
| | ± 180° | 57.09 ± 8.89 | 67.67 ± 7.90 |
| Zoom in | 0.9, 1 | | 66.71 ± 8.08 |
| | 0.8, 1 | 59.98 ± 8.53 | 67.45 ± 8.01 |
| | 0.7, 1 | 57.01 ± 9.46 | 67.56 ± 8.24 |
| | 0.6, 1 | 55.57 ± 10.07 | 68.54 ± 8.14 |
| | 0.5, 1 | 57.37 ± 10.30 | |
| | 0.4, 1 | 58.58 ± 10.18 | 67.26 ± 8.29 |
| | 0.3, 1 | 58.41 ± 10.40 | 66.54 ± 8.30 |
| | 0.2, 1 | 57.71 ± 10.34 | 65.54 ± 8.37 |
| | 0.1, 1 | 57.56 ± 10.06 | 64.05 ± 8.51 |
| Zoom out | 1, 1.1 | 58.70 ± 9.12 | 65.48 ± 8.17 |
| | 1, 1.2 | 61.64 ± 8.26 | 66.25 ± 8.09 |
| | 1, 1.3 | 58.99 ± 8.50 | 65.88 ± 8.03 |
| | 1, 1.4 | 62.21 ± 8.04 | 66.13 ± 7.98 |
| | 1, 1.5 | 61.83 ± 8.39 | 66.56 ± 7.86 |
| | 1, 1.6 | | 67.38 ± 7.80 |
| | 1, 1.7 | 60.67 ± 7.90 | 67.38 ± 7.83 |
| | 1, 1.8 | 62.01 ± 8.20 | |
| | 1, 1.9 | 62.73 ± 8.00 | 67.91 ± 7.57 |
| | 1, 2.0 | 64.00 ± 8.13 | 67.97 ± 7.64 |
| Horizontal flip | True | 55.89 ± 9.22 | 67.57 ± 8.11 |
| Vertical flip | True | 59.54 ± 8.90 | 67.23 ± 8.08 |
| Elastic deformation | 250, 40 | | 65.92 ± 8.19 |
| | 500, 40 | 59.17 ± 9.31 | 65.86 ± 8.18 |
| | 1000, 40 | 57.93 ± 9.12 | 66.97 ± 8.00 |
| | 2000, 40 | 57.83 ± 8.86 | 67.88 ± 8.02 |
| | 3000, 40 | 55.89 ± 9.14 | |
| | 4000, 40 | 54.65 ± 9.12 | 66.96 ± 8.20 |
| | 5000, 40 | 56.55 ± 9.13 | 65.17 ± 8.36 |
| | 6000, 40 | 55.90 ± 9.37 | 65.02 ± 8.28 |
| Pixel-based transformations | | | |
| Brightness, all channels equally | ± 25 | 59.89 ± 84 | 66.87 ± 7.66 |
| | ± 50 | 63.27 ± 8.41 | 66.22 ± 7.74 |
| | ± 75 | 66.79 ± 8.28** | 65.17 ± 7.76 |
| | ± 100 | 67.99 ± 8.23** | 64.55 ± 7.86 |
| | ± 125 | 68.98 ± 7.90*** | 63.95 ± 7.87 |
| | ± 150 | | 67.25 ± 7.86 |
| | ± 175 | 68.32 ± 7.74** | |
| Brightness, each channel independently | ± 25 | | 67.85 ± 7.84 |
| | ± 50 | 70.90 ± 7.81*** | 68.28 ± 7.78 |
| | ± 75 | 69.26 ± 8.19*** | 68.91 ± 7.60 |
| | ± 100 | 69.07 ± 8.26*** | 69.21 ± 7.51 |
| | ± 125 | 67.86 ± 8.27** | |
| | ± 150 | 67.86 ± 7.77** | 67.07 ± 8.05 |
| | ± 175 | 66.15 ± 8.16* | 68.39 ± 7.65 |
| Contrast, all channels equally | 0.8, 1.2 | 58.11 ± 9.35 | 66.89 ± 7.98 |
| | 0.6, 1.4 | 61.55 ± 8.76 | 67.31 ± 7.85 |
| | 0.4, 1.6 | 66.17 ± 8.37* | 67.92 ± 7.56 |
| | 0.2, 1.8 | | |
| | 0.0, 2.0 | 60.54 ± 9.43 | 66.29 ± 8.14 |
| Contrast, each channel independently | 0.8, 1.2 | 71.80 ± 7.61*** | |
| | 0.6, 1.4 | 71.70 ± 7.62*** | 66.58 ± 7.79 |
| | 0.4, 1.6 | | 66.45 ± 7.63 |
| | 0.2, 1.8 | 70.54 ± 7.97*** | 66.83 ± 7.46 |
| Application-based transformations | | | |
| Specular lights | True | 59.64 ± 9.06 | 67.52 ± 7.59 |
| Blurry image | 3 | | 66.14 ± 8.01 |
| | 5 | 58.94 ± 9.37 | 65.54 ± 8.01 |
| | 7 | 53.61 ± 9.33 | 64.86 ± 8.05 |
| | 9 | 50.39 ± 9.84** | |
| | 11 | 51.24 ± 10.02* | 64.78 ± 8.12 |
| | 13 | 52.21 ± 9.75* | 65.85 ± 8.13 |
| | 15 | 48.41 ± 10.32** | 64.91 ± 8.13 |
Best value for each transformation is indicated in bold
Statistically significant differences between the baseline and each case are identified with a permutation test
***p value < 0.001; **p value < 0.01; *p value < 0.05
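The source does not specify which permutation test was used. As a minimal sketch, the following assumes an unpaired two-sided test on the difference in mean IoU between two groups of per-image scores; the function name `permutation_test`, the number of permutations, and the add-one correction are choices made for this example only.

```python
import random


def permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of means of a and b."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled scores to two groups of the original sizes.
        rng.shuffle(pooled)
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    # The add-one correction keeps the estimated p value strictly positive.
    return (count + 1) / (n_permutations + 1)
```

The returned value can then be compared against the 0.05, 0.01 and 0.001 thresholds used for the asterisks in the table.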
Fig. 3 Results for image-based transformations. Ranges with the highest mean are shown for each transformation and dataset. Baselines of each dataset are included, and their medians and quartiles are extended across the background for reference. For the CVC-EndoSceneStill: ± 90% width shift; ± 40% height shift; ± 6° rotation; ± 45° shear; 0.9 zoom in; 0.4 zoom out; (250,40) elastic deformation. For the Kvasir-SEG: ± 20% width shift; ± 30% height shift; ± 90° rotation; ± 45° shear; 0.5 zoom in; 0.2 zoom out; (3000,40) elastic deformation
Fig. 4 Results for pixel-based transformations. Ranges with the highest mean are shown for each transformation and dataset. Baselines of each dataset are included, and their medians and quartiles are extended across the background for reference. For the CVC-EndoSceneStill: ± 150 for brightness in all channels equally; ± 25 for brightness in each channel independently; (0.2–1.8) for contrast in all channels equally; and (0.4–1.6) for contrast in each channel independently. For the Kvasir-SEG: ± 175 for brightness in all channels equally; ± 125 for brightness in each channel independently; (0.2–1.8) for contrast in all channels equally; and (0.8–1.2) for contrast in each channel independently
Fig. 5 Results for application-based transformations. Ranges with the highest mean are shown for each transformation and dataset. Baselines of each dataset are included, and their medians and quartiles are extended across the background for reference. For the CVC-EndoSceneStill: 3 for blurry images. For the Kvasir-SEG: 9 for blurry images
Mean and standard deviation of combinations analysed
| | CVC-EndoSceneStill: transformations | CVC-EndoSceneStill: IoU on test set | Kvasir-SEG: transformations | Kvasir-SEG: IoU on test set |
|---|---|---|---|---|
| Baseline | None | 59.10 ± 9.35 | None | 66.45 ± 8.08 |
| Transformation and range with highest mean for each one of the three types of transforms | Width at ± 90% | 72.30 ± 7.26*** | 90° rotation | 65.53 ± 7.98 |
| | Change of contrast: each channel independently, with range [0.4, 1.6] | | Change of brightness: each channel independently, with range ± 125 | |
| | Inclusion of specular lights | | Inclusion of specular lights | |
| Range with highest mean of the image-based transformations, provided that they improve the baseline result | Width at ± 90% | 65.19 ± 7.81* | Width at ± 20% | 57.97 ± 9.21** |
| | Height at ± 40% | | Height at ± 30% | |
| | Zoom with range [1, 1.6] | | 90° rotation | |
| | Vertical flip | | 45° shear | |
| | | | Zoom with range [0.5, 1] | |
| | | | Vertical flip | |
| | | | Horizontal flip | |
| | | | Elastic deformation, with values (3000,40) | |
| The two transformations with higher mean | Change of contrast: each channel independently, with range [0.4, 1.6] | 70.50 ± 7.69*** | 90° rotation | 69.24 ± 7.85 |
| | Change of brightness: each channel independently, with range ± 25 | | 45° shear | |
Statistically significant differences between the baseline and each combination are identified with a permutation test
***p value < 0.001; **p value < 0.01; *p value < 0.05
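The tables report intersection over union (IoU) on the test sets. A minimal sketch of the metric for a pair of binary masks is shown below; the function name `iou` and the handling of two empty masks are assumptions, and the source does not state whether IoU was computed for the polyp class only or averaged across classes. The function returns a fraction in [0, 1], whereas the tabulated values appear to be percentages.

```python
import numpy as np


def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks, as a fraction in [0, 1]."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(np.logical_and(pred, true).sum() / union)
```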
Fig. 6 Results for combinations of transformations. Baselines of each dataset are included, and their medians and quartiles are extended across the background for reference. Three combinations are shown for each dataset. First, the transformation and range with the highest mean for each of the three types of transforms. For CVC-EndoSceneStill: width at ± 90%; change of contrast, each channel independently, with range [0.4, 1.6]; and inclusion of specular lights. For Kvasir-SEG: 90° rotation; change of brightness, each channel independently, with range ± 125; and inclusion of specular lights. Second, the ranges with the highest mean of the image-based transformations, provided that they improve the baseline result. For CVC-EndoSceneStill: width at ± 90%, height at ± 40%, zoom with range [1, 1.6], and vertical flip. For Kvasir-SEG: width at ± 20%, height at ± 30%, 90° rotation, 45° shear, zoom with range [0.5, 1], vertical flip, horizontal flip, and elastic deformation with values (3000,40). Third, the two transformations with the highest mean. For CVC-EndoSceneStill: change of contrast, each channel independently, with range [0.4, 1.6], and change of brightness, each channel independently, with range ± 25. For Kvasir-SEG: 90° rotation and 45° shear