Sandra Treneska, Eftim Zdravevski, Ivan Miguel Pires, Petre Lameski, Sonja Gievska
Abstract
Large-scale labeled datasets are generally necessary for successfully training a deep neural network in the computer vision domain. To avoid the costly and tedious work of manually annotating image datasets, self-supervised learning methods have been proposed to learn general visual features automatically. In this paper, we first focus on image colorization with generative adversarial networks (GANs) because of their ability to generate the most realistic colorization results. Then, via transfer learning, we use this as a proxy task for visual understanding. Specifically, we propose to use conditional GANs (cGANs) for image colorization and to transfer the gained knowledge to two downstream tasks, namely, multilabel image classification and semantic segmentation. This is the first time that GANs have been used for self-supervised feature learning through image colorization. Through extensive experiments with the COCO and Pascal datasets, we show an increase of 5% for the classification task and 2.5% for the segmentation task. This demonstrates that image colorization with conditional GANs can boost the performance of other downstream tasks without the need for manual annotation.
Keywords: convolutional neural network; generative adversarial network; image colorization; self-supervised learning; transfer learning
Year: 2022 PMID: 35214498 PMCID: PMC8880520 DOI: 10.3390/s22041599
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
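The colorization model described in the abstract pairs a generator with a conditional discriminator. As a minimal sketch of the standard pix2pix-style cGAN objective (the exact loss weighting used in the paper is not stated in this record; `lam=100.0` is the common pix2pix default and is an assumption here), the generator is trained to fool the discriminator while an L1 term keeps the predicted color channels close to the ground truth:

```python
import numpy as np

def cgan_generator_loss(d_fake, fake_ab, real_ab, lam=100.0):
    """Pix2pix-style generator objective: fool the discriminator
    (cross-entropy against the 'real' label) plus a lambda-weighted
    L1 reconstruction term on the predicted color channels."""
    eps = 1e-12
    adv = -np.mean(np.log(d_fake + eps))      # want D(G(gray)) -> 1
    l1 = np.mean(np.abs(real_ab - fake_ab))   # color fidelity
    return adv + lam * l1

def cgan_discriminator_loss(d_real, d_fake):
    """Discriminator objective: score real colorizations as 1, fakes as 0."""
    eps = 1e-12
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
```

The L1 term discourages the washed-out, low-saturation colorizations that a pure adversarial loss tends to produce, which is why pix2pix-style setups combine the two.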
Figure 1. Supervised and self-supervised methods for learning visual features.
Comparison of previous research papers that investigate image colorization as a proxy task.
| Paper | Model Base | Pretrain Dataset | Fine-Tune Dataset |
|---|---|---|---|
| [ | AlexNet | ImageNet | PASCAL VOC |
| [ | VGG-16 | ImageNet | PASCAL VOC |
| [ | AlexNet, VGG-16, ResNet-152 | ImageNet, Places | PASCAL VOC |
| [ | Cross-channel autoencoder | ImageNet, Places | PASCAL VOC |
Comparison of the best results of previous research on the Pascal VOC dataset for classification, segmentation, and object detection downstream tasks. The bolded result in each column denotes the best result for that task.
| Paper | Classification (mAP%) | Segmentation (mIU%) | Detection (mAP%) |
|---|---|---|---|
| [ | 65.9 | 35.6 | / |
| [ | / | **50.2** | / |
| [ | / | / | / |
| [ | **67.1** | 36.0 | **46.7** |
Figure 2. Data flow diagram of training the colorization model and transferring its weights to downstream tasks.
Figure 3. Architecture of a conditional generative adversarial network for image colorization.
Figure 4. Example images and their masks from the Pascal VOC segmentation dataset.
Figure 5. Architecture of the multilabel classification model.
Figure 6. Architecture of the semantic segmentation model.
Figure 7. Learning progress of the image colorization model. The first row shows the grayscale input images, followed by the learned colorizations after 1, 5, 15, and 20 epochs; the last row shows the original color images.
Evaluation of the cGAN colorization model on the COCO test dataset.
| Metric | Average | Min | Max |
|---|---|---|---|
| PSNR | 20.94 | 8.82 | 42.61 |
| SSIM | 0.85 | 0.31 | 0.99 |
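The PSNR and SSIM figures above follow standard definitions and can be reproduced with a few lines of NumPy. A minimal sketch (single-window SSIM with the usual K1 = 0.01, K2 = 0.03 constants; published implementations typically apply SSIM over a sliding Gaussian window, so values may differ slightly from this global variant):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0):
    """Single-window (global) structural similarity index."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Higher is better for both metrics: PSNR is unbounded above, while SSIM is bounded by 1 (identical images).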
Evaluation of the multilabel classification and semantic segmentation models on the Pascal VOC test dataset.
| Model Initialization | Classification (Acc) | Segmentation (mIU) |
|---|---|---|
| Baseline | 47.18% | 44.66% |
| Colorization pre-training | 52.83% | 47.07% |
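The "Colorization pre-training" row corresponds to initializing the downstream networks with weights learned by the colorization model rather than from scratch. A minimal, framework-agnostic sketch of such a transfer, assuming parameters are stored as name-to-array mappings and that shared encoder layers keep the same names (the names below are hypothetical, for illustration only):

```python
def transfer_weights(pretrained, target, prefix="encoder."):
    """Copy pretrained parameters into the downstream model's state
    for every layer whose name matches under the given prefix and
    whose shape agrees; other layers (e.g. a new classification
    head) keep their fresh initialization."""
    copied = []
    for name, weights in pretrained.items():
        if (name.startswith(prefix)
                and name in target
                and len(weights) == len(target[name])):
            target[name] = list(weights)
            copied.append(name)
    return copied
```

In frameworks like PyTorch the same idea is expressed by loading a filtered state dict non-strictly, so that decoder- or head-specific parameters absent from the pretrained model are simply left initialized.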
Figure 8. Training losses of the multilabel classification models (left) and semantic segmentation models (right).