Syed Muhammad Arsalan Bashir1,2, Yi Wang1, Mahrukh Khan3, Yilong Niu4.
Abstract
Image super-resolution (SR) is one of the vital image processing methods that improve the resolution of an image in the field of computer vision. In the last two decades, significant progress has been made in the field of super-resolution, especially by utilizing deep learning methods. This survey is an effort to provide a detailed survey of recent progress in single-image super-resolution in the perspective of deep learning while also informing about the initial classical methods used for image super-resolution. The survey classifies the image SR methods into four categories, i.e., classical methods, supervised learning-based methods, unsupervised learning-based methods, and domain-specific SR methods. We also introduce the problem of SR to provide intuition about image quality metrics, available reference datasets, and SR challenges. Deep learning-based approaches of SR are evaluated using a reference dataset. Some of the reviewed state-of-the-art image SR methods include the enhanced deep SR network (EDSR), cycle-in-cycle GAN (CinCGAN), multiscale residual network (MSRN), meta residual dense network (Meta-RDN), recurrent back-projection network (RBPN), second-order attention network (SAN), SR feedback network (SRFBN) and the wavelet-based residual attention network (WRAN). Finally, this survey is concluded with future directions and trends in SR and open problems in SR to be addressed by the researchers.Entities:
Keywords: Artificial intelligence; Convolutional neural networks (CNN); Deep learning; Generative adversarial networks (GAN); Image super-resolution; Neural networks; Single-image super-resolution (SISR); Super-resolution
Year: 2021 PMID: 34322592 PMCID: PMC8293932 DOI: 10.7717/peerj-cs.621
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1 Hierarchical classification of this survey.
The four main categories are (a) classical methods of image super-resolution, (b) deep learning-based methods for SR, (c) applications of super-resolution, and (d) future research and directions in SR. Green represents first-level sections, blue second-level subsections, and orange third-level subsections.
Figure 2 Downsampling and upsampling in super-resolution.
Noise is added to simulate realistic degradation within an image.
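The degradation pipeline in Figure 2 is commonly modeled as blurring the HR image, downsampling it by the scale factor, and adding noise to produce the LR input. A minimal NumPy sketch of that idea (the function name and the block-average blur are illustrative choices, not taken from the paper):

```python
import numpy as np

def degrade(hr, scale=4, noise_sigma=5.0, rng=None):
    """Toy LR synthesis: anti-alias blur + downsample (here a simple
    block average over scale x scale patches), then add Gaussian noise.
    Illustrative sketch, not the paper's degradation model."""
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = hr.shape
    hr = hr[:h - h % scale, :w - w % scale]       # crop to a multiple of scale
    lr = hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    lr = lr + rng.normal(0.0, noise_sigma, lr.shape)  # simulated sensor noise
    return np.clip(lr, 0.0, 255.0)

hr = np.tile(np.arange(256, dtype=float), (256, 1))   # synthetic test image
lr = degrade(hr)                                       # 256x256 -> 64x64
```

Training pairs for supervised SR are then (lr, hr); the network learns to invert this degradation.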
Comparison of image quality metrics for super-resolution.
| Method | Strengths | Weaknesses |
|---|---|---|
| PSNR | Most commonly used quality assessment metric; thus, it is easy to compare results with other methods. Quantitative and based on MSE | Since this metric is pixel-based, the overall score can be misleading: two images can be visually different yet still have a high PSNR. This metric does not consider the structural information within the image |
| SSIM | After PSNR, this metric is the most commonly used IQA metric; thus, comparing results with other methods is easier. Quantitatively scores an image based on its structural similarity with the original image with the possibility to change the weights of luminance, contrast, and structural comparison. | SSIM is unstable in cases where the variance or luminance of the reference image is low; thus, in medical imaging, this metric could give inconsistent results. |
| Opinion Scoring | Opinion scoring is a subjective quality metric in which human testers grade image quality based on predefined parameters such as sharpness, color, and natural look. This method is particularly suitable for human face reconstruction methods. | Limitations include non-linear scoring among testers, human error, and changes in test parameters. Scoring is time-consuming, especially for large datasets |
| Perceptual Quality | This method is similar to opinion scoring, but human testers are replaced by models that learn the behavior of testers using deep learning. Very fast compared to opinion scoring. | Requires additional resources for training the network to learn the features for the quality assessment network. It depends on annotated datasets to learn human behavior. |
| Task-based Evaluation | This metric is appropriate if the SR images are used to perform another task, for example, object detection/classification and diagnosis. It helps in measuring the performance of the whole task, which uses SR images. | Highly dependent on the performance of the associated task. Same SR images will give different scores if there is a change in task parameters. |
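The two quantitative metrics in the table have closed forms: PSNR is 10·log10(peak²/MSE), and SSIM combines luminance, contrast, and structure comparisons. A minimal sketch (the SSIM here is computed over the whole image as a single window, whereas the standard metric averages it over local sliding windows; helper names are ours):

```python
import numpy as np

def psnr(ref, out, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(float) - out.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(x, y, peak=255.0, k1=0.01, k2=0.03):
    """SSIM with one global window (the usual metric averages this over
    local windows). c1, c2 stabilize the ratio when means/variances are
    small -- which is why SSIM gets unstable on low-variance images."""
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For identical images PSNR is infinite and SSIM is 1; a uniform +1 offset on 8-bit images gives MSE = 1 and hence PSNR = 20·log10(255) ≈ 48.1 dB.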
List of benchmark datasets used in super-resolution.
| Name | Number of images/pairs | Image format | Type | Resolution | Details of images |
|---|---|---|---|---|---|
| BSD100 | 100 | PNG | Unpaired | (480, 320) | 100 images of animals, people, buildings, scenic views, etc. |
| BSDS300 | 300 | JPG | Unpaired | (430, 370) | 300 images of animals, people, buildings, scenic views, plants, etc. |
| BSDS500 | 500 | JPG | Unpaired | (430, 370) | Extended version of BSDS300 with 200 additional images |
| CelebA | 202,599 | PNG | Unpaired | (2048, 1024) | Celebrity faces with over 40 attribute-defined categories |
| DIV2K | 1,000 | PNG | Paired | (2048, 1024) | Objects, people, animals, scenery, nature |
| Manga109 | 109 | PNG | Unpaired | (800, 1150) | 109 manga volumes drawn by professional manga artists in Japan |
| MS-COCO | 164,000 | JPG | Unpaired | (640, 480) | Labeled objects with over 80 object categories |
| OutdoorScene | 10,624 | PNG | Unpaired | (550, 450) | Outdoor scenes including plants, animals, sceneries, water reservoirs, etc. |
| PIRM | 200 | PNG | Unpaired | (600, 500) | Sceneries, people, flowers, etc. |
| Set14 | 14 | PNG | Unpaired | (500, 450) | Faces, animals, flowers, animated characters, insects, etc. |
| Set5 | 5 | PNG | Unpaired | (300, 340) | Only 5 images: butterfly, baby, bird, head, and woman |
| T91 | 91 | PNG | Unpaired | (250, 200) | 91 images of fruits, cars, faces, etc. |
| Urban100 | 100 | PNG | Unpaired | (1000, 800) | Urban buildings and architecture |
| VOC2012 | 11,530 | JPG | Unpaired | (500, 400) | Labeled objects with over 20 classes |
Inclusion and exclusion criteria.
| Section | Inclusion | Exclusion |
|---|---|---|
| Introduction | Methods that defined image interpolation and performed some practical form of image interpolation, i.e., super-resolution | Studies that solely defined models; review articles |
| Classical Methods | Methods that performed pixel-based, neighborhood-based, or any other classical image interpolation | Application research discussing applications of classical methods; review papers |
| Deep learning-based methods | Development of image super-resolution using deep learning methods, including review papers | Papers that emphasize video super-resolution, as these prioritize frames per second (FPS) and inference time |
| Applications | Direct applications of super-resolution methods in the six fields defined in “Domain-Specific Applications of Super-Resolution” | Applications that combined other methods with image super-resolution, where SR played only a limited part; review papers |
Figure 3 Methodology for the collection of studies.
Sample details based on inclusion/exclusion criteria defined in Table 5.
SR method details of various SR algorithms.
| Year | Method name | US | Network | Framework | Loss function/details |
|---|---|---|---|---|---|
| 2014, ECCV | SRCNN | Bicubic | CNN | Pre | First deep learning-based SR |
| 2016, CVPR | DRCN | Bicubic | Res., Rec. | Pre | Recursive layers |
| 2016, ECCV | FSRCNN | Deconv | | Post | Lightweight |
| 2017, CVPR | ESPCN | Sub-pixel | | Pre | Sub-pixel |
| 2017, CVPR | LapSRN | Bicubic | Res. | Prog | Cascaded CNN |
| 2017, CVPR | DRRN | Bicubic | Res., Rec. | Pre | Recursive layer blocks |
| 2017, CVPR | SRResNet | Sub-pixel | Res. | Post | Content loss |
| 2017, CVPR | SRGAN | Sub-pixel | Res. | Post | GAN-based loss |
| 2017, CVPR | EDSR | Sub-pixel | Res. | Post | Compact design |
| 2017, ICCV | EnhanceNet | Bicubic | Res. | Pre | GAN-based loss |
| 2017, ICCV | MemNet | Bicubic | Res., Rec., Dense | Pre | Memory layer blocks |
| 2017, ICCV | SRDenseNet | Deconv | Res., Dense | Post | Fully connected layers |
| 2018, CVPR | DBPN | Deconv | Res., Dense | Iter | Back-projection based |
| 2018, CVPR | DSRN | Deconv | Res., Rec. | Pre | Dual-state network |
| 2018, CVPRW | ProSR, ProGanSR (Wang et al., 2018) | Progressive upscale | Res., Dense | Prog | Least-squares loss |
| 2018, ECCV | MSRN | Sub-pixel | Res. | Post | Multi-path |
| 2018, ECCV | RCAN | Sub-pixel | Res., Attent. | Post | Attention-based loss |
| 2018, ECCV | ESRGAN | Sub-pixel | Res., Dense | Post | GAN-based loss |
| 2019, CVPR | Meta-RDN | Meta upscale | Res., Dense | Post | Multi-scale model |
| 2019, CVPR | Meta-SR | Meta upscale | Res., Dense | Post | Arbitrary scale factor as input |
| 2019, CVPR | RBPN | Sub-pixel | Rec. | Post | Used SISR and MISR together for VSR |
| 2019, CVPR | SAN | Sub-pixel | Res., Attent. | Post | Second-order attention |
| 2019, CVPR | SRFBN | Deconv | Res., Rec., Dense | Post | Feedback path |
| 2020, Neurocomputing | WRAN | Bicubic | Res., Attent. | Pre | Wavelet-based |
Note:
“US,” “Rec.,” “Res.,” “Attent.,” “Dense,” “Pre,” “Post,” “Iter,” and “Prog” denote the upsampling method, recursive learning, residual learning, attention-based learning, dense connections, pre-upsampling framework, post-upsampling framework, iterative up-and-down upsampling framework, and progressive upsampling framework, respectively.
Figure 4 Sub-pixel layer. Blue color represents the input convolution, and output feature maps are represented in other colors.
(A) Input. (B) Convolution. (C) Reshaping.
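The reshaping step in Figure 4C can be sketched in NumPy: a convolution first produces r² feature maps per output channel, and the sub-pixel layer rearranges them into an image r times larger. The channel ordering below follows the common PixelShuffle convention; the helper name is ours:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel reshaping: (C*r^2, H, W) feature maps -> (C, H*r, W*r).
    Each group of r^2 channels fills one r x r block of output pixels."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

maps = np.arange(4 * 2 * 2).reshape(4, 2, 2)  # r=2: four 2x2 feature maps
hr = pixel_shuffle(maps, 2)                   # one 4x4 output image
```

Because only cheap reshapes happen at the end, all convolutions run at LR resolution, which is why the sub-pixel layer suits post-upsampling frameworks.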
Figure 5 Deconvolution layer. The blue color represents the input, and the green color represents the convolution operation.
(A) Input. (B) Expansion. (C) Convolution.
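The expansion-then-convolution view in Figure 5 can be sketched directly: insert zeros between the input pixels, then apply an ordinary convolution. This is a toy single-channel version; the function name and the uniform kernel are illustrative:

```python
import numpy as np

def deconv_upsample(x, kernel, stride=2):
    """Transposed convolution viewed as in Fig. 5: insert (stride-1) zeros
    between input pixels ("expansion"), then run a plain convolution."""
    h, w = x.shape
    expanded = np.zeros((h * stride, w * stride))
    expanded[::stride, ::stride] = x          # zero-insertion expansion
    kh, kw = kernel.shape
    padded = np.pad(expanded, ((kh // 2,), (kw // 2,)))
    out = np.zeros_like(expanded)
    for i in range(out.shape[0]):             # ordinary sliding-window conv
        for j in range(out.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

up = deconv_upsample(np.ones((2, 2)), np.ones((3, 3)))
```

Even with this uniform kernel the 4×4 output is uneven (sums of 1 at the corners next to sums of 4 at interior positions): the uneven-overlap effect behind checkerboard-like artifacts in deconvolution-based SR.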
Comparison of upsampling methods.
| Method | Strengths | Weaknesses |
|---|---|---|
| Sub-pixel layer | It uses convolution in an end-to-end manner, so it is frequently used in SR models. This layer has a wide receptive field, which helps learn more contextual information | This layer may generate false artifacts at the boundaries of complex patterns due to its uneven distribution of the receptive field. The upscaling factor is fixed |
| Deconvolution layer | This layer is commonly used in SR methods and generates HR images in an end-to-end manner. Compatible with vanilla convolution | In some cases, due to uneven overlapping within the generated HR image, patterns are replicated in a checkerboard-like pattern, which may result in a non-realistic HR image. The upscaling factor is fixed |
| Meta upscaling | This method uses arbitrary scaling factors to generate the SR image. It extracts more information from the LR feature maps, which helps construct the HR image using meta upscaling in the last layer of the SR model, making this an end-to-end SR approach | The whole process may become unstable for large scale factors, as it predicts the convolution weights for every pixel independently of the image information within those pixels |
Figure 6 Deep-learning network structures for super-resolution.
(A) Recursive learning, (B) residual learning, (C) dense connection-based learning, (D) multiscale learning, (E) advanced convolution-based learning, (F) attention-based learning.
Figure 7 Pre-upsampling-based super-resolution network pipeline.
Figure 8 Post-upsampling-based super-resolution network pipeline.
Figure 9 Iterative up-and-down sampling-based super-resolution network pipeline.
Figure 10 Progressive upsampling-based super-resolution network pipeline.
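The key difference between the pre- and post-upsampling pipelines is where interpolation happens relative to the learned mapping, which determines whether the network's convolutions run on large HR-sized or small LR-sized feature maps. A schematic sketch with stand-in functions (nearest-neighbor interpolation and caller-supplied stand-ins for the CNN stages; all names are illustrative):

```python
import numpy as np

def nearest_up(x, s):
    """Nearest-neighbor interpolation: repeat each pixel s times per axis."""
    return x.repeat(s, axis=0).repeat(s, axis=1)

def pre_upsampling_sr(lr, refine, scale=2):
    """Pre-upsampling pipeline: interpolate to HR size first, then refine.
    `refine` stands in for the CNN; it runs on HR-sized maps (costly)."""
    return refine(nearest_up(lr, scale))

def post_upsampling_sr(lr, features, upscale, scale=2):
    """Post-upsampling pipeline: learn features at LR size, upscale last.
    `features` stands in for the CNN; it runs on LR-sized maps (cheap)."""
    return upscale(features(lr), scale)
```

Both produce an output `scale` times larger, but the post-upsampling variant keeps the expensive learned stage at LR resolution, which is why most recent models in the table above (EDSR, RCAN, SAN, etc.) adopt it.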
Figure 11 Benchmarking of super-resolution models.
Image quality is represented by PSNR (in blue), a significant evaluation indicator for any super-resolution method; the total number of parameters learned by each method is shown in green. Computational cost is measured in tera multiply-adds and shown in orange.
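The multiply-add counts in Figure 11 can be estimated layer by layer: a k×k convolution costs k²·C_in·C_out·H_out·W_out multiply-accumulates. A back-of-envelope helper (illustrative, not the benchmarking code used for the figure):

```python
def conv_macs(k, c_in, c_out, h_out, w_out):
    """Multiply-accumulate operations for one k x k convolution layer
    producing a (c_out, h_out, w_out) output (bias/activations ignored)."""
    return k * k * c_in * c_out * h_out * w_out

# e.g. one 3x3, 64->64 channel conv applied on a 1280x720 feature map:
g_macs = conv_macs(3, 64, 64, 720, 1280) / 1e9  # ≈ 34 G-MACs per layer
```

Summing this over every layer (and dividing by 10¹²) gives the tera multiply-adds axis; it also makes clear why post-upsampling models are cheaper, since their H_out and W_out stay at LR size for most layers.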