Lianchao Jin, Fuxiao Tan, Shengming Jiang.
Abstract
Computer vision is one of the most active research fields in deep learning. The emergence of generative adversarial networks (GANs) provides a new method and model for computer vision. The adversarial, game-based training of GANs outperforms traditional machine learning algorithms in feature learning and image generation. GANs are widely used not only in image generation and style transfer but also in text, voice, and video processing, among other fields. However, GANs still suffer from problems such as mode collapse and unstable, hard-to-control training. This paper reviews the theoretical basis of GANs in depth and surveys recently developed GAN models in comparison with traditional ones. The applications of GANs in computer vision include data augmentation, domain transfer, high-quality sample generation, and image restoration. The latest research progress of GANs in artificial intelligence (AI) based security attack and defense is introduced. The future development of GANs in computer vision is discussed at the end of the paper, along with possible applications of AI in computer vision.
Year: 2020 PMID: 32802024 PMCID: PMC7416236 DOI: 10.1155/2020/1459107
Source DB: PubMed Journal: Comput Intell Neurosci
Development of GANs.
| Stages | Stage 1 | Stage 2 | Stage 3 |
|---|---|---|---|
| Time | 2014.06–2015.11 | 2015.11–2017.01 | 2017.01–today |
| GAN models | GAN → DCGAN | DCGAN → WGAN | WGAN → today |
| Improvements | GAN marks the beginning of the generative adversarial model | DCGAN introduces techniques such as batch normalization, ReLU, and leaky ReLU to make the model more stable | WGAN uses weight clipping to address vanishing gradients |
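The Stage 3 improvement can be made concrete with a minimal sketch of WGAN-style weight clipping. The function name `clip_weights` and the toy parameters are ours; `c = 0.01` is the clipping threshold used in the original WGAN paper:

```python
import numpy as np

def clip_weights(params, c=0.01):
    """After each critic update, WGAN clamps every weight into [-c, c]
    to crudely enforce the Lipschitz constraint that the Wasserstein
    objective requires."""
    return [np.clip(w, -c, c) for w in params]

# Toy critic weights after a gradient step; values outside [-c, c]
# are clamped, values already inside pass through unchanged.
params = [np.array([[0.5, -0.02], [0.005, -0.8]])]
clipped = clip_weights(params)
print(clipped[0])
```

Later work (WGAN-GP) replaces this hard clamp with a gradient penalty, since clipping can concentrate weights at the boundary.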
Figure 1. GAN network architecture.
Loss function and derivative models for GANs.
[Table: discriminator and generator loss functions for GAN, CGAN, WGAN, BEGAN, PAN, InfoGAN, LSGAN, DRAGAN, CWGAN, AdaBalGAN, FittingGAN, and StackGAN; the formulas were rendered as images in the source and did not survive extraction.]
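For reference, the original GAN objective heading the table can be written out numerically. This is a generic sketch (the function names are ours), using the common non-saturating form of the generator loss:

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Original GAN discriminator loss:
    -E[log D(x)] - E[log(1 - D(G(z)))], with D outputs in (0, 1)."""
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss: -E[log D(G(z))]."""
    return -np.mean(np.log(d_fake))

# An undecided discriminator (D = 0.5 everywhere) gives
# d_loss = 2*log(2) and g_loss = log(2).
scores = np.full(8, 0.5)
print(d_loss(scores, scores), g_loss(scores))
```

Most of the derivative models in the table keep this adversarial structure and change only the distance being minimized (Wasserstein for WGAN, least squares for LSGAN) or add terms (mutual information for InfoGAN, conditioning for CGAN).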
Comparisons of GAN models on the loss function.
| GAN models | Improvements | Shortages | Applications |
|---|---|---|---|
| CGAN [ | Adds a conditional variable to guide generation | The model is limited by the data set, which must carry both tags and annotations | Generates a specified target through semisupervised learning |
| PAN [ | The loss function combines a perceptual adversarial loss with the GAN loss to train the model | A supervised model that also needs a tagged and annotated data set | Applicable to many image-to-image translation tasks |
| CWGAN [ | Built on a conditional Wasserstein objective, which trains at a lower cost than traditional GANs | Mode collapse and lack of diversity | Training on small data sets |
| FittingGAN [ | Based on the CGAN loss function with an added L1 regularization term | Accuracy is not very high and diversity is lacking | Beyond the plain image-to-image task, it can generate images that differ from the guiding input image |
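The conditional variable in the CGAN row is, in the simplest case, a class label appended to the generator's input. A minimal sketch, with illustrative names and sizes:

```python
import numpy as np

def conditional_input(z, label, n_classes):
    """CGAN-style conditioning: concatenate a one-hot class label onto
    the noise vector so the generator can be steered toward a class."""
    one_hot = np.zeros(n_classes)
    one_hot[label] = 1.0
    return np.concatenate([z, one_hot])

z = np.random.randn(100)                     # latent noise
g_in = conditional_input(z, label=3, n_classes=10)
print(g_in.shape)                            # 100 noise dims + 10 label dims
```

The discriminator receives the same label alongside its real or generated sample, which is what ties the generated output to the requested class.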
Figure 2. PAN network structure [40].
Comparisons of GAN models on the structure.
| GAN models | Improvements | Shortages | Applications |
|---|---|---|---|
| MAD-GAN [ | The model uses several generators and one discriminator to generate samples | Hard to converge and lacks diversity | Applied to multivariate time-series anomaly detection |
| InfoGAN [ | The model's input is composed of a noise vector and an interpretable latent code | Complex, with a large number of parameters | Unsupervised; learns interpretable and disentangled representations on challenging data sets |
| ACGAN [ | Combines the advantages of CGAN and SGAN to generate samples | Semisupervised, and the model is hard to converge on small amounts of data | Generates high-quality, diverse samples |
| StackGAN [ | Generates more realistic samples through two-stage training | Complicated and needs more training time | Text-to-image generation |
Figure 3. MAD-GAN model structure.
Figure 4. Comparison of StackGAN with its improved version [50] (StackGAN-v1 is the original StackGAN, while StackGAN-v2 is the StackGAN++ model; the latter generates more realistic and detailed samples).
Figure 5. Image of a hemangioma generated by CGAN [56].
Comparisons of GAN models in high-quality examples generation.
| GAN models | Improvements | Shortages | Applications |
|---|---|---|---|
| DCGAN [ | Fractionally strided convolutions, batch normalization, and ReLU make the model more stable and easier to converge | Mode collapse; parameters must be retuned for different conditions | The most widely used model in most scenarios |
| LAPGAN [ | Laplacian and Gaussian pyramids for down- and upsampling let the model learn residuals from coarse to fine | A supervised model | High-resolution image generation |
| SAGAN [ | A self-attention mechanism and the two-timescale update rule let the model generate realistic images | The attention mechanism is limited | Large-scale class-conditional image generation tasks |
| VSRResFeatGAN [ | Uses a GAN loss plus the Charbonnier distance in feature and pixel space | Redundant noise in the estimated frames | Video super-resolution |
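The pyramid idea behind LAPGAN can be sketched in plain NumPy: at each scale the generator only has to model the residual between an upsampled coarse image and the finer one. The helper names, and the box-filter/nearest-neighbour resampling used here in place of a true Gaussian pyramid, are simplifications of ours:

```python
import numpy as np

def downsample(img):
    """2x box-filter downsample (a stand-in for the Gaussian pyramid)."""
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def upsample(img):
    """Nearest-neighbour 2x upsample."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """The residuals a LAPGAN generator models at each scale,
    plus the coarsest image."""
    residuals = []
    for _ in range(levels):
        small = downsample(img)
        residuals.append(img - upsample(small))
        img = small
    return residuals, img

img = np.arange(16, dtype=float).reshape(4, 4)
residuals, coarse = laplacian_pyramid(img, levels=2)

# Reconstruction mirrors LAPGAN sampling: start coarse, then repeatedly
# upsample and add the (generated) residual at each finer scale.
recon = coarse
for r in reversed(residuals):
    recon = upsample(recon) + r
print(np.allclose(recon, img))
```

Because each residual is cheap to model, LAPGAN only ever asks a generator to fill in one scale's worth of detail at a time.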
Comparisons of GAN models in domains transfer.
| GAN models | Improvements | Shortages | Applications |
|---|---|---|---|
| Pix2pix [ | The U-Net generator and PatchGAN discriminator make the model easy to converge and the images realistic | A supervised model that needs paired, tagged and annotated data | Style transfer and other applications |
| CycleGAN [ | Cycle-consistency loss, self-constraint, and a two-step transformation; training does not need a large data set | The quality of generated images is lower than pix2pix's | Most style-conversion scenes |
| DiscoGAN [ | Uses two GAN models to discover cross-domain relationships, reducing mode collapse and improving image quality | Data sets must consist of one-to-one paired images | Most domain-transfer scenes |
| StarGAN [ | Adds domain control information so the model knows which domain an image belongs to | Needs a large number of different data sets | Multidomain transfer |
| DTN [ | Uses several complex loss functions to generate appealing emoji from a facial image | Generated images are of low quality | Generating cartoon images from real photos |
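The cycle-consistency idea in the CycleGAN row reduces to an L1 penalty on round trips between the two domains. A minimal sketch, where the translator functions are toy stand-ins for the two generators and `lam = 10` is the weighting used in the CycleGAN paper:

```python
import numpy as np

def cycle_loss(x, y, G, F, lam=10.0):
    """CycleGAN cycle-consistency term:
    lam * (||F(G(x)) - x||_1 + ||G(F(y)) - y||_1),
    where G translates domain X -> Y and F translates Y -> X."""
    return lam * (np.mean(np.abs(F(G(x)) - x))
                  + np.mean(np.abs(G(F(y)) - y)))

# Toy translators: F exactly inverts G, so the cycle loss vanishes.
G = lambda a: a + 1.0
F = lambda b: b - 1.0
x = np.zeros(4)
y = np.ones(4)
print(cycle_loss(x, y, G, F))
```

This term is what lets CycleGAN train on unpaired data: even without a ground-truth target for `G(x)`, the round trip `F(G(x))` must return to `x`.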
Figure 6. StarGAN generated image [71].
Figure 7. Encrypted communication system.