SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis.

Dunlu Peng, Wuchen Yang, Cong Liu, Shuairui Lü.

Abstract

Synthesizing photo-realistic images from text descriptions is a challenging task in computer vision. Although generative adversarial networks have made significant breakthroughs on this task, they still face considerable challenges in generating high-quality, visually realistic images that are consistent with the semantics of the text. Existing text-to-image methods generally accomplish the task in two steps: first generating an initial image with a rough outline and colors, and then progressively refining it into a high-resolution image. A drawback of these methods is that, if the quality of the initial image is poor, it is hard to obtain a satisfactory high-resolution image from it. In this paper, we propose SAM-GAN, Self-Attention supporting Multi-stage Generative Adversarial Networks, for text-to-image synthesis. With the self-attention mechanism, the model can establish multi-level dependencies within the image and fuse sentence- and word-level visual-semantic vectors, improving the quality of the generated image. Furthermore, a multi-stage perceptual loss is introduced to enhance the semantic similarity between the synthesized image and the real image, thus strengthening the visual-semantic consistency between text and images. To promote diversity in the generated images, a mode seeking regularization term is integrated into the model. The results of extensive experiments and ablation studies, conducted on the Caltech-UCSD Birds and Microsoft Common Objects in Context datasets, show that our model is superior to competitive models in text-to-image synthesis.
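The self-attention mechanism the abstract refers to lets every spatial location of a feature map attend to every other location, which is how such models capture long-range ("multi-level") dependencies within an image. The following is a minimal NumPy sketch of that general idea, not the paper's implementation; the function name, weight matrices, and shapes are illustrative assumptions.

```python
import numpy as np

def self_attention(features, w_q, w_k, w_v):
    """Illustrative self-attention over flattened image features.

    features : (n, c) array -- n spatial locations, c channels
    w_q, w_k, w_v : (c, d) projection matrices (hypothetical names)
    Returns the attended features and the attention weights.
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = (q @ k.T) / np.sqrt(q.shape[-1])      # pairwise affinities between locations
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax: each row sums to 1
    return attn @ v, attn                          # each output is a weighted sum of all locations

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 32))              # e.g. a flattened 4x4 map with 32 channels
wq, wk, wv = (rng.standard_normal((32, 8)) for _ in range(3))
out, attn = self_attention(feats, wq, wk, wv)      # out has shape (16, 8)
```

In SAGAN-style generators this output is typically scaled by a learned coefficient and added back to the input features as a residual; the multi-stage and text-fusion details are specific to SAM-GAN and are not reproduced here.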
Copyright © 2021 Elsevier Ltd. All rights reserved.

Keywords:  Machine learning; SAM-GAN; Self-attention mechanism; Text-to-image synthesis

Mesh:

Year:  2021        PMID: 33631607     DOI: 10.1016/j.neunet.2021.01.023

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


  2 in total

1.  Rapid DNA origami nanostructure detection and classification using the YOLOv5 deep convolutional neural network.

Authors:  Matthew Chiriboga; Christopher M Green; David A Hastman; Divita Mathur; Qi Wei; Sebastián A Díaz; Igor L Medintz; Remi Veneziano
Journal:  Sci Rep       Date:  2022-03-09       Impact factor: 4.379

2.  GAN-FDSR: GAN-Based Fault Detection and System Reconfiguration Method.

Authors:  Zihan Shen; Xiubin Zhao; Chunlei Pang; Liang Zhang
Journal:  Sensors (Basel)       Date:  2022-07-15       Impact factor: 3.847

