| Literature DB >> 35270898 |
Abdul Jabbar, Xi Li, Muhammad Assam, Javed Ali Khan, Marwa Obayya, Mimouna Abdullah Alkhonaini, Fahd N Al-Wesabi, Muhammad Assad.
Abstract
To address the problem of automatically detecting and removing the mask without user interaction, we present a GAN-based automatic approach for face de-occlusion, called Automatic Mask Generation Network for Face De-occlusion Using Stacked Generative Adversarial Networks (AFD-StackGAN). In this approach, we decompose the problem into two primary stages (i.e., Stage-I Network and Stage-II Network) and employ a separate GAN in both stages. Stage-I Network (Binary Mask Generation Network) automatically creates a binary mask for the masked region in the input images (occluded images). Then, Stage-II Network (Face De-occlusion Network) removes the mask object and synthesizes the damaged region with fine details while retaining the restored face's appearance and structural consistency. Furthermore, we create a paired synthetic face-occluded dataset using the publicly available CelebA face images to train the proposed model. AFD-StackGAN is evaluated using real-world test images gathered from the Internet. Our extensive experimental results confirm the robustness and efficiency of the proposed model in removing complex mask objects from facial images compared to the previous image manipulation approaches. Additionally, we provide ablation studies for performance comparison between the user-defined mask and auto-defined mask and demonstrate the benefits of refiner networks in the generation process.
Keywords: automatic mask removal; generative adversarial network (GAN); image restoration
Year: 2022 PMID: 35270898 PMCID: PMC8914700 DOI: 10.3390/s22051747
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1 The proposed AFD-StackGAN results on real-world images.
Figure 2 The architecture of the automatic mask removal network for face de-occlusion. It consists of a Stage-I Network that generates a binary mask and a Stage-II Network that removes the mask object from input facial images.
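The two-stage pipeline described above can be sketched as a simple composition: Stage-I predicts a binary mask from the occluded image, and Stage-II fills in the masked region. The functions below are toy numpy stand-ins (a dark-pixel threshold and a mean fill), not the paper's trained GANs; they only illustrate how the stages chain together.

```python
import numpy as np

def stage1_mask_network(occluded):
    """Toy stand-in for Stage-I (Binary Mask Generation Network):
    flag dark pixels as the occluding object. The paper trains a
    GAN for this; thresholding is just a placeholder heuristic."""
    return (occluded < 0.1).astype(np.float32)

def stage2_deocclusion_network(occluded, mask):
    """Toy stand-in for Stage-II (Face De-occlusion Network):
    fill the masked region with the mean of unmasked pixels.
    The paper uses a GAN generator plus refiner networks here."""
    fill = occluded[mask == 0].mean()
    restored = occluded.copy()
    restored[mask == 1] = fill
    return restored

def afd_stackgan_infer(occluded):
    """Chain Stage-I and Stage-II, as in the architecture of Figure 2."""
    mask = stage1_mask_network(occluded)
    return stage2_deocclusion_network(occluded, mask), mask
```

The key design point illustrated is that Stage-II consumes both the occluded image and the Stage-I mask, so no user-drawn mask is needed at inference time.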
Figure 3 Some images from our synthetic dataset.
A summary of the dataset features used in the experiments.
| Synthetic Generated Dataset | Feature Description |
|---|---|
| Total Number of Samples | 20,000 |
| Number of Training Samples | 18,000 |
| Number of Testing Samples | 2,000 |
| No. of Classes | 50 |
| Samples Per Class | 400 |
Note: In the table above, the number of classes indicates how many mask objects (non-face objects), varying in size, shape, structure, and position, are used in the synthetic dataset; samples per class indicates how many face images each specific mask object is applied to.
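The paired dataset construction described in the table and note — pasting one of 50 mask objects onto each of 400 clean CelebA faces — can be sketched as below. All names and the fixed paste position are hypothetical; the paper varies mask size, shape, and placement.

```python
import numpy as np

def make_occluded_pair(face, mask_obj, top, left):
    """Paste a mask object onto a clean face image at (top, left),
    returning the (occluded image, binary mask, clean target) triple
    that a paired synthetic dataset needs. Grayscale arrays are
    assumed for simplicity."""
    h, w = mask_obj.shape[:2]
    occluded = face.copy()                       # keep the clean target intact
    occluded[top:top + h, left:left + w] = mask_obj
    binary = np.zeros(face.shape[:2], dtype=np.float32)
    binary[top:top + h, left:left + w] = 1.0     # 1 = occluded region
    return occluded, binary, face
```

Because the clean face, the occluded version, and the ground-truth binary mask are all produced together, both Stage-I (mask supervision) and Stage-II (de-occlusion supervision) can be trained from the same sample.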
Figure 4 The results of Stage-I Network on real-world images.
Figure 5 The results of AFD-StackGAN (Stage-I Network + Stage-II Network) on real-world images.
Figure 6 Visual assessment of the proposed AFD-StackGAN against the baseline models on real-world images.
Figure 7 AFD-StackGAN performance on real face images with occlusion masks whose structures and locations differ greatly from those used in the synthetic dataset. The first row shows occluded input facial images, and the second row shows de-occluded output face images.
Performance comparison of different methods in terms of SSIM, MSE, PSNR, NIQE, and BRISQUE. For PSNR and SSIM, higher values indicate better performance; for MSE, NIQE, and BRISQUE, lower is better.
| Methods | SSIM ↑ | PSNR ↑ | MSE ↓ | NIQE ↓ | BRISQUE ↓ |
|---|---|---|---|---|---|
| Iizuka et al. | 0.763 | 21.953 | 2329.062 | 4.754 | 34.106 |
| Yu et al. | 0.797 | 15.469 | 2316.839 | 4.951 | 32.761 |
| Nazeri et al. | 0.561 | 15.848 | 2450.889 | 16.991 | 36.426 |
| Din et al. | 0.850 | 16.209 | 2223.938 | 5.721 | 31.016 |
| AFD-StackGAN | 0.978 | 33.201 | 32.435 | 4.902 | 39.872 |
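Two of the reference-based metrics in the tables, MSE and PSNR, are straightforward to compute; the sketch below shows standard definitions in numpy. The 255 peak value assumes 8-bit images, which the paper does not state explicitly; SSIM, NIQE, and BRISQUE require dedicated implementations and are omitted here.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images (lower is better)."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB (higher is better).
    `peak` is the maximum possible pixel value; 255 assumes 8-bit images."""
    e = mse(a, b)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)
```

Note the inverse relationship visible in the table: AFD-StackGAN's much lower MSE (32.435 vs. >2200 for the baselines) directly yields its much higher PSNR.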
Figure 8 Visual comparison of the automatic mask removal network (using an auto-generated mask) with FD-StackGAN (using a user-defined mask).
Performance comparison between the user-defined mask and the auto-defined mask in terms of SSIM, PSNR, MSE, NIQE, and BRISQUE.
| Methods | SSIM | PSNR | MSE | NIQE | BRISQUE |
|---|---|---|---|---|---|
| User-Defined Mask | 0.981 | 32.803 | 34.145 | 4.499 | 42.504 |
| Auto-Defined Mask | 0.978 | 33.201 | 32.435 | 4.902 | 39.872 |
Figure 9 Results of the image refiner network on real-world images; it further improves the results by rectifying what is missing or wrong in the base network's output.