Shuai Yang, Kai Qiao, Ruoxi Qin, Pengfei Xie, Shuhao Shi, Ningning Liang, Linyuan Wang, Jian Chen, Guoen Hu, Bin Yan.
Abstract
With the continuous development of deep-learning technology, increasingly advanced face-swapping methods are being proposed. Recently, face-swapping methods based on generative adversarial networks (GANs) have achieved many-to-many face exchange from few samples, advancing the field. However, the images generated by previous GAN-based methods are often unstable. The fundamental reason is that the GANs in these frameworks have difficulty fully converging to the distribution of the face space during training. To solve this problem, we propose a novel face-swapping method based on a pretrained StyleGAN generator, which has a stronger ability to generate high-quality face images. The critical issue is how to control StyleGAN so that it generates swapped images accurately. We design the generator's control strategy around the idea of encoding and decoding, and propose an encoder called ShapeEditor to complete this task. ShapeEditor is a two-step encoder that generates a set of coding vectors integrating the identity and attributes of the input faces. In the first step, we extract the identity vector of the source image and the attribute vector of the target image; in the second step, we map the concatenation of the identity vector and the attribute vector onto the internal latent space of StyleGAN. Extensive experiments on the test dataset show that the results of the proposed method are not only superior in clarity and authenticity to those of other state-of-the-art methods but also integrate identity and attributes sufficiently.
Keywords: deepfake; disentanglement; face swapping; generative adversarial network; style transfer
Year: 2022 PMID: 35126081 PMCID: PMC8814752 DOI: 10.3389/fnbot.2021.785808
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 2.650
Figure 1. Some abnormal results generated by FaceShifter (Li et al., 2019).
Figure 2. The overall structure and data flow of the proposed model. (A) The flow of our method. (B) The structure of Eattr. (C) The structure of the multilayer perceptron (MLP).
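The two-step encoding described in the abstract can be illustrated with a small sketch. This is a toy under stated assumptions, not the authors' implementation: the class name `ToyShapeEditor`, the vector sizes, the ReLU activation, and the two-layer shape of the MLP are all hypothetical, and the identity and attribute vectors are passed in directly rather than extracted from images by neural encoders. It only shows the concatenate-then-map step.

```python
import random


def make_layer(in_dim, out_dim, rng):
    """Return (weights, biases) for one fully connected layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def linear(layer, x):
    """Apply a fully connected layer to the vector x."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


class ToyShapeEditor:
    """Hypothetical stand-in for the second step of the encoder."""

    def __init__(self, id_dim, attr_dim, hidden_dim, latent_dim, seed=0):
        rng = random.Random(seed)  # fixed seed so the sketch is reproducible
        self.id_dim = id_dim
        self.attr_dim = attr_dim
        self.hidden = make_layer(id_dim + attr_dim, hidden_dim, rng)
        self.out = make_layer(hidden_dim, latent_dim, rng)

    def encode(self, identity_vec, attribute_vec):
        if len(identity_vec) != self.id_dim or len(attribute_vec) != self.attr_dim:
            raise ValueError("unexpected vector size")
        # Step 2 of the abstract: concatenate, then map with an MLP.
        fused = list(identity_vec) + list(attribute_vec)
        return linear(self.out, relu(linear(self.hidden, fused)))


# Step 1 (feature extraction) is replaced by hand-written placeholder vectors.
editor = ToyShapeEditor(id_dim=4, attr_dim=3, hidden_dim=8, latent_dim=6)
latent_code = editor.encode([0.2, -0.1, 0.7, 0.4], [0.5, 0.0, -0.3])
print(len(latent_code))  # prints 6
```

In the described method the resulting code would be passed to the pretrained StyleGAN generator; that part is omitted here.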
Training ShapeEditor using gradient descent.
| Step | Operation |
|---|---|
| 1 | |
| 2 | |
| 3 | Generate the |
| 4 | ShapeEditor: |
| 5 | Generate the face-swapping image |
| 6 | G |
| 7 | Calculate the identity loss |
| 8 | Update ShapeEditor with loss |
| 9 | end |
| 10 | end |
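The general shape of such a training loop (forward pass, identity loss, parameter update) can be sketched as follows. This is a toy under several assumptions: the identity loss is taken to be one minus cosine similarity, which the listing above does not specify; a single linear map stands in for ShapeEditor, the frozen generator, and the identity extractor together; gradients are estimated by finite differences instead of backpropagation; and all data values, the learning rate, and the epoch count are made up.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identity_loss(source_id, swapped_id):
    # Assumed form of the loss: 1 - cosine similarity of identity vectors.
    return 1.0 - cosine_similarity(source_id, swapped_id)


def forward(params, source_id, target_attr):
    # Toy stand-in for "encode, generate, re-extract identity":
    # a linear map applied to the concatenated input vectors.
    fused = list(source_id) + list(target_attr)
    return [sum(w * f for w, f in zip(row, fused)) for row in params]


def numerical_gradient(loss_fn, params, eps=1e-5):
    # Central finite differences; a real setup would use backpropagation.
    grads = [[0.0] * len(row) for row in params]
    for i, row in enumerate(params):
        for j, original in enumerate(row):
            row[j] = original + eps
            loss_plus = loss_fn(params)
            row[j] = original - eps
            loss_minus = loss_fn(params)
            row[j] = original
            grads[i][j] = (loss_plus - loss_minus) / (2.0 * eps)
    return grads


def train(pairs, epochs=30, learning_rate=0.05):
    id_dim = len(pairs[0][0])
    in_dim = id_dim + len(pairs[0][1])
    params = [[0.1] * in_dim for _ in range(id_dim)]  # deterministic init
    epoch_losses = []
    for _ in range(epochs):                       # outer loop over epochs
        total = 0.0
        for source_id, target_attr in pairs:      # inner loop over samples
            def loss_fn(p, s=source_id, t=target_attr):
                return identity_loss(s, forward(p, s, t))
            total += loss_fn(params)
            grads = numerical_gradient(loss_fn, params)
            for i, grad_row in enumerate(grads):  # gradient-descent update
                for j, g in enumerate(grad_row):
                    params[i][j] -= learning_rate * g
        epoch_losses.append(total / len(pairs))
    return params, epoch_losses


pairs = [
    ([1.0, 0.0, 0.5], [0.2, -0.3]),
    ([0.0, 1.0, 0.5], [-0.1, 0.4]),
]
params, epoch_losses = train(pairs)
print(epoch_losses[0], epoch_losses[-1])
```

Only the toy encoder parameters are updated here, mirroring the "Update ShapeEditor with loss" step; a pretrained generator would stay fixed.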
Figure 3. Qualitative comparison with FSGAN (Nirkin et al., 2019), FaceShifter (Li et al., 2019), and the method of Nitzan et al. (2020) on the CelebAMask-HQ (Lee et al., 2020) test dataset.
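Quantitative comparison with Nitzan et al. (2020). Our method performs better on most indicators.
| Method | Identity Error ↓ (Avg.) | Identity Error ↓ (Std.) | Pose Error ↓ (Avg.) | Pose Error ↓ (Std.) | Expression Error ↓ (Avg.) | Expression Error ↓ (Std.) | Mood Consistency ↑ (Acc. %) |
|---|---|---|---|---|---|---|---|
| Nitzan et al. (2020) |  | 0.30 | 5.99 | 7.16 | 10.13 | 5.38 | 65.35 |
| Ours | 1.30 | 0.33 |  | 6.88 |  | 3.63 |  |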
Bold values represent the best. ↑ represents that the larger the value, the better. ↓ represents that the smaller the value, the better.
Quantitative assessment with DeepFakes (Rössler et al., 2019), FSGAN (Nirkin et al., 2019), and FaceShifter (Li et al., 2019).
| Metric | Statistic | DeepFakes | FSGAN | FaceShifter | Ours |
|---|---|---|---|---|---|
| Identity Error ↓ | Avg. | 1.35 | 1.51 |  | 1.30 |
|  | Std. | 0.32 | 0.45 | 0.31 | 0.33 |
| Pose Error ↓ | Avg. | 3.79 |  | 3.04 | 3.82 |
|  | Std. | 1.99 | 4.41 | 6.70 | 6.88 |
| Expression Error ↓ | Avg. | 8.82 | 5.03 |  | 5.93 |
|  | Std. | 3.30 | 2.17 | 2.83 | 3.63 |
| Mood Consistency ↑ | Acc. (%) | 39.80 | 72.77 |  | 75.38 |
| SSIM ↓ | Avg. | 0.81 | 0.95 | 0.96 |  |
|  | Std. | 0.09 | 0.03 | 0.03 | 0.08 |
| PSNR ↓ | Avg. | 20.54 | 23.76 | 28.17 |  |
|  | Std. | 2.60 | 2.30 | 1.92 | 1.62 |
| FDR ↓ | Tsd.=0.01 | 91.42 | 76.59 | 37.67 |  |
|  | Tsd.=0.05 | 83.83 | 48.99 | 11.66 |  |
|  | Tsd.=0.1 | 77.45 | 35.86 | 6.05 |  |
|  | Tsd.=0.2 | 70.86 | 24.22 | 2.86 |  |
Tsd. represents the threshold, which is set to judge whether samples are forged or not. Bold values represent the best. ↑ represents that the larger the value, the better. ↓ represents that the smaller the value, the better.
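As a rough illustration of how numbers like those in the table could be produced, the sketch below computes PSNR with its standard definition, a threshold-based detection rate, and an average with standard deviation. Several points are assumptions: FDR is read here as the percentage of swapped images whose forgery score exceeds the threshold, the scores and pixel values are invented, and the population (not sample) standard deviation is used because the source does not say which one applies.

```python
import math


def psnr(image_a, image_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(image_a, image_b)) / len(image_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def detection_rate(forgery_scores, threshold):
    """Percentage of samples flagged as forged (score above threshold).

    Assumed reading of the FDR rows; the detector itself is not modeled.
    """
    flagged = sum(1 for score in forgery_scores if score > threshold)
    return 100.0 * flagged / len(forgery_scores)


def mean_std(values):
    """Average and population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


# Hypothetical detector scores for four swapped images.
scores = [0.005, 0.03, 0.15, 0.5]
for tsd in (0.01, 0.05, 0.1, 0.2):
    print(tsd, detection_rate(scores, tsd))

print(mean_std([1.0, 2.0, 3.0]))
```

Raising the threshold can only lower or keep the rate under this reading, which matches the downward trend across the Tsd. rows.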
Figure 4. Qualitative ablation study on different variants. Our original model performs better than the others.
Quantitative ablation study on different variants for face swapping.
| Metric | Statistic |  |  |  |  |
|---|---|---|---|---|---|
| Identity Error ↓ | Avg. |  | 1.33 | 1.37 |  |
|  | Std. | 0.32 | 0.33 | 0.34 | 0.33 |
| Pose Error ↓ | Avg. | 3.94 | 4.53 | 4.05 |  |
|  | Std. | 5.59 | 5.74 | 5.75 | 5.55 |
| Expression Error ↓ | Avg. | 6.63 | 7.43 | 6.48 |  |
|  | Std. | 3.45 | 4.20 | 3.87 | 3.22 |
| PSNR ↓ | Avg. | 19.38 |  | 19.94 | 20.22 |
|  | Std. | 1.51 | 1.58 | 1.59 | 1.62 |
| SSIM ↓ | Avg. | 0.73 |  | 0.74 | 0.75 |
|  | Std. | 0.08 | 0.09 | 0.07 | 0.08 |
Bold values represent the best. ↓ represents that the smaller the value, the better.
Figure 5. Shortcomings of the proposed model. The problem in panel (A) is that the background of the swapped result is blurred. The problem in panel (B) is that the swapped face lacks Asian characteristics.