Generative Adversarial Training for Supervised and Semi-Supervised Learning
Xianmin Wang, Jing Li, Qi Liu, Wenpeng Zhao, Zuoyong Li, Wenhao Wang.
Abstract
Neural networks play critical roles in many research fields. The recently proposed adversarial training (AT) can improve the generalization ability of neural networks by adding intentional perturbations during training, but it sometimes fails to generate worst-case perturbations, resulting in limited improvement. Instead of designing a specific smoothness function and seeking an approximate solution, as existing AT methods do, we propose a new training methodology, named Generative AT (GAT), for supervised and semi-supervised learning. The key idea of GAT is to formulate the learning task as a minimax game, in which the perturbation generator aims to yield the worst-case perturbations that maximize the deviation of the output distribution, while the target classifier aims to minimize both the impact of this perturbation and the prediction error. To solve this minimax optimization problem, a new adversarial loss function is constructed based on the cross-entropy measure. As a result, both the smoothness and the confidence of the model are greatly improved. Moreover, we develop a trajectory-preserving-based alternating update strategy to enable stable training of GAT. Numerous experiments on benchmark datasets demonstrate that the proposed GAT significantly outperforms state-of-the-art AT methods on supervised and semi-supervised learning tasks, especially when the number of labeled examples is small in semi-supervised learning.
Keywords: adversarial training; generative AT; neural networks; smoothness function; trajectory-preserving-based alternating update strategy; worst-case perturbations
Year: 2022 PMID: 35401139 PMCID: PMC8988301 DOI: 10.3389/fnbot.2022.859610
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 2.650
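The abstract frames GAT as a minimax game between a perturbation generator and the target classifier, with an adversarial loss built on the cross-entropy measure. Below is a minimal PyTorch-style sketch of one plausible reading of that objective; the record gives no implementation details, so the tanh-bounded perturbation, the module interfaces, and the hyperparameter `eps` are assumptions for illustration, not the authors' design.

```python
# Hedged sketch of a cross-entropy-based adversarial loss, as described in
# the abstract. `classifier` and `generator` are assumed to be nn.Modules;
# the generator is assumed to output a tensor with the same shape as x.
import torch
import torch.nn.functional as F

def adversarial_loss(classifier, generator, x, eps=1.0):
    """Cross-entropy between the clean and perturbed output distributions.

    The perturbation generator ascends this quantity (worst-case perturbation
    that maximizes the deviation of the output distribution); the classifier
    descends it (smoothness). It needs no labels, which is what makes the
    semi-supervised setting possible.
    """
    with torch.no_grad():
        p_clean = F.softmax(classifier(x), dim=1)        # reference distribution
    delta = eps * torch.tanh(generator(x))               # bounded perturbation (assumed form)
    logp_adv = F.log_softmax(classifier(x + delta), dim=1)
    return -(p_clean * logp_adv).sum(dim=1).mean()       # CE(p_clean, p_adv)
```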
Figure 1. The overall framework of Generative AT (GAT) and its trajectory-preserving training process.
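Figure 1 pairs this loss with alternating updates of the generator and the classifier. The record does not describe how the trajectory-preserving strategy stabilizes these updates, so the sketch below shows only plain alternation (roughly the GAT-woTP ablation reported in the tables), reusing `adversarial_loss` from the sketch above; the weight `alpha` balancing the two classifier terms is again an assumption.

```python
def train_step(classifier, generator, opt_c, opt_g,
               x_lab, y, x_unlab, alpha=1.0):
    # (1) Generator step: ascend the adversarial loss so the perturbation
    #     moves toward the worst case for the current classifier.
    opt_g.zero_grad()
    g_loss = -adversarial_loss(classifier, generator, x_unlab)
    g_loss.backward()
    opt_g.step()

    # (2) Classifier step: descend supervised cross-entropy plus the
    #     adversarial smoothness term.
    opt_c.zero_grad()
    c_loss = (F.cross_entropy(classifier(x_lab), y)
              + alpha * adversarial_loss(classifier, generator, x_unlab))
    c_loss.backward()
    opt_c.step()
    return c_loss.item()
```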
Figure 2. Transition curves of accuracy rates for Plain NN and the proposed GAT. (A) Results on M-dataset; (B) results on C-dataset.
Figure 3. Visualization of the model distributions of GAT and Plain NN on the synthetic datasets. (A,B) Distribution surfaces on M-dataset; (C,D) distribution surfaces on C-dataset, where flat surface regions indicate small output deviations.
Figure 4. Contours of output confidence for label 1 on M-dataset with various regularization methods. Red circles and blue triangles represent data points with labels 1 and 0, respectively. Decision boundaries at different confidence levels are plotted as differently colored contours; the black line marks the contour of probability 0.5, which usually serves as the decision boundary for binary classification. The test accuracy rate of each method is displayed above the corresponding panel.
Figure 5. Contours of output confidence for label 1 on C-dataset with various regularization methods. See the caption of Figure 4 for details.
Test error rates (%) of various regularization methods for the supervised learning task on the MNIST dataset.
| Method | Test error rate (%) |
|---|---|
| SVM (Gaussian kernel) | 1.40 |
| Dropout | 1.05 |
| Maxout networks | 0.94 |
| DBM | 0.79 |
| Ladder network† | 0.57 |
| Conv-CatGAN† | 0.48 |
| Plain NN (Baseline) | 1.15 |
| RAT | 0.85 |
| SAT | 0.78 |
| VAT | 0.66 |
| GAT-woTP | 0.65 |
| GAT (Our method) | 0.45 |
The results in the upper panel are reported in prior work; the error rates in the bottom panel are derived from our implementations.
Test error rates (%) of semi-supervised learning methods on the MNIST dataset.
| Method | Nl = 100 | Nl = 600 | Nl = 1,000 | Nl = 3,000 |
|---|---|---|---|---|
| SVM | 23.44 | 8.85 | 7.77 | 4.21 |
| EmbedNN | 16.9 | 5.97 | 5.73 | 3.59 |
| PEA | 10.79 | 2.44 | 2.23 | 1.91 |
| Conv-CatGAN† | 1.93(±0.01) | 1.86(±0.11) | 1.73(±0.18) | 1.67(±0.12) |
| Ladder networks† | 1.06(±0.37) | 0.93(±0.07) | 0.84(±0.08) | 0.79(±0.09) |
| Auxiliary DGM† | 0.96(±0.02) | 0.90(±0.05) | 0.86(±0.13) | 0.78(±0.05) |
| RAT | 6.62(±1.02) | 3.75(±0.14) | 1.61(±0.09) | 1.51(±0.08) |
| VAT | 2.38(±0.11) | 1.38(±0.08) | 1.35(±0.12) | 1.28(±0.07) |
| GAT-woTP | 1.97(±0.87) | 1.66(±0.85) | 1.58(±0.96) | 1.32(±0.65) |
| GAT (Our method) | 0.90(±0.11) | 0.85(±0.09) | 0.83(±0.17) | 0.75(±0.08) |
Nl denotes the number of labeled examples. The results in the upper panel are reported in prior work; the error rates in the bottom panel are derived from our implementations.
Test error rates (%) of semi-supervised learning methods on the SVHN and CIFAR-10 datasets.
| Method | SVHN (Nl = 1,000) | CIFAR-10 (Nl = 4,000) |
|---|---|---|
| Π-model | 5.43(±0.25) | 16.55(±0.29) |
| Mean teacher | 5.21(±0.21) | 17.74(±0.30) |
| ALI | 7.41(±0.65) | 17.99(±1.62) |
| Bad GAN† | 4.25(±0.03) | 14.41(±0.30) |
| Triple GAN† | 5.77(±0.17) | 16.99(±0.36) |
| Improved GAN† | 4.39(±1.20) | 16.20(±1.60) |
| TNAR-LGAN (Small)† | 4.25(±0.09) | 12.97(±0.31) |
| TNAR-LGAN (Large)† | 4.03(±0.13) | 12.76(±0.04) |
| RAT (Small) | 8.42(±0.22) | 18.58(±0.26) |
| RAT (Large) | 8.36(±0.22) | 18.23(±0.16) |
| VAT (Small) | 6.83(±0.24) | 14.87(±0.13) |
| VAT (Large) | 5.77(±0.32) | 14.18(±0.38) |
| GAT-woTP (Small) | 6.53(±0.95) | 14.36(±1.03) |
| GAT-woTP (Large) | 5.26(±0.92) | 14.02(±0.88) |
| GAT (Our method, Small) | 4.27(±0.14) | 12.96(±0.15) |
| GAT (Our method, Large) | 4.01(±0.11) | 12.81(±0.13) |
Nl denotes the number of labeled examples. The results in the upper panel are reported in prior work; the error rates in the bottom panel are derived from our implementations.