Slava Voloshynovskiy¹, Olga Taran¹, Mouad Kondah¹, Taras Holotyak¹, Danilo Rezende².
Abstract
In this paper, we consider an information bottleneck (IB) framework for semi-supervised classification with several families of priors on the latent space representation. We apply a variational decomposition of the mutual information terms of the IB. Using this decomposition, we analyze several regularizers and demonstrate in practice the impact of the different components of the variational model on the classification accuracy. We propose a new formulation of semi-supervised IB with hand-crafted and learnable priors and link it to previous methods such as the semi-supervised versions of VAE (M1 + M2), AAE, CatGAN, etc. We show that the resulting model gives a better understanding of the role of various previously proposed regularizers in the semi-supervised classification task in the light of the IB framework. The proposed IB semi-supervised model with hand-crafted and learnable priors is experimentally validated on MNIST with different amounts of labeled data.
Keywords: deep networks; hand crafted priors; information bottleneck principle; latent space representation; learnable priors; regularization; semi-supervised classification
Year: 2020 PMID: 33286710 PMCID: PMC7597214 DOI: 10.3390/e22090943
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
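For orientation, the IB trade-off that the abstract builds on can be written in its standard form together with the familiar variational bounds (as popularized by the deep variational IB). This is a generic sketch, not the paper's own decomposition: $Z$ (latent representation), $Y$ (class label), $q_\phi(z\mid x)$ (encoder), $p_\theta(y\mid z)$ (classifier), $r(z)$ (latent prior) and $\beta > 0$ (trade-off weight) are notation assumed here for illustration.

```latex
\begin{aligned}
\min_{\phi,\theta}\; \mathcal{L}_{\mathrm{IB}} &= I(X;Z) - \beta\, I(Z;Y),\\
I(X;Z) &\le \mathbb{E}_{p(x)}\, D_{\mathrm{KL}}\!\left(q_\phi(z\mid x)\,\middle\|\,r(z)\right),\\
I(Z;Y) &\ge \mathbb{E}_{p(x,y)}\,\mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(y\mid z)\right] + H(Y).
\end{aligned}
```

Substituting both bounds gives a tractable upper bound on $\mathcal{L}_{\mathrm{IB}}$ (the constant $H(Y)$ drops out of the optimization). In this generic form, $r(z)$ is the place where either a fixed, hand-crafted distribution or a learnable one can enter.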
Figure 1. Classification with the hand-crafted latent space regularization.
Figure 2. Classification with the learnable latent space regularization.
Semi-supervised classification performance (percentage error) with the optimal parameters determined on MNIST (see Appendices B–G). D: deterministic; S: stochastic.
| Method | MNIST (100) | MNIST (1000) | MNIST (all) | SVHN (1000) |
|---|---|---|---|---|
| NN Baseline | 26.31 (±0.91) | 7.50 (±0.19) | 0.68 (±0.05) | 36.16 (±0.77) |
| | 26.78 (±1.66) | 7.54 (±0.25) | 0.70 (±0.05) | 36.28 (±0.93) |
| InfoMax | 33.41 | 21.5 | 15.86 | - |
| VAE | 14.26 | 8.71 | 5.02 | - |
| MV-InfoMax | 13.22 | 7.39 | 6.07 | - |
| IB multiview | 3.03 | 2.34 | 2.22 | - |
| VAE (M1 + M2) | 3.33 (±0.14) | 2.40 (±0.02) | 0.96 | 36.02 (±0.10) |
| CatGAN | 1.91 (±0.10) | 1.73 (±0.18) | 0.91 | - |
| AAE | 1.90 (±0.10) | 1.60 (±0.08) | 0.85 (±0.02) | 17.70 (±0.30) |
| No priors on latent space | | | | |
| | 20.72 (±1.58) | 4.99 (±0.28) | 0.69 (±0.04) | 25.78 (±0.90) |
| | 19.60 (±1.37) | 4.49 (±0.25) | 0.67 (±0.05) | 26.34 (±0.80) |
| Hand-crafted latent space priors | | | | |
| | 27.44 (±1.40) | 6.77 (±0.34) | 0.91 (±0.05) | 35.94 (±1.08) |
| | 27.48 (±1.07) | 6.91 (±0.45) | 0.88 (±0.05) | 35.80 (±1.21) |
| | 12.04 (±4.46) | 2.43 (±0.12) | 0.81 (±0.05) | 24.70 (±0.46) |
| | 11.80 (±3.82) | 2.40 (±0.10) | 0.82 (±0.04) | 24.62 (±0.54) |
| Learnable latent space priors | | | | |
| | 1.55 (±0.21) | 1.25 (±0.10) | 0.74 (±0.04) | 20.07 (±0.36) |
| | 1.49 (±0.18) | 1.43 (±0.06) | 0.78 (±0.04) | 20.00 (±0.31) |
| | 1.38 (±0.09) | 1.21 (±0.10) | 0.77 (±0.06) | 19.75 (±0.52) |
| | 1.42 (±0.10) | 1.16 (±0.09) | 0.79 (±0.02) | 19.71 (±0.26) |
Execution time (hours) per 100 epochs on one NVIDIA GPU. For SVHN, the models with the learnable latent space priors were trained with a learning rate of 0.0001, which explains the longer time, but without optimization of the Lagrangians, i.e., the Lagrangians were reused from the pre-trained MNIST model. All other models were trained with a learning rate of 0.001.
| Method | MNIST | SVHN |
|---|---|---|
| NN Baseline | 0.47–0.65 | 0.85–0.92 |
| | 0.47–0.65 | 0.85–0.92 |
| | 0.47–0.65 | 1–1.05 |
| | 0.97–1.18 | 1.5–1.6 |
| | 1.23–1.6 | 2.25–2.3 |
| | 1.98–2.42 | 3.5–3.55 |
The network parameters of the baseline classifier trained on . The encoder is trained with and without batch normalization (BN) after the Conv2D layers.
| Encoder | |
|---|---|
| Size | Layer |
| 28 × 28 × 1 | Input |
| 14 × 14 × 32 | Conv2D, LeakyReLU |
| 7 × 7 × 64 | Conv2D, LeakyReLU |
| 4 × 4 × 128 | Conv2D, LeakyReLU |
| 2048 | Flatten |
| 1024 | FC, ReLU |

| | |
|---|---|
| Size | Layer |
| 1024 | Input |
| 500 | FC, ReLU |
| 10 | FC, Softmax |
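The encoder table lists only output sizes. A quick way to check that they are consistent is to propagate the spatial size through the three Conv2D layers. This sketch assumes stride 2 with "same" padding (so each layer maps n to ceil(n / 2)); the stride and padding mode are assumptions, not stated in the table.

```python
import math

# Assumption: each Conv2D uses stride 2 with "same" padding, so the
# spatial size becomes ceil(n / stride). The table only lists output sizes.
def conv_same_out(size: int, stride: int = 2) -> int:
    """Spatial size after a 'same'-padded convolution with the given stride."""
    return math.ceil(size / stride)


def encoder_shapes(height: int = 28, width: int = 28,
                   channels: tuple = (32, 64, 128), stride: int = 2):
    """Return the (H, W, C) shape after each Conv2D and the flattened size."""
    shapes = []
    for c in channels:
        height, width = conv_same_out(height, stride), conv_same_out(width, stride)
        shapes.append((height, width, c))
    flat = height * width * channels[-1]
    return shapes, flat


if __name__ == "__main__":
    shapes, flat = encoder_shapes()
    print(shapes)  # [(14, 14, 32), (7, 7, 64), (4, 4, 128)]
    print(flat)    # 2048
```

The result matches the 14 × 14 × 32, 7 × 7 × 64, 4 × 4 × 128 and 2048 (Flatten) rows. In a framework with explicit padding, kernel size 3, stride 2 and padding 1 give the same 28 → 14 → 7 → 4 progression; the actual kernel size is not given here.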
The network parameters of the semi-supervised classifier trained on and . The encoder is trained with and without batch normalization (BN) after the Conv2D layers.
| Encoder | |
|---|---|
| Size | Layer |
| 28 × 28 × 1 | Input |
| 14 × 14 × 32 | Conv2D, LeakyReLU |
| 7 × 7 × 64 | Conv2D, LeakyReLU |
| 4 × 4 × 128 | Conv2D, LeakyReLU |
| 2048 | Flatten |
| 1024 | FC, ReLU |

| | |
|---|---|
| Size | Layer |
| 1024 | Input |
| 500 | FC, ReLU |
| 10 | FC, Softmax |

| | |
|---|---|
| Size | Layer |
| 10 | Input |
| 500 | FC, ReLU |
| 500 | FC, ReLU |
| 1 | FC, Sigmoid |
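The fully connected parts of the tables above can be sized directly from their widths. The sketch below counts weights plus biases for a stack of FC layers; it assumes every FC layer has a bias, and it ignores activations (which have no parameters) and batch normalization. The helper name `dense_params` is hypothetical.

```python
def dense_params(widths):
    """Weights plus biases of a stack of fully connected layers.

    `widths` lists the layer sizes from input to output, e.g. [1024, 500, 10].
    Assumes every FC layer has a bias vector.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


encoder_fc = [2048, 1024]            # Flatten 2048 -> FC 1024
classifier_head = [1024, 500, 10]    # 1024 -> FC 500 -> FC 10 (softmax)
discriminator = [10, 500, 500, 1]    # 10 -> FC 500 -> FC 500 -> FC 1 (sigmoid)

print(dense_params(encoder_fc))       # 2098176
print(dense_params(classifier_head))  # 517510
print(dense_params(discriminator))    # 256501
```

The encoder's 2048 → 1024 layer alone holds about 2.1 million parameters, far more than the classifier head or the discriminator.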
The performance (percentage error) of the deterministic classifier based on for the encoder with and without batch normalization as a function of the Lagrangian multiplier and the number of labelled examples.
| Encoder Model | Lagrangian multiplier | Run 1 | Run 2 | Run 3 | Mean | std |
|---|---|---|---|---|---|---|
| without BN | 0 | 26.56 | 26.24 | 28.04 | 26.95 | 0.96 |
| | 0.005 | 20.44 | 21.93 | 18.98 | 20.45 | 1.48 |
| | 0.0005 | 18.55 | 20.43 | 20.59 | | 1.14 |
| | 1 | 19.23 | 22.42 | 20.57 | 20.74 | 1.60 |
| with BN | 0 | 29.37 | 29.27 | 30.62 | 29.75 | 0.75 |
| | 0.005 | 27.97 | 28.02 | 26.27 | 27.42 | 1.00 |
| | 0.0005 | 25.99 | 23.70 | 24.47 | | 1.17 |
| | 1 | 27.78 | 31.98 | 35.88 | 31.88 | 4.05 |
| without BN | 0 | 7.74 | 6.99 | 6.97 | 7.23 | 0.44 |
| | 0.005 | 5.62 | 6.06 | 5.60 | | 0.26 |
| | 0.0005 | 6.30 | 6.12 | 6.02 | 6.15 | 0.14 |
| | 1 | 5.99 | 6.27 | 6.28 | 6.18 | 0.16 |
| with BN | 0 | 7.45 | 6.95 | 7.52 | 7.31 | 0.31 |
| | 0.005 | 5.57 | 5.08 | 5.22 | | 0.25 |
| | 0.0005 | 5.60 | 6.05 | 6.22 | 5.96 | 0.32 |
| | 1 | 6.05 | 6.41 | 5.82 | 6.09 | 0.30 |
| without BN | 0 | 0.83 | 0.83 | 0.74 | | 0.05 |
| | 0.005 | 0.83 | 0.82 | 0.88 | 0.84 | 0.03 |
| | 0.0005 | 0.86 | 0.92 | 0.82 | 0.87 | 0.05 |
| | 1 | 0.72 | 0.85 | 0.87 | 0.81 | 0.08 |
| with BN | 0 | 0.73 | 0.67 | 0.79 | 0.73 | 0.06 |
| | 0.005 | 0.72 | 0.73 | 0.70 | 0.72 | 0.02 |
| | 0.0005 | 0.75 | 0.77 | 0.72 | 0.75 | 0.03 |
| | 1 | 0.67 | 0.68 | 0.73 | | 0.03 |
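The Mean and std columns in these tables appear consistent with the arithmetic mean and the sample standard deviation (n − 1 denominator) of the three runs. For example, the first row above (26.56, 26.24, 28.04) reproduces 26.95 and 0.96. The sketch below recomputes both summary values from per-run errors; the function name `summarize` is hypothetical.

```python
import statistics


def summarize(runs):
    """Mean and sample standard deviation (n - 1 denominator), rounded to 2 decimals."""
    return round(statistics.mean(runs), 2), round(statistics.stdev(runs), 2)


# First row of the table above: without BN, multiplier 0.
print(summarize([26.56, 26.24, 28.04]))  # (26.95, 0.96)
```

Using the population standard deviation (`statistics.pstdev`) instead would give 0.78 for the same row, which does not match the table, so the sample estimator is the likelier convention.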
The performance (percentage error) of the stochastic classifier with noisy supervised data (noise std = 0.1, number of noise realisations = 3) based on for the encoder with and without batch normalization as a function of the Lagrangian multiplier and the number of labelled examples.
| Encoder Model | Lagrangian multiplier | Run 1 | Run 2 | Run 3 | Mean | std |
|---|---|---|---|---|---|---|
| without BN | 0 | 25.75 | 26.61 | 26.59 | 26.32 | 0.49 |
| | 0.005 | 23.34 | 21.38 | 24.37 | 23.03 | 1.52 |
| | 0.0005 | 19.92 | 15.83 | 16.03 | | 2.31 |
| | 1 | 22.51 | 20.48 | 21.28 | 21.42 | 1.02 |
| with BN | 0 | 30.26 | 31.24 | 29.3 | 30.27 | 0.97 |
| | 0.005 | 21.17 | 24.41 | 24.75 | 23.44 | 1.98 |
| | 0.0005 | 22.97 | 26.38 | 24.44 | 24.60 | 1.71 |
| | 1 | 26.62 | 30.43 | 28.44 | 28.50 | 1.91 |
| without BN | 0 | 7.68 | 7.30 | 7.23 | 7.4 | 0.24 |
| | 0.005 | 5.59 | 5.16 | 5.80 | 5.52 | 0.33 |
| | 0.0005 | 5.59 | 6 | 5.84 | 5.81 | 0.21 |
| | 1 | 6.66 | 6.8 | 7.62 | 7.03 | 0.52 |
| with BN | 0 | 6.97 | 7.06 | 7.66 | 7.23 | 0.38 |
| | 0.005 | 4.42 | 4.54 | 4.08 | | 0.24 |
| | 0.0005 | 5.28 | 5.56 | 5.14 | 5.33 | 0.21 |
| | 1 | 5.77 | 5.88 | 5.72 | 5.79 | 0.08 |
| without BN | 0 | 0.8 | 0.91 | 0.87 | 0.86 | 0.06 |
| | 0.005 | 0.77 | 0.82 | 0.88 | 0.82 | 0.06 |
| | 0.0005 | 0.86 | 0.81 | 0.87 | 0.85 | 0.03 |
| | 1 | 0.93 | 0.85 | 0.92 | 0.90 | 0.04 |
| with BN | 0 | 0.65 | 0.67 | 0.71 | 0.68 | 0.03 |
| | 0.005 | 0.69 | 0.77 | 0.68 | 0.71 | 0.05 |
| | 0.0005 | 0.78 | 0.71 | 0.74 | 0.74 | 0.04 |
| | 1 | 0.71 | 0.64 | 0.62 | | 0.05 |
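The stochastic classifiers are described as trained on noisy supervised data with noise std = 0.1 and 3 noise realisations. One plausible reading, sketched below, is that each labelled example is replicated three times with additive Gaussian noise on the input while the label is kept. Several points are assumptions rather than details from the captions: the noise is added to the pixels, there is no clipping, and the clean example is not kept. The helper `noisy_copies` is hypothetical.

```python
import random


def noisy_copies(batch, noise_std=0.1, n_realisations=3, seed=None):
    """Replicate each labelled example with additive Gaussian noise.

    `batch` is a list of (x, y) pairs where x is a flat list of pixel values.
    Assumptions: noise is added to the input pixels, labels are unchanged,
    and no clipping is applied.
    """
    rng = random.Random(seed)
    augmented = []
    for x, y in batch:
        for _ in range(n_realisations):
            noisy_x = [xi + rng.gauss(0.0, noise_std) for xi in x]
            augmented.append((noisy_x, y))
    return augmented


labelled = [([0.0, 0.5, 1.0], 3), ([0.2, 0.2, 0.2], 7)]
augmented = noisy_copies(labelled, seed=0)
print(len(augmented))             # 6
print([y for _, y in augmented])  # [3, 3, 3, 7, 7, 7]
```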
The network parameters of the supervised classifier trained on and . The encoder is trained with and without batch normalization (BN) after the Conv2D layers. is trained adversarially.
| Encoder | |
|---|---|
| Size | Layer |
| 28 × 28 × 1 | Input |
| 14 × 14 × 32 | Conv2D, LeakyReLU |
| 7 × 7 × 64 | Conv2D, LeakyReLU |
| 4 × 4 × 128 | Conv2D, LeakyReLU |
| 2048 | Flatten |
| 1024 | FC |

| | |
|---|---|
| Size | Layer |
| 1024 | Input |
| 500 | FC, ReLU |
| 10 | FC, Softmax |

| | |
|---|---|
| Size | Layer |
| 1024 | Input |
| 500 | FC, ReLU |
| 500 | FC, ReLU |
| 1 | FC, Sigmoid |
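The discriminator in this table ends in a single sigmoid unit and is trained adversarially against the encoder's 1024-dimensional output. The sketch below shows the binary cross-entropy losses commonly used for such a setup (GAN/AAE style, with the non-saturating encoder loss). It assumes that samples from a prior play the "real" role and encoder outputs the "fake" role. The exact losses used in the paper are not stated in this table, and all function names are hypothetical.

```python
import math


def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one sigmoid output p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_on_prior, d_on_encoded):
    """Discriminator: prior samples are labelled 1, encoder outputs 0."""
    real = sum(bce(p, 1.0) for p in d_on_prior) / len(d_on_prior)
    fake = sum(bce(p, 0.0) for p in d_on_encoded) / len(d_on_encoded)
    return real + fake


def encoder_adversarial_loss(d_on_encoded):
    """Non-saturating encoder loss: push D(encoder output) towards 1."""
    return sum(bce(p, 1.0) for p in d_on_encoded) / len(d_on_encoded)


# A discriminator that cannot tell the two apart outputs 0.5 everywhere.
print(round(discriminator_loss([0.5, 0.5], [0.5, 0.5]), 4))  # 1.3863
print(round(encoder_adversarial_loss([0.5]), 4))             # 0.6931
```

At the equilibrium where the discriminator outputs 0.5 for both sources, its loss equals 2 ln 2.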
The performance (percentage error) of the deterministic classifier based on for the encoder with and without batch normalization as a function of the Lagrangian multiplier.
| Encoder Model | Lagrangian multiplier | Run 1 | Run 2 | Run 3 | Mean | std |
|---|---|---|---|---|---|---|
| without BN | 0 | 26.79 | 27.26 | 27.39 | 27.15 | 0.32 |
| | 0.005 | 28.05 | 25.95 | 30.72 | 28.24 | 2.39 |
| | 0.0005 | 26.67 | 27.69 | 28.46 | | 0.89 |
| | 1 | 33.42 | 33.05 | 34.81 | 33.76 | 0.92 |
| with BN | 0 | 30.37 | 29.32 | 29.82 | 29.83 | 0.52 |
| | 0.005 | 28.02 | 31.49 | 30.80 | | 1.84 |
| | 0.0005 | 34.54 | 31.92 | 29.82 | 32.09 | 2.36 |
| | 1 | 34.43 | 44.35 | 44.25 | 41.01 | 5.70 |
| without BN | 0 | 7.16 | 8.12 | 7.55 | 7.61 | 0.48 |
| | 0.005 | 7.02 | 6.34 | 6.59 | 6.65 | 0.34 |
| | 0.0005 | 6.73 | 6.34 | 6.82 | | 0.26 |
| | 1 | 9.49 | 9.93 | 10.56 | 9.99 | 0.54 |
| with BN | 0 | 7.39 | 7.83 | 7.92 | | 0.28 |
| | 0.005 | 7.94 | 7.15 | 8.53 | 7.88 | 0.69 |
| | 0.0005 | 8.00 | 9.62 | 9.51 | 9.05 | 0.91 |
| | 1 | 15.79 | 14.88 | 13.71 | 14.79 | 1.04 |
| without BN | 0 | 0.76 | 0.70 | 0.81 | | 0.06 |
| | 0.005 | 1.07 | 1.03 | 1.13 | 1.08 | 0.05 |
| | 0.0005 | 0.84 | 0.78 | 0.89 | 0.84 | 0.06 |
| | 1 | 4.78 | 7.24 | 4.71 | 5.58 | 1.44 |
| with BN | 0 | 0.68 | 0.68 | 0.69 | | 0.01 |
| | 0.005 | 0.90 | 0.81 | 1.12 | 0.94 | 0.16 |
| | 0.0005 | 0.87 | 0.80 | 0.89 | 0.85 | 0.05 |
| | 1 | 2.37 | 3.61 | 4.35 | 3.44 | 1.00 |
The performance (percentage error) of the stochastic classifier with noisy supervised data (noise std = 0.1, number of noise realisations = 3) based on for the encoder with and without batch normalization as a function of the Lagrangian multiplier.
| Encoder Model | Lagrangian multiplier | Run 1 | Run 2 | Run 3 | Mean | std |
|---|---|---|---|---|---|---|
| without BN | 0.005 | 28.13 | 25.16 | 29.9 | | 2.40 |
| | 0.0005 | 28.05 | 30.03 | 28.11 | 28.73 | 1.13 |
| | 1 | 32.33 | 34.09 | 33.73 | 33.38 | 0.93 |
| with BN | 0.005 | 32.25 | 33.47 | 26.01 | 30.58 | 4.00 |
| | 0.0005 | 33.37 | 36.15 | 35.65 | 35.06 | 1.48 |
| | 1 | 33.37 | 42.37 | 32.46 | 36.07 | 5.48 |
| without BN | 0.005 | 7.37 | 7.17 | 6.65 | 7.06 | 0.37 |
| | 0.0005 | 7.48 | 6.68 | 6.67 | | 0.46 |
| | 1 | 9.48 | 9.94 | 11.61 | 10.34 | 1.12 |
| with BN | 0.005 | 7.82 | 7.97 | 7.81 | 7.87 | 0.09 |
| | 0.0005 | 9.5 | 8.68 | 9.37 | 9.18 | 0.44 |
| | 1 | 12.99 | 10.52 | 9.98 | 11.16 | 1.60 |
| without BN | 0.005 | 1.19 | 1.09 | 1.06 | 1.11 | 0.07 |
| | 0.0005 | 0.79 | 0.88 | 0.82 | 0.83 | 0.05 |
| | 1 | 6.22 | 4.81 | 5 | 5.34 | 0.77 |
| with BN | 0.005 | 0.94 | 1.07 | 1.04 | 1.02 | 0.07 |
| | 0.0005 | 0.78 | 0.81 | 0.78 | | 0.02 |
| | 1 | 4.49 | 3.35 | 2.18 | 3.34 | 1.16 |
The network parameters of the semi-supervised classifier trained on , and . The encoder is trained with and without batch normalization (BN) after the Conv2D layers. and are trained adversarially.
| Encoder | |
|---|---|
| Size | Layer |
| 28 × 28 × 1 | Input |
| 14 × 14 × 32 | Conv2D, LeakyReLU |
| 7 × 7 × 64 | Conv2D, LeakyReLU |
| 4 × 4 × 128 | Conv2D, LeakyReLU |
| 2048 | Flatten |
| 1024 | FC |

| | |
|---|---|
| Size | Layer |
| 1024 | Input |
| 500 | FC, ReLU |
| 10 | FC, Softmax |

| | |
|---|---|
| Size | Layer |
| 10 | Input |
| 500 | FC, ReLU |
| 500 | FC, ReLU |
| 1 | FC, Sigmoid |

| | |
|---|---|
| Size | Layer |
| 1024 | Input |
| 500 | FC, ReLU |
| 500 | FC, ReLU |
| 1 | FC, Sigmoid |
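This configuration uses two sigmoid discriminators: one on the 10-dimensional classifier output and one on the 1024-dimensional latent vector. Each one compares encoder outputs against samples from a prior. The sketch below draws such samples under assumed hand-crafted choices: a uniform categorical prior (one-hot vectors) for the 10-way output and an isotropic standard Gaussian for the 1024-dimensional latent. Neither distribution is specified in this table, and the sampler names are hypothetical.

```python
import random


def sample_one_hot(n, num_classes=10, seed=None):
    """Draw n one-hot vectors from a uniform categorical distribution."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        v = [0.0] * num_classes
        v[rng.randrange(num_classes)] = 1.0
        samples.append(v)
    return samples


def sample_gaussian(n, dim=1024, seed=None):
    """Draw n vectors from an isotropic standard Gaussian N(0, I)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


cat = sample_one_hot(4, seed=0)
gauss = sample_gaussian(2, seed=0)
print([sum(v) for v in cat])      # [1.0, 1.0, 1.0, 1.0]
print(len(gauss), len(gauss[0]))  # 2 1024
```

These samples would be fed to the corresponding discriminator as "real" inputs, with the encoder's softmax output and latent vector as the "fake" inputs.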
The performance (percentage error) of the deterministic classifier based on for the encoder with and without batch normalization.
| Encoder Model | | | Run 1 | Run 2 | Run 3 | Mean | std |
|---|---|---|---|---|---|---|---|
| without BN | 0.005 | 0.005 | 21.39 | 18.12 | 18.34 | 19.28 | 1.83 |
| | 0.0005 | 0.0005 | 15.33 | 22.36 | 13.80 | 17.16 | 4.56 |
| | 0.005 | 0.0005 | 25.66 | 26.25 | 28.81 | 26.91 | 1.67 |
| | 0.0005 | 0.005 | 9.82 | 13.44 | 13.06 | | 1.99 |
| with BN | 0.005 | 0.005 | 23.45 | 21.19 | 28.87 | 24.50 | 3.94 |
| | 0.0005 | 0.0005 | 28.57 | 19.06 | 26.37 | 24.67 | 4.98 |
| | 0.005 | 0.0005 | 26.18 | 26.18 | 25.49 | 25.95 | 0.40 |
| | 0.0005 | 0.005 | 8.96 | 13.82 | 14.76 | 12.52 | 3.11 |
| without BN | 0.005 | 0.005 | 3.91 | 4.21 | 3.70 | 3.94 | 0.26 |
| | 0.0005 | 0.0005 | 3.54 | 3.72 | 3.54 | 3.60 | 0.10 |
| | 0.005 | 0.0005 | 6.19 | 5.80 | 7.31 | 6.43 | 0.78 |
| | 0.0005 | 0.005 | 2.80 | 2.82 | 2.83 | 2.82 | 0.02 |
| with BN | 0.005 | 0.005 | 3.30 | 2.94 | 2.93 | 3.06 | 0.21 |
| | 0.0005 | 0.0005 | 2.80 | 2.53 | 2.50 | 2.61 | 0.17 |
| | 0.005 | 0.0005 | 3.51 | 3.75 | 4.12 | 3.79 | 0.31 |
| | 0.0005 | 0.005 | 2.58 | 2.27 | 2.24 | | 0.19 |
| without BN | 0.005 | 0.005 | 1.04 | 1.07 | 1.07 | 1.06 | 0.02 |
| | 0.0005 | 0.0005 | 0.86 | 0.90 | 0.88 | 0.88 | 0.02 |
| | 0.005 | 0.0005 | 1.08 | 0.92 | 1.09 | 1.03 | 0.10 |
| | 0.0005 | 0.005 | 0.85 | 0.93 | 0.93 | 0.90 | 0.05 |
| with BN | 0.005 | 0.005 | 1.10 | 1.01 | 0.93 | 1.01 | 0.09 |
| | 0.0005 | 0.0005 | 0.84 | 0.88 | 0.83 | 0.85 | 0.03 |
| | 0.005 | 0.0005 | 1.10 | 1.12 | 0.93 | 1.05 | 0.10 |
| | 0.0005 | 0.005 | 0.76 | 0.82 | 0.79 | | 0.03 |
The performance (percentage error) of the stochastic classifier with noisy supervised data (noise std = 0.1, number of noise realisations = 3) based on for the encoder with and without batch normalization.
| Encoder Model | | | Run 1 | Run 2 | Run 3 | Mean | std |
|---|---|---|---|---|---|---|---|
| without BN | 0.005 | 0.005 | 12.4 | 18.05 | 16.73 | 15.73 | 2.96 |
| | 0.0005 | 0.0005 | 15.01 | 11.16 | 14.74 | 13.64 | 2.15 |
| | 0.005 | 0.0005 | 23.31 | 26.61 | 25.41 | 25.11 | 1.67 |
| | 0.0005 | 0.005 | 9.21 | 9.02 | 10.12 | | 0.59 |
| with BN | 0.005 | 0.005 | 13.55 | 22.48 | 14.72 | 16.92 | 4.85 |
| | 0.0005 | 0.0005 | 8.37 | 15.01 | 26.92 | 16.77 | 9.40 |
| | 0.005 | 0.0005 | 32.12 | 30.27 | 31.44 | 31.28 | 0.94 |
| | 0.0005 | 0.005 | 5.46 | 17 | 11.54 | 11.33 | 5.77 |
| without BN | 0.005 | 0.005 | 3.9 | 4.25 | 4.02 | 4.06 | 0.18 |
| | 0.0005 | 0.0005 | 3.64 | 3.82 | 4.11 | 3.86 | 0.24 |
| | 0.005 | 0.0005 | 6.68 | 5.34 | 6.36 | 6.13 | 0.70 |
| | 0.0005 | 0.005 | 3.03 | 2.88 | 2.66 | 2.86 | 0.19 |
| with BN | 0.005 | 0.005 | 2.96 | 3.37 | 2.98 | 3.10 | 0.23 |
| | 0.0005 | 0.0005 | 2.87 | 3.10 | 2.73 | 2.90 | 0.19 |
| | 0.005 | 0.0005 | 3.72 | 3.8 | 4.14 | 3.89 | 0.22 |
| | 0.0005 | 0.005 | 2.57 | 2.39 | 2.28 | | 0.15 |
| without BN | 0.005 | 0.005 | 1.05 | 1.09 | 1.1 | 1.08 | 0.33 |
| | 0.0005 | 0.0005 | 0.94 | 0.96 | 0.9 | 0.93 | 0.03 |
| | 0.005 | 0.0005 | 1.16 | 1.14 | 1.13 | 1.14 | 0.02 |
| | 0.0005 | 0.005 | 0.88 | 0.92 | 0.91 | 0.90 | 0.02 |
| with BN | 0.005 | 0.005 | 0.98 | 0.84 | 0.94 | 0.92 | 0.07 |
| | 0.0005 | 0.0005 | 0.79 | 0.96 | 0.82 | 0.86 | 0.09 |
| | 0.005 | 0.0005 | 1.04 | 1.05 | 1.03 | 1.04 | 0.01 |
| | 0.0005 | 0.005 | 0.74 | 0.78 | 0.84 | | 0.05 |
The encoder and decoder of the semi-supervised classifier trained on , and . The encoder is trained with and without batch normalization (BN) after the Conv2D layers. and are trained adversarially.
| Encoder | |
|---|---|
| Size | Layer |
| 28 × 28 × 1 * | Input |
| 14 × 14 × 32 | Conv2D, LeakyReLU |
| 7 × 7 × 64 | Conv2D, LeakyReLU |
| 4 × 4 × 128 | Conv2D, LeakyReLU |
| 2048 | Flatten |
| 1024 | FC, ReLU |
| 10 / 10 | FC, Softmax / FC |

| | |
|---|---|
| Size | Layer |
| 10 + 10 | Input |
| 7 × 7 × 128 | FC, Reshape, BN, ReLU |
| 14 × 14 × 128 | Conv2DTrans, BN, ReLU |
| 28 × 28 × 128 | Conv2DTrans, BN, ReLU |
| 28 × 28 × 64 | Conv2DTrans, BN, ReLU |
| 28 × 28 × 1 | Conv2DTrans, Sigmoid |

| | |
|---|---|
| Size | Layer |
| 10 | Input |
| 500 | FC, ReLU |
| 500 | FC, ReLU |
| 1 | FC, Sigmoid |

| | |
|---|---|
| Size | Layer |
| 10 | Input |
| 500 | FC, ReLU |
| 500 | FC, ReLU |
| 1 | FC, Sigmoid |
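The decoder input of size "10 + 10" reads as the concatenation of the two 10-dimensional encoder heads (softmax and linear). The decoder then upsamples from 7 × 7 to 28 × 28. The sketch below checks the listed sizes, assuming "same" padding, for which a transposed convolution multiplies the spatial size by its stride. The strides (2, 2, 1, 1) and the concatenation order are assumptions chosen to match the table, and `decoder_shapes` is a hypothetical helper.

```python
def decoder_shapes(start=(7, 7, 128), layers=((2, 128), (2, 128), (1, 64), (1, 1))):
    """Shapes after each Conv2DTranspose, assuming 'same' padding.

    With 'same' padding a transposed convolution multiplies the spatial
    size by its stride. `layers` lists (stride, output_channels) pairs;
    the strides are assumptions chosen to match the sizes in the table.
    """
    h, w, _ = start
    shapes = [start]
    for stride, channels in layers:
        h, w = h * stride, w * stride
        shapes.append((h, w, channels))
    return shapes


class_probs = [0.1] * 10              # output of the 10-way softmax head
latent = [0.0] * 10                   # output of the 10-dim linear head
decoder_input = class_probs + latent  # "10 + 10" read as concatenation (assumed)
print(len(decoder_input))  # 20
print(7 * 7 * 128)         # 6272 units in the FC layer before Reshape
print(decoder_shapes())
# [(7, 7, 128), (14, 14, 128), (28, 28, 128), (28, 28, 64), (28, 28, 1)]
```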
The performance (percentage error) of the deterministic classifier based on for the encoder with and without batch normalization.
| Encoder Model | Run 1 | Run 2 | Run 3 | Mean | std |
|---|---|---|---|---|---|
| without BN | 2.15 | 2.05 | 1.78 | 1.99 | 0.19 |
| with BN | 1.57 | 1.56 | 1.92 | | 0.21 |
| without BN | 1.55 | 1.47 | 1.53 | 1.52 | 0.04 |
| with BN | 1.37 | 1.34 | 1.73 | | 0.22 |
| without BN | 0.78 | 0.7 | 0.82 | 0.77 | 0.06 |
| with BN | 0.79 | 0.77 | 0.76 | | 0.02 |
The performance (percentage error) of the stochastic classifier with noisy supervised data (noise std = 0.1, number of noise realisations = 3) based on for the encoder with and without batch normalization.
| Encoder Model | Run 1 | Run 2 | Run 3 | Mean | std |
|---|---|---|---|---|---|
| without BN | 1.55 | 3.19 | 2.11 | 2.28 | 0.83 |
| with BN | 1.4 | 1.33 | 1.72 | | 0.21 |
| without BN | 1.73 | 1.53 | 1.6 | 1.62 | 0.10 |
| with BN | 1.28 | 1.43 | 1.2 | | 0.12 |
| without BN | 0.94 | 0.86 | 0.86 | 0.89 | 0.05 |
| with BN | 0.77 | 0.65 | 0.84 | | 0.10 |
The network parameters of the semi-supervised classifier trained on , and . The encoder is trained with and without batch normalization (BN) after the Conv2D layers. and are trained adversarially.
| Encoder | |
|---|---|
| Size | Layer |
| 28 × 28 × 1 | Input |
| 14 × 14 × 32 | Conv2D, LeakyReLU |
| 7 × 7 × 64 | Conv2D, LeakyReLU |
| 4 × 4 × 128 | Conv2D, LeakyReLU |
| 2048 | Flatten |
| 1024 | FC, ReLU |
| 10 / 10 | FC, Softmax / FC |

| | |
|---|---|
| Size | Layer |
| 10 | Input |
| 500 | FC, ReLU |
| 500 | FC, ReLU |
| 1 | FC, Sigmoid |

| | |
|---|---|
| Size | Layer |
| 10 | Input |
| 500 | FC, ReLU |
| 500 | FC, ReLU |
| 1 | FC, Sigmoid |

| | |
|---|---|
| Size | Layer |
| 10 + 10 | Input |
| 7 × 7 × 128 | FC, Reshape, BN, ReLU |
| 14 × 14 × 128 | Conv2DTrans, BN, ReLU |
| 28 × 28 × 128 | Conv2DTrans, BN, ReLU |
| 28 × 28 × 64 | Conv2DTrans, BN, ReLU |
| 28 × 28 × 1 | Conv2DTrans, Sigmoid |

| | |
|---|---|
| Size | Layer |
| 28 × 28 × 1 | Input |
| 14 × 14 × 64 | Conv2D, LeakyReLU |
| 7 × 7 × 64 | Conv2D, LeakyReLU |
| 4 × 4 × 128 | Conv2D, LeakyReLU |
| 4 × 4 × 256 | Conv2D, LeakyReLU |
| 4096 | Flatten |
| 1 | FC, Sigmoid |
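The decoder in this configuration ends in a sigmoid over a 28 × 28 × 1 image, so each output pixel lies in (0, 1). A common reconstruction term for such outputs is the per-pixel binary cross-entropy, sketched below on flattened pixel lists. This choice is an assumption: the tables do not state the reconstruction loss, and an L2 loss would also be compatible with a sigmoid output. The function name is hypothetical.

```python
import math


def bce_reconstruction(x, x_hat, eps=1e-7):
    """Summed per-pixel binary cross-entropy.

    x: target pixels in [0, 1]; x_hat: sigmoid outputs of the decoder.
    Assumption: this choice of reconstruction loss is illustrative only.
    """
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total


print(round(bce_reconstruction([1.0, 0.0], [0.5, 0.5]), 4))  # 1.3863
print(bce_reconstruction([1.0, 0.0], [0.9, 0.1]) < bce_reconstruction([1.0, 0.0], [0.5, 0.5]))  # True
```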
The performance (percentage error) of the deterministic classifier based on for the encoder with and without batch normalization.
| Encoder Model | | Run 1 | Run 2 | Run 3 | Mean | std |
|---|---|---|---|---|---|---|
| without BN | 0.005 | 2.85 | 3.36 | 2.77 | 2.99 | 0.32 |
| | 0.0005 | 2.58 | 2.49 | 3.08 | 2.72 | 0.32 |
| | 1 | 19.62 | 19.96 | 15.97 | 18.52 | 2.21 |
| with BN | 0.005 | 1.56 | 1.33 | 1.35 | | 0.13 |
| | 0.0005 | 1.68 | 1.66 | 2.02 | 1.79 | 0.20 |
| | 1 | 20.85 | 13.6 | 21.67 | 18.71 | 4.44 |
| without BN | 0.005 | 2.29 | 2.35 | 2.11 | 2.25 | 0.12 |
| | 0.0005 | 1.69 | 1.88 | 2.24 | 1.94 | 0.28 |
| | 1 | 3.47 | 3.30 | 4.12 | 3.63 | 0.43 |
| with BN | 0.005 | 1.18 | 1.21 | 1.09 | | 0.06 |
| | 0.0005 | 1.44 | 1.28 | 1.29 | 1.34 | 0.09 |
| | 1 | 4.14 | 2.94 | 2.48 | 3.19 | 0.86 |
| without BN | 0.005 | 0.97 | 1.01 | 1.04 | 1.01 | 0.04 |
| | 0.0005 | 0.88 | 0.85 | 0.93 | 0.89 | 0.04 |
| | 1 | 1.31 | 1.28 | 1.47 | 1.35 | 0.10 |
| with BN | 0.005 | 0.81 | 0.83 | 0.75 | 0.80 | 0.04 |
| | 0.0005 | 0.73 | 0.78 | 0.75 | | 0.03 |
| | 1 | 0.88 | 0.86 | 1.27 | 1.00 | 0.23 |
The performance (percentage error) of the stochastic classifier with noisy supervised data (noise std = 0.1, number of noise realisations = 3) based on for the encoder with and without batch normalization.
| Encoder Model | | Run 1 | Run 2 | Run 3 | Mean | std |
|---|---|---|---|---|---|---|
| without BN | 0.005 | 2.45 | 3.04 | 2.67 | 2.72 | 0.30 |
| | 0.0005 | 2.63 | 2.3 | 2.45 | 2.46 | 0.17 |
| with BN | 0.005 | 1.34 | 1.21 | 6.4 | 2.98 | 2.96 |
| | 0.0005 | 1.35 | 1.51 | 1.93 | | 0.30 |
| without BN | 0.005 | 2.31 | 2.26 | 2.2 | 2.26 | 0.06 |
| | 0.0005 | 1.71 | 2.16 | 1.86 | 1.91 | 0.23 |
| with BN | 0.005 | 1.23 | 1.31 | 1.10 | | 0.11 |
| | 0.0005 | 1.42 | 1.62 | 1.37 | 1.47 | 0.13 |
| without BN | 0.005 | 0.93 | 1.01 | 1.05 | 1.00 | 0.06 |
| | 0.0005 | 0.92 | 0.83 | 0.88 | 0.88 | 0.05 |
| with BN | 0.005 | 0.88 | 0.86 | 0.91 | 0.88 | 0.03 |
| | 0.0005 | 0.77 | 0.80 | 0.80 | | 0.02 |