Fadi Al Machot, Mohib Ullah, Habib Ullah.
Abstract
Zero-Shot Learning (ZSL) concerns training machine learning models that can classify or predict classes (labels) not present in the training set (unseen classes). A well-known problem in Deep Learning (DL) is the requirement for large amounts of training data; ZSL is a straightforward approach to overcoming this problem. We propose a Hybrid Feature Model (HFM) based on conditional autoencoders, which trains a classical machine learning model on pseudo training data generated by two conditional autoencoders (given the semantic space as a condition): (a) the first autoencoder is trained on the visual space concatenated with the semantic space, and (b) the second autoencoder is trained on the visual space alone. The decoders of both autoencoders are then fed the test data of the unseen classes to generate pseudo training data. To classify the unseen classes, the pseudo training data from both decoders are combined to train a support vector machine. Tests on four different benchmark datasets show that the proposed method achieves promising results compared with the current state of the art in both the standard Zero-Shot Learning (ZSL) and the Generalized Zero-Shot Learning (GZSL) settings.
Keywords: Zero-Shot Learning (ZSL); computer vision; conditional autoencoders; generative models; semantic space
Year: 2022 PMID: 35735970 PMCID: PMC9225515 DOI: 10.3390/jimaging8060171
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
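The pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the frozen weight matrices stand in for the two trained conditional decoders, the attribute vectors are random placeholders for the dataset's semantic descriptions, and all dimensions are invented for the example. The point is the data flow: both decoders generate class-conditioned pseudo features, which are pooled to fit a classical SVM.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
latent_dim, attr_dim, feat_dim = 8, 4, 16
n_unseen, per_class = 3, 50

# Class-level semantic attribute vectors for the unseen classes
# (hypothetical stand-ins for the dataset's attribute annotations).
attrs = rng.standard_normal((n_unseen, attr_dim))

# Frozen weights standing in for the two trained conditional decoders.
W1z, W1a = rng.standard_normal((latent_dim, feat_dim)), rng.standard_normal((attr_dim, feat_dim))
W2z, W2a = rng.standard_normal((latent_dim, feat_dim)), rng.standard_normal((attr_dim, feat_dim))

def decode(Wz, Wa, z, a):
    """One conditional decoder: (latent sample, attribute condition) -> visual feature."""
    return np.tanh(z @ Wz + a @ Wa)

X, y = [], []
for c in range(n_unseen):
    z = rng.standard_normal((per_class, latent_dim))
    a = np.tile(attrs[c], (per_class, 1))
    # Pseudo training data from both decoders is pooled, as in the HFM scheme.
    X.append(decode(W1z, W1a, z, a))
    X.append(decode(W2z, W2a, z, a))
    y.extend([c] * (2 * per_class))

X, y = np.vstack(X), np.array(y)
clf = SVC(kernel="rbf").fit(X, y)  # classical classifier on the pseudo data
</imports>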
Figure 1. The proposed approach consists of two autoencoders. The first autoencoder receives the concatenation of the visual and semantic feature vectors; the second receives the visual feature vectors only. Each autoencoder has a dense layer, followed by dropout and a second dense layer, and then a final layer that generates the latent values z. Hidden activations are ReLU; the last layer of both the encoder and the decoder uses a linear activation.
Figure 2. Examples from the Animals with Attributes (AWA) dataset.
Figure 3. Examples from the Caltech-UCSD Birds (CUB) dataset.
Figure 4. Examples from the SUN Attribute (SUN) dataset.
State-of-the-art comparison on four datasets using the per-class average under the ZSL setting.
| Model | CUB | AwA1 | AwA2 | SUN |
|---|---|---|---|---|
| DAP | 40.0 | 44.1 | 46.1 | 39.9 |
| IAP | 24.0 | 35.9 | 35.9 | 19.4 |
| ConSE | 34.3 | 45.6 | 44.5 | 38.8 |
| CMT | 34.6 | 39.5 | 37.9 | 39.9 |
| SSE | 43.9 | 60.1 | 61.0 | 51.5 |
| DeViSE | 52.0 | 54.2 | 59.7 | 56.5 |
| SJE | 53.9 | 65.6 | 61.9 | 53.7 |
| LATEM | 49.3 | 55.1 | 55.8 | 55.3 |
| ESZSL | 53.9 | 58.2 | 58.6 | 54.5 |
| ALE | 54.9 | 59.9 | 62.5 | 58.1 |
| SYNC | 55.6 | 54.0 | 46.6 | 56.3 |
| SAE | 33.3 | 53.0 | 54.1 | 40.3 |
| Relation Net | 55.6 | 68.2 | 64.2 | — |
| DEM | 51.7 | 68.4 | 67.1 | 61.9 |
| f-VAEGAN-D2 | 61.0 | — | 71.1 | 64.7 |
| TF-VAEGAN | 64.9 | — | 72.2 | 66.0 |
| CVAE | 52.1 | 71.4 | 65.8 | 61.7 |
| HFM (Ours) | 69.5 | 65.0 | 65.5 | 53.8 |
Figure 5. Visualization of the image feature space of the AwA1 dataset (each color denotes a label): (a) t-SNE visualization of the real test data and (b) of the test data generated by the proposed approach.
Results under the Generalized Zero-Shot Learning (GZSL) setting. We use the harmonic mean of the accuracies on seen and unseen classes as the measure.
| Model | CUB | AwA1 | AwA2 | SUN |
|---|---|---|---|---|
| DAP | 3.3 | 0.0 | 0.0 | 7.2 |
| IAP | 0.4 | 4.1 | 1.8 | 1.8 |
| ConSE | 3.1 | 0.8 | 1.0 | 11.6 |
| CMT | 8.7 | 15.3 | 15.9 | 13.3 |
| SSE | 14.4 | 12.9 | 14.8 | 4.0 |
| DeViSE | 32.8 | 22.4 | 27.8 | 20.9 |
| SJE | 33.6 | 19.6 | 14.4 | 19.8 |
| LATEM | 24.0 | 13.3 | 20.0 | 19.5 |
| ESZSL | 21.0 | 12.1 | 11.0 | 15.8 |
| ALE | 34.4 | 27.5 | 23.9 | 26.3 |
| SYNC | 19.8 | 16.2 | 18.0 | 13.4 |
| SAE | 13.6 | 3.5 | 2.2 | 11.8 |
| Relation Net | 47.0 | 46.7 | 45.3 | — |
| DEM | 29.2 | 47.3 | 45.1 | 25.6 |
| f-VAEGAN-D2 | 53.6 | — | 63.5 | 41.3 |
| TF-VAEGAN | 58.1 | — | 66.6 | 43.0 |
| CVAE | 34.5 | 47.2 | 51.2 | 26.7 |
| HFM (Ours) | 43.4 | 61.6 | 63.4 | 29.7 |
Results for each autoencoder individually, and for both combined, on four datasets under the ZSL setting. Performance is evaluated using the per-class average.
| Dataset | Autoencoder 1 | Autoencoder 2 | Both |
|---|---|---|---|
| AWA1 | 63.6 | 60.0 | 65.0 |
| AWA2 | 58.6 | 58.4 | 65.5 |
| CUB | 68.5 | 58.9 | 69.5 |
| SUN | 50.6 | 51.4 | 53.8 |
Results under the Generalized Zero-Shot Learning (GZSL) setting: per-class average accuracy on seen classes, per-class average accuracy on unseen classes, and their harmonic mean.
| Dataset | Seen | Unseen | Harmonic Mean |
|---|---|---|---|
| AWA1 | 75.7 | 52.0 | 61.6 |
| AWA2 | 80.9 | 49.7 | 63.4 |
| CUB | 57.9 | 34.7 | 43.4 |
| SUN | 75.3 | 18.5 | 29.7 |
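The harmonic mean used throughout the GZSL tables is H = 2·S·U/(S + U), where S and U are the per-class accuracies on seen and unseen classes; it penalizes models that buy seen-class accuracy at the cost of unseen-class accuracy. A minimal helper, checked against the CUB and SUN rows above (individual rows may differ slightly from a recomputation because the reported S and U values are themselves rounded):

```python
def harmonic_mean(seen_acc, unseen_acc):
    """GZSL harmonic mean H = 2*S*U/(S+U) of seen and unseen per-class accuracy."""
    return 2.0 * seen_acc * unseen_acc / (seen_acc + unseen_acc)

# CUB row: S = 57.9, U = 34.7 -> H ~= 43.4
h_cub = harmonic_mean(57.9, 34.7)
# SUN row: S = 75.3, U = 18.5 -> H ~= 29.7 (large seen/unseen gap pulls H down)
h_sun = harmonic_mean(75.3, 18.5)
```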