Jingyi Liu, Caijuan Shi, Dongjing Tu, Ze Shi, Yazhi Liu.
Abstract
Supervised deep learning models have achieved great success in image classification when trained on large numbers of labeled samples. In practice, however, many categories have only a few labeled training samples, and some have none at all. Zero-shot learning greatly reduces the dependence of image classification models on labeled training samples. Nevertheless, existing approaches have two limitations: they measure the similarity between visual features and semantic features with a predefined fixed metric (e.g., Euclidean distance), and they suffer from the semantic gap in the mapping process. To address these problems, this paper proposes a new zero-shot image classification method based on an end-to-end learnable deep metric (ZIC-LDM). First, common space embedding is adopted to map visual features and semantic features into a common space. Second, an end-to-end learnable deep metric, namely a relation network, is used to learn the similarity between visual and semantic features. Finally, unseen images are classified according to their similarity scores. Extensive experiments on four datasets (AwA1, AwA2, CUB, and SUN) indicate the effectiveness of the proposed method.
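The three steps described in the abstract (embed both modalities into a common space, score each image/class pair with a relation network, classify by the highest score) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-layer projections, the two-layer relation MLP, the 2-D dimensions, the hand-set weights, and the class names `class_a`/`class_b` are all hypothetical. The relation weights are chosen by hand so that the toy module scores a pair higher when its two embeddings are close; in ZIC-LDM such weights would instead be learned end to end.

```python
import math

def linear(x, weight, bias):
    # Dense layer for a single vector: weight @ x + bias.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def embed(x, weight, bias):
    # Step 1 (assumed form): project a visual or semantic vector into the common space.
    return relu(linear(x, weight, bias))

def relation_score(v_emb, s_emb, params):
    # Step 2: the relation module scores the concatenated (visual, semantic) pair
    # with a small MLP; the sigmoid keeps the score in (0, 1). No fixed distance is used.
    hidden = relu(linear(v_emb + s_emb, params["w1"], params["b1"]))
    return sigmoid(linear(hidden, params["w2"], params["b2"])[0])

def classify(visual, class_semantics, vis_proj, sem_proj, rel_params):
    # Step 3: score the image against every unseen class and keep the best one.
    v = embed(visual, *vis_proj)
    scores = {name: relation_score(v, embed(s, *sem_proj), rel_params)
              for name, s in class_semantics.items()}
    return max(scores, key=scores.get), scores

# Toy usage: hypothetical 2-D features and identity projections.
identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
rel_params = {
    # Hidden units compute |v - s| per dimension as a pair of ReLUs ...
    "w1": [[1, 0, -1, 0], [-1, 0, 1, 0], [0, 1, 0, -1], [0, -1, 0, 1]],
    "b1": [0.0, 0.0, 0.0, 0.0],
    # ... and the output unit maps a small total difference to a high score.
    "w2": [[-2.0, -2.0, -2.0, -2.0]],
    "b2": [1.0],
}
unseen = {"class_a": [1.0, 0.0], "class_b": [0.0, 1.0]}
label, scores = classify([0.9, 0.1], unseen, identity, identity, rel_params)
print(label)  # class_a
```

Because the comparison is itself a parameterized network, training can reshape what "similar" means, which is the property the abstract contrasts with a predefined metric such as Euclidean distance.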
Keywords: common space embedding; deep learning; deep metric; image classification; relation network; zero-shot learning
Year: 2021 PMID: 34067100 PMCID: PMC8124744 DOI: 10.3390/s21093241
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The framework of ZIC-LDM.
Accuracy of models for zero-shot learning (%).
| Model | AwA1 | AwA2 | CUB | SUN |
|---|---|---|---|---|
| DAP | 44.1 | 46.1 | 40.0 | 39.9 |
| ConSE | 45.6 | 44.5 | 34.3 | 38.8 |
| ESZSL | 58.2 | 58.6 | 53.9 | 54.5 |
| ALE | 59.9 | 62.5 | 54.9 | 58.1 |
| SynC | 54.0 | 46.6 | 55.6 | 56.3 |
| SAE | 53.0 | 54.1 | 33.3 | 40.3 |
| CCSS | 56.3 | 63.7 | 44.1 | 56.8 |
| Gaussian | 60.5 | 61.2 | 52.1 | 58.7 |
| SELAR | - | 66.7 | 56.4 | 57.8 |
| RN | 68.2 | 64.2 | 55.6 | - |
| SJE | 65.6 | 61.9 | 53.9 | 53.7 |
| ZIC-LDM | | | | |
Figure 2. Confusion matrices of ZIC-LDM on the AwA1 and AwA2 datasets: (a) AwA1; (b) AwA2.
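A confusion matrix like those in Figure 2 also yields accuracy figures of the kind reported in the tables. The source does not state its evaluation protocol; the sketch below assumes the per-class averaged top-1 accuracy customary in zero-shot benchmarks, which averages the diagonal of the row-normalized matrix so that every class counts equally. The labels and predictions are toy values.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    # cm[t][p] counts test images of true class t predicted as class p.
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def row_normalize(cm):
    # Each row becomes the fraction of that class's images assigned to each prediction.
    out = []
    for row in cm:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out

def mean_per_class_accuracy(cm):
    # Average the diagonal so each class contributes equally, however many images it has.
    norm = row_normalize(cm)
    return sum(norm[i][i] for i in range(len(norm))) / len(norm)

def overall_accuracy(cm):
    # Plain accuracy: correct predictions over all test images.
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)

y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred, 2)
print(cm)                           # [[6, 0], [1, 1]]
print(overall_accuracy(cm))         # 0.875
print(mean_per_class_accuracy(cm))  # 0.75
```

Plain accuracy is 7/8 = 0.875 here, but the under-represented second class, only half of which is recognized, pulls the per-class average down to 0.75.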
Figure 3. t-SNE visualization of the distribution of images from the 10 unseen classes in the common embedding space: (a) AwA1; (b) AwA2.
Accuracy of models for generalized zero-shot learning (%). For each dataset, U and S are the accuracies on unseen and seen classes, and H is their harmonic mean.
| Model | AwA1 U | AwA1 S | AwA1 H | AwA2 U | AwA2 S | AwA2 H | CUB U | CUB S | CUB H | SUN U | SUN S | SUN H |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DAP | 0.0 | 88.7 | 0.0 | 0.0 | 84.7 | 0.0 | 1.7 | 67.9 | 3.3 | 4.2 | 25.1 | 7.2 |
| SynC | 8.9 | 87.3 | 16.2 | 10.0 | 90.5 | 18.0 | 11.5 | | 19.8 | 7.9 | | 13.4 |
| ESZSL | 6.6 | 75.6 | 12.1 | 5.9 | 77.8 | 11.0 | 12.6 | 63.8 | 21.0 | 11.0 | 27.9 | 15.8 |
| ALE | 16.8 | 76.1 | 27.5 | 14.0 | 81.8 | 23.9 | 23.7 | 62.8 | 34.4 | 21.8 | 33.1 | 26.3 |
| SAE | 1.8 | 77.1 | 3.5 | 1.1 | 82.2 | 2.2 | 7.8 | 54.0 | 13.6 | 8.8 | 18.0 | 11.8 |
| ConSE | 0.4 | 88.6 | 0.8 | 0.5 | 90.6 | 1.0 | 1.6 | 72.2 | 3.1 | 6.8 | 39.9 | 11.6 |
| Gaussian | 6.1 | 81.3 | 11.4 | 7.3 | 79.1 | 13.3 | 17.5 | 59.9 | 27.1 | 18.2 | 33.2 | 23.5 |
| MLSE | - | - | - | 23.8 | 83.2 | 37.0 | 22.3 | 71.6 | 34.0 | 20.7 | 36.4 | 26.4 |
| MIIR | - | - | - | 17.6 | 87.0 | 28.9 | 30.4 | 65.8 | 41.2 | 22.0 | 34.1 | 26.7 |
| SELAR | - | - | - | 31.6 | 80.3 | 45.3 | 32.1 | 63.0 | 42.5 | 22.8 | 31.6 | 26.5 |
| RN | 31.4 | | 46.7 | 30.0 | | 45.3 | 38.1 | 61.1 | 47.0 | - | - | - |
| SJE | 11.3 | 74.6 | 19.6 | 8.0 | 73.9 | 14.4 | 23.5 | 59.2 | 33.6 | 14.7 | 30.5 | 19.8 |
| ZIC-LDM | | 90.5 | | | 92.5 | | | 62.9 | | | 33.9 | |
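In the generalized zero-shot learning table, each dataset is reported with three numbers, and the third matches the harmonic mean H = 2US/(U + S) of the first two, where U and S are the accuracies on unseen and seen classes. For example, ALE on AwA1 has U = 16.8 and S = 76.1, giving H = 27.5. The short function below reproduces that value; it is a standard formula, shown here only as a check against the table.

```python
def harmonic_mean(u, s):
    """H = 2*U*S / (U + S); defined as 0 when both accuracies are 0."""
    if u + s == 0:
        return 0.0
    return 2 * u * s / (u + s)

# ALE on AwA1 (U = 16.8, S = 76.1), taken from the table
print(round(harmonic_mean(16.8, 76.1), 1))  # 27.5
# DAP on AwA1: U = 0.0 forces H to 0 however high S is
print(harmonic_mean(0.0, 88.7))  # 0.0
```

Because H collapses whenever U is near zero, methods such as DAP and ConSE score poorly on H despite seen-class accuracies close to 90%.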
Figure 4. Loss convergence curves on the AwA1 and AwA2 datasets: (a) AwA1; (b) AwA2.
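This excerpt does not state which loss produced these curves. For reference, the original relation network (Sung et al., 2018) regresses each relation score toward 1 for the matching class and 0 for every other class using mean squared error. The sketch below computes that objective for a toy batch, assuming ZIC-LDM trains its relation module the same way; the function name and the scores are hypothetical.

```python
def relation_mse_loss(scores, labels):
    # scores[i][c]: relation score between training image i and class c, in (0, 1).
    # labels[i]: index of the true class of image i.
    total, count = 0.0, 0
    for row, y in zip(scores, labels):
        for c, r in enumerate(row):
            target = 1.0 if c == y else 0.0  # matched pair -> 1, mismatched pair -> 0
            total += (r - target) ** 2
            count += 1
    return total / count

batch_scores = [[0.9, 0.2], [0.3, 0.6]]
batch_labels = [0, 1]
print(round(relation_mse_loss(batch_scores, batch_labels), 4))  # 0.075
```

A converging curve in this setting means the relation scores are approaching 1 on matched pairs and 0 on mismatched ones; a perfect batch gives a loss of exactly 0.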
Distance Metric Study (%). For each dataset, ZSL is the zero-shot learning accuracy; U and S are the generalized zero-shot accuracies on unseen and seen classes, and H is their harmonic mean.
| Model | AwA1 ZSL | AwA1 U | AwA1 S | AwA1 H | AwA2 ZSL | AwA2 U | AwA2 S | AwA2 H | CUB ZSL | CUB U | CUB S | CUB H |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ED | 55.2 | 5.4 | 68.3 | 10.0 | 55.8 | 5.7 | 69.5 | 10.5 | 42.7 | 8.2 | 53.1 | 14.2 |
| CS | 55.4 | 5.9 | 68.6 | 10.9 | 55.7 | 5.1 | 70.2 | 9.5 | 42.9 | 8.5 | 53.5 | 14.7 |
| MML | 56.7 | 6.3 | 70.4 | 11.6 | 56.7 | 6.1 | 73.7 | 11.3 | 16.8 | 10.5 | 54.1 | 17.6 |
| ZIC-LDM | | | | | | | | | | | | |
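The ED and CS rows compare fixed metrics with the learned one. Assuming ED and CS stand for Euclidean distance and cosine similarity (the abbreviations are not expanded in this excerpt, and MML is not sketched here), a fixed-metric baseline assigns an embedded image to the nearest class prototype with no learnable comparison at all. The toy prototypes below are hypothetical; they show that the two fixed metrics can disagree on the same input.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_class(v, prototypes, metric):
    # Fixed-metric baseline: the comparison has no trainable parameters.
    if metric == "ED":
        return min(prototypes, key=lambda c: euclidean_distance(v, prototypes[c]))
    if metric == "CS":
        return max(prototypes, key=lambda c: cosine_similarity(v, prototypes[c]))
    raise ValueError(f"unknown metric: {metric}")

protos = {"a": [1.0, 1.0], "b": [4.0, 2.0]}
v = [4.0, 4.0]
print(nearest_class(v, protos, "ED"))  # b  (closer in absolute position)
print(nearest_class(v, protos, "CS"))  # a  (same direction, different length)
```

Each fixed metric encodes one notion of similarity chosen in advance; a learnable metric such as a relation network can instead fit whichever notion best separates the classes in the training data.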