Zhikui Chen, Xu Zhang, Wei Huang, Jing Gao, Suhua Zhang.
Abstract
Deep transfer learning aims to address new tasks for which training samples are insufficient. In few-shot learning scenarios, however, the handful of known training samples has low diversity, so the learned representation is prone to being dominated by sample specificity, yielding one-sided local features instead of reliable global features of the categories the samples belong to. To alleviate this difficulty, we propose a cross-modal few-shot contextual transfer method that uses contextual information as a supplement and learns context-aware transfer in few-shot image classification, exploiting the information in heterogeneous data. The similarity measure in the image classification task is reformulated by fusing textual semantic modal information with visual semantic modal information extracted from images. The fused information serves as a supplement and helps to suppress sample specificity. In addition, to better extract local visual features and reorganize the recognition pattern, a deep transfer scheme reuses a powerful feature extractor from a pre-trained model. Simulation experiments show that introducing cross-modal and intra-modal contextual information effectively suppresses the deviation that arises when category features are defined from few samples, and improves the accuracy of few-shot image classification.
Keywords: context awareness; cross modal information; deep transfer learning; few-shot learning; image classification
Year: 2021 PMID: 34108871 PMCID: PMC8180855 DOI: 10.3389/fnbot.2021.654519
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 2.650
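The abstract describes reformulating the similarity measure by fusing textual and visual semantic information. The sketch below is only an illustration of that idea, not the authors' model: it assumes both modalities are already embedded as fixed-length vectors, uses cosine similarity for each modality, and blends the two with a hypothetical weight `alpha`. The function names, the weighted-sum fusion rule, and the nearest-support classification step are all assumptions introduced here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so that dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def fused_similarity(query_visual, support_visual, query_text, support_text, alpha=0.5):
    """Blend visual and textual cosine similarities (hypothetical fusion rule).

    alpha is an assumed mixing weight: 1.0 uses only visual features,
    0.0 uses only textual context. Returns an (n_query, n_support) matrix.
    """
    visual = l2_normalize(query_visual) @ l2_normalize(support_visual).T
    textual = l2_normalize(query_text) @ l2_normalize(support_text).T
    return alpha * visual + (1.0 - alpha) * textual


def classify(similarity, support_labels):
    """Assign each query the label of its most similar support sample."""
    return np.asarray(support_labels)[np.argmax(similarity, axis=1)]


if __name__ == "__main__":
    # Toy 2-way 1-shot episode with made-up 2-D embeddings.
    support_visual = np.array([[1.0, 0.0], [0.0, 1.0]])
    support_text = np.array([[1.0, 0.0], [0.0, 1.0]])
    query_visual = np.array([[0.6, 0.8]])  # visually closer to class 1
    query_text = np.array([[1.0, 0.0]])    # textual context points to class 0

    visual_only = fused_similarity(query_visual, support_visual,
                                   query_text, support_text, alpha=1.0)
    fused = fused_similarity(query_visual, support_visual,
                             query_text, support_text, alpha=0.5)
    print("visual only:", classify(visual_only, [0, 1]))
    print("fused:      ", classify(fused, [0, 1]))
```

In this toy episode the visual features alone favor the wrong support sample, while the fused score follows the textual context. That mirrors, in a very reduced form, the abstract's claim that contextual information can offset sample specificity. How the published method actually weights or combines the modalities is not stated in this record.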
Figure 1. Image samples of class “butterfly” in Caltech-101. (A) Images with contextual information in the backgrounds. (B) Images with blank backgrounds. (C) Images with negative-feedback information in the backgrounds.
Figure 2. Architecture of the proposed model.
Figure 7. Performance comparisons with different similarity metrics.
Figure 3. Architecture for jointly using contextual information.
Performance comparisons of few-shot image classification on miniImageNet.
| Method | Venue (year) | 1-shot accuracy (%) | 5-shot accuracy (%) |
| Matching network (Vinyals et al., | NIPS (2017) | 43.56 ± 0.84 | 55.31 ± 0.73 |
| Meta-Learning | ICLR (2017) | 43.44 ± 0.77 | 60.60 ± 0.71 |
| Model-agnostic | ICML (2017) | 48.7 ± 1.84 | 63.11 ± 0.92 |
| Delta-encoder (Schwartz et al., | NIPS (2018) | 59.9 ± 0 | 69.7 ± 0 |
| Rapid adaptation | ICML (2018) | 56.88 ± 0.62 | 71.94 ± 0.57 |
| DTN (Chen et al., | AAAI (2020) | 57.89 ± 0.84 | 73.28 ± 0.65 |
| STA Net (Yan et al., | AAAI (2019) | 58.35 ± 0.57 | 71.07 ± 0.39 |
| TPN (Liu et al., | ICLR (2019) | 59.46 ± 0 | 75.65 ± 0 |
| LEO (Rusu et al., | ICLR (2019) | 61.76 ± 0.08 | 77.59 ± 0.12 |
| Contextual transfer | / | 62.27 ± 0.76 | 77.81 ± 0.98 |
Performance comparisons of few-shot image classification on CUB and Caltech-101, including results with the textual modality added.
| Method | CUB 1-shot | CUB 5-shot | Caltech-101 1-shot | Caltech-101 5-shot |
| Matching network (Vinyals et al., | 49.3 | 59.3 | 37.6 | 51.3 |
| Meta-learning LSTM (Ravi and Larochelle, | 40.4 | 49.7 | 43.2 | 57.2 |
| Model-agnostic meta learning (Finn et al., | 38.4 | 59.1 | 35.6 | 52.3 |
| Delta-encoder (Schwartz et al., | 69.8 | 82.6 | 66.0 | 80.7 |
| Deep DTN (Chen et al., | 72.0 | 85.1 | 69.6 | 83.3 |
| Contextual transfer (Proposed method) | 72.2 | 85.7 | 70.1 | 84.2 |
| Contextual transfer (Textual added) | 75.1 | 87.8 | 76.1 | 86.3 |
Figure 4. Sample similarity in few-shot scenarios.
Figure 5. Performance comparisons with accidental similarity added.
Figure 6. Performance comparisons with and without deep transfer learning (DTL).