| Literature DB >> 33286397 |
Fang Dong, Li Liu, Fanzhang Li.
Abstract
Deep learning has achieved many successes in different fields but can encounter overfitting when labeled samples are insufficient. To address the problem of learning with limited training data, meta-learning has been proposed: it retains common knowledge by leveraging a large number of similar few-shot tasks and learns how to adapt a base-learner to a new task for which only a few labeled samples are available. Current meta-learning approaches typically use shallow neural networks (SNNs) to avoid overfitting, and thus waste much information when adapting to a new task. Moreover, the Euclidean gradient descent used in existing meta-learning approaches often leads to inaccurate updates of the meta-learner, which makes it difficult for meta-learning models to extract features from samples and update network parameters. In this paper, we propose a novel meta-learning model called Multi-Stage Meta-Learning (MSML) to remove this bottleneck in the adaptation process. The proposed method constrains the network to the Stiefel manifold so that the meta-learner can perform more stable gradient descent in a limited number of steps, thereby accelerating adaptation. Experiments on mini-ImageNet demonstrate that the proposed method achieves better accuracy under both 5-way 1-shot and 5-way 5-shot conditions.
Keywords: convolutional neural network; deep learning; lie group; machine learning; meta-learning
Year: 2020 PMID: 33286397 PMCID: PMC7517158 DOI: 10.3390/e22060625
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
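The abstract's key idea, constraining network parameters to the Stiefel manifold so that gradient descent stays stable during adaptation, can be illustrated with a minimal NumPy sketch of one Riemannian gradient step. This is a generic Stiefel-manifold update (tangent-space projection followed by a QR retraction), not the paper's exact MSML procedure; the function names and learning rate are hypothetical.

```python
import numpy as np

def stiefel_project(X, G):
    """Project a Euclidean gradient G onto the tangent space of the
    Stiefel manifold at X (where X.T @ X = I), under the embedded metric."""
    XtG = X.T @ G
    sym = (XtG + XtG.T) / 2.0
    return G - X @ sym

def qr_retract(X):
    """Map a point back onto the Stiefel manifold via QR decomposition,
    with column signs fixed so the factorization is unique."""
    Q, R = np.linalg.qr(X)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # flip columns so diag(R) >= 0

def stiefel_sgd_step(X, G, lr=0.1):
    """One Riemannian gradient-descent step on the Stiefel manifold."""
    return qr_retract(X - lr * stiefel_project(X, G))
```

After each step the iterate remains numerically orthonormal, i.e., `X.T @ X` stays close to the identity, which is what keeps the parameter updates stable compared with an unconstrained Euclidean step.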
Figure 1. Overview of a task used in the meta-train phase, comprising a support set and a query set; the number of samples in each set is as described above.
Figure 2. Diagram of our method: a full inner-loop process that optimizes the network in the meta-learner. The meta-learner makes predictions on the support set, updates the network parameters by comparing the predictions with the ground-truth labels, and then generates outputs on the query set as a measure of the model's learning performance.
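A minimal sketch of the inner-loop process this caption describes, written in the style of a MAML inner loop rather than the paper's exact code; it assumes PyTorch ≥ 2.0 (for `torch.func.functional_call`), and `model`, `support_x`/`support_y`, `query_x`/`query_y`, `inner_lr`, and `steps` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def inner_loop(model, support_x, support_y, query_x, query_y,
               inner_lr=0.01, steps=5):
    """Adapt a model on the support set, then score it on the query set."""
    params = {name: p.clone() for name, p in model.named_parameters()}
    for _ in range(steps):
        # Predict on the support set with the current adapted parameters.
        logits = torch.func.functional_call(model, params, (support_x,))
        loss = F.cross_entropy(logits, support_y)
        # Keep the graph so an outer loop could backprop through adaptation.
        grads = torch.autograd.grad(loss, list(params.values()),
                                    create_graph=True)
        params = {name: p - inner_lr * g
                  for (name, p), g in zip(params.items(), grads)}
    # The query loss measures how well the adapted parameters generalize.
    query_logits = torch.func.functional_call(model, params, (query_x,))
    return F.cross_entropy(query_logits, query_y)
```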
Figure 3. (a) Test accuracy on the mini-ImageNet dataset during the pretraining phase over 100 epochs. (b) Accuracy when transferring the pretrained feature extractor to classify the test subset; the result shows that the model reaches high accuracy when only the classification layer is trained.
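The transfer setup in panel (b), reusing the pretrained feature extractor and training only the classification layer, corresponds to a standard linear-probe configuration. A minimal PyTorch sketch, with `feature_extractor`, `feat_dim`, and `n_classes` as hypothetical names:

```python
import torch.nn as nn

def build_linear_probe(feature_extractor, feat_dim, n_classes):
    """Freeze a pretrained feature extractor and train only a new
    classification layer on top of it."""
    for p in feature_extractor.parameters():
        p.requires_grad = False          # keep pretrained features fixed
    classifier = nn.Linear(feat_dim, n_classes)  # the only trainable part
    return nn.Sequential(feature_extractor, classifier)
```

With the extractor frozen, only the final linear layer receives gradient updates during training, so high accuracy here indicates the pretrained features already transfer well.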
Figure 4. (a,b) Accuracy of 5-way 1-shot learning on the mini-ImageNet dataset under different settings; "no sti" denotes using the Euclidean rather than the Stiefel gradient, and "no stage" denotes disabling the multi-stage optimization method. (c,d) Results under the 5-way 5-shot setting.
Image classification accuracy on mini-ImageNet with different methods.
| Methods | 1-Shot Accuracy | 5-Shot Accuracy | Memory Cost (MB), 1-Shot | Memory Cost (MB), 5-Shot |
|---|---|---|---|---|
|  |  |  | 4062 | 6443 |
|  |  |  | 8655 | 9984 |
|  |  |  | 4165 | 6521 |
|  |  |  | 8767 | 10,157 |
Image classification results on mini-ImageNet under the 5-way 1-shot and 5-way 5-shot settings. Results in this table are as reported in the original papers.
| Few-Shot Learning Method | 1-Shot | 5-Shot |
|---|---|---|
| Matching Nets [ ] |  |  |
| Relation Nets [ ] |  |  |
| Prototypical Nets [ ] |  |  |
| MAML (4 Conv) [ ] |  |  |
| Reptile (4 Conv) [ ] |  |  |
| meta-SGD (4 Conv) [ ] |  |  |
| LEO (WRN-28-10) [ ] |  |  |
| MTL (ResNet-12) [ ] |  |  |
| TAML (4 Conv) [ ] |  |  |
| TADAM [ ] |  |  |
| SNAIL [ ] |  |  |