Yassir Bendou, Yuqing Hu, Raphael Lafargue, Giulia Lioi, Bastien Pasdeloup, Stéphane Pateux, Vincent Gripon.
Abstract
Few-shot classification aims to leverage knowledge learned by a deep learning model in order to obtain good classification performance on new problems where only a few labeled samples per class are available. Recent years have seen a fair number of works in the field, each introducing its own methodology. A frequent problem, though, is the use of suboptimally trained models as a first building block, which raises doubts about whether the proposed approaches still bring gains when applied to more sophisticated pretrained models. In this work, we propose a simple way to train such models, with the aim of reaching top performance on multiple standardized benchmarks in the field. This methodology offers a new baseline on which to propose (and fairly compare) new techniques or adapt existing ones.
Keywords: ambiguity; augmentations; backbones; classification; cropping; deep learning; ensembling; few-shot learning; self-supervision
Year: 2022 PMID: 35877623 PMCID: PMC9324255 DOI: 10.3390/jimaging8070179
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. Illustration of our proposed method. Y: We first train multiple backbones using the base and validation datasets. We use two cross-entropy losses in parallel: one for the classification of base classes and the other for the self-supervised targets (rotations). We also use manifold mixup [18]. All the backbones are trained using the exact same routine, except that their initialization is different (random) and the order in which data batches are presented is also potentially different. AS: Then, for each image in the novel dataset and each backbone, we generate multiple crops, then compute their feature vectors, which we average. E: Each image becomes represented as the concatenation of the outputs of AS for each of the trained backbones. Preprocessing: We add a few classical preprocessing steps, including centering by removing the mean of the feature vectors of the base dataset in the inductive case, or the few-shot run feature vectors for the transductive case, and projecting on the hypersphere. Finally, we use a simple nearest class mean classifier (NCM) if in the inductive setting or a soft K-means algorithm in the transductive setting.
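The AS, E, preprocessing, and NCM steps described in the caption can be sketched in NumPy for the inductive setting. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`aggregate_features`, `preprocess`, `ncm_predict`), the toy "backbone" that returns synthetic crop features, and the all-zero `base_mean` standing in for the mean of the base-dataset features are all hypothetical.

```python
import numpy as np

def aggregate_features(crop_features):
    """crop_features: one array per backbone, shaped (n_images, n_crops, dim)."""
    # AS: average the feature vectors of the crops of each image.
    per_backbone = [f.mean(axis=1) for f in crop_features]
    # E: concatenate the averaged features of all backbones.
    return np.concatenate(per_backbone, axis=1)

def preprocess(features, reference_mean):
    # Center with a reference mean, then project onto the unit hypersphere.
    centered = features - reference_mean
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.maximum(norms, 1e-12)

def ncm_predict(support, support_labels, query):
    # Nearest class mean: assign each query to the closest class centroid.
    classes = np.unique(support_labels)
    means = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(query[:, None, :] - means[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Toy data (hypothetical): two well-separated classes, two "backbones".
rng = np.random.default_rng(0)

def toy_backbone(labels, n_crops=5, dim=4):
    centers = 3.0 * np.eye(dim)[:2]          # one center per class
    noise = 0.1 * rng.standard_normal((len(labels), n_crops, dim))
    return centers[labels][:, None, :] + noise

support_labels = np.array([0, 0, 1, 1])      # 2-way, 2-shot run
query_labels = np.array([0, 1, 1, 0])

support_raw = aggregate_features([toy_backbone(support_labels) for _ in range(2)])
query_raw = aggregate_features([toy_backbone(query_labels) for _ in range(2)])

# Assumption: zeros stand in for the mean of the base-dataset features.
base_mean = np.zeros(support_raw.shape[1])
support = preprocess(support_raw, base_mean)
query = preprocess(query_raw, base_mean)

predictions = ncm_predict(support, support_labels, query)
print(predictions)
```

In the transductive setting the caption instead centers with the features of the few-shot run itself and replaces the NCM step with soft K-means; that variant is not shown here.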
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on MiniImageNet in the inductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | SimpleShot [ | | |
| | Baseline++ [ | | |
| | TADAM [ | | |
| | ProtoNet [ | | |
| | R2-D2 (+ens) [ | | |
| | FEAT [ | | |
| | CNL [ | | |
| | MELR [ | | |
| | DeepEMD v2 [ | | |
| | PAL [ | | |
| | invariance-equivariance [ | | |
| | CSEI [ | | |
| | COSOC [ | | |
| | EASY 2×ResNet12 | | |
| 36 M | S2M2R [ | | |
| | LR + DC [ | | |
| | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on TieredImageNet in the inductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | SimpleShot [ | | |
| | ProtoNet [ | | |
| | FEAT [ | | |
| | PAL [ | | |
| | DeepEMD v2 [ | | |
| | MELR [ | | |
| | COSOC [ | | |
| | CNL [ | | |
| | invariance-equivariance [ | | |
| | CSEI [ | | |
| | ASY ResNet12 (ours) | | |
| 36 M | S2M2R [ | | |
| | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on CIFAR-FS in the inductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | S2M2R [ | | |
| | R2-D2 (+ens) [ | | |
| | invariance-equivariance [ | | |
| | EASY 2×ResNet12 | | |
| 36 M | S2M2R [ | | |
| | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on CUB-FS in the inductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | FEAT [ | | |
| | ProtoNet [ | | |
| | DeepEMD v2 [ | | |
| | EASY 4×ResNet12 | | |
| 36 M | S2M2R [ | | |
| | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on FC-100 in the inductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | DeepEMD v2 [ | | |
| | TADAM [ | | |
| | ProtoNet [ | | |
| | invariance-equivariance [ | | |
| | R2-D2 (+ens) [ | | |
| | EASY 2×ResNet12 | | |
| 36 M | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on MiniImageNet in the transductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | TIM-GD [ | | |
| | ODC [ | | |
| | PEM | | |
| | SSR [ | | |
| | iLPC [ | | |
| | EPNet [ | | |
| | DPGN [ | | |
| | ECKPN [ | | |
| | Rot + KD + POODLE [ | | |
| | EASY 2×ResNet12 | | |
| 36 M | SSR [ | | |
| | fine-tuning (train+val) [ | | |
| | SIB + E | | |
| | LR + DC [ | | |
| | EPNet [ | | |
| | TIM-GD [ | | |
| | PT+MAP [ | | |
| | iLPC [ | | |
| | ODC [ | | |
| | PEM | | |
| | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on CUB-FS in the transductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | TIM-GD [ | | |
| | ODC [ | | |
| | DPGN [ | | |
| | ECKPN [ | | |
| | iLPC [ | | |
| | Rot + KD + POODLE [ | | |
| | EASY 4×ResNet12 | | |
| 36 M | LR + DC [ | | |
| | PT+MAP [ | | |
| | iLPC [ | | |
| | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on FC-100 in the transductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | TADAM [ | | |
| | EASY 2×ResNet12 | | |
| 36 M | SIB + E | | |
| | fine-tuning (train) [ | | |
| | ODC [ | | |
| | fine-tuning (train+val) [ | | |
| | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on CIFAR-FS in the transductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | SSR [ | | |
| | iLPC [ | | |
| | DPGN [ | | |
| | ECKPN [ | | |
| | EASY 2×ResNet12 | | |
| 36 M | SSR [ | | |
| | fine-tuning (train+val) [ | | |
| | iLPC [ | | |
| | PT+MAP [ | | |
| | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on TieredImageNet in the transductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | PT+MAP [ | | |
| | TIM-GD [ | | |
| | ODC [ | | |
| | SSR [ | | |
| | Rot + KD + POODLE [ | | |
| | DPGN [ | | |
| | EPNet [ | | |
| | ECKPN [ | | |
| | iLPC [ | | |
| | ASY ResNet12 (ours) | | |
| 36 M | SIB + E | | |
| | SSR [ | | |
| | fine-tuning (train+val) [ | | |
| | TIM-GD [ | | |
| | LR + DC [ | | |
| | EPNet [ | | |
| | ODC [ | | |
| | iLPC [ | | |
| | PEM | | |
| | EASY 3×ResNet12 (ours) | | |
Ablation study of the steps of the proposed solution in the inductive setting, for a fixed number of trainable parameters in the considered backbones. When using ensembles, we use 2×ResNet12 instead of a single ResNet12.
| Dataset | E | AS | 1-Shot | 5-Shot |
|---|---|---|---|---|
| MiniImageNet | | | | |
| | √ | | | |
| | | √ | | |
| | √ | √ | | |
| CUB-FS | | | | |
| | √ | | | |
| | | √ | | |
| | √ | √ | | |
| CIFAR-FS | | | | |
| | √ | | | |
| | | √ | | |
| | √ | √ | | |
| FC-100 | | | | |
| | √ | | | |
| | | √ | | |
| | √ | √ | | |
| TieredImageNet | | | | |
| | √ | | | |
| | | √ | | |
| | √ | √ | | |
Ablation study of the steps of the proposed solution in the transductive setting, for a fixed number of trainable parameters in the considered backbones. When using ensembles, we use 2×ResNet12 instead of a single ResNet12.
| Dataset | E | AS | 1-Shot | 5-Shot |
|---|---|---|---|---|
| MiniImageNet | | | | |
| | √ | | | |
| | | √ | | |
| | √ | √ | | |
| CUB-FS | | | | |
| | √ | | | |
| | | √ | | |
| | √ | √ | | |
| CIFAR-FS | | | | |
| | √ | | | |
| | | √ | | |
| | √ | √ | | |
| FC-100 | | | | |
| | √ | | | |
| | | √ | | |
| | √ | √ | | |
| TieredImageNet | | | | |
| | √ | | | |
| | | √ | | |
| | √ | √ | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on MiniImageNet in the imbalanced transductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | MAML [ | | |
| | LR+ICI [ | | |
| | PT+MAP [ | | |
| | LaplacianShot [ | | |
| | TIM [ | | |
| | | | |
| | ASY ResNet12 (ours) | | |
| 36 M | PT+MAP [ | | |
| | SIB [ | | |
| | LaplacianShot [ | | |
| | TIM [ | | |
| | | | |
| | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on TieredImageNet in the imbalanced transductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | Entropy-min [ | | |
| | PT+MAP [ | | |
| | LaplacianShot [ | | |
| | TIM [ | | |
| | LR+ICI [ | | |
| | | | |
| | ASY ResNet12 (ours) | | |
| 36 M | Entropy-min [ | | |
| | PT+MAP [ | | |
| | LaplacianShot [ | | |
| | TIM [ | | |
| | | | |
| | EASY 3×ResNet12 (ours) | | |
The 1-shot and 5-shot accuracy of state-of-the-art methods and the proposed solution on CUB-FS in the imbalanced transductive setting.
| Parameters | Method | 1-Shot | 5-Shot |
|---|---|---|---|
| ≤12 M | PT+MAP [ | | |
| | Entropy-min [ | | |
| | LaplacianShot [ | | |
| | TIM [ | | |
| | | | |
| | ASY ResNet12 (ours) | | |
| 36 M | EASY 3×ResNet12 (ours) | | |