Hao Hu, Mengya Gao, Mingsheng Wu.
Abstract
In real-world scenarios, data often follow a long-tailed distribution, and training deep neural networks on such imbalanced datasets is a major challenge. The main problem is that common classes dominate training, so the resulting model achieves very low accuracy on the rare classes. Recent work focuses on improving the network's representation ability to overcome the long-tailed problem but tends to ignore adapting the network classifier to the long-tailed case, which causes an "incompatibility" between the network representation and the network classifier. In this paper, we use knowledge distillation to address the long-tailed data distribution problem and to optimize the network representation and classifier simultaneously. We propose multiexperts knowledge distillation with class-balanced sampling to jointly learn a high-quality network representation and classifier. We also propose a channel activation-based knowledge distillation method to improve performance further. State-of-the-art performance on several large-scale long-tailed classification datasets demonstrates the superior generalization of our method.
Year: 2021 PMID: 34987568 PMCID: PMC8723848 DOI: 10.1155/2021/6702625
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Classifier weight norm for ResNet-10 trained on ImageNet-LT. The class indexes are sorted by descending values of class sample numbers.
Comparison of feature quality between class-balanced sampling (CBS) and instance-balanced sampling (IBS). ResNet-10 models are trained on ImageNet-LT (I-LT), and then classifiers are retrained with class-balanced sampling on Places-LT (P-LT).
| Representation strategy | ImageNet-LT | Classifier strategy | Places-LT |
|---|---|---|---|
| IBS | 35.7 | CBS | 25.2 |
| CBS | 36.5 | CBS | 22.1 |
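The two sampling strategies compared above can be illustrated with a minimal sketch. It assumes the usual definitions: under IBS a class is drawn in proportion to its sample count, and under CBS every class is drawn with equal probability. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def sampling_probs(labels, mode="instance"):
    """Probability that a drawn training sample belongs to each class.

    mode="instance": every sample is equally likely, so class j is drawn
    with probability n_j / N (instance-balanced sampling, IBS).
    mode="class": every class is equally likely, 1 / C
    (class-balanced sampling, CBS).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if mode == "instance":
        return {c: n / total for c, n in counts.items()}
    if mode == "class":
        return {c: 1.0 / len(counts) for c in counts}
    raise ValueError(f"unknown mode: {mode}")


def per_sample_weights(labels):
    """Per-sample weights that realize CBS: a sample of class j gets 1 / n_j,
    so every class carries the same total weight."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# Toy long-tailed label list (illustrative only).
labels = ["cat"] * 6 + ["dog"] * 3 + ["owl"] * 1
ibs = sampling_probs(labels, "instance")
cbs = sampling_probs(labels, "class")
weights = per_sample_weights(labels)
```

In a training framework, weights like these would typically be passed to a weighted random sampler; that wiring is omitted here.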
Figure 2. Framework overview of the proposed method. Here, the training dataset is split into three subsets and three experts are used as teachers. Each expert is responsible for transferring knowledge from its corresponding subset into a student model. Knowledge is transferred between feature maps, and only channels with high activation intensity, which we consider to contain more knowledge, are used for distillation. Details about filtering channels are introduced in Section 4.3.
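The channel filtering idea in this caption can be sketched as follows. The exact filtering rule is given in Section 4.3, which is not reproduced here, so this sketch makes its own assumptions: activation intensity is taken to be the mean absolute activation of a teacher channel, a fixed fraction of channels (`keep_ratio`, a hypothetical parameter) is kept, and the distillation loss is a mean squared error over the kept channels. All function names are illustrative.

```python
def channel_intensity(feature_map):
    """Mean absolute activation of each channel.

    feature_map: list of channels, each a flat list of spatial activations.
    """
    return [sum(abs(v) for v in ch) / len(ch) for ch in feature_map]


def top_channels(teacher_map, keep_ratio=0.5):
    """Indices of the teacher's most strongly activated channels.

    Assumption: a fixed fraction of channels is kept; the paper's actual
    rule may differ.
    """
    scores = channel_intensity(teacher_map)
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def channel_distill_loss(teacher_map, student_map, keep_ratio=0.5):
    """Mean squared error between teacher and student on the kept channels."""
    kept = top_channels(teacher_map, keep_ratio)
    sq_err, n = 0.0, 0
    for c in kept:
        for t, s in zip(teacher_map[c], student_map[c]):
            sq_err += (t - s) ** 2
            n += 1
    return sq_err / n


# Toy feature maps: 4 channels with 4 spatial positions each.
teacher = [[0.1, 0.0, 0.2, 0.1], [2.0, 1.5, 1.0, 2.5],
           [0.0, 0.0, 0.1, 0.0], [1.0, 1.0, 1.0, 1.0]]
student = [[0.9, 0.9, 0.9, 0.9], [2.0, 1.5, 1.0, 1.5],
           [0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0]]
kept = top_channels(teacher)
loss = channel_distill_loss(teacher, student)
```

Here the student's large errors on the weakly activated channels 0 and 2 do not contribute to the loss, which is the intended effect of restricting distillation to high-intensity channels.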
Figure 3. Visualization of features where each one is a vector averaged among one category on CIFAR-100. Each banner is taken from three different classes. Brighter color corresponds to a higher activation intensity.
Ablation of using different experts while applying the proposed method.
| Model | Many-shot | Medium-shot | Few-shot | Acc |
|---|---|---|---|---|
| ResNet-10 | >100 | ≤100 and >20 | ≤20 | |
| Plain model | 56.8 | 25.7 | 3.6 | 34.6 |
| Many-shot model | 57.9 | — | — | — |
| Medium-shot model | — | 32.5 | — | — |
| Low-shot model | — | — | 10.6 | — |
| OLTR | 43.2 | 35.1 | 18.5 | 35.6 |
| Ours with many-shot/OLTR/OLTR | 54.0 | 34.1 | 17.4 | 39.2 |
| Ours with many-shot/medium-shot/low-shot | 54.4 | 28.7 | 8.9 | 36.9 |
| Average over combination of designed experts | 53.7 | 32.9 | 14.5 | 37.2 |
| Average over randomly splitting strategy | 48.3 | 27.6 | 8.7 | 36.4 |
"Ours with A/B/C" means that A, B, and C are used as the expert models supervising the many-shot, medium-shot, and few-shot subsets, respectively. Experiments are performed on ImageNet-LT with ResNet-10.
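The many-shot, medium-shot, and few-shot subsets follow the thresholds in the table header (more than 100 samples, between 21 and 100, and at most 20). A minimal sketch of that split, with hypothetical function names and toy labels:

```python
from collections import Counter


def shot_group(n_samples, many=100, few=20):
    """Map a class's training-sample count to its shot group."""
    if n_samples > many:
        return "many"
    if n_samples > few:
        return "medium"
    return "few"


def split_classes(labels):
    """Group class ids by how many training samples each class has."""
    counts = Counter(labels)
    groups = {"many": [], "medium": [], "few": []}
    for cls, n in sorted(counts.items()):
        groups[shot_group(n)].append(cls)
    return groups


# Toy labels: class 0 has 150 samples, class 1 has 100, and so on.
labels = [0] * 150 + [1] * 100 + [2] * 21 + [3] * 20 + [4] * 3
groups = split_classes(labels)
```

Each expert would then be trained on, or asked to supervise, only the samples whose class falls in its group.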
Ablation of our approach using instance-balanced sampling (IBS) and class-balanced sampling (CBS) with ResNet-10 on ImageNet-LT.
| Shot | IBS | CBS |
|---|---|---|
| Many-shot model | | 54.0 |
| Medium-shot model | 32.9 | |
| Few-shot model | 16.8 | |
| Overall | 37.5 | |
Bold values are the highest results in each line.
Ablation of representation quality with our method. ResNet-10 is first trained on ImageNet-LT (I-LT). Classifiers are retrained on Places-LT (P-LT).
| Representation strategy | ImageNet-LT | Classifier strategy | Places-LT |
|---|---|---|---|
| IBS | 35.7 | CBS | 25.2 |
| CBS | 36.5 | CBS | 22.1 |
| Ours with IBS | 37.5 | CBS | 28.2 |
| Ours with CBS | 39.2 | CBS | 27.8 |
Ablation of knowledge distillation settings on ImageNet-LT.
| Distillation | — | √ | √ | √ | √ |
|---|---|---|---|---|---|
| Multiexperts | — | — | √ | — | √ |
| Channel activation | — | — | — | √ | √ |
| Acc | 35.7 | 37.1 | 38.6 | 37.8 | |
Long-tailed classification results on ImageNet-LT.
| Model | Many-shot | Medium-shot | Few-shot | Acc |
|---|---|---|---|---|
| ResNet-10 | >100 | ≤100 and >20 | ≤20 | |
| Plain model | 56.8 | 25.7 | 3.6 | 34.6 |
| Many-shot model | | — | — | — |
| Lifted loss† | 35.8 | 30.4 | 17.9 | 30.8 |
| Focal loss† | 36.4 | 29.9 | 16 | 30.5 |
| Range loss† | 35.8 | 30.3 | 17.6 | 30.7 |
| FsLwf† | 40.9 | 22.1 | 15 | 28.4 |
| OLTR | 43.2 | 35.1 | 18.5 | 35.6 |
| BALMS† | 50.3 | 39.5 | 25.3 | 41.8 |
| Decouple (cRT) | 52.3 | 39.5 | 23.2 | 42.1 |
| Decouple ( | 51.9 | 38.3 | 22.5 | 40.6 |
| Ours with OLTR | 54.0 | 34.1 | 17.4 | 39.2 |
| Ours with decouple | 54.9 | | | |
†Results directly copied from Ref. [10]. Results reproduced with author's code.
Long-tailed classification results on Places-LT, starting from an ImageNet pretrained ResNet-152.
| Model | Many-shot | Medium-shot | Few-shot | Acc |
|---|---|---|---|---|
| ResNet-152 | >100 | ≤100 and >20 | ≤20 | |
| Plain model | 45.5 | 27.8 | 8.5 | 30.2 |
| Many-shot model | 46.4 | — | — | — |
| Lifted loss† | 41.1 | 34.8 | 22.4 | 34.6 |
| Focal loss† | 41.1 | 35.4 | 24 | 35.2 |
| Range loss† | 41.1 | 35.4 | 23.2 | 35.1 |
| FsLwf† | 43.9 | 29.9 | 29.5 | 34.9 |
| OLTR | 42.2 | 38.1 | 17.8 | 35.3 |
| BALMS† | 41.2 | 39.8 | 31.6 | 38.7 |
| Decouple (cRT) | 41.6 | 39.4 | 29.2 | 38.1 |
| Decouple ( | 37.8 | 40.7 | 31.8 | 37.9 |
| Ours with OLTR | 43.8 | 37.8 | 17.5 | 37.5 |
| Ours with decouple | 41.5 | 40.9 | 32.2 | 38.7 |
†Results directly copied from Ref. [10]. Results reproduced with author's code.
Long-tailed classification results on iNaturalist-2018.
| Model | Many-shot | Medium-shot | Few-shot | Acc |
|---|---|---|---|---|
| ResNet-50 | >100 | ≤100 and >20 | <20 | |
| Plain model | 73.5 | 65.2 | 59.5 | 63.6 |
| Many-shot model | 74.6 | — | — | — |
| OLTR | 65.9 | 66.3 | 63.6 | 65.4 |
| Decouple (cRT) | 66.2 | 67.3 | 66.8 | 67.0 |
| Decouple ( | 65.4 | 66.8 | 66.9 | 67.2 |
| Ours with OLTR | 68.7 | 66.2 | 63.5 | 67.4 |
| Ours with decouple | 69.4 | 69.3 | 67.6 | 68.8 |
Precision and recall analysis on long-tailed datasets.
| Dataset | Precision-decouple | Precision-ours | Recall-decouple | Recall-ours |
|---|---|---|---|---|
| ImageNet-LT | 69.9 | 72.8 | 62.0 | 66.9 |
| Places-LT | 58.6 | 62.6 | 53.5 | 58.7 |
| iNaturalist | 73.9 | 75.1 | 71.7 | 74.8 |
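The record does not say how precision and recall are aggregated across classes. As an illustration only, the sketch below assumes macro averaging (each class counts equally, which is a common choice for long-tailed data); the function name and the toy predictions are hypothetical.

```python
def macro_precision_recall(y_true, y_pred):
    """Macro-averaged precision and recall over all observed classes.

    A class with no predictions gets precision 0, and a class with no
    true samples gets recall 0 (a convention assumed for this sketch).
    """
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        actual = sum(1 for t in y_true if t == c)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Toy example: class 0 is common, class 2 is rare and never predicted.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 0, 0]
precision, recall = macro_precision_recall(y_true, y_pred)
```

With macro averaging, the rare class that is never predicted pulls both scores down as much as a common class would, which is why the metric is sensitive to tail performance.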