Nagwa Elaraby, Sherif Barakat, Amira Rezk.
Abstract
Supervised learning from only a few training samples is called Few-Shot Learning (FSL). FSL exposes a gap in deep learning performance, because building robust deep networks requires large amounts of training data. Using transfer learning in FSL tasks is an accepted way to avoid building new deep models from scratch. Transfer learning borrows the architecture and parameters of a model previously trained on a large-scale dataset and fine-tunes it for a low-data target task. In practice, however, fine-tuning pretrained models on target FSL tasks suffers from overfitting: the few existing samples are not enough to adjust the pretrained model's parameters to fit the target task well. In this study, we consider mitigating the overfitting problem when applying transfer learning in few-shot Handwritten Character Recognition (HCR) tasks. A data augmentation approach based on Conditional Generative Adversarial Networks (CGAN) is introduced. CGAN is a generative model that can create artificial instances that appear realistic and are hard to distinguish from the original samples. CGAN generates extra samples that capture the possible variations of human handwriting, instead of relying on traditional image transformations, which are low-level, data-independent operations that produce augmented samples with limited diversity. The introduced approach was evaluated by fine-tuning three pretrained models: AlexNet, VGG-16, and GoogleNet. The results show that the samples generated by CGAN can enhance transfer learning performance in few-shot HCR tasks: model fine-tuning is achieved with fewer epochs, the model's [Formula: see text] increases, and the Generalization Error [Formula: see text] decreases.
Year: 2022 PMID: 36175593 PMCID: PMC9520122 DOI: 10.1038/s41598-022-20654-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Transfer learning methodology creates high-performing learners by extracting knowledge learned from previous tasks and applying it to new related low-data tasks.
A comparison of some recent studies concentrating on applying data augmentation to avoid the overfitting problem in low-data regimes.
| Reference | Augmentation Strategy | Classification Model | Task | Advantages | Limitations |
|---|---|---|---|---|---|
| Antoniou et al. | DAGAN | Standard SGDNN | Low-data regime tasks | The proposed DAGAN significantly improves classification accuracy in the human faces and handwriting domains | The developed GAN architecture needs further evaluation in FSL |
| Frid-Adar et al. | Traditional transformations and GAN | CNN model | Low-data medical recognition tasks | Using GAN with the traditional methods increases the performance of the CNN by 7% compared with using only traditional methods | Using GAN to generate artificial images for each class is time-consuming |
| Mondal et al. | FM GAN and Bad-GAN | CNN model | FSL in segmenting 3D multimodal medical images | FM GAN outperforms classical GAN and Bad-GAN | Further experiments are required for the FM-Bad GAN, as Bad-GAN is essential for good semi-supervised learning |
| Zhang et al. | Horizontal flipping, cropping, scaling, rotating, and contrast changing transformations | Different architectures of CNNs | FSL in ear recognition | The applied transformations create a flexible CNN that can adapt to new test data and perform fast recognition | The applied transformations are not tested in open-set ear recognition problems, which are highly challenging |
| Guan and Loew | GAN | VGG-16 model | Breast cancer detection systems when training examples are few | GAN avoids overfitting in the pretrained network, and transfer learning makes training approximately 10 times faster than training a CNN from scratch | Using GAN to generate artificial images for each class is time-consuming |
| Noon et al. | Rotation, width shift, height shift, zoom, horizontal flip, and vertical flip transformations | DenseNet-121 model | Plant leaf disease recognition with small datasets | The network generalizes best with the combination of width shift and height shift transformations | The combination of zoom and rotation transformations makes the network highly prone to overfitting |
| Joseph and George | Offline and real-time traditional transformations | CNN model | HCR with data scarcity | The real-time augmentation helps CNNs achieve better accuracy while using fewer resources than the offline augmentation | The transformations applied in each mode differ, which makes the comparison unfair |
| Zhang et al. | DADA | CNN model | Ill-posed extremely low-data regimes | Applying the 2K loss to GAN’s discriminator boosts the performance | The proposed DADA has yet to be applied to real-world tasks, such as military, satellite, and biomedical image classification |
| Jha and Cecotti | GAN | CNN model | Handwritten digit recognition tasks with a low number of labeled samples | The recommended augmentation approach yields a substantial gain in accuracy | The overall performance may decrease when too many artificial images are added to the original training examples |
| Ahmad et al. | Random rotation, random horizontal reflection, random vertical reflection, random horizontal shear, and random vertical shear transformations | MobileNet, ResNet50, and InceptionV3 models | Classifying novel COVID-19 when sufficient chest X-ray images are absent | The applied transformations help increase the performance significantly | The augmented X-ray images are still not highly accurate as benchmarks for identifying COVID-19 infections in patients |
| Fabian et al. | A combination of pixel and general affine preserving transformations | End-to-end VarNet model | Accelerated MRI reconstruction on small datasets | The proposed data augmentation pipeline improves the model robustness against various shifts in the test distribution | Finding the optimal augmentation strength throughout training remains a challenge |
| De la Rosa et al. | Scaling, rotation, translation, and flipping | ResNet-50 model | Small sample defect classification problems | The F1 score of the model increases every time the dataset volume is increased with augmented images | This study does not consider the Generalization Error in its performance evaluation |
| Yunusa et al. | SG2-ADA | Faster R-CNN and SSD models | Rice leaf diseases when large quality datasets are absent | The SG2-ADA produces better-quality artificial images and leads to good recognition | The experiments do not compare the performance of StyleGAN2 with the vanilla GAN architecture |
| Asghar et al. | Zoom, horizontal shift, vertical shift, and rotation transformations, and GAN | InceptionV3, Resnet101, DenseNet-121, Xception, and QuNet | Data scarcity problem in detecting COVID-19 cases | The highest detection accuracy is achieved by the Xception and QuNet models when applying the traditional transformations and by QuNet when using GAN | The experimental observations could not identify the best augmentation approach to detect the novel COVID-19 |
Figure 2. Difference between generative and discriminative modeling in dealing with input data.
Figure 3. The GAN architecture consists of two networks trained simultaneously, G and D. G generates synthetic samples and tries to make them plausible, while D tries to distinguish them from real samples.
Figure 4. Both G and D are conditioned with the class labels in CGAN.
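
One common way to condition both networks on the class label is to append a one-hot label vector to each network's input. The sketch below illustrates only that idea in plain Python; the helper names, the latent size of 100, and the concatenation scheme are assumptions for illustration and are not taken from the architecture shown in Figure 7.

```python
import random


def one_hot(label, num_classes):
    """Return a one-hot list encoding `label` among `num_classes` classes."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(noise, label, num_classes):
    """Condition G: append the one-hot class label to the noise vector."""
    return list(noise) + one_hot(label, num_classes)


def discriminator_input(pixels, label, num_classes):
    """Condition D: append the same one-hot label to the flattened image."""
    return list(pixels) + one_hot(label, num_classes)


rng = random.Random(0)
# Hypothetical latent size; 26 classes matches the Latin dataset.
noise = [rng.gauss(0.0, 1.0) for _ in range(100)]
g_in = generator_input(noise, label=3, num_classes=26)
d_in = discriminator_input([0.0] * (28 * 28), label=3, num_classes=26)
```

With this scheme, G receives the requested class together with the noise, and D judges each sample together with the label it is claimed to belong to.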
Figure 5. Different transfer learning models with their depth.
Figure 6. The proposed framework for enhancing the performance of transfer learning models in few-shot HCR tasks.
Figure 7. The total structure of the G and D networks included in the introduced CGAN architecture.
Details of the datasets chosen from the Omniglot package.
| Dataset | No. of classes | No. of samples/class |
|---|---|---|
| Latin | 26 | 20 |
| Malay (Jawi-Arabic) | 38 | 20 |
| Korean | 40 | 20 |
| Sanskrit (old Indo-Aryan) | 42 | 20 |
Figure 8. Total loss of the constructed CGAN. The G loss decreases when the D loss increases; the two networks play a mini-max two-player game.
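
The mini-max game named in the Figure 8 caption is usually written as the value function below. This is the standard conditional GAN formulation, given as a reference sketch. The symbols $x$ (real sample), $z$ (noise vector), $y$ (class label), $p_{\text{data}}$, and $p_z$ are the conventional ones and are assumptions here, and the exact loss used to train the constructed CGAN is not shown on this page.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x \mid y)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z \mid y))\bigr)\bigr]
```

D is trained to increase $V$ and G to decrease it, which is why the two loss curves move in opposite directions.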
Figure 9. For each dataset: (a) represents examples of its real samples, and (b) represents examples of its fake samples generated by the constructed CGAN.
The applied traditional transformations in each dataset for Case(B).
| Dataset | Applied Transformations |
|---|---|
| Latin | Random reflection with x-axis |
| Malay (Jawi-Arabic) | Random reflection with x-axis, horizontal and vertical translation by a distance in the range |
| Korean | Horizontal and vertical translation by a distance in the range |
| Sanskrit (old Indo-Aryan) | Random reflection with x-axis, and horizontal translation by a distance in the range |
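
The traditional transformations listed in the table can be sketched on a small grayscale image stored as a list of rows. This is a minimal illustration under two assumptions: the reflection is read as a left-right mirror (the caption could equally mean a flip about the horizontal axis), and the translation range, which is not given above, is replaced by a hypothetical `max_shift` parameter. The function names are illustrative only.

```python
import random


def reflect_left_right(image):
    """Mirror each row, reversing the image along the x-direction."""
    return [row[::-1] for row in image]


def translate(image, dx, dy, fill=0):
    """Shift the image dx pixels right and dy pixels down, padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out


def augment(image, rng, max_shift=2, p_reflect=0.5):
    """Apply a random reflection, then a random translation.

    `max_shift` and `p_reflect` are hypothetical values, not the paper's.
    """
    if rng.random() < p_reflect:
        image = reflect_left_right(image)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(image, dx, dy)


sample = [[0, 1, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]
augmented = augment(sample, random.Random(0), max_shift=1)
```

Both operations only rearrange existing pixels, which is why such transformations add limited diversity compared with samples drawn from a generative model.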
The recognition results of AlexNet, VGG-16, and GoogleNet models under the different training cases for each dataset.
| Dataset | Recognition Model | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Latin | AlexNet | 86.54 | 80.77 | 99.19 | 86.69 | 85 | 85.27 | 14.1 | 16.03 | 16.03 |
| Latin | VGG-16 | 82.69 | 84.62 | 98.59 | 87.24 | 83.66 | 85.50 | 14.74 | 18.59 | 16.03 |
| Latin | GoogleNet | 65.38 | 69.23 | 97.24 | 73.96 | 74.29 | 86.31 | 27.56 | 28.21 | 14.47 |
| Malay (Jawi-Arabic) | AlexNet | 80.26 | 89.47 | 92.36 | 80.18 | 78.55 | 80.12 | 21.93 | 24.12 | 21.49 |
| Malay (Jawi-Arabic) | VGG-16 | 72.37 | 76.32 | 92.61 | 76.18 | 79.08 | 84.85 | 27.63 | 23.25 | 16.76 |
| Malay (Jawi-Arabic) | GoogleNet | 43.42 | 47.37 | 86.89 | 39.46 | 38.4 | 72.52 | 60.53 | 64.91 | 30.26 |
| Korean | AlexNet | 85 | 90 | 96.80 | 89.83 | 88.05 | 85.80 | 11.76 | 12.92 | 15 |
| Korean | VGG-16 | 68.75 | 72.50 | 97.27 | 77.81 | 75.36 | 85.77 | 24.12 | 27.08 | 15.83 |
| Korean | GoogleNet | 60 | 47.50 | 96.82 | 51.34 | 55.02 | 88.45 | 50 | 48.75 | 12.50 |
| Sanskrit (old Indo-Aryan) | AlexNet | 70.24 | 78.57 | 93.12 | 72.47 | 65.28 | 72.42 | 31.35 | 38.10 | 28.97 |
| Sanskrit (old Indo-Aryan) | VGG-16 | 54.76 | 61.90 | 91.56 | 55.54 | 52.51 | 70.43 | 49.60 | 53.17 | 30.95 |
| Sanskrit (old Indo-Aryan) | GoogleNet | 29.76 | 30.95 | 87.73 | 29.24 | 27.37 | 74.45 | 74.60 | 75 | 27.78 |
Significant values are in [bold].
Figure 10. Visualization of the F1-Score recorded by each model under the different training cases for (a) Latin dataset, (b) Malay (Jawi-Arabic) dataset, (c) Korean dataset, and (d) Sanskrit (Indo-Aryan) dataset.
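
For a multi-class recognizer, a single F1-Score per model is typically obtained by averaging per-class F1 values. The sketch below computes a macro-averaged F1 in plain Python; the choice of macro averaging is an assumption, since the averaging scheme behind the plotted scores is not stated here, and `macro_f1` is an illustrative name.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of the per-class F1 scores.

    Assumption: macro averaging; the source does not state the scheme used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with two character classes.
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

In the toy example, class "a" has F1 = 2/3 and class "b" has F1 = 4/5, so the macro average is 11/15.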
Figure 11. Visualization for the recorded by each model under the different training cases for (a) Latin dataset, (b) Malay dataset, (c) Korean dataset, and (d) Sanskrit (Indo-Aryan) dataset.
Figure 12. Visualization for Val.Acc. achieved at each epoch in Case(A) and Case(C) for (a) Latin dataset, (b) Malay dataset, (c) Korean dataset, and (d) Sanskrit (Indo-Aryan) dataset.