Loris Nanni, Michelangelo Paci, Sheryl Brahnam, Alessandra Lumini.
Abstract
Convolutional neural networks (CNNs) have gained prominence in the research literature on image classification over the last decade. One shortcoming of CNNs, however, is their lack of generalizability and their tendency to overfit when presented with small training sets. Augmentation directly confronts this problem by generating new data points that provide additional information. In this paper, we investigate the performance of more than ten different sets of data augmentation methods, including two novel approaches proposed here: one based on the discrete wavelet transform and the other on the constant-Q Gabor transform. Pretrained ResNet50 networks are fine-tuned on the data produced by each augmentation method. Combinations of these networks are evaluated and compared across four benchmark data sets of images representing diverse problems and collected by instruments that capture information at different scales: a virus data set, a bark data set, a portrait data set, and a LIGO glitches data set. Experiments demonstrate the superiority of this approach. The best ensemble proposed in this work achieves state-of-the-art (or comparable) performance across all four data sets. This result shows that varying data augmentation is a feasible way to build an ensemble of classifiers for image classification.
Keywords: convolutional neural networks; data augmentation; deep learning; ensemble
Year: 2021 PMID: 34940721 PMCID: PMC8707550 DOI: 10.3390/jimaging7120254
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. Proposed approach. Transfer learning with multiple ResNet50s pretrained on ImageNet using different sets of data augmentation methods, with networks fused by sum rule.
Figure 2. Schematic of ResNet50.
Figure 3. An example of some traditional augmentation methods on the BARK data set. The left image is the original image.
Figure 4. An example image of App5 (DCT). The left image is the original image.
Figure 5. An example image of App10 (DWT). The left image is the original image.
Figure 6. An example image of App11 (DQT). The left image is the original image.
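The sum-rule fusion named in the Figure 1 caption can be illustrated with a minimal sketch. It assumes each fine-tuned network outputs one score per class (for example, softmax probabilities) for a given image. The function names `sum_rule_fuse` and `predict_class` and the example scores are hypothetical and are not taken from the paper's implementation.

```python
from typing import List, Sequence


def sum_rule_fuse(score_sets: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-network class scores by summing them class by class.

    score_sets[i][c] is the score that network i assigns to class c
    for a single image (assumed layout, for illustration only).
    """
    if not score_sets:
        raise ValueError("at least one network is required")
    n_classes = len(score_sets[0])
    if any(len(scores) != n_classes for scores in score_sets):
        raise ValueError("all networks must score the same number of classes")
    # zip(*score_sets) groups the scores of all networks for each class.
    return [sum(per_class) for per_class in zip(*score_sets)]


def predict_class(score_sets: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the largest fused score."""
    fused = sum_rule_fuse(score_sets)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative (made-up) scores from three networks over three classes.
net_a = [0.6, 0.3, 0.1]  # on its own, this network would pick class 0
net_b = [0.2, 0.5, 0.3]
net_c = [0.3, 0.4, 0.3]

fused_scores = sum_rule_fuse([net_a, net_b, net_c])
ensemble_label = predict_class([net_a, net_b, net_c])
print(fused_scores, ensemble_label)
```

In this made-up case the ensemble picks class 1 even though the first network alone prefers class 0. This is the kind of disagreement an ensemble built from differently augmented networks is meant to exploit.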
Number of artificial images created by each data augmentation method.
| Data Augmentation Method | Number of Generated Images |
|---|---|
| App1 | 3 |
| App2 | 6 |
| App3 | 4 |
| App4 | 3 |
| App5 | 3 |
| App6 | 3 |
| App7 | 7 |
| App8 | 2 |
| App9 | 6 |
| App10 | 3 |
| App11 | 3 |
Note: The number of generated images is per image in the training set. As an example, if a training set has 1000 images, then App1 would build an additional 3 × 1000 images. Thus, the final training set would be 1000 (the original number in the training set) plus the 3000 images generated by App1.
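The arithmetic in the note can be checked with a short sketch. The per-method counts are copied from the table above. The helper name `augmented_training_size` is hypothetical, and the sketch assumes every original image is kept and every method is applied to each training image, as the note describes.

```python
# Number of generated images per training image, copied from the table.
GENERATED_PER_IMAGE = {
    "App1": 3, "App2": 6, "App3": 4, "App4": 3, "App5": 3, "App6": 3,
    "App7": 7, "App8": 2, "App9": 6, "App10": 3, "App11": 3,
}


def augmented_training_size(n_original: int, method: str) -> int:
    """Original images plus the images generated by one augmentation method."""
    if n_original < 0:
        raise ValueError("n_original must be non-negative")
    return n_original + n_original * GENERATED_PER_IMAGE[method]


# The example from the note: 1000 originals + 3 * 1000 generated by App1.
size_app1 = augmented_training_size(1000, "App1")
size_app7 = augmented_training_size(1000, "App7")
print(size_app1, size_app7)  # 4000 8000
```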
Performance (accuracy) of the different configurations for data augmentation.
| DataAUG | VIR | BARK | GRAV | POR |
|---|---|---|---|---|
| NoDA | 85.53 | 87.48 | 97.66 | 86.29 |
| App1 | 87.00 | 89.60 | 97.83 | 87.05 |
| App2 | 86.87 | 90.17 | 98.08 | 85.97 |
| App3 | 87.80 | 89.45 | 97.99 | 87.05 |
| App4 | 86.33 | 87.91 | 97.74 | 84.90 |
| App5 | 86.00 | 87.61 | 97.83 | 86.41 |
| App6 | -- | 88.63 | 98.08 | 87.37 |
| App7 | -- | 89.28 | 97.99 | 88.13 |
| App8 | -- | 87.29 | 97.74 | 86.06 |
| App9 | 85.67 | 88.86 | 98.24 | 86.19 |
| App10 | 84.20 | 86.39 | 98.41 | 85.10 |
| App11 | 85.47 | 89.20 | 97.91 | 86.71 |
| [ | 82.93 | -- | -- | -- |
| [ | 83.07 | -- | -- | -- |
| EnsDA_all | 90.00 | 91.27 | 98.33 | 89.21 |
| EnsDA_5 | 89.60 | 91.01 | 98.08 | 88.56 |
| EnsBase | 89.73 | 90.67 | 98.16 | 87.58 |
| EnsBase_5 | 89.60 | 90.66 | 97.99 | 87.48 |
| State of the art | 89.60 | 90.40 | 98.21 | 80.09/90.08 * |
* As noted above, for fair comparison, 80.09 is the best performance using their deep learning approach, but 90.08 was obtained when combining handcrafted with deep learning features. Note: the virus data set has gray-level images; for this reason, the three data augmentation methods based on color (App6–8) perform poorly on VIR, so these methods are not reported for this data set. Additionally, because of their low performance on VIR, [29,33] are not tested on BARK, GRAV, and POR. Bold values highlight the best results.
Performance (accuracy) compared with the best in the literature on the VIR data set.
| EnsDA_all | [ | [ | [ | [ | [ | [ | [ | [ |
|---|---|---|---|---|---|---|---|---|
| 90.00 | 89.60 | 89.47 | 89.00 | 88.00 | 87.27 | 87.00 * | 86.20 | 85.70 |
Note: the method marked with * combines descriptors based on both object-scale and fixed-scale images (as noted in Section 3.3, the fixed-scale data set is not publicly available); yet, even with this advantage, our proposed system outperforms [14].
Comparison with the literature, BARK data set.
| EnsDA_all | [ | [ | [ | [ |
|---|---|---|---|---|
| 91.27 | 48.90 | 85.00 | 90.40 | 85.00 |