Junyoung Park1,2, Dong In Kim3,2, Byoungjo Choi1,2, Woochul Kang4,5, Hyung Wook Kwon6,7.
Abstract
Image-based automatic classification of vector mosquitoes has been investigated for decades for practical applications such as early detection of potential mosquito-borne diseases. However, the classification accuracy of previous approaches has never come close to that of human experts, and images of mosquitoes in particular postures and with particular body parts visible, such as flatly spread wings, are often required to achieve good classification performance. Deep convolutional neural networks (DCNNs) are the state-of-the-art approach to extracting visual features and classifying objects, and hence there is great interest in applying DCNNs to the classification of vector mosquitoes from easy-to-acquire images. In this study, we investigated the capability of state-of-the-art deep learning models to classify mosquito species that have high inter-species similarity and intra-species variation. Since no off-the-shelf dataset was available that captured the variability of typical field-caught mosquitoes, we constructed a dataset of about 3,600 images of 8 mosquito species in various postures and deformation conditions. To further address the data-scarcity problem, we investigated the feasibility of transferring general features learned on a generic dataset to mosquito classification. Our results demonstrate that classification accuracy above 97% can be achieved by fine-tuning general features when proper data augmentation techniques are applied together. Further, we analyzed how this high classification accuracy is achieved by visualizing the discriminative regions used by the deep learning models. Our results show that deep learning models exploit morphological features similar to those used by human experts.
Year: 2020 PMID: 31974419 PMCID: PMC6978392 DOI: 10.1038/s41598-020-57875-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of the mosquito dataset.
| Species | Vector Disease | Captured Location | # of images |
|---|---|---|---|
| — | Zika, Dengue | Jong-no, Seoul | 600 |
| — | Zika, West Nile virus | Tanhyeon, Paju | 591 |
| — | Malaria | Tanhyeon, Paju | 593 |
| — | West Nile virus | Jong-no, Seoul | 600 |
| — | Japanese encephalitis | Tanhyeon, Paju | 594 |
| — | — | Tanhyeon, Paju | 200 |
| — | — | Tanhyeon, Paju | 200 |
| — | — | Tanhyeon, Paju | 200 |
Species marked with † were captured at a laboratory facility. We treat Ae. dorsalis, Ae. koreicus, and Cx. inatomii as a single less-potential class.
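The class grouping described in the note above, with three species collapsed into one less-potential class, amounts to a simple label map. A minimal sketch (the helper name and the identity mapping for the remaining species are assumptions, not the authors' code; Ae. koreicus is spelled per the standard binomial):

```python
# Hypothetical label map: the three species named in the dataset note are
# collapsed into a single "less-potential" class, as the authors describe.
SPECIES_TO_CLASS = {
    "Ae. dorsalis": "less-potential",
    "Ae. koreicus": "less-potential",
    "Cx. inatomii": "less-potential",
}

def class_label(species: str) -> str:
    """Map a species name to its training class (identity for vector species)."""
    return SPECIES_TO_CLASS.get(species, species)
```

With this mapping, the 8 species in the dataset reduce to the 6 classes that the classifier is trained on.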
Summary of the DCNN architectures investigated in this work.
| DCNN Model | # of Layers (#Conv. + #FC) | # of Parameters (×106) | Top-5 accuracy on ImageNet-2012 | Latency [ms] | Energy/Inference [mWh] |
|---|---|---|---|---|---|
| VGG-16 | 13 + 3 | 138 | 90.4 | 98.63 | 0.32 |
| ResNet-50 | 49 + 1 | 25.6 | 94.7 | 40.18 | 0.14 |
| SqueezeNet | 18 + 0 | 1.23 | 80.3 | 13.26 | 0.03 |
The specifications of the DCNN models are taken from their respective original papers[28,29,32], except for the inference latency and energy consumption, which, for consistency, were all measured on the NVIDIA Jetson TX2 embedded device.
Figure 1. The flow of classification and visualization in the VGG-16 DCNN model. The class of a given mosquito image is predicted in two steps: (1) extracting hierarchical features and (2) classifying these features. In the feature-extractor part, the feature maps generated by the filters at each convolution layer are shown. These feature maps are used for visualization by weighting them with channel-wise averaged gradients.
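The weighting scheme the caption describes, combining a layer's feature maps using channel-wise averaged gradients, is the Grad-CAM idea. A minimal NumPy sketch (array shapes and names are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def gradcam_heatmap(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Weight each feature map by its channel-wise averaged gradient and
    combine them into one discriminative-region heatmap (Grad-CAM style).

    feature_maps, gradients: arrays of shape (C, H, W) taken at one conv
    layer; gradients are d(class score)/d(feature map).
    """
    # Global-average-pool the gradients over the spatial dimensions -> one
    # importance weight per channel.
    weights = gradients.mean(axis=(1, 2))              # shape (C,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = np.tensordot(weights, feature_maps, axes=1)  # shape (H, W)
    cam = np.maximum(cam, 0.0)
    # Normalise to [0, 1] for display as a heatmap.
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```

Upsampling the resulting map to the input resolution and overlaying it on the mosquito image yields heatmaps like those in Figures 4 and 6.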
The average test classification accuracy.
| DCNN model | Augmentation | Fine-tuning | LR | Accuracy (%) |
|---|---|---|---|---|
| VGG-16 | | | 5e-6 | 38.96 |
| VGG-16 | ✓ | | 5e-6 | 56.76 |
| VGG-16 | | ✓ | 5e-6 | 91.15 |
| VGG-16 | ✓ | ✓ | 5e-6 | — |
| ResNet-50 | | | 5e-3 | 57.47 |
| ResNet-50 | ✓ | | 1e-3 | 57.74 |
| ResNet-50 | | ✓ | 5e-3 | 93.45 |
| ResNet-50 | ✓ | ✓ | 7.5e-3 | — |
| SqueezeNet | | | 3e-5 | 47.42 |
| SqueezeNet | ✓ | | 3e-5 | 62.81 |
| SqueezeNet | | ✓ | 3e-5 | 78.58 |
| SqueezeNet | ✓ | ✓ | 3e-5 | — |
Each model is trained under four settings combining dynamic data augmentation and fine-tuning. When a model is not fine-tuned, it is trained from scratch with random initialization.
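The dynamic augmentation referred to above can be sketched as an on-the-fly transform applied each time an image is drawn during training. The specific transforms below (flips and quarter-turn rotations, a common choice for specimens photographed at arbitrary orientations) are plausible assumptions, not the paper's exact pipeline:

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Dynamically perturb one training image of shape (H, W, C).

    Applies a random horizontal flip, a random vertical flip, and a random
    number of 90-degree rotations; each draw of the same image yields a
    different variant, which combats data scarcity.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]          # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :, :]          # vertical flip
    k = rng.integers(0, 4)                 # 0-3 quarter turns
    return np.rot90(image, k, axes=(0, 1)).copy()
```

Because the transforms are sampled anew every epoch, the effective training set is much larger than the roughly 3,600 stored images.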
Figure 2. Validation accuracy of the models during training. In all models, the best validation accuracy was reached early when data augmentation and fine-tuning were applied together.
Test accuracies of different partial fine-tuning strategies for VGG-16.
| Model | Fine-tuning targets | Accuracy (%) |
|---|---|---|
| VGG-16 | All FC layers | 76.05 |
| VGG-16 | 5th conv. block + all FC layers | 87.26 |
| VGG-16 | 4th–5th conv. blocks + all FC layers | 88.51 |
| VGG-16 | 2nd–5th conv. blocks + all FC layers | 93.11 |
| VGG-16 | All conv. blocks and FC layers | 97.19 |
In all settings, models are initialized with weights pre-trained on the ImageNet dataset. During training with the ADAM optimizer, the learning rate is set to 5e-6 and decayed by a factor of 0.25 every 15 epochs.
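Reading the decay rule above as multiplication by a factor of 0.25 once every 15 epochs, the schedule can be written as a simple step-decay function (a sketch under that assumption):

```python
def step_lr(epoch: int, base_lr: float = 5e-6,
            gamma: float = 0.25, step: int = 15) -> float:
    """Step-decay learning-rate schedule used for fine-tuning: the rate
    starts at base_lr and is multiplied by gamma every `step` epochs."""
    return base_lr * gamma ** (epoch // step)
```

So epochs 0–14 train at 5e-6, epochs 15–29 at 1.25e-6, and so on; this matches the common StepLR pattern in deep-learning frameworks.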
Figure 3. Visualization of the 64 filters of the first convolution layer in the VGG-16 model. The filters (of dimension 3×3×3) are projected into pixel space for visualization.
Figure 4. Visualization of the discriminative regions at the 4th, 7th, 10th, and 13th convolution layers of VGG-16 for an Aedes albopictus input image.
Morphological keys used by human experts for classifying vector mosquito species.
| Species | Morphological keys used by human experts | Highlighted in heatmaps? |
|---|---|---|
| — | 1. Tarsi with pale bands | Sometimes |
| — | 2. Body relatively darker than the other species | Mostly |
| — | 3. Abdominal terga II–VII with large laterobasal patches | Mostly |
| — | 1. Last segment of mid and hind tarsi dark apically | Sometimes |
| — | 2. Scutum yellowish brown without patches or stripes | Mostly |
| — | 3. Middle abdominal bands B-shaped | Rarely |
| — | 1. Palpus as long as proboscis | Sometimes |
| — | 2. Wing spotted | Mostly |
| — | 1. Proboscis without a pale band | Rarely |
| — | 2. Abdomen with basal bands | Mostly |
| — | 3. Body yellowish brown | Mostly |
| — | 1. Proboscis with a pale band in the middle | Rarely |
| — | 2. Costa and other veins without pale bands | Sometimes |
| — | 3. Body relatively small and reddish brown | Mostly |
Most morphological keys located on the body area are actively highlighted in the heatmaps.
Figure 5. Visual keys used by human experts, marked with red circles and arrows. Each key is numbered according to the list of keys in Table 5. Keys that are either self-evident or not visible in the images are not shown.
Figure 6. Visualization of 5 vector mosquito species. Discriminative regions captured by the DCNNs are shown as heatmaps for a shallow (4th), middle (8th or 9th), and deep (12th) convolution layer.
The confusion matrix (%) achieved by the VGG-16 model on the test dataset.
[Confusion-matrix values partially lost in extraction: the class labels and several entries are missing. The surviving off-diagonal misclassification rates range from 0.0% to 6.26%.]
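For reference, a row-normalised confusion matrix in percent, where entry (i, j) gives the share of true class-i test samples predicted as class j, can be computed as in this sketch (the function name and shapes are illustrative, not the authors' code):

```python
import numpy as np

def confusion_percent(y_true: np.ndarray, y_pred: np.ndarray,
                      n_classes: int) -> np.ndarray:
    """Row-normalised confusion matrix in percent. Diagonal entries are
    per-class accuracies; off-diagonal entries are misclassification rates."""
    counts = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1                  # tally one (true, predicted) pair
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # guard against empty classes
    return 100.0 * counts / row_sums
```

Each row sums to 100%, so a large off-diagonal entry (such as the 6.26% in the table) directly reads as the fraction of one class confused for another.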
Figure 7. Three major cases of misclassification. Samples are shown with their heatmaps (feature activations at the 7th and 10th convolution layers) and their classification probabilities.