Song-Quan Ong1, Hamdan Ahmad2.
Abstract
Conventional methods for studying insect taxonomy, especially that of forensically and medically important dipterous flies, are often tedious, time-consuming, labor-intensive, and expensive. An automated recognition system based on image processing and computer vision can assist the process of insect identification. However, to the best of our knowledge, no image dataset describing these dipterous flies is available. This paper therefore introduces a new image dataset suitable for training and evaluating a recognition system that identifies dipterous flies of forensic and medical importance. The dataset consists of 2876 images, provided either at the input dimension (224 × 224 pixels) or as an embedded image model (96 × 96 pixels) for microcontrollers. It covers three families (Calliphoridae, Sarcophagidae, Rhiniidae) and five genera (Chrysomya, Lucilia, Sarcophaga, Rhiniinae, Stomorhina), and each genus class contains five different variants (of the same species) of fly to cover the variation within a species.
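
To give a rough sense of why the smaller dimension suits microcontrollers, the sketch below compares the raw buffer size of a single image at each of the two resolutions. It assumes uncompressed 8-bit RGB (three channels), which the abstract does not state, and `raw_image_bytes` is an illustrative helper rather than part of the dataset's tooling.

```python
# Rough per-image memory footprint at the two dataset resolutions.
# ASSUMPTION: uncompressed 8-bit RGB (3 channels); the source does not
# state the colour depth or the channel count.

def raw_image_bytes(width: int, height: int, channels: int = 3,
                    bytes_per_channel: int = 1) -> int:
    """Return the size in bytes of one uncompressed image buffer."""
    return width * height * channels * bytes_per_channel


full = raw_image_bytes(224, 224)     # input dimension
embedded = raw_image_bytes(96, 96)   # embedded (microcontroller) dimension

print(f"224 x 224: {full} bytes")
print(f"96 x 96:   {embedded} bytes")
print(f"ratio:     {full / embedded:.2f}x")
```

Under that assumption, the 96 × 96 version needs roughly a fifth of the memory per image, independent of the channel count chosen.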
Year: 2022 PMID: 35987756 PMCID: PMC9392721 DOI: 10.1038/s41597-022-01627-5
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 General workflow to record the dataset and organise it into the labelled classes.
Summary of the image annotations.
| Family (number of images) | Genus (number of images) |
|---|---|
| Calliphoridae (1318) | Chrysomya (731) |
| | Lucilia (587) |
| Sarcophagidae (570) | Sarcophaga (570) |
| Rhiniidae (988) | Rhiniinae (488) |
| | Stomorhina (500) |
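
The annotation counts can be cross-checked with a short script. The sketch below is only a consistency check on the numbers in the table, not part of the authors' pipeline; the dictionary layout and variable names are illustrative.

```python
# Image counts per genus, grouped by family, copied from the annotation table.
counts = {
    "Calliphoridae": {"Chrysomya": 731, "Lucilia": 587},
    "Sarcophagidae": {"Sarcophaga": 570},
    "Rhiniidae": {"Rhiniinae": 488, "Stomorhina": 500},
}

# Sum the genus counts within each family, then across the whole dataset.
family_totals = {family: sum(genera.values()) for family, genera in counts.items()}
dataset_total = sum(family_totals.values())

for family, total in family_totals.items():
    print(f"{family}: {total} images ({total / dataset_total:.1%} of the dataset)")
print(f"Total: {dataset_total} images")
```

The genus counts add up to the family totals in the table, and the family totals add up to the 2876 images stated in the abstract.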
Fig. 2 Data collection process.
Description and example of annotated classes of flies.
| Order | Family | Genus | Examples |
|---|---|---|---|
| Diptera | Calliphoridae | Chrysomya | |
| | | Lucilia | |
| | Sarcophagidae | Sarcophaga | |
| | Rhiniidae | Rhiniinae | |
| | | Stomorhina | |
Pilot test result: training and testing accuracy of the deep learning model using two different dataset dimensions at three learning rates; the blue line represents training accuracy and the orange line represents testing accuracy.
Pilot test result: training and testing loss of the deep learning model using two different dataset dimensions at three learning rates; the blue line represents training loss and the orange line represents testing loss.
Confusion matrix of the deep learning model using two different dataset dimensions at three learning rates; the blue intensity indicates the frequency count: the darker the blue, the higher the frequency.
Chy: Chrysomya; Luc: Lucilia; Sto: Stomorhina; Sar: Sarcophagidae; Rhi: Rhiniinae.
Number of images used for pilot-test training and testing, given as class (train:test): Chy (621:110); Luc (499:88); Sto (425:75); Sar (484:86); Rhi (414:74).
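
The split proportions implied by these counts can be worked out directly. The sketch below only restates the numbers from the note above; the class keys follow the abbreviations in the legend (with `Luc` for Lucilia), and the variable names are illustrative.

```python
# Pilot-test image counts per class as (train, test), from the note above.
split = {
    "Chy": (621, 110),
    "Luc": (499, 88),
    "Sto": (425, 75),
    "Sar": (484, 86),
    "Rhi": (414, 74),
}

# Per-class totals and the share of each class held out for testing.
for name, (train, test) in split.items():
    total = train + test
    print(f"{name}: {total} images, test share {test / total:.1%}")

train_total = sum(train for train, _ in split.values())
test_total = sum(test for _, test in split.values())
overall_share = test_total / (train_total + test_total)
print(f"Overall: {train_total} train, {test_total} test, test share {overall_share:.1%}")
```

Each class total matches the genus counts in the annotation summary, and the counts work out to a test share of about 15% in every class, that is, a split close to 85:15.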
| Measurement(s) | supervised machine learning |
| Technology Type(s) | Camera Device |
| Sample Characteristic - Organism | Diptera sp. NZAC 03009335 |