Deng-Qi Yang1,2,3,4, Kun Tan2,3, Zhi-Pang Huang2,3, Xiao-Wei Li1,4, Ben-Hui Chen1,4, Guo-Peng Ren2,3, Wen Xiao2,3.
Abstract
Camera traps often produce massive numbers of images, and empty images that contain no animals usually make up the overwhelming majority. Deep learning is a machine-learning approach that is widely used to identify empty camera trap images automatically. Existing high-accuracy methods are trained on millions of samples (images), and labeling those samples manually costs a great deal of time and personnel. Reducing the number of training samples saves labeling cost. However, deep learning models trained on a small dataset produce a large omission error of animal images: many animal images tend to be identified as empty, which may lead to lost opportunities to discover and observe species. It therefore remains a challenge to build a deep convolutional neural network (DCNN) model with small errors on a small dataset. Using DCNNs and a small dataset, we proposed an ensemble learning approach based on conservative strategies to identify and remove empty images automatically. Furthermore, we proposed three automatic identification schemes for users who accept different omission errors of animal images. Our experimental results showed that these three schemes automatically identified and removed 50.78%, 58.48%, and 77.51% of the empty images in the dataset when the omission errors were 0.70%, 1.13%, and 2.54%, respectively. The analysis showed that using our scheme to automatically identify empty images did not omit species information; it only slightly changed the frequency of species occurrence. When only a small dataset is available, our approach offers users an alternative way to automatically identify and remove empty images, which can significantly reduce the time and personnel costs of removing them manually. The cost savings were comparable to the percentage of empty images removed by the models.
Keywords: artificial intelligence; camera trap images; convolutional neural networks; deep learning; ensemble learning
Year: 2021 PMID: 34188837 PMCID: PMC8216933 DOI: 10.1002/ece3.7591
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
FIGURE 1 Framework of ensemble learning for automatically removing empty images. Aᵢ, Iᵢ, and Rᵢ were the classifiers output by the three DCNN models on the training set Trainᵢ. AIRᵢ was the first‐level ensemble classifier combining Aᵢ, Iᵢ, and Rᵢ (i = 1, 2). AIR was the second‐level ensemble classifier combining AIR₁ and AIR₂.
The training sets and the test set
| Datasets | Number of total images | Number of empty images | Number of nonempty images | Empty image percentage (%) |
|---|---|---|---|---|
| | 238,673 | 185,688 | 52,985 | 77.80 |
| | 105,970 | 52,985 | 52,985 | 50.00 |
| | 29,811 | 23,294 | 6,517 | 78.14 |
Characteristics of different deep learning architectures
| Architecture | Number of layers | Input size | Short description |
|---|---|---|---|
| AlexNet | 8 | 227 × 227 | 2012 ILSVRC Champion. It is a landmark architecture for deep learning. |
| InceptionV3 | 42 | 299 × 299 | It widens the network and applies batch normalization and factorized convolutions. |
| ResNet‐18 | 18 | 224 × 224 | It introduces a residual module to solve the problem of network degradation. |
Two different ensemble methods
| Model | Predicted results | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| AIR₁ | 0 | 0 | 0 | 1 | 1 | 1 | x | x | x |
| AIR₂ | 0 | 1 | x | 0 | 1 | x | 0 | 1 | x |
| AIR₁ and AIR₂ | 0 | x | x | x | 1 | x | x | x | x |
| AIR₁ or AIR₂ | 0 | x | 0 | x | 1 | 1 | 0 | 1 | x |
0, 1, and x represented empty, nonempty, and uncertain images, respectively. AIR₁ and AIR₂ was the enhanced ensemble model; AIR₁ or AIR₂ was the complementary ensemble model.
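The two second-level combination rules named in the note above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `enhanced` and `complementary` and the string `"x"` for an uncertain label are hypothetical choices for this sketch. It reads the enhanced ("and") model as requiring both first-level classifiers to give the same certain label, and the complementary ("or") model as letting one certain classifier decide when the other is uncertain. Treating two conflicting certain labels as uncertain is an assumption of the sketch.

```python
from typing import Union

Label = Union[int, str]  # 0 = empty, 1 = nonempty, "x" = uncertain
UNCERTAIN = "x"


def enhanced(a: Label, b: Label) -> Label:
    """'and' rule: keep a label only when both classifiers give the same certain label."""
    if a != UNCERTAIN and a == b:
        return a
    return UNCERTAIN


def complementary(a: Label, b: Label) -> Label:
    """'or' rule: accept a certain label when the other classifier agrees or is uncertain."""
    if a == UNCERTAIN:
        return b
    if b == UNCERTAIN or a == b:
        return a
    return UNCERTAIN  # assumption: conflicting certain labels stay uncertain


# Enumerate all nine label combinations of the two first-level ensembles.
for a in (0, 1, UNCERTAIN):
    for b in (0, 1, UNCERTAIN):
        print(a, b, enhanced(a, b), complementary(a, b))
```

Under this reading the enhanced rule is the more conservative of the two: it labels fewer images automatically and leaves more of them uncertain.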
FIGURE 2 Image‐level experimental results on the LSM dataset (with 95% confidence of DCNN models)
Image‐level errors on the LSM dataset
| No. | Models | Overall error (%) | Omission error of animal images (%) | Commission error of animal images (%) | Commission error of empty images (%) | Removal rate of empty images (%) |
|---|---|---|---|---|---|---|
| I | | 0.75 | 0.70 | 3.81 | 0.14 | 50.78 |
| II | | 4.33 | 1.13 | 14.86 | 0.37 | 58.48 |
| III | | 3.94 | 2.54 | 14.60 | 0.68 | 77.51 |
| 1 | | 8.10 | 8.80 | 23.58 | 2.59 | 86.89 |
| 2 | | 3.26 | 9.06 | 9.24 | 1.96 | 86.21 |
| 3 | | 9.82 | 22.18 | 26.28 | 5.51 | 80.61 |
| 4 | | 10.95 | 5.43 | 31.56 | 1.75 | 81.29 |
| 5 | | 6.80 | 3.80 | 22.31 | 1.12 | 85.14 |
| 6 | | 15.62 | 8.85 | 40.07 | 3.00 | 66.56 |
Overall error = (FP + FN) / (TP + FP + FN + TN). Omission error of animal images = 1 - recall = FN / (FN + TP). Commission error of animal images = FP / (TP + FP). Commission error of empty images = FN / (FN + TN). Removal rate of empty images = TN / N, where N was the number of empty images in the test set.
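The error definitions in the note above can be checked with a short Python sketch. It is an illustration only: the `Counts` class, the `error_metrics` function, the dictionary keys, and the example counts are all hypothetical and do not come from the paper. Animal (nonempty) images are treated as the positive class, and the commission error of animal images is taken here as FP / (TP + FP), the usual definition. N is passed separately because the note defines it as the number of empty images in the test set.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # animal images labeled as animal
    fp: int  # empty images labeled as animal
    fn: int  # animal images labeled as empty
    tn: int  # empty images labeled as empty


def error_metrics(c: Counts, n_empty: int) -> dict:
    """Return the five table metrics as percentages."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "overall_error": 100 * (c.fp + c.fn) / total,
        "omission_error_animal": 100 * c.fn / (c.fn + c.tp),
        "commission_error_animal": 100 * c.fp / (c.tp + c.fp),
        "commission_error_empty": 100 * c.fn / (c.fn + c.tn),
        "removal_rate_empty": 100 * c.tn / n_empty,
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = Counts(tp=90, fp=10, fn=10, tn=390)
for name, value in error_metrics(example, n_empty=400).items():
    print(f"{name}: {value:.2f}%")
```

The omission error is the quantity the three schemes trade against the removal rate: lowering it keeps more animal images, at the cost of removing fewer empty images automatically.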
FIGURE 3 Event‐level experimental results on the LSM dataset (with 95% confidence of DCNN models)
Event‐level errors on the LSM dataset
| No. | Models | Overall error (%) | Omission error of animal images (%) | Commission error of animal images (%) | Commission error of empty images (%) | Removal rate of empty images (%) |
|---|---|---|---|---|---|---|
| I | | 0.60 | 0.26 | 1.93 | 0.10 | 40.22 |
| II | | 4.26 | 0.64 | 11.34 | 0.32 | 48.93 |
| III | | 3.69 | 1.61 | 10.91 | 0.58 | 70.71 |
| 1 | | 8.25 | 7.12 | 22.26 | 2.46 | 84.99 |
| 2 | | 2.91 | 6.11 | 7.71 | 1.62 | 85.55 |
| 3 | | 10.59 | 11.31 | 26.80 | 3.87 | 72.59 |
| 4 | | 10.68 | 4.53 | 28.54 | 1.70 | 79.77 |
| 5 | | 6.37 | 3.47 | 19.17 | 1.18 | 85.56 |
| 6 | | 19.85 | 4.03 | 41.93 | 1.97 | 56.47 |
Overall error = (FP + FN) / (TP + FP + FN + TN). Omission error of animal images = 1 - recall = FN / (FN + TP). Commission error of animal images = FP / (TP + FP). Commission error of empty images = FN / (FN + TN). Removal rate of empty images = TN / N, where N was the number of empty images in the test set.
Image‐level errors on the SS_S1_135 dataset (with 95% confidence)
| No. | Models | Overall error (%) | Omission error of animal images (%) | Commission error of animal images (%) | Commission error of empty images (%) | Removal rate of empty images (%) |
|---|---|---|---|---|---|---|
| I | | 0.71 | 2.75 | 2.69 | 0.42 | 86.29 |
| II | | 2.07 | 2.95 | 9.45 | 0.56 | 87.24 |
| III | | 2.43 | 5.68 | 9.45 | 1.06 | 94.02 |
| 1 | | 3.76 | 11.11 | 11.38 | 2.23 | 96.03 |
| 2 | | 2.74 | 11.35 | 5.71 | 2.20 | 97.08 |
| 3 | | 3.94 | 17.31 | 8.95 | 3.12 | 96.05 |
| 4 | | 5.41 | 7.26 | 20.73 | 1.56 | 93.27 |
| 5 | | 4.26 | 6.14 | 16.44 | 1.32 | 92.67 |
| 6 | | 5.07 | 8.14 | 19.79 | 1.65 | 91.06 |
Overall error = (FP + FN) / (TP + FP + FN + TN). Omission error of animal images = 1 - recall = FN / (FN + TP). Commission error of animal images = FP / (TP + FP). Commission error of empty images = FN / (FN + TN). Removal rate of empty images = TN / N, where N was the number of empty images in the test set.
FIGURE 4 Omission errors of animal images (a) and coverage (b) of different schemes with different confidence thresholds on the LSM dataset
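How a confidence threshold can yield the labels 0, 1, and x, and a coverage value, is sketched below in Python. Everything here is an assumption made for illustration: the record does not define how confidence or coverage were computed, so the sketch takes confidence to be a model's predicted probability for the nonempty class and coverage to be the share of images that receive a certain label. The function names and the example probabilities are hypothetical.

```python
from typing import List, Union

Label = Union[int, str]  # 0 = empty, 1 = nonempty, "x" = uncertain


def label_with_threshold(p_nonempty: float, threshold: float) -> Label:
    """Return 1 or 0 when the model is confident enough, otherwise "x".

    Assumes threshold > 0.5, so at most one class can pass it.
    """
    if p_nonempty >= threshold:
        return 1
    if 1.0 - p_nonempty >= threshold:
        return 0
    return "x"


def coverage(labels: List[Label]) -> float:
    """Share of images that received a certain label (assumed definition)."""
    return sum(label != "x" for label in labels) / len(labels)


# Hypothetical predicted probabilities of the nonempty class.
probs = [0.99, 0.03, 0.60, 0.97, 0.10]
for t in (0.80, 0.95):
    labels = [label_with_threshold(p, t) for p in probs]
    print(t, labels, coverage(labels))
```

Raising the threshold in this sketch can only turn certain labels into uncertain ones, so coverage never increases as the threshold goes up.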