Ruilong Chen, Ruth Little, Lyudmila Mihaylova, Richard Delahay, Ruth Cox.
Abstract
Wildlife conservation and the management of human-wildlife conflicts require cost-effective methods of monitoring wild animal behavior. Still and video camera surveillance can generate enormous quantities of data, which is laborious and expensive to screen for the species of interest. In the present study, we describe a state-of-the-art deep learning approach for automatically identifying and isolating species-specific activity from still images and video data.

We used a dataset consisting of 8,368 images of wild and domestic animals in farm buildings, and we developed an approach firstly to distinguish badgers from other species (binary classification) and secondly to distinguish each of six animal species (multiclassification). We focused on binary classification of badgers first because such a tool would be relevant to efforts to manage Mycobacterium bovis (the cause of bovine tuberculosis) transmission between badgers and cattle.

We used two deep learning frameworks for automatic image recognition. They achieved high accuracies, in the order of 98.05% for binary classification and 90.32% for multiclassification. Based on the deep learning framework, a detection process was also developed for identifying animals of interest in video footage, which to our knowledge is the first application for this purpose.

The algorithms developed here have wide applications in wildlife monitoring where large quantities of visual data require screening for certain species.
Keywords: automatic image recognition; bovine tuberculosis; convolutional neural networks; deep learning; wildlife monitoring
Year: 2019 PMID: 31534668 PMCID: PMC6745675 DOI: 10.1002/ece3.5410
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Figure 1. The training and testing processes of a recognition framework
Figure 2. An example of a generic CNN architecture (Chen et al., 2018)
Figure 3. The architecture of CNN‐1
Figure 4. The architecture of CNN‐2
Figure 5. The process of applying trained CNNs to video footage
Figure 6. Example images from the testing dataset. Rows from top to bottom: badger, bird, cat, fox, rat, and rabbit
Number of images per category in the wildlife dataset
| Category | Total images | Training images | Testing images |
|---|---|---|---|
| Badger | 1,556 | 1,089 | 467 |
| Bird | 1,528 | 1,070 | 458 |
| Cat | 1,083 | 758 | 325 |
| Fox | 2,693 | 1,885 | 808 |
| Rat | 570 | 399 | 171 |
| Rabbit | 938 | 657 | 281 |
| Total | 8,368 | 5,858 | 2,510 |
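The counts in the table imply that about 70% of each category was used for training and 30% for testing, although the split ratio itself is not stated here. A minimal sketch that checks this from the table values (the names `counts` and `training_fraction` are illustrative, not from the paper):

```python
# Image counts copied from the wildlife dataset table: (total, training, testing).
counts = {
    "Badger": (1556, 1089, 467),
    "Bird": (1528, 1070, 458),
    "Cat": (1083, 758, 325),
    "Fox": (2693, 1885, 808),
    "Rat": (570, 399, 171),
    "Rabbit": (938, 657, 281),
}


def training_fraction(total, train, test):
    """Return the share of a category's images that was used for training."""
    assert train + test == total, "training and testing counts must sum to the total"
    return train / total


fractions = {name: training_fraction(*row) for name, row in counts.items()}
for name, frac in fractions.items():
    print(f"{name:<7} training fraction = {frac:.3f}")

# Class imbalance in the binary task: nonbadger vs. badger test images.
test_total = sum(row[2] for row in counts.values())
badger_test = counts["Badger"][2]
nonbadger_test = test_total - badger_test
print(f"nonbadger test images per badger test image: {nonbadger_test / badger_test:.2f}")
```

The last line illustrates why the binary task is imbalanced: nonbadger test images outnumber badger test images several times over.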
The performance of CNN‐1 for binary classification without and with the resampling process
| Resampling | Prediction | Test data: Badger | Test data: Nonbadger | Accuracy (%) | F1 score |
|---|---|---|---|---|---|
| Without resampling | Badger | 384 (TP) | 28 (FP) | 95.58 | 0.87 |
| Without resampling | Nonbadger | 83 (FN) | 2015 (TN) | | |
| With resampling | Badger | 416 (TP) | 53 (FP) | 95.86 | 0.89 |
| With resampling | Nonbadger | 51 (FN) | 1990 (TN) | | |
The performance of CNN‐2 for binary classification without and with the resampling process
| Resampling | Prediction | Test data: Badger | Test data: Nonbadger | Accuracy (%) | F1 score |
|---|---|---|---|---|---|
| Without resampling | Badger | 429 (TP) | 22 (FP) | 97.61 | 0.93 |
| Without resampling | Nonbadger | 38 (FN) | 2021 (TN) | | |
| With resampling | Badger | 442 (TP) | 24 (FP) | 98.05 | 0.95 |
| With resampling | Nonbadger | 25 (FN) | 2019 (TN) | | |
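The tables report accuracy and F1 score without giving the formulas. Assuming the standard definitions (accuracy as the share of correct predictions, F1 as the harmonic mean of precision and recall for the badger class), the reported values can be recomputed from the four cells of each confusion matrix. The function name `binary_metrics` is illustrative:

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy in %, F1 score) from the four confusion-matrix cells."""
    accuracy = 100.0 * (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of badger predictions that are correct
    recall = tp / (tp + fn)     # share of true badger images that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# (TP, FP, FN, TN) copied from the two binary-classification tables.
results = {
    ("CNN-1", "without resampling"): (384, 28, 83, 2015),
    ("CNN-1", "with resampling"): (416, 53, 51, 1990),
    ("CNN-2", "without resampling"): (429, 22, 38, 2021),
    ("CNN-2", "with resampling"): (442, 24, 25, 2019),
}

for (model, setting), cells in results.items():
    acc, f1 = binary_metrics(*cells)
    print(f"{model} {setting}: accuracy = {acc:.2f}%, F1 = {f1:.2f}")
```

Under these definitions, resampling mainly reduces false negatives (missed badgers), which is what raises the F1 score even where accuracy barely changes.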
The performance of CNN‐1 for multiclassification without the resampling process
| Prediction (rows) vs. test data (columns) | Badger | Bird | Cat | Fox | Rat | Rabbit |
|---|---|---|---|---|---|---|
| Badger | 395 | 2 | 6 | 29 | 9 | 6 |
| Bird | 1 | 441 | 3 | 7 | 2 | 4 |
| Cat | 4 | 3 | 207 | 34 | 10 | 6 |
| Fox | 54 | 11 | 90 | 704 | 24 | 46 |
| Rat | 7 | 0 | 5 | 3 | 122 | 3 |
| Rabbit | 6 | 1 | 14 | 31 | 4 | 214 |
| Individual accuracy (%) | 84.58 | 96.29 | 63.69 | 87.13 | 71.35 | 76.87 |
| Accuracy (%) | 83.07 | | | | | |
| Mean accuracy (%) | 79.98 | | | | | |
The performance of the CNN‐1 for multiclassification with the resampling process
| Prediction (rows) vs. test data (columns) | Badger | Bird | Cat | Fox | Rat | Rabbit |
|---|---|---|---|---|---|---|
| Badger | 402 | 2 | 2 | 34 | 8 | 5 |
| Bird | 0 | 438 | 4 | 12 | 0 | 3 |
| Cat | 4 | 3 | 235 | 41 | 8 | 4 |
| Fox | 11 | 0 | 9 | 652 | 12 | 29 |
| Rat | 11 | 0 | 9 | 7 | 132 | 3 |
| Rabbit | 5 | 5 | 14 | 62 | 11 | 237 |
| Individual accuracy (%) | 86.08 | 95.63 | 72.31 | 80.69 | 77.19 | 84.34 |
| Accuracy (%) | 83.51 | | | | | |
| Mean accuracy (%) | 82.71 | | | | | |
The performance of CNN‐2 for multiclassification without the resampling process
| Prediction (rows) vs. test data (columns) | Badger | Bird | Cat | Fox | Rat | Rabbit |
|---|---|---|---|---|---|---|
| Badger | 431 | 1 | 6 | 10 | 3 | 11 |
| Bird | 2 | 447 | 3 | 5 | 3 | 5 |
| Cat | 3 | 0 | 251 | 8 | 6 | 5 |
| Fox | 16 | 5 | 47 | 763 | 10 | 15 |
| Rat | 5 | 2 | 3 | 6 | 133 | 3 |
| Rabbit | 10 | 3 | 15 | 16 | 16 | 242 |
| Individual accuracy (%) | 92.29 | 97.60 | 77.23 | 94.43 | 77.78 | 86.12 |
| Accuracy (%) | 90.32 | | | | | |
| Mean accuracy (%) | 87.57 | | | | | |
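The multiclassification tables distinguish "accuracy" from "mean accuracy" without defining either. The numbers are consistent with accuracy being the share of all test images classified correctly, and mean accuracy being the unweighted average of the per-class (individual) accuracies. A minimal sketch under that assumption, using the CNN‐2 counts without resampling (variable names are illustrative):

```python
classes = ["Badger", "Bird", "Cat", "Fox", "Rat", "Rabbit"]

# Rows are predictions, columns are the true (test data) classes,
# copied from the CNN-2 multiclassification table without resampling.
confusion = [
    [431, 1, 6, 10, 3, 11],
    [2, 447, 3, 5, 3, 5],
    [3, 0, 251, 8, 6, 5],
    [16, 5, 47, 763, 10, 15],
    [5, 2, 3, 6, 133, 3],
    [10, 3, 15, 16, 16, 242],
]

n = len(classes)
correct = [confusion[i][i] for i in range(n)]
# Column sums give the number of test images in each true class.
per_class_total = [sum(confusion[r][c] for r in range(n)) for c in range(n)]

individual = [100.0 * correct[c] / per_class_total[c] for c in range(n)]
accuracy = 100.0 * sum(correct) / sum(per_class_total)  # weighted by class size
mean_accuracy = sum(individual) / n                     # every class counts equally

for name, value in zip(classes, individual):
    print(f"{name:<7} {value:.2f}%")
print(f"accuracy = {accuracy:.2f}%, mean accuracy = {mean_accuracy:.2f}%")
```

Because fox and badger are the largest classes, accuracy is pulled toward their results, while mean accuracy is more sensitive to the smaller cat and rat classes.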
The performance of CNN‐2 for multiclassification with the resampling process
| Prediction (rows) vs. test data (columns) | Badger | Bird | Cat | Fox | Rat | Rabbit |
|---|---|---|---|---|---|---|
| Badger | 434 | 3 | 6 | 24 | 2 | 11 |
| Bird | 2 | 439 | 1 | 2 | 3 | 4 |
| Cat | 13 | 2 | 281 | 90 | 7 | 7 |
| Fox | 7 | 1 | 26 | 644 | 7 | 8 |
| Rat | 6 | 5 | 2 | 13 | 137 | 6 |
| Rabbit | 5 | 8 | 9 | 35 | 15 | 245 |
| Individual accuracy (%) | 92.93 | 95.85 | 86.46 | 79.70 | 80.12 | 87.19 |
| Accuracy (%) | 86.85 | | | | | |
| Mean accuracy (%) | 87.04 | | | | | |
Figure 7. An example of a detected active frame. (a) An input frame; (b) the average variation of the activated pixels in the current frame, which is calculated in Equation 5; (c) the activated pixels in Equation 4, with the blue arrow indicating the estimated movement in the next frame; (d) the classification result of the frame of interest
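Equations 4 and 5 are not reproduced in this excerpt, so the exact definition of "activated pixels" is not available here. The sketch below shows generic frame differencing as one plausible reading of the caption, not the authors' formulation. The function names, the intensity `threshold`, and the `min_fraction` cutoff are all hypothetical values chosen for illustration:

```python
import numpy as np


def activated_pixels(prev_frame, frame, threshold=25):
    """Return (mask, diff): pixels whose grayscale value changed by more than `threshold`."""
    # Cast to a signed type so the subtraction of uint8 frames cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold, diff


def is_active(prev_frame, frame, threshold=25, min_fraction=0.01):
    """Flag a frame as active when enough pixels changed since the previous frame."""
    mask, diff = activated_pixels(prev_frame, frame, threshold)
    fraction = float(mask.mean())  # share of pixels that are activated
    # Average variation over the activated pixels only (0 when nothing moved).
    mean_variation = float(diff[mask].mean()) if mask.any() else 0.0
    return fraction >= min_fraction, fraction, mean_variation


# Synthetic grayscale frames: an empty scene, then a bright 10 x 10 object.
previous = np.zeros((48, 64), dtype=np.uint8)
current = previous.copy()
current[20:30, 30:40] = 200

active, fraction, mean_variation = is_active(previous, current)
print(active, round(fraction, 4), mean_variation)
```

In a pipeline like the one the caption describes, only frames flagged as active would be passed to the trained CNN for classification, which avoids classifying long stretches of empty footage.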
| Require |
| Load images from the source, weights are randomly initialized |
| Ensure |
| 1. Image input layer [480 × 640 × 3] with “zero center” normalization |
| 2. Convolution layer 1. 50 [13 × 13 × 3] convolution kernels with stride of [4 4], no padding is applied |
| 3. ReLU nonlinear function 1 |
| 4. Max pooling 1, with size [3 3] with stride 2, no padding is applied |
| 5. Batch normalization 1 |
| 6. Convolution layer 2. 80 [5 × 5 × 50], with padding [1 1 1 1] applied ([top bottom left right]) |
| 7. ReLU nonlinear function 2 |
| 8. Max pooling 2, with size [2 2] with stride [2 2], no padding is applied |
| 9. Batch normalization 2 |
| 10. Convolution layer 3. 100 [3 × 3 × 80] with stride of [1 1], padding [1 1 1 1] is applied |
| 11. ReLU nonlinear function 3 |
| 12. Max pooling 3, with size [2 2] with stride [2 2], no padding is applied |
| 13. Batch normalization 3 |
| 14. Convolution layer 4. 100 [3 × 3 × 100] with stride of [1 1], padding [1 1 1 1] is applied |
| 15. ReLU nonlinear function 4 |
| 16. Max pooling 4, with size [2 2] with stride [2 2], padding [0 0 0 1] is applied |
| 17. Batch normalization 4 |
| 18. Fully connected layer 1, with 1,000 neurons |
| 19. ReLU nonlinear function |
| 20. Dropout layer, with probability set to 0.5 |
| 21. Fully connected layer 2, with 6 neurons. (If the task is only to distinguish badger and nonbadger, change the number of neurons to 2.) |
| 22. Softmax layer |
| 23. Classification layer |
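To see how the [480 × 640 × 3] input shrinks through the layers listed above, the spatial sizes can be traced by hand. The sketch below does this with the usual floor rule for convolution and pooling windows. Two assumptions are made: convolution layer 2 uses a stride of [1 1] (the listing does not give it), and padding follows the [top bottom left right] order noted in the listing. The helper names are illustrative:

```python
def out_size(size, kernel, stride, pad_before=0, pad_after=0):
    """Output length along one axis for a convolution or pooling window (floor rule)."""
    return (size + pad_before + pad_after - kernel) // stride + 1


def window_output(shape, kernel, stride, padding=(0, 0, 0, 0), channels=None):
    """Apply a square window to (height, width, channels); padding is (top, bottom, left, right)."""
    height, width, depth = shape
    top, bottom, left, right = padding
    height = out_size(height, kernel, stride, top, bottom)
    width = out_size(width, kernel, stride, left, right)
    # Pooling keeps the channel count; convolution sets it to the number of kernels.
    return (height, width, channels if channels is not None else depth)


# ReLU and batch normalization do not change the shape, so they are omitted.
layers = [
    ("conv1", dict(kernel=13, stride=4, channels=50)),
    ("pool1", dict(kernel=3, stride=2)),
    ("conv2", dict(kernel=5, stride=1, padding=(1, 1, 1, 1), channels=80)),  # stride assumed
    ("pool2", dict(kernel=2, stride=2)),
    ("conv3", dict(kernel=3, stride=1, padding=(1, 1, 1, 1), channels=100)),
    ("pool3", dict(kernel=2, stride=2)),
    ("conv4", dict(kernel=3, stride=1, padding=(1, 1, 1, 1), channels=100)),
    ("pool4", dict(kernel=2, stride=2, padding=(0, 0, 0, 1))),
]

shape = (480, 640, 3)
trace = {}
for name, kwargs in layers:
    shape = window_output(shape, **kwargs)
    trace[name] = shape
    print(f"{name}: {shape}")

flattened = shape[0] * shape[1] * shape[2]
print("inputs to fully connected layer 1:", flattened)
```

Under these assumptions the width entering max pooling 4 is odd, which would explain the one-pixel padding on the right in step 16: it lets the [2 2] window with stride [2 2] cover the last column.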
| Require |
| Resize the input images to the size of [227 × 227 × 3] (image size [227 × 227] and image channel [3]) |
| Ensure |
| 1. Image input layer [227 × 227 × 3] with “zero center” normalization |
| 2. Convolution layer 1. 96 [11 × 11 × 3] convolution kernels with stride of [4 4], no padding is applied |
| 3. ReLU nonlinear function 1 |
| 4. Max pooling 1, with size [3 3] with stride 2, no padding is applied. |
| 5. Batch normalization 1 |
| 6. Convolution layer 2. 256 [5 × 5 × 48] convolution kernels with stride of [1 1], with padding [2 2 2 2] |
| 7. ReLU nonlinear function 2 |
| 8. Batch normalization 2 |
| 9. Max pooling 2 |
| 10. Convolution layer 3. 384 [3 × 3 × 256] with stride of [1 1], with padding [1 1 1 1] |
| 11. ReLU nonlinear function 3 |
| 12. Convolution layer 4. 384 [3 × 3 × 192] with stride of [1 1], with padding [1 1 1 1] |
| 13. ReLU nonlinear function 4 |
| 14. Convolution layer 5. 256 [3 × 3 × 192] with stride of [1 1], with padding [1 1 1 1] |
| 15. ReLU nonlinear function 5 |
| 16. Max pooling 5 |
| 17. Fully connected layer 1, with 4,096 neurons |
| 18. ReLU nonlinear function |
| 19. Dropout layer 1, with probability set to 0.5 |
| 20. Fully connected layer 2, with 4,096 neurons |
| 21. ReLU nonlinear function |
| 22. Dropout layer 2, with probability set to 0.5 |
| 23. Fully connected layer 3, with 6 neurons. (If the task is only to distinguish badger and nonbadger, change the number of neurons to 2.) |
| 24. Softmax layer |
| 25. Classification layer |
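In the second listing, the kernels of convolution layers 2, 4, and 5 have depths of 48 and 192, which is half the number of kernels in the layer before each of them (96 and 384). That pattern is what a two-group convolution in the style of AlexNet produces, where each kernel sees only half of the input channels. The listing does not mention groups, so the group counts below are an assumption, and the helper names are illustrative:

```python
def kernel_depth(in_channels, groups):
    """Depth of each kernel when the input channels are split evenly into `groups`."""
    assert in_channels % groups == 0, "channels must divide evenly into groups"
    return in_channels // groups


def conv_weights(num_kernels, size, depth):
    """Number of weights (biases excluded) in a bank of square convolution kernels."""
    return num_kernels * size * size * depth


# (name, number of kernels, kernel size, input channels, assumed number of groups)
conv_layers = [
    ("conv1", 96, 11, 3, 1),
    ("conv2", 256, 5, 96, 2),
    ("conv3", 384, 3, 256, 1),
    ("conv4", 384, 3, 384, 2),
    ("conv5", 256, 3, 384, 2),
]

# Kernel depths as written in the layer listing.
listed_depths = {"conv1": 3, "conv2": 48, "conv3": 256, "conv4": 192, "conv5": 192}

weights = {}
for name, num_kernels, size, in_channels, groups in conv_layers:
    depth = kernel_depth(in_channels, groups)
    matches = depth == listed_depths[name]
    weights[name] = conv_weights(num_kernels, size, depth)
    print(f"{name}: depth {depth} (matches listing: {matches}), weights {weights[name]}")
```

If the grouping assumption holds, it roughly halves the weight count of those three layers compared with ungrouped kernels of full depth.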