Matthias Zuerl, Philip Stoll, Ingrid Brehm, René Raab, Dario Zanca, Samira Kabri, Johanna Happold, Heiko Nille, Katharina Prechtel, Sophie Wuensch, Marie Krause, Stefan Seegerer, Lorenzo von Fersen, Bjoern Eskofier.
Abstract
The monitoring of animals under human care is a crucial tool for biologists and zookeepers to keep track of the animals' physical and psychological health. Additionally, it enables the analysis of observed behavioral changes and helps to unravel underlying reasons. Enhancing our understanding of animals ensures and improves ex situ animal welfare as well as in situ conservation. However, traditional observation methods are time- and labor-intensive, as they require experts to observe the animals on-site during long and repeated sessions and manually score their behavior. Therefore, the development of automated observation systems would greatly benefit researchers and practitioners in this domain. We propose an automated framework for basic behavior monitoring of individual animals under human care. Raw video data are processed to continuously determine the position of the individuals within the enclosure. The trajectories describing their travel patterns are presented, along with fundamental analysis, through a graphical user interface (GUI). We evaluate the performance of the framework on captive polar bears (Ursus maritimus). We show that the framework can localize and identify individual polar bears with an F1 score of 86.4%. The localization accuracy of the framework is 19.9±7.6 cm, outperforming current manual observation methods. Furthermore, we provide a bounding-box-labeled dataset of the two polar bears housed in Nuremberg Zoo.
Keywords: Ursus maritimus; animal behavior; animal monitoring; animal welfare; behavior observation; deep learning; object detection
Year: 2022 PMID: 35327089 PMCID: PMC8944680 DOI: 10.3390/ani12060692
Source DB: PubMed Journal: Animals (Basel) ISSN: 2076-2615 Impact factor: 2.752
Current video-based frameworks for animal behavior monitoring, listed according to the requirements that must be addressed for the present zoo setting: (a) species-unspecific approach; (b) identification of individuals; (c) applicability in the zoo setting (varying camera angles, low camera resolutions, varying light conditions, large enclosures). The last column lists the extracted behavioral features.
| Framework | (a) Unspecific | (b) ID | (c) Zoo | Output |
|---|---|---|---|---|
| ChickTrack | ✓ | ✗ | ✓ | locomotion |
| Nakamura et al. | ✗ | ✗ | ✗ | pose estimation |
| Swarup et al. | ✗ | ✗ | ✓ | activity recognition |
| DeepLabCut | ✓ | ✗ | ✗ | pose estimation |
| Nilsson et al. | ✓ | ✗ | ✗ | count |
| Kashiha et al. | ✓ | ✗ | ✗ | locomotion |
| Blyzer | ✓ | ✗ | ✗ | trajectory |
| idTracker | ✓ | ✓ | ✗ | trajectory |
| GroupTracker | ✓ | ✓ | ✗ | trajectory |
| Proposed framework | ✓ | ✓ | ✓ | trajectory |
Figure 1. A high-level overview of the proposed framework. It takes raw videos as input and outputs labeled trajectories as well as basic statistics of the observed animal behavior. There are four major stages: animal detection (1), classification of individuals (2), coordinate transformation (3) from the image plane to the enclosure map, and finally a basic analysis (4) of the trajectories.
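The four stages can be read as a simple processing chain. Below is a minimal sketch of that chain; the callables `detect`, `identify`, and `image_to_map` are hypothetical placeholders standing in for the framework's components, not the authors' actual API.

```python
# Sketch of the four pipeline stages of Figure 1 (assumed interfaces).
import math
import cv2  # OpenCV, used here only for video decoding

def path_length(points):
    """Total distance travelled along a trajectory of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def process_video(video_path, detect, identify, image_to_map):
    """Raw video in, labeled enclosure-map trajectories and stats out."""
    trajectories = {}                                # name -> [(x, y), ...]
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Stage 1: animal detection -> bounding boxes (x1, y1, x2, y2)
        for (x1, y1, x2, y2) in detect(frame):
            # Stage 2: classification of the individual on the crop
            crop = frame[int(y1):int(y2), int(x1):int(x2)]
            name = identify(crop)                    # e.g., "Vera"/"Nanuq"
            # Stage 3: project the bottom-center of the box (roughly the
            # animal's ground contact point) onto the enclosure map
            pos = image_to_map(((x1 + x2) / 2, y2))
            trajectories.setdefault(name, []).append(pos)
    cap.release()
    # Stage 4: basic analysis, e.g., distance travelled per individual
    return {name: path_length(t) for name, t in trajectories.items()}
```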
Figure 2. Example picture taken with one of the three cameras. Both animals are walking in one of the two outdoor enclosures. The polar bear on the left is Vera; the one on the right is Nanuq.
Figure 3. Accordance rate after the first (top) and second (bottom) labeling rounds. The peak at the intermediate accordance rate in the first labeling round is due to instances where only two of three experts found an animal, which lowers the computed pairwise agreement. The same holds for instances where all three experts found the same animal but only two assigned the same identity. After the second, collaborative round this peak almost vanishes, implying a very high annotation consistency for the dataset. Note that instances without any animal were excluded from this graph for clearer presentation.
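Since the caption describes pairwise agreement between three annotators, the sketch below shows one plausible way such an accordance rate could be computed per instance. The exact definition and matching criterion (e.g., an IoU test on the experts' boxes) are assumptions for illustration, not the authors' published procedure.

```python
from itertools import combinations

def accordance_rate(labels):
    """Pairwise agreement of three experts on one instance.

    `labels` holds the identity each expert assigned to a detected
    animal, or None if that expert found no animal.  Treating None
    as disagreement is an assumption of this sketch.
    """
    pairs = list(combinations(labels, 2))
    agreeing = sum(a == b and a is not None for a, b in pairs)
    return agreeing / len(pairs)

# Only two of three experts found (the same) animal -> one agreeing
# pair out of three, the kind of peak discussed for round one:
print(accordance_rate(["Nanuq", "Nanuq", None]))  # 0.333...
print(accordance_rate(["Vera", "Vera", "Vera"]))  # 1.0
```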
Figure 4. Schematic representation of the implemented coordinate transformation. The enclosure is divided into segments that are, to a good approximation, flat surfaces. For each segment, a homography matrix is determined, which transforms image coordinates to the map of the enclosure.
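The per-segment transformation can be illustrated with OpenCV: a homography is estimated once from at least four image-to-map point correspondences for a segment and then applied to every detection falling into that segment. The point coordinates below are invented for illustration.

```python
import numpy as np
import cv2

# >= 4 surveyed correspondences between camera image (pixels) and
# enclosure map (metres) for one approximately flat segment.
img_pts = np.array([[102, 540], [893, 512], [760, 210], [215, 230]],
                   dtype=np.float32)
map_pts = np.array([[0.0, 0.0], [12.0, 0.0], [12.0, 9.0], [0.0, 9.0]],
                   dtype=np.float32)
H, _ = cv2.findHomography(img_pts, map_pts)

def to_map(point_xy, H):
    """Project one image point onto the enclosure map."""
    src = np.array([[point_xy]], dtype=np.float32)   # shape (1, 1, 2)
    dst = cv2.perspectiveTransform(src, H)
    return tuple(dst[0, 0])

print(to_map((450, 400), H))   # map position of an example detection
```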
Day-wise splitting of the data. All images in the dataset were acquired in the same week, from 27 April to 1 May 2020. The second row states how many instances with polar bears were used (excluding empty images).
| Day | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|
| Images | 900 | 850 | 950 | 700 | 1050 | 4450 |
| Polar bear instances | 477 | 419 | 406 | 383 | 414 | 2099 |
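A day-wise five-fold split amounts to grouping images by acquisition day, so that train and test folds never share near-duplicate frames from the same session. A minimal sketch with scikit-learn follows; the per-day image counts are taken from the table, while treating every image as one sample is an assumption of this sketch.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# One acquisition day (1-5) per image, using the counts from the table.
days = np.repeat([1, 2, 3, 4, 5], [900, 850, 950, 700, 1050])
X = np.arange(len(days))                 # stand-in for image indices

# With 5 groups and 5 splits, each fold holds out exactly one day.
for fold, (train_idx, test_idx) in enumerate(
        GroupKFold(n_splits=5).split(X, groups=days), start=1):
    held_out = np.unique(days[test_idx])
    print(f"fold {fold}: test day(s) {held_out}, "
          f"{len(train_idx)} train / {len(test_idx)} test images")
```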
Results of experiment 1. YOLO was trained and evaluated in a five-fold cross-validation. The task is the detection of the class polar bear. The F1 score is calculated at different IoU thresholds for the definition of a valid detection. Additionally, the mean IoU is given in the last row.
| Metric | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Overall |
|---|---|---|---|---|---|---|
| F1 (1st IoU threshold) | 0.961 | 0.943 | 0.948 | 0.950 | 0.906 | |
| F1 (2nd IoU threshold) | 0.961 | 0.940 | 0.944 | 0.947 | 0.897 | |
| F1 (3rd IoU threshold) | 0.942 | 0.900 | 0.908 | 0.906 | 0.839 | |
| F1 (4th IoU threshold) | 0.194 | 0.225 | 0.232 | 0.338 | 0.146 | |
| Mean IoU | 0.824 | 0.786 | 0.794 | 0.807 | 0.709 | |
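For reference, a minimal sketch of how an F1 score at a given IoU threshold is computed from matched detections. The one-to-one matching of predictions to ground-truth boxes is assumed to happen upstream; the example numbers at the end are invented.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def f1_at_threshold(matches, n_pred, n_gt, threshold):
    """F1 when a match counts as a true positive only if IoU >= threshold.

    `matches` holds the IoU of each prediction matched to a ground-truth
    box (one-to-one matching assumed done upstream).
    """
    tp = sum(m >= threshold for m in matches)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Example: 3 predictions, 4 ground-truth boxes, matched IoUs as listed.
print(f1_at_threshold([0.81, 0.62, 0.40], n_pred=3, n_gt=4, threshold=0.5))
```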
Comparison of different state-of-the-art networks for image classification. The F1 score is given per fold of the day-wise five-fold cross-validation; the overall value is the mean over all folds, including the standard deviation. Inference time (IT) was evaluated on a single batch of size 8 on an Nvidia GeForce RTX 2060.
| Architecture | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Overall | IT [ms] |
|---|---|---|---|---|---|---|---|
| ResNet18 | 0.971 | 0.961 | 0.882 | 0.892 | 0.865 | | 3.7 |
| ResNet50 | 0.956 | 0.944 | 0.890 | 0.846 | 0.796 | | 8.4 |
| ResNet101 | 0.908 | 0.902 | 0.812 | 0.838 | 0.698 | | 15.5 |
| MobileNetV2 | 0.972 | 0.963 | 0.894 | 0.850 | 0.888 | | 7.4 |
| ResNeXt50 | 0.968 | 0.921 | 0.857 | 0.841 | 0.831 | | 12.3 |
| DenseNet121 | 0.949 | 0.936 | 0.868 | 0.923 | 0.863 | | 20.0 |
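The IT column can be reproduced in spirit with a small PyTorch benchmark over a batch of 8. The 224×224 input size, warm-up scheme, and iteration counts below are assumptions of this sketch, not the authors' protocol.

```python
import time
import torch
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
batch = torch.randn(8, 3, 224, 224, device=device)  # assumed crop size

for name, ctor in [("ResNet18", models.resnet18),
                   ("MobileNetV2", models.mobilenet_v2)]:
    net = ctor(weights=None).to(device).eval()
    with torch.no_grad():
        for _ in range(10):                  # warm-up iterations
            net(batch)
        if device == "cuda":
            torch.cuda.synchronize()         # wait for queued GPU work
        t0 = time.perf_counter()
        for _ in range(100):
            net(batch)
        if device == "cuda":
            torch.cuda.synchronize()
    print(f"{name}: {(time.perf_counter() - t0) / 100 * 1000:.1f} ms/batch")
```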
Experiment 3 investigated the performance when using YOLO for detecting the animals and ResNet18 for classifying them. Experiment 4 assessed the possibility of using YOLO for both object detection and classification. For both experiments, we evaluated precision, recall, and the resulting F1 scores at different IoU thresholds. The scores are given for both individual animals as well as the weighted average (w. a.).
| Experiment | IoU Threshold | Precision | Recall | F1 (Vera) | F1 (Nanuq) | F1 (w. a.) |
|---|---|---|---|---|---|---|
| 3 (YOLO + ResNet18) | 0.50 | 0.920 | 0.832 | 0.900 | 0.842 | |
| 3 (YOLO + ResNet18) | 0.75 | 0.920 | 0.786 | 0.882 | 0.808 | |
| 4 (YOLO only) | 0.50 | 0.908 | 0.780 | 0.874 | 0.800 | |
| 4 (YOLO only) | 0.75 | 0.910 | 0.728 | 0.856 | 0.748 | |
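A sketch of the two-stage pipeline evaluated in experiment 3: YOLO proposes polar-bear boxes, and ResNet18 names the individual in each crop. The box interface, the preprocessing transform, and the class handling are assumptions for illustration, not the authors' implementation.

```python
import torch
from torchvision import transforms

# Assumed classifier preprocessing; the paper's actual transform and
# crop size are not stated here.
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def detect_and_identify(frame, yolo, resnet18, class_names):
    """YOLO detection followed by ResNet18 identification (sketch)."""
    results = []
    for (x1, y1, x2, y2) in yolo(frame):       # assumed box interface
        crop = frame[int(y1):int(y2), int(x1):int(x2)]
        with torch.no_grad():
            logits = resnet18(preprocess(crop).unsqueeze(0))
        name = class_names[int(logits.argmax(dim=1))]
        results.append(((x1, y1, x2, y2), name))
    return results
```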
Figure 5. Graphical representation of the result of experiment 6. A person followed a trajectory while being tracked by the proposed framework. The output of the framework is shown in orange. At the same time, the person was located by two laser-based distance-measuring devices. This trajectory, assumed to be ground truth, is depicted in blue.
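The localization accuracy reported in the abstract (19.9±7.6 cm) corresponds to the point-wise distance between the framework's trajectory and the laser-based reference. A minimal sketch, assuming both tracks are already time-aligned and in the same map coordinates:

```python
import math

def mean_position_error(framework_track, reference_track):
    """Mean and standard deviation of the Euclidean distance between
    time-aligned trajectory points, given as lists of (x, y) positions.
    The time alignment itself is assumed to happen upstream."""
    errors = [math.dist(p, q)
              for p, q in zip(framework_track, reference_track)]
    mean = sum(errors) / len(errors)
    std = (sum((e - mean) ** 2 for e in errors) / len(errors)) ** 0.5
    return mean, std
```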
Figure 6. Difficult and unusual instances of the dataset. The first image shows Nanuq in a sandbox far away from the camera. The second image shows Nanuq standing. The third image shows Nanuq partly occluded. The last image shows Vera swimming.