Cory Overton, Michael Casazza, Joseph Bretz, Fiona McDuie, Elliott Matchett, Desmond Mackell, Austen Lorenz, Andrea Mott, Mark Herzog, Josh Ackerman.
Abstract
BACKGROUND: Identifying animal behaviors, life history states, and movement patterns is a prerequisite for many animal behavior analyses and effective management of wildlife and habitats. Most approaches classify short-term movement patterns with high-frequency location or accelerometry data. However, patterns reflecting life history across longer time scales can have greater relevance to species biology or management needs, especially when available in near real-time. Given limitations in collecting and using such data to accurately classify complex behaviors in the long-term, we used hourly GPS data from 5 waterfowl species to produce daily activity classifications with machine-learned models using "automated modelling pipelines".
Keywords: Anatidae; Animal behavior; Automated model pipeline; Biologging; Classification; Daily activity; Daily activity routine; Global positioning system; Life history state; Supervised machine learning; Telemetry; Waterfowl
Year: 2022 PMID: 35578372 PMCID: PMC9109391 DOI: 10.1186/s40462-022-00324-7
Source DB: PubMed Journal: Mov Ecol ISSN: 2051-3933 Impact factor: 5.253
Distribution of 8 life history states or movement patterns used to train and validate machine learned classification models
| Species | Brooding | Dead | Local | Migration | Molt-like | Molting | Nesting | Regional relocation |
|---|---|---|---|---|---|---|---|---|
| Northern Pintail | 0 | 54 | 1238 | 64 | 422 | 56 | 0 | 103 |
| American Wigeon | 0 | 0 | 0 | 4 | 12 | 0 | 0 | 0 |
| Cinnamon Teal | 8 | 522 | 188 | 3 | 166 | 6 | 102 | 9 |
| Mallard | 41 | 1 | 1914 | 4 | 1466 | 190 | 107 | 39 |
| Gadwall | 56 | 0 | 971 | 24 | 896 | 184 | 80 | 34 |
| Undeployed | 0 | 370 | 0 | 0 | 0 | 0 | 0 | 0 |
Annotation was performed on daily sets of 24 hourly GPS locations obtained between January 2015 and August 2020 from 131 free-living waterfowl representing 5 species in North America. The data include GPS locations from two undeployed transmitters representing bird mortality.
Fig. 1 Extent of 224,016 GPS locations obtained from 131 individual ducks of 5 species and representing 9334 bird-days. Daily sets of hourly location data were used to train and validate machine learned classification models for dabbling duck life history states and movement patterns
Input data elements consisted of 24 GPS locations collected hourly within a single day
| Data frequency | Number of bird days | Number of augmented (hourly) data elements |
|---|---|---|
| Hourly | 2260 | 2260 |
| Half-hourly | 1941 | 3882 |
| Quarter-hourly | 798 | 3192 |
| Total | 4999 | 9334 |
Higher-frequency locations were subset to consistent hourly sets to augment available training data. Data were collected between January 2015 and August 2020 from 131 free-living waterfowl representing 5 species in North America and include GPS locations from two undeployed transmitters representing bird mortality.
Fig. 2 Each daily set of hourly GPS locations was classified into one of 8 life history categories representing the daily activities of waterfowl: A brooding; B dead; C local movements; D migration; E molt-like movements; F molting; G nesting; H regional relocation movements
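The augmentation counts in the table above follow from simple arithmetic: a half-hourly day yields 2 hourly sets and a quarter-hourly day yields 4. The sketch below illustrates one way such subsetting could work. The source does not describe the exact procedure, so the offset-based slicing, the function name `hourly_subsets`, and the example coordinates are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def hourly_subsets(fixes, interval_minutes):
    """Split one day of regularly spaced fixes into consistent hourly sets.

    fixes: list of (timestamp, lat, lon) tuples sorted by time.
    interval_minutes: sampling interval in minutes (e.g. 60, 30, or 15).
    Returns one hourly set per starting offset within the hour.
    """
    if interval_minutes <= 0 or 60 % interval_minutes != 0:
        raise ValueError("interval must divide 60 evenly")
    step = 60 // interval_minutes  # fixes per hour
    # Assumption: every `step`-th fix from each offset forms one hourly set.
    return [fixes[offset::step] for offset in range(step)]


# Hypothetical bird-day of quarter-hourly fixes (coordinates are made up).
start = datetime(2019, 5, 1)
day = [(start + timedelta(minutes=15 * i), 38.0, -121.0) for i in range(96)]
subsets = hourly_subsets(day, 15)

# Reproduce the table totals: bird-days times hourly sets per bird-day.
bird_days = {60: 2260, 30: 1941, 15: 798}
augmented = {m: n * (60 // m) for m, n in bird_days.items()}
total_elements = sum(augmented.values())
```

Under these assumptions each quarter-hourly day contributes four 24-fix sets whose timestamps are one hour apart, and the per-frequency products sum to the 9334 data elements reported in the table.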
Candidate model pipeline framework and data transformation steps produced by SageMaker Autopilot©
| Model # | Framework | Data transformation steps |
|---|---|---|
| 1 | XGBoost | Create threshold one hot encoding (threshold = 30) for categorical/sparse features |
| 2 | LinearLearner | Convert features with extreme values to a uniform distribution; feature dimension reduction using PCA |
| 3 | LinearLearner | Scaling and centering features while accounting for data sparsity only |
| 4 | XGBoost | Create threshold one hot encoding (threshold = 5) for categorical/sparse features |
| 5 | LinearLearner | Create threshold one hot encoding (threshold = 6) for sparse features; feature dimension reduction using PCA |
| 6 | LinearLearner | Create threshold one hot encoding (threshold = 7) for categorical/sparse features |
| 7 | LinearLearner | Create threshold one hot encoding (threshold = 7) for categorical/sparse features; feature dimension reduction using PCA |
| 8 | XGBoost | Create threshold one hot encoding (threshold = 7) for categorical/sparse features |
| 9 | XGBoost | Create threshold one hot encoding (threshold = 9) for categorical/sparse features |
| 10 | MLP | Scaling and centering features while accounting for data sparsity only |
Data processing steps use functions from the AWS ScikitLearn extension (https://github.com/aws/sagemaker-scikit-learn-extension, copyright AWS 2019). Models represent 3 frameworks: Extreme Gradient Boosting (XGBoost); Stochastic Gradient Descent (aka LinearLearner); and Multi-Layered Perceptron (MLP). Data transformation for each candidate pipeline automatically included imputation of missing values, although none were present in the training data. Each candidate model included a processing step to scale and center features while accounting for data sparsity.
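To make the "threshold one hot encoding" step in the table above concrete, here is a minimal pure-Python sketch of the idea: only categories seen often enough in the training data receive their own indicator column, and rarer categories are encoded as all zeros. This is not the AWS extension's implementation. The function names, the "at least `threshold` occurrences" rule, and the habitat labels are assumptions chosen for illustration.

```python
from collections import Counter


def fit_threshold_one_hot(values, threshold):
    """Return the sorted categories that occur at least `threshold` times.

    Assumption: a category is kept when its count is >= threshold.
    """
    counts = Counter(values)
    return sorted(c for c, n in counts.items() if n >= threshold)


def transform_threshold_one_hot(values, categories):
    """One-hot encode values; categories not kept become all-zero rows."""
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        if v in index:
            row[index[v]] = 1
        rows.append(row)
    return rows


# Hypothetical categorical habitat feature with one rare level.
habitat = ["wetland"] * 6 + ["rice"] * 5 + ["urban"] * 2
categories = fit_threshold_one_hot(habitat, threshold=5)
encoded = transform_threshold_one_hot(habitat, categories)
```

In this sketch a higher threshold (such as the 30 used by candidate 1) keeps fewer indicator columns than a lower one (such as 5), which limits the number of sparse features passed to the model.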
Performance metrics for 10 candidate model pipelines (Model Numbers from Table 3) classifying daily activity of waterfowl into 8 classes using GPS-derived feature datasets reflecting movement and timing, habitat, and history of movement
| Evaluation metric (%) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 | Model 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 95.2 | 86.4 | 86.7 | 94.8 | 85.8 | 81.8 | 85.5 | 94.8 | 94.9 | 92.4 |
| Macro-precision | 96.3 | 76.3 | 80.7 | 96.3 | 76.7 | 70.3 | 75.4 | 96.3 | 96.3 | 86.5 |
| Macro-recall | 87.1 | 71.6 | 72.5 | 86.7 | 69.9 | 63.4 | 70.0 | 86.7 | 87.2 | 82.9 |
| Macro-F1 | 89.9 | 73.3 | 74.6 | 89.7 | 72.1 | 65.8 | 71.8 | 89.7 | 89.9 | 84.1 |
| Weighted-precision | 95.3 | 85.7 | 86.4 | 94.8 | 85.3 | 80.9 | 84.7 | 94.8 | 94.9 | 92.3 |
| Weighted-recall | 95.2 | 86.4 | 86.7 | 94.8 | 85.8 | 81.8 | 85.5 | 94.8 | 94.9 | 92.4 |
| Weighted-F1 | | 86.0 | 86.3 | 94.6 | 85.4 | 81.1 | 84.9 | 94.6 | 94.7 | 92.2 |
Due to class imbalance, we determined the best performing model using the weighted-F1 score, in bold
Confusion matrix and class specific performance metrics of the best performing, optimized, model pipeline using all three feature sets (movement and timing, habitat, and history) to classify daily activity of waterfowl into 8 classes
| Actual class (rows) vs. predicted class (columns) | Brooding | Dead | Local | Migration | Molt-like | Molting | Nesting | Regional relocation | F1-score | Precision | Recall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Brooding | 8 | 0 | 2 | 0 | 11 | 0 | 0 | 0 | 0.552 | 1.000 | 0.381 |
| Dead | 0 | 189 | 0 | 0 | 0 | 0 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| Local | 0 | 0 | 839 | 0 | 20 | 0 | 0 | 3 | 0.969 | 0.964 | 0.973 |
| Migration | 0 | 0 | 0 | 19 | 0 | 0 | 0 | 1 | 0.974 | 1.000 | 0.950 |
| Molt-like | 0 | 0 | 27 | 0 | 561 | 3 | 2 | 0 | 0.932 | 0.918 | 0.946 |
| Molting | 0 | 0 | 0 | 0 | 14 | 73 | 0 | 0 | 0.896 | 0.961 | 0.839 |
| Nesting | 0 | 0 | 2 | 0 | 5 | 0 | 51 | 0 | 0.919 | 0.962 | 0.879 |
| Regional relocation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 37 | 0.949 | 0.902 | 1.000 |
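The class-specific metrics in the confusion matrix above can be recomputed directly from its counts, which also shows why macro and weighted averages differ under class imbalance. The following is a minimal sketch using the standard definitions of precision, recall, and F1. The counts are copied from the table, while the function and variable names are illustrative.

```python
labels = ["Brooding", "Dead", "Local", "Migration",
          "Molt-like", "Molting", "Nesting", "Regional relocation"]

# Rows are actual classes, columns are predicted classes (from the table).
cm = [
    [8,   0,   2,  0,  11,  0,  0,  0],
    [0, 189,   0,  0,   0,  0,  0,  0],
    [0,   0, 839,  0,  20,  0,  0,  3],
    [0,   0,   0, 19,   0,  0,  0,  1],
    [0,   0,  27,  0, 561,  3,  2,  0],
    [0,   0,   0,  0,  14, 73,  0,  0],
    [0,   0,   2,  0,   5,  0, 51,  0],
    [0,   0,   0,  0,   0,  0,  0, 37],
]


def class_metrics(matrix):
    """Return (precision, recall, f1, support) for each class."""
    n = len(matrix)
    out = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                       # actual k, predicted other
        fp = sum(matrix[r][k] for r in range(n)) - tp  # predicted k, actual other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        out.append((precision, recall, f1, tp + fn))
    return out


metrics = class_metrics(cm)
total = sum(m[3] for m in metrics)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total
macro_f1 = sum(m[2] for m in metrics) / len(metrics)
# Weighted F1 weights each class by its number of actual bird-days.
weighted_f1 = sum(m[2] * m[3] for m in metrics) / total

for name, (p, r, f1, support) in zip(labels, metrics):
    print(f"{name:20s} precision={p:.3f} recall={r:.3f} f1={f1:.3f} n={support}")
```

Brooding illustrates the difference between the two averages: with only 21 validation bird-days and a recall of 0.381, it pulls the macro-F1 down far more than the weighted-F1, where large classes such as local and molt-like movements dominate.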
Class specific F1-scores and overall weighted F1-score across all classes (in bold) from best performing model using different combinations of available feature sets
| Class | All feature sets | Movement and timing and habitat features | Movement and timing and history features | Movement and timing features only |
|---|---|---|---|---|
| Brooding | 0.552 | 0.240 | 0.000 | 0.000 |
| Dead | 1.000 | 0.992 | 0.997 | 0.984 |
| Local | 0.969 | 0.954 | 0.965 | 0.946 |
| Migration | 0.974 | 0.947 | 0.974 | 0.923 |
| Molt-like | 0.932 | 0.899 | 0.909 | 0.856 |
| Molting | 0.896 | 0.824 | 0.764 | 0.577 |
| Nesting | 0.919 | 0.899 | 0.897 | 0.792 |
| Regional relocation | 0.949 | 0.923 | 0.935 | 0.895 |