pamlr: A toolbox for analysing animal behaviour using pressure, acceleration, temperature, magnetic or light data in R.

Kiran L Dhanjal-Adams¹,²,³, Astrid S T Willener¹, Felix Liechti¹.

Abstract

Light-level geolocators have revolutionised the study of animal behaviour. However, lacking spatial precision, their usage has been primarily targeted towards the analysis of large-scale movements. Recent technological developments have allowed the integration of magnetometers and accelerometers into geolocator tags in addition to barometers and thermometers, offering new behavioural insights. Here, we introduce an R toolbox for identifying behavioural patterns from multisensor geolocator tags, with functions specifically designed for data visualisation, calibration, classification and error estimation. More specifically, the package allows for the flexible analysis of any combination of sensor data using k-means clustering, expectation maximisation binary clustering, hidden Markov models and changepoint analyses. Furthermore, the package integrates tailored algorithms for identifying periods of prolonged high activity (most commonly used for identifying migratory flapping flight), and pressure changes (most commonly used for identifying dive or flight events). Finally, we highlight some of the limitations, implications and opportunities of using these methods.

Keywords: SOI-GDL3pam; behaviour; classification; clustering; embc; geolocator; hmm; k-means

Year: 2022 PMID: 35362103 PMCID: PMC9542251 DOI: 10.1111/1365-2656.13695

Source DB: PubMed Journal: J Anim Ecol ISSN: 0021-8790 Impact factor: 5.606

INTRODUCTION

Light‐level geolocators (GLS) revolutionised the study of animal behaviour. Thanks to their relatively low cost, geolocators were first deployed in the 1990s on marine species and integrated with depth and temperature sensors (hereafter TDR for time‐depth recorder) to study diving behaviour in species as diverse as whales (Hooker & Baird, 1999), seabirds (Naito et al., 1990), seals (Burns & Castellini, 1998), turtles (Witt et al., 2010) and fish (Musyl et al., 2003; West & Stevens, 2001). Further miniaturisation of GLS technology in the last 20 years led to the method rapidly becoming popular for the analysis of small migratory birds (Egevang et al., 2010; Lisovski & Hahn, 2012), with species as small as 10 g now tagged (Bridge et al., 2011). Miniaturisation of accelerometers and magnetometers in the last decade has led to their integration into GLS tags in addition to barometers and thermometers, creating a long‐lived (~1 year), lightweight (~0.5–1.5 g) and cheap (~$100) multisensor tag (hereafter PAM logger for Pressure, Accelerometer and Magnetometer). These PAM loggers have opened up new opportunities for analysing animal behaviour, particularly during flight (Bäckman, Andersson, Alerstam, et al., 2017; Dhanjal‐Adams et al., 2018; Liechti et al., 2013, 2018; Meier et al., 2018; Sjöberg et al., 2018, 2021) and have become increasingly popular over the last 5 years. Although an important reason behind the development of PAM loggers was to improve the accuracy of the geolocation estimates (Lisovski et al., 2020), these additional sensors have opened the door to a wide range of behavioural analyses beyond migration, and have allowed a wider range of species to be tagged. In particular, additional sensors allow nocturnal species such as mice, lemurs and bats to be tagged (pers. obs.).

Indeed, barometers allow us to explore behaviour in the third dimension (height and depth) and can inform on diving, climbing, flying, flocking and foraging behaviours (Dhanjal‐Adams et al., 2018; Dreelin et al., 2018; Meier et al., 2018; Sjöberg et al., 2018). Accelerometers can be used to understand resting behaviour, migration timing, and how long animals remain active, airborne or foraging (Liechti et al., 2018; Sjöberg et al., 2018). Thermometers can inform on habitat usage (Edwards et al., 2016; Shaffer et al., 2005), fitness level and infection status, and magnetometers can be used to understand bearing and direction (Bidder et al., 2015). Light can be used to estimate geographic location (Frisius, 1544; Lisovski & Hahn, 2012; Shaffer et al., 2005) but also, importantly, to understand diving, flying and nesting behaviour (Bulla et al., 2016). Although all of these sensors have previously been integrated into GPS tags, with an increasing number of methods available for identifying behavioural states using such sensors, these analyses rely heavily on (a) movement data and precise location estimates to infer behaviour from turning angles (e.g. Garriga et al., 2016; Munden et al., 2019; Potts et al., 2018; Seidel et al., 2018; Williams et al., 2020), (b) multi‐second tri‐axial acceleration and bearing (Bidder et al., 2015; Hernández‐Pliego et al., 2017; Willener et al., 2016; Williams et al., 2017) and/or (c) validation datasets for supervised machine learning (Leos‐Barajas et al., 2017; Resheff et al., 2014). PAM loggers, however, (a) cannot provide spatial information to infer turning angles. Furthermore, due to the weight restrictions of using them on smaller species, they (b) can only record data at intervals of minutes or hours (not milliseconds), and data are often summarised to save space and reduce tag weight. Finally, (c) they are most commonly deployed on flying or diving animals that are physically impossible to follow, making training datasets impossible to collect for supervised machine learning. Behavioural analyses of multisensor data therefore differ conceptually from any previously developed behavioural classification methods for movement data, because behaviour must be identified independent of location.

Here, we introduce pamlr, a toolbox for identifying behavioural patterns from Pressure, Acceleration, temperature, Magnetic or Light data in R (R Core Team, 2019). The package combines functions (Figure 1) for importing data from multisensor geolocator tags (from any combination of these sensors), functions for calculating and plotting data, and wrappers for different classification algorithms (changepoint, clustering and hidden Markov models) to infer behaviour. We also introduce functions specifically developed for identifying endurance activities, sustained pressure changes, and for collecting summary statistics. Finally, pamlr includes functions for comparing the agreement between different model outputs. Fully worked and up‐to‐date example data and analyses of the code presented in this paper can be accessed at https://kiranlda.github.io/PAMLrManual/.

FIGURE 1

Example of a typical workflow in pamlr, with the functions available for each step of the analysis

FROM SENSOR READINGS TO BEHAVIOURAL PATTERNS

For all loggers, data intervals and sensors are logger dependent and customisable, requiring trade‐offs between logging frequency, weight and battery life. As technology progresses data are likely to increase in resolution, but we present here some of the most common sensors and their uses, and how they can be used within pamlr.

Pressure

Pressure is available on a wide range of tags (Table 1) and can be informative because it changes throughout the day and across the globe in response to changes in the weather. Of particular interest to behavioural analyses is that pressure also decreases with height (International Organization for Standardization, 1975). Attaching a barometer to an animal therefore allows us to use peaks in pressure to quantify diving behaviour (such as depth and duration), while dips in pressure can be used to quantify flight or climbing behaviour (such as height and duration; Liechti et al., 2018; Tremblay et al., 2009). Indeed, standard calculations exist to estimate altitude, saltwater depth and freshwater depth from pressure recordings (International Organization for Standardization, 1975). These standard calculations are included as the functions calculate_altitude and calculate_depth within pamlr. The precision of the height estimate from a barometer can range from 1.4 to 85.29 m assuming pressure was 1,000 Pa greater than sea level pressure (Dreelin et al., 2018; Shipley et al., 2018). It is therefore advisable to calibrate pressure on the tag before or after deployment in a known location with a known altitude and atmospheric pressure for terrestrial uses, and known depth and water pressure for aquatic uses. Finally, pressure is also useful for refining geolocation estimates by correlating pressure data from the animal to weather data (Lisovski et al., 2020).
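As an illustration of the kind of standard calculation referred to above, the following Python sketch converts pressure to altitude with the standard-atmosphere barometric formula and to depth with the hydrostatic relation. It is not pamlr's R code: the function names altitude_m and depth_m, the choice of pascals as units and the assumed seawater density of 1025 kg/m^3 are illustrative assumptions.

```python
# Standard-atmosphere constants (ISO 2533:1975); variable names are illustrative.
P0 = 101325.0      # sea-level pressure, Pa
T0 = 288.15        # sea-level temperature, K
LAPSE = 0.0065     # temperature lapse rate, K/m
G = 9.80665        # gravitational acceleration, m/s^2
M = 0.0289644      # molar mass of dry air, kg/mol
R = 8.31432        # universal gas constant used by the standard, J/(mol K)


def altitude_m(pressure_pa, p_ref=P0, t_ref=T0):
    """Altitude above the reference level from the barometric formula."""
    exponent = R * LAPSE / (G * M)          # about 0.1903
    return (t_ref / LAPSE) * (1.0 - (pressure_pa / p_ref) ** exponent)


def depth_m(pressure_pa, surface_pa=P0, density=1025.0):
    """Water depth from hydrostatic pressure.

    density is in kg/m^3; 1025 is an assumed seawater value,
    roughly 1000 would be used for fresh water.
    """
    return (pressure_pa - surface_pa) / (density * G)
```

With these constants a reading of 90,000 Pa corresponds to roughly 990 m above the reference level. Passing a locally measured reference pressure as p_ref is one simple way to apply the calibration recommended above.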

TABLE 1

Summary of some of the available multisensor PAM and TDR loggers

Tag	Light‐sensor	Barometer	Thermometer	Conductivity (wet/dry)	Raw accelerometer data stored	On‐board activity calculated	On‐board pitch calculated	Magnetometer	Weight
SOI‐GDL3pam Swiss Ornithological Institute	✓ 5 min	✓ 15 min	✓ 15 min		✓ 4 hr	✓ 5 min	✓ 5 min	✓ 4 hr	1.3 g
Intigeo Migratetech	✓	✓	✓	✓		✓	✓		Dependent on sensor combination
Activity logger Centre for Animal Movement Research (CAnMove)	✓ Intermittent	✓ 1 hr	✓ 1 hr			✓ 1 hr			1.2 g
Cefas G7	✓	✓ Water‐proof	✓	✓	✓			✓	16.7 g
Cefas G5	✓	✓ Water‐proof	✓	✓					2.7 g
Lotek ArcGeo	✓	✓ Water‐proof	✓	✓					3.4 g
Lotek LAT	✓	✓ Water‐proof	✓	✓					6.2–10 g
Lotek MK	✓		✓	✓					0.38–2.9 g
Lotek Flight	✓								0.3 g

Temperature

Many tags also record temperature. One primary advantage of using temperature in an analysis is that it is available on most geolocator tags including some of the smaller ones (<0.5 g) which do not record pressure (lightweight Intigeo and Lotek MK). Indeed, temperature fluctuates daily and regionally, and with height and depth similarly to pressure. However, temperature does not decrease as much as pressure with altitude making it much harder to analyse. In addition, body heat and feathers can bias measurements because the sensor is recording a mix of ambient temperature and the animal's temperature. To minimise this bias, some tags include two sensors, one under and another on top of the device, to capture the temperature of the animal and the temperature of the atmosphere. Another alternative is to attach the logger to an area that is not likely to be covered in fur or feathers. For instance, TDR loggers are often attached to seabird legs to record sea surface temperature. Furthermore, temperature readings can be used to refine location estimates (Halpin et al., 2021).

Activity

Accelerometers are rapidly developing and changing the face of animal behavioural research. Many methods are now available for analysing ultra‐high‐resolution tri‐axial acceleration data and/or supervised machine learning methods (Leos‐Barajas et al., 2017; Resheff et al., 2014; Wang et al., 2015; Williams et al., 2017; Wilson et al., 2018). Although state‐of‐the‐art, such methods perform poorly with low‐resolution accelerometer data because they rely on fine‐scale patterns for classification, and on metrics such as vectorial dynamic body acceleration (VeDBA; Qasem et al., 2012). Such data are not collected by lightweight PAM loggers. Instead, PAM loggers have on‐board algorithms which calculate summary statistics such as activity and pitch, allowing the tag to record for a full year or more at 5 min to 1 hr intervals (for full details see Bäckman, Andersson, Alerstam, et al., 2017; Bäckman, Andersson, Pedersen, et al., 2017; Liechti et al., 2013, 2018). Currently two on‐board algorithms exist for estimating dynamic acceleration or ‘activity’. In the first, 50 values are sampled every 5 min at 100 Hz and used to determine whether the animal was active or not, yielding a score of 0 or 1 for the 5 min period. All 5 min scores are then summed to yield an hourly activity value between 0 (inactive) and 12 (active for 60 min; Bäckman, Andersson, Alerstam, et al., 2017; Bäckman, Andersson, Pedersen, et al., 2017). The second method samples 32 values every 5 min at 10 Hz to estimate pitch (the position of the body axis relative to the horizontal plane), and uses the sum of the absolute differences between consecutive points along the z‐axis to estimate a 5 min ‘activity’ value (Liechti et al., 2013, 2018). In some rare cases, raw tri‐axial accelerometer data are stored on PAM loggers, in which case pamlr integrates the function calculate_triaxial_accelerometer to calculate roll, pitch and yaw from these tri‐axial data (Bidder et al., 2015). However, the best data resolution currently available is every 4 hr, making such data of limited use, although there is potential for exploiting this capability in coming years as tags improve.
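The two on‐board summaries described above reduce to very small calculations. The Python sketch below illustrates them under stated assumptions: the function names are hypothetical, the rule that decides whether a 5 min period counts as active in the first algorithm is not reproduced (the sketch starts from the resulting 0/1 scores), and none of this is firmware or pamlr code.

```python
def activity_score(z_samples):
    """Second algorithm: sum of absolute differences between
    consecutive z-axis samples taken within one 5 min burst."""
    return sum(abs(b - a) for a, b in zip(z_samples, z_samples[1:]))


def hourly_activity(binary_scores):
    """First algorithm: sum twelve consecutive 0/1 five-minute scores
    into one hourly value between 0 (inactive) and 12 (active all hour)."""
    return [sum(binary_scores[i:i + 12])
            for i in range(0, len(binary_scores), 12)]
```

For example, activity_score([0, 2, 1, 1, 4]) adds the steps 2, 1, 0 and 3, and a sequence of twelve active periods followed by six inactive and six active periods gives hourly values of 12 and 6.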

Magnetic field

Magnetic field data are only recorded on devices which also have an accelerometer. Magnetometers can be used for estimating an animal's body posture and heading (Bidder et al., 2015). Furthermore, the magnetic field varies across the globe and can also be used to refine location estimates from light (Lisovski et al., 2020). Currently, tri‐axial magnetic data are only recorded every 4 hr on the best of tags, limiting their usefulness for behavioural analyses until PAM loggers increase in data storage capacity. Because magnetic recordings can be distorted by the presence of ferrous materials or magnetism near the sensor, pamlr integrates a function calculate_triaxial_magnetic following the calibration methods of Bidder et al. (2015).

Light

Beyond geolocation and the estimation of large‐scale foraging or migratory movements (Frisius, 1544; Lisovski & Hahn, 2012; Shaffer et al., 2005), light sensors have also proven useful for monitoring incubation behaviour and nest success (Bulla et al., 2016), and can generally be used to understand the behaviour of any species that enters and exits a burrow, nest box or cave during daylight hours. In the context of aquatic species, light decreases with depth and can be used to better understand diving behaviour and visibility (van Dam & Diez, 1997). One of the primary advantages of using light for analyses is that these tags are manufactured by many companies, are lightweight and cheap, and are well established, with a wide range of available analysis methods (Table 1; Lisovski et al., 2020).

ANALYSING MULTISENSOR DATA

Note that we do not describe the analysis of time‐depth recorders (TDRs) in much detail, as these loggers have well‐developed software and packages available (Luque & Fried, 2011; Tremblay et al., 2003), although their data can be analysed in pamlr using the function convert_tdr. Here, we instead focus on PAM loggers, which share many sensors with TDRs (light, pressure and temperature) but also contain the accelerometer and magnetometer sensors used to derive activity and pitch. Throughout the manuscript, we illustrate the use of pamlr using the example of a Hoopoe (Upupa epops) fitted with a SOI‐GDL3pam logger in the Valais region of Switzerland in August 2016 (exact location not given for conservation purposes) and tracked over the course of a year. With this example, we outline (a) how to import the data, (b) how to visualise, plot and explore the data, (c) what approaches can be taken to format the data for analyses, (d) which analysis methods are available and finally (e) how these methods can be compared.

Step 1: Data import

Currently pamlr is set up to read files with the following extensions: ‘.pressure’, ‘.glf’, ‘.gle’, ‘.acceleration’, ‘.temperature’, ‘AirTemperature’, ‘BodyTemperature’ and ‘.magnetic’, where each file contains a dataframe with date and time in one column and the associated sensor measurements in the other. Additionally, the first six lines of the file describe the Geolocator ID, Starttime RTC, StoptimeRTC, Stoptime reference and Terminal version. Users whose data do not follow this format can either format and save the data accordingly before import into R, or import the data and format them within R. The create_import function takes the path of the folder containing all the sensor files and returns a nested list containing all the measurements (Box 1). Users can access example datasets. We encourage anyone using logger data which cannot be read by pamlr to contact us through https://github.com/KiranLDA/PAMLr/issues so that we may accommodate different data inputs.
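For users who need to reshape their own files into this layout, the following Python sketch shows one way of reading a file with six metadata lines followed by two columns (date and time, then measurement). It is only an illustration of the layout described above: the tab delimiter, the timestamp format and the helper name read_sensor_file are assumptions, and the sketch is unrelated to how create_import is written.

```python
import csv
import io
from datetime import datetime

HEADER_LINES = 6                      # metadata lines described in the text
TIME_FORMAT = "%d.%m.%Y %H:%M"        # assumed timestamp layout


def read_sensor_file(handle, delimiter="\t"):
    """Return (metadata_lines, [(datetime, float), ...]) from an open file."""
    lines = handle.read().splitlines()
    metadata = lines[:HEADER_LINES]
    records = []
    for row in csv.reader(lines[HEADER_LINES:], delimiter=delimiter):
        if len(row) < 2:              # skip blank or malformed lines
            continue
        records.append((datetime.strptime(row[0], TIME_FORMAT), float(row[1])))
    return metadata, records


# Made-up example content standing in for a '.pressure' file.
example = io.StringIO(
    "meta 1\nmeta 2\nmeta 3\nmeta 4\nmeta 5\nmeta 6\n"
    "01.08.2016 00:00\t965\n"
    "01.08.2016 00:15\t964\n"
)
metadata, pressure = read_sensor_file(example)
```

Keeping the metadata lines separate from the measurements makes it straightforward to write the data back out in the same layout once they have been reformatted.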

Step 2: Visualising the data using plot_… functions

For complete and up‐to‐date code and examples on how to visualise data in pamlr, users can access https://kiranlda.github.io/PAMLrManual/dataviz.html

Time series

Time series are a commonly used method of plotting biologging data (Figure 2a) and can be implemented in pamlr using the function plot_timeseries. This function is useful for a rapid view of the data. However, such plots can become noisy with large datasets, and pamlr therefore also offers interactive time‐series plotting using the function plot_interactive_timeseries. This function exploits the dygraphs R package (Vanderkam et al., 2018) to create interactive plots that allow the user to zoom into the data by right clicking and highlighting certain regions. A double click can be used to zoom out. The plots for different sensors are all synchronised to the same time period, so that the user can view the same time period over multiple sensors. A timeline at the bottom can be used to increase or decrease the time over which the data are observed.

FIGURE 2

Different visualisations of magnetic field data for alpine swift Tachymarptis melba. To gain an initial impression of the (a) raw data, it can first be plotted as an interactive time series. However, a great deal of insight can also be gleaned from plotting the data as (b) a sensor image. These suggest that resting periods should be easy to distinguish from others using mY as confirmed by (c) histograms and (d) 3D plots. Data can also be visualised without distortions with (e) an m‐sphere

Sensor images

Actograms are often used to plot activity over time at different hours of the day (Bäckman, Andersson, Pedersen, et al., 2017; Barras et al., 2021; Briedis et al., 2020; Evens et al., 2020). However, the same approach can be used to plot any sensor data, not just activity. For simplicity, we name these ‘sensor images’ (Figure 2b). Sensor images are a good place to start when thinking about analysing data, as they can give a rapid overview of the dataset. Plotting all sensors side by side is an important step for visualising data and developing an understanding of data patterns, and to start thinking about the behaviours that may be driving the observed patterns. This can be done using the function plot_sensorimage. In these plots, the data are summarised for each day over a 24 hr period on each row (x‐axis). The next day is on the row below, therefore all the days that the organism was tagged are stacked on top of each other for a year (y‐axis). This allows us to see how sensor measurements change throughout the day and whether these patterns are consistent from day to day, and throughout the year.
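The layout of a sensor image amounts to reshaping a regular time series into a matrix with one row per day and one column per time‐of‐day slot. The Python sketch below shows that reshaping only (plotting is left out); the function name sensor_image and the use of None for missing readings are assumptions made for illustration and do not describe plot_sensorimage internals.

```python
from datetime import datetime, timedelta


def sensor_image(times, values, interval_min):
    """Arrange a regular time series as rows of days and columns of time of day."""
    slots = 24 * 60 // interval_min                    # columns per day
    first_day = min(times).date()
    n_days = (max(times).date() - first_day).days + 1
    image = [[None] * slots for _ in range(n_days)]    # None marks missing data
    for t, v in zip(times, values):
        row = (t.date() - first_day).days              # day since first record
        col = (t.hour * 60 + t.minute) // interval_min # slot within the day
        image[row][col] = v
    return image


# Two days of made-up readings taken every 6 hours.
start = datetime(2016, 8, 1, 0, 0)
times = [start + timedelta(hours=6 * i) for i in range(8)]
image = sensor_image(times, list(range(8)), interval_min=360)
```

Each row of the result can then be drawn as one line of coloured cells, with consecutive days stacked beneath one another.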

Histograms and 3D plots

Histograms and 3D plots can also help with data interpretation by visualising data clusters. Indeed differences in sensor data may be due to different behaviours, and clustered data will be easier to classify using a clustering algorithm. The functions plot_histogram and plot_interactive_3d respectively allow the user to visualise the data (Figure 2c,d). Finally, tri‐axial magnetic bearing and acceleration can be plotted onto an m‐sphere (Williams et al., 2017) or g‐sphere (Wilson et al., 2016), using the function plot_interactive_sphere (Figure 2e).

Step 3: Data formatting using create_… functions

Once data are plotted and the user is familiar with the patterns present, pamlr provides a suite of functions for formatting the data in a meaningful way. For instance, data from different sensors are often collected at different temporal resolutions, and create_custom_interpolation formats data to the same time intervals as a specified variable (e.g. pressure). The function also has options for summarising finer resolution data (median, sum or snapshot) and for interpolating (if desired) lower resolution data. This can be helpful for plotting. However, interpolation is not advisable for data analysis, particularly when there are a large number of missing data points, as it can create artefacts in the data and lead to false analyses and interpretation. A better alternative for formatting data for analysis when datasets are at different resolutions is to use a rolling window with create_rolling_window, which progresses across all the time series and creates summary statistics for the data contained within a window of a given duration. These include standard deviation, cumulative sum, minimum, maximum, range and sum of absolute differences, variables which come in handy during the classification process (Sakamoto et al., 2009). In addition, the function create_summary_statistics extracts specific patterns from the data into events. These behavioural patterns include (a) continuous high activity which can be extracted from the data using the method “flap”, (b) sustained activity (low and high) using “endurance”, (c) a pressure change greater than the background pressure changes due to weather using “pressure”, (d) a period of continuous light using “light”, (e) a period of darkness using “darkness” and finally (f) periods of resting using “rest”. Each method also calculates summary statistics for each event. These include, but are not limited to, how much the animal changed height during the event, how active it was during that event, whether it was night or day during that event, how long the event lasted, how many other similar events occurred during the same day, how long these events lasted overall that day and whether pressure at the start of the event was different from pressure at the end. For a full list of summary statistics, please refer to https://kiranlda.github.io/PAMLrManual/dataprep.html.
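To make the rolling‐window idea concrete, the Python sketch below slides a window of fixed length over a series and records the statistics listed above for each position. It is a minimal sketch rather than the behaviour of create_rolling_window: the function name, the dictionary keys, the use of the sample standard deviation and a window counted in readings rather than in time are all assumptions.

```python
from statistics import stdev


def rolling_summaries(values, window):
    """Summary statistics for each full window of `window` consecutive values
    (window must be at least 2 for the standard deviation to be defined)."""
    out = []
    for start in range(len(values) - window + 1):
        w = values[start:start + window]
        out.append({
            "sd": stdev(w),                    # sample standard deviation
            "sum": sum(w),                     # cumulative sum over the window
            "min": min(w),
            "max": max(w),
            "range": max(w) - min(w),
            "sum_abs_diff": sum(abs(b - a) for a, b in zip(w, w[1:])),
        })
    return out
```

Because every sensor is summarised over the same windows, the resulting statistics line up in time without any interpolation of the raw readings.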

Step 4: Classifying behaviour using classify_… functions

One of the complexities of classifying PAM data into behavioural states is that these data are often (but not exclusively) collected by archival tags on small species that are released and recaptured. In such cases, the user is unable to observe the species while it is diving, flying or migrating, making it impossible to validate behavioural classifications. PAM data analyses must therefore be undertaken with care. There are two main approaches that can be taken. The first is to develop a hierarchical decision‐based algorithm. This approach is more subjective and relies on an understanding of the species' ecology and behaviour, and on exploiting this knowledge in the classification. The user can develop algorithms that extract patterns of interest using a series of meaningful decision rules (e.g. Chakravarty et al., 2019; Liechti et al., 2018). pamlr already integrates two such algorithms (see Sections 3.4.1 and 3.4.2) for classifying high endurance activities and sustained pressure changes. The second approach relies on the machine performing the classification unsupervised. This approach is considered more objective, yet still requires some understanding of the species' ecology and behaviour when deciding what data are classified, and how. Care must be taken in the process, otherwise the machine can output results that are difficult to interpret due to the ‘black box’ effect. To this end, create_summary_statistics and create_rolling_window are useful for generating commonly used statistical summaries for data classification (Sakamoto et al., 2009).

Classification of endurance activity

pamlr integrates an algorithm for identifying periods of sustained high activity. This is one of the most common and useful applications of pamlr, as it can be used to identify periods of migratory flapping flight in passerines (Barras et al., 2021; Briedis et al., 2020; Evens et al., 2020). This functionality is included in the function classify_flap (Figure 3). The function first differentiates between inactive and active periods. The active periods are then grouped into low and high activity using either k‐means clustering or hidden Markov models. Finally, periods of sustained high activity are identified and formatted into a timetable with the start, stop and duration of each endurance event. This is similar to outputs from the function changeLight in the GeoLight package (Lisovski et al., 2020; Lisovski & Hahn, 2012). However, changeLight uses variations in daylight hours to calculate migration, while pamlr uses activity. The estimated migratory timetable is therefore much more precise (within 5 min) than that estimated from light alone (resolution of 1 day; Bäckman, Andersson, Alerstam, et al., 2017; Liechti et al., 2018; Sjöberg et al., 2018).

FIGURE 3

Schematic representation of the classify_flap algorithm for classifying flapping migratory behaviour. Activity (a) is first divided into inactive and active. Active data are then clustered to define a threshold (thld) between low and high activity. For each high activity event, its duration durA is calculated. If this duration is greater than a user‐defined time t (set to 1 hr by default) then the hoopoe is assumed to be performing migration
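The steps in Figure 3 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the classify_flap implementation: the function names are hypothetical, the low/high split uses a plain one‐dimensional two‐cluster k‐means (the hidden Markov model option is not shown), and events are returned as index positions rather than timestamps.

```python
def two_means_threshold(values, iterations=50):
    """Split 1-D values into low/high clusters (k-means with k = 2) and
    return the midpoint between the two cluster centres."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_grp = [v for v in values if abs(v - low) <= abs(v - high)]
        high_grp = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_grp:                       # all values identical
            break
        low, high = sum(low_grp) / len(low_grp), sum(high_grp) / len(high_grp)
    return (low + high) / 2.0


def sustained_high_activity(activity, interval_min=5, min_duration_min=60):
    """Return (start_index, end_index, duration_min) for each run of high
    activity lasting longer than `min_duration_min`."""
    active = [a for a in activity if a > 0]          # step 1: drop inactive
    if not active:
        return []
    threshold = two_means_threshold(active)          # step 2: low vs high
    events, start = [], None
    for i, a in enumerate(list(activity) + [0]):     # sentinel closes last run
        if a > threshold and start is None:
            start = i
        elif a <= threshold and start is not None:
            duration = (i - start) * interval_min
            if duration > min_duration_min:          # step 3: keep long runs
                events.append((start, i - 1, duration))
            start = None
    return events
```

With 5 min data and the default of 60 min, thirteen consecutive high‐activity readings (65 min) are reported as one event, whereas a 20 min burst of equally high activity is discarded.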

Classification of pressure changes

Any variations in pressure that are greater than expected from weather can reliably be classified as diving (Luque & Fried, 2011) or flying behaviour (Dhanjal‐Adams et al., 2018). pamlr therefore integrates the function classify_pressurechange aimed at identifying such periods. Indeed, although activity can be a good classification parameter for some species, for species that travel large distances without exerting much energy (e.g. Williams et al., 2020) pressure can be a useful alternative for identifying flight or dive events through changes in height (Dreelin et al., 2018; Shipley et al., 2018). This function simply finds periods where pressure change is greater than pressure fluctuations expected by weather, and also outputs a timetable with the start, stop and duration of the event.
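A minimal Python sketch of this idea is given below: consecutive readings whose change exceeds a threshold are grouped into events and returned as a timetable. The threshold of 2 (in the units of the input, e.g. hPa per reading), the 15 min interval and the function name are illustrative assumptions and not the defaults of classify_pressurechange.

```python
def pressure_change_events(pressure, threshold=2.0, interval_min=15):
    """Return (start_index, end_index, duration_min) for runs of readings whose
    change from the previous reading exceeds `threshold`."""
    # The first reading has no predecessor, so it is never flagged.
    changing = [False] + [abs(b - a) > threshold
                          for a, b in zip(pressure, pressure[1:])]
    events, start = [], None
    for i, flag in enumerate(changing + [False]):    # sentinel closes last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            events.append((start, i - 1, (i - start) * interval_min))
            start = None
    return events
```

In practice the threshold would be chosen from the size of the pressure fluctuations seen while the animal is known to be stationary, so that weather alone does not trigger an event.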

Unsupervised classification methods: classify_… functions

Although PAM loggers have primarily been developed for migratory passerines (Bäckman, Andersson, Alerstam, et al., 2017; Briedis et al., 2020; Dhanjal‐Adams et al., 2018; Evens et al., 2020; Liechti et al., 2018; Sander et al., 2021; Sjöberg et al., 2018), they can be used with any species whose behaviour is likely to be detected by sensors. In such cases, it is possible for the user to develop their own classification using the classify_changepoint and classify_summary_statistics functions described in the following sections. For examples of how this can be done, illustrated code is available online for classifying flap‐gliding and soar‐gliding flight at https://kiranlda.github.io/PAMLrManual/swift.html and https://kiranlda.github.io/PAMLrManual/soar.html respectively.

Changepoint analysis

Changepoint analyses are implemented in pamlr using the function classify_changepoint. They are used to find the point in a time series when there has been a change in the mean and/or variance of the data. By default, the function is parameterised to find changepoints in pressure data because these can be used to identify the start and end of migration periods in birds. However, the classify_changepoint function is simply a wrapper for the R package changepoint (Killick & Eckley, 2014) and can flexibly be used to find changepoints in any time series. The user can therefore customise the function to fit their needs by modifying whether they are looking for a change in mean, (cpt.method = “mean”), variance (cpt.method = “variance”) or both (cpt.method = “meanvar”) for any sensor or combination of sensors. Note that this function returns points in time, and that the number of changepoints to be identified in the data can either be automatic or user‐defined. For full details please refer to the changepoint R package manual (see Killick et al., 2016).
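The principle of a change in mean can be illustrated with a deliberately simple Python sketch that finds a single changepoint by minimising the within‐segment sum of squares. This is not the algorithm used by the changepoint package, which relies on penalised likelihoods and can search for multiple changepoints; the function name and the restriction to one changepoint are assumptions made to keep the example short.

```python
def single_changepoint_in_mean(values):
    """Index k (1 <= k < n) that best splits `values` into two segments with
    different means, by minimising the total within-segment sum of squares."""
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, len(values)):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

Note that, without a penalty term, this sketch always returns a split even when the series contains no real change, which is one reason penalised methods are preferred in practice.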

Cluster analysis

In contrast to finding a point in time where the data have changed, clustering algorithms aim to group points together and assign them to different groups or clusters. These clusters can be used to separate different behaviours, for example, classifying rapid and slow changes in altitude into clusters. Clustering algorithms group points that are more similar to each other based on a specified criterion, and a number of clustering methods exist which differ in how this criterion is defined and in how data points are sorted into clusters. These clustering algorithms can be accessed through the function classify_summary_statistics using the parameter method. One of the most established clustering methods is k‐means clustering, which minimises the within‐cluster sum of squares of the points (Hartigan & Wong, 1979) and which can be implemented in pamlr by using the method “kmeans” in classify_summary_statistics. In this case, pamlr is a wrapper for the function kmeans from the base R package stats. Note that the user must define the number of clusters. More recently, expectation–maximisation binary clustering (EMbC) has been used for high‐resolution behavioural data analysis (Garriga et al., 2016). The method uses the maximum likelihood estimation of a Gaussian mixture model to assign the data into clusters (Garriga et al., 2016). More specifically, binary delimiters are used to segregate the data along an axis, forcing centroids to lie within these binary regions. In short, the method clusters data points based on geometry (Garriga et al., 2016). Analysis can be undertaken in pamlr using the method “embc”. Here, pamlr is a wrapper for the function embc from the R package EMbC, and the user can refer to the manual for more information (Garriga et al., 2016). Note that the method also estimates the number of clusters and that these cannot be user‐defined.
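For readers unfamiliar with k‐means, the following Python sketch shows the alternating assignment and update steps that reduce the within‐cluster sum of squares. It uses Lloyd's iteration for brevity, whereas the kmeans function in R's stats package defaults to the Hartigan‐Wong algorithm; the function name, the requirement to supply starting centres and the example values are assumptions for illustration only.

```python
def kmeans(points, centres, iterations=100):
    """Lloyd's algorithm. `points` and `centres` are lists of equal-length
    tuples; returns (labels, centres)."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centres = list(centres)               # do not modify the caller's list
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centre.
        new_labels = [min(range(len(centres)), key=lambda j: dist2(p, centres[j]))
                      for p in points]
        if new_labels == labels:          # converged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(len(centres)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:                   # leave empty clusters where they are
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


# Made-up (pressure change, activity) pairs forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = kmeans(points, centres=[(0, 0), (10, 10)])
```

Because variables such as pressure and activity are measured on different scales, it is usually worth standardising them before clustering so that no single sensor dominates the distance calculation.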

Hidden Markov models

Hidden Markov models (HMMs) also allocate classes to time‐series data. However, they are stochastic time‐series models (see Visser & Speekenbrink, 2010; Zucchini et al., 2017) that assume that the observed time series (such as the measured acceleration, temperature or pressure) is driven by an unobservable state process (such as diving, flying, walking or resting behaviour). The unobserved states are allocated in a way that captures as much as possible of the marginal distribution of the observations, while also accounting for the correlation structure of the data. Thus, the probability of the system being in a state at time t depends on the state at the previous time step t − 1, but is otherwise independent of any previous state. HMMs are therefore powerful tools for the analysis of behavioural data, and can be implemented in pamlr using method = “hmm”. In this case, pamlr is wrapping the depmix, posterior and fit functions from the depmixS4 package whereby users can refer to the user manual to better understand how the package works and customise it for their application (see Visser & Speekenbrink, 2010). Note that users must define the number of behavioural states to find in the data.
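The dependence of each state on the previous one can be made concrete with a small Python sketch of Viterbi decoding for a Gaussian HMM. It assumes the model parameters are already known, whereas depmixS4 estimates them from the data; the function names and the parameter values in the example are hypothetical and chosen only so that the two states are easy to tell apart.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def viterbi(obs, means, sds, trans, init):
    """Most likely state sequence for a Gaussian HMM with known parameters."""
    n_states = len(means)
    # Log-probability of the best path ending in each state at time 0.
    score = [math.log(init[s]) + gaussian_logpdf(obs[0], means[s], sds[s])
             for s in range(n_states)]
    back = []
    for x in obs[1:]:
        # Best previous state for each current state.
        prev = [max(range(n_states), key=lambda r: score[r] + math.log(trans[r][s]))
                for s in range(n_states)]
        score = [score[prev[s]] + math.log(trans[prev[s]][s])
                 + gaussian_logpdf(x, means[s], sds[s]) for s in range(n_states)]
        back.append(prev)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]


# State 0 = resting (activity near 0), state 1 = flying (activity near 10).
path = viterbi([0.1, -0.2, 0.3, 9.8, 10.1, 9.9],
               means=[0.0, 10.0], sds=[1.0, 1.0],
               trans=[[0.9, 0.1], [0.1, 0.9]], init=[0.5, 0.5])
```

The transition matrix is what distinguishes this from clustering: a high probability of remaining in the same state discourages the decoded sequence from flipping state on a single unusual reading.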

Step 5: Measuring classification accuracy with compare_… functions

As seen in Box 3, classifications can agree in some regions and disagree in others. pamlr offers a function compare_classification which takes multiple classification outputs and summarises the agreement between them, as seen in Box 4. Where the user is unsure which approach will work best for classifying their data, we recommend an ‘ensemble’ approach: use all classification methods and look for the overlap between them. The function compare_confusion_matrix also populates a confusion matrix using predicted and reference points. If no reference data are available, the agreement between two different classifications can instead be compared following standard confusion matrix metrics (Congalton & Green, 2008), written here in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Errors of Commission measure false positives, that is, the proportion of points predicted to be part of a class that they do not belong to, FP/(TP + FP). Errors of Omission measure false negatives, that is, the proportion of points belonging to a class that were missed because they were predicted to be in a different class, FN/(TP + FN). Producer Accuracy or Recall is the probability that a point belonging to a class was not missed by the classification, TP/(TP + FN). User Accuracy or Precision is the probability that a point predicted to be in a class truly belongs to it, TP/(TP + FP). Overall Accuracy is the proportion of all points that were correctly classified, (TP + TN)/(TP + TN + FP + FN). Finally, the kappa coefficient measures the agreement between the classification and the truth after discounting the agreement expected by chance, Pe = ((TN + FP)(TN + FN) + (FN + TP)(FP + TP))/(TP + FP + TN + FN)^2, so that kappa = (Overall Accuracy - Pe)/(1 - Pe).
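A worked example can help when checking these metrics by hand. The Python sketch below computes them for a two‐class confusion matrix using the standard definitions (commission error FP/(TP + FP), omission error FN/(TP + FN), and kappa as observed agreement corrected for chance agreement); the function name and dictionary keys are hypothetical and the sketch is not the output format of compare_confusion_matrix.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard accuracy metrics for a two-class confusion matrix."""
    n = tp + fp + tn + fn
    overall = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n ** 2
    return {
        "commission_error": fp / (tp + fp),
        "omission_error": fn / (tp + fn),
        "users_accuracy": tp / (tp + fp),       # precision
        "producers_accuracy": tp / (tp + fn),   # recall
        "overall_accuracy": overall,
        "kappa": (overall - expected) / (1 - expected),
    }


metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
```

For these counts the overall accuracy is 0.85 and the chance agreement is 0.5, giving a kappa of 0.7, while the commission error is 0.2 and the omission error is about 0.11.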

FOOD FOR THOUGHT

Is it really necessary to tag the animal?

Tagging is not only resource‐ and time‐intensive for scientists, it also comes at a cost to the animal that is being tagged. Many tags are archival, meaning the animal must be caught both when attaching and when removing the tag, raising ethical concerns around the stress caused to the animal and the potential for an increased likelihood of death. Indeed, there are cases where tags can compromise a species' camouflage, reduce its aero‐ or hydro‐dynamism, cause entanglement in nets, cause stress during breeding, or cause animals to abandon migration or die of exhaustion from carrying the additional weight. It is therefore the user's responsibility to ensure the research is meaningful and that the tag is being fitted safely and ethically (Brlík et al., 2020; McGowan et al., 2016; Mcmahon et al., 2011).

When did the logger stop recording?

Loggers can record data even when they are not attached to an animal. Often the logger is taken off, stored in a backpack, driven home or posted to a laboratory for download. Users should always ensure the analysis starts and stops when the logger was mounted on the study species, and that the behaviour being classified is not, for example, someone hiking to the field site. The function create_crop is designed specifically to remove these unwanted periods. Additionally, because animals can modify their behaviour just after tag attachment (see Section 4.1), these data should be treated with care or removed from the analysis.
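The idea of restricting every sensor record to the deployment period can be sketched in base R as follows. This is not the create_crop implementation: the function name crop_to_deployment, the layout of the data (a named list with one data frame per sensor, each with a POSIXct column called date) and the dates themselves are assumptions made for illustration.

```r
# Keep only records between tag attachment and tag removal, for every sensor.
# Illustrative sketch only; data layout and names are assumptions.
crop_to_deployment <- function(sensor_list, start, end) {
  lapply(sensor_list, function(df) {
    df[df$date >= start & df$date <= end, , drop = FALSE]
  })
}

# Toy logger data: one record per day over 10 days, for two sensors
days <- seq(as.POSIXct("2016-07-01 00:00:00", tz = "UTC"),
            by = "day", length.out = 10)
logger <- list(
  pressure    = data.frame(date = days, obs = 960 + seq_along(days)),
  temperature = data.frame(date = days, obs = 20 + seq_along(days))
)

# Hypothetical attachment and removal times
attached <- as.POSIXct("2016-07-03 00:00:00", tz = "UTC")
removed  <- as.POSIXct("2016-07-08 00:00:00", tz = "UTC")

cropped <- crop_to_deployment(logger, start = attached, end = removed)
print(sapply(cropped, nrow))
```

Using a single start and end for all sensors keeps the cropped time series aligned, which matters when several sensors are classified jointly.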

Clock drift

As the battery runs down over the year, the clock on a logger can gradually slow down. There are a number of methods for correcting for this. Because the animal is always caught at a known location, it is best to find the true sunrise and sunset times for the location where the logger was fitted and for the location where it was removed, and to see by how many minutes the sunrise and sunset estimated from the light sensor differ from them. It is then possible to linearly interpolate the correction across the time series by the known number of minutes. This can be implemented using the function calculate_clockdrift.
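The linear interpolation step can be sketched in base R as follows. This is not the calculate_clockdrift implementation: the function name correct_clock_drift, its arguments and the sign convention (a positive offset means the logger clock is behind true time by that many minutes) are assumptions, and the offsets at fitting and removal are taken as already measured from the sunrise and sunset comparison.

```r
# Spread a known clock offset linearly over the recording period.
# Illustrative sketch only; names and sign convention are assumptions.
correct_clock_drift <- function(timestamps, end_offset_min, start_offset_min = 0) {
  # Fraction of the recording period elapsed at each record (0 at first, 1 at last)
  elapsed <- as.numeric(difftime(timestamps, timestamps[1], units = "secs"))
  frac <- elapsed / elapsed[length(elapsed)]
  # Offset in minutes interpolated linearly between the start and end values
  offset_min <- start_offset_min + frac * (end_offset_min - start_offset_min)
  timestamps + offset_min * 60  # POSIXct arithmetic is in seconds
}

# Toy example: hourly records, logger clock 4 minutes behind at removal
timestamps <- seq(as.POSIXct("2016-07-01 00:00:00", tz = "UTC"),
                  by = "hour", length.out = 5)
corrected <- correct_clock_drift(timestamps, end_offset_min = 4)

# Correction applied to each record, in seconds
shift_secs <- as.numeric(difftime(corrected, timestamps, units = "secs"))
print(shift_secs)
```

In this toy series the first record is left unchanged and the last is shifted by the full 4 minutes, with intermediate records shifted proportionally to the time elapsed.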

How can I be sure I am classifying biologically meaningful behaviours and not weather patterns?

Pressure, temperature, magnetic field and light can all change as a result of weather, geographic region and animal behaviour. To ensure one is not classifying weather patterns instead of behavioural patterns, it is advised to use a calibration period (as one would with classic geolocation) where the tag is at a known location to measure tag accuracy, and to measure the natural variation of weather patterns as detected by the sensor.

Where was the logger attached?

If an animal is wearing the tag on its back (e.g. bat; Voigt et al., 2020), its neck (e.g. meerkat; Chakravarty et al., 2019) or its leg (e.g. seabird; Halpin et al., 2021), there will be different implications for data interpretation. For instance, vocalisations can cause vibrations that are picked up by accelerometers worn on the neck (pers. obs.). Pitch, yaw and roll calculations can be impacted by the location where the tag is attached. Furthermore, even when attached at the same location on the same species, each logger will vary a little in its position and in how tightly it is attached, leading to differences in sensor readings between individuals. Therefore, classifications developed on one individual are not necessarily transferable to another individual, which can be problematic when going from unsupervised to supervised learning techniques.

What do I do if I encounter a bug in the code?

Any problems with the code or the package can be logged at https://github.com/KiranLDA/PAMLr/issues.

OUTLOOK

Here, we present functions adapted to the analysis of multisensor geolocator tags using an example of a migratory bird, the hoopoe (Upupa epops). However, many of the functions in pamlr are set up to be flexible and applicable to any species tagged with any combination of light, pressure, temperature, activity or magnetic field sensors, and can provide important information on the natural history, behaviour and physiology of any species. Furthermore, many multisensor geolocator tags are now customisable and purpose‐built by the manufacturer. The temporal resolution of, for instance, tri‐axial accelerometer and magnetometer recordings can therefore be increased at the cost of a shorter battery life, allowing more detailed and complex behavioural classifications to be performed over smaller time‐scales. Methods such as dead reckoning could then be used to reconstruct tracks (Bidder et al., 2015) and infer turning angles. This would also allow methods previously developed for finer resolution datasets to be applied (e.g. Bidder et al., 2015; Garriga et al., 2016; Potts et al., 2018). The collection of observation data would also allow for the development of supervised machine learning methods (Valletta et al., 2017). Multisensor geolocator tags therefore provide exciting new opportunities for analysing otherwise unseen behaviours in animals that were previously impossible to tag.

CONFLICT OF INTEREST

The authors have no conflict of interest to declare.

AUTHORS' CONTRIBUTIONS

K.L.D.‐A. led the writing of both package and manuscript. All authors contributed towards the conception and design, analysis and interpretation of data or drafting the article or revising it critically for important intellectual content.

27 in total

1. Weak effects of geolocators on small birds: A meta-analysis controlled for phylogeny and publication bias.

Authors: Vojtěch Brlík; Jaroslav Koleček; Malcolm Burgess; Steffen Hahn; Diana Humple; Miloš Krist; Janne Ouwehand; Emily L Weiser; Peter Adamík; José A Alves; Debora Arlt; Sanja Barišić; Detlef Becker; Eduardo J Belda; Václav Beran; Christiaan Both; Susana P Bravo; Martins Briedis; Bohumír Chutný; Davor Ćiković; Nathan W Cooper; Joana S Costa; Víctor R Cueto; Tamara Emmenegger; Kevin Fraser; Olivier Gilg; Marina Guerrero; Michael T Hallworth; Chris Hewson; Frédéric Jiguet; James A Johnson; Tosha Kelly; Dmitry Kishkinev; Michel Leconte; Terje Lislevand; Simeon Lisovski; Cosme López; Kent P McFarland; Peter P Marra; Steven M Matsuoka; Piotr Matyjasiak; Christoph M Meier; Benjamin Metzger; Juan S Monrós; Roland Neumann; Amy Newman; Ryan Norris; Tomas Pärt; Václav Pavel; Noah Perlut; Markus Piha; Jeroen Reneerkens; Christopher C Rimmer; Amélie Roberto-Charron; Chiara Scandolara; Natalia Sokolova; Makiko Takenaka; Dirk Tolkmitt; Herman van Oosten; Arndt H J Wellbrock; Hazel Wheeler; Jan van der Winden; Klaudia Witte; Bradley K Woodworth; Petr Procházka
Journal: J Anim Ecol Date: 2019-03-13 Impact factor: 5.091

2. Ecological metrics and methods for GPS movement data.

Authors: Dana Paige Seidel; Eric Dougherty; Colin Carlson; Wayne M Getz
Journal: Int J Geogr Inf Sci Date: 2018-07-23 Impact factor: 4.186

3. Tracking of Arctic terns Sterna paradisaea reveals longest animal migration.

Authors: Carsten Egevang; Iain J Stenhouse; Richard A Phillips; Aevar Petersen; James W Fox; Janet R D Silk
Journal: Proc Natl Acad Sci U S A Date: 2010-01-11 Impact factor: 11.205

4. Spatiotemporal Group Dynamics in a Long-Distance Migratory Bird.

Authors: Kiran L Dhanjal-Adams; Silke Bauer; Tamara Emmenegger; Steffen Hahn; Simeon Lisovski; Felix Liechti
Journal: Curr Biol Date: 2018-08-23 Impact factor: 10.834

5. Recursive filtering for zero offset correction of diving depth time series with GNU R package diveMove.

Authors: Sebastián P Luque; Roland Fried
Journal: PLoS One Date: 2011-01-28 Impact factor: 3.240

6. AcceleRater: a web application for supervised learning of behavioral modes from acceleration measurements.

Authors: Yehezkel S Resheff; Shay Rotics; Roi Harel; Orr Spiegel; Ran Nathan
Journal: Mov Ecol Date: 2014-12-25 Impact factor: 3.600

7. pamlr: A toolbox for analysing animal behaviour using pressure, acceleration, temperature, magnetic or light data in R.

Authors: Kiran L Dhanjal-Adams; Astrid S T Willener; Felix Liechti
Journal: J Anim Ecol Date: 2022-04-22 Impact factor: 5.606

8. Fat King Penguins Are Less Steady on Their Feet.

Authors: Astrid S T Willener; Yves Handrich; Lewis G Halsey; Siobhán Strike
Journal: PLoS One Date: 2016-02-17 Impact factor: 3.240

9. Miniaturized multi-sensor loggers provide new insight into year-round flight behaviour of small trans-Sahara avian migrants.

Authors: Felix Liechti; Silke Bauer; Kiran L Dhanjal-Adams; Tamara Emmenegger; Pavel Zehtindjiev; Steffen Hahn
Journal: Mov Ecol Date: 2018-10-02 Impact factor: 3.600
