Literature DB >> 35852382

Using piecewise regression to identify biological phenomena in biotelemetry datasets.

David W Wolfson¹, David E Andersen², John R Fieberg³.

Abstract

Technological advances in the field of animal tracking have greatly expanded the potential to remotely monitor animals, opening the door to exploring how animals shift their behaviour over time or respond to external stimuli. A wide variety of animal-borne sensors can provide information on an animal's location, movement characteristics, external environmental conditions and internal physiological status. Here, we demonstrate how piecewise regression can be used to identify the presence and timing of potential shifts in a variety of biological responses using multiple biotelemetry data streams. Different biological latent states can be inferred by partitioning a time-series into multiple segments based on changes in modelled responses (e.g. their mean, variance, trend, degree of autocorrelation) and specifying a unique model structure for each interval. We provide six example applications highlighting a variety of taxonomic species, data streams, timescales and biological phenomena. These examples include a short-term behavioural response (flee and return) by a trumpeter swan Cygnus buccinator following a GPS collar deployment; remote identification of parturition based on movements by a pregnant moose Alces alces; a physiological response (spike in heart-rate) in a black bear Ursus americanus to a stressful stimulus (presence of a drone); a mortality event of a trumpeter swan signalled by changes in collar temperature and overall dynamic body acceleration; an unsupervised method for identifying the onset, return, duration and staging use of sandhill crane Antigone canadensis migration; and estimation of the transition between incubation and brood-rearing (i.e. hatching) for a breeding trumpeter swan. We implement analyses using the mcp package in R, which provides functionality for specifying and fitting a wide variety of user-defined model structures in a Bayesian framework and methods for assessing and comparing models using information criteria and cross-validation measures. These simple modelling approaches are accessible to a wide audience and offer a straightforward means of assessing a variety of biologically relevant changes in animal behaviour.

© 2022 The Authors. Journal of Animal Ecology published by John Wiley & Sons Ltd on behalf of British Ecological Society. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.

Entities: Chemical

Mesh：

Year: 2022 PMID： 35852382 PMCID： PMC9540865 DOI： 10.1111/1365-2656.13779

Source DB: PubMed Journal: J Anim Ecol ISSN： 0021-8790 Impact factor: 5.606

INTRODUCTION

Recent technological advancements in the field of biotelemetry have greatly expanded the potential for remotely monitoring animals (Cagnacci et al., 2010; Hebblewhite & Haydon, 2010; Tomkiewicz et al., 2010). Historically, animals were tracked using very‐high‐frequency (VHF) telemetry, which requires a receiver in close proximity to an animal to triangulate its location (Craighead, 1982). Global positioning system (GPS) transmitters, now the standard for many wildlife studies, are small enough to be placed on bats and songbirds weighing less than 20 grams (Cvikel et al., 2015). In many applications, current GPS technology provides copious fine‐scale data over long study durations and spatial coverages. Other biotelemetry sensors such as accelerometers, temperature‐depth recorders, light‐level geolocators and heart‐rate monitors, also produce valuable, high‐frequency data on physiological and external conditions affecting animals (Cooke et al., 2004; Jonsen et al., 2007; Wilmers et al., 2015). Raw measurements (e.g. location, acceleration, body temperature) or derived metrics (e.g. net‐squared displacement for distance moved over time, overall dynamic body acceleration (ODBA) for energy expenditure: Wilson et al., 2006; Bunnefeld et al., 2011) can elucidate behavioural patterns (foraging, migration, predator/prey dynamics) and provide information about demographic parameters such as fecundity and survival; however, processing high‐volume data can be complex and time‐consuming (Brooks et al., 2019; Hamilton et al., 2017; Kramer et al., 2018). Flexible and widely accessible tools are necessary to facilitate efficient analysis of animal biotelemetry data to examine behaviour over time and response to external stimuli (Patterson et al., 2017). Analyses of biotelemetry data collected at a high frequency often begin by partitioning datasets into homogeneous segments that correspond to different behavioural states (Edelhoff et al., 2016). Some common segmentation methods include (1) clustering algorithms that minimize a cost function associated with statistical properties of a time series (e.g. the pruned exact linear time algorithm or the penalized contrasts method; Lavielle, 2005; Killick et al., 2012), (2) model‐based approaches that use moving windows to identify shifts in behavioural responses (e.g. behavioural change point analysis [BCPA]; Gurarie et al., 2009; or correlated velocity models; Gurarie et al., 2017) or (3) parametric state‐space or hidden Markov models (HMMs) that explicitly model transitions between latent behavioural states (Glennie et al., 2022; Patterson et al., 2008). Often, these methods require restrictive assumptions (e.g. data are Normally distributed or error and correlation structures are fixed), are computationally intensive, or are restricted to a single data type, such as GPS telemetry (Morelle et al., 2017). Here, we demonstrate how piecewise regression can be used to analyse a wide range of biotelemetry datasets, offering a flexible and user‐friendly approach with easily interpretable results.

Overview of piecewise regression and applications with the mcp package

Piecewise regression is a statistical method used to model ecological thresholds (Toms & Lesperance, 2003) and can be used to identify change points that signify potential shifts in the relationship between response and explanatory variables (Muggeo, 2003; Toms & Villard, 2015). Piecewise regression is an extremely flexible modelling framework due to the ability to specify unique model structures for each segment between change points. Transitions between segments can be smooth, if segments are joined (meaning segments share a common endpoint), or abrupt, if segments are disjoint (not sharing a common endpoint). Herein, we focus on a Bayesian formulation of piecewise regression following the formulation of Stephens (1994). Let be the realization of a sequence of random variables of length n, with a change point at . We can write the distribution of as where and are parameter vectors describing the data‐generating process before and after the change point, respectively. Values of can describe various characteristics of the response distributions, including their means, variances or correlation structures. Inference is made via the posterior distribution of , and , denoted by : where , and are prior distributions for the parameters describing the data‐generating process. This approach easily extends to multiple change points, with unique parameter vectors, , describing the likelihood of within each segment. Consider a simple example represented by separate intercepts before and after a single change point. Let r be the location of the change point along the x‐axis, where the expected value of , , changes from , to : and can represent combinations of additive or multiplicative effects of covariates. The R package mcp, developed by Lindeløv (2020), implements piecewise regression as described thus far and has broad applicability beyond ecological data. The mcp package uses indicator functions to map elements of a dataset (i.e. segments) to statistical models specific to each segment. For a dataset with as the response variable and as the predictor variable, a two‐intercept case is written as: Specifying a model structure with a joined segment is also straightforward: The ‘~0’ term specifies that the second segment shares the same intercept, whereas the ‘+ x’ term assigns a slope term associated with the variable such that the slope is 0 before the change point and afterwards. The mcp package allows custom model structures for each segment and provides a wide range of statistical distributions and link functions. Model parameters are given suitable vague prior distributions, although these can easily be altered (Lindeløv, 2020). Hierarchical models with random effects can be specified by allowing change points to covary within a group, thereby allowing the estimation of individual‐specific and population‐level change point parameters. Another advantage of the mcp package is the combination of a user‐friendly interface connected to a robust Bayesian modelling backend. The model structure for each segment uses the familiar syntax of the lme4 and brms packages, then is converted ‘on‐the‐fly’ to JAGS code, (Bates et al., 2014; Bürkner, 2017) and future versions of mcp will have the option of using Stan as a backend. The mcp package returns samples from the marginal posterior distributions for all change points and model parameters, which can be used to visualize model output and produce credible intervals (Gabry et al., 2019). The mcp package provides robust options for model assessment and comparison, hypothesis testing and data simulation. Markov‐chain Monte Carlo (MCMC) sampler performance can be checked via the Gelman–Rubin statistic, effective sample size and MCMC trace plots. Posterior predictive checks ensure that the fitted model is consistent with the data‐generating process (i.e. simulations from the fitted model resemble the original data; Gelman et al., 1996; Gelman & Shalizi, 2013), and the influence of the priors can be evaluated by comparing prior‐posterior overlap (Youngflesh, 2018). The predictive performance of multiple models can be compared with leave‐one‐out cross‐validation (LOO‐CV), either using WAIC (Widely Applicable Information Criterion) or ELPD (Expected Log Predictive Density), as estimated by the loo package (Gelman et al., 2014; Vehtari et al., 2017). If the ideal number of change points is not known a priori, LOO‐CV allows comparison of the predictive performance of multiple models to determine an optimal number or to compare different model structures and prior distributions. Additionally, hypothesis testing can be performed using point Bayes factors (i.e. the prior‐to‐posterior ratios associated with specific parameter values; Verdinelli & Wasserman, 1995; Wagenmakers et al., 2010); Bayes factors can also be used to compare models that represent alternative hypotheses and for model averaging (Hooten & Hobbs, 2015; Kass & Raftery, 1995).

EXAMPLE APPLICATIONS

We demonstrate the use of piecewise regression using the mcp package with six example applications highlighting a variety of taxonomic species, data streams, timescales and biological phenomena. All of our R code is publicly available at the Data Repository for University of Minnesota (https://doi.org/10.13020/qbha‐bs48).

Identification of altered behaviour post‐capture

Effects of capture, handling and transmitter deployment are not well understood for many species, especially in the period immediately post‐capture. Efforts to quantify animal response have predominantly focused on effects of the transmitter itself (i.e. the weight and aerodynamics of the unit, usually in birds: Evans et al., 2020) or on the physiological effects from chemical immobilization (typically in large mammals: Barron et al., 2010; Brivio et al., 2015; Thompson et al., 2020). Most researchers have sought to quantify effects on vital rates, such as survival and fecundity (Casas et al., 2015; DelGiudice et al., 2005; Lameris & Kleyheeg, 2017) or short‐term ethological responses such as changes in the time spent grooming (Kölzsch et al., 2016; Rachlow et al., 2014). Because capture and handling may result in a short‐term period of altered movement behaviour (Picardi et al., 2021), it is common to remove data from the first week or two post‐capture, although often without biological or empirical justification for the threshold used to filter the data. Arbitrarily filtering data without knowledge of the existence and duration of capture effects presents issues for movement‐related analyses. Removing data potentially discards useful information, which may already be scarce in studies using VHF telemetry or with small sample sizes (Girard et al., 2002). Alternatively, quantifying behavioural responses post‐capture can provide useful species‐specific information on altered behaviour and inform future studies (Dechen Quinn et al., 2012; Picardi et al., 2022). For example, Stabach et al. (2020) examined effects of GPS collar deployment on scimitar‐horned oryx Oryx dammah to quantify the short‐term responses in activity, behaviour, stress levels and the length of time before these effects subsided. The effects of capture and handling on movement may also provide insights into how individuals respond to risky situations (e.g. capture, predation). For example, animals may exhibit a ‘flee and return’ response, the strength of which may be indicative of their tolerance of risk as it relates to a fecundity‐survival trade‐off (Ghalambor & Martin, 2001; Montgomerie & Weatherhead, 1988). DelGiudice et al. (2015) documented that capture of moose Alces alces neonates caused some mothers to abandon their calves, and Obermoller et al. (2019) found that adult female moose frequently fled when their calves were depredated and then returned after the risk had subsided. These responses suggest that many adult female moose favour individual survival over protection of their young. Other studies have found similar ‘flee and return’ responses to human activity such as hunting or helicopter‐based capture (Jung et al., 2019; Thurfjell et al., 2013). In an ongoing study of trumpeter swan Cygnus buccinator movement ecology, we observed a swan leaving the capture area and returning a short time later (D. W. Wolfson, unpubl. data). To determine the prevalence of this behaviour, we needed an objective method for identifying ‘flee and return’ responses immediately post‐capture. By fitting a model with a single intercept (representing no flee) and another with two intercepts (representing a flee and return), we were able to compare these two models with cross‐validation. Figure 1 shows the results of a piecewise regression model for an individual trumpeter swan fit to the relationship between Net‐Squared Displacement (NSD) and time since capture:

FIGURE 1

Hourly net‐squared displacement measured from the point of release of a trumpeter swan Cygnus buccinator after collar deployment. Grey lines show 25 draws from the posterior distribution, with 95% credible intervals for the mean response shown as red dotted lines. The posterior distribution for the change point is shown in blue on the x‐axis. The NSD value of the first intercept quantifies how far the swan fled and the change point reveals when a return to the capture site occurred. This model, which has two segments with different intercepts and a single change point, had a better fit using LOO‐CV to estimate ELPD, a goodness‐of‐fit measure that estimates the predictive accuracy of a model while balancing the trade‐offs of bias and variance, than an alternative model with a single intercept, suggesting the presence of a flee response (Table 1).

TABLE 1

Model syntax	ELPD difference	SE of difference
Two intercepts	0.0	0.0
One intercept	−213.3	64.2

Leave‐one‐out cross‐validation used to compute the estimated log predictive density (ELPD) of two different models; one fit with a single intercept and the other with two intercepts separated by a change point. The higher ELPD for the model with two intercepts indicates that it has a higher predictive accuracy Protocols for capturing and marking trumpeter swans were approved by the University of Minnesota Animal Care and Use Committee (protocol no. 1905‐37072A) and authorized under permits from the Minnesota Department of Natural Resources (Special Permit no. 19017), the U.S. Fish and Wildlife Service (Research & Monitoring Special Use Permit no. K‐10‐001) and the U.S. Geological Survey Bird Banding Laboratory (Federal Bird Banding Permit no. 21631). All other example applications use data collected from other studies.

Pre‐parturition movement

Accurate estimation of vital rates and their contributions to population dynamics is an essential tenet for population management (Coulson et al., 2005). Signals in movement data can reveal important biological events such as parturition in large ungulates, which relate to fecundity. Although fecundity is a key vital rate, detection of the timing and location of parturition can be difficult, especially in ‘hider’ species that have low parental care of neonates (Lent, 1974; Ralls et al., 1986). Pregnant ungulates typically make a long‐distance movement immediately preceding parturition and then remain sedentary in areas that may have lower predation risk while still providing sufficient foraging opportunity (Berg, 2019; McGraw et al., 2014). Patterns in movement data thus offer a cost‐effective means for identifying parturition in radio‐collared ungulates (DeMars et al., 2013; Nicholson et al., 2019; Peterson et al., 2018). Although multiple methods have been used to infer parturition events, many are sensitive to parameter choices or involve visual observation of movement metrics, which can be time‐intensive (Dettki & Ericsson, 2008; Mohr et al., 2022). To demonstrate remote identification of parturition, we provide an example of quantifying movements of a pregnant moose (Figure 2). Severud et al. (2015) confirmed the parturition period by locating the twin calves shortly after their birth. In this case, the first segment is modelled with an intercept and slope of zero, representing typical moose movement, and the second segment is modelled with a change in the mean displacement (i.e. a different intercept) and also a change in variance (‘sigma’ in code below) that is representative of a shift to more sedentary movement at time of parturition.

FIGURE 2

Displacement (distance from each location to the location at the start of the observation period) for a pregnant moose Alces alces. The green area is the period identified by researchers as immediately preceding parturition. Grey lines represent 25 draws from the posterior distribution of the mean displacement. Red dotted lines depict 95% credible intervals for the mean displacement and green dotted lines depict 80% prediction intervals. The posterior distribution for the change point is shown in blue on the x‐axis. LOO‐CV of this model versus one without a change in variance indicated that this model is a better fit to the data, thus illustrating the flexibility in adapting model syntax for each segment that corresponds to the biological situation. Prediction intervals illustrate that the addition of a change in variance for the second segment provides increased precision, therefore allowing the model syntax to reflect the more sedentary behaviour (Figure 2). Parturition events can be monitored in real time for studies involving the capture of neonates (Figure 2; Obermoller et al., 2019) or identified post‐hoc for retrospective analyses of ungulate breeding and fecundity (Bonar et al., 2018; Long et al., 2009). Previous studies have used piecewise regression to identify change points indicative of ungulate parturition, but the mcp package can provide additional functionality such as informed priors for the change point, cross‐validation and simulation from the fitted model (Berg et al., 2021). Posterior predictive checks can be used to determine if the model can provide an adequate representation of the data (Gelman et al., 1995); data () are generated from the fitted model by simulating from the posterior predictive distribution and then compared to the observed data () (Hobbs & Hooten, 2015). We demonstrate this approach using the fitted model for moose movement (Figure 3). The two peaks at 3 and 12 km correspond to the two intercepts fit by the model, and overall, the distribution of the simulated and observed data are similar.

FIGURE 3

A kernel density posterior predictive check compares the distribution of observed outcomes (displacement in kilometres by a pregnant moose Alces alces), shown in the black line, against 50 distributions of replicated datasets produced by the fitted model, each shown as a light blue line.

Physiological response to drone fly‐over

Remotely piloted aircraft (hereafter drones) allow increased opportunity for remotely monitoring wildlife populations, and recent reductions in their cost have led to widespread adoption as an alternative to traditional aerial surveys (Watts et al., 2010). Drones equipped with remote sensors are now commonly used to estimate animal abundance in remote locations, collect fine‐grain aerial imagery and monitor poaching activities (Anderson & Gaston, 2013; Linchant et al., 2015). The use of drones can decrease costs, reduce the need for hazardous fieldwork and be coupled with computer vision methods to increase the quality and precision of data collection (Chabot & Francis, 2016; Hodgson et al., 2018; Seymour et al., 2017). Despite the advantages of drones, their presence can influence behaviour or elicit a physiological response in the study species, especially when flown at low altitudes (McEvoy et al., 2016; Mulero‐Pázmány et al., 2017). Drones have been shown to have detrimental effects on wildlife at both immediate (e.g. increased vigilance and decreased foraging behaviour) and long‐term scales (e.g. decreased reproduction and population declines: Blickley & Patricelli, 2010; Senzaki et al., 2020; Shannon et al., 2016). Most studies investigating effects of drones have focused on external responses, such as altered movement and behaviour, whereas internal physiological responses have been limited to quantifying levels of glucocorticoids (e.g. cortisol and corticosterone: Baker et al., 2013; Bennitt et al., 2019; Millspaugh & Washburn, 2004; Vas et al., 2015). Recently, telemetry data have been paired with physiological data, allowing for new insights into the response of animals to anthropogenic stimuli. Ditmer et al. (2015) measured changes in movement and heart rate levels of black bears Ursus americanus affixed with GPS collars and internally implanted cardiac biologgers during controlled drone flyovers. Although drones rarely elicited an external behavioural response in black bears, heart rate levels were strongly correlated with proximity of drones overhead (Ditmer et al., 2015). We used piecewise regression to model the relationship between drone presence and black bear heart rate. We include two segments, the initial segment includes an intercept capturing the baseline heart rate of the bear, and the second segment contains a separate intercept (representing the spike in heart rate in response to the drone) and a disjoined slope term reflecting the heart rate gradually decreasing after the initial spike. Figure 4 shows the relationship between drone presence and black bear heart rate in beats per minute (bpm) before and after a controlled flight.

FIGURE 4

Heart rate of a black bear Ursus americanus in relation to a drone flight. The green box indicates the duration of a drone flight. Grey lines represent 25 draws from the posterior distribution of the mean response. Red dotted lines depict the 95% credible intervals for the mean response. The posterior distribution for the change point is shown in blue on the x‐axis. The flexibility of piecewise regression to accommodate different model structures for each segment allows for identification of the timing, magnitude and acclimation rate of the stress response caused by the drone presence, thus providing biologically meaningful parameters for each segment. As shown in Table 2, the change point is estimated to occur approximately 65 (95% credible interval = 63–66) minutes into the observation period, after which the heart rate is estimated to increase by 116 bpm (Intercept 2–Intercept 1) once the drone appears, and then decrease by 0.78 bpm with every minute that passes. If the recovery rate is static, the black bear will return to baseline heart rate (represented by Intercept 1) in approximately 2.5 hr ((Intercept 2–Intercept 1)/Time in minutes = 149 minutes). Lastly, the Sigma parameter captures the variability in heart rate about the overall trend.

TABLE 2

A summary of output from the model looking at the black bear Ursus americanus heart rate response to a drone flyover

Model parameter	Posterior mean	Lower CL (2.5%)	Upper CL (97.5%)
Change point 1	64.77	62.89	66.00
Intercept 1	43.83	39.86	47.79
Intercept 2	160.14	150.22	169.43
Time (in minutes)	−0.78	−1.08	−0.46
Sigma	10.92	8.95	13.07

A summary of output from the model looking at the black bear Ursus americanus heart rate response to a drone flyover

Mortality signal from acceleration and temperature data

Accurate estimates of vital rates such as survival and fecundity are necessary for modelling population growth. Survival rates can be estimated using mark‐recapture methods, but for wide‐ranging migratory species, individuals in subsequent years may not be resighted due to three possibilities: mortality, emigration or missed detection (Anders & Marshall, 2005). Because it is difficult to separate these components, biologists are often forced to estimate ‘apparent’ survival, as opposed to true survival (Lebreton et al., 1992). Many studies have illustrated the importance of evaluating vital rates over the entire annual cycle (Rushing et al., 2017; Sillett & Holmes, 2002). The miniaturization of GPS devices now allows for a more direct accounting of survival during the breeding, migration and overwintering seasons (Kays et al., 2015). Increased precision in vital rates during different seasons (and of different age and sex classes) can better inform ecological studies considering life‐history trade‐offs between survival and fecundity, especially concerning survival during migration periods (Buechley et al., 2021; Cheng et al., 2019; Flack et al., 2016). Despite the advantages that GPS telemetry can offer to survival rate estimation, mechanical failure of transmitters can obscure whether the true fate was mortality or equipment failure. Recovery of a GPS transmitter after a mortality event can be logistically difficult for species that migrate long distances. Different methods have been used to infer mortality versus transmitter failure from transmitter signals (Buechley et al., 2021; Sergio et al., 2019). A common approach is a visual assessment of whether locations appear to be stationary; however, this approach may lead to inconsistencies due to subjective evaluation (Koczur et al., 2017; Nygård et al., 2016; Rotics et al., 2017). Sensor data such as battery voltage, temperature and accelerometry are increasingly being used to diagnose mortality versus transmitter failure (Burnside et al., 2016; Ely & Meixell, 2016; Hewson et al., 2016). Mortalities often coincide with shifts in the trend or variability in transmitter data; therefore, piecewise regression can be used to identify change points that indicate mortality. This approach can also be extended to monitor nest success with temperature loggers (Hartman & Oring, 2006; Sutti & Strong, 2014; Zangmeister et al., 2009). As part of an ongoing trumpeter swan study (D. W. Wolfson, unpubl. data), we evaluated whether we could remotely detect mortality using piecewise regression models fit separately to ODBA, a proxy for energy expenditure, and temperature data from a confirmed mortality event (Figure 5).

FIGURE 5

The x‐axis is an index of time since the start of the time series for overall dynamic body acceleration on the top figure and temperature on the bottom figure for a trumpeter swan Cygnus buccinator. Grey lines show 25 draws from the posterior distribution, with 95% credible intervals for the mean response shown as red dotted lines. The posterior distribution for the change point is shown in blue on the x‐axis. The ODBA model includes a change point separating two segments with differing means and variances. The temperature model has a change point separating the first segment, which is modelled with an intercept and a first‐order autoregressive residual term, and the second segment, which is modelled with an intercept and a third‐order autoregressive residual term. A kth order autoregressive term models the correlation in residuals from previous values in the time series and is ideal for data such as temperature that is highly temporally correlated. We allow for autocorrelation by assuming the residuals of the first segment are a function of residuals at time t‐1, and the residuals of the second segment are a function of the residuals at time t − 1, t − 2 and t − 3 (see the full annotated code and a short tutorial on choosing autoregressive terms in the open‐access data repository). Both models showed a clear distinction at the change point representing mortality of the collared swan.

Migration phenology including stopovers

Seasonal migration allows species to optimize energetic budgets and undergo reproductive cycles while avoiding harsh environmental conditions and low food availability (Newton, 2010). Knowledge of migration phenology throughout the annual cycle can inform management activities such as timing of water drawdowns and annual surveys to occur during peak migration, advance understanding of disease dynamics based on timing and overlap with other species, and reveal a species' capacity to adapt their migratory timing in response to climate change to preserve optimal breeding conditions and peak food availability (Donnelly et al., 2019; Moller et al., 2008; Newman et al., 2009; Thurber et al., 2020). Despite the importance of understanding migration phenology, consistent reproducible methods for determining departure and arrival dates have not yet become commonplace (Cerritelli et al., 2020; Soriano‐Redondo et al., 2020). A simple repeatable approach is to segment phases of the migration cycle based on date ranges informed from prior studies (Takekawa et al., 2010; Wolfson et al., 2017a). Other approaches include using a spatial threshold, based on either the absolute distance from the capture origin, breeding territory, or last location; a spatiotemporal threshold, based on a certain distance moved within a period; or a crossing of a chosen latitude or landmark (Flack et al., 2016; Giunchi et al., 2019; Rotics et al., 2016). Although these approaches may yield useful estimates of migration phenology, they often rely on arbitrary criteria or species‐specific thresholds, which limits the ability to generalize to other study systems. Model‐based methods, including non‐linear theoretical movement models fit to NSD, are also commonly used to estimate migration phenology (Börger & Fryxell, 2012; Bunnefeld et al., 2011; de Grissac et al., 2016; Spitz et al., 2017). Using piecewise regression to segment an annual cycle based on NSD allows for a model‐based approach but is more flexible than typical NSD modelling sensu Bunnefeld et al. (2011) and can also provide additional information on the timing and duration of stop‐overs that is not attainable using traditional NSD‐based methods. We demonstrate the utility of piecewise regression in assessing movements during the annual cycle of a migratory bird using a NSD time‐series of sandhill crane Antigone canadensis locations in North America (Figure 6; Wolfson et al., 2017b; Wolfson, 2018). Although a basic three‐intercept model would sufficiently discriminate the summer and winter periods, adding additional intercepts reveals each major staging area that the crane used.

FIGURE 6

An annual migration cycle of a sandhill crane Antigone canadensis. Average daily displacement from the breeding territory (in kilometres) on the y‐axis, and an index of time since the start of the time series on the x‐axis. Grey lines show 25 draws from the posterior distribution, with 95% credible intervals for the mean response shown as red dotted lines and 80% prediction intervals in green dotted lines. The posterior distributions for the change points are shown in blue on the x‐axis. The ideal number of stopover sites can be visually examined or empirically derived using cross‐validation to compare multiple models with differing number of change points. We demonstrate a data‐driven model selection method by fitting seven separate models with increasing numbers of intercepts. Table 3 shows that the model with six intercepts (representing two spring staging areas and one fall staging area) is the most well supported using ELPD as a model comparison metric and that adding additional intercepts does not increase the predictive accuracy.

TABLE 3

Leave‐one‐out cross‐validation was used to compute the estimated log predictive density (ELPD) for seven models of migration phenology of a sandhill crane Antigone canadensis, each with an increasing number of intercepts. The highest ELPD for the model with six intercepts indicates that it has the highest predictive accuracy

Model syntax	ELPD difference	SE of difference
Six intercepts	0.00	0.00
Five intercepts	−0.48	0.83
Seven intercepts	−19.59	32.82
Four intercepts	−24.73	9.52
Two intercepts	−57.39	14.01
Three intercepts	−324.69	16.93
One intercept	−336.17	16.73

Segmentation of nesting stages

The timing of nest creation, incubation and hatching events can inform understanding of nest success, juvenile survival and local recruitment. Climate change is advancing the onset of spring, especially in the Arctic, where earlier snowmelt allows some species to adapt their breeding cycles to match changing conditions (Lameris et al., 2018; Nolet et al., 2020). Although data on the timing of breeding are critical to understanding reproduction, locating and regularly monitoring active nests is labor‐intensive and unfeasible in many remote locations (Schreven et al., 2021). As part of an ongoing trumpeter swan study (D. W. Wolfson, unpubl. data), we collected visual observations on a focal sample of collared swans several times a week during the breeding season to determine nesting status (building a nest, incubation, eggs hatched and cygnets present). We also collected fine‐scale accelerometry data from each observed swan's GPS‐GSM collar to evaluate if we could infer nesting status from ODBA values, representing activity levels. We used the visual observation dataset to ground‐truth model output from piecewise regression applied to a collared swan in the following case study. Between 1 April and 7 July, 2021, we collected 19 observations of nesting status and 735,311 tri‐axial accelerometer readings taken in 3‐s bursts at 10 Hz every 5 min that were converted to average hourly ODBA values (Figure 7).

FIGURE 7

The top plot shows the accelerometer sensor dataset for a nesting trumpeter swan Cygnus buccinator, with time on the x‐axis expressed in hourly time intervals and averaged ODBA values on the y‐axis. The bottom plot shows the 19 data points from the visual observation dataset with the estimated hatch date (the midpoint between the last incubation observation and the first cygnet observation; purple dotted line). We predicted that a shift to higher activity levels would occur once cygnets were present and expected that the change point for a model with two segments with varying intercepts would accurately predict the time of egg hatching. [Correction added on 15 August 2022, after first online publication: The duplication of 'increased' has been deleted in the code.] Table 4 shows that the model with two intercepts was the best choice at fitting the data based on LOO‐CV using ELPD.

TABLE 4

Leave‐one‐out cross‐validation used to compute the estimated log predictive density (ELPD) of three different models of trumpeter swan Cygnus buccinator activity; one fit with a single intercept, one with two intercepts and one with three intercepts. The higher ELPD for the model with two intercepts indicates that it has a higher predictive accuracy

Model syntax	ELPD difference	SE of difference
Two intercepts	0.0	0.0
Three intercepts	−9.9	7.3
One intercept	−188.4	19.0

FIGURE 8

Date is on the x‐axis and average hourly overall dynamic body acceleration for a trumpeter swan Cygnus buccinator is on the y‐axis. The coloured markers just above the x‐axis correspond to visual observations of the swan's nesting status. The dashed vertical line in purple is the observed transition from incubation to cygnets, and the solid brown line is the estimated change point from the piecewise regression model. Grey lines show 25 draws from the posterior distribution, with 95% credible intervals for the mean response shown as red lines.

DISCUSSION

As we demonstrated with examples of birth, death, migration and behavioural responses (both internal and external), piecewise regression is a flexible tool for identifying a wide variety of biological phenomena across different taxa and data types. When analysing data, we recommend fitting a limited number of models that attempt to capture specific biological hypotheses in the style of Chamberlin's framework of multiple working hypotheses (Chamberlin, 1890; Elliott & Brook, 2007). For example, in the first case study, we highlight two model syntaxes that represent different biological responses (one intercept represents the absence of a flee response, two intercepts indicate a flee and return). ELPD or WAIC, calculated using cross‐validation can then be used to evaluate support for each hypothesis. The mcp package also provides functionality for null‐hypothesis testing and interval estimation for any model parameter within a fitted model object (Lindeløv, 2020). Piecewise regression, and change point detection in general, is most appropriate when the objective is to partition a dataset into a limited number of heterogeneous segments that may correspond with different biological states. Alternative approaches, such as HMMs or BCPA, may be more appropriate when the goal is to identify recurring behavioural states (e.g. resting and foraging) using information derived from movement metrics (e.g. step lengths, turning angles). Although piecewise regression could be used in these situations, it will often be slower due to the computational challenges of estimating a large number of change points. Similar to BCPA, piecewise regression would also require a two‐step approach in which segments are first identified and then later grouped into homogeneous categories. Unlike BCPA, however, the response data do not have to be Normally distributed when using mcp. Many R packages can detect change points, although each has unique pros and cons. Several require a user‐defined threshold to detect change points, such as changepoint (Killick & Eckley, 2014), bcp (Erdman & Emerson, 2007) and ecp (James & Matteson, 2014), but the selection of a threshold value may often be species‐specific and hard to generalize. The changepoint package, which only identifies change points based on abrupt changes in the mean or variance, may be a good option for a simple segmentation analysis, but it does not estimate uncertainty or provide methods for model evaluation or comparison. Other packages provide more flexibility in specifying a regression model, such as strucchange (Zeileis et al., 2002) and segmented (Muggeo, 2008), but still offer limited options relative to mcp. The mcp package offers rigorous statistical methodology, flexibility, ease of use and reproducibility. It allows the user to specify a unique model syntax for each segment between change points; detect changes in mean, variance and autocorrelation; and it also allows for robust inference using full posterior distributions for all parameters and change points. Previous knowledge of the study system can be directly incorporated when specifying parameter ranges and prior distributions. Although computational performance may slow with very large datasets, mcp includes parallel processing to increase computational efficiency. We provide a suite of worked examples and encourage others to consider using piecewise regression for identifying signals in biotelemetry data.

AUTHOR CONTRIBUTIONS

All authors conceived the idea for the review; David W. Wolfson collected the data, conducted the analysis and drafted the initial manuscript; all authors contributed critically to writing and review of the manuscript and gave final approval for publication.

CONFLICT OF INTEREST

The authors have no conflict of interest to declare.

46 in total

1. Distinguishing technology from biology: a critical review of the use of GPS telemetry data in ecology.

Authors: Mark Hebblewhite; Daniel T Haydon
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2010-07-27 Impact factor: 6.237

2. Animal ecology meets GPS-based radiotelemetry: a perfect storm of opportunities and challenges.

Authors: Francesca Cagnacci; Luigi Boitani; Roger A Powell; Mark S Boyce
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2010-07-27 Impact factor: 6.237

3. Bayesian hypothesis testing for psychologists: a tutorial on the Savage-Dickey method.

Authors: Eric-Jan Wagenmakers; Tom Lodewyckx; Himanshu Kuriyal; Raoul Grasman
Journal: Cogn Psychol Date: 2010-01-12 Impact factor: 3.468

4. Populations of migratory bird species that did not show a phenological response to climate change are declining.

Authors: Anders Pape Møller; Diego Rubolini; Esa Lehikoinen
Journal: Proc Natl Acad Sci U S A Date: 2008-10-10 Impact factor: 11.205

5. Migration of waterfowl in the East Asian flyway and spatial relationship to HPAI H5N1 outbreaks.

Authors: John Y Takekawa; Scott H Newman; Xiangming Xiao; Diann J Prosser; Kyle A Spragens; Eric C Palm; Baoping Yan; Tianxian Li; Fumin Lei; Delong Zhao; David C Douglas; Sabir Bin Muzaffar; Weitao Ji
Journal: Avian Dis Date: 2010-03 Impact factor: 1.577

6. Moving towards acceleration for estimates of activity-specific metabolic rate in free-living animals: the case of the cormorant.

Authors: Rory P Wilson; Craig R White; Flavio Quintana; Lewis G Halsey; Nikolai Liebsch; Graham R Martin; Patrick J Butler
Journal: J Anim Ecol Date: 2006-09 Impact factor: 5.091

7. Automated detection and enumeration of marine wildlife using unmanned aircraft systems (UAS) and thermal imagery.

Authors: A C Seymour; J Dale; M Hammill; P N Halpin; D W Johnston
Journal: Sci Rep Date: 2017-03-24 Impact factor: 4.379

8. "Closer-to-home" strategy benefits juvenile survival in a long-distance migratory bird.

Authors: Yachang Cheng; Wolfgang Fiedler; Martin Wikelski; Andrea Flack
Journal: Ecol Evol Date: 2019-07-23 Impact factor: 2.912

9. Inferring parturition and neonate survival from movement patterns of female ungulates: a case study using woodland caribou.

Authors: Craig A Demars; Marie Auger-Méthé; Ulrike E Schlägel; Stan Boutin
Journal: Ecol Evol Date: 2013-09-23 Impact factor: 2.912

10. Costs of migratory decisions: A comparison across eight white stork populations.

Authors: Andrea Flack; Wolfgang Fiedler; Julio Blas; Ivan Pokrovsky; Michael Kaatz; Maxim Mitropolsky; Karen Aghababyan; Ioannis Fakriadis; Eleni Makrigianni; Leszek Jerzak; Hichem Azafzaf; Claudia Feltrup-Azafzaf; Shay Rotics; Thabiso M Mokotjomela; Ran Nathan; Martin Wikelski
Journal: Sci Adv Date: 2016-01-22 Impact factor: 14.136

1 in total

1. Using piecewise regression to identify biological phenomena in biotelemetry datasets.

Authors: David W Wolfson; David E Andersen; John R Fieberg
Journal: J Anim Ecol Date: 2022-07-31 Impact factor: 5.606

1 in total