Literature DB >> 31371399

Augmenting biologging with supervised machine learning to study in situ behavior of the medusa Chrysaora fuscescens.

Clara Fannjiang^1,2, T Aran Mooney³, Seth Cones³, David Mann⁴, K Alex Shorter⁵, Kakani Katija⁶.

Abstract

Zooplankton play critical roles in marine ecosystems, yet their fine-scale behavior remains poorly understood because of the difficulty in studying individuals in situ Here, we combine biologging with supervised machine learning (ML) to propose a pipeline for studying in situ behavior of larger zooplankton such as jellyfish. We deployed the ITAG, a biologging package with high-resolution motion sensors designed for soft-bodied invertebrates, on eight Chrysaora fuscescens in Monterey Bay, using the tether method for retrieval. By analyzing simultaneous video footage of the tagged jellyfish, we developed ML methods to: (1) identify periods of tag data corrupted by the tether method, which may have compromised prior research findings, and (2) classify jellyfish behaviors. Our tools yield characterizations of fine-scale jellyfish activity and orientation over long durations, and we conclude that it is essential to develop behavioral classifiers on in situ rather than laboratory data.

Entities: Chemical Disease Species

Keywords: Accelerometry; Invertebrate; Jellyfish; Telemetry; Zooplankton

Year: 2019 PMID： 31371399 PMCID： PMC6739807 DOI： 10.1242/jeb.207654

Source DB: PubMed Journal: J Exp Biol ISSN： 0022-0949 Impact factor: 3.312

INTRODUCTION

As oceans are altered by rising temperatures, acidification and other consequences of anthropogenic activity, understanding the behavioral patterns and responses of marine animals is required for effective stewardship. Researchers have made great strides in investigating marine megafauna behavior related to long-distance migrations (Block et al., 2011; Rasmussen et al., 2007; Sequeira et al., 2018) and foraging strategies (Sims et al., 2008; Weise et al., 2010). However, the behavior of more numerous, higher total-biomass, lower trophic-level animals such as zooplankton is much less well understood. Early attempts to investigate in situ behavior of zooplankton such as jellyfish relied on scuba divers following animals with hand-held video cameras (Colin and Costello, 2002; Costello et al., 1998) and later with remotely operated vehicles (ROVs; Kaartvedt et al., 2015; Purcell, 2009; Rife and Rock, 2003). Acoustic methods have also been used to describe large-scale movement patterns of jellyfish (Båmstedt et al., 2003; Kaartvedt et al., 2007; Klevjer et al., 2009), although these methods can be resolution-limited. A promising alternative is biologging, where electronic transmitters or loggers with environmental and motion sensors are affixed to organisms (Kooyman, 2004; Rutz and Hays, 2009). Biologging has enabled a diverse array of marine vertebrate studies (Block et al., 2011; Goldbogen et al., 2006; Johnson and Tyack, 2003), while several technological challenges have hindered its widespread use to study gelatinous invertebrates such as jellyfish. Their sensitivity to drag induces constraints on tag size, shape and buoyancy (Fossette et al., 2016; Mills, 1984; Mooney et al., 2015), which, when coupled with bandlimited transmission capabilities, often restricts sensor payloads to low-resolution depth or location pingers (Honda et al., 2009; Moriarty et al., 2012; Seymour et al., 2004). As a result, very few studies have successfully deployed high-resolution motion sensors such as accelerometers on jellyfish in situ (Fossette et al., 2015), and these studies adopted the ‘tether method’ for retrieval (Fossette et al., 2016; Hays et al., 2008), where the tag is tethered to a surface float transmitting location. As tethering can restrict movement, it is unknown whether data collected in this manner are representative of natural behavior. Furthermore, without simultaneous observation of the tagged animal, interpretation of biologging data can be easily biased (Brown et al., 2013; Jeantet et al., 2018). Recently, techniques from supervised machine learning (ML), which automatically fit or ‘learn’ patterns that optimally distinguish categories, have been successfully used to classify behaviors in various marine vertebrates (Brewster et al., 2018; Jeantet et al., 2018; Ladds et al., 2016). However, few studies develop their methods on ground-truthed in situ data owing to the difficulty of recording sustained observations of wild marine animals (Carroll et al., 2014). It is unknown whether classifiers developed on data from captive, controlled or laboratory conditions are equally effective on data from natural environments (Carroll et al., 2014), an example of a broader problem in ML known as domain adaptation (Pan and Yang, 2010; Zhang et al., 2013). In this study, we demonstrate how to investigate fine-scale zooplankton behavior in situ by combining biologging with supervised ML methods. We study the movements of the scyphomedusa Chrysaora fuscescens in Monterey Bay, CA, USA, using the ITAG, a biologging tag equipped with high-resolution motion sensors and engineered specifically for soft-bodied invertebrates (Mooney et al., 2015). We use the tether method for retrieval and simultaneously record video footage of the tagged animals. We develop classifiers using the resulting data to: (1) detect when the tether method influences jellyfish behavior, and (2) distinguish swimming from drifting. We provide principled estimates of the classifier error characteristics, which allow us to remove tether-influenced data and estimate the fine-scale in situ orientation and swimming activity of C. fuscescens individuals for up to 10 h. We also investigate how classifiers trained on laboratory data perform on in situ data. By combining a highly specialized tag with supervised ML, our approach is the first complete pipeline for acquiring and interpreting high-resolution motion data from individual jellyfish or other zooplankton in situ.

MATERIALS AND METHODS

Code and Jupyter Notebooks demonstrating how to reproduce our classifiers are available at https://bitbucket.org/mbari/jellymove.

Laboratory deployments

Laboratory investigations of jellyfish tagging were conducted at the Monterey Bay Aquarium Research Institute (MBARI) in Moss Landing, CA in the spring of 2018. Four jellyfish (Chrysaora fuscescens Brandt 1835) with bell diameters ranging from 16 to 25 cm were collected in Monterey Bay from RV Paragon (California Department of Fish and Wildlife permit SC-13337) and kept in plastic bags filled with unfiltered seawater. Within 4 h of collection, animals were transported into large holding tanks in a 5°C cold room in MBARI's seawater lab. Experiments were conducted in a 275,000 gallon (1.25 million liter) test tank with dimensions of 13 m (L)×10 m (W)×10 m (D). Animals were transported from the seawater lab in plastic bags and placed in the test tank to acclimate for at least 1 hour prior to tagging trials. After acclimation, a neutrally buoyant bio-logging tag (ITAG v0.4; Mooney et al., 2015) was prepared for attachment. The ITAG (6.3 cm×2.9 cm×1.6 cm, air weight 30 g) is equipped with a triaxial accelerometer, gyroscope and magnetometer synchronously sampling at a rate of 100 Hz, and pressure, temperature and light sensors sampling at 1 Hz. The tag was attached to the animal's aboral surface using veterinary-grade tissue adhesive (3M Vetbond, Maplewood, Minnesota, USA), following the ‘glue method’ (Fossette et al., 2016). Care was taken to center the attachment site on the bell apex between the four gonads, so that the tag axis conventions aligned with the jellyfish, and the animal's radial symmetry was not disrupted (Fig. 1H). The entire attachment procedure took no longer than 2 min.

Fig. 1.

Protocol consisted of transferring collected jellyfish to staging tub (A), drying the attachment site with absorbent towels (B), gently affixing tethered ITAG with VetBond (C), deploying SPOT drifter and drogue (D), deploying BlueROV with mounted GoPro (E), and gently releasing tagged jellyfish and tracking it with the BlueROV (F,G). (H) Definitions for positive x, y and z tag axes, and positive heading, roll and pitch angle. To replicate the in situ recovery strategy, the tags were attached by 6 m of monofilament line (20 lb/∼9 kg test) to a suspended walkway about 1 m above the tank surface (the tether length was set to prevent the animal from getting tangled with metal bars on the walls of the test tank). Simultaneous lateral-view video footage of the tagged jellyfish was collected with a HERO5 Black GoPro (GoPro, Inc., San Mateo, CA, USA) mounted onto a BlueROV2 (Blue Robotics, Torrance, CA, USA). Footage was synchronized with the tag data by sharply tapping the tag five times in front of the GoPro prior to attachment.

Field deployments

We deployed ITAGs on 8 C. fuscescens in Monterey Bay, CA in late spring of 2018, and collected in situ recordings with durations between 54 min and 10 h. The bell diameters of these animals were between 20 and 28 cm. Fig. 1A–G depict the main phases of the deployment protocol. Each animal was first spotted from aboard the RV Paragon, which then maneuvered next to the animal so it could be gently captured and brought aboard using a plastic bucket. Captured jellyfish were then transferred into individual 27 gallon (∼125 l) plastic holding tubs filled with seawater (Fig. 1A), and care was taken to not introduce air bubbles under their bells. To recover tags at the end of the deployment, we used the ‘tether method’ (Fossette et al., 2016): tags were tethered by 30 m of monofilament line to the bottom of a drogue, which was attached by dock line to a surface drifter. A fishing swivel was placed at the midpoint of the tether, as well as immediately below its attachment to the drogue, to prevent tether torsion from affecting the animal. The surface drifter consisted of PVC housing for a SPOT GPS tracking device (SPOT LLC, Milpitas, CA, USA), and a PVC pipe chamber containing batteries and ballasting material. The SPOT was configured to report its coordinates once every 15–20 min via email. The tethered tags were then affixed to jellyfish while in the holding tubs, following the aforementioned glue method also used for laboratory deployments (Fig. 1B,C). In order, the drogued drifter, ROV and tagged jellyfish were then released (Fig. 1D–F, respectively). The drogue was centered at a depth down to 9 m (see Table S1), and the jellyfish could therefore swim freely down to a depth of 30–39 m. We then piloted the ROV to track and record video footage of the tagged animal, until losing sight of it because of water turbidity and/or turbulence (see Movie 1 for example footage). Once visual contact with the tagged animal was lost, the ROV was recovered, and the tagged animals tethered to the drogued drifters were left behind. Tags were retrieved the morning after deployment. The boat was navigated to the most recent coordinates reported by the SPOT, and once the drifter and drogue were located, the drifter, drogue and tag (with or without the animal still attached) were recovered. Data from the tag were then brought back to shore for analysis.

Orientation estimation

We defined axes conventions appropriate for the typical jellyfish swimming position (see Fig. 1H). Since we have control over the tag attachment site, we can ensure that the tag x-axis (surge direction) is orthogonal to the jellyfish bell at the apex, so no further data processing is necessary to align the tag and jellyfish axes. In order to compute orientation from the accelerometer and magnetometer signals, we first used a finite impulse response filter to smooth the accelerometer and magnetometer data (Sato et al., 2003). The selected filter cut-off frequency of 0.2 Hz was within the typical range (0.25–0.50; Martín López et al., 2016) of the pulse frequency reported for Chrysaora (Matanoski et al., 2001). Filtering the accelerometer data separates the signal due to gravity (or static acceleration) from high-frequency animal-generated forces (or dynamic acceleration, DA; Wilson et al., 2006), which we later processed and featurized for behavioral classification. The resulting static acceleration was then combined with the smoothed magnetometer data to calculate orientation (i.e. Euler angles of heading, pitch, and roll) at every point in time, according to trigonometric relationships (Johnson and Tyack, 2003). Based on our axes conventions in Fig. 1A, heading refers to compass bearing from true north, positive pitch means the jellyfish bell apex is tilted upward with respect to the horizon, and positive roll means the jellyfish bell is rotating around its apex counterclockwise, when viewed facing the bell.

Annotation of video data

Laboratory and in situ video footage was manually annotated for jellyfish behavior and tether influence. Each second of footage was labeled according to whether the jellyfish was swimming or drifting (not actively pulsing its bell), and whether the tether was slack (i.e. when the animal was uninfluenced by tether tension) or taut (i.e. when the animal was influenced by tether tension). For in situ deployments, if the tether could not be clearly seen or was out of view because of turbidity and/or viewing angle (e.g. facing the subumbrella), the state of the tether was annotated as unknown. Similarly, if the jellyfish behavior could not be distinguished because of lighting or turbidity, the behavior was annotated as unknown. Any segments of footage where either the tether state or jellyfish behavior were unknown were excluded from training data for the methods we describe below.

Jellyfish behavior classification

When using the tether method as a tag retrieval strategy, prolonged deviations between the trajectories of the jellyfish and its tethered drifter can result in the tether pulling on the tag. These forces leave measurable signatures in the motion sensor data, which are distinct from the signals generated by the jellyfish's natural behavior. Our goal was to develop supervised ML methods to: (1) detect and remove segments of data corrupted by tether influence (tether influence classification), and (2) distinguish swimming from drifting on the remaining data (activity classification). In the following sections, we describe how these methods were developed.

Data preprocessing

In situ data were first split into annotated and unannotated pools. Annotated data were processed and featurized as described below, then set aside for model training and evaluation. Unannotated data were similarly processed and featurized, and then set aside for classification by the trained models. Laboratory data were completely annotated, since we were able to capture video footage of the entire deployment. We used the following procedure to assemble data samples for each of the four categories annotated as described above: tether-influenced, uninfluenced, swimming and drifting. Upon visual inspection, the DA of every deployment displayed a nearly constant periodic nature, consistent with the nearly constant bell pulsing observed in both laboratory and in situ video footage. We therefore computed the discrete cosine transform of the DA and took the frequency with the maximum absolute coefficient as the representative pulse frequency (RPF) for each deployment. For each category, we extracted all segments of motion sensor data whose corresponding video footage was annotated with that category. Each segment, which consisted of 10 channels of data (pressure sensor and triaxial accelerometer, gyroscope and magnetometer) was then split into consecutive, non-overlapping windows with a duration equal to the representative swimming cycle length (the reciprocal of the RPF). Segments shorter than this duration and trailing windows at the ends of segments shorter than this duration, were discarded from classification and analysis. Each of these windows, which we refer to as periods, was then featurized. Note that the period duration is different for each deployment, to account for the pulse frequency of each animal.

Featurization

For each period, we generated a total of 45 candidate features from the accelerometer and gyroscope. During training, we used a feature selection method to select a subset with the greatest predictive power, as described below. We computed various features of partial dynamic body acceleration (PDBA), which is the sum of the absolute values of the y- and z-axes of DA. PDBA is a variant of overall dynamic body acceleration (ODBA; Wilson et al., 2006), which is used extensively as a proxy for energetic input (Halsey et al., 2009; Wilson et al., 2006; Gleiss et al., 2011). By computing both PDBA and the absolute value of the x-axis of DA (DA), we can separate energy expenditure in the direction of jellyfish propulsion from movements in the orthogonal plane (i.e. the x-axis from the y–z plane in Fig. 1H). To account for variation in propulsive force between individual jellyfish, we divided the PDBA and DA by their respective averages over the deployment for each jellyfish. Analogous to PDBA and DA, we also considered the norm of the y- and z-axes of the gyroscope data (or partial angular velocity, PAV) and the absolute value of the x-axis (AV). Accelerometer-based features included the maxima, means and standard deviations of DA, PDBA, the absolute value of the x-axis of jerk, and the norm of the y- and z-axes of jerk. Spectral features were the sparsities of DA and PDBA spectra (the absolute value of the Fourier transform), as measured by the Gini index (Hurley and Rickard, 2009; Zonoobi et al., 2011) and the spectral energies of DA and PDBA in 0.2–1.0 Hz (roughly the typical range of pulse frequencies) and 1–8 Hz. We also included the spectral energy of DA over 8 Hz but excluded it for PDBA because the two were too highly correlated, thereby leading to numerically unstable covariance matrix inversions in our model. The remaining features were the number of peaks in the DA and PDBA, as identified by a peak-detection method (Duarte, 2013), the correlation between the y- and z-axes of DA, and the average of the correlations between the x- and y-axes and x- and z-axes of DA. The gyroscope-based features were completely analogous to the accelerometer-based features, substituting AV, PAV and angular acceleration for DA, PDBA and jerk, respectively, except excluding spectral energy of AV over 8 Hz. For behavioral analysis, we also computed normalized ODBA by dividing the ODBA signal by the average ODBA over the deployment, to accommodate differences in propulsive strength between individual jellyfish. However, it was not included as a classification feature owing to redundancy with PDBA and DA.

Training data

The video footage showed that the nature of tether influence was fundamentally different between in situ and laboratory deployments. In the test tank, the jellyfish simply turned slightly whenever it reached the end of the tether, whereas tether influence in situ took the form of sharp yanking or prolonged dragging on the jellyfish. Since our end goal was to detect tether influence in situ, and the nature of in situ tether influence was not replicated in laboratory footage, we only trained and evaluated the tether influence classifier on in situ data. For training the tether-influence classifier, the annotated in situ data yielded 325 s of tether-influenced behavior and 2825 s of uninfluenced behavior (or 83 and 1245 periods, respectively) across all deployments. For training the activity classifier, the annotated laboratory data yielded 366 s (68 periods) and 9201 s (3069 periods) of uninfluenced drifting and uninfluenced swimming behavior, respectively, and the annotated in situ data yielded 79 s (17 periods) and 2740 s (1228 periods) of uninfluenced drifting and uninfluenced swimming behavior, respectively. Since only 17 periods of uninfluenced in situ drifting were observed, we trained the activity classifier on the combined in situ and laboratory data (85 and 4297 periods of drifting and swimming, respectively) to sufficiently capture drifting behavior. To assess the value of incorporating in situ data for training, we also trained the classifier solely on the laboratory data.

Classification methods

Quadratic discriminant analysis

For both tether influence and activity classification, we trained a supervised ML method known as quadratic discriminant analysis (QDA; Hastie et al., 2009), a generalization of the classical linear discriminant analysis method introduced by Fisher (Fisher, 1936; Hastie et al., 2009). QDA models each category in feature space as a multivariate normal distribution with an individual mean and individual covariance matrix. That is, let x∈R denote the feature vector, where p is the number of features, and let y∈{0, 1} denote the categorical label (e.g. swimming versus drifting for the activity classifier). For convention, we let category 1 refer to the minority (less frequent) category, i.e. uninfluenced for tether-influence classification and drifting for activity classification. The data are then modeled as:where μ0, μ1 ∈ R and Σ0, Σ1 ∈ R are the mean and covariance matrix parameters, respectively, and α∈[0, 1] is the probability of category 1 occurring, known as the class prior. We fit the model by computing the maximum likelihood estimates for μ0, μ1, Σ0, Σ1 and α, which are simply the sample means and sample covariances of the categories, and the proportion of category 1 in the training set. Under this model, QDA then classifies a new instance to the category that maximizes the conditional probability (the category that is most likely given the features), which can be accessed via Bayes' rule. As the name implies, the resulting decision boundaries in feature space are quadratic curves. Owing to the simplicity of the model and closed-form nature of the maximum likelihood estimates, QDA is both easy to interpret and fast to train.

Feature selection

There is often a large number of candidate features one can consider for a classifier. Principled methods for choosing an optimal subset of these features can help produce classifiers that perform better (due to the removal of noisy, irrelevant or redundant features), are faster and cheaper to use (since fewer feature need to be measured and processed) and are more interpretable (Dash and Liu, 1997; Guyon and Elisseeff, 2003; Liu and Motoda, 1998). Under the broader umbrella of model selection, feature selection encourages finding the simplest model that explains the data, a principle that is critical for performance generalization (Hastie et al., 2009; MacKay, 2003). We first computed the list of 45 candidate features described above from accelerometer and gyroscope data. As part of training, we used a popular greedy heuristic known as sequential forward selection (SFS; Whitney, 1971), which starts with an empty subset of features and iteratively adds the next feature whose inclusion to the existing subset improves some evaluation metric the most. Despite its simplicity, SFS has been shown to match or outperform more complex search methods by being less prone to overfitting (Reunanen, 2003).

Metric for feature selection

In choosing an evaluation metric for SFS, we observed that our video annotations showed highly skewed category distributions for both classification tasks: tether-influenced periods and drifting were observed far less often than uninfluenced periods and swimming, respectively. In this case, the common metric of accuracy loses meaning, since the accuracy of a simple majority decision rule (i.e. always predict the majority categories: influenced and swimming) is high even though (1) the features are not considered and (2) all instances of the minority category are misclassified. Regardless of category imbalance, the evaluation metric should reflect how well a classifier extracts discriminating information from the features, and should account for the balance of false positives and false negatives on the minority category. In particular, consider precision and recall on the minority category, defined as:where TP denotes the number of true positives, or minority category periods correctly classified as the minority category; FP denotes the number of false positives, or majority category periods incorrectly classified as the minority category; and FN denotes the number of false negatives, or minority category periods incorrectly classified as the majority category. Given a trained probabilistic model of the data, such as the one posed by QDA, the decision rule to classify an instance as category 1 can be formulated in terms of a threshold on the probability p(y=1 | x). Varying this threshold exposes an inherent trade-off between precision and recall: a decision rule with a high threshold, which only selects the category given overwhelmingly high evidence, tends to achieve higher precision at the cost of lower recall. A decision rule with a lenient threshold, which liberally selects the category given only mild evidence, tends to achieve higher recall at the cost of lower precision. This trade-off is captured by the curve in precision–recall space (PR curve) generated by decreasing the decision threshold from 1 to 0, which is often used to characterize classifier performance on tasks with skewed category distributions (Bunescu et al., 2005; Davis and Goadrich, 2006; Fawcett, 2006; Manning and Schütze, 1999). The PR curve allows the analyst to choose an appropriate decision threshold, depending on the relative importance of precision and recall for the task at hand (Manning and Schütze, 1999). We use the area under this curve (AUPRC) as the metric for feature selection, which provides a summary of performance across all possible thresholds (Boyd et al., 2013; Richardson and Domingos, 2006). The AUPRC ranges from 0 to 1, where an ideal classifier that suffers no trade-off has an AUPRC of 1, and a classifier no better than random guessing has an expected AUPRC of the proportion of the category in the dataset. During feature selection, we terminated SFS when the inclusion of the next feature failed to improve the AUPRC by at least 0.02.

Classifier evaluation

Unbiased evaluation of a classifier's performance on unseen data requires complete separation of the data used in the training and evaluation phases. The standard way to evaluate classifier performance is with k-fold cross-validation (CV; Hastie et al., 2009; Kohavi, 1995), in which the annotated dataset is split equally into k parts. For each part, a classifier is trained on the remaining k−1 parts (the training set) and evaluated on the excluded part (the test set) using some evaluation metric, and the average of the resulting k evaluation scores (the CV score) is used as an estimate of the method's evaluation score on unseen data. Since we want to take full advantage of our annotated dataset for training, a final classifier can then be trained on the complete dataset and deployed for future predictions (Cawley and Talbot, 2010; Varma and Simon, 2006). Note that during evaluation, the training phase must include all aspects of model selection, including feature selection and choosing hyperparameter values. However, these aspects are sometimes incorrectly treated as external to the training process: performing hyperparameter and/or feature selection on the complete dataset prior to CV can result in dramatic inflations of the CV score (Ambroise and McLachlan, 2002; Cawley and Talbot, 2010; Smialowski et al., 2010; Varma and Simon, 2006). To remove this selection bias, for each of the k iterations of CV, we performed feature selection solely on the training set using an ‘inner’ CV (Ambroise and McLachlan, 2002; Varma and Simon, 2006). That is, the training set was itself split evenly into k parts, and for each iteration of SFS, each candidate feature was evaluated by: (1) adding it to the current feature subset, (2) training QDA with those features and evaluating it on the k pairs of inner training and test sets, and (3) averaging those k evaluation scores. The feature with the best inner CV score is then selected. After SFS has terminated, we use the finalized feature subset to train and evaluate QDA on the outer CV training and test set. For both the inner and outer CV, we take k=5. After the outer CV is complete, we report the mean±s.e. of the k AUPRC values. We then use the average of the k PR curves to choose a decision threshold (Fawcett, 2006). For the purposes of demonstrating our methods, we prioritize precision and recall equally, and simply choose the threshold value out of {0.1, 0.2, …, 0.9} that yields precision and recall values closest to each other. We call this the equal error rate threshold (EER; Duda et al., 2000) and report the CV precision, recall and accuracy for this classifier. For future studies that prioritize either precision or recall over the other, the researcher can use the average PR curve to pick a threshold that achieves the desired trade-off (Manning and Schütze, 1999; Duda et al., 2000; Fawcett, 2006).

Activity classifier baselines

To see if our featurization and feature selection approach improved activity classification beyond simpler alternatives, we trained and evaluated two baseline classifiers. The first baseline, which we refer to as ODBA thresholding, simply classifies a period as swimming if the mean normalized OBDA is above some decision threshold. Since ODBA is often used as a proxy for energetic expenditure, intuition would suggest it should be sufficient for discriminating swimming from drifting in a noiseless scenario. The second baseline follows our method but only uses accelerometry features, excluding features from the gyroscope data.

In situ behavior prediction

After training and evaluating the classifiers, we used them to predict tether influence and activity on the unannotated in situ data. After removing any periods classified as tether-influenced, we then classified each remaining uninfluenced period as swimming or drifting. These classifications provide estimates of: (1) how often the tether method interferes with the natural movements of jellyfish, and (2) how much time jellyfish spend swimming versus drifting in situ over long durations.

Orientation change

To assess change in orientation during swimming, we computed the difference in heading, pitch and roll angles between and start and end of each period. We converted these differences into a non-negative total angle of rotation (Diebel, 2006), which we refer to as orientation change over a period. We also used circular mean and circular s.d. to compute the means±s.d. of heading, pitch and roll angles over periods. To avoid ill-defined heading and roll values due to gimbal lock, we excluded periods where the absolute pitch angle exceeded 70 deg (this removed 0.7% of total laboratory and in situ periods).

Statistical tests

We ran several statistical tests on the annotated data to investigate potential distinctions between laboratory and in situ behavior, and between tether-influenced and uninfluenced behavior. For the following four tests, we used the nonparametric Mann–Whitney U-test to avoid any distributional assumptions on the quantities of interest, and because we expected to have considerably large sample sizes (each period constitutes only a few seconds of data). We pooled tether-influenced and uninfluenced periods across the in situ deployments, and tested whether either group tends to exhibit (1) greater normalized ODBA or (2) greater orientation change than the other. We also tested these two hypotheses between laboratory and in situ data, by pooling together uninfluenced periods across the laboratory deployments and across the in situ deployments.

RESULTS AND DISCUSSION

Laboratory and in situ deployments

Fig. 2A shows drifter trajectories and timestamps for the 8 in situ deployments in Monterey Bay over three separate days (see Table S1 for laboratory and in situ deployment details). Video footage was successfully captured for 7 of these deployments, and annotated for activity and tether influence as summarized in Table S2 (see Movie 1 for examples). Drifting behavior was observed in 5 deployments, and ranged from 0.5% to 4.3% of the time. Tether influence was also observed in 5 deployments (0.35–28.6%).

Fig. 2.

(A) Trajectories for the three deployment dates. Underlined times (PDT) denote deployment start; italicized times denote when tag was recovered; remaining times denote when tag stopped recording. (B) Maximum overall dynamic body acceleration (ODBA) and (C) total orientation change over annotated tether-influenced (N=83) and uninfluenced periods (N=1245). (D) Cross-validation (CV) precision–recall (PR) curves of the activity classifier, and precision and recall using the equal error rate threshold (EER).

Tether-influence classification

The tether-influence classifier had an AUPRC of 0.860±0.032 (mean±s.e.), and using the EER threshold had a precision of 86.1±5.3%, recall of 73.5±3.8% and accuracy of 97.6±0.4%. In order of selection by SFS, the features were: (1) spectral energy of DA over 8 Hz, (2) mean PDBA, (3) number of peaks in the PDBA, and (4) mean DA. SFS found no additional features that appreciably improved performance. We used the tether-influence classifier to classify each unannotated in situ period as influenced or uninfluenced. Table S3 shows the proportion of each deployment classified as influenced, which ranged from 3.3% to 35.1%. To understand how the tether influenced in situ behavior, we evaluated how normalized ODBA and orientation change differed between annotated uninfluenced and influenced periods (Fig. 2B,C). The maximum normalized ODBA of influenced periods (median 2.05) tended to be larger than that of uninfluenced periods (median 1.27; Mann–Whitney U-test, two-sided P<0.0001). Similarly, orientation change tended to be greater during influenced periods (median 30.8 deg) than during uninfluenced periods (median 5.6 deg; Mann–Whitney U-test, two-sided P<0.0001). That is, jellyfish exhibited greater ODBA and more severe orientation changes when influenced by the tether.

Activity classification

The activity classifier had an AUPRC of 0.747±0.047 (mean±s.e.) and using the EER threshold, had a precision of 74.3±4.9%, recall of 76.4±2.7% and accuracy of 99.0±0.1%. Fig. 2D demonstrates the average PR curve used to identify the EER threshold. The features, in order of selection by SFS, were: (1) number of peaks in the PAV, (2) sparsity of the PAV spectrum and (3) sparsity of the PDBA spectrum. Note that since drifting occupied only 1.9% of the annotated periods, a simple majority prediction rule has an accuracy of 98.1%. The other metrics therefore give more insight into whether the classifier actually learns discriminative information about the categories, rather than simply which category is more common. In comparison to our method, the baseline of ODBA thresholding had an AUPRC of 0.585±0.056 and, with the EER threshold, a precision of 68.9±9.3%, recall of 49.9±3.5% and accuracy of 98.6±0.1%. Training our classifier without gyroscope features and only with accelerometry features gave an AUPRC of 0.685±0.024, precision of 64.1±3.8%, recall of 63.7±4.7% and accuracy of 98.6±0.1% using the EER threshold. We used our method to classify each unannotated in situ period as swimming or drifting, which provided estimates of how much time each jellyfish spent for each activity. We first removed periods predicted to be tether-influenced, so that our estimates are restricted to data representative of natural behavior. The proportion of uninfluenced time each jellyfish was classified as drifting ranged between 0% and 5.6% (Table S3), with the exception of deployment S1-1 (19.1%) which also experienced frequent tether influence (both annotated and predicted). We can then combine the outputs of the influence classifier, activity classifier, and orientation estimation (again, restricted to periods predicted as uninfluenced) to visualize fine-scale information about in situ behavior over several hours (Fig. 3).

Fig. 3.

Fine-scale orientation and predicted activity of deployment S2-2. (A) Estimated heading and roll angle, and predicted tether influence and drifting, over entire deployment. Shaded regions denote one and two standard deviations around the mean. Note that the 1-pixel-width vertical lines are disproportionately wide, as each predicted event only lasts a few seconds. (B) Radial histogram of jellyfish heading relative to the drifter heading at zero. (C) Jellyfish roll angle throughout deployment.

Classifier trained on in situ versus laboratory data

When trained and evaluated only on laboratory data, the activity classifier had an AUPRC of 0.894±0.067 (mean±s.e.) and, using the EER threshold, precision of 87.4±6.1%, recall of 90.3±4.4% and accuracy of 99.4±0.2%. However, predictions made by this classifier on the annotated in situ data had an accuracy of 96.3%, precision of 0% and recall of 0%. We emphasize that this means none of the periods classified as drifting were truly drifting and none of the drifting periods were correctly classified. Similarly, ODBA thresholding had an optimistic AUPRC of 0.865±0.047 (mean±s.e.), precision of 78.2±6.3%, recall of 81.1±4.0% and accuracy of 99.1±0.1% when CV was performed only on laboratory data. However, predictions on annotated in situ data had an accuracy of 90.8%, precision of 0% and recall of 0%.

In situ versus laboratory behavior

The maximum normalized ODBA of uninfluenced laboratory periods (median 1.98) tended to be greater than that of in situ uninfluenced periods (median 1.27; Mann–Whitney U-test, two-sided P<0.0001). Orientation change also tended to be greater during uninfluenced laboratory periods (median 11.6 deg) than during uninfluenced in situ periods (median 5.6 deg; Mann–Whitney U-test, two-sided P<0.0001). Note that test tank walls did not cause turning behavior, since the tether length could not reach them.

Conclusions

Our work provides a pipeline for interpreting fine-scale in situ behavior of a zooplankton species (C. fuscescens) over long durations. Our approach of combining biologging with supervised ML methods yields records of in situ activity and orientation of individual jellyfish for several hours (up to 10 h so far), and may include the first successful in situ deployments of magnetometers and gyroscopes on jellyfish. Using our activity classifier, our estimates of in situ swimming activity on unannotated durations (on average 96.4% of the time; Table S3) is compatible with swimming in our annotated footage (on average 98.7% of behavior not annotated as unknown; Table S2). These long periods of sustained swimming with limited bouts of drifting are consistent with activity budget estimates of other oblate jellyfish (Colin et al., 2003; Costello et al., 1998), whose rowing mode of propulsion has been shown to be energy efficient (Dabiri et al., 2010; Gemmell et al., 2018). In spite of tether influence, uninfluenced periods of data also revealed that tagged animals underwent stereotypical vertical excursions (Fig. 3A; Hays et al., 2012). Although future studies of fine-scale zooplankton behavior would be best conducted with tetherless tag retrieval methods, our approach provides a reasonably precise solution for detecting this influence and removing it, since it may compromise findings on in situ energetics and orientation (Fig. 2B,C; Fossette et al., 2015; Hays et al., 2008). Our findings also highlight the importance of collecting in situ biologging data, rather than captive laboratory data, for developing behavioral classifiers. An assumption fundamental to justifying the deployment of ML methods, is that the data seen during training and inference are drawn from the same underlying distribution (Pan and Yang, 2010; Sugiyama et al., 2007; Zhang et al., 2013). However, classifiers for interpreting accelerometry data have been overwhelmingly trained and validated on laboratory data (Carroll et al., 2014). In doing this, these studies implicitly assume that behavioral data generated in the laboratory is distributionally similar to in situ behavioral data. Our findings suggest that this assumption has limited applicability, even for organisms displaying simple behaviors like swimming and drifting. First, basic descriptive statistics differed significantly between laboratory and in situ data: jellyfish pulses induced greater orientation changes and greater ODBA in the laboratory than in situ. Second, the activity classifier trained and validated solely on laboratory data had optimistic estimates of precision and recall, but performed poorly with zero precision and recall when evaluated on in situ data. We highlight this as a cautionary tale against naively deploying ML classifiers developed on laboratory data in the field. As biologging moves forward, methods involving technologies that capture the behavioral ground truth of in situ data, such as camera tags, are strongly encouraged. Our work also underscores the limitations of ODBA in characterizing even simple in situ behaviors. ODBA thresholding yielded zero precision and recall in classifying in situ swimming and drifting, but performed reasonably well when trained and evaluated on laboratory activity. This suggests that the standard way of computing ODBA may not be robust to dynamic and unpredictable sources of noise in in situ data (Shepard et al., 2008). Beyond accelerometry, our results also show that leveraging information from other sensors (e.g. gyroscope) can improve in situ behavioral classification considerably. Looking forward, our methods open the door to investigating more complex questions about fine-scale zooplankton behavior, such as how these species orient themselves in a current, whether they exhibit rolling behavior or lateral preferences (Fig. 3), and whether their behavioral patterns distinguish them from passive drifters.

26 in total

1. New frontiers in biologging science.

Authors: Christian Rutz; Graeme C Hays
Journal: Biol Lett Date: 2009-03-11 Impact factor: 3.703

2. Supervised accelerometry analysis can identify prey capture by penguins at sea.

Authors: Gemma Carroll; David Slip; Ian Jonsen; Rob Harcourt
Journal: J Exp Biol Date: 2014-11-13 Impact factor: 3.312

3. Tracking the kinematics of caudal-oscillatory swimming: a comparison of two on-animal sensing methods.

Authors: Lucía Martina Martín López; Natacha Aguilar de Soto; Patrick Miller; Mark Johnson
Journal: J Exp Biol Date: 2016-05-19 Impact factor: 3.312

4. Convergence of marine megafauna movement patterns in coastal and open oceans.

Authors: A M M Sequeira; J P Rodríguez; V M Eguíluz; R Harcourt; M Hindell; D W Sims; C M Duarte; D P Costa; J Fernández-Gracia; L C Ferreira; G C Hays; M R Heupel; M G Meekan; A Aven; F Bailleul; A M M Baylis; M L Berumen; C D Braun; J Burns; M J Caley; R Campbell; R H Carmichael; E Clua; L D Einoder; Ari Friedlaender; M E Goebel; S D Goldsworthy; C Guinet; J Gunn; D Hamer; N Hammerschlag; M Hammill; L A Hückstädt; N E Humphries; M-A Lea; A Lowther; A Mackay; E McHuron; J McKenzie; L McLeay; C R McMahon; K Mengersen; M M C Muelbert; A M Pagano; B Page; N Queiroz; P W Robinson; S A Shaffer; M Shivji; G B Skomal; S R Thorrold; S Villegas-Amtmann; M Weise; R Wells; B Wetherbee; A Wiebkin; B Wienecke; M Thums
Journal: Proc Natl Acad Sci U S A Date: 2018-02-26 Impact factor: 11.205

5. Selection bias in gene extraction on the basis of microarray gene-expression data.

Authors: Christophe Ambroise; Geoffrey J McLachlan
Journal: Proc Natl Acad Sci U S A Date: 2002-04-30 Impact factor: 11.205

6. Moving towards acceleration for estimates of activity-specific metabolic rate in free-living animals: the case of the cormorant.

Authors: Rory P Wilson; Craig R White; Flavio Quintana; Lewis G Halsey; Nikolai Liebsch; Graham R Martin; Patrick J Butler
Journal: J Anim Ecol Date: 2006-09 Impact factor: 5.091

7. Accelerometry to Estimate Energy Expenditure during Activity: Best Practice with Data Loggers.

Authors: L G Halsey; J A Green; R P Wilson; P B Frappell
Journal: Physiol Biochem Zool Date: 2009 Jul-Aug Impact factor: 2.247

8. Social behaviour in mesopelagic jellyfish.

Authors: Stein Kaartvedt; Karl I Ugland; Thor A Klevjer; Anders Røstad; Josefin Titelman; Ingrid Solberg
Journal: Sci Rep Date: 2015-06-11 Impact factor: 4.379

9. Seeing It All: Evaluating Supervised Machine Learning Methods for the Classification of Diverse Otariid Behaviours.

Authors: Monique A Ladds; Adam P Thompson; David J Slip; David P Hocking; Robert G Harcourt
Journal: PLoS One Date: 2016-12-21 Impact factor: 3.240

10. Development and application of a machine learning algorithm for classification of elasmobranch behaviour from accelerometry data.

Authors: L R Brewster; J J Dale; T L Guttridge; S H Gruber; A C Hansell; M Elliott; I G Cowx; N M Whitney; A C Gleiss
Journal: Mar Biol Date: 2018-03-08 Impact factor: 2.573

1 in total

1. Quantifying finer-scale behaviours using self-organising maps (SOMs) to link accelerometery signatures with behavioural patterns in free-roaming terrestrial animals.

Authors: Nicole Galea; Fern Murphy; Joshua L Gaschk; David S Schoeman; Christofer J Clemente
Journal: Sci Rep Date: 2021-06-30 Impact factor: 4.379

1 in total