| Literature DB >> 35336494 |
Dejan Pavlovic¹, Mikolaj Czerkawski², Christopher Davison², Oskar Marko¹, Craig Michie², Robert Atkinson², Vladimir Crnojevic¹, Ivan Andonovic², Vladimir Rajovic³, Goran Kvascev³, Christos Tachtatzis².
Abstract
Monitoring and classification of dairy cattle behaviours are essential for optimising milk yields. Early detection of illness, days before critical conditions occur, together with automatic detection of the onset of oestrus cycles, is crucial for avoiding prolonged cattle treatments and improving pregnancy rates. Accelerometer-based sensor systems are becoming increasingly popular, as they automatically provide information on key cattle behaviours at the individual animal level, such as the level of restlessness and the time spent ruminating and eating; these proxy measurements indicate the onset of heat events and overall welfare. This paper reports on an approach to the development of algorithms that classify key cattle states based on a systematic dimensionality reduction process using two feature selection techniques, Mutual Information and Backward Feature Elimination, applied to knowledge-specific and generic time-series features extracted from raw accelerometer data. The selected features are then used to train classification models based on a Hidden Markov Model, Linear Discriminant Analysis and Partial Least Squares Discriminant Analysis. The proposed feature engineering methodology permits model deployment within the computing and memory restrictions imposed by operational settings. The models were built on measurement data from 18 steers, each animal equipped with an accelerometer-based neck-mounted collar and a muzzle-mounted halter, the latter providing the truthing data. A total of 42 time-series features were initially extracted, and the trade-off between model performance, computational complexity and memory footprint was explored. Results show that the classification model that best balances performance and computational complexity is based on Linear Discriminant Analysis using features selected through Backward Feature Elimination. The final model requires 1.83 ± 1.00 ms for feature extraction and 0.05 ± 0.01 ms for inference, with an overall balanced accuracy of 0.83.
Keywords: cattle behaviour monitoring; feature selection; precision agriculture
Year: 2022 PMID: 35336494 PMCID: PMC8951529 DOI: 10.3390/s22062323
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
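The pipeline described in the abstract (feature extraction, then MI- or BFE-based selection, then classification) can be outlined with off-the-shelf tooling. Below is a minimal, hedged sketch using scikit-learn; the synthetic `X`/`y` data, the feature counts, and the use of `RFE` as a stand-in for the paper's Backward Feature Elimination are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 42))    # placeholder: 42 time-series features per window
y = rng.integers(0, 3, size=1000)  # placeholder labels, e.g. eating / ruminating / other

# Mutual Information: rank features by MI with the labels, keep the k best.
mi_scores = mutual_info_classif(X, y, random_state=0)
top_k = np.argsort(mi_scores)[::-1][:27]

# Backward Feature Elimination, approximated here by scikit-learn's
# recursive feature elimination wrapped around the LDA classifier.
bfe = RFE(LinearDiscriminantAnalysis(), n_features_to_select=7).fit(X, y)

# Train LDA on the BFE-selected subset and report balanced accuracy.
lda = LinearDiscriminantAnalysis().fit(X[:, bfe.support_], y)
print(balanced_accuracy_score(y, lda.predict(X[:, bfe.support_])))
```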
Figure 1. Placement of a RumiWatch muzzle-mounted halter and an Afimilk Silent Herdsman neck-mounted collar. (a) Axis orientation diagram. (b) Photograph illustrating sensor placement.
Brief description of the generic and knowledge-specific time-series features. All features used within the analysis are derived using the tsfresh Python package [23], with the exception of FFT amplitude and Spectral flatness.
| Features | Definition |
|---|---|
| Aggregated autocorrelation | Standard deviation of the autocorrelation function over a range of different lags |
| Autoregressive coefficient | Coefficient of the unconditional maximum likelihood of an autoregressive process |
| Autocorrelation | Correlation of the time-series with a lagged copy of itself at a given lag |
| Benford correlation | Correlation of the time-series first-digit distribution with the Newcomb-Benford Law distribution |
| Binned entropy | Entropy of the time-series value distribution over equidistant bins, $-\sum p \log p$ † |
| Change quantiles | Standard deviation of changes of the time-series within the first and third quartile range |
| Complexity-invariant distance | Complexity estimate based on consecutive differences, $\sqrt{\sum_i (x_{i+1} - x_i)^2}$ |
| Count above global mean | Number of observations higher than the mean value estimated on the training set |
| Count above local mean | Number of observations higher than the time-series mean |
| Count below global mean | Number of observations lower than the mean value estimated on the training set |
| Count below local mean | Number of observations lower than the time-series mean |
| c3 | Non-linearity measure $\frac{1}{n-2\tau}\sum_i x_{i+2\tau}\,x_{i+\tau}\,x_i$ at lag $\tau$ |
| Energy | Sum of the squared time-series values |
| FFT aggregated | Kurtosis of the absolute Fourier transform spectrum |
| FFT amplitude | Maximum of the FFT magnitudes between 2 and 4 Hz |
| FFT coefficient | Sum of the FFT magnitudes between 2 and 4 Hz |
| First quartile | The value below which 25% of the time-series data points fall |
| Fourier entropy | Binned entropy of the time-series power spectral density |
| Kurtosis | Heaviness of the tails of the analysed distribution relative to a normal distribution |
| Lempel-Ziv complexity | Complexity estimate based on the Lempel-Ziv compression algorithm |
| Linear trend | Standard error of the estimated linear regression gradient |
| Longest strike above mean | Length of the longest consecutive sub-sequence of the time-series above its mean value |
| Longest strike below mean | Length of the longest consecutive sub-sequence of the time-series below its mean value |
| Maximum | The highest value in the time-series |
| Median | The value below which 50% of the time-series data points fall |
| Minimum | The lowest value in the time-series |
| Number of CWT peaks | Number of peaks within the Ricker-wavelet-smoothed time-series |
| Number of peaks | Number of observations with a value higher than their neighbouring values on both sides |
| Partial autocorrelation | Correlation of the time-series with a lagged copy of itself, controlling for shorter lags |
| Permutation entropy | Entropy of ordering permutations occurring in fixed-length time-series window chunks |
| Range count | Number of observations between the first and the third time-series quartile |
| Ratio beyond $r\sigma$ | Percentage of observations diverging from the mean by more than $r$ standard deviations |
| Sample entropy | Negative logarithm of the conditional probability that two similar sub-sequences remain similar when extended by one sample |
| Skewness | Asymmetry of the analysed distribution relative to a normal distribution |
| Spectral flatness | Ratio between the geometric and arithmetic mean of the power spectrum |
| Spectral Welch density | Power spectral density at a given frequency, estimated using the Welch method |
| Standard deviation | Standard deviation of the time-series values |
| Sum of changes | Sum of the absolute differences between consecutive time-series values |
| Third quartile | The value below which 75% of the time-series data points fall |
| Time-series sum | Sum over all time-series values |
| Variation coefficient | Relative standard deviation, i.e., ratio of the standard deviation to the mean |
| Zero crossing | Number of points where the time-series signal crosses zero |
† where $p$ indicates the percentage of samples falling into the given bin.
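Most of the tabulated features map directly onto tsfresh's feature calculators, while FFT amplitude and spectral flatness (which the authors note are not from tsfresh) reduce to a few lines of NumPy. A minimal sketch follows; the sampling rate, window length, and calculator parameters are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from tsfresh.feature_extraction import feature_calculators as fc

fs = 10.0                                        # assumed sampling rate [Hz]
x = np.random.default_rng(1).normal(size=100)    # one accelerometer window (placeholder)

features = {
    "energy": fc.abs_energy(x),                        # sum of squared values
    "binned_entropy": fc.binned_entropy(x, max_bins=10),
    "c3": fc.c3(x, lag=1),                             # non-linearity measure
    "cid": fc.cid_ce(x, normalize=True),               # complexity-invariant distance
    "sample_entropy": fc.sample_entropy(x),
    "number_peaks": fc.number_peaks(x, n=3),
    "sum_of_changes": fc.absolute_sum_of_changes(x),
}

# FFT amplitude: maximum FFT magnitude in the 2-4 Hz band.
mag = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
band = (freqs >= 2) & (freqs <= 4)
features["fft_amplitude"] = mag[band].max()

# Spectral flatness: geometric mean / arithmetic mean of the power spectrum.
power = mag ** 2
features["spectral_flatness"] = np.exp(np.mean(np.log(power + 1e-12))) / np.mean(power)
```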
Figure 2. Block diagram of the methodology, from the raw data to the training and evaluation of the classification algorithms. The red arrows indicate the path followed in this work.
Figure 3. Balanced accuracy for the HMM, LDA and PLS-DA classification algorithms under the two feature selection methodologies, MI and BFE, for a varying number of selected features. For PLS-DA, the number of re-projected feature dimensions was also varied to explore the sensitivity of this hyper-parameter. The ⋆ denotes models with maximum balanced accuracy, while the ⋄ denotes manually selected models that balance the trade-off between balanced accuracy and time complexity.
Comparison of model performance and time complexity for the MI and BFE feature selection approaches and the HMM, LDA and PLS-DA classification algorithms. The ⋆ models achieve maximum balanced accuracy, while the ⋄ models were manually selected to balance the trade-off between balanced accuracy and time complexity.
| Feature Selection Technique | Classification Method | # of Input Features | Balanced Accuracy | Extraction [ms] | Inference [ms] | Total [ms] |
|---|---|---|---|---|---|---|
| MI | HMM ⋆ | 42 | 0.77 | – | – | – |
| MI | HMM ⋄ | 22 | 0.74 | – | – | – |
| MI | LDA ⋆ | 42 | 0.81 | – | – | – |
| MI | LDA ⋄ | 27 | 0.80 | – | – | – |
| MI | PLS-DA ⋆ | 42 | 0.79 | – | – | – |
| MI | PLS-DA ⋄ | 27 | 0.77 | – | – | – |
| BFE | HMM | 12 | 0.80 | – | – | – |
| BFE | LDA ⋆ | 27 | 0.81 | – | – | – |
| BFE | LDA ⋄ | 7 | – | 1.83 ± 1.00 | 0.05 ± 0.01 | – |
| BFE | PLS-DA ⋆ | 22 | 0.80 | – | – | – |
| BFE | PLS-DA ⋄ | 17 | 0.79 | – | – | – |
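The extraction and inference columns report wall-clock times per window. A hedged sketch of one way to collect such millisecond-scale timings is shown below; the timer placement, repetition count, and the `extract_features`/`model` names are assumptions, not the authors' benchmark harness.

```python
import time
import numpy as np

def time_ms(func, *args, repeats=100):
    """Return (mean, std) runtime of func(*args) in milliseconds."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(*args)
        samples.append((time.perf_counter() - t0) * 1e3)
    return np.mean(samples), np.std(samples)

# Usage, assuming an extract_features(window) function and a fitted model
# from earlier sketches (both hypothetical names):
# ext_mean, ext_std = time_ms(extract_features, window)
# inf_mean, inf_std = time_ms(model.predict, feature_vector)
```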
Figure 4. Graphical comparison of the dimensionality reduction and classification algorithms in terms of time complexity and performance.
Figure 5. Confusion matrices for the selected classification model, an LDA utilising features selected through BFE, which yielded the best trade-off between model performance and complexity (BFE-LDA '⋄'). (a) Validation dataset. (b) Test dataset.
Figure 6. Number of reduction steps each feature survived for the MI and BFE selection methods. The ⋄ annotations mark the seven features selected by BFE-LDA '⋄', and the red bars mark the seven features that survived the most reduction steps for each feature selection algorithm.
Figure 7. Joint distributions of the feature pairs selected by BFE-LDA '⋄', with class annotations provided by the truthing data. The diagonal plots show the univariate distribution of each feature.
Individual classification performance per steer in terms of weighted performance metrics on the test set.
| Test Steer | Balanced Accuracy | Precision | Recall |
|---|---|---|---|
| #1 | 0.82 | 0.86 | 0.85 |
| #2 | 0.86 | 0.90 | 0.87 |
| #3 | 0.80 | 0.89 | 0.79 |
| Average | 0.83 ± 0.03 | 0.88 ± 0.02 | 0.83 ± 0.04 |
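For reference, the weighted precision and recall in the table above correspond to scikit-learn's `average="weighted"` setting. A minimal sketch, assuming per-steer `y_true`/`y_pred` label arrays are available (the variable names are placeholders):

```python
from sklearn.metrics import balanced_accuracy_score, precision_score, recall_score

def steer_metrics(y_true, y_pred):
    """Balanced accuracy plus class-frequency-weighted precision and recall."""
    return (
        balanced_accuracy_score(y_true, y_pred),
        precision_score(y_true, y_pred, average="weighted", zero_division=0),
        recall_score(y_true, y_pred, average="weighted", zero_division=0),
    )

# e.g. ba, prec, rec = steer_metrics(y_true_steer1, y_pred_steer1)
```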