Abu Zar Shafiullah, Jessica Werner, Emer Kennedy, Lorenzo Leso, Bernadette O'Brien, Christina Umstätter.
Abstract
Sensor technologies that measure the grazing and ruminating behaviour and the physical activity of individual cows are intended to become part of precision pasture management. One advantage of sensor data is that they can be analysed to support farmers in many decision-making processes. This article therefore considers how well a set of variables recorded by the RumiWatchSystem predicts insufficient herbage allowance for spring-calving dairy cows. Several commonly used machine learning (ML) models were applied to the binary classification problem, i.e., sufficient or insufficient herbage allowance, and their predictive performance was compared using classification evaluation metrics. Most of the ML models and the generalised linear model (GLM) performed similarly under the leave-out-one-animal (LOOA) approach to validation. However, cross validation (CV) studies, in which a portion of the features in the test and training data came from the same cows, revealed that support vector machine (SVM), random forest (RF) and extreme gradient boosting (XGBoost) performed better than the other candidate models. In general, these three ML models attained 88% AUC (area under the receiver operating characteristic curve) and around 80% sensitivity, specificity, accuracy, precision and F-score. The study further identified the number of rumination chews per day and grazing bites per minute as the most important predictors, and examined the marginal effects of the variables on model prediction with a view to a decision support system.
Keywords: binary classification; feeding behaviour and activities; herbage allowance; machine learning; precision pasture management
Year: 2019 PMID: 31623092 PMCID: PMC6832637 DOI: 10.3390/s19204479
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
List of feeding behaviour and activity related variables used in the classification models.
| Notation | Category | Description |
|---|---|---|
| BITEFREQ | Grazing behaviour | Bite frequency or grazing bites per min (n/min) |
| GRAZINGSTART | Grazing behaviour | Number of grazing bouts started per day (grazing bout = minimum duration of 7 min, with intra-bout intervals shorter than 7 min) |
| RUMINATECHEW | Rumination behaviour | Number of rumination chews per day (n/day) |
| RUMICHEWBOLUS | Rumination behaviour | Mean number of rumination chews per bolus (n/bolus) |
| RUMIBOUTLENGTH | Rumination behaviour | Mean duration of a rumination bout (rumination bout = minimum duration of 3 min, with intra-bout intervals shorter than 1 min) |
| RUMIBOUTTIME | Rumination behaviour | Time of rumination within all rumination bouts (min/day) |
| HACTIVITY | Activity | Head movement activity index (n) based on accelerometer data: the averaged variance of 3-dimensional acceleration captured on the head in 10-s segments |
| LAYDOWN | Activity | Number of events (n) at which the pedometer angle changes from a vertical towards a horizontal angle for at least 50 s when the cow is lying down or standing up |
Figure 1. Side-by-side box plots of selected variables using the combined data for sufficient (100%) and insufficient (60%) herbage allowance groups.
Independent samples t-tests for significant differences of means of each predictor in the 100% and 60% herbage allowance groups using S2, S6, M2, M6, L2 and L6 data.
| Variable | S2 | S6 | M2 | M6 | L2 | L6 |
|---|---|---|---|---|---|---|
| BITEFREQ | | | | | | |
| GRAZINGSTART | | | | | | |
| RUMINATECHEW | | | | | | |
| RUMICHEWBOLUS | | | | | | |
| RUMIBOUTLENGTH | | | | | | |
| RUMIBOUTTIME | | | | | | |
| HACTIVITY | | | | | | |
| LAYDOWN | | | | | | |
P-value: *** < 0.001; ** < 0.01; * < 0.05.
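The significance codes in the footnote can be applied mechanically once a p-value is available. The sketch below is a minimal illustration, assuming Welch's unequal-variance form of the independent-samples t-test (the source does not state which variant was used). The function names and the toy bite-frequency values are hypothetical, not data from the study.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples.

    Assumption: Welch's unequal-variance t-test with the
    Welch-Satterthwaite approximation for the degrees of freedom.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def significance_stars(p):
    """Map a p-value to the codes used in the table footnotes."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical values for one predictor in the 100% and 60% groups.
group_100 = [60, 62, 64]
group_60 = [70, 72, 74]
t_stat, dof = welch_t(group_100, group_60)
```

The p-value itself would come from the t distribution with `dof` degrees of freedom, which a statistics library would normally supply.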
Independent samples t-tests for significant differences of means of each predictor in the 100% and 60% herbage allowance groups using W2, W6 and combined data.
| Variable | W2 | W6 | Combined |
|---|---|---|---|
| BITEFREQ | | | |
| GRAZINGSTART | | | |
| RUMINATECHEW | | | |
| RUMICHEWBOLUS | | | |
| RUMIBOUTLENGTH | | | |
| RUMIBOUTTIME | | | |
| HACTIVITY | | | |
| LAYDOWN | | | |
P-value: *** < 0.001; ** < 0.01; * < 0.05.
ANOVA F-test and multiple comparisons tests for the blocks of 60% and 100% herbage allowance groups using each predictor of the combined data.
| Variable | S2 | S6 | M2 | M6 | L2 | L6 | F-Test |
|---|---|---|---|---|---|---|---|
| BITEFREQ | | | | | | | |
| GRAZINGSTART | | | | | | | |
| RUMINATECHEW | | | | | | | |
| RUMICHEWBOLUS | | | | | | | |
| RUMIBOUTLENGTH | | | | | | | |
| RUMIBOUTTIME | | | | | | | |
| HACTIVITY | | | | | | | |
| LAYDOWN | | | | | | | |
P-value: *** < 0.001; ** < 0.01; * < 0.05.
List of machine learning methods with R packages.
| Machine Learning Method | R Package | Function(s) |
|---|---|---|
| k-Nearest Neighbour (kNN) | class | knn |
| Linear Discriminant Analysis (LDA) | MASS | lda |
| Neural Network (NNET) | nnet | nnet |
| Naïve Bayes (NB) | e1071 | naiveBayes |
| Support Vector Machine (SVM) | e1071 | svm |
| Decision Tree (DT) | rpart | rpart |
| Random Forest (RF) | randomForest | randomForest |
| Extreme Gradient Boosting (XGBoost) | xgboost | xgb.DMatrix, xgb.train |
Confusion matrix for estimating the classification evaluation metrics based on the number of actual and predicted classes among the test cases.
| Predicted Herbage Allowance | Allocated: Insufficient | Allocated: Sufficient |
|---|---|---|
| Insufficient | True Positive (TP) | False Positive (FP) |
| Sufficient | False Negative (FN) | True Negative (TN) |
Estimators of sensitivity, specificity, accuracy, positive predictive value (PPV), F-score and the area under receiver operating characteristic curve (AUC) in terms of the number of true positive (TP), false positive (FP), true negative (TN) and false negative (FN) classes among the test cases.
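| Evaluation Metric | Estimator |
|---|---|
| Sensitivity | $\frac{TP}{TP + FN}$ |
| Specificity | $\frac{TN}{TN + FP}$ |
| Accuracy | $\frac{TP + TN}{TP + FP + TN + FN}$ |
| Positive predictive value (PPV) | $\frac{TP}{TP + FP}$ |
| F-score | $\frac{2\,TP}{2\,TP + FP + FN}$ |
| AUC | Area under ROC curve |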
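As a worked illustration of these estimators, the following sketch computes the five count-based metrics from a confusion matrix using their standard definitions, with insufficient allowance as the positive class. It also computes AUC through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case), which is one common way to obtain it and not necessarily the routine used in the study. The function names and all counts and scores are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Count-based metrics; 'insufficient allowance' is the positive class.

    No guard against empty classes: a zero denominator raises
    ZeroDivisionError.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp)
    # Harmonic mean of PPV and sensitivity, equal to 2TP / (2TP + FP + FN).
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "f_score": f_score,
    }


def auc(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and predicted probabilities, not taken from the study.
metrics = classification_metrics(tp=40, fp=10, tn=35, fn=15)
auc_value = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
```

The pairwise AUC loop is quadratic in the number of test cases, which is acceptable for a sketch but would usually be replaced by a rank-based routine on larger data.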
Predictive performance of machine learning and generalised linear models based on the estimated sensitivity, specificity, accuracy, positive predictive value (PPV), F-score and the area under receiver operating characteristic curve (AUC) using leave-out-one-animal approach to validation studies for combined data.
| Classifier | Sensitivity | Specificity | Accuracy | PPV | F-Score | AUC |
|---|---|---|---|---|---|---|
| kNN | 0.70 | 0.71 | 0.71 | 0.64 | 0.67 | 0.78 |
| NB | 0.72 | | | | | |
| NNET | 0.77 | 0.67 | 0.71 | 0.63 | | 0.80 |
| LDA | | 0.65 | 0.70 | 0.62 | 0.69 | 0.79 |
| DT | 0.74 | 0.67 | 0.70 | 0.63 | 0.68 | 0.78 |
| SVM | 0.74 | 0.61 | 0.67 | 0.59 | 0.66 | 0.74 |
| XGBoost | 0.73 | 0.59 | 0.65 | 0.57 | 0.64 | 0.72 |
| RF | 0.75 | 0.63 | 0.68 | 0.60 | 0.67 | 0.76 |
| GLM | 0.74 | 0.64 | 0.69 | 0.63 | | 0.76 |
The estimates in bold correspond to the best models.
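The leave-out-one-animal approach keeps every record from one cow out of training and uses those records only for testing, so no animal contributes to both sides of a split. A minimal sketch of how such splits could be generated is given below; the function name and the cow identifiers are hypothetical, and the study itself was carried out in R.

```python
def leave_one_animal_out(animal_ids):
    """Yield (held_out_id, train_indices, test_indices) for each animal.

    `animal_ids` holds one entry per record, naming the animal that
    produced the record. All records of the held-out animal form the
    test set; all remaining records form the training set.
    """
    for held_out in sorted(set(animal_ids)):
        test_idx = [i for i, a in enumerate(animal_ids) if a == held_out]
        train_idx = [i for i, a in enumerate(animal_ids) if a != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical daily records from three cows.
cow_ids = ["cow_a", "cow_a", "cow_b", "cow_c", "cow_b", "cow_c"]
splits = list(leave_one_animal_out(cow_ids))
```

Predictions collected over all held-out animals can then be pooled into a single confusion matrix. This contrasts with record-level cross validation, where records from the same cow can fall into both the training and the test data.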
Predictive performance of machine learning and generalised linear models based on the estimated sensitivity, specificity, accuracy, positive predictive value (PPV), F-score and the area under receiver operating characteristic curve (AUC) using cross validation studies for combined data.
| Classifier | Sensitivity | Specificity | Accuracy | PPV | F-Score | AUC |
|---|---|---|---|---|---|---|
| kNN | 0.66 (0.004) | 0.67 (0.003) | 0.67 (0.002) | 0.65 (0.003) | 0.65 (0.003) | 0.73 (0.003) |
| NB | 0.73 (0.004) | 0.73 (0.003) | 0.73 (0.002) | 0.71 (0.004) | 0.72 (0.003) | 0.81 (0.003) |
| NNET | 0.76 (0.004) | 0.78 (0.004) | 0.76 (0.002) | 0.76 (0.004) | 0.76 (0.003) | 0.85 (0.003) |
| LDA | 0.76 (0.003) | 0.78 (0.004) | 0.77 (0.002) | 0.77 (0.004) | 0.76 (0.002) | 0.85 (0.002) |
| DT | 0.75 (0.003) | 0.76 (0.004) | 0.75 (0.003) | 0.74 (0.005) | 0.74 (0.003) | 0.83 (0.003) |
| SVM | | | | | | |
| XGBoost | | | | | | |
| RF | | | | | | |
| GLM | 0.76 (0.004) | 0.77 (0.003) | 0.76 (0.002) | 0.76 (0.004) | 0.76 (0.003) | 0.85 (0.003) |
Standard errors are indicated in parentheses. The estimates in bold correspond to the best models.
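Entries of the form "estimate (standard error)" can be produced by summarising a metric over repeated cross validation runs. The sketch below assumes the standard error is the sample standard deviation across runs divided by the square root of the number of runs; the source does not spell out how it was computed. The function names and the AUC values are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_and_se(values):
    """Return the mean and standard error of a metric across repeated runs.

    Assumption: SE = sample standard deviation / sqrt(number of runs).
    """
    return mean(values), stdev(values) / math.sqrt(len(values))


def format_estimate(values):
    """Format as 'mean (SE)', mirroring the layout of the tables."""
    m, se = mean_and_se(values)
    return f"{m:.2f} ({se:.3f})"


# Hypothetical AUC values from five repeated cross validation runs.
auc_runs = [0.84, 0.86, 0.85, 0.87, 0.83]
summary = format_estimate(auc_runs)  # "0.85 (0.007)"
```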
Predictive performance of generalised linear model based on the estimated sensitivity, specificity, accuracy, positive predictive value (PPV), F-score and the area under receiver operating characteristic curve (AUC) for two-week and six-week restriction periods among the cows in early (S), mid (M) and late (L) lactation stage using cross validation studies.
| Subset | Sensitivity | Specificity | Accuracy | PPV | F-Score | AUC |
|---|---|---|---|---|---|---|
| S2 | 0.78 (0.01) | 0.81 (0.006) | 0.80 (0.006) | 0.68 (0.013) | 0.72 (0.01) | 0.88 (0.007) |
| S6 | 0.84 (0.01) | 0.88 (0.006) | 0.86 (0.005) | 0.81 (0.01) | 0.82 (0.008) | 0.94 (0.004) |
| M2 | 0.82 (0.012) | 0.82 (0.007) | 0.81 (0.006) | 0.69 (0.014) | 0.74 (0.01) | 0.89 (0.007) |
| M6 | 0.78 (0.014) | 0.80 (0.005) | 0.79 (0.004) | 0.62 (0.012) | 0.68 (0.009) | 0.87 (0.006) |
| L2 | 0.74 (0.015) | 0.81 (0.007) | 0.78 (0.006) | 0.63 (0.016) | 0.67 (0.012) | 0.85 (0.008) |
| L6 | 0.81 (0.02) | 0.84 (0.005) | 0.83 (0.005) | 0.67 (0.02) | 0.72 (0.013) | 0.90 (0.007) |
| W2 | 0.74 (0.008) | 0.84 (0.004) | 0.81 (0.003) | 0.63 (0.008) | 0.68 (0.006) | 0.87 (0.004) |
| W6 | 0.71 (0.009) | 0.85 (0.003) | 0.81 (0.003) | 0.61 (0.008) | 0.65 (0.007) | 0.86 (0.004) |
Standard errors are indicated in parentheses.
Predictive performance of random forest based on the estimated sensitivity, specificity, accuracy, positive predictive value (PPV), F-score and the area under receiver operating characteristic curve (AUC) for two-week and six-week restriction periods among the cows in early (S), mid (M) and late (L) lactation stage using cross validation studies.
| Subset | Sensitivity | Specificity | Accuracy | PPV | F-Score | AUC |
|---|---|---|---|---|---|---|
| S2 | 0.90 (0.011) | 0.87 (0.007) | 0.88 (0.006) | 0.78 (0.015) | 0.84 (0.01) | 0.96 (0.005) |
| S6 | 0.91 (0.008) | 0.91 (0.005) | 0.91 (0.004) | 0.86 (0.01) | 0.88 (0.006) | 0.97 (0.002) |
| M2 | 0.89 (0.015) | 0.87 (0.007) | 0.88 (0.006) | 0.79 (0.015) | 0.83 (0.013) | 0.95 (0.008) |
| M6 | 0.89 (0.009) | 0.88 (0.004) | 0.89 (0.004) | 0.79 (0.011) | 0.83 (0.007) | 0.95 (0.002) |
| L2 | 0.87 (0.011) | 0.89 (0.006) | 0.88 (0.005) | 0.79 (0.014) | 0.82 (0.009) | 0.95 (0.005) |
| L6 | 0.91 (0.015) | 0.90 (0.005) | 0.90 (0.005) | 0.80 (0.02) | 0.85 (0.013) | 0.96 (0.005) |
| W2 | 0.78 (0.008) | 0.84 (0.004) | 0.82 (0.003) | 0.62 (0.01) | 0.68 (0.007) | 0.88 (0.004) |
| W6 | 0.78 (0.006) | 0.88 (0.003) | 0.85 (0.002) | 0.69 (0.009) | 0.73 (0.006) | 0.91 (0.002) |
Standard errors are indicated in parentheses.
Predictive performance of support vector machine based on the estimated sensitivity, specificity, accuracy, positive predictive value (PPV), F-score and the area under receiver operating characteristic curve (AUC) for two-week and six-week restriction period among the cows in early (S), mid (M) and late (L) lactation stage using cross validation studies.
| Subset | Sensitivity | Specificity | Accuracy | PPV | F-Score | AUC |
|---|---|---|---|---|---|---|
| S2 | 0.88 (0.011) | 0.82 (0.06) | 0.84 (0.005) | 0.68 (0.011) | 0.76 (0.008) | 0.92 (0.005) |
| S6 | 0.88 (0.008) | 0.90 (0.005) | 0.89 (0.004) | 0.84 (0.009) | 0.86 (0.007) | 0.96 (0.002) |
| M2 | 0.91 (0.013) | 0.82 (0.008) | 0.85 (0.006) | 0.68 (0.013) | 0.78 (0.011) | 0.94 (0.006) |
| M6 | 0.89 (0.012) | 0.83 (0.005) | 0.85 (0.005) | 0.68 (0.016) | 0.76 (0.012) | 0.93 (0.004) |
| L2 | 0.85 (0.012) | 0.85 (0.005) | 0.85 (0.004) | 0.72 (0.013) | 0.77 (0.008) | 0.92 (0.004) |
| L6 | 0.87 (0.014) | 0.79 (0.005) | 0.81 (0.004) | 0.52 (0.017) | 0.64 (0.014) | 0.89 (0.005) |
| W2 | 0.79 (0.007) | 0.83 (0.004) | 0.82 (0.003) | 0.60 (0.008) | 0.68 (0.006) | 0.89 (0.003) |
| W6 | 0.77 (0.007) | 0.87 (0.002) | 0.84 (0.002) | 0.66 (0.007) | 0.70 (0.005) | 0.90 (0.003) |
Standard errors are indicated in parentheses.
Predictive performance of extreme gradient boosting based on the estimated sensitivity, specificity, accuracy, positive predictive value (PPV), F-score and the area under receiver operating characteristic curve (AUC) for two-week and six-week restriction period among the cows in early (S), mid (M) and late (L) lactation stage using cross validation studies.
| Subset | Sensitivity | Specificity | Accuracy | PPV | F-Score | AUC |
|---|---|---|---|---|---|---|
| S2 | 0.89 (0.011) | 0.89 (0.007) | 0.89 (0.006) | 0.83 (0.013) | 0.85 (0.009) | 0.96 (0.005) |
| S6 | 0.89 (0.009) | 0.91 (0.005) | 0.90 (0.004) | 0.85 (0.01) | 0.87 (0.006) | 0.96 (0.002) |
| M2 | 0.85 (0.014) | 0.87 (0.008) | 0.86 (0.007) | 0.79 (0.015) | 0.81 (0.011) | 0.94 (0.008) |
| M6 | 0.85 (0.01) | 0.88 (0.004) | 0.87 (0.004) | 0.79 (0.012) | 0.81 (0.009) | 0.94 (0.003) |
| L2 | 0.85 (0.013) | 0.90 (0.006) | 0.88 (0.006) | 0.81 (0.014) | 0.82 (0.011) | 0.95 (0.006) |
| L6 | 0.90 (0.02) | 0.92 (0.003) | 0.91 (0.004) | 0.83 (0.02) | 0.86 (0.015) | 0.97 (0.005) |
| W2 | 0.72 (0.008) | 0.84 (0.004) | 0.80 (0.004) | 0.64 (0.008) | 0.67 (0.006) | 0.86 (0.004) |
| W6 | 0.75 (0.005) | 0.88 (0.003) | 0.84 (0.002) | 0.71 (0.007) | 0.73 (0.004) | 0.91 (0.002) |
Standard errors are indicated in parentheses.
Predictive performance of linear discriminant analysis based on the estimated sensitivity, specificity, accuracy, positive predictive value (PPV), F-score and the area under receiver operating characteristic curve (AUC) for two-week and six-week restriction period among the cows in early (S), mid (M) and late (L) lactation stage using cross validation studies.
| Subset | Sensitivity | Specificity | Accuracy | PPV | F-Score | AUC |
|---|---|---|---|---|---|---|
| S2 | 0.77 (0.003) | 0.76 (0.002) | 0.76 (0.002) | 0.55 (0.003) | 0.64 (0.003) | 0.84 (0.002) |
| S6 | 0.81 (0.002) | 0.84 (0.002) | 0.83 (0.002) | 0.74 (0.003) | 0.77 (0.002) | 0.90 (0.002) |
| M2 | 0.82 (0.003) | 0.81 (0.002) | 0.81 (0.002) | 0.67 (0.003) | 0.73 (0.002) | 0.89 (0.002) |
| M6 | 0.80 (0.003) | 0.77 (0.002) | 0.78 (0.002) | 0.54 (0.003) | 0.64 (0.003) | 0.86 (0.002) |
| L2 | 0.71 (0.003) | 0.77 (0.002) | 0.75 (0.002) | 0.53 (0.004) | 0.60 (0.003) | 0.81 (0.002) |
| L6 | 0.75 (0.003) | 0.77 (0.001) | 0.76 (0.001) | 0.48 (0.003) | 0.58 (0.002) | 0.82 (0.001) |
| W2 | 0.69 (0.003) | 0.79 (0.001) | 0.77 (0.001) | 0.51 (0.003) | 0.58 (0.002) | 0.81 (0.002) |
| W6 | 0.73 (0.003) | 0.83 (0.001) | 0.81 (0.001) | 0.56 (0.003) | 0.63 (0.002) | 0.85 (0.001) |
Standard errors are indicated in parentheses.
Predictive performance of neural network based on the estimated sensitivity, specificity, accuracy, positive predictive value (PPV), F-score and the area under receiver operating characteristic curve (AUC) for two-week and six-week restriction period among the cows in early (S), mid (M) and late (L) lactation stage using cross validation studies.
| Subset | Sensitivity | Specificity | Accuracy | PPV | F-Score | AUC |
|---|---|---|---|---|---|---|
| S2 | 0.78 (0.003) | 0.77 (0.002) | 0.77 (0.002) | 0.59 (0.003) | 0.66 (0.003) | 0.85 (0.002) |
| S6 | 0.82 (0.004) | 0.86 (0.001) | 0.84 (0.002) | 0.76 (0.004) | 0.79 (0.003) | 0.92 (0.002) |
| M2 | 0.81 (0.003) | 0.82 (0.002) | 0.81 (0.002) | 0.69 (0.003) | 0.74 (0.002) | 0.89 (0.002) |
| M6 | 0.78 (0.003) | 0.78 (0.002) | 0.77 (0.002) | 0.56 (0.004) | 0.64 (0.003) | 0.85 (0.002) |
| L2 | 0.74 (0.003) | 0.80 (0.002) | 0.77 (0.002) | 0.60 (0.005) | 0.65 (0.003) | 0.84 (0.002) |
| W2 | 0.68 (0.003) | 0.80 (0.003) | 0.77 (0.001) | 0.53 (0.003) | 0.59 (0.002) | 0.81 (0.003) |
| W6 | 0.70 (0.002) | 0.83 (0.001) | 0.80 (0.001) | 0.57 (0.003) | 0.62 (0.002) | 0.84 (0.002) |
Standard errors are indicated in parentheses.
Predictive performance of naïve Bayes based on the estimated sensitivity, specificity, accuracy, positive predictive value (PPV), F-score and the area under receiver operating characteristic curve (AUC) for two-week and six-week restriction period among the cows in early (S), mid (M) and late (L) lactation stage using cross validation studies.
| Subset | Sensitivity | Specificity | Accuracy | PPV | F-Score | AUC |
|---|---|---|---|---|---|---|
| S2 | 0.74 (0.003) | 0.78 (0.002) | 0.76 (0.002) | 0.61 (0.003) | 0.66 (0.002) | 0.83 (0.002) |
| S6 | 0.78 (0.003) | 0.86 (0.002) | 0.82 (0.001) | 0.77 (0.003) | 0.77 (0.002) | 0.90 (0.001) |
| M2 | 0.75 (0.003) | 0.80 (0.002) | 0.77 (0.002) | 0.66 (0.003) | 0.69 (0.002) | 0.85 (0.002) |
| M6 | 0.66 (0.003) | 0.76 (0.002) | 0.72 (0.002) | 0.54 (0.003) | 0.59 (0.003) | 0.78 (0.002) |
| L2 | 0.72 (0.003) | 0.82 (0.002) | 0.79 (0.002) | 0.67 (0.004) | 0.69 (0.003) | 0.86 (0.002) |
| L6 | 0.71 (0.003) | 0.81 (0.002) | 0.78 (0.002) | 0.60 (0.004) | 0.65 (0.003) | 0.84 (0.002) |
| W2 | 0.62 (0.003) | 0.81 (0.001) | 0.75 (0.001) | 0.57 (0.003) | 0.59 (0.002) | 0.80 (0.002) |
| W6 | 0.64 (0.003) | 0.85 (0.001) | 0.79 (0.001) | 0.64 (0.003) | 0.63 (0.002) | 0.83 (0.002) |
Standard errors are indicated in parentheses.
Figure 2. Variable importance plots for random forest based on mean decrease of Gini coefficients.
Figure 3. Partial dependence plots for the marginal effects of the predictors on the random forest model: Estimated probability of insufficient allowance versus the observed values of each predictor in the combined data.
Figure 4. Pairwise correlations among the important predictors in the random forest model. Intensity of blue and red colour represents the strength of positive and negative correlation, respectively.
Figure 5. Contour plots for the partial dependencies of random forest on correlated predictors: Colour intensity represents the estimated probability of insufficient allowance versus the recorded predictors in the combined data.
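The partial dependence plots show the marginal effect of one predictor on the model output: the predictor is fixed at a grid value for every record, the model is evaluated, and the predictions are averaged. The sketch below illustrates that calculation with a hypothetical stand-in for a fitted classifier. `toy_predict`, its thresholds and the two records are invented for illustration and are not the random forest or the data from the study.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Copy every record, overriding only the feature of interest.
        modified = [{**row, feature: value} for row in rows]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve


def toy_predict(row):
    """Hypothetical probability of insufficient allowance (arbitrary thresholds)."""
    score = 0.0
    if row["BITEFREQ"] >= 65:
        score += 0.5
    if row["RUMINATECHEW"] < 28000:
        score += 0.5
    return score


# Two hypothetical daily records.
records = [
    {"BITEFREQ": 50, "RUMINATECHEW": 40000},
    {"BITEFREQ": 70, "RUMINATECHEW": 20000},
]
pd_curve = partial_dependence(toy_predict, records, "BITEFREQ", [50, 70])
```

Evaluating the same function over a grid of value pairs for two predictors gives the kind of surface shown in a contour plot.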
Observed ranges of predictor values that correspond to the prediction of sufficient and insufficient herbage allowance by the random forest model.
| Predictor Variable | Sufficient: Min | Sufficient: Max | Insufficient: Min | Insufficient: Max |
|---|---|---|---|---|
| BITEFREQ | 46 | 63 | 64 | 82 |
| GRAZINGSTART | 8 | 13 | 2 | 7 |
| RUMINATECHEW | 28,397 | 44,062 | 8,461 | 27,685 |
| RUMICHEWBOLUS | 61 | 69 | 39 | 50 |
| RUMIBOUTLENGTH | 33 | 58 | 15 | 32 |
| RUMIBOUTTIME | 367 | 631 | 205 | 469 |
| HACTIVITY | 69 | 110 | 111 | 170 |
| LAYDOWN | 8 | 13 | 2 | 7 |