Xiaojing Zhou, Chuang Xu, Hao Wang, Wei Xu, Zixuan Zhao, Mengxing Chen, Bin Jia, Baoyin Huang.
Abstract
We use multidimensional data from automated monitoring systems and milking systems to predict disorders of dairy cows with eight machine learning algorithms. The data include the season, days in milking, parity, age at the time of disorders, milk yield (kg/day), activity (unitless), six variables related to rumination time, and two variables related to the electrical conductivity of milk. We analyze 131 sick cows and 149 healthy cows with identical lactation days and parity; all data are collected on the same day, which corresponds to the diagnosis day for disordered cows. For disordered cows, each variable, except the ratio of daytime to nighttime rumination time, displays a decreasing or increasing trend from d-7 or d-3 to d0 and/or d-1, with the d0, d-1, or d-2 values reaching the minimum or maximum. On the test data, the sensitivity of three algorithms exceeded 80%, and the accuracies of the eight algorithms ranged from 65.08% to 84.21%. The area under the curve (AUC) of three algorithms exceeded 0.80. Overall, Rpart best predicts the disorders, with an accuracy, precision, and AUC of 81.58%, 92.86%, and 0.908, respectively. Machine learning algorithms may be an appropriate and powerful decision support and monitoring tool to detect herds with common health disorders.
Keywords: disorders; electrical conductivity of milk; machine learning; milk yield; prediction; rumination
Year: 2022 PMID: 35625096 PMCID: PMC9137925 DOI: 10.3390/ani12101251
Source DB: PubMed Journal: Animals (Basel) ISSN: 2076-2615 Impact factor: 3.231
Sources of information (automated monitoring system and milking system) and traits measured with variables at different measurement intervals, summed up to the daily values.
| Sensor | Trait | Variable | Measurement Interval | Unit |
|---|---|---|---|---|
| Automatic monitoring system | Activity | Activity | min/2 h | min/day |
| Automatic monitoring system | Rumination | Daily rumination time | min/2 h | min/day |
| Automatic monitoring system | Rumination | Rumination deviation per 2 h | min/2 h | min/day |
| Automatic monitoring system | Rumination | Sum of absolute values of the weighted rumination variation | No./2 h | min/day |
| Automatic monitoring system | Rumination | Rumination at daytime | min/2 h | min/day |
| Automatic monitoring system | Rumination | Rumination at nighttime | min/2 h | min/day |
| Automatic monitoring system | Rumination | Ratio of rumination time at daytime to that at nighttime | No./2 h | None |
| Milking system | Milk yield | Daily milk yield | kg/day | kg/day |
| Milking system | Electrical conductivity of milk | Daily percentage change of the electrical conductivity of milk | No./milking shift | None |
| Milking system | Electrical conductivity of milk | Peak electrical conductivity of milk | mS/cm/milking shift | mS/cm |
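The table describes rumination being recorded in 2-h intervals and summed to daily values. A minimal sketch of that aggregation is given here for illustration only. The function name `daily_rumination_summary`, the bin layout, the 06:00-18:00 daytime window, and the sample readings are all assumptions; the record does not state how the study defined daytime and nighttime.

```python
from typing import Dict, List


def daily_rumination_summary(bins_2h: List[float],
                             day_bins: range = range(3, 9)) -> Dict[str, float]:
    """Collapse twelve 2-h rumination readings (minutes) into daily variables.

    bins_2h[0] covers 00:00-02:00, bins_2h[1] covers 02:00-04:00, and so on.
    day_bins marks which bins count as daytime; the default (06:00-18:00)
    is an assumption, not the definition used in the study.
    """
    if len(bins_2h) != 12:
        raise ValueError("expected twelve 2-h readings for one day")
    total = sum(bins_2h)
    day = sum(bins_2h[i] for i in day_bins)
    night = total - day
    return {
        "daily_rumination_min": total,
        "rumination_day_min": day,
        "rumination_night_min": night,
        # The ratio is undefined when no nighttime rumination was recorded.
        "day_night_ratio": day / night if night > 0 else float("nan"),
    }


# Made-up readings for one cow on one day (minutes per 2-h bin).
readings = [50, 55, 60, 40, 35, 30, 30, 35, 40, 45, 50, 55]
summary = daily_rumination_summary(readings)
print(summary)
```

The rumination deviation and weighted rumination variation listed in the table are not sketched because their formulas are not given in this record.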
Performance of eight machine learning algorithms with six measuring criteria.
| Model | Sensitivity | Specificity | Accuracy | Precision | F1-Score | AUC (Confidence Interval) |
|---|---|---|---|---|---|---|
| Logistic | 0.6071 | 0.7143 | 0.6667 | 0.6296 | 0.6182 | 0.685 ([0.576, 0.794]) |
| SVM | 0.7857 | 0.8750 | 0.8421 | 0.7857 | 0.7857 | 0.744 ([0.598, 0.890]) |
| Rpart | 0.6842 | 0.9474 | 0.8158 | 0.9286 | 0.7879 | 0.908 ([0.723, 0.930]) |
| Random forest | 0.8333 | 0.8462 | 0.8421 | 0.7143 | 0.7692 | 0.854 ([0.695, 0.951]) |
| eXtreme Gradient | 0.5882 | 0.8056 | 0.7358 | 0.5882 | 0.5882 | 0.828 ([0.714, 0.942]) |
| Adaboost | 0.8000 | 0.7857 | 0.7895 | 0.5714 | 0.6667 | 0.744 ([0.598, 0.890]) |
| Naïve Bayes | 0.8462 | 0.6800 | 0.7143 | 0.4074 | 0.5500 | 0.676 ([0.574, 0.778]) |
| kknn | 0.4815 | 0.7778 | 0.6508 | 0.6190 | 0.5417 | 0.630 ([0.511, 0.748]) |
AUC, area under the receiver operating characteristic curve.
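The first five criteria in the table all derive from a binary confusion matrix. The sketch below shows the standard formulas. The counts passed in (13 true positives, 6 false negatives, 18 true negatives, 1 false positive) are not reported in this record; they are hypothetical values chosen because they reproduce the Rpart row to four decimals.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on disordered cows
    specificity = tn / (tn + fp)            # recall on healthy cows
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, inferred to be consistent with the Rpart row.
rpart = classification_metrics(tp=13, fn=6, tn=18, fp=1)
for name, value in rpart.items():
    print(f"{name}: {value:.4f}")
# sensitivity: 0.6842
# specificity: 0.9474
# accuracy: 0.8158
# precision: 0.9286
# f1: 0.7879
```

Read this way, the Rpart row combines high precision with moderate sensitivity: few healthy cows are flagged, but roughly a third of disordered cows are missed.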
Figure 1. Box plots with error bars and significance of each variable for cows with health disorders and healthy cows. The panels compare the two groups on (A) daily milk yield, (B) daily activity, (C) daily rumination time, (D) rumination time at daytime, (E) rumination time at nighttime, (F) the ratio of rumination time at daytime to that at nighttime, (G) the absolute value of rumination deviation every 2 h, (H) the sum of the absolute values of the weighted rumination variation, (I) the daily percentage change in the electrical conductivity of milk, and (J) the peak electrical conductivity of milk, at a significance level of 0.001. The x-axis shows the time from d-7 to d0, where d-7 denotes 7 days before diagnosis, d-6 denotes 6 days before diagnosis, and so on down to d-1 (1 day before diagnosis), and d0 denotes the diagnosis day. The y-axis shows the values of the variables. Violet-red and blue represent the “disordered” and “healthy” groups, respectively. The error bars represent the standard deviations of each variable from d-7 or d-3 to d0. “***” indicates that the difference between the two groups is significant at the 0.001 level.
Figure 2. Importance of features for the ten variables in decision tree classification.
Figure 3. Importance of features for the ten variables in eXtreme Gradient classification.
Figure 4. Importance of features for the ten variables in Adaboost classification.
Figure 5. Receiver operating characteristic (ROC) curve of the training data (75%) and test data (25%) for the Rpart algorithm.
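The area under a ROC curve such as the one in Figure 5 equals the probability that a randomly chosen disordered cow receives a higher predicted score than a randomly chosen healthy cow, with ties counted as one half. The sketch below computes AUC from that pairwise definition. The function name `roc_auc`, the labels, and the scores are made up for illustration and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a disordered cow, 0 for a healthy cow.
    scores: predicted probability (or any score) of being disordered.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up test-set labels and classifier scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of samples, which is acceptable for a test set of a few dozen cows; a rank-based formulation would be preferable for large data sets.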