Alexey Ruchay, Svetlana Gritsenko, Evgenia Ermolova, Alexander Bochkarev, Sergey Ermolov, Hao Guo, Andrea Pezzuolo.
Abstract
Live weight is an important indicator of livestock productivity and an informative measure for the health, feeding, breeding, and selection of livestock. In this paper, the live weight of pigs was estimated from six morphometric measurements, weight at birth, weight at weaning, and age at weaning. The study used a dataset of 340 pigs of the Duroc, Landrace, and Yorkshire breeds. We present a comparative analysis of machine learning methods that use outlier detection, normalisation, hyperparameter optimisation, and stacked generalisation to increase the accuracy of live weight prediction. Prediction performance was assessed with four evaluation criteria: the coefficient of determination, the root-mean-squared error, the mean absolute error, and the mean absolute percentage error. The performance measures were also validated through 10-fold cross-validation to provide a robust model for predicting pig live weight. The StackingRegressor model gave the best results, with an MAE of 4.331 and a MAPE of 4.296 on the test dataset.
Keywords: body measurement; ensemble methods; live pig weight estimation; machine learning; prediction; regression algorithm
Year: 2022 PMID: 35565577 PMCID: PMC9104573 DOI: 10.3390/ani12091152
Source DB: PubMed Journal: Animals (Basel) ISSN: 2076-2615 Impact factor: 3.231
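The four evaluation criteria named in the abstract can be written out directly. The following is a minimal sketch using their standard textbook definitions; the function name `regression_metrics` and the sample weights are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE (in percent) for paired observations."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical live weights (kg) and predictions, for illustration only.
weights = [100.0, 110.0, 90.0, 100.0]
predicted = [98.0, 105.0, 95.0, 102.0]
metrics = regression_metrics(weights, predicted)
print(metrics)
```

RMSE and MAE are in the units of the target (kg here), while MAPE is a percentage of the true value, which is why the two are numerically close when the mean live weight is near 100 kg.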
Figure 1. Histogram of the distribution of the live weight by breed.
Figure 2. Picture of six measured body dimensions of a pig: (1) body length, (2) chest girth, (3) withers height, (4) chest depth, (5) chest width, and (6) metacarpus girth.
Figure 3. In the boxplots, the whiskers show the range, the boxes show the upper and lower quartiles and the median (solid dark horizontal line), and red points are outliers.
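The source does not state which rule marks a point as an outlier. As an assumption, the sketch below uses the common Tukey rule for boxplots (points more than 1.5 times the interquartile range beyond the quartiles); the function name `iqr_outliers` and the sample measurements are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey rule, assumed)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical chest girth measurements (cm), for illustration only.
chest_girth_cm = [104.0, 106.0, 108.0, 109.0, 110.0, 111.0, 113.0, 135.0]
flagged = iqr_outliers(chest_girth_cm)
print(flagged)
```

For this sample the quartiles are 107.5 and 111.5, so the fences sit at 101.5 and 117.5 and only the 135.0 cm reading is flagged.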
The mean values, standard deviation (SD), and coefficient of variation (CV) of each feature.
| Features | Mean | SD | CV (%) |
|---|---|---|---|
| Live weight (kg) | 101.78 | 6.51 | 6.40 |
| Weight at birth (kg) | 1.21 | 0.12 | 10.13 |
| Weight at weaning (kg) | 6.11 | 0.77 | 12.61 |
| Age at weaning (days) | 24.56 | 3.04 | 12.37 |
| Body length (cm) | 113.73 | 5.27 | 4.63 |
| Chest girth (cm) | 109.27 | 4.35 | 3.98 |
| Withers height (cm) | 58.78 | 3.30 | 5.61 |
| Chest depth (cm) | 35.12 | 2.88 | 8.19 |
| Chest width (cm) | 30.11 | 2.58 | 8.56 |
| Metacarpus girth (cm) | 17.37 | 0.82 | 4.74 |
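The CV column follows from the other two columns as CV = 100 × SD / mean. A short check on three rows of the table (the helper name `coefficient_of_variation` is an illustrative choice):

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent: the SD expressed relative to the mean."""
    return 100.0 * sd / mean


# (mean, SD) pairs copied from the table above.
features = {
    "Live weight (kg)": (101.78, 6.51),
    "Body length (cm)": (113.73, 5.27),
    "Chest girth (cm)": (109.27, 4.35),
}
cv_by_feature = {
    name: round(coefficient_of_variation(mean, sd), 2)
    for name, (mean, sd) in features.items()
}
for name, cv in cv_by_feature.items():
    print(f"{name}: CV = {cv:.2f}%")
```

These three rows reproduce the tabulated 6.40, 4.63, and 3.98. Some other rows differ slightly when recomputed this way (weight at birth gives about 9.92 rather than 10.13), presumably because the printed means and SDs are rounded.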
Comparison of the ensemble model performances in terms of R², RMSE, MAE, and MAPE.
| Algorithm | Training R² | Training RMSE (kg) | Training MAE (kg) | Training MAPE (%) | Testing R² | Testing RMSE (kg) | Testing MAE (kg) | Testing MAPE (%) |
|---|---|---|---|---|---|---|---|---|
| VotingRegressor | 0.394 | 5.026 | 4.172 | 4.150 | 0.328 | 5.436 | 4.594 | 4.573 |
| BaggingRegressor | 0.300 | 5.403 | 4.432 | 4.399 | 0.303 | 5.539 | 4.504 | 4.487 |
| StackingRegressor | 0.377 | 5.095 | 3.803 | 3.803 | 0.352 | 5.339 | 4.331 | 4.296 |
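Stacked generalisation trains a final model on the out-of-fold predictions of several base models. The sketch below shows the idea with scikit-learn's `StackingRegressor`. The base estimators, the final estimator, the 80/20 split, and the synthetic data are all assumptions made for illustration; the source does not state the configuration used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data: 340 rows and 9 features mirror only the shape of
# the dataset (six body measurements plus three weight/age features).
rng = np.random.default_rng(0)
X = rng.normal(size=(340, 9))
y = 100.0 + X @ rng.normal(size=9) + rng.normal(scale=2.0, size=340)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base learners and meta-learner are assumed choices, not the paper's.
stack = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("knn", KNeighborsRegressor(n_neighbors=5)),
        ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=RidgeCV(),
    cv=5,  # out-of-fold predictions of the base learners train the meta-learner
)
stack.fit(X_train, y_train)
test_predictions = stack.predict(X_test)
test_mae = mean_absolute_error(y_test, test_predictions)
print(f"test MAE: {test_mae:.3f}")
```

Training the final estimator on out-of-fold predictions, rather than on predictions for rows the base models have already seen, keeps the meta-learner from simply trusting whichever base model overfits the training data most.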
Comparison of the model performances in terms of R², RMSE, MAE, and MAPE.
| Algorithm | Training R² | Training RMSE (kg) | Training MAE (kg) | Training MAPE (%) | Testing R² | Testing RMSE (kg) | Testing MAE (kg) | Testing MAPE (%) |
|---|---|---|---|---|---|---|---|---|
| RandomForestRegressor | 0.652 | 3.811 | 3.125 | 3.101 | 0.264 | 5.688 | 4.798 | 4.777 |
| ExtraTreesRegressor | 0.588 | 4.145 | 3.389 | 3.362 | 0.247 | 5.755 | 4.903 | 4.881 |
| KNeighborsRegressor | 0.443 | 4.817 | 3.851 | 3.828 | 0.232 | 5.812 | 4.884 | 4.858 |
| LinearRegression | 0.313 | 5.354 | 4.431 | 4.405 | 0.282 | 5.619 | 4.607 | 4.592 |
| GradientBoostingRegressor | 0.756 | 3.192 | 2.572 | 2.551 | 0.260 | 5.706 | 4.757 | 4.701 |
| AdaBoostRegressor | 0.571 | 4.229 | 3.725 | 3.674 | 0.224 | 5.842 | 4.865 | 4.823 |
| RidgeCV | 0.307 | 5.374 | 4.437 | 4.410 | 0.297 | 5.561 | 4.533 | 4.521 |
| LassoCV | 0.299 | 5.408 | 4.465 | 4.438 | 0.301 | 5.545 | 4.542 | 4.532 |
| LassoLarsCV | 0.271 | 5.514 | 4.609 | 4.585 | 0.269 | 5.670 | 4.704 | 4.698 |
| BayesianRidge | 0.272 | 5.508 | 4.530 | 4.504 | 0.305 | 5.528 | 4.577 | 4.566 |
| TheilSenRegressor | 0.275 | 5.498 | 4.481 | 4.467 | 0.208 | 5.901 | 4.822 | 4.808 |
| XGBRegressor | 0.714 | 3.454 | 2.820 | 2.768 | 0.248 | 5.751 | 4.748 | 4.675 |
| LGBMRegressor | 0.801 | 2.877 | 2.239 | 2.222 | 0.270 | 5.667 | 4.720 | 4.667 |
| CatBoostRegressor | 0.786 | 2.986 | 2.422 | 2.408 | 0.288 | 5.596 | 4.692 | 4.658 |
Results of 10-fold cross-validation for the most efficient algorithms on the test dataset. SD is the standard deviation.
| Algorithm | R² Mean | R² SD | RMSE Mean (kg) | RMSE SD (kg) | MAE Mean (kg) | MAE SD (kg) | MAPE Mean (%) | MAPE SD (%) |
|---|---|---|---|---|---|---|---|---|
| StackingRegressor | 0.369 | 0.027 | 5.226 | 0.037 | 4.319 | 0.028 | 4.281 | 0.019 |
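A mean and SD per metric, as in the table above, can be obtained by scoring a model on each of the 10 folds and then aggregating. The sketch below uses scikit-learn's `KFold` and `cross_validate` on synthetic data with a plain `Ridge` model as a stand-in. The model, the shuffling, and the use of the sample standard deviation are assumptions, since the source does not describe the procedure in detail.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data with the same shape as the dataset (340 x 9).
rng = np.random.default_rng(0)
X = rng.normal(size=(340, 9))
y = 100.0 + X @ rng.normal(size=9) + rng.normal(scale=2.0, size=340)

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(
    Ridge(alpha=1.0),
    X,
    y,
    cv=cv,
    scoring={
        "r2": "r2",
        "rmse": "neg_root_mean_squared_error",
        "mae": "neg_mean_absolute_error",
        "mape": "neg_mean_absolute_percentage_error",
    },
)

summary = {}
for name in ("r2", "rmse", "mae", "mape"):
    fold_values = scores[f"test_{name}"]
    if name != "r2":
        fold_values = -fold_values  # scikit-learn reports error metrics negated
    if name == "mape":
        fold_values = 100.0 * fold_values  # fraction -> percent
    # Sample SD (ddof=1) across folds is an assumed convention.
    summary[name] = (fold_values.mean(), fold_values.std(ddof=1))

for name, (mean, sd) in summary.items():
    print(f"{name}: mean={mean:.3f}, SD={sd:.3f}")
```

A small SD across folds indicates that the score does not depend strongly on which rows happen to fall into the held-out fold.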
Figure 4. Feature importance identified by the StackingRegressor algorithm.