Siroj Bakoev1, Lyubov Getmantseva1, Maria Kolosova2, Olga Kostyunina1, Duane R Chartier3, Tatiana V Tatarinova4,5,6,7.
Abstract
Industrial pig farming is associated with negative technological pressure on the bodies of pigs. Leg weakness and lameness are the sources of significant economic loss in raising pigs. Therefore, it is important to identify the predictors of limb condition. This work presents assessments of the state of limbs using indicators of growth and meat characteristics of pigs based on machine learning algorithms. We have evaluated and compared the accuracy of prediction for nine ML classification algorithms (Random Forest, K-Nearest Neighbors, Artificial Neural Networks, C50Tree, Support Vector Machines, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) and have identified the Random Forest and K-Nearest Neighbors as the best-performing algorithms for predicting pig leg weakness using a small set of simple measurements that can be taken at an early stage of animal development. Measurements of Muscle Thickness, Back Fat amount, and Average Daily Gain were found to be significant predictors of the conformation of pig limbs. Our work demonstrates the utility and relative ease of using machine learning algorithms to assess the state of limbs in pigs based on growth rate and meat characteristics. ©2020 Bakoev et al.
Keywords: Animal behavior; Artificial intelligence; Bioinformatics; Computational biology; Data mining and machine learning; Evolutionary studies; Mathematical biology
Year: 2020 PMID: 32231879 PMCID: PMC7098386 DOI: 10.7717/peerj.8764
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Accuracy for training and testing sets for different ML approaches.
A graphical comparison of the predictions of all models shows that every model achieves higher accuracy on the training set than on the test set. Nonetheless, the RF and KNN models provide high prediction accuracy relative to the other models. To identify the models that perform best after training and optimization, a comparative analysis was carried out. Note that the metrics obtained by validation are estimates of a model's ability to predict new observations, and these estimates carry some deviation.
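The training-vs-test comparison described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the study used R with nine algorithms, while this sketch fits three scikit-learn classifiers on synthetic data in place of the pig measurements.

```python
# Sketch: fit several classifiers and compare train vs. test accuracy.
# Synthetic data; three of the paper's nine algorithms are shown.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=3, n_informative=3,
                           n_redundant=0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "RF": RandomForestClassifier(random_state=1),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
}
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # (train accuracy, test accuracy) per model; train is typically higher
    results[name] = (accuracy_score(y_tr, model.predict(X_tr)),
                     accuracy_score(y_te, model.predict(X_te)))
    print(name, results[name])
```

As in the figure, the gap between the two numbers per model indicates how much the training-set estimate overstates generalization.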
Sample description.
The dataset contains 21,247 females and 3,337 males: 12,195 of the Landrace and 12,389 of the Large White breeds. Predictors: Average Daily Gain, Backfat Thickness, Muscle Thickness, Birth Date, Breed, Sex. Dependent variables: scores for front and back legs.
| Variable | Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
| Average Daily Gain | 0.33 | 0.72 | 0.79 | 0.79 | 0.85 | 1.61 |
| Backfat Thickness | 4.30 | 10.90 | 12.90 | 13.25 | 15.20 | 35.60 |
| Muscle Thickness | 32.12 | 56.04 | 59.70 | 59.68 | 63.40 | 96.00 |
| Birth Date | 2012 | 2014 | 2015 | 2015 | 2016 | 2016 |
| Front Legs | 1.00 | 3.00 | 3.00 | 3.11 | 3.00 | 5.00 |
| Back Legs | 1.00 | 3.00 | 3.00 | 2.99 | 3.00 | 5.00 |
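The six-number summary in the table above (min, first quartile, median, mean, third quartile, max) is the standard output of R's `summary()`; a pandas equivalent can be sketched as below. The data here are synthetic placeholders, not the study's records.

```python
# Sketch: six-number summary per predictor, as in the sample table.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Average Daily Gain": rng.normal(0.79, 0.1, 1000),
    "Backfat Thickness": rng.normal(13.25, 3.0, 1000),
    "Muscle Thickness": rng.normal(59.68, 5.0, 1000),
})

# describe() returns count/mean/std/min/25%/50%/75%/max; reorder to
# match the table's Min, 1st Qu., Median, Mean, 3rd Qu., Max layout
summary = df.describe().loc[["min", "25%", "50%", "mean", "75%", "max"]]
print(summary.round(2))
```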
Figure 2. Analysis of the collected measurements.
Distributions of the variables Average Daily Gain (A, B), Back Fat (C, D), and Muscle Thickness (E, F). Back Fat (C) has an asymmetric distribution. Concordance analysis of the predictors (G) shows only moderate correlation among the three parameters, indicating that they do not carry redundant information.
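The concordance check in panel G amounts to a pairwise correlation matrix over the three predictors. A minimal sketch, again with synthetic data in place of the study's measurements:

```python
# Sketch: pairwise Pearson correlations among the three predictors,
# as in the concordance analysis of Fig. 2G. Data are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "Average Daily Gain": rng.normal(0.79, 0.1, 500),
    "Back Fat": rng.normal(13.25, 3.0, 500),
    "Muscle Thickness": rng.normal(59.68, 5.0, 500),
})

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))
```

Off-diagonal values well below 1 support the claim that the predictors are not redundant.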
Figure 3. Relative importance of leg weakness predictors as assessed by Accuracy (A) and Gini (B).
Since the aim of the study is to assess the state (conformation) of the legs by means of the selected predictors (growth and meat quality), each variable is analyzed with respect to the variable Q2 = “good”. Analyzing the data in this way shows which variables are most associated with “good” legs. In addition, the importance of the predictors is assessed with the Random Forest package. All studied algorithms identified Muscle Thickness, Back Fat, and Average Daily Gain as the most important predictors, while the predictor Breed is not significant.
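The two importance measures in Fig. 3, accuracy-based and Gini-based, can be sketched with a Random Forest. This is an illustration under assumed data: the feature names follow the paper, but the values and the dependence of the outcome on them are synthetic (Breed is generated as irrelevant, mirroring the paper's finding).

```python
# Sketch: Gini (impurity) importance and permutation (accuracy-drop)
# importance for leg-quality predictors. Synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
n = 1500
muscle = rng.normal(60, 5, n)
backfat = rng.normal(13, 3, n)
adg = rng.normal(0.8, 0.1, n)
breed = rng.integers(0, 2, n)  # assumed irrelevant, as in the study
# synthetic "good legs" label driven by the three real predictors
y = ((muscle - backfat + 30 * adg + rng.normal(0, 4, n)) > 71).astype(int)

X = np.column_stack([muscle, backfat, adg, breed])
names = ["Muscle Thickness", "Back Fat", "Average Daily Gain", "Breed"]

rf = RandomForestClassifier(random_state=2).fit(X, y)
gini = dict(zip(names, rf.feature_importances_))        # Gini importance
perm = permutation_importance(rf, X, y, random_state=2) # accuracy-based
for name, p in zip(names, perm.importances_mean):
    print(f"{name}: gini={gini[name]:.3f}, permutation={p:.3f}")
```

Under this construction, Breed ranks last by both measures, consistent with panel B of the figure.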
Comparison between the models using the testing dataset.
The ML models were able to predict the state of the fore and hind legs. RF surpassed all other learning algorithms across all metrics and scenarios, although in some cases its superiority over KNN was not significant. Accordingly, KNN was the second most efficient algorithm across all characteristics and scenarios.
| Model | Accuracy | Kappa | P-value | Sensitivity | Specificity |
| RF | 0.8846 | 0.7693 | <2.2e−16 | 0.8232 | 0.9463 |
| KNN | 0.8754 | 0.7509 | 3.238e−16 | 0.8013 | 0.9499 |
| C50Tree | 0.6469 | 0.294 | 0.001603 | 0.5746 | 0.7195 |
| Boost | 0.6035 | 0.207 | 0.09968 | 0.5995 | 0.6075 |
| NNET | 0.5667 | 0.1335 | 0.01852 | 0.5619 | 0.5716 |
| LDA | 0.563 | 0.1258 | 2.343e−05 | 0.5986 | 0.5272 |
| GLM | 0.5624 | 0.1246 | 5.043e−05 | 0.5971 | 0.5275 |
| SVM | 0.5603 | 0.1202 | 3.248e−05 | 0.653 | 0.4671 |
| NB | 0.5411 | 0.0816 | <2.2e−16 | 0.6984 | 0.3832 |
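The per-model metrics tabulated above (accuracy, Cohen's kappa, sensitivity, specificity) all derive from a confusion matrix. A minimal sketch with illustrative labels, not the study's predictions:

```python
# Sketch: accuracy, Cohen's kappa, sensitivity, and specificity
# computed from a 2x2 confusion matrix. Labels are illustrative.
from sklearn.metrics import confusion_matrix, accuracy_score, cohen_kappa_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
acc = accuracy_score(y_true, y_pred)
kappa = cohen_kappa_score(y_true, y_pred)   # chance-corrected agreement
sensitivity = tp / (tp + fn)                # true-positive rate
specificity = tn / (tn + fp)                # true-negative rate
print(acc, kappa, sensitivity, specificity)  # → 0.8 0.6 0.8 0.8
```

Note that with balanced classes kappa is close to 2 × accuracy − 1, which matches the pattern visible in the table.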
Results of the non-parametric Friedman test and pairwise comparisons of all models.
All models were compared using the non-parametric Friedman test followed by a pairwise comparison of all models. The best predictive performance on this dataset was shown by the Random Forest approach. In addition, traits such as Muscle Thickness, Back Fat, and Average Daily Gain can act as predictors of leg weakness, while information on breed and sex was not significant for assessing the status of the legs.
| Model 1 | Model 2 | P-value | Model 1 | Model 2 | P-value |
| boosting | arbol | 4.37E−02 | NB | logistic | 2.17E−08 |
| KNN | arbol | 2.17E−08 | NET | arbol | 2.17E−08 |
| KNN | boosting | 2.17E−08 | NET | boosting | 2.17E−08 |
| LDA | arbol | 2.17E−08 | NET | KNN | 2.17E−08 |
| LDA | boosting | 2.17E−08 | NET | LDA | 1.31E−07 |
| LDA | KNN | 2.17E−08 | NET | logistic | 1.31E−07 |
| logistic | arbol | 2.17E−08 | NET | NB | 2.17E−08 |
| logistic | boosting | 2.17E−08 | rf | arbol | 2.17E−08 |
| logistic | KNN | 2.17E−08 | rf | boosting | 2.17E−08 |
| logistic | LDA | 9.30E−02 | rf | KNN | 2.17E−08 |
| NB | arbol | 2.17E−08 | rf | LDA | 2.17E−08 |
| NB | boosting | 2.17E−08 | rf | logistic | 2.17E−08 |
| NB | KNN | 2.17E−08 | rf | NB | 2.17E−08 |
| NB | LDA | 2.17E−08 | rf | NET | 2.17E−08 |
Friedman rank sum test: Friedman chi-squared = 286.85
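The Friedman test and pairwise follow-up above can be sketched as below. The accuracy values are synthetic, and scipy's `friedmanchisquare` with pairwise Wilcoxon signed-rank tests is assumed as a stand-in for the R routines used in the paper.

```python
# Sketch: Friedman test across models, then pairwise comparisons,
# as in the table above. Per-resample accuracies are illustrative.
from itertools import combinations
from scipy.stats import friedmanchisquare, wilcoxon

# rows: per-resample accuracies for each model (synthetic numbers)
acc = {
    "rf":  [0.88, 0.89, 0.87, 0.88, 0.90, 0.88],
    "KNN": [0.87, 0.88, 0.86, 0.87, 0.88, 0.87],
    "NB":  [0.54, 0.55, 0.53, 0.54, 0.56, 0.54],
}

# omnibus test: do the models differ at all across resamples?
stat, p = friedmanchisquare(*acc.values())
print(f"Friedman chi-squared = {stat:.2f}, p = {p:.4g}")

# pairwise comparison of all models, as in the table above
for a, b in combinations(acc, 2):
    w, pw = wilcoxon(acc[a], acc[b])
    print(a, b, pw)
```

A significant omnibus p-value licenses the pairwise comparisons; in the paper's table nearly all pairs differ at p ≈ 2.17e−08.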