
Classification and prediction of milk yield level for Holstein Friesian cattle using parametric and non-parametric statistical classification models.

Hend Radwan1, Hadeel El Qaliouby2, Eman Abo Elfadl1.   

Abstract

OBJECTIVE: The objective of this study was to assess the accuracy of the widely used discriminant analysis (DA) in comparison with the artificial neural network (ANN) for predicting and classifying milk production level in Holstein Friesian cattle from their performance records.
MATERIALS AND METHODS: A total of 3,460 performance records of imported and locally born Holstein Friesian cows were gathered during the period from 2000 to 2016 to compare two alternative techniques for predicting the level of production based on performance traits in dairy cattle with the use of statistical software (Statistical Package for the Social Sciences, version 20.0).
RESULTS: The comparison indicated that ANN predicted milk production level more accurately than the conventional statistical method based on DA. The accuracy of the ANN model was high for the winter season (79.5%), whereas it was 47.3% for DA. These findings were confirmed by the areas under the receiver operating characteristic curves (AUROC) for DA and ANN: the AUROC was smaller for the DA model than for the ANN model across the different calving seasons. The differences in accuracy were significant at the 5% significance level using a paired sample t-test.
CONCLUSION: ANN model can be used efficiently to predict the level of production across the different calving seasons compared to the DA model. Copyright: © Journal of Advanced Veterinary and Animal Research.

Keywords:  AUROC curves; Artificial neural network; discriminant analysis; milk production level

Year:  2020        PMID: 33005668      PMCID: PMC7521821          DOI: 10.5455/javar.2020.g438

Source DB:  PubMed          Journal:  J Adv Vet Anim Res        ISSN: 2311-7710


Introduction

Dairy cattle are most prevalent in the humid and colder regions of temperate zones. Global milk production reached 852 million tonnes in 2019; by geographical distribution, Asia recorded the largest expansion, followed by Europe, North and South America, and Africa, while production stagnated in Oceania [1]. The worldwide prevalence of the Friesian breed is the main reason for the success of selection for milk yield, as explained by Hammoud et al. [2]. Holstein Friesian cattle have been introduced into many countries because they can perform and thrive under adverse conditions while maintaining higher production levels [3]. The success of a cattle herd specializing in milk production rests on the average milk yield and reproductive performance of its animals. A cow's milk production results from the interaction of her genetic makeup with environmental factors at a given time and age, as explained by Yadav et al. [4]. It is, therefore, crucial to account for non-genetic factors in the genetic assessment of performance traits in dairy cows [5]. Non-genetic factors are classified into environmental factors with non-measurable effects, such as diseases, and ecological factors with measurable effects, such as stage of lactation, age of the cow, parity, and season and year of calving, which are essential in formulating breeding programs [6]. In dairy cattle production, analyzing and predicting milk production level is crucial because, in genetic selection, outstanding sires are chosen for their ability to produce female calves with high genetic potential for milk production. Consequently, the sooner such sires are identified, the sooner semen collection and insemination of cows can follow [7,8].
Several methods are available for classifying and predicting milk production levels in dairy cattle, including various data mining methods, genetic algorithms, discriminant analysis (DA), decision trees, regression techniques, and artificial neural network (ANN) models [9]. DA estimates weights for all independent predictors so as to maximize the between-group differences relative to the within-group variability [10]. DA rests on two assumptions. The first is that the independent predictors are normally distributed, which favors the use of quantitative rather than categorical data in the statistical model. The second, specific to DA, is that the covariance matrices are equal across the groups of observations [11]. The main drawback of DA is that it is sometimes insufficient for minimizing the error of the discrimination process. ANN can perform the same partitioning as DA while overcoming such difficulties [11]. ANN is a powerful tool for system modeling in many applications [12] and one of the most popular models for prediction and classification. Its structure is inspired by the neural networks of the human brain, as described by Heydari et al. [9]. An ANN consists of multiple computing modules called artificial neurons, which are linked to each other by connections. Artificial neurons act as summing and nonlinear mapping junctions [11]. An ANN is made up of three layers: a layer of “input” units, which receives the measurement vector X and is attached to a layer of “hidden” units, each of which splits the input space into two half-spaces; the hidden layer is in turn connected to a layer of “output” units [13,14]. By combining such half-spaces, the units of the output layer can form any polygonal partition of the input space, as stated by Teshnizi and Ayatollahi [13].
The present study was intended to predict the level of milk yield in Holstein Friesian cows by comparing two alternative models, DA and ANN.

Materials and Methods

Ethical statement

The records were collected after the necessary approval, under the supervision and in the presence of the farm administration.

Data source

The original data set consisted of 3,460 lactation records belonging to 991 Holstein Friesian cows that calved during 2000–2016. These cows were the daughters of 99 sires and 691 dams. The data were recorded by a computer program system (Dairy cattle comp). Records with missing or incorrect information and/or sires with fewer than five daughters were excluded from the data set. The data were collected from the Alexandria-Copenhagen Company, about 76 km from Alexandria Province.

Herd management

Animals were grouped according to their average daily milk production and fed silage mixed with a concentrate ration. Feed allowance was varied according to physiological status and level of milk production, as proposed by Nutrient Requirements of Dairy Cattle [15]. Water was available ad libitum. Robotic milking occurred three times a day, with 8-h intervals between milkings. Cows were dried off when they entered the late stage of pregnancy. Milk was collected, weighed, and recorded individually. Heifers were bred artificially for the first time when their body weight reached 350 kg or at 18 months of age, whichever came first. Heifers and cows underwent rectal pregnancy diagnosis at 45–60 days post-insemination, and those that failed to conceive were inseminated again at the following standing heat.

Studied variables and statistical analysis

The independent variables used in the current study were age at first calving (months), days in milk at first heat (days), parity order, total milk yield (kg), 305-day mature equivalent (kg), number of breedings per conception, days open (days), and calving season; the dependent variable was the level of milk production.

DA is used to divide the data into two or more groups. The discriminant function is a linear combination of two or more independent predictors that best differentiates among the groups, as reviewed by Abo Elfadl and Abdalla [11]. The statistical analyses by DA were performed using statistical software (Statistical Package for the Social Sciences [SPSS], version 20.0), according to Hair et al. [16]:

$$Z_{jk} = a + W_1 X_{1k} + W_2 X_{2k} + \dots + W_n X_{nk}$$

where $Z_{jk}$ = discriminant Z-score of discriminant function j for observation k, $a$ = intercept, $W_i$ = discriminant weight for independent variable i, and $X_{ik}$ = independent variable i for observation k.

An ANN is a computational network formed of a dense mesh of computing units and connections. The strength of each connection is expressed numerically as a weight, or synaptic weight, as stated by Abo Elfadl and Abdalla [11] and Nguyen et al. [14]. The numbers of nodes in the input and output layers are determined by the structure of the data. Increasing the number of hidden nodes greatly increases the learning ability of the network, which may lead to overfitting of the data, as described by Parsaeian et al. [17]. All factors were analyzed by ANN using statistical software (SPSS, version 20.0) with the following functions. The ANN activation function had the following (hyperbolic tangent) form:

$$Y(c) = \tanh(c) = \frac{e^{c} - e^{-c}}{e^{c} + e^{-c}}$$

where $Y(c)$ takes real values in the range (−1, +1). The sigmoid function had the form:

$$Y(c) = \frac{1}{1 + e^{-c}}$$

where $Y(c)$ takes real values in the range (0, 1).

Herein, DA and ANN were carried out to check the significance of performance traits for predicting the production level of the cow, where the production level was coded as follows: 1 = low, 2 = medium, and 3 = high.
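As a rough illustration of the functions described above, the following Python sketch evaluates a linear discriminant score and the two activation functions. It is not the SPSS implementation: the function names and the example weights are hypothetical, and the hyperbolic tangent is assumed for the activation with range (−1, +1) because its range matches the one stated in the text.

```python
import math


def discriminant_score(intercept, weights, values):
    """Linear discriminant score Z = a + sum_i W_i * X_i for one observation."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return intercept + sum(w * x for w, x in zip(weights, values))


def tanh_activation(c):
    """Hyperbolic tangent (assumed form); output lies in (-1, +1)."""
    return math.tanh(c)


def sigmoid_activation(c):
    """Logistic sigmoid; output lies in (0, 1). Written to avoid overflow."""
    if c >= 0:
        return 1.0 / (1.0 + math.exp(-c))
    e = math.exp(c)
    return e / (1.0 + e)


# Hypothetical intercept, weights, and predictor values (not from the study).
z_example = discriminant_score(0.5, [0.2, -0.1], [3.0, 4.0])
print(round(z_example, 3), tanh_activation(z_example), sigmoid_activation(z_example))
```

In a multilayer perceptron, each hidden unit applies such an activation to its own weighted sum of the inputs, which is what lets the network represent nonlinear boundaries that a single linear discriminant function cannot.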
A paired sample t-test was applied to compare the classification accuracies of the DA and ANN models after testing the normality of these accuracies using the one-sample Kolmogorov–Smirnov test. The classification results obtained from the models were compared using receiver operating characteristic curves, according to Abo Elfadl and Abdalla [11].
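The ROC-based comparison summarizes each model by the area under its curve. For a two-class problem, that area equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way; the scores and labels are hypothetical, and a three-level outcome such as the one used here would need a one-versus-rest or similar extension, which is an assumption rather than something the text specifies.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for six records (1 = target class, 0 = other).
example_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = auroc(example_scores, example_labels)
print(round(example_auc, 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a larger area indicates better discrimination.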

Results

Table 1 presents the percentage classification accuracies achieved by ANN and DA. The fourth column shows the difference in accuracy between the two models.
Table 1. Overall correctly classified accuracies via ANN and DA models.

| Calving season | Overall classified, ANN model | Overall classified, DA model | ANN–DA |
|---|---|---|---|
| Winter | 79.5% | 47.3% | +32.2 |
| Spring | 79.4% | 57.5% | +21.9 |
| Summer | 79.4% | 54.7% | +24.7 |
| Autumn | 77.2% | 54.4% | +22.8 |

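The ANN–DA column and the paired comparison can be sketched in a few lines of Python. The example below uses only the four seasonal percentages from Table 1 and computes the paired t statistic from their differences. Treating the four seasons as the paired observations is an assumption made for illustration; the exact data fed to the test in SPSS are not described, so this sketch is not expected to reproduce the reported p-value.

```python
import math
import statistics

# Overall accuracies (%) from Table 1: winter, spring, summer, autumn.
ann_accuracy = [79.5, 79.4, 79.4, 77.2]
da_accuracy = [47.3, 57.5, 54.7, 54.4]


def paired_t_statistic(first, second):
    """Return (differences, mean difference, t statistic, degrees of freedom)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_value = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return diffs, mean_diff, t_value, len(diffs) - 1


diffs, mean_diff, t_value, dof = paired_t_statistic(ann_accuracy, da_accuracy)
print([round(d, 1) for d in diffs], round(mean_diff, 2), round(t_value, 2), dof)
```

The rounded differences match the ANN–DA column, and the positive mean difference reflects the higher ANN accuracy in every season.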
The highest classification accuracy of the ANN models was observed for the winter season (79.5%) and the lowest for autumn (77.2%). In contrast, the highest classification accuracy for DA was observed for the spring season (57.5%) and the lowest for winter (47.3%); overall classification accuracy thus favored ANN over DA. Notably, for the winter season, the difference between ANN and DA in classification accuracy was +32.2. Furthermore, the difference between the ANN and DA models was significant at the 0.05 level of significance (p < 0.0001, paired sample t-test), after the accuracies were tested for normality using the one-sample Kolmogorov–Smirnov test (0.004; p-value 0.30). The correctly classified cases for the low, medium, and high levels of production were then compared, as shown in Table 2. For low production, ANN showed its highest classification accuracy for summer (97.6%) and its lowest for spring (94.0%), whereas DA showed its highest for autumn (59.3%) and its lowest for winter (48.7%). For medium production, DA achieved its highest accuracy for spring (49.1%) and its lowest for autumn (36.5%), while ANN was also highest in spring (43.9%) and lowest in autumn (1.9%). For high production, DA showed its highest accuracy for the spring and summer seasons (100%) and its lowest for autumn (50.0%), whereas ANN reached its highest accuracy for autumn (16.7%) and its lowest for winter, spring, and summer (0%).
Table 2. ANN and DA classification accuracies for different levels of production.

| Calving season | Production level | Correctly classified cases by ANN | Correctly classified cases by DA | ANN–DA |
|---|---|---|---|---|
| Winter | Low | 96.2% | 48.7% | +47.5 |
| Winter | Medium | 22.4% | 41.2% | −18.8 |
| Winter | High | 0% | 60.0% | −60 |
| Spring | Low | 94.0% | 59.0% | +35 |
| Spring | Medium | 43.9% | 49.1% | −5.2 |
| Spring | High | 0% | 100% | −100 |
| Summer | Low | 97.6% | 56.7% | +40.9 |
| Summer | Medium | 27.5% | 45.0% | −17.5 |
| Summer | High | 0% | 100% | −100 |
| Autumn | Low | 95.5% | 59.3% | +36.2 |
| Autumn | Medium | 1.9% | 36.5% | −34.6 |
| Autumn | High | 16.7% | 50.0% | −33.3 |

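Tables 1 and 2 are linked by a simple identity: overall accuracy is the class-size-weighted mean of the per-class accuracies. The sketch below illustrates this with the winter ANN percentages from Table 2. The class counts are hypothetical, because the number of records per production level is not given here, so the result is not meant to reproduce the 79.5% in Table 1.

```python
def overall_accuracy(class_counts, class_accuracy):
    """Weighted mean of per-class accuracies, weighted by class size."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("class counts must not all be zero")
    correct = sum(class_counts[level] * class_accuracy[level] for level in class_counts)
    return correct / total


# Per-class accuracies for the winter ANN model, taken from Table 2.
winter_ann = {"low": 0.962, "medium": 0.224, "high": 0.0}

# Hypothetical numbers of records per production level (assumption).
hypothetical_counts = {"low": 800, "medium": 190, "high": 10}

winter_overall = overall_accuracy(hypothetical_counts, winter_ann)
print(round(winter_overall, 4))
```

When most records fall in one level, the overall figure is driven almost entirely by that level, which is why the overall and per-level accuracies are worth reading together.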
The correctly classified cases for low, medium, and high production were not significantly different at the 0.05 significance level using the paired sample t-test. The findings were supported by the areas under the receiver operating characteristic curves (AUROC) for ANN and DA, which were used to compare the discriminating accuracy of the models. As shown in Table 3, the AUROC was smaller for DA than for ANN across all seasons. The differences between these areas were significant (p-value 0.015, paired sample t-test). All classification findings and AUROC values consistently favored ANN.
Table 3. The AUROC via ANN and DA models.

| Calving season | AUROC, ANN model | AUROC, DA model | ANN–DA |
|---|---|---|---|
| Winter | 0.653 | 0.568 | +0.085 |
| Spring | 0.777 | 0.666 | +0.111 |
| Summer | 0.804 | 0.548 | +0.256 |
| Autumn | 0.762 | 0.634 | +0.128 |

Table 4 shows that the most important classification predictors for ANN were age at first calving, breedings per conception, and days open, with high coefficients in absolute value (0.534, 0.680, and 0.603, respectively); for DA, days open, breedings per conception, and age at first calving (0.854, 0.330, and 0.279, respectively) appeared to be the most important predictors. The discriminant function was defined as follows:
Table 4. Predictor contribution of ANN and DA.

| Best predictor | ANN model, Function 1 | ANN model, Function 2 | DA model, Function 1 | DA model, Function 2 |
|---|---|---|---|---|
| Age at first calving | −1.92 | 0.534 | −0.175 | 0.279 |
| Breedings per conception | −0.680 | 0.449 | 0.330 | 0.126 |
| Days open | 0.494 | −0.603 | 0.488 | −0.854 |

Z1 = 4.35 − 0.680 × breedings per conception + 0.494 × days open − 1.92 × age at first calving

Z2 = −2.58 + 0.499 × breedings per conception − 0.603 × days open + 0.534 × age at first calving

Early calving was associated with a high level of production in function 1 for both ANN (−1.92) and DA (−0.175), whereas the relationship was direct in the second function. An increased number of breedings per conception was associated with an increased production level for both ANN and DA in functions 1 and 2, except for function 1 of the ANN model, where the relationship was inverse (−0.680). Longer days open were associated with an increased level of production in the first function for both ANN and DA (0.494 and 0.488, respectively). In the second function, however, the relationship between the level of production and days open was inverse, with shorter days open associated with higher milk yield for both ANN and DA (−0.603 and −0.854, respectively).
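To make the two functions concrete, the sketch below evaluates Z1 and Z2 for a single hypothetical cow. The coefficients are copied from the equations as printed in the text. The input values are invented, and it is an assumption that the predictors enter in raw units (services, days, months) rather than as standardized values, so the resulting scores are illustrative only.

```python
def z1(breedings_per_conception, days_open, age_at_first_calving):
    """First function, with coefficients as printed in the text."""
    return (4.35
            - 0.680 * breedings_per_conception
            + 0.494 * days_open
            - 1.92 * age_at_first_calving)


def z2(breedings_per_conception, days_open, age_at_first_calving):
    """Second function, with coefficients as printed in the text."""
    return (-2.58
            + 0.499 * breedings_per_conception
            - 0.603 * days_open
            + 0.534 * age_at_first_calving)


# Hypothetical cow: 2 services, 100 days open, first calving at 24 months.
cow = {"breedings_per_conception": 2, "days_open": 100, "age_at_first_calving": 24}
z1_score = z1(**cow)
z2_score = z2(**cow)
print(round(z1_score, 3), round(z2_score, 3))

# Raising age at first calving lowers Z1 and raises Z2 (signs -1.92 and +0.534).
later_cow = dict(cow, age_at_first_calving=26)
print(z1(**later_cow) < z1_score, z2(**later_cow) > z2_score)
```

Changing one input at a time in this way shows the direction of each relationship described in the text, since the sign of each coefficient determines whether the score rises or falls.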

Discussion

The primary purpose of the current work was to assess the competence of the ANN and DA models in classifying and predicting the milk yield level of Holstein Friesian cattle. The results showed that ANN was more efficient than DA in predicting and partitioning production level. This is attributable to the assumptions underlying DA, which presumes that the predictors are normally distributed. Numerous studies have reported that DA can be fitted to data of various types. In fact, however, some of the predictors in this study were not normally distributed, and these predictors had a marked impact on the DA findings. Some violation of these assumptions is common and appears to have little effect on results [18]. Another study, by Görgülü [12], concluded that ANN could be used as a substitute for the multiple regression model to forecast 305-day corrected milk production in Brown Swiss dairy cattle, as well as cumulative milk yield in crossbred cattle [19]. Indeed, Abo Elfadl and Abdalla [11] concluded that the DA model is not adequate for classifying and predicting the fertility status of Friesian cows. Similarly, many authors have compared the ANN method with classical statistical methods such as fuzzy logic [20], K-means for clustering milk-producing cattle [18], and multiple linear regression for predicting body weight in hair goats [21]. They found that the ANN approach performs better in prediction, especially when the association between variables is complex. In addition, Behzadi and Aslaminej [22] reported that ANN has been used for both classification and prediction of data in several fields of knowledge. Chaturvedi et al. [23] stated that ANN is a strong predictor of future milk production based on early expressed traits. In a study by Ali et al. [24] comparing neural networks with traditional statistical approaches, ANN performed best at capturing the most complex associations between input and output variables. Classification accuracies were higher for the ANN model than for DA, whether both quantitative and qualitative independent predictors or only quantitative independent predictors were used. This is because ANN is unaffected by the type of distribution of the predictors; these results agree with Iyer et al. [10]. In a previous paper by Abo Elfadl and Abdalla [11], the classification accuracy findings established that ANN is more competent than DA in terms of overall classification accuracy and correctly classified cases. This conforms to the results of the present study, particularly for the winter season, where the difference in classification accuracy between the two models was +32.2, which was highly significant (p-value < 0.0001, paired t-test). Likewise, Iyer et al. [10] reported that ANN models produced sound classification results, with higher accuracies than those obtained by the DA model. Along the same lines, Parsaeian et al. [17] stated that ANN could be used as an excellent predictive tool, with its accuracy calculated on unseen data. In line with the current study, Torshizi [25] reported that both the feeding schedule and relative humidity are critical factors in defining the calving season; hot seasons have an adverse effect on milk characteristics, especially fat yield. According to this author, cows that calved in the autumn and winter seasons have a higher average peak milk production than those that calved in the spring and summer seasons. These findings were also supported by the AUROC for the two models compared: the AUROC was smaller for the DA models than for the ANN models across the different seasons. The preference for ANN over DA in predicting the level of milk production agrees with the study of Abo Elfadl and Abdalla [11].
A different opinion, however, was reported by Blackard and Dean [26], who argued that both the DA and ANN methods produced inadequate models for classifying the data. An age at first calving below 23 months appears to be the most suitable option for rearing good heifers with excellent reproduction and performance rates [27]. In line with the first function for both ANN and DA, Nilforooshan and Edriss [28] reported a negative impact of age at first calving on Holstein milk production at 21 months of age. Lactation yield would increase from 21 to 24 months, but an age at first calving beyond 24 months would be associated with a decrease in milk yield. Nevertheless, Yadav et al. [4] stated that age at first calving is not among the best predictors of total milk yield in crossbred cows using multiple linear regression. The significant impact of the number of breedings per conception on the level of milk production agrees with the study of Abass [29], who reported that an increase in a cow's milk yield up to 9,200.61 kg was associated with an increased number of breedings per conception, up to six services per cow. Similarly, Rajala-Schultz and Frazer [30] reported a strong effect of the number of breedings per conception on milk production, with fewer breedings per conception in high-producing cows than in low-producing ones. Regarding days open, Ali et al. [31] stated that an increased length of days open would be associated with an increased total milk yield; over time, however, this would reduce the number of calves per cow and decrease daily milk production over the herd life. In summary, this work assessed the efficiency of the ANN model in classifying and predicting the level of milk production in Holstein Friesians compared with a model based on DA. Further studies in animal and poultry production science should apply other neural network models and compare them to test whether one approach outperforms the others.

Conclusion

In this paper, the results of DA and ANN were compared. The findings indicate that DA is an inadequate model for data classification in comparison with ANN, which provided a better classification and prediction of milk production levels across various calving seasons. Furthermore, first calving age, breedings per conception, and calving–conception interval were the best predictors to estimate and predict milk production levels in Holstein Friesian cattle.
  7 in total

1.  Reproductive performance in Ohio dairy herds in the 1990s.

Authors:  P J Rajala-Schultz; G S Frazer
Journal:  Anim Reprod Sci       Date:  2003-04-15       Impact factor: 2.145

2.  Effect of age at first calving on some productive and longevity traits in Iranian Holsteins of the Isfahan province.

Authors:  M A Nilforooshan; M A Edriss
Journal:  J Dairy Sci       Date:  2004-07       Impact factor: 4.034

3.  Effect of prepubertal and postpubertal growth and age at first calving on production and reproduction traits during the first 3 lactations in Holstein dairy cattle.

Authors:  L Krpálková; V E Cabrera; M Vacek; M Stípková; L Stádník; P Crump
Journal:  J Dairy Sci       Date:  2014-03-05       Impact factor: 4.034

4.  Comparison of artificial neural networks with logistic regression for detection of obesity.

Authors:  Seyed Taghi Heydari; Seyed Mohammad Taghi Ayatollahi; Najaf Zare
Journal:  J Med Syst       Date:  2011-05-10       Impact factor: 4.460

5.  Effects of season and age at first calving on genetic and phenotypic characteristics of lactation curve parameters in Holstein cows.

Authors:  Mahdi Elahi Torshizi
Journal:  J Anim Sci Technol       Date:  2016-02-20

6.  Comparison of logistic regression and artificial neural network in low back pain prediction: second national health survey.

Authors:  M Parsaeian; K Mohammad; M Mahmoudi; H Zeraati
Journal:  Iran J Public Health       Date:  2012-06-30       Impact factor: 1.429

7.  A Comparison of Logistic Regression Model and Artificial Neural Networks in Predicting of Student's Academic Failure.

Authors:  Saeed Hosseini Teshnizi; Sayyed Mohhamad Taghi Ayatollahi
Journal:  Acta Inform Med       Date:  2015-10-05