Xianjiang Chen, Huiru Zheng, Haiying Wang, Tianhai Yan.
Abstract
This study compares the performance of multiple linear regression (MLR) and machine learning algorithms for predicting manure nitrogen (MN) excretion in lactating dairy cows, and develops new machine learning prediction models for MN excretion. The dataset was collated from 43 total diet digestibility studies with 951 lactating dairy cows. Prediction models for MN were developed and evaluated using MLR and three machine learning algorithms: artificial neural networks (ANN), random forest regression (RFR) and support vector regression (SVR). In tenfold cross validation, the ANN model produced a lower root mean square error (RMSE) and a higher concordance correlation coefficient (CCC) than the MLR, RFR and SVR models. A hybrid knowledge-based and data-driven approach was also developed and implemented to select features. Results showed that the performance of the ANN models was greatly improved by tuning the selection of features and learning algorithms. The proposed new ANN models for prediction of MN were developed using nitrogen intake as the primary predictor. Alternative models were also developed based on live weight and milk yield for use where nitrogen intake data are not available (e.g., on some commercial farms). These new models provide benchmark information for prediction and mitigation of nitrogen excretion under typical dairy production conditions managed within grassland-based dairy systems.
Year: 2022 PMID: 35864287 PMCID: PMC9304409 DOI: 10.1038/s41598-022-16490-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. The variance inflation factor (VIF) scores of features selected based on the training dataset. The features included N intake (NI), diet N content (DNC), milk yield (MY), forage proportion (FP), live weight (LW) and diet metabolizable energy content (DMEC).
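The VIF scores in Figure 1 can be illustrated with a minimal pure-Python sketch. It relies on the standard identity that the VIF of feature j equals the j-th diagonal element of the inverse of the feature correlation matrix. The function names (`pearson`, `invert`, `vif_scores`) are hypothetical and are not taken from the study, which does not describe its implementation.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] is reduced to [I | A^-1].
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_scores(columns):
    """Return {feature name: VIF} for a dict mapping feature names to value lists."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}
```

A VIF of 1 means a feature is uncorrelated with the others; larger values indicate stronger collinearity. The inversion fails (division by zero) when two features are perfectly collinear, which is itself a signal that one of them should be dropped.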
Predictive performance of different modeling approaches for manure N output using stepwise and variance inflation factor (VIF) feature selection methods.
| Models1 | RMSE2, Stepwise4 | RMSE2, VIF5 | CCC3, Stepwise4 | CCC3, VIF5 |
|---|---|---|---|---|
| MLR | 44.7b | / | 0.60ab | / |
| RFR | 46.8b | 38.3b | 0.58a | 0.68b |
| SVR | 44.9b | 45.3c | 0.64b | 0.63a |
| ANN | 34.7a | 28.5a | 0.70c | 0.78c |
a,b,c Means within a column with different superscripts differ (P < 0.05).
1MLR multiple linear regression; RFR random forests regression; SVR support vector regression; ANN artificial neural network.
2RMSE root mean square error (obtained by tenfold cross validation).
3CCC concordance correlation coefficients (obtained by tenfold cross validation).
4The features selected by using stepwise methods were NI (N intake), LW (live weight) and MY (milk yield).
5The features selected by using the variance inflation factor (VIF) method were NI (N intake), LW (live weight), MY (milk yield), FP (forage proportion), DNC (diet N concentration) and DMEC (diet metabolizable energy concentration).
6The significance was determined by one-way analysis of variance and followed by Tukey’s Honest Significant Difference (HSD) test (n = 10, α = 0.05).
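A minimal sketch of the two evaluation metrics reported in the table above, assuming the usual definitions: RMSE as the square root of the mean squared prediction error, and CCC as Lin's concordance correlation coefficient computed with population (1/n) moments. The source does not state which variant the authors used, so the 1/n convention and the function names are assumptions.

```python
import math
from statistics import fmean


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(fmean([(p - a) ** 2 for a, p in zip(actual, predicted)]))


def ccc(actual, predicted):
    """Lin's concordance correlation coefficient (population moments, assumed)."""
    mean_a, mean_p = fmean(actual), fmean(predicted)
    var_a = fmean([(a - mean_a) ** 2 for a in actual])
    var_p = fmean([(p - mean_p) ** 2 for p in predicted])
    cov = fmean([(a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)])
    # The squared mean difference penalizes systematic bias, unlike Pearson's r.
    return 2 * cov / (var_a + var_p + (mean_a - mean_p) ** 2)
```

Unlike a plain correlation, CCC drops below 1 when predictions are perfectly correlated with observations but shifted or rescaled, which is why it is reported alongside RMSE.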
Figure 2. Relationships of actual and residual (predicted – actual) manure N output of dairy cows with manure N predicted by models developed using multiple linear regression (MLR) and artificial neural network (ANN), respectively.
Prediction accuracy of ANN model for predicting manure N output affected by network structure.
| Network structure1 | RRMSE2, minimum | RRMSE2, maximum | RRMSE2, median | RRMSE2, mean |
|---|---|---|---|---|
| 6–1(1)–1 | 8.40 | 9.01 | 8.80 | 8.54 |
| 6–1(2)–1 | 8.23 | 8.93 | 8.73 | 8.48 |
| 6–1(3)–1 | 8.11 | 8.83 | 8.52 | 8.44 |
| 6–1(4)–1 | 8.12 | 8.63 | 8.31 | 8.35 |
| 6–1(5)–1 | 8.14 | 8.95 | 8.32 | 8.40 |
| 6–1(6)–1 | 8.07 | 8.79 | 8.28 | 8.32 |
| 6–2(2,1)–1 | 8.29 | 8.60 | 8.49 | 8.46 |
| 6–2(2,2)–1 | 8.15 | 8.48 | 8.41 | 8.39 |
| 6–2(2,3)–1 | 8.14 | 8.36 | 8.32 | 8.26 |
| 6–2(2,4)–1 | 8.21 | 8.29 | 8.24 | 8.25 |
| 6–2(2,5)–1 | 8.24 | 8.84 | 8.32 | 8.36 |
| 6–2(2,6)–1 | 8.22 | 8.33 | 8.21 | 8.23 |
| 6–2(3,1)–1 | 8.08 | 8.60 | 8.35 | 8.33 |
| 6–2(3,2)–1 | 8.10 | 8.61 | 8.23 | 8.32 |
| 6–2(3,3)–1 | 8.17 | 8.64 | 8.41 | 8.39 |
| 6–2(3,4)–1 | 8.15 | 8.38 | 8.18 | 8.20 |
| 6–2(3,5)–1 | 8.12 | 8.36 | 8.15 | 8.18 |
| 6–2(3,6)–1 | 8.01 | 8.15 | 8.11 | 8.04 |
| 6–2(4,1)–1 | 8.22 | 8.89 | 8.50 | 8.52 |
| 6–2(4,2)–1 | 8.07 | 8.76 | 8.33 | 8.34 |
| 6–2(4,3)–1 | 8.11 | 11.22 | 8.64 | 8.79 |
| 6–2(4,4)–1 | 8.07 | 8.68 | 8.30 | 8.34 |
| 6–2(4,5)–1 | 8.17 | 8.89 | 8.35 | 8.42 |
| 6–2(4,6)–1 | 8.13 | 8.50 | 8.28 | 8.28 |
| 6–2(5,1)–1 | 7.90 | 9.26 | 8.24 | 8.38 |
| 6–2(5,2)–1 | 7.91 | 8.50 | 8.25 | 8.25 |
| 6–2(5,3)–1 | 8.07 | 8.79 | 8.28 | 8.32 |
| 6–2(5,4)–1 | 8.05 | 8.56 | 8.36 | 8.33 |
| 6–2(5,5)–1 | 7.94 | 8.82 | 8.40 | 8.37 |
| 6–2(5,6)–1 | 8.12 | 8.52 | 8.23 | 8.30 |
| 6–2(6,1)–1 | 8.20 | 8.91 | 8.40 | 8.47 |
| 6–2(6,2)–1 | 8.09 | 8.64 | 8.22 | 8.25 |
| 6–2(6,3)–1 | 7.90 | 8.80 | 8.40 | 8.40 |
| 6–2(6,4)–1 | 8.11 | 8.69 | 8.27 | 8.30 |
| 6–2(6,5)–1 | 8.09 | 9.07 | 8.35 | 8.45 |
| 6–2(6,6)–1 | 8.09 | 8.57 | 8.23 | 8.26 |
| 6–3(1,3,6)–1 | 8.17 | 8.79 | 8.25 | 8.35 |
| 6–3(3,1,6)–1 | 8.13 | 8.76 | 8.23 | 8.26 |
| 6–3(6,3,1)–1 | 8.07 | 8.85 | 8.27 | 8.33 |
1The network structure is denoted as: input layer nodes—hidden layers (nodes in each hidden layer)—output layer nodes. The input layer nodes are N intake (NI), diet N concentration (DNC), milk yield (MY), forage proportion (FP), live weight (LW) and diet metabolizable energy concentration (DMEC). The output layer node is manure N (MN).
2RRMSE relative root mean square error (obtained by tenfold cross validation).
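The structure notation defined in footnote 1 can be unpacked mechanically. The sketch below parses a label such as `6–2(3,6)–1` into a list of layer sizes and counts the trainable parameters, assuming a fully connected feed-forward network with one bias per non-input node. The function names are hypothetical.

```python
import re


def parse_structure(label):
    """Turn 'inputs–layers(nodes per hidden layer)–outputs' into a list of layer sizes."""
    # Accept both the en dash used in the table and a plain hyphen.
    match = re.fullmatch(r"(\d+)[–-](\d+)\(([\d,]+)\)[–-](\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized structure label: {label!r}")
    n_inputs, n_hidden_layers, hidden, n_outputs = match.groups()
    hidden_sizes = [int(size) for size in hidden.split(",")]
    if len(hidden_sizes) != int(n_hidden_layers):
        raise ValueError("hidden layer count does not match the listed sizes")
    return [int(n_inputs), *hidden_sizes, int(n_outputs)]


def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network (assumed layout)."""
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))
```

For example, `6–2(3,6)–1` maps to layer sizes `[6, 3, 6, 1]` and 52 parameters under these assumptions, while the single-hidden-layer `6–1(4)–1` has 33.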
Tuning of ANN models for manure N output with selected features.
| Algorithm | Learning ratesa | Thresholdb | RRMSEc |
|---|---|---|---|
| Backpropagation | 0.01 | 0.05 | 8.92a |
| | 0.001 | 0.05 | 8.60b |
| | 0.0005 | 0.05 | 8.61b |
| | 0.0001 | 0.05 | 8.63b |
| | 0.00005 | 0.05 | 8.59b |
| | 0.00001 | 0.05 | 8.57b |
| | 0.00001 | 0.01 | 8.47c |
| Resilient backpropagation with weight backtracking | Minus = 0.5, plus = 1.2 | 0.01 | 9.19a |
| | Minus = 0.5, plus = 1.5 | 0.01 | 8.98a |
| | Minus = 0.4, plus = 1.2 | 0.01 | 8.75b |
| | Minus = 0.3, plus = 1.2 | 0.01 | 8.76b |
| | Minus = 0.3, plus = 1.1 | 0.01 | 8.76b |
The features selected are N intake (NI), diet N concentration (DNC), milk yield (MY), forage proportion (FP), live weight (LW) and diet metabolizable energy concentration (DMEC).
aFor the backpropagation algorithm, the learning rate is a single numeric value. For resilient backpropagation with weight backtracking, it is a vector or list containing the multiplication factors for the upper and lower learning rate, set through learningrate.factor.
bThreshold for the partial derivatives of the error function as stopping criteria.
cRRMSE relative root mean square error (obtained by tenfold cross validation). a,b Means within a column with different superscripts differ (P < 0.05). The significance was determined by one-way analysis of variance and followed by Tukey’s Honest Significant Difference (HSD) test (n = 10, α = 0.05).
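To make the roles of the minus/plus factors and the threshold concrete, here is a minimal sketch of resilient backpropagation with weight backtracking (Rprop+) applied to a generic gradient function. It is not the authors' training code: `rprop_plus` and `grad_fn` are hypothetical names, and the initial, minimum and maximum step sizes are illustrative assumptions that the source does not report.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_plus(grad_fn, weights, minus=0.5, plus=1.2, threshold=0.01,
               step_init=0.1, step_min=1e-6, step_max=50.0, max_iter=10_000):
    """Minimize a function given its gradient; only gradient signs drive the updates."""
    w = list(weights)
    n = len(w)
    step = [step_init] * n      # per-weight step size (assumed starting value)
    prev_grad = [0.0] * n
    prev_dw = [0.0] * n
    for _ in range(max_iter):
        grad = grad_fn(w)
        # Stopping criterion: every partial derivative is below the threshold.
        if max(abs(g) for g in grad) < threshold:
            break
        for i in range(n):
            product = prev_grad[i] * grad[i]
            if product < 0:
                # Sign change: the last step overshot, so shrink the step
                # and undo the previous weight change (weight backtracking).
                step[i] = max(step[i] * minus, step_min)
                w[i] -= prev_dw[i]
                prev_dw[i] = 0.0
                prev_grad[i] = 0.0
            else:
                if product > 0:
                    # Same sign twice in a row: accelerate.
                    step[i] = min(step[i] * plus, step_max)
                prev_dw[i] = -sign(grad[i]) * step[i]
                w[i] += prev_dw[i]
                prev_grad[i] = grad[i]
    return w
```

Because only the sign of each partial derivative is used, a smaller `minus` factor shrinks steps faster after an overshoot and a larger `plus` factor grows them faster while the gradient direction is stable, which is the trade-off explored in the lower half of the table.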
Influence of features selected on the ANN model performance.
| Features1 | RRMSE2 | SD3 |
|---|---|---|
| NI + LW + MY + FP + DNC + DMEC | 8.48b | 0.53 |
| LW + MY + FP + DNC + DMEC | 12.6a | 1.08 |
| NI + MY + FP + DNC + DMEC | 8.49b | 0.80 |
| NI + LW + FP + DNC + DMEC | 8.62b | 0.64 |
| NI + FP + DNC + DMEC | 8.84b | 0.65 |
| NI + LW + MY + FP + DNC | 8.76b | 0.73 |
| NI + LW + MY + FP + DMEC | 8.46b | 0.89 |
| NI + LW + MY + DNC + DMEC | 8.45b | 0.89 |
| NI + LW + MY + DNC | 8.99b | 0.56 |
| NI + LW + MY + FP | 8.95b | 0.62 |
| NI + LW + MY + DMEC | 8.68b | 0.49 |
1The learning algorithm = resilient backpropagation with weight backtracking; learningrate.factor = list (minus = 0.4, plus = 1.2). NI N intake; DNC diet N concentration; MY milk yield; FP forage proportion; LW live weight; DMEC diet metabolizable energy concentration.
2RRMSE relative root mean square error (obtained by tenfold cross validation). a,b Means within a column with different superscripts differ (P < 0.05). The significance was determined by one-way analysis of variance and followed by Tukey’s Honest Significant Difference (HSD) test (n = 10, α = 0.05).
3SD standard deviation.
Optimized weights of connections and biases of the input nodes to nodes on hidden layer one of ANN model using NI as the primary predictor based on the combined data of the present training and testing datasets.
| Input node | Hidden layer one, node 1 | Hidden layer one, node 2 | Hidden layer one, node 3 |
|---|---|---|---|
| NI | 0.032 | − 1.640 | − 1.869 |
| LW | − 0.125 | − 1.684 | 0.474 |
| MY | 1.200 | 1.599 | − 1.241 |
| FP | − 0.253 | 1.344 | − 0.490 |
| DNC | − 1.968 | 0.413 | 0.456 |
| DMEC | − 0.5738 | 3.551 | − 0.519 |
| Bias to nodes on hidden layer one | − 0.859 | − 0.749 | 0.432 |
The ANN model is a feed-forward network with 6 input nodes, 2 hidden layers (the first layer with 3, and the second one with 6 hidden neurons).
aNI N intake; DNC diet N concentration; MY milk yield; FP forage proportion; LW live weight; DMEC diet metabolizable energy concentration.
Optimized weights of connections of the nodes on hidden layer one to two and biases to nodes on hidden layer two and output node of ANN model using NI as the primary predictor based on the combined data of the present training and testing datasets.
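| Nodes on hidden layer one | Hidden layer two, node 1 | Hidden layer two, node 2 | Hidden layer two, node 3 | Hidden layer two, node 4 | Hidden layer two, node 5 | Hidden layer two, node 6 | Bias to MN |
|---|---|---|---|---|---|---|---|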
| 1 | − 0.528 | − 0.024 | − 0.881 | − 2.922 | 1.356 | − 1.106 | – |
| 2 | 0.044 | − 0.134 | 0.485 | − 1.170 | 1.294 | − 0.833 | – |
| 3 | 0.109 | − 1.334 | 0.119 | − 1.989 | 2.426 | 0.338 | – |
| Output node | |||||||
| MN | − 1.050 | 0.436 | 1.026 | 2.978 | − 1.551 | − 0.155 | 1.118 |
| Bias to nodes on hidden layer 2 | 0.321 | − 0.361 | 0.166 | − 0.367 | − 0.604 | 0.090 | – |
The ANN model is a feed-forward network with 6 input nodes, 2 hidden layers (the first layer with 3, and the second one with 6 hidden nodes). MN manure N.
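The two weight tables above fully specify a 6–2(3,6)–1 network, so a forward pass can be sketched directly from them. The weights and biases below are transcribed from those tables. Everything else is an assumption: the source does not state the activation function or how inputs and the output were scaled, so this sketch uses a logistic activation on both hidden layers, a linear output node, and works purely in scaled units. The names `dense`, `logistic` and `predict_scaled` are hypothetical.

```python
import math

# Rows: input nodes NI, LW, MY, FP, DNC, DMEC; columns: hidden layer one nodes 1-3.
W1 = [[0.032, -1.640, -1.869],
      [-0.125, -1.684, 0.474],
      [1.200, 1.599, -1.241],
      [-0.253, 1.344, -0.490],
      [-1.968, 0.413, 0.456],
      [-0.5738, 3.551, -0.519]]
B1 = [-0.859, -0.749, 0.432]

# Rows: hidden layer one nodes 1-3; columns: hidden layer two nodes 1-6.
W2 = [[-0.528, -0.024, -0.881, -2.922, 1.356, -1.106],
      [0.044, -0.134, 0.485, -1.170, 1.294, -0.833],
      [0.109, -1.334, 0.119, -1.989, 2.426, 0.338]]
B2 = [0.321, -0.361, 0.166, -0.367, -0.604, 0.090]

# Rows: hidden layer two nodes 1-6; single column: output node MN.
W_OUT = [[-1.050], [0.436], [1.026], [2.978], [-1.551], [-0.155]]
B_OUT = [1.118]


def logistic(z):
    """Logistic activation (an assumption; the source does not name the activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """Affine layer: output_j = bias_j + sum_i inputs_i * weights[i][j]."""
    return [b + sum(x * row[j] for x, row in zip(inputs, weights))
            for j, b in enumerate(biases)]


def predict_scaled(x_scaled):
    """Forward pass on already-scaled inputs ordered NI, LW, MY, FP, DNC, DMEC.

    Returns MN in the (unknown) scaled units of the fitted model.
    """
    h1 = [logistic(z) for z in dense(x_scaled, W1, B1)]
    h2 = [logistic(z) for z in dense(h1, W2, B2)]
    return dense(h2, W_OUT, B_OUT)[0]
```

Turning the result into g/d of manure N would require the scaling parameters used during training, which are not given here, so the output should not be read as a prediction in physical units.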
Optimized weights of connections and biases of the input nodes to nodes on hidden layer one of ANN model using LW and MY as primary predictors based on the combined data of the present training and testing datasets.
| Input node | Hidden layer one, node 1 | Hidden layer one, node 2 | Hidden layer one, node 3 | Hidden layer one, node 4 |
|---|---|---|---|---|
| LW | 0.0001 | − 0.567 | − 0.165 | − 1.961 |
| MY | − 12.200 | − 4.362 | − 0.251 | − 2.137 |
| DNC | − 2.600 | 3.271 | − 1.404 | 2.035 |
| CDMI | 2.499 | 0.685 | − 0.194 | − 0.305 |
| DMEC | 5.571 | 4.780 | 1.625 | 1.419 |
| Bias to nodes on hidden layer one | 0.461 | − 1.285 | 0.082 | − 0.286 |
The ANN model is a feed-forward network with 5 input nodes, 2 hidden layers (the first layer with 4, and the second one with 2 hidden nodes).
aLW live weight; MY milk yield; DNC diet N concentration; CDMI concentrate dry matter intake; DMEC diet metabolizable energy concentration.
Optimized weights of connections of the nodes on hidden layer 1 to 2 and biases to nodes on hidden layer 2 and output node of ANN model using LW and MY as primary predictors based on the combined data of the present training and testing datasets.
| Nodes on hidden layer one | Hidden layer two, node 1 | Hidden layer two, node 2 | Bias to MN |
|---|---|---|---|
| 1 | 8.171 | − 0.290 | – |
| 2 | − 1.788 | 1.386 | – |
| 3 | 8.982 | − 1.294 | – |
| 4 | 4.281 | − 1.354 | – |
| MN | − 0.799 | 1.966 | − 0.089 |
| Bias to nodes on hidden layer 2 | − 2.607 | 0.763 | – |
The ANN model is a feed-forward network with 5 input nodes, 2 hidden layers (the first layer with 4, and the second one with 2 hidden nodes). MN manure N.
Predictive performance of the ANN models for prediction of manure N output using the whole dataset.
| Primary predictors | Features1 | R2 | RMSE2 | RRMSE3 | CCC4 |
|---|---|---|---|---|---|
| NI | NI + LW + MY + FP + DNC + DMEC | 0.83 | 32.1 ± 1.68 | 10.9 ± 0.44 | 0.76 ± 0.025 |
| LW and MY | LW + MY + DNC + CDMI + DMEC | 0.79 | 35.2 ± 1.08 | 12.1 ± 0.47 | 0.70 ± 0.021 |
1NI N intake; DNC diet N concentration; MY milk yield; FP forage proportion; LW live weight; DMEC diet metabolizable energy concentration; CDMI concentrate dry matter intake.
2RMSE root mean square error (obtained by tenfold cross validation), mean ± standard deviation.
3RRMSE relative root mean square error (obtained by tenfold cross validation), mean ± standard deviation.
4CCC concordance correlation coefficients (obtained by tenfold cross validation), mean ± standard deviation.
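The "mean ± standard deviation" values in footnotes 2 to 4 summarize one score per cross-validation fold. A minimal sketch of that bookkeeping follows, assuming a shuffled tenfold split and the sample standard deviation; the source does not describe how folds were assigned or which standard deviation was used, and the function names are hypothetical.

```python
import random
from statistics import mean, stdev


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_sd(fold_scores):
    """Summarize per-fold scores as (mean, sample standard deviation)."""
    return mean(fold_scores), stdev(fold_scores)
```

Each fold serves once as the held-out set while the model is fitted on the other nine; the metric is computed on the held-out fold, and the ten resulting values are passed to `mean_sd`.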
Information on experiment, animal and forage types in the training and testing datasets of dairy cows used in the present study.
| | Training dataset | Test dataset |
|---|---|---|
| Years of experiments | 1990–2002 | 2005–2015 |
| Number of experiments | 27 | 16 |
| Number of individual cow data | 564 | 387 |
| Holstein–Friesian | 534 | 269 |
| Othersa | 30 | 118 |
| Forage typesb | GS, FG | GS, MS, WCW |
aIncluding Holstein crossbreds, Norwegian and Swedish Red.
bGS grass silage; FG fresh grass; MS maize silage; WCW whole crop wheat silage.
Descriptive statistics of animal, dietary and nitrogen utilization variables in the present study.
| Features1 | Unit3 | Abbreviation | Training dataset, mean | Training dataset, SD2 | Training dataset, minimum | Training dataset, maximum | Test dataset, mean | Test dataset, SD2 | Test dataset, minimum | Test dataset, maximum |
|---|---|---|---|---|---|---|---|---|---|---|
| Live weight | kg | LW | 564 | 65.3 | 385 | 781 | 549 | 70.1 | 379 | 757 |
| Milk yield | kg/d | MY | 21.4 | 6.61 | 6.10 | 49.1 | 23.6 | 7.16 | 5.87 | 48.8 |
| Energy-corrected milk yield | kg/d | ECMY | 21.8 | 6.70 | 5.53 | 45.6 | 24.0 | 6.50 | 5.10 | 49.5 |
| Forage DMI | kg/d | FDMI | 9.33 | 2.79 | 2.96 | 18.9 | 10.0 | 2.80 | 3.60 | 16.8 |
| Concentrate DMI | kg/d | CDMI | 7.08 | 3.51 | 0 | 16.9 | 8.18 | 2.99 | 3.21 | 16.0 |
| Total DMI | kg/d | TDMI | 16.4 | 3.02 | 7.54 | 24.3 | 18.2 | 2.92 | 10.8 | 26.6 |
| Forage proportion | kg/kg DM | FP | 0.58 | 0.183 | 0.21 | 1.00 | 0.55 | 0.142 | 0.31 | 0.79 |
| Diet N concentration | g/kg DM | DNC | 29.3 | 4.15 | 17.0 | 43.3 | 27.8 | 4.02 | 18.0 | 43.0 |
| Diet ME concentration | MJ/kg DM | DMEC | 12.1 | 0.92 | 9.89 | 19.1 | 12.1 | 0.82 | 9.68 | 14.1 |
| N intake | g/d | NI | 486 | 129.6 | 155 | 874 | 506 | 106.6 | 228 | 798 |
| Feces N output | g/d | FN | 142 | 36.1 | 48.4 | 241 | 159 | 32.9 | 73.7 | 284 |
| Urine N output | g/d | UN | 209 | 69.1 | 69.6 | 452 | 178 | 61.4 | 44.7 | 364 |
| Manure N output | g/d | MN | 351 | 97.7 | 130 | 679 | 337 | 77.1 | 159 | 577 |
1DMI dry matter intake; ME metabolizable energy; N nitrogen.
2SD standard deviation.
3DM dry matter.