Alberto Gonzalez-Sanchez¹, Juan Frausto-Solis², Waldo Ojeda-Bustamante¹
Abstract
Efficient cropping requires yield estimation for each crop involved, and data-driven models are commonly applied for this purpose. In recent years, several comparisons of data-driven modeling techniques have been made in search of the best model for yield prediction. However, attributes are usually selected based on expert assessment or on dimensionality reduction algorithms. A fairer comparison should include the best subset of features for each regression technique, and an evaluation covering several crops is preferable. This paper evaluates the most common data-driven modeling techniques applied to yield prediction, using a complete search method to define the best attribute subset for each model. Multiple linear regression, stepwise linear regression, M5′ regression trees, and artificial neural networks (ANNs) were ranked. The models were built using real data from eight crops sown in an irrigation module in Mexico. To validate the models, three accuracy metrics were used: the root relative squared error (RRSE), the relative mean absolute error (RMAE), and the correlation factor (R). The results show that ANNs are more consistent in the composition of the best attribute subset between the learning and the training stages, obtaining the lowest average RRSE (86.04%), the lowest average RMAE (8.75%), and the highest average correlation factor (0.63).
Year: 2014 PMID: 24977201 PMCID: PMC4058283 DOI: 10.1155/2014/509429
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
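The three accuracy metrics named in the abstract can be sketched in Python as follows. This is a minimal sketch assuming the usual definitions: RRSE compares a model's squared error with that of always predicting the mean observed yield, and R is the Pearson correlation coefficient. The record does not give the paper's RMAE formula, so `rmae` below assumes the mean absolute error divided by the mean observed yield; all function names are hypothetical.

```python
import math
from statistics import mean


def rrse(actual, predicted):
    """Root relative squared error (%): the model's squared error relative to
    the squared error of always predicting the mean of the observed values."""
    a_bar = mean(actual)
    model_sse = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline_sse = sum((a - a_bar) ** 2 for a in actual)
    return 100.0 * math.sqrt(model_sse / baseline_sse)


def rmae(actual, predicted):
    """Relative mean absolute error (%).
    ASSUMPTION: MAE divided by the mean observed value; the paper's exact
    formula is not given in this record."""
    mae = mean(abs(p - a) for a, p in zip(actual, predicted))
    return 100.0 * mae / mean(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    a_bar, p_bar = mean(actual), mean(predicted)
    cov = sum((a - a_bar) * (p - p_bar) for a, p in zip(actual, predicted))
    var_a = sum((a - a_bar) ** 2 for a in actual)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Toy yields (hypothetical): predicting the observed mean gives RRSE = 100%.
observed = [10.0, 12.0, 14.0]
baseline = [12.0, 12.0, 12.0]
print(rrse(observed, baseline))  # 100.0
```

Because a constant prediction equal to the observed mean scores exactly 100%, RRSE values above 100% in the tables indicate a model that does worse than that trivial baseline.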
Potential attributes in crop datasets.
| Attribute | Attribute description |
|---|---|
| SP | Section (farm location where the crop was sown) |
| IWD | Irrigation water depth applied (mm) |
| SGR | Solar radiation (MJ/m²) |
| RF | Rainfall (mm) |
| Max | Maximum temperature (°C) |
| Min | Minimum temperature (°C) |
| RH | Relative humidity in leaves (%) |
Distribution of training and testing samples per crop dataset.
| Dataset ID | Crop species | Cultivar | Training period | Training samples | Testing period | Testing samples |
|---|---|---|---|---|---|---|
| PJ01 | Pepper | Jalapeno | 1999–2005 | 116 | 2006 | 18 |
| CBP02 | Common bean | Peruano | 1999–2006 | 361 | 2007 | 9 |
| CBA03 | Common bean | Azufrado | 1999–2006 | 120 | 2007 | 21 |
| CBM04 | Common bean | Mayocoba | 1999–2006 | 332 | 2007 | 27 |
| CP05 | Corn | Pioneer 30G54 | 2000–2005 | 179 | 2006 | 19 |
| PA06 | Potato | Alpha | 1999–2006 | 1749 | 2007 | 116 |
| PA07 | Potato | Atlantic | 1999–2006 | 1062 | 2007 | 92 |
| TS08 | Tomato | Saladette | 1999–2005 | 182 | 2006 | 15 |
Algorithm 1. Recursive algorithm to perform the optimal attribute subset search.
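Only the caption of Algorithm 1 survives in this record. As a hedged illustration, not the authors' algorithm, a recursive include/exclude enumeration that scores every non-empty subset of the seven potential attributes and ranks them by error could look like this. `evaluate` is a hypothetical callback that, in practice, would train a model on the given subset and return its testing error (for example, RRSE).

```python
ATTRIBUTES = ["SP", "IWD", "SGR", "RF", "Max", "Min", "RH"]


def rank_attribute_subsets(attributes, evaluate):
    """Recursively enumerate every non-empty subset of `attributes`, score it
    with `evaluate` (lower error is better) and return (error, subset) pairs
    sorted from best to worst."""
    results = []

    def recurse(index, chosen):
        if index == len(attributes):
            if chosen:  # the empty subset cannot train a model
                subset = tuple(chosen)
                results.append((evaluate(subset), subset))
            return
        chosen.append(attributes[index])  # branch 1: include this attribute
        recurse(index + 1, chosen)
        chosen.pop()                      # branch 2: exclude it
        recurse(index + 1, chosen)

    recurse(0, [])
    results.sort(key=lambda pair: pair[0])
    return results


# Hypothetical scorer standing in for "train on the subset, return test RRSE":
# it counts how many attributes differ from an arbitrary target set.
target = {"IWD", "RH"}
ranking = rank_attribute_subsets(ATTRIBUTES, lambda s: len(set(s) ^ target))
print(len(ranking), ranking[0])  # 127 (0, ('IWD', 'RH'))
```

With seven attributes the search visits 2⁷ − 1 = 127 non-empty subsets, so an exhaustive search stays cheap here; the cost is dominated by how expensive each model fit in `evaluate` is.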
RRSE, R, and RMAE measures using the OAS on the testing dataset.
| Crop dataset | RRSE (%) MLR | RRSE (%) M5′ | RRSE (%) ANN | R MLR | R M5′ | R ANN | RMAE (%) MLR | RMAE (%) M5′ | RMAE (%) ANN |
|---|---|---|---|---|---|---|---|---|---|
| PJ01 | 50.69 | 29.29 | 49.62 | 0.87 | 0.96 | 0.88 | 8.63 | 4.56 | 8.27 |
| CBP02 | 52.14 | 58.85 | 58.05 | 0.67 | 0.68 | 0.67 | 5.67 | 6.40 | 6.41 |
| CBA03 | 63.40 | 38.66 | 38.66 | 0.94 | 0.93 | 0.93 | 4.72 | 3.62 | 3.62 |
| CBM04 | 70.53 | 71.20 | 75.04 | 0.69 | 0.59 | 0.58 | 1.30 | 1.59 | 1.58 |
| CP05 | 87.83 | 83.52 | 87.59 | 0.72 | 0.65 | 0.70 | 8.13 | 6.39 | 8.46 |
| PA06 | 95.28 | 74.02 | 86.16 | −0.13 | 0.63 | 0.54 | 25.58 | 20.05 | 23.13 |
| PA07 | 95.84 | 88.14 | 91.24 | 0.60 | 0.51 | 0.45 | 17.78 | 16.42 | 17.40 |
| TS08 | 86.59 | 82.40 | 74.87 | 0.69 | 0.64 | 0.73 | 11.08 | 13.46 | 14.57 |
| Average | 75.29 | 65.76 | 70.15 | 0.63 | 0.70 | 0.72 | 10.36 | 9.06 | 10.43 |
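The Average row of the OAS table is the plain mean over the eight crop datasets. A short check for the RRSE columns, with values copied from the table (the dictionary name is illustrative only), shows that M5′ has the lowest average RRSE when every technique uses its OAS:

```python
from statistics import mean

# RRSE (%) on the testing data with each technique's OAS, copied from the
# table in the order PJ01, CBP02, CBA03, CBM04, CP05, PA06, PA07, TS08.
OAS_RRSE = {
    "MLR": [50.69, 52.14, 63.40, 70.53, 87.83, 95.28, 95.84, 86.59],
    "M5'": [29.29, 58.85, 38.66, 71.20, 83.52, 74.02, 88.14, 82.40],
    "ANN": [49.62, 58.05, 38.66, 75.04, 87.59, 86.16, 91.24, 74.87],
}

averages = {name: mean(values) for name, values in OAS_RRSE.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}%")  # MLR 75.29, M5' 65.76, ANN 70.15
best = min(averages, key=averages.get)  # lowest average RRSE: "M5'"
```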
Number of crop yield models in which each attribute appears in the optimal subset.
| Attribute | MLR | M5′ | ANN | Average |
|---|---|---|---|---|
| SP | 4 | 5 | 3 | 4.00 |
| IWD | 8 | 5 | 5 | 6.00 |
| SGR | 6 | 3 | 5 | 4.67 |
| RF | 5 | 2 | 3 | 3.33 |
| Max | 4 | 3 | 3 | 3.33 |
| Min | 5 | 5 | 3 | 4.33 |
| RH | 5 | 6 | 4 | 5.00 |
Figure 1. Number of occurrences of each attribute in the OAS for each technique (only crop datasets with 2007 testing data).
RRSE, R, and RMAE measures using all the potential attributes.
| Crop dataset | RRSE (%) MLR | RRSE (%) M5′ | RRSE (%) ANN | R MLR | R M5′ | R ANN | RMAE (%) MLR | RMAE (%) M5′ | RMAE (%) ANN |
|---|---|---|---|---|---|---|---|---|---|
| PJ01 | | | | | | | | | |
| CBP02 | | | 124.23 | | | 0.64 | | | 13.21 |
| CBA03 | 136.96 | 156.29 | | 0.76 | 0.77 | | 14.99 | 15.33 | |
| CBM04 | 470.62 | 262.08 | 350.32 | −0.66 | −0.66 | −0.68 | 11.2 | 6.54 | 8.05 |
| CP05 | 102.68 | 362.5 | 123.61 | 0.36 | 0.08 | 0.54 | 10.12 | 32.25 | 11.75 |
| PA06 | | 102.87 | 110.24 | | 0.15 | 0.19 | | 27.56 | 27.93 |
| PA07 | 110.86 | 165.41 | 113.18 | −0.03 | −0.13 | −0.18 | 20.67 | 37.07 | 24.23 |
| TS08 | 166.86 | 100.56 | 146.6 | 0.45 | 0.28 | 0.09 | 32.83 | 19.95 | 43.57 |
| Average (RRSE < 100) | 94.41 | 74.34 | 75.79 | 0.53 | 0.77 | 0.76 | 16.83 | 8.62 | 8.55 |
| Count (RRSE < 100) | 3 | 2 | 2 | 3 | 2 | 2 | | | |
| Average (all) | 158.9 | 162.3 | 139.97 | 0.31 | 0.25 | 0.26 | 17.54 | 19.49 | 18.23 |
Distance from the OAS, in number of combinations, of the error measures obtained using the full potential attribute set.
| Crop dataset | MLR | M5′ | ANN |
|---|---|---|---|
| PJ01 | 38 | 14 | 18 |
| CBP02 | 80 | 189 | 135 |
| CBA03 | 111 | 118 | 32 |
| CBM04 | 231 | 135 | 216 |
| CP05 | 71 | 206 | 196 |
| PA06 | 23 | 173 | 186 |
| PA07 | 230 | 194 | 213 |
| TS08 | 229 | 69 | 185 |
| Average | 127 | 137 | 148 |
RRSE and R measures using the LAS on the testing dataset.
| Crop dataset | RRSE (%) SLR | RRSE (%) MLR | RRSE (%) M5′ | RRSE (%) ANN | R SLR | R MLR | R M5′ | R ANN |
|---|---|---|---|---|---|---|---|---|
| PJ01 | 203.86 | | | | 0.87 | | | |
| CBP02 | 130.52 | | | | 0.66 | | | |
| CBA03 | 98.76 | 136.96 | 112.45 | | 0.64 | 0.76 | −0.05 | |
| CBM04 | 479.43 | 306.29 | | | −0.67 | 0.66 | | |
| CP05 | 103.77 | | | | | | | |
| PA06 | 102.41 | 102.36 | | 101.33 | −0.42 | −0.32 | | 0.11 |
| PA07 | 110.44 | | 101.31 | | −0.06 | | 0.09 | |
| TS08 | 112.85 | | | 137.48 | 0.42 | | | 0.69 |
| Average (RRSE < 100) | 98.76 | 82.41 | 80.14 | 74.92 | 0.50 | 0.67 | 0.60 | 0.71 |
| Count (RRSE < 100) | 1 | 5 | 6 | 6 | | | | |
| Average (all) | 167.76 | 119.71 | 86.82 | 86.04 | 0.24 | 0.56 | 0.46 | 0.63 |
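The averages marked "RRSE < 100" keep only the models whose RRSE is below 100%; under the usual RRSE definition these are the models that outperform simply predicting the mean observed yield. A minimal sketch of that filtering, applied to the SLR RRSE column (the helper name `summarize_rrse` is hypothetical), reproduces the SLR entries of the last three rows:

```python
from statistics import mean

# SLR RRSE (%) with the LAS on the testing data, copied from the table
# in the order PJ01, CBP02, CBA03, CBM04, CP05, PA06, PA07, TS08.
SLR_RRSE = [203.86, 130.52, 98.76, 479.43, 103.77, 102.41, 110.44, 112.85]


def summarize_rrse(values, threshold=100.0):
    """Return the average over models that beat the mean-predictor baseline
    (RRSE < threshold), how many such models there are, and the plain average."""
    better = [v for v in values if v < threshold]
    return {
        "average_below": mean(better) if better else float("nan"),
        "count_below": len(better),
        "average_all": mean(values),
    }


print(summarize_rrse(SLR_RRSE))
```

The filtered average is computed over a different number of models per technique (1 for SLR versus 5 or 6 for the others), so it is best read together with the Count row.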
RMAE (%) measures using the LAS on the testing dataset.
| Crop dataset | SLR | MLR | M5′ | ANN |
|---|---|---|---|---|
| PJ01 | 40.61 | | | |
| CBP02 | 14.10 | | | |
| CBA03 | 10.11 | 14.99 | 12.35 | |
| CBM04 | 11.31 | 8.65 | | |
| CP05 | | | | |
| PA06 | 27.17 | 26.98 | | 26.29 |
| PA07 | 21.01 | | 18.35 | |
| TS08 | 14.75 | | | 24.27 |
| Average (RRSE < 100) | 10.11 | 11.13 | 10.79 | 8.75 |
| Count (RRSE < 100) | 1 | 5 | 6 | 6 |
| Average (all) | 18.65 | 13.59 | 11.93 | 12.89 |
Distance from the LAS results to the OAS results (distance from optimal).
| Crop dataset | SLR | MLR | M5′ | ANN |
|---|---|---|---|---|
| PJ01 | 184 | 30 | 26 | 35 |
| CBP02 | 145 | 7 | 113 | 1 |
| CBA03 | 43 | 111 | 69 | 5 |
| CBM04 | 232 | 170 | 6 | 6 |
| CP05 | 96 | 9 | 23 | 1 |
| Average | 140 | 65.4 | 47.4 | 9.6 |
Figure 2. Number of occurrences of each attribute in the LAS and OAS for each technique (only crop datasets with 2007 testing data).
Correlation coefficient between the number of attributes in the LAS–OAS intersection and RRSE.
| Regression technique | | |
|---|---|---|
| SLR | −0.707 | 0.347 |
| MLR | −0.542 | 0.150 |
| M5′ | −0.641 | 0.034 |
| ANN | −0.068 | 0.340 |
(a) SLR
| Crop dataset | SP | IWD | SGR | RF | Max | Min | RH |
|---|---|---|---|---|---|---|---|
| PJ01 | ? | ∗ | ? | ? | ∗ | | |
| CBP02 | ∗ | ? | ? | ∗ | ? | | |
| CBA03 | ? | ∗ | ? | | | | |
| CBM04 | ? | ∗ | ∗ | ? | | | |
| CP05 | ∗ | ? | ? | ∗ | ? | ∗ | |
| PA06 | ? | ? | ∗ | ? | ∗ | ? | ∗ |
| PA07 | ? | ? | ∗ | ? | ∗ | | |
| TS08 | ? | ∗ | ∗ | ? | ∗ | ∗ | |
| Count (OAS) | 4 | 8 | 6 | 5 | 4 | 5 | 6 |
| Count (LAS) | 5 | 3 | 3 | 5 | 2 | 3 | 1 |
(b) MLR
| Crop dataset | SP | IWD | SGR | RF | Max | Min | RH |
|---|---|---|---|---|---|---|---|
| PJ01 | √ | ? | ∗ | ? | ? | ? | ∗ |
| CBP02 | ? | ? | ? | ? | ? | ? | |
| CBA03 | ? | ? | ? | ? | ? | ? | ? |
| CBM04 | ∗ | ∗ | ? | ? | ? | | |
| CP05 | ? | ? | ? | ∗ | ? | ∗ | |
| PA06 | ? | ? | ∗ | ∗ | ? | ? | |
| PA07 | ? | ? | ∗ | ? | ∗ | | |
| TS08 | ? | ∗ | ? | ∗ | ? | ∗ | |
| Count (OAS) | 4 | 8 | 6 | 5 | 4 | 5 | 6 |
| Count (LAS) | 4 | 6 | 7 | 4 | 6 | 5 | 3 |
(c) M5′
| Crop dataset | SP | IWD | SGR | RF | Max | Min | RH |
|---|---|---|---|---|---|---|---|
| PJ01 | ? | ? | ? | ? | ? | ? | |
| CBP02 | ? | ? | ? | ? | ∗ | ? | |
| CBA03 | ? | ? | ? | ? | ∗ | | |
| CBM04 | ? | ? | ? | ∗ | ? | ∗ | |
| CP05 | ? | ∗ | ? | | | | |
| PA06 | ? | ? | ? | ∗ | ? | ∗ | |
| PA07 | ∗ | ? | ? | ? | ∗ | | |
| TS08 | ? | ∗ | ∗ | ∗ | ? | ? | |
| Count (OAS) | 5 | 5 | 3 | 2 | 3 | 4 | 6 |
| Count (LAS) | 7 | 4 | 5 | 2 | 6 | 5 | 2 |
(d) ANN
| Crop dataset | SP | IWD | SGR | RF | Max | Min | RH |
|---|---|---|---|---|---|---|---|
| PJ01 | ? | ∗ | ? | ? | ∗ | ∗ | ∗ |
| CBP02 | ? | ? | ? | | | | |
| CBA03 | ? | ∗ | ? | | | | |
| CBM04 | ? | ∗ | ? | | | | |
| CP05 | ? | ? | ? | | | | |
| PA06 | ? | ? | ? | ? | ∗ | | |
| PA07 | ? | ? | ? | | | | |
| TS08 | ? | ∗ | ? | ∗ | | | |
| Count (OAS) | 3 | 5 | 5 | 3 | 2 | 3 | 4 |
| Count (LAS) | 5 | 3 | 3 | 5 | 2 | 3 | 1 |