Alexis Pang1, Melissa W L Chang1,2, Yang Chen1,3.
Abstract
Wheat accounts for more than 50% of Australia's total grain production. The capability to generate accurate in-season yield predictions is important across all components of the agricultural value chain. The wheat yield prediction literature points to a need for further work evaluating machine learning techniques such as random forests (RF) at multiple scales. This research applied Random Forest Regression (RFR) to build regional and local-scale yield prediction models at the pixel level for three wheat-growing paddocks in southeast Australia, one each in Victoria (VIC), New South Wales (NSW) and South Australia (SA), using 2018 yield maps from data supplied by collaborating farmers. Time-series Normalized Difference Vegetation Index (NDVI) data derived from Planet's high spatio-temporal resolution imagery, meteorological variables and yield data were used to train, test and validate the models at pixel level using Python libraries for (a) a regional-scale three-paddock composite and (b) individual paddocks. The composite region-wide RF model for the three paddocks performed well (R² = 0.86, RMSE = 0.18 t ha⁻¹). RF models for the individual paddocks in VIC (R² = 0.89, RMSE = 0.15 t ha⁻¹) and NSW (R² = 0.87, RMSE = 0.07 t ha⁻¹) also performed well, but performance was only moderate for SA (R² = 0.45, RMSE = 0.25 t ha⁻¹). Generally, high values were underpredicted and low values overpredicted. This study demonstrated the feasibility of applying RF modeling on satellite imagery and yielded 'big data' for regional as well as local-scale yield prediction.
Keywords: Normalized Difference Vegetation Index (NDVI); random forests; satellite imagery; wheat; yield prediction
Year: 2022 PMID: 35161467 PMCID: PMC8839090 DOI: 10.3390/s22030717
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Summary of workflow processes and datasets used for building, testing and evaluating the RF model wheat yield prediction method.
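The train, test and validate workflow summarized above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the array sizes, the 60/20/20 split, the `n_estimators` value and the synthetic yield relationship are all assumptions, and the meteorological variables mentioned in the abstract are left out for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for pixel-level data: 2000 pixels x 16 fortnightly NDVI values.
# (Hypothetical sizes; the real datasets are far larger.)
n_pixels, n_periods = 2000, 16
ndvi = rng.uniform(0.1, 0.9, size=(n_pixels, n_periods))

# Hypothetical yield (t/ha) driven mainly by two mid-season periods plus noise.
yield_t_ha = (0.5 + 1.5 * ndvi[:, 9] + 0.5 * ndvi[:, 10]
              + rng.normal(0.0, 0.1, n_pixels))

# Split into training, test and validation subsets (60/20/20 is an assumption).
X_train, X_hold, y_train, y_hold = train_test_split(
    ndvi, yield_t_ha, test_size=0.4, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=0)

# Fit a random forest regressor on the training pixels.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Score on the held-out test and validation pixels (coefficient of determination).
r2_test = model.score(X_test, y_test)
r2_val = model.score(X_val, y_val)
print(f"test R2 = {r2_test:.3f}, validation R2 = {r2_val:.3f}")
```

The same fit could be run once on the pooled pixels of all paddocks (regional scale) and once per paddock (local scale); only the rows passed to `fit` would change.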
Figure 2. Location of study paddocks in southeast Australia, covering the states of Victoria (VIC), New South Wales (NSW) and South Australia (SA).
Table 1. Location, cropping, climate and soil characteristics of study paddocks.
| Location | 2018 | Paddock Area & | Climate | Soil Description |
|---|---|---|---|---|
| Ouyen, | Variety: | 181.2 ha | Mean Max Temp: 23.8 °C | Calcarosol (dune systems with series of alkaline sandy/loamy duplex, and sandy clay soils). |
| Barmedman, NSW | Variety: | 67.6 ha | Mean Max Temp: 24.0 °C | Brown Vertosol (heavy clay soil, alkaline with strongly sodic subsoil). |
| Pinery, SA | Variety: | 120.1 ha | Mean Max Temp: 23.6 °C | Calcarosol (alkaline silty clay loam to medium-heavy clay) variable soil profiles on dune systems. |
Table 2. Fortnightly PlanetScope imagery Periods, dates and corresponding Days After Sowing (DAS) in 2018 for each location in the states of Victoria (VIC), New South Wales (NSW) and South Australia (SA), Australia.
| Period | Ouyen, VIC: 2018 Date | Ouyen, VIC: DAS | Barmedman, NSW: 2018 Date | Barmedman, NSW: DAS | Pinery, SA: 2018 Date | Pinery, SA: DAS |
|---|---|---|---|---|---|---|
| | - | - | 19 April | 15 | - | - |
| | - | - | 30 April | 26 | - | - |
| | 25 May | 10 | 14 May | 40 | 16 May | 7 |
| | 31 May | 16 | 29 May | 55 | 31 May | 22 |
| | 14 June | 30 | 22 June | 79 | 13 June | 35 |
| | 30 June | 46 | 30 June | 87 | 29 June | 51 |
| | 14 July | 60 | 14 July | 101 | 14 July | 66 |
| | 29 July | 75 | 12 August | 130 | 29 July | 81 |
| | 13 August | 90 | 27 August | 145 | 26 August | 109 |
| | 7 September | 115 | 4 September | 153 | 4 September | 118 |
| | 20 September | 128 | 21 September | 170 | 17 September | 131 |
| | 4 October | 142 | 30 September | 179 | 1 October | 145 |
| | 19 October | 157 | 18 October | 197 | 19 October | 163 |
| | 4 November | 173 | 11 November | 221 | 2 November | 177 |
| | 18 November | 187 | 26 November | 236 | 17 November | 192 |
| | - | - | 12 December | 252 | - | - |
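The DAS column is plain date arithmetic, which the short check below illustrates for the Ouyen, VIC rows. The sowing date is not stated in the text here; in this sketch it is inferred by subtracting the first DAS value from the first image date, and that inference is an assumption.

```python
from datetime import date, timedelta

def days_after_sowing(image_date: date, sowing_date: date) -> int:
    """Number of whole days between sowing and image acquisition."""
    return (image_date - sowing_date).days

# (image date, DAS) pairs for Ouyen, VIC, copied from the table above.
ouyen = [
    (date(2018, 5, 25), 10), (date(2018, 5, 31), 16), (date(2018, 6, 14), 30),
    (date(2018, 6, 30), 46), (date(2018, 7, 14), 60), (date(2018, 7, 29), 75),
    (date(2018, 8, 13), 90), (date(2018, 9, 7), 115), (date(2018, 9, 20), 128),
    (date(2018, 10, 4), 142), (date(2018, 10, 19), 157), (date(2018, 11, 4), 173),
    (date(2018, 11, 18), 187),
]

# Assumption: infer the sowing date from the first row (image date minus DAS).
first_date, first_das = ouyen[0]
sowing = first_date - timedelta(days=first_das)

# Every remaining row should reproduce its tabulated DAS from that one date.
consistent = all(days_after_sowing(d, sowing) == das for d, das in ouyen)
print(sowing.isoformat(), consistent)
```

Under that assumption, all thirteen Ouyen rows are consistent with a single sowing date, which is a quick way to catch transcription errors in a table like this one.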
Table 3. Descriptive statistics for regional-scale observed and RF model predicted yield.
| Statistic | Observed Yield | Predicted Yield |
|---|---|---|
| sample size | 75,495 | 75,495 |
| minimum (t ha⁻¹) | 0.35 | 0.38 |
| maximum (t ha⁻¹) | 2.79 | 2.67 |
| mean (t ha⁻¹) | 1.60 | 1.60 |
| standard deviation (t ha⁻¹) | 0.47 | 0.44 |
Table 4. Statistical performance of regional-scale RF yield prediction model.
| Metric | Test | Validation |
|---|---|---|
| R Squared | 0.858 | 0.860 |
| Adjusted R Squared | 0.858 | 0.860 |
| Mean Absolute Error | 0.126 | 0.126 |
| Mean Squared Error | 0.032 | 0.031 |
| Root Mean Squared Error | 0.179 | 0.177 |
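The five metrics in the table can be computed along the following lines. This is a sketch using scikit-learn's metric functions on a tiny hand-made example; the helper name `regression_report` and the toy arrays are hypothetical, and the number of predictors used by the authors is not given here.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred, n_features):
    """Return R2, adjusted R2, MAE, MSE and RMSE for a set of predictions."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    # Adjusted R2 penalizes R2 for the number of predictors used.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    mse = mean_squared_error(y_true, y_pred)
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),  # RMSE is the square root of MSE
    }

# Toy observed and predicted yields (t/ha); values are illustrative only.
y_true = np.array([1.0, 1.5, 2.0, 2.5])
y_pred = np.array([1.1, 1.4, 2.1, 2.3])
report = regression_report(y_true, y_pred, n_features=1)
print(report)
```

Two relationships are visible in the table itself. RMSE is the square root of MSE (the square root of 0.032 is about 0.179), and with tens of thousands of pixels the factor (n - 1)/(n - p - 1) is very close to 1, which is consistent with the R Squared and Adjusted R Squared rows agreeing to three decimals.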
Figure 3. Scatterplot of observed and predicted yield of VIC, NSW and SA paddocks combined.
Figure 4. Top 10 features of importance for regional RF yield prediction model. Note: Data labels, e.g., P12, refer to NDVI in Periods described in Table 2.
Table 5. Descriptive statistics for predicted yields from individual RF models compared with observed yields for VIC, NSW and SA paddocks.
| Yield | VIC Observed | VIC Predicted | NSW Observed | NSW Predicted | SA Observed | SA Predicted |
|---|---|---|---|---|---|---|
| mean | 1.55 | 1.56 | 1.08 | 1.08 | 1.95 | 1.94 |
| standard deviation | 0.44 | 0.41 | 0.20 | 0.19 | 0.33 | 0.22 |
| minimum | 0.36 | 0.38 | 0.34 | 0.40 | 0.91 | 0.96 |
| maximum | 2.72 | 2.66 | 1.67 | 1.59 | 2.80 | 2.66 |
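The abstract's remark that high yields were underpredicted and low yields overpredicted shows up in these statistics as a narrower spread of predicted values. One rough way to quantify that compression, offered here as an illustration rather than a measure used by the authors, is the ratio of predicted to observed standard deviation, computed from the tabulated values:

```python
# (observed SD, predicted SD) in t/ha, copied from the descriptive-statistics tables.
sd = {
    "regional": (0.47, 0.44),
    "VIC": (0.44, 0.41),
    "NSW": (0.20, 0.19),
    "SA": (0.33, 0.22),
}

# A ratio below 1 means the predictions span a narrower range than the observations.
ratios = {name: round(pred / obs, 2) for name, (obs, pred) in sd.items()}

for name, ratio in ratios.items():
    print(f"{name}: predicted/observed SD = {ratio}")
```

All four ratios fall below 1, and the SA paddock has by far the lowest, in line with its lower R Squared.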
Table 6. Statistical performance of VIC, NSW and SA RF yield prediction models.
| Metric | VIC Test | VIC Validation Dataset | NSW Test | NSW Validation Dataset | SA Test | SA Validation Dataset |
|---|---|---|---|---|---|---|
| R Squared | 0.890 | 0.887 | 0.870 | 0.878 | 0.447 | 0.443 |
| Adjusted R Squared | 0.890 | 0.887 | 0.869 | 0.877 | 0.445 | 0.441 |
| Mean Absolute Error | 0.110 | 0.111 | 0.056 | 0.054 | 0.186 | 0.185 |
| Mean Squared Error | 0.021 | 0.022 | 0.005 | 0.005 | 0.061 | 0.060 |
| Root Mean Squared Error | 0.146 | 0.147 | 0.073 | 0.071 | 0.246 | 0.246 |
Figure 5. Comparison of predicted vs. observed yield for (a) VIC; (b) NSW and (c) SA paddocks.
Figure 6. Yield maps and histograms for (a) VIC; (b) NSW and (c) SA paddocks. Notes: In the yield maps, darker colors indicate higher yield values; yield histogram y-axes differ in range for NSW and SA paddocks.
Table 7. Top ten most important features for VIC, NSW and SA paddock RF models, and corresponding NDVI Period and mean decrease in accuracy (MDA) if excluded.
| Feature | VIC NDVI Period | VIC MDA | NSW NDVI Period | NSW MDA | SA NDVI Period | SA MDA |
|---|---|---|---|---|---|---|
| 1 | 18 | 0.68 | 16 | 0.68 | 13 | 0.22 |
| 2 | 17 | 0.11 | 17 | 0.14 | 16 | 0.12 |
| 3 | 20 | 0.04 | 7 | 0.02 | 18 | 0.09 |
| 4 | 13 | 0.03 | 21 | 0.02 | 15 | 0.08 |
| 5 | 12 | 0.03 | 18 | 0.02 | 14 | 0.07 |
| 6 | 16 | 0.02 | 19 | 0.02 | 17 | 0.06 |
| 7 | 15 | 0.02 | 20 | 0.02 | 12 | 0.06 |
| 8 | 19 | 0.02 | 13 | 0.02 | 22 | 0.06 |
| 9 | 14 | 0.02 | 14 | 0.01 | 19 | 0.04 |
| 10 | 11 | 0.01 | 9 | 0.01 | 11 | 0.04 |
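A ranking of this kind can be approximated with permutation importance, which measures how much a model's score drops when one feature's values are shuffled. The sketch below uses scikit-learn's `permutation_importance` on synthetic data; whether the authors computed MDA this way is not stated here, so treat it as an analogous technique. The feature count, coefficients and noise level are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic NDVI-like features for 6 hypothetical periods (columns 0..5).
X = rng.uniform(0.1, 0.9, size=(1500, 6))
# Hypothetical yield dominated by column 3, with a weaker effect from column 1.
y = 2.0 * X[:, 3] + 0.3 * X[:, 1] + rng.normal(0.0, 0.05, 1500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

model = RandomForestRegressor(n_estimators=100, random_state=1)
model.fit(X_train, y_train)

# Shuffle each feature on held-out data and record the mean drop in R2.
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=1)

# Column indices sorted from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking:
    print(f"column {idx}: mean score decrease = {result.importances_mean[idx]:.3f}")
```

In this toy setup one column dominates the ranking, much as a single NDVI Period accounts for most of the MDA in the VIC and NSW columns of the table, while the flatter SA profile would correspond to importance spread across several periods.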