Zhou Zhang1,2, Yufang Jin1, Bin Chen1, Patrick Brown3.
Abstract
California's almond growers face challenges with nitrogen management now that legislatively mandated nitrogen management strategies for almond have been implemented. These regulations require that growers apply nitrogen to meet, but not exceed, the annual N demand for crop and tree growth and nut production. To predict seasonal nitrogen demand accurately, growers therefore need to estimate block-level almond yield early in the growing season so that timely N management decisions can be made. However, methods to predict almond yield are not currently available. To fill this gap, we developed statistical models using Stochastic Gradient Boosting, a machine learning approach, for early-season yield projection and mid-season yield updates over individual orchard blocks. We collected yield records for 185 orchards, dating back to 2005, from the major almond growers in the Central Valley of California. A large set of variables was extracted as predictors, including weather and orchard characteristics from remote sensing imagery. The predicted orchard-level yield agreed well with the independent yield records. For both the early-season (March) and mid-season (June) predictions, a coefficient of determination (R²) of 0.71 and a ratio of performance to interquartile distance (RPIQ) of 2.6 were found on average. We also identified several key determinants of yield from the modeling results. In general, almond yield increased dramatically with orchard age until about 7 years; a higher long-term mean maximum temperature during April–June enhanced yield in the southern orchards, while more precipitation in March reduced yield, especially in northern orchards. Remote sensing metrics such as annual maximum vegetation indices were also dominant variables for predicting yield potential.
While these results are promising, further refinement is needed: larger data sets and the incorporation of additional variables and methodologies will be required before the model can be used as a fertilization decision support tool for growers. Our study demonstrates the potential of automatic almond yield prediction to help growers manage N adaptively, comply with mandated requirements, and ensure industry sustainability.
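The two accuracy measures quoted in the abstract, R² and RPIQ, can be illustrated with a minimal sketch. It assumes the common definition of RPIQ as the interquartile range of the observed values divided by the RMSE; the abstract does not spell out the formula, the function names are ours, and the yield numbers are hypothetical.

```python
import math
import statistics


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the units of the yield records."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(statistics.fmean(squared))


def rpiq(observed, predicted):
    """Assumed definition: interquartile range of observations / RMSE."""
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)


# Hypothetical yields for five orchard-years (illustrative numbers only).
reported = [1800.0, 2200.0, 2600.0, 3000.0, 3400.0]
predicted = [1900.0, 2100.0, 2700.0, 2900.0, 3500.0]

print(r_squared(reported, predicted), rmse(reported, predicted), rpiq(reported, predicted))
```

Because RPIQ scales the error by the spread of the observations, a higher value indicates that the typical error is small relative to the variation among orchards.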
Keywords: almond orchard; central valley; machine learning; nitrogen management; remote sensing; yield prediction; yield variation
Year: 2019 PMID: 31379888 PMCID: PMC6656960 DOI: 10.3389/fpls.2019.00809
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
FIGURE 1. Almond orchard study area in California’s Central Valley. (A) Historical yield records from growers were collected for a total of 185 almond orchards (diamonds), located in three subregions: north, middle, and south (black boxes). The long-term mean maximum temperature (LT Tmax) during April–June from 1900 to 2009 is shown in the background. Also shown are (B) mean monthly temperature and (C) monthly precipitation averaged over the northern and central (blue) and southern (magenta) subregions.
TABLE 1. A detailed summary of the input variables for early-/mid-season SGB models.
| Yield records from 8 major almond growers in the Central Valley of California. | Planting year | Age | Age | −0.16* |
| CA-BCM data with 270 m spatial resolution for years 1990–2016 | Monthly mean daily maximum temperature (Tmax); | Current year monthly Tmax and Tmin from January to June, and PPT from January to March | Tmin, Tmax, PPT January | −0.06*, −0.06, −0.22* |
| | | Previous year summer mean temperature averaged over July and August | Pre Tmean July–August | 0.36* |
| | | Long-term mean seasonal Tmax, Tmin, PPT (averaged over 1990–2009 for each season). | LT Tmin, Tmax, PPT January–March | 0.26*, 0.50*, −0.58* |
| CIMIS station data for years 2009–2017 | Hourly temperature | Winter chilling portions calculated by the Dynamic Model | ChillP | −0.14* |
| NAIP aerial imagery from 2016 with 0.6 m resolution | NAIP RGB imagery acquired in 2016 | 2016 canopy cover percentage | CCP | 0.20* |
| Landsat satellite imagery with 30 m resolution for years 2009–2017 | Landsat multispectral imagery every 16 days | Previous year annual maximum NDVI and EVI; | Pre Max NDVI | 0.13* |
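As an illustration of how the vegetation-index predictors in the last row could be derived, the sketch below computes NDVI and EVI from surface reflectance and takes the annual maximum over a year of scenes. The formulas are the standard NDVI and EVI definitions; the reflectance values are hypothetical, and the record does not describe the authors' actual Landsat processing chain.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Hypothetical (blue, red, nir) surface reflectances for one orchard,
# one tuple per cloud-free scene in a single year.
scenes = [
    (0.04, 0.08, 0.30),
    (0.03, 0.05, 0.45),
    (0.04, 0.06, 0.40),
]

# Annual maxima, analogous to the "Pre Max NDVI" style of predictor.
annual_max_ndvi = max(ndvi(nir, red) for _, red, nir in scenes)
annual_max_evi = max(evi(nir, red, blue) for blue, red, nir in scenes)

print(annual_max_ndvi, annual_max_evi)
```

Taking the annual maximum reduces a noisy 16-day time series to a single value per orchard and year, which makes it easy to use as one column in a predictor table.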
FIGURE 2. Distributions of (A) orchard age in 2017 and (B) cultivars for the study orchards (N = 185).
FIGURE 3. Flowchart of the four-fold cross-validation modeling framework.
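The four-fold cross-validation named in this caption can be sketched as a simple index split in which every record is held out for testing exactly once. This is only an assumed, generic version: the record does not say whether folds were drawn by record or by orchard, and the function name `k_fold_indices` is ours.

```python
import random


def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train, test) index lists for k shuffled, disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Ten hypothetical yield records split into four folds.
splits = list(k_fold_indices(10, k=4))
for train, test in splits:
    print(len(train), len(test))
```

In each round a model would be fitted on `train` and scored on `test`, and the scores averaged across the four rounds. If scikit-learn is used, setting `subsample` below 1.0 in `GradientBoostingRegressor` gives the stochastic variant of gradient boosting.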
FIGURE 4. Historical yield patterns for two sample orchards: Orchard #1 (black) is located in Colusa County and was planted in 2001; Orchard #2 (blue) is located in Kern County and was planted in 2006. The corresponding early-season predictions are shown as (a) dashed lines for the full model and (b) dotted lines for the reduced model excluding the historical yield and cultivar composition from the predictors.
FIGURE 5. Mean annual reported almond yield and early-season predictions averaged (A) over mature (7–17 years old) orchards and (D) over all orchards within the study area. Similar results are presented for (B,E) orchards in the northern and central region and (C,F) southern orchards, respectively. The vertical lines represent the standard deviation among orchards.
Comparison of the performance of five machine learning approaches for orchard-level almond yield prediction, when using the full set of input variables.
| Early season | Linear regression | 0.58 (0.04) | 422 (4.1) | 2.19 (0.19) |
| | Support vector regression | 0.51 (0.04) | 460 (16.5) | 2.01 (0.16) |
| | Artificial neural network | 0.50 (0.05) | 474 (26.1) | 1.96 (0.16) |
| | Random Forest | 0.69 (0.04) | 364 (14.8) | 2.55 (0.28) |
| Mid-season | Linear regression | 0.59 (0.05) | 416 (6.1) | 2.23 (0.22) |
| | Support vector regression | 0.52 (0.04) | 453 (15.5) | 2.05 (0.17) |
| | Artificial neural network | 0.48 (0.04) | 473 (7.6) | 1.96 (0.15) |
| | Random Forest | 0.69 (0.04) | 365 (13.9) | 2.54 (0.27) |
Performance of the Stochastic Gradient Boosting (SGB) approach in predicting almond yield at individual orchards using different sets of input variables, as shown by the statistics from a four-fold cross-validation.
| 2 | Early season | Without historical yields | Early NoYld | 47 | 0.70 (0.05) | 355 (17.6) | 2.62 (0.32) |
| 3 | Early season | Without historical yields and cultivar percentage | Early NoYld-NoCul | 27 | 0.68 (0.04) | 370 (6.2) | 2.51 (0.26) |
| 4 | Early season | Only important variables | Early Imp | 8 | 0.67 (0.04) | 375 (18.2) | 2.48 (0.26) |
| 6 | Mid-season | Without historical yields | Mid-NoYld | 54 | 0.70 (0.05) | 360 (17.2) | 2.59 (0.34) |
| 7 | Mid-season | Without historical yields and cultivar percentage | Mid-NoYld-NoCul | 34 | 0.69 (0.05) | 364 (15.9) | 2.56 (0.33) |
| 8 | Mid-season | Only important variables | Mid-Imp | 10 | 0.68 (0.04) | 371 (6.3) | 2.50 (0.25) |
FIGURE 6. Scatter plots of predicted vs. reported yields from three sets of models for early-season (top panels) and mid-season (bottom panels) predictions: (A,D) full models, (B,E) reduced models excluding historical yields as predictors, and (C,F) reduced models excluding both historical yield and cultivar composition information. (A) Early Full; (B) Early NoYld; (C) Early NoYld-NoCul; (D) Mid-Full; (E) Mid-NoYld; (F) Mid-NoYld-NoCul. The predicted values shown here are from the reserved testing dataset in one round of the cross-validation (N = 247).
FIGURE 7. Variable importance for four sets of prediction models: (A) Early Full; (B) Early NoYld-NoCul; (C) Mid-Full; (D) Mid-NoYld-NoCul. See Table 1 for the detailed variable names.
FIGURE 8. Partial dependence of the target year’s yield on (A) age; (B) previous year maximum NDVI; (C) previous year maximum EVI; (D) long-term maximum temperature in April–June; (E) target year March precipitation; (F) target year’s January maximum temperature; (G) previous year summer temperature; (H) target year June mean EVI. The results are based on the “Mid-NoYld-NoCul” model.
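A partial dependence curve of the kind shown in this figure can be sketched generically: fix one predictor at each grid value for every record, then average the model's predictions. The `toy_model` below is a hypothetical stand-in, not the fitted SGB model, and its coefficients are illustrative only.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Copy each row, overriding only the feature of interest.
        modified = [{**row, feature: value} for row in rows]
        predictions = [predict(row) for row in modified]
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_model(row):
    """Hypothetical yield model: rises with age up to 7 years, then plateaus."""
    return 300.0 * min(row["age"], 7) + 10.0 * row["tmax"]


# Two hypothetical orchard records.
rows = [{"age": 3, "tmax": 27.0}, {"age": 12, "tmax": 29.0}]
age_curve = partial_dependence(toy_model, rows, "age", [2, 7, 10])
print(age_curve)
```

Averaging over the other predictors is what isolates the marginal effect of the chosen variable, so a flat segment in the curve indicates a range where that variable no longer changes the prediction.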
FIGURE 9. Time series of reported almond yield vs. March precipitation for a sample orchard located in Colusa County, planted in 2001.
FIGURE 10. Single decision tree built upon mature orchards using five important variables as predictors. Nodes with higher yields are represented with darker colors.
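To illustrate how a regression tree arrives at a split threshold on a single predictor, the sketch below searches for the cut point that minimizes the summed squared error of the two resulting groups. This is the generic CART-style criterion, assumed here for illustration; the data are hypothetical and the function name `best_split` is ours.

```python
def best_split(values, targets):
    """Return (threshold, total_sse) for the best binary split on one predictor."""
    pairs = sorted(zip(values, targets))

    def sse(ys):
        # Sum of squared deviations from the group mean.
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical long-term Tmax (deg C) and yields for five mature orchards.
lt_tmax = [27.0, 27.5, 28.0, 29.0, 29.5]
yields = [2000.0, 2100.0, 2050.0, 2900.0, 3000.0]
threshold, error = best_split(lt_tmax, yields)
print(threshold, error)
```

A full tree repeats this search over every candidate predictor at each node and recurses into the two groups.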
FIGURE 11. (A) Spatial distribution of the long-term mean Tmax from April to June (LT Tmax April–June); (B) locations of almond orchards in this study overlaid on a binary map of LT Tmax April–June with a threshold of 28.294°C: orchards with LT Tmax April–June ≤ 28.294°C (red crosses) and orchards with LT Tmax April–June > 28.294°C (blue crosses). The binary long-term mean April–June temperature map is shown in the background.
FIGURE 12. Time series of almond yield and March precipitation for the orchards in node #5 (Figure 10).