Jiating Li¹, Arun-Narenthiran Veeranampalayam-Sivakumar¹, Madhav Bhatta², Nicholas D Garst³, Hannah Stoll³, P Stephen Baenziger³, Vikas Belamkar³, Reka Howard⁴, Yufeng Ge¹, Yeyin Shi¹.
Abstract
BACKGROUND: Automated phenotyping technologies are continually advancing the breeding process. However, collecting various secondary traits throughout the growing season and processing massive amounts of data still require considerable effort and time. Selecting a minimum number of secondary traits that have the maximum predictive power has the potential to reduce phenotyping efforts. The objective of this study was to select the principal features extracted from UAV imagery and the critical growth stages that contributed most to explaining winter wheat grain yield. Five dates of multispectral images and seven dates of RGB images were collected by a UAV system during the spring growing season of 2018. Two classes of features (variables), pixel statistics and dynamic growth rates, were extracted for each plot from the vegetation index and plant height maps, totaling 172 variables. A parametric algorithm, LASSO (least absolute shrinkage and selection operator) regression, and a non-parametric algorithm, random forest, were applied for variable selection. The regression coefficients estimated by LASSO and the permutation importance scores provided by random forest were used to determine the ten most important variables influencing grain yield from each algorithm.
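The LASSO step can be illustrated with a minimal pure-Python sketch. It fits LASSO by cyclic coordinate descent and then ranks variables by the magnitude of their coefficients. This is not the authors' implementation. The standardization, the penalty `lam`, the iteration count, the helper names (`lasso_coordinate_descent`, `top_k`), and the synthetic data are all assumptions made for illustration. In practice `lam` would be tuned by cross-validation, for example with scikit-learn's `LassoCV`.

```python
import random

def standardize(X):
    """Scale each column to zero mean and unit (population) variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(p)]
            for row in X]

def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y_c - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X should already be standardized; y is centred here, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # equals y_c - X b while b = 0
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual that excludes b_j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep residual = y_c - X b
                beta[j] = new
    return beta

def top_k(names, coefs, k):
    """Rank variables by |coefficient| and keep the k largest non-zero ones."""
    ranked = sorted(zip(names, coefs), key=lambda t: abs(t[1]), reverse=True)
    return [name for name, c in ranked[:k] if c != 0.0]

# Hypothetical stand-in data: only x0 and x2 drive the response
rng = random.Random(42)
names = [f"x{j}" for j in range(6)]
X_raw = [[rng.gauss(0, 1) for _ in names] for _ in range(100)]
y = [3 * r[0] - 2 * r[2] + rng.gauss(0, 0.5) for r in X_raw]
X = standardize(X_raw)
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(top_k(names, beta, 2))
```

Ranking by absolute coefficient is only meaningful because every column was put on the same scale first. Without standardization, a variable measured in larger units would get a smaller coefficient for the same effect.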
Keywords: LASSO; Phenotyping; Random forest; Ridge regression; SVM; Unmanned aerial vehicle; Yield prediction
Year: 2019 PMID: 31695728 PMCID: PMC6824016 DOI: 10.1186/s13007-019-0508-7
Source DB: PubMed Journal: Plant Methods ISSN: 1746-4811 Impact factor: 4.993
Fig. 1 Field location and layout. The field was located in eastern Nebraska, USA (left). The cyan line indicates the flight path, and the yellow rectangles indicate the 170 plots studied (right). The map was generated using images collected on April 27, 2018.
Fig. 2 The UAV system (left) and the flight parameter settings in the DJI GS Pro application (right)
Seven data collections over the spring season of 2018
| Date | Acquired image type | Day of year (DOY) | Growth stage |
|---|---|---|---|
| April 22 | RGB | 111 | Tillering stage: Feekes 3 |
| April 27 | RGB and Multispectral | 116 | Green-up stage: Feekes 5 |
| May 7 | RGB and Multispectral | 126 | Jointing stage: Feekes 6 |
| May 15 | RGB | 134 | Flag leaf stage: Feekes 8 |
| May 21 | RGB and Multispectral | 140 | Boot stage: Feekes 9 |
| June 1 | RGB and Multispectral | 151 | Grain filling: Feekes 10.5.3 |
| June 18 | RGB and Multispectral | 168 | Physiological maturity: Feekes 11 |
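The DOY column can be checked by converting each acquisition date to a day of year with Python's standard `datetime` module, which counts January 1 as day 1. Under that convention, April 22, 2018 is day 112. Every DOY in the table is therefore exactly one lower than the 1-based value. The source does not state its convention, so a zero-based count (January 1 as day 0) is only one possible explanation. The helper name `day_of_year` is hypothetical.

```python
from datetime import date

# (acquisition date, DOY as listed in the table above)
ACQUISITIONS = [
    (date(2018, 4, 22), 111),
    (date(2018, 4, 27), 116),
    (date(2018, 5, 7), 126),
    (date(2018, 5, 15), 134),
    (date(2018, 5, 21), 140),
    (date(2018, 6, 1), 151),
    (date(2018, 6, 18), 168),
]

def day_of_year(d, zero_based=False):
    """Ordinal day within the year: January 1 is day 1, or day 0 if zero_based."""
    doy = d.timetuple().tm_yday
    return doy - 1 if zero_based else doy

for d, listed in ACQUISITIONS:
    print(d.isoformat(), listed, day_of_year(d), day_of_year(d, zero_based=True))
```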
Fig. 3 Dynamic growth rate calculation based on the dynamic curve of NDVI
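Fig. 3 appears here only as a caption, so the exact definition of a dynamic growth rate cannot be recovered from this text. The sketch below assumes one common reading: each rate is the slope of the curve between two consecutive acquisition dates, (value₂ − value₁)/(DOY₂ − DOY₁). This reading matches the counts in the variable summary table, which lists six rates from seven plant height dates and four from five VI dates. Two things remain assumptions: which plot-level value (for example, the median) the curve was built from, and the NDVI numbers used here.

```python
def growth_rates(doys, values):
    """Slope of each straight segment joining consecutive observations (units per day).

    n observation dates give n - 1 rates."""
    if len(doys) != len(values):
        raise ValueError("doys and values must have the same length")
    points = list(zip(doys, values))
    return [(v2 - v1) / (d2 - d1) for (d1, v1), (d2, v2) in zip(points, points[1:])]

# DOYs of the five multispectral flights, taken from the acquisition table
MULTISPECTRAL_DOYS = [116, 126, 140, 151, 168]
# Hypothetical NDVI values for one plot (illustrative, not measured data)
ndvi = [0.40, 0.55, 0.80, 0.75, 0.40]

for k, rate in enumerate(growth_rates(MULTISPECTRAL_DOYS, ndvi), start=1):
    print(f"growth rate {k}: {rate:+.4f} NDVI units/day")
```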
Fig. 5 Growth dynamic curves of VIs (a) and plant height (b) over the spring growing season of 2018. Error bars represent the standard deviation among the 170 plots
Summary of the 172 variables in each plot extracted from VI and plant height maps
| UAV-derived map | Number of variables | |
|---|---|---|
| | Pixel statistics | Dynamic growth rates |
| Plant height | 7 features × 7 dates = 49* | 6 |
| NDVI | 7 features × 5 dates = 35** | 4 |
| NDRE | 7 features × 5 dates = 35** | 4 |
| GNDVI | 7 features × 5 dates = 35** | 4 |
| Subtotal | 154 | 18 |
| Total number of variables | 172 | |
* Median, 95th percentile, standard deviation, contrast, correlation, energy, and homogeneity
** Median, mode, standard deviation, contrast, correlation, energy, and homogeneity
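The footnotes name the per-plot features but do not say how they were computed. The pure-Python sketch below shows one plausible way. The distribution statistics come from the standard `statistics` module. The four texture features (contrast, correlation, energy, homogeneity) come from a gray-level co-occurrence matrix (GLCM). The following are all assumptions: the number of gray levels, the one-pixel horizontal offset, the symmetric counting, rounding values before taking the mode, and the example plot. Energy follows the scikit-image `graycoprops` convention (the square root of the angular second moment). Some texts use the angular second moment itself.

```python
import math
import statistics

def pixel_statistics(values, mode_decimals=2):
    """Distribution summaries of one plot's pixel values."""
    values = list(values)
    return {
        "median": statistics.median(values),
        # Linear interpolation between order statistics (same as NumPy's default)
        "p95": statistics.quantiles(values, n=100, method="inclusive")[94],
        "sd": statistics.stdev(values),  # sample standard deviation
        # Continuous VI values are rounded before counting (assumption)
        "mode": statistics.mode([round(v, mode_decimals) for v in values]),
    }

def quantize(values, levels):
    """Equal-width binning of continuous values into gray levels 0..levels-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * levels), levels - 1) for v in values]

def glcm_features(image, levels=8):
    """Haralick-style texture features from a symmetric, normalized GLCM.

    Offset: one pixel to the right (distance 1, angle 0), an assumption."""
    rows, cols = len(image), len(image[0])
    if cols < 2:
        raise ValueError("need at least two columns for a horizontal offset")
    flat = quantize([v for row in image for v in row], levels)
    grid = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
    counts = [[0] * levels for _ in range(levels)]
    for row in grid:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1  # symmetric: count each neighbor pair both ways
    total = sum(map(sum, counts))
    cells = [(i, j, counts[i][j] / total) for i in range(levels) for j in range(levels)]
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    var_i = sum((i - mu_i) ** 2 * p for i, _, p in cells)
    var_j = sum((j - mu_j) ** 2 * p for _, j, p in cells)
    cov = sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells)
    return {
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        # Correlation is undefined for a constant plot; report 1.0 there (assumption)
        "correlation": cov / math.sqrt(var_i * var_j) if var_i > 0 and var_j > 0 else 1.0,
        "energy": math.sqrt(sum(p * p for _, _, p in cells)),  # sqrt of angular second moment
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
    }

plot = [[0.61, 0.63, 0.70, 0.72],
        [0.58, 0.66, 0.71, 0.69],
        [0.60, 0.64, 0.73, 0.75]]  # hypothetical 3 x 4 NDVI plot
print(pixel_statistics(v for row in plot for v in row))
print(glcm_features(plot))
```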
Fig. 4 Multi-temporal maps of plant height and three vegetation indices. DOY represents ‘day of year’ for each date, corresponding to the x-axis in Fig. 5. a Whole-field maps. b A randomly selected example plot (the field map used images collected on May 7, 2018)
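The three vegetation indices in Fig. 4 are the standard normalized differences of near-infrared (NIR) reflectance with red (NDVI), red-edge (NDRE), and green (GNDVI) reflectance. The sketch below computes them pixel by pixel. The band keys, the zero-denominator handling, and the reflectance values are assumptions. Radiometric calibration and the plant height map, which this section does not describe, are not covered.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b); returns 0.0 where a + b == 0 (an assumption for no-data pixels)."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def vegetation_indices(green, red, red_edge, nir):
    """NDVI, NDRE and GNDVI for one pixel from band reflectances."""
    return {
        "NDVI": normalized_difference(nir, red),
        "NDRE": normalized_difference(nir, red_edge),
        "GNDVI": normalized_difference(nir, green),
    }

def index_map(bands, name):
    """Apply one index pixel by pixel to co-registered band grids (lists of rows)."""
    g, r, rdg, n = (bands[k] for k in ("green", "red", "red_edge", "nir"))
    return [
        [vegetation_indices(*px)[name] for px in zip(gr, rr, er, nr)]
        for gr, rr, er, nr in zip(g, r, rdg, n)
    ]

# Hypothetical reflectances for a 1 x 2 patch
bands = {
    "green": [[0.08, 0.10]],
    "red": [[0.05, 0.09]],
    "red_edge": [[0.20, 0.22]],
    "nir": [[0.45, 0.35]],
}
print(index_map(bands, "NDVI"))
```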
Ten variables selected by LASSO and random forest, respectively
| Rank | Feature | Originated map | Date | Abbreviation |
|---|---|---|---|---|
| LASSO selected variables | | | | |
| *1* | *Median* | *Plant height* | *6th* | |
| 2 | Contrast | NDRE | 3rd | NDRE.Date3.Var1 |
| *3* | *Standard deviation* | *NDRE* | *2nd* | |
| 4 | 95th percentile | Plant height | 5th | PH.Date5.Var1 |
| 5 | 95th percentile | Plant height | 1st | PH.Date1.Var1 |
| 6 | Homogeneity | Plant height | 7th | PH.Date7.Var1 |
| 7 | Standard deviation | NDVI | 3rd | NDVI.Date3.Var1 |
| 8 | Standard deviation | Plant height | 2nd | PH.Date2.Var1 |
| 9 | Correlation | GNDVI | 6th | GNDVI.Date6.Var1 |
| 10 | Standard deviation | NDRE | 3rd | NDRE.Date3.Var2 |
| Random forest selected variables | | | | |
| *1* | *Median* | *Plant height* | *6th* | |
| 2 | Median | Plant height | 7th | PH.Date7.Var1 |
| 3 | Second growth rate | GNDVI | 3rd and 5th | GNDVI.Date3-Date5 |
| 4 | 95th percentile | Plant height | 2nd | PH.Date2.Var1 |
| 5 | Fifth growth rate | Plant height | 5th and 6th | PH.Date5-Date6 |
| 6 | Correlation | Plant height | 1st | PH.Date1.Var1 |
| 7 | 95th percentile | Plant height | 6th | PH.Date6.Var2 |
| 8 | Median | Plant height | 4th | PH.Date4.Var1 |
| *9* | *Standard deviation* | *NDRE* | *2nd* | |
| 10 | 95th percentile | Plant height | 7th | PH.Date7.Var2 |
Variables selected by both algorithms are shown in italics.
Fig. 6 Variable importance scores of the top 10 variables selected by LASSO and random forest
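The random forest scores rest on permutation importance. One variable is shuffled while the others stay intact, and the resulting growth in prediction error is measured. The minimal sketch below illustrates that idea. A fixed linear function stands in for a fitted random forest, since fitting a forest is out of scope here. The function names, the repeat count, and the data are hypothetical. Implementations differ in which samples they shuffle (out-of-bag or held-out) and in the error measure used. scikit-learn offers a model-agnostic version as `sklearn.inspection.permutation_importance`.

```python
import random

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when one column of X is shuffled, for each column."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y only
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores

# Stand-in for a fitted model: feature 0 matters a lot, feature 1 a little, feature 2 not at all
def model(row):
    return 2.0 * row[0] + 0.5 * row[1]

rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [model(row) + rng.gauss(0, 0.1) for row in X]
scores = permutation_importance(model, X, y)
ranking = sorted(range(3), key=lambda j: scores[j], reverse=True)
print(ranking)
```

A variable the model ignores scores exactly zero, because shuffling it leaves every prediction unchanged.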
Performance of grain yield prediction on testing data, using variable sets determined from LASSO and random forest, as well as all available variables
| Variables | LASSO selected variables | | Random forest selected variables | | All 172 variables | |
|---|---|---|---|---|---|---|
| Group of lines | r* | RMSE* (g/plot) | r | RMSE (g/plot) | r | RMSE (g/plot) |
| (1) Predictions of SVM model with Gaussian radial basis kernel | | | | | | |
| All lines | 0.32 | 320.19 | 0.39 | 306.15 | 0.29 | 314.77 |
| NE lines | 0.58 | 326.97 | 0.77 | 254.66 | 0.72 | 284.08 |
| TX lines | 0.21 | 271.44 | 0.36 | 255.51 | 0.57 | 215.92 |
| WB lines | 0.28 | 271.88 | 0.41 | 236.82 | 0.25 | 264.53 |
| OK and SY lines | 0.39 | 201.45 | 0.45 | 191.06 | 0.36 | 193.31 |
| (2) Predictions of ridge regression model | | | | | | |
| All lines | 0.49 | 283.86 | 0.39 | 301.89 | 0.25 | 314.83 |
| NE lines | 0.73 | 272.72 | 0.81 | 225.45 | 0.73 | 295.92 |
| TX lines | 0.55 | 235.37 | 0.50 | 255.68 | 0.47 | 242.99 |
| WB lines | 0.40 | 247.57 | 0.42 | 246.21 | 0.22 | 266.25 |
| OK and SY lines | 0.59 | 163.69 | 0.54 | 164.90 | 0.58 | 169.29 |
* Values of r and RMSE were averaged over 20 random sets of testing data
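The footnote states only that r and RMSE were averaged over 20 random testing sets. The sketch below illustrates that protocol. It repeats a random train/test split 20 times, fits a ridge regression in closed form on each training set, and averages the test-set Pearson r and RMSE. The 80/20 split, the penalty `lam`, the helper names, and the synthetic data are assumptions. The SVM model is not reproduced, and this is not the authors' code.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge_fit(X, y, lam):
    """Closed-form ridge on centred data: (Xc'Xc + lam*I) b = Xc'yc; intercept unpenalized."""
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    A = [[sum(r[j] * r[k] for r in Xc) + (lam if j == k else 0.0) for k in range(p)]
         for j in range(p)]
    rhs = [sum(r[j] * (t - ym) for r, t in zip(Xc, y)) for j in range(p)]
    beta = solve(A, rhs)
    return ym - sum(b * m for b, m in zip(beta, xm)), beta

def predict(model, X):
    intercept, beta = model
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]

def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5

def rmse(a, b):
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

def repeated_holdout(X, y, lam, n_splits=20, test_fraction=0.2, seed=0):
    """Average test-set r and RMSE over n_splits random train/test partitions."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rs, errors = [], []
    for _ in range(n_splits):
        rng.shuffle(idx)
        k = int(len(idx) * test_fraction)
        test, train = idx[:k], idx[k:]
        model = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        obs, pred = [y[i] for i in test], predict(model, [X[i] for i in test])
        rs.append(pearson_r(obs, pred))
        errors.append(rmse(obs, pred))
    return sum(rs) / n_splits, sum(errors) / n_splits

# Hypothetical plots: yield in g/plot from three standardized predictors
rng = random.Random(7)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(170)]
y = [500 + 100 * r[0] - 50 * r[1] + rng.gauss(0, 30) for r in X]
mean_r, mean_rmse = repeated_holdout(X, y, lam=1.0)
print(f"mean r = {mean_r:.2f}, mean RMSE = {mean_rmse:.1f} g/plot")
```

Averaging over many random splits reduces the dependence of the reported r and RMSE on any single lucky or unlucky partition. This matters when each group of lines supplies only a small testing set.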
Fig. 7 Relationship between measured and predicted grain yield for an example training–testing set of each group of lines. The dashed line is the 1:1 line; r is the correlation coefficient; n is the testing sample size