Abstract
Crop yield is a highly complex trait determined by multiple factors such as genotype, environment, and their interactions. Accurate yield prediction requires a fundamental understanding of the functional relationship between yield and these interacting factors, and revealing that relationship requires both comprehensive datasets and powerful algorithms. In the 2018 Syngenta Crop Challenge, Syngenta released several large datasets that recorded the genotype and yield performance of 2,267 maize hybrids planted in 2,247 locations between 2008 and 2016 and asked participants to predict the yield performance in 2017. As one of the winning teams, we designed a deep neural network (DNN) approach that took advantage of state-of-the-art modeling and solution techniques. Our model achieved a superior prediction accuracy, with a root-mean-square error (RMSE) of 12% of the average yield and 50% of the standard deviation for the validation dataset using predicted weather data. With perfect weather data, the RMSE would be reduced to 11% of the average yield and 46% of the standard deviation. We also performed feature selection based on the trained DNN model, which successfully decreased the dimension of the input space without a significant drop in prediction accuracy. Our computational results suggested that this model significantly outperformed other popular methods such as Lasso, shallow neural networks (SNN), and regression tree (RT). The results also revealed that environmental factors had a greater effect on crop yield than genotype.
Keywords: deep learning; feature selection; machine learning; weather prediction; yield prediction
Year: 2019 PMID: 31191564 PMCID: PMC6540942 DOI: 10.3389/fpls.2019.00621
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. Hybrid locations across the United States. Data collected from the 2018 Syngenta Crop Challenge (Syngenta, 2018).
Figure 2. Neural network structure for weather prediction with a 4-year lag.
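As a rough illustration of what a 4-year lag means for the input layout, the sketch below pairs each year's value of one weather variable with the four preceding years. The function name `make_lagged_pairs` and the toy series are hypothetical; the caption does not specify how the authors arranged their data, so this is only one plausible arrangement.

```python
def make_lagged_pairs(series, lag=4):
    """Pair each value with the `lag` values that precede it.

    Returns (inputs, targets), where inputs[i] holds the `lag` earlier
    values and targets[i] is the value to be predicted from them.
    """
    inputs, targets = [], []
    for t in range(lag, len(series)):
        inputs.append(series[t - lag:t])
        targets.append(series[t])
    return inputs, targets


# Toy yearly values of a single weather variable (made-up numbers,
# not taken from the challenge dataset).
toy_series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
X, y = make_lagged_pairs(toy_series, lag=4)
```

With six yearly values and a lag of four, only two training pairs are available, which shows how quickly a long lag consumes a short weather record.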
Figure 3. Neural networks designed for predicting yield difference.
Figure 4. Deep neural network structure for yield or check yield prediction. The input layer takes in genotype data ($G \in \mathbb{Z}^{n \times p}$), weather data ($W \in \mathbb{R}^{n \times k_1}$), and soil data ($S \in \mathbb{R}^{n \times k_2}$) as input. Here, $n$ is the number of observations, $p$ is the number of genetic markers, $k_1$ is the number of weather components, and $k_2$ is the number of soil conditions. Odd-numbered layers have a residual shortcut connection that skips one layer. Each sample is fed to the network as a vector of dimension $p + k_1 + k_2$.
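The input layout and the shortcut connection described in this caption can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy sizes, the {-1, 0, 1} marker coding, the hidden width, the ReLU activation, and all function names are hypothetical, and the shortcut is read here as adding a layer's input to its output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen only for illustration (not the dataset's real dimensions).
n, p, k1, k2, hidden = 5, 8, 6, 3, 16

G = rng.integers(-1, 2, size=(n, p))  # integer-coded markers; {-1, 0, 1} coding is an assumption
W = rng.normal(size=(n, k1))          # weather components
S = rng.normal(size=(n, k2))          # soil conditions

# Each sample becomes one row vector of length p + k1 + k2.
X = np.concatenate([G, W, S], axis=1)


def relu(z):
    return np.maximum(z, 0.0)


def dense(h, weight, bias):
    """Plain fully connected layer followed by ReLU."""
    return relu(h @ weight + bias)


def dense_with_shortcut(h, weight, bias):
    """Dense layer whose input is added back to its output,
    so the shortcut skips over this one layer."""
    return dense(h, weight, bias) + h


def init(rows, cols):
    """Small random weights and zero biases."""
    return rng.normal(scale=0.1, size=(rows, cols)), np.zeros(cols)


W_in, b_in = init(p + k1 + k2, hidden)
W_1, b_1 = init(hidden, hidden)
w_out = rng.normal(scale=0.1, size=hidden)

h0 = dense(X, W_in, b_in)               # project the input to the hidden width
h1 = dense_with_shortcut(h0, W_1, b_1)  # layer with a residual shortcut
y_hat = h1 @ w_out                      # one predicted value per sample
```

The shortcut requires the layer's input and output to have the same width, which is why the input is first projected to `hidden` units before the residual layer is applied.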
Prediction performance with ground truth weather variables.
| Model | Response variable | Training RMSE | Training correlation coefficient (%) | Validation RMSE | Validation correlation coefficient (%) |
|---|---|---|---|---|---|
| DNN | Yield | 10.55 | 88.3 | 12.79 | 81.91 |
| DNN | Check yield | 8.21 | 91.00 | 11.38 | 85.46 |
| DNN | Yield difference | 11.79 | 45.87 | 12.40 | 29.28 |
| Lasso | Yield | 20.28 | 36.68 | 21.40 | 27.56 |
| Lasso | Check yield | 18.85 | 28.49 | 19.87 | 23.00 |
| Lasso | Yield difference | 15.32 | 19.78 | 13.11 | 6.84 |
| SNN | Yield | 12.96 | 80.21 | 18.04 | 60.11 |
| SNN | Check yield | 10.24 | 71.18 | 15.18 | 60.48 |
| SNN | Yield difference | 9.92 | 58.74 | 15.19 | 11.39 |
| RT | Yield | 14.31 | 76.7 | 15.03 | 73.8 |
| RT | Check yield | 14.55 | 82.00 | 14.87 | 69.95 |
| RT | Yield difference | 17.62 | 21.12 | 15.92 | 5.1 |
The average ± standard deviation for yield, check yield, and yield difference are, respectively, 116.51 ± 27.7, 128.27 ± 25.34, and −11.76 ± 14.27. DNN, Lasso, SNN, and RT stand for deep neural networks, least absolute shrinkage and selection operator, shallow neural network, and regression tree, respectively.
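The relative errors quoted in the abstract can be checked against these summary statistics. The sketch below assumes the yield mean and standard deviation in the note above apply to the validation set, and takes the DNN yield model's validation RMSE as 12.79 with ground-truth weather and 13.94 with predicted weather; the function name `relative_rmse` is hypothetical.

```python
# Summary statistics for yield, taken from the table note.
MEAN_YIELD = 116.51
STD_YIELD = 27.70

# Validation RMSE of the DNN yield model under the two weather settings.
validation_rmse = {
    "ground-truth weather": 12.79,
    "predicted weather": 13.94,
}


def relative_rmse(rmse, mean, std):
    """Express an RMSE as a percentage of the mean and of the standard deviation."""
    return 100.0 * rmse / mean, 100.0 * rmse / std


relative = {
    setting: relative_rmse(rmse, MEAN_YIELD, STD_YIELD)
    for setting, rmse in validation_rmse.items()
}

for setting, (pct_mean, pct_std) in relative.items():
    print(f"{setting}: {pct_mean:.1f}% of mean, {pct_std:.1f}% of std")
```

Rounded to whole percentages, these ratios give 11% and 46% for ground-truth weather and 12% and 50% for predicted weather, matching the figures in the abstract.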
Figure 5. The yield prediction error for individual regions in the validation dataset. The map shows the validation locations across the United States.
Figure 6. The probability density functions of the ground truth yield and the yield predicted by the DNN model. The plots indicate that the DNN model approximately preserves the distributional properties of the ground truth yield.
Prediction performance with predicted weather variables.
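| Model | Response variable | Training RMSE | Training correlation coefficient (%) | Validation RMSE | Validation correlation coefficient (%) |
|---|---|---|---|---|---|
| DNN | Yield | 11.64 | 85.66 | 13.94 | 78.65 |
| DNN | Check yield | 9.49 | 78.35 | 12.51 | 75.04 |
| DNN | Yield difference | 12.80 | 37.64 | 15.54 | 19.86 |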
Yield prediction performances of DNN(G), DNN(S), DNN(W), and Average model.
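| Model | Training RMSE | Training correlation coefficient (%) | Validation RMSE | Validation correlation coefficient (%) |
|---|---|---|---|---|
| DNN(G) | 21.74 | 20.26 | 21.72 | 15.09 |
| DNN(S) | 15.28 | 73.37 | 15.49 | 72.04 |
| DNN(W) | 14.26 | 76.98 | 14.96 | 72.60 |
| Average | 24.40 | 0.0 | 23.14 | 0.0 |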
Figure 7. Bar plot of estimated effects of 627 genetic markers.
Figure 9. Bar plot of estimated effects of 6 weather components measured for 12 months of each year, starting from January. The vertical axes were normalized across all weather components to make the effects comparable.
Yield prediction performance of DNN on the subset of features.
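| Model | Training RMSE | Training correlation coefficient (%) | Validation RMSE | Validation correlation coefficient (%) |
|---|---|---|---|---|
| DNN | 12.01 | 84.01 | 12.81 | 81.44 |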
The DNN model used 50 genetic markers and 20 environmental components selected by the feature selection method.