Kyle A Parmley, Race H Higgins, Baskar Ganapathysubramanian, Soumik Sarkar, Asheesh K Singh.
Abstract
We explored the capability of fusing high-dimensional phenotypic trait (phenomic) data with a machine learning (ML) approach to give plant breeders tools for both in-season seed yield (SY) prediction and prescriptive cultivar development for targeted agro-management practices (e.g., row spacing and seeding density). We phenotyped 32 SoyNAM parent genotypes in two independent studies, each with contrasting agro-management treatments (two row spacings in one study and three seeding densities in the other). Phenotypic trait data (canopy temperature, chlorophyll content, hyperspectral reflectance, leaf area index, and light interception) were collected with an array of sensors at three growth stages during the growing season, and SY was determined by machine harvest. Random forest (RF) was used to train models that predict SY from the phenotypic traits (predictor variables), in order to identify the temporal combination of variables that maximizes prediction accuracy while making efficient use of data-collection resources. RF models were trained both on data pooled across the two experiments and separately for each agro-management treatment. We report the traits that are most important regardless of agro-management practice. Several predictor variables showed conditional importance that depended on the agro-management system. We assembled predictive models that enable in-season SY prediction, and these provide a framework for integrating phenomics information with powerful ML for prediction-enabled prescriptive plant breeding.
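The abstract names random forest (RF) regression as the learning method. As a hedged, stdlib-only Python sketch (not the authors' implementation), the block below shows the two sources of randomness that define an RF regressor: each tree is grown on a bootstrap resample of the plots, and each split searches only a random subset of `mtry` predictors; the forest prediction is the average over trees. All function names, defaults (`n_trees`, `max_depth`, `min_leaf`, `mtry`) and the toy data are hypothetical. A real analysis would use an established implementation such as scikit-learn's `RandomForestRegressor` or the R `randomForest` package.

```python
import random
from statistics import mean

# Hypothetical, illustrative hyperparameters; these are not the study's settings.

def _sse(ys):
    """Sum of squared deviations from the mean (regression-node impurity)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def _grow(X, y, idx, depth, max_depth, min_leaf, mtry, rng):
    """Grow one regression tree on the rows listed in idx (duplicates allowed)."""
    ys = [y[i] for i in idx]
    if depth >= max_depth or len(idx) < 2 * min_leaf:
        return mean(ys)  # leaf: predict the mean seed yield of its rows
    best = None  # (cost, feature, threshold, left_rows, right_rows)
    for f in rng.sample(range(len(X[0])), mtry):  # random predictor subset per split
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return mean(ys)
    _, f, t, left, right = best
    return (f, t,
            _grow(X, y, left, depth + 1, max_depth, min_leaf, mtry, rng),
            _grow(X, y, right, depth + 1, max_depth, min_leaf, mtry, rng))

def _predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=50, max_depth=5, min_leaf=2, mtry=None, seed=0):
    rng = random.Random(seed)
    n = len(X)
    mtry = mtry or max(1, len(X[0]) // 3)  # p/3 is a common default for regression
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap resample of plots
        forest.append(_grow(X, y, boot, 0, max_depth, min_leaf, mtry, rng))
    return forest

def predict_forest(forest, x):
    """Average the per-tree predictions."""
    return mean(_predict_tree(tree, x) for tree in forest)

# Toy usage with made-up data: yield rises with trait 0; trait 1 is noise.
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [2.0 * i for i in range(20)]
forest = fit_forest(X, y, n_trees=30, mtry=2, seed=1)
print(round(predict_forest(forest, [10.0, 0.0]), 1))
```

In an application like this one, each row of `X` would presumably hold one plot's sensor-derived traits at a given growth stage and `y` its machine-harvested seed yield; that mapping is an assumption for illustration.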
Year: 2019 PMID: 31748577 PMCID: PMC6868245 DOI: 10.1038/s41598-019-53451-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) Predictor importance and (b) model performance of a Random Forest model for seed yield prediction, trained on data pooled from all treatments of the row spacing and seeding density studies.
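This record does not state which importance measure underlies Figure 1(a). As a hedged illustration only, the sketch below computes permutation importance, one common model-agnostic measure (RF implementations also offer impurity-based importance): each predictor column is shuffled in turn, and the resulting increase in mean squared error is recorded. `predict` stands in for any fitted model's prediction function; the function names and the `n_repeats` default are hypothetical.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted seed yield."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each predictor column is shuffled.

    predict: any fitted model's function mapping one row (a list) to a prediction.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and yield
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy check with a known model: yield depends only on predictor 0.
X = [[float(i), float(i % 3)] for i in range(12)]
y = [3.0 * row[0] for row in X]
print(permutation_importance(lambda row: 3.0 * row[0], X, y))
```

Running the same function separately on each treatment's subset would give one column of a trait-by-treatment importance matrix of the kind visualized in Figure 2.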
Model performance of Random Forest models trained separately on observations from each agro-management treatment in the row spacing and seeding density studies.
| Study | Treatment | OOB training RMSE | OOB training R² | Testing RMSE | Testing R² |
|---|---|---|---|---|---|
| IA-RS | 38 cm | 328.2 | 0.57 | 222.6 | 0.53 |
| IA-RS | 76 cm | 334.8 | 0.50 | 204.9 | 0.50 |
| IA-SD | Low (124,000 plants/ha) | 232.7 | 0.83 | 185.3 | 0.82 |
| IA-SD | Med (346,000 plants/ha) | 303.9 | 0.71 | 241.1 | 0.69 |
| IA-SD | High (568,000 plants/ha) | 299.7 | 0.71 | 209.4 | 0.74 |
OOB = out-of-bag; RMSE = root mean square error; R² = coefficient of determination. IA-RS = row spacing study; IA-SD = seeding density study.
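The table reports RMSE and R² both for out-of-bag (OOB) training predictions and for an independent test set. In RF, each tree is fit to a bootstrap resample of the training rows, and the rows left out of that resample (about 37% on average, since a given row is missed with probability (1 - 1/n)^n ≈ e^-1) act as that tree's OOB validation data. The Python sketch below shows one way to compute these quantities. It assumes R² = 1 - SS_res/SS_tot, a common convention that this record does not confirm (some studies report the squared Pearson correlation instead); the function names are hypothetical.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the observed seed yield."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def bootstrap_and_oob(n, rng):
    """Draw one bootstrap resample of row indices and return it with the OOB rows."""
    in_bag = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
    oob = sorted(set(range(n)) - set(in_bag))      # rows this tree never saw
    return in_bag, oob

y_obs = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_obs, y_hat), round(r_squared(y_obs, y_hat), 3))  # prints: 0.5 0.8
in_bag, oob = bootstrap_and_oob(10, random.Random(42))
print(len(in_bag), oob)
```

An OOB score averages, for each training row, only the predictions of trees for which that row was out-of-bag, which is why it behaves like a built-in validation estimate without a separate hold-out set.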
Figure 2. Heatmap of predictor (feature) importance computed for each agro-management treatment level in the row spacing and seeding density studies. Darker blue indicates greater importance of a trait for seed yield prediction.
Figure 3. Random Forest model performance for seed yield prediction after Recursive Feature Elimination, which removes uninformative predictors to minimize data-collection effort. Model performance was assessed on an independent testing set not used during model training.
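A minimal sketch of the usual Recursive Feature Elimination (RFE) loop follows: score the current predictor set, drop the predictor with the lowest importance, re-estimate importances on the reduced set, and repeat. The `importance` and `score` callables are hypothetical placeholders for refitting an RF and evaluating it (for example, R² on held-out data, where higher is better), and the tolerance rule for picking a small subset is an illustrative assumption, not the study's stopping criterion.

```python
from typing import Callable, Dict, List, Sequence, Tuple

History = List[Tuple[List[str], float]]

def recursive_feature_elimination(
    features: Sequence[str],
    importance: Callable[[List[str]], Dict[str, float]],
    score: Callable[[List[str]], float],
    min_features: int = 1,
) -> History:
    """Drop the least important predictor one at a time, scoring every subset."""
    current = list(features)
    history: History = []
    while True:
        history.append((list(current), score(current)))
        if len(current) <= min_features:
            return history
        ranks = importance(current)  # re-estimated on the reduced predictor set
        weakest = min(current, key=lambda f: ranks[f])
        current.remove(weakest)

def smallest_adequate_subset(history: History, tolerance: float) -> List[str]:
    """Smallest subset whose score is within `tolerance` of the best score."""
    best = max(s for _, s in history)
    adequate = [fs for fs, s in history if s >= best - tolerance]
    return min(adequate, key=len)

# Toy stand-ins (hypothetical): fixed importances and a score that sums them.
toy_importance = {"canopy_temp": 0.1, "chlorophyll": 0.5, "lai": 0.9}
hist = recursive_feature_elimination(
    list(toy_importance),
    importance=lambda fs: {f: toy_importance[f] for f in fs},
    score=lambda fs: sum(toy_importance[f] for f in fs),
)
print(smallest_adequate_subset(hist, tolerance=0.2))  # ['chlorophyll', 'lai']
```

Re-estimating importances after each removal, rather than ranking once, lets the ordering adapt when correlated predictors (such as the same trait measured at different growth stages) share importance.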