Marco Lopez-Cruz, Eric Olson, Gabriel Rovere, Jose Crossa, Susanne Dreisigacker, Suchismita Mondal, Ravi Singh, Gustavo de Los Campos.
Abstract
High-throughput phenotyping (HTP) technologies can produce data on thousands of phenotypes per unit being monitored. These data can be used to breed for economically and environmentally relevant traits (e.g., drought tolerance); however, incorporating high-dimensional phenotypes in genetic analyses and in breeding schemes poses important statistical and computational challenges. To address this problem, we developed regularized selection indices; the methodology integrates techniques commonly used in high-dimensional phenotypic regressions (including penalization and rank-reduction approaches) into the selection index (SI) framework. Using extensive data from the wheat breeding program of CIMMYT (International Maize and Wheat Improvement Center), we show that regularized SIs derived from hyper-spectral data offer consistently higher accuracy for grain yield than standard SIs and than vegetation indices commonly used to predict agronomic traits. Regularized SIs offer an effective approach to leverage HTP data that are routinely generated in agriculture; the methodology can also be used to conduct genetic studies using high-dimensional phenotypes that are often collected in humans and model organisms, including body images and whole-genome gene expression profiles.
Year: 2020 PMID: 32424224 PMCID: PMC7235263 DOI: 10.1038/s41598-020-65011-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Prediction of the genetic merit for grain yield using hyper-spectral crop image data. (A) Data consist of hyper-spectral reflectance data and phenotypic measurements of the target trait (e.g., grain yield). (B) A subset of the data (the training set) is used to derive the coefficients of a selection index. (C) These coefficients are then applied to image data of individuals in the testing set to derive the index for each individual. The predictive ability of the index is assessed by calculating the accuracy of indirect selection in the testing set.
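The train/test workflow in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in a plain L1-penalized phenotypic regression (coordinate descent) for the regularized selection index, uses simulated data in place of hyper-spectral reflectance, and reports the phenotypic correlation in the testing set as a simple stand-in for the accuracy of indirect selection, which in the paper depends on genetic parameters. All names (`lasso_cd`, `soft_threshold`, `pearson`) and all settings (`lam=0.1`, 20 bands, 100 training units) are hypothetical.

```python
import math
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values in [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent.

    X (list of rows) and y are assumed to be centred already.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals for the current beta (all zeros at the start)
    col_ss = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Covariance of band j with the partial residual (band j excluded)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


# (A) Simulated data (assumption): 20 "bands", only the first two carry signal.
rng = random.Random(0)
n_total, n_train, n_bands = 150, 100, 20
X = [[rng.gauss(0, 1) for _ in range(n_bands)] for _ in range(n_total)]
y = [row[0] - 0.5 * row[1] + rng.gauss(0, 0.5) for row in X]

# (B) Derive coefficients in the training set, centring with training means only.
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]
x_means = [sum(row[j] for row in X_train) / n_train for j in range(n_bands)]
y_mean = sum(y_train) / n_train
Xc_train = [[row[j] - x_means[j] for j in range(n_bands)] for row in X_train]
yc_train = [v - y_mean for v in y_train]
beta = lasso_cd(Xc_train, yc_train, lam=0.1)
n_active = sum(1 for b in beta if b != 0.0)

# (C) Apply the coefficients to the testing set and evaluate the index there.
index_test = [
    sum(beta[j] * (row[j] - x_means[j]) for j in range(n_bands)) for row in X_test
]
corr_test = pearson(index_test, y_test)
print(f"active bands: {n_active} of {n_bands}; test correlation: {corr_test:.2f}")
```

Centring the testing set with training-set means keeps the evaluation free of information from the testing individuals, which mirrors the separation between panels (B) and (C).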
Average grain yield and heritability by environmental condition.
| Planting date | Planting system | Number of irrigations | Abbreviation | Average yield (SD) | Heritability (SD) |
|---|---|---|---|---|---|
| Optimum | Flat | Minimal | Flat-Drought | 2.06 (0.58) | 0.83 (0.016) |
| Optimum | Bed | 2 | Bed-2IR | 3.67 (0.43) | 0.66 (0.032) |
| Optimum | Bed | 5 | Bed-5IR | 6.11 (0.61) | 0.43 (0.025) |
| Early | Bed | 5 | Bed-EHeat | 6.43 (0.73) | 0.61 (0.018) |
SD: standard deviation.
Figure 2. Accuracy of indirect selection of regularized SIs and its components. Square root of heritability (green), genetic correlation (orange), and accuracy of indirect selection (purple), all averaged over 100 training-testing partitions, versus the number of predictors used to build the index: (A) number of active bands in the case of the L1-PSI, or (B) number of PCs in the PC-SI. Each panel represents one environment (latest time-point).
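The two components named in this caption combine multiplicatively under the standard quantitative-genetics definition of the accuracy of indirect selection. The symbols below are assumptions introduced here for illustration, not notation taken from the source: $h^2_I$ is the heritability of the index and $r_g(I, y)$ is the genetic correlation between the index and the target trait.

$$
\text{Accuracy of indirect selection} = \sqrt{h^2_I}\; r_g(I, y)
$$

Under this reading, adding predictors can raise one component while lowering the other, which is why the figure tracks both alongside their product.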
Figure 3. Accuracy of indirect selection achieved by a standard (SI) and by regularized (PC-SI and L1-PSI) selection indices. The lines show the average accuracy over 100 training-testing partitions. Vertical lines represent a 95% confidence interval for the average. The horizontal axis gives the time-point at which images were collected, expressed in both days after sowing (DAS) and growth stages (VEG = vegetative, GF = grain filling, MAT = maturity).
Accuracy and relative efficiency of indirect selection of an L1-penalized SI using data from one and nine time-points.
| Environment | Accuracy (SD): best single time-point* | Accuracy (SD): nine time-points combined | Relative efficiency (SD): best single time-point* | Relative efficiency (SD): nine time-points combined |
|---|---|---|---|---|
| Flat-Drought | 0.69 (0.05) | 0.70 (0.05) | 0.74 (0.05) | 0.75 (0.05) |
| Bed-2IR | 0.46 (0.04) | 0.54 (0.03) | 0.57 (0.05) | 0.67 (0.04) |
| Bed-5IR | 0.47 (0.06) | 0.55 (0.05) | 0.72 (0.08) | 0.83 (0.08) |
| Bed-EHeat | 0.68 (0.04) | 0.71 (0.04) | 0.88 (0.05) | 0.91 (0.04) |
Values are presented as an average across 100 training-testing partitions. SD: standard deviation. *For each environment we include the time-point that gave the highest accuracy of selection (see Fig. 3 for other time-points).
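The relative-efficiency columns can be related to the heritabilities in the first table with a short sketch. It assumes the usual definition, in which relative efficiency is the accuracy of indirect selection divided by the accuracy of direct phenotypic selection (the square root of the target-trait heritability); the source does not state the formula here, so treat it as an assumption. The function name `relative_efficiency` is hypothetical, and the inputs are the rounded averages reported in the two tables.

```python
import math

# Heritability of grain yield per environment (first table).
heritability = {
    "Flat-Drought": 0.83,
    "Bed-2IR": 0.66,
    "Bed-5IR": 0.43,
    "Bed-EHeat": 0.61,
}

# Accuracy of the L1-penalized SI using nine time-points combined (second table).
accuracy_nine = {
    "Flat-Drought": 0.70,
    "Bed-2IR": 0.54,
    "Bed-5IR": 0.55,
    "Bed-EHeat": 0.71,
}


def relative_efficiency(accuracy, h2_target):
    """Indirect-selection accuracy relative to direct selection (assumed definition)."""
    return accuracy / math.sqrt(h2_target)


approx_re = {
    env: relative_efficiency(accuracy_nine[env], heritability[env])
    for env in heritability
}
for env, value in approx_re.items():
    print(f"{env}: {value:.2f}")
```

Because the reported values are averages over 100 training-testing partitions, a ratio computed from rounded table averages is only expected to approximate the published relative efficiencies.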
Figure 4. Heatmap of regression coefficients for L1-penalized selection indices. Separate indices were derived for each environment using multi time-point data. DAS = days after sowing; VEG, GF, and MAT represent the vegetative, grain-filling, and maturity stages, respectively. The bottom color-bar shows the color of light associated with each waveband in the visible spectrum (≤ 750 nm); black represents the near-infrared spectrum (wavelength > 750 nm).