Hieu Pham, John Reisner, Ashley Swift, Sigurdur Olafsson, Stephen Vardeman.
Abstract
Phenotypic variation in plants is attributed to genotype (G), environment (E), and genotype-by-environment interaction (GEI). Although the main effects of G and E are typically larger and easier to model, GEI effects are important and are a critical factor in questions such as why some genotypes perform consistently well across a range of environments. In plant breeding, a major challenge is limited information: a single genotype is tested in only a small subset of all possible test environments. The two-way table of phenotype responses will therefore commonly contain missing data. In this paper, we propose a new model of GEI effects that requires as input only a two-way table of phenotype observations, with genotypes as rows and environments as columns, and that does not assume the data are complete. Our analysis handles this scenario by using a novel biclustering algorithm that tolerates missing values and outputs homogeneous cells with no interaction between G and E. In other words, we identify subsets of genotypes and environments where phenotype can be modeled simply. Based on this, we fit no-interaction models to predict phenotypes of a given crop and draw insights into how a particular cultivar will perform in the unused test environments. Our new methodology is validated on data from different plant species and phenotypes and shows superior performance compared to well-studied statistical approaches.
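To make the idea of a no-interaction model on an incomplete two-way table concrete, the following is a minimal sketch, not the authors' implementation. It fits y_ij = mu + g_i + e_j by least squares using only the observed cells and then fills in the untested cells. The function name `fit_additive` and the toy data are hypothetical, and the sketch assumes every genotype and environment has at least one observation and that the observed cells form a connected pattern.

```python
import numpy as np

def fit_additive(Y):
    """Least-squares fit of y_ij = mu + g_i + e_j on observed cells only.

    Y is a float array (genotypes x environments) with np.nan marking
    untested genotype-environment pairs. Returns the fitted full table.
    """
    n_g, n_e = Y.shape
    rows, cols = np.nonzero(~np.isnan(Y))
    # One design row per observed cell: intercept, genotype dummy, environment dummy.
    X = np.zeros((rows.size, 1 + n_g + n_e))
    X[:, 0] = 1.0
    X[np.arange(rows.size), 1 + rows] = 1.0
    X[np.arange(rows.size), 1 + n_g + cols] = 1.0
    # lstsq returns the minimum-norm solution, so the redundant dummy
    # coding does not need explicit sum-to-zero constraints.
    beta, *_ = np.linalg.lstsq(X, Y[rows, cols], rcond=None)
    mu, g, e = beta[0], beta[1:1 + n_g], beta[1 + n_g:]
    return mu + g[:, None] + e[None, :]

# Toy table that is exactly additive, with two cells removed.
full = np.add.outer(np.array([0.0, 1.0, 3.0]), np.array([10.0, 20.0, 40.0, 70.0]))
Y = full.copy()
Y[0, 3] = np.nan
Y[2, 1] = np.nan
fitted = fit_additive(Y)  # the two missing cells are recovered from the additive structure
```

On real data the table is not additive as a whole, which is why the approach described above first looks for subsets of genotypes and environments within which such a fit is adequate.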
Keywords: linear model; machine learning; missing data; no-interaction model; unsupervised learning
Year: 2022 PMID: 36204056 PMCID: PMC9530907 DOI: 10.3389/fpls.2022.975976
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
Figure 1. Biclustering creates row clusters (genotypes) and column clusters (environments) with homogeneous phenotype in a checkerboard pattern. Raw matrix (left) and "checkerboard-like" biclustering (right).
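The checkerboard structure can be illustrated with a simple alternating, k-means-style procedure that ignores missing cells. This is a stand-in sketch under stated assumptions, not the paper's algorithm: the function names, the deterministic round-robin initialization, and the fixed iteration count are all hypothetical choices made for illustration.

```python
import numpy as np

def block_means(Y, r, c, k_r, k_c):
    """Mean of the observed entries in each (row cluster, column cluster) block."""
    M = np.full((k_r, k_c), np.nanmean(Y))  # fallback value for empty blocks
    for a in range(k_r):
        for b in range(k_c):
            block = Y[np.ix_(r == a, c == b)]
            if np.any(~np.isnan(block)):
                M[a, b] = np.nanmean(block)
    return M

def checkerboard_bicluster(Y, k_r, k_c, n_iter=20):
    """Alternate between reassigning rows and columns to their best cluster."""
    r = np.arange(Y.shape[0]) % k_r  # round-robin start (an assumption)
    c = np.arange(Y.shape[1]) % k_c
    for _ in range(n_iter):
        M = block_means(Y, r, c, k_r, k_c)
        # cost_r[i, a]: squared error of row i if placed in row cluster a,
        # summed over observed cells only (nansum skips missing ones).
        cost_r = np.nansum((Y[:, None, :] - M[:, c][None, :, :]) ** 2, axis=2)
        r = cost_r.argmin(axis=1)
        M = block_means(Y, r, c, k_r, k_c)
        # cost_c[j, b]: squared error of column j if placed in column cluster b.
        cost_c = np.nansum((Y[:, :, None] - M[r, :][:, None, :]) ** 2, axis=0)
        c = cost_c.argmin(axis=1)
    return r, c

# Toy 4 x 4 table with a 2 x 2 checkerboard and one missing cell.
Y = np.array([[0.0, 0.0, np.nan, 10.0],
              [0.0, 0.0, 10.0, 10.0],
              [20.0, 20.0, 30.0, 30.0],
              [20.0, 20.0, 30.0, 30.0]])
r, c = checkerboard_bicluster(Y, k_r=2, k_c=2)
```

A procedure like this can stop in a local optimum, so in practice it would be run from several starting points and the best result kept.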
Case study characteristics.
| Crop | Phenotype | Genotypes | Environments | Missing data |
|---|---|---|---|---|
| Sorghum | Flowering time | 237 | 7 | 3.0% |
| Maize | Yield | 211 | 8 | 0% |
| Rice | Flowering time | 176 | 9 | 2.8% |
Benchmark models for genotype-by-environment interaction (GEI) using a two-way table of phenotype means data.
| No. | Model | Equation |
|---|---|---|
| 1. | Additive Model (no-interaction) | μ |
| 3. | Regression on the Mean (see Finlay and Wilkinson, | μ |
| 4. | Additive Main Effects and Multiplicative Interactions (AMMI) (see Gollob, | |
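For reference, the three named benchmark models are commonly written as below. The notation is an assumption made here and may differ from the paper's own symbols: $y_{ij}$ is the phenotype mean of genotype $i$ in environment $j$, $g_i$ and $e_j$ are main effects, $b_i$ is a genotype-specific slope on the environment effect, and $\lambda_k$, $\alpha_{ik}$, $\gamma_{jk}$ are the singular value and genotype and environment scores of the $k$-th multiplicative term.

```latex
\begin{align}
\text{Additive (no-interaction):} \quad
  & y_{ij} = \mu + g_i + e_j + \varepsilon_{ij} \\
\text{Regression on the mean:} \quad
  & y_{ij} = \mu + g_i + e_j + b_i\, e_j + \varepsilon_{ij} \\
\text{AMMI with } K \text{ terms:} \quad
  & y_{ij} = \mu + g_i + e_j + \sum_{k=1}^{K} \lambda_k\, \alpha_{ik}\, \gamma_{jk} + \varepsilon_{ij}
\end{align}
```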
Figure 2. Comparison of biclustering plots for sorghum. The homogeneity of the cells indicates that the biclustering algorithm successfully grouped the responses into distinct clusters. Top left: response (1); top right: response (2); bottom left: response (3); bottom right: response (4).
Figure 3. Comparison of the number of row and column clusters for SSEbc (left) and SSEni (right). As the numbers of row and column clusters approach the dataset dimensions, the SSE decreases toward zero.
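The limiting behavior in Figure 3 can be checked on a toy table. The sketch below assumes that SSEbc denotes the sum of squared deviations of observed cells from their bicluster mean. That reading, the function name, and the data are assumptions for illustration only.

```python
import numpy as np

def sse_block_means(Y, r, c):
    """SSE of observed cells around the mean of their (row, column) block."""
    sse = 0.0
    for a in np.unique(r):
        for b in np.unique(c):
            block = Y[np.ix_(r == a, c == b)]
            obs = block[~np.isnan(block)]
            if obs.size:
                sse += float(np.sum((obs - obs.mean()) ** 2))
    return sse

Y = np.array([[1.0, 2.0, 9.0],
              [2.0, np.nan, 8.0],
              [7.0, 6.0, 1.0]])

# One cluster for everything, a 2 x 2 checkerboard, and one cluster per row/column.
sse_one = sse_block_means(Y, np.zeros(3, int), np.zeros(3, int))
sse_mid = sse_block_means(Y, np.array([0, 0, 1]), np.array([0, 0, 1]))
sse_all = sse_block_means(Y, np.arange(3), np.arange(3))
```

With one cluster per row and per column, every block holds a single cell, so the SSE is exactly zero. This is why a small SSE alone cannot be used to choose the number of clusters.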
Summary of degrees of freedom and sum of squares for the sorghum dataset.
| Source | df | Sum of squares |
|---|---|---|
| **Terms: G, E** | | |
| G | 236 | 53,020,738 |
| E | 6 | 199,593,392 |
| Error | 1,367 | 67,635,610 |
| **Terms: G, E, GEI** | | |
| G | 236 | 53,020,738 |
| E | 6 | 199,593,392 |
| GEI | 1,367 | 67,635,610 |
| Error | 0 | 0 |
| **Terms: G, E, GEI Ind** | | |
| G | 236 | 53,020,738 |
| E | 6 | 199,593,392 |
| GEI Ind | 236 | 52,422,217 |
| Error | 1,131 | 15,213,393 |
| **Terms: G, E, PC1, PC2** | | |
| G | 236 | 53,020,738 |
| E | 6 | 199,593,392 |
| PC1 | 241 | 54,483,854 |
| PC2 | 239 | 5,581,472 |
| Error | 887 | 7,570,284 |
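The degrees of freedom in the sorghum table follow from the table dimensions and the number of observed cells. The sketch below reproduces them under two assumptions that are not stated in the text: the number of observed cells (1,610) is inferred as the total degrees of freedom plus one, and each AMMI component k is assigned n_g + n_e - 1 - 2k degrees of freedom, the convention usually attributed to Gollob. The function name is hypothetical.

```python
def anova_df(n_g, n_e, n_obs, n_pc=2):
    """Degrees of freedom for the four decompositions of a G x E table."""
    df_g, df_e = n_g - 1, n_e - 1
    df_gei = n_obs - 1 - df_g - df_e   # what remains after the main effects
    df_ind = n_g - 1                   # "GEI Ind" row; equals the G df in the table
    df_pcs = [n_g + n_e - 1 - 2 * k for k in range(1, n_pc + 1)]
    return {
        "G": df_g,
        "E": df_e,
        "GEI": df_gei,
        "GEI Ind": df_ind,
        "Error after GEI Ind": df_gei - df_ind,
        "PC": df_pcs,
        "Error after PCs": df_gei - sum(df_pcs),
    }

# Sorghum: 237 genotypes, 7 environments; 1,610 observed cells is inferred.
sorghum = anova_df(n_g=237, n_e=7, n_obs=1610)
missing_pct = 100 * (1 - 1610 / (237 * 7))  # share of untested cells, in percent
```

The inferred count of observed cells is consistent with the case study table: 1,610 of 1,659 possible cells leaves about 3.0% missing.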
Summary of degrees of freedom and sum of squares for the maize dataset.
| Source | df | Sum of squares |
|---|---|---|
| **Terms: G, E** | | |
| G | 210 | 614 |
| E | 7 | 5,679 |
| Error | 1,470 | 813 |
| **Terms: G, E, GEI** | | |
| G | 210 | 614 |
| E | 7 | 5,679 |
| GEI | 1,470 | 813 |
| Error | 0 | 0 |
| **Terms: G, E, GEI Ind** | | |
| G | 210 | 614 |
| E | 7 | 5,679 |
| GEI Ind | 210 | 230 |
| Error | 1,260 | 583 |
| **Terms: G, E, PC1, PC2** | | |
| G | 210 | 614 |
| E | 7 | 5,679 |
| PC1 | 216 | 242 |
| PC2 | 214 | 173 |
| Error | 1,040 | 398 |
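Because the maize table has no missing cells, an AMMI-style decomposition can be obtained directly from a singular value decomposition of the interaction residuals. The following is a minimal sketch for a complete table, with a hypothetical function name and toy data, not the authors' code. The squared singular values play the role of the PC1 and PC2 sums of squares, and together with the remaining error they add up to the GEI sum of squares (in the maize table, 242 + 173 + 398 = 813).

```python
import numpy as np

def ammi_fit(Y, n_pc=2):
    """AMMI fit of a complete G x E table via SVD of the interaction residuals."""
    mu = Y.mean()
    g = Y.mean(axis=1) - mu                       # genotype main effects
    e = Y.mean(axis=0) - mu                       # environment main effects
    resid = Y - mu - g[:, None] - e[None, :]      # what the additive model misses
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    interaction = (U[:, :n_pc] * s[:n_pc]) @ Vt[:n_pc]
    fitted = mu + g[:, None] + e[None, :] + interaction
    return fitted, s ** 2                         # s**2: sum of squares per component

# Toy table: additive part plus a single multiplicative interaction term.
g_true = np.array([-1.0, 0.0, 1.0])
e_true = np.array([-2.0, 0.0, 2.0, 0.0])
u = np.array([1.0, -1.0, 0.0])
v = np.array([1.0, -1.0, 1.0, -1.0])
Y = 5.0 + g_true[:, None] + e_true[None, :] + np.outer(u, v)

fitted, ss = ammi_fit(Y, n_pc=1)  # one component is enough for this toy table
```

With missing cells, as in the sorghum and rice tables, the residual matrix is incomplete and a plain SVD no longer applies without some form of imputation.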
Biclustering results with the smallest SSE values obtained from fitting no-interaction models on each bicluster for raw responses (30 trials - maize data).
| Clusters (row, column) | SSE | SSE | SSE |
|---|---|---|---|
| (2, 2) | 604 | 580 | 639 |
| (2, 3) | 418 | 463 | 446 |
| (2, 4) | 333 | 352 | 350 |
| (3, 2) | 579 | 595 | 524 |
| (3, 3) | 431 | 456 | 460 |
| (3, 4) | 300 | 339 | 314 |
| (4, 2) | 589 | 576 | 557 |
| (4, 3) | 434 | 454 | 402 |
| (4, 4) | 296 | 334 | 296 |
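The quantity tabulated above, the SSE from fitting a no-interaction model within each bicluster, can be sketched as follows. The cluster labels are taken as given, and the function names and toy data are hypothetical. The sketch assumes an ordinary least-squares additive fit on the observed cells of each block, which may differ in detail from the authors' procedure.

```python
import numpy as np

def additive_sse(B):
    """SSE of the least-squares fit y_ij = mu + g_i + e_j on observed cells of B."""
    rows, cols = np.nonzero(~np.isnan(B))
    if rows.size == 0:
        return 0.0
    X = np.zeros((rows.size, 1 + B.shape[0] + B.shape[1]))
    X[:, 0] = 1.0
    X[np.arange(rows.size), 1 + rows] = 1.0
    X[np.arange(rows.size), 1 + B.shape[0] + cols] = 1.0
    y = B[rows, cols]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def sse_no_interaction(Y, r, c):
    """Sum of per-bicluster additive-model SSEs for row labels r and column labels c."""
    return sum(additive_sse(Y[np.ix_(r == a, c == b)])
               for a in np.unique(r) for b in np.unique(c))

# Toy table: not additive as a whole, but additive inside each 2 x 2 block.
Y = np.array([[1.0, 2.0, 10.0, 30.0],
              [2.0, 3.0, 20.0, 40.0],
              [5.0, 5.0, 0.0, 1.0],
              [9.0, 9.0, 3.0, 4.0]])
labels = np.array([0, 0, 1, 1])
within = sse_no_interaction(Y, labels, labels)                       # close to zero
whole = sse_no_interaction(Y, np.zeros(4, int), np.zeros(4, int))    # clearly positive
```

Comparing `within` with `whole` shows the intended effect: a good biclustering isolates blocks in which genotype and environment effects are additive, so the per-block SSE is far smaller than the SSE of one additive model for the whole table.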
Figure 4. Biclustering illustration of response (4) with initialization (left) and final bicluster (right) for four row and four column clusters (maize data). The homogeneity of the cells indicates that the biclustering algorithm successfully grouped the responses into distinct clusters.
Summary of degrees of freedom and sum of squares for the rice dataset—FTgdd.
| Source | df | Sum of squares |
|---|---|---|
| **Terms: G, E** | | |
| G | 175 | 132,865,164 |
| E | 8 | 19,169,848 |
| Error | 1,355 | 59,620,672 |
| **Terms: G, E, GEI** | | |
| G | 175 | 132,865,164 |
| E | 8 | 19,169,848 |
| GEI | 1,355 | 59,620,672 |
| Error | 0 | 0 |
| **Terms: G, E, GEI Ind** | | |
| G | 175 | 132,865,164 |
| E | 8 | 19,169,848 |
| GEI Ind | 175 | 23,653,430 |
| Error | 1,180 | 35,967,242 |
| **Terms: G, E, PC1, PC2** | | |
| G | 175 | 132,865,164 |
| E | 8 | 19,169,848 |
| PC1 | 182 | 50,094,094 |
| PC2 | 180 | 3,362,842 |
| Error | 993 | 6,163,736 |
Biclustering results with the smallest SSE values obtained from fitting no-interaction models on each bicluster for raw responses (30 trials—rice data FTgdd).
| Clusters (row, column) | SSE | SSE | SSE |
|---|---|---|---|
| (7, 3) | 4,605,916 | 4,692,904 | 4,606,188 |
| (7, 4) | 3,417,453 | 3,962,718 | 3,368,339 |
| (8, 3) | 4,604,048 | 4,653,856 | 4,503,244 |
| (8, 4) | 3,374,006 | 3,971,685 | 3,306,126 |
| (9, 3) | 4,577,429 | 4,385,709 | 4,460,673 |
| (9, 4) | 3,293,641 | 3,910,443 | 3,531,670 |
Figure 5. Biclustering illustration of response (4) with initialization (left) and final bicluster (right) for eight row and four column clusters (rice data). The homogeneity of the cells indicates that the biclustering algorithm successfully grouped the responses into distinct clusters.
Summary of degrees of freedom and sum of squares for the rice dataset—FTdap.
| Source | df | Sum of squares |
|---|---|---|
| **Terms: G, E** | | |
| G | 175 | 134,189 |
| E | 8 | 485,135 |
| Error | 1,355 | 58,071 |
| **Terms: G, E, GEI** | | |
| G | 175 | 134,189 |
| E | 8 | 485,135 |
| GEI | 1,355 | 58,071 |
| Error | 0 | 0 |
| **Terms: G, E, GEI Ind** | | |
| G | 175 | 134,189 |
| E | 8 | 485,135 |
| GEI Ind | 175 | 41,697 |
| Error | 1,180 | 16,374 |
| **Terms: G, E, PC1, PC2** | | |
| G | 175 | 134,405 |
| E | 8 | 484,918 |
| PC1 | 182 | 47,971 |
| PC2 | 180 | 4,272 |
| Error | 993 | 5,828 |
Biclustering results with the smallest SSE values obtained from fitting no-interaction models on each bicluster for raw responses (30 trials—rice data FTdap).
| Clusters (row, column) | SSE | SSE | SSE |
|---|---|---|---|
| (7, 3) | 4,339 | 4,303 | 4,419 |
| (7, 4) | 3,535 | 3,511 | 3,362 |
| (8, 3) | 4,283 | 4,325 | 4,391 |
| (8, 4) | 3,471 | 3,502 | 3,505 |
| (9, 3) | 4,229 | 4,277 | 4,302 |
| (9, 4) | 3,406 | 3,467 | 3,252 |
Figure 6. Biclustering illustration of response (4) with initialization (left) and final bicluster (right) for eight row and four column clusters (rice data). The homogeneity of the cells indicates that the biclustering algorithm successfully grouped the responses into distinct clusters.