Jaime Cuevas, Osval A Montesinos-López, J W R Martini, Paulino Pérez-Rodríguez, Morten Lillemo, Jose Crossa.
Abstract
The rapid development of molecular markers and sequencing technologies has made it possible to use genomic prediction (GP) and genomic selection (GS) in animal and plant breeding. However, when the number of observations (n) is large (thousands or millions), the computational cost of handling the large genomic kernel relationship matrices (inverting and decomposing them) grows steeply. The problem worsens when genomic × environment interaction (G × E) and multi-trait kernels are included in the model. In this research we propose selecting a small number of lines m (m < n) to construct an approximate kernel of lower rank than the original, thereby sharply reducing the required computing time. First, we describe the full genomic method for a single environment (FGSE), with a covariance matrix (kernel) that includes all n lines. Second, we select m lines and approximate the original kernel for the single-environment model (APSE). Similarly, but including main effects and G × E, we describe a full genomic method with the genotype × environment model (FGGE) and, using m lines, the approximate kernel method with G × E (APGE). We applied the proposed method to two wheat data sets of different sizes (n) using the standard linear kernel, Genomic Best Linear Unbiased Predictor (GBLUP), and eigenvalue decomposition. In both data sets, we compared the prediction performance and computing time of FGSE versus APSE and of FGGE versus APGE. Results showed that the approximate methods had competitive prediction performance with a significant reduction in computing time. Genomic prediction accuracy depends on the decay of the eigenvalues of the original kernel (the amount of variance information lost) as well as on the number of selected lines m.
Keywords: approximate kernels; computing time; genomic-enabled prediction; genotype × environment interaction; large data sets
Year: 2020 PMID: 33193659 PMCID: PMC7594507 DOI: 10.3389/fgene.2020.567757
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
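The low-rank idea in the abstract can be illustrated with a short sketch. The abstract does not give the exact formula, so the construction below is an assumption: a Nyström-type approximation in which the n × m block of the kernel and the eigenvalue decomposition of the m × m block yield a factor P with P P' approximating the full kernel. The function names, the toy sizes, the simulated marker matrix, and the random choice of the m lines are all hypothetical.

```python
import numpy as np

def linear_kernel(X):
    """Linear (GBLUP-style) kernel G = Xc Xc' / p from column-centered markers."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T / X.shape[1]

def low_rank_factor(K_nm, K_mm, tol=1e-10):
    """Return P (n x r) such that P @ P.T = K_nm @ pinv(K_mm) @ K_nm.T."""
    vals, vecs = np.linalg.eigh(K_mm)       # eigendecomposition of the m x m block
    keep = vals > tol * vals.max()          # drop numerically zero eigenvalues
    return K_nm @ vecs[:, keep] / np.sqrt(vals[keep])

rng = np.random.default_rng(0)
n, p, m = 200, 500, 50                      # toy sizes, not those of the paper
X = rng.integers(0, 3, size=(n, p)).astype(float)  # simulated 0/1/2 marker codes
K = linear_kernel(X)                        # full n x n kernel

idx = rng.choice(n, size=m, replace=False)  # random selection of m lines (assumption)
P = low_rank_factor(K[:, idx], K[np.ix_(idx, idx)])
K_approx = P @ P.T                          # approximation of K with rank <= m
rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
print(P.shape, round(rel_err, 3))
```

Under this construction a model can work with the n × m factor P instead of the n × n kernel, so the expensive decomposition is done only on the m × m block. How much accuracy is lost depends on how quickly the eigenvalues of the original kernel decay, as the abstract notes.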
Data set 1.
| Cycle | FGSE (m = all lines) | APSE (m = 4000) | APSE (m = 2000) | APSE (m = 1000) | APSE (m = 500) | APSE (m = 100) |
| Average correlation | | | | | | |
| Cycle 2017_2018 | 0.575 (0.016) | 0.575 (0.016) | 0.570 (0.015) | 0.557 (0.017) | 0.534 (0.017) | 0.464 (0.02) |
| Cycle 2016_2017 | 0.483 (0.011) | 0.483 (0.011) | 0.477 (0.015) | 0.465 (0.013) | 0.447 (0.011) | 0.386 (0.012) |
| Cycle 2015_2016 | 0.533 (0.013) | 0.533 (0.013) | 0.522 (0.014) | 0.508 (0.013) | 0.483 (0.014) | 0.402 (0.016) |
| Cycle 2014_2015 | 0.494 (0.017) | 0.493 (0.012) | 0.485 (0.020) | 0.470 (0.016) | 0.441 (0.018) | 0.318 (0.021) |
| Cycle 2013_2014 | 0.572 (0.015) | 0.572 (0.015) | 0.567 (0.016) | 0.549 (0.015) | 0.515 (0.016) | 0.366 (0.004) |
| Prediction mean squared error (PMSE) | | | | | | |
| Cycle 2017_2018 | 0.282 (0.009) | 0.282 (0.009) | 0.284 (0.008) | 0.290 (0.009) | 0.300 (0.008) | 0.336 (0.01) |
| Cycle 2016_2017 | 0.369 (0.009) | 0.369 (0.010) | 0.364 (0.010) | 0.377 (0.010) | 0.385 (0.010) | 0.410 (0.011) |
| Cycle 2015_2016 | 0.304 (0.010) | 0.304 (0.010) | 0.309 (0.013) | 0.315 (0.010) | 0.326 (0.010) | 0.356 (0.012) |
| Cycle 2014_2015 | 0.309 (0.012) | 0.309 (0.013) | 0.311 (0.011) | 0.319 (0.013) | 0.329 (0.013) | 0.368 (0.016) |
| Cycle 2013_2014 | 0.413 (0.011) | 0.413 (0.013) | 0.413 (0.014) | 0.429 (0.012) | 0.451 (0.012) | 0.508 (0.011) |
| Error variance | | | | | | |
| Cycle 2017_2018 | 0.247 (0.003) | 0.250 (0.003) | 0.262 (0.002) | 0.275 (0.002) | 0.293 (0.003) | 0.330 (0.004) |
| Cycle 2016_2017 | 0.317 (0.003) | 0.323 (0.003) | 0.337 (0.003) | 0.350 (0.003) | 0.365 (0.003) | 0.400 (0.003) |
| Cycle 2015_2016 | 0.255 (0.003) | 0.257 (0.003) | 0.279 (0.003) | 0.297 (0.003) | 0.315 (0.004) | 0.357 (0.005) |
| Cycle 2014_2015 | 0.259 (0.003) | 0.266 (0.003) | 0.280 (0.003) | 0.298 (0.003) | 0.315 (0.004) | 0.366 (0.004) |
| Cycle 2013_2014 | 0.313 (0.004) | 0.324 (0.005) | 0.358 (0.005) | 0.391 (0.006) | 0.424 (0.006) | 0.501 (0.006) |
| Time (s) | | | | | | |
| Cycle 2017_2018 | 3931 | 1710 | 707 | 345 | 174 | 47 |
| Cycle 2016_2017 | 4350 | 1765 | 768 | 356 | 176 | 48 |
| Cycle 2015_2016 | 4200 | 1750 | 759 | 375 | 184 | 49 |
| Cycle 2014_2015 | 3850 | 1310 | 695 | 330 | 165 | 51 |
| Cycle 2013_2014 | 2800 | 1135 | 533 | 247 | 134 | 44 |
FIGURE 2. Data set 1. Models FGSE (yellow, m = all lines) and APSE (blue, black, green, purple, and orange, corresponding to m = 4000, 2000, 1000, 500, and 100, respectively). (A) Average correlation between observed and predicted values for the FGSE and APSE models at different sizes of m (bars indicate 2 standard deviations); (B) average prediction mean squared error (PMSE) of the FGSE and APSE models at different sizes of m; (C) error variance of the FGSE and APSE models at different sizes of m; and (D) time in seconds to fit the FGSE and APSE models at different sizes of m.
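The two accuracy measures named in the caption, the correlation between observed and predicted values and the PMSE, can be computed as in the minimal sketch below. The function name and the toy vectors are hypothetical, and the sketch assumes the usual definitions (Pearson correlation and the mean of squared prediction errors over the testing set).

```python
import numpy as np

def prediction_metrics(y_obs, y_pred):
    """Return (Pearson correlation, prediction mean squared error)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    corr = np.corrcoef(y_obs, y_pred)[0, 1]   # correlation of observed vs. predicted
    pmse = np.mean((y_obs - y_pred) ** 2)     # mean squared prediction error
    return corr, pmse

# Toy testing set of four lines (illustrative values only).
corr, pmse = prediction_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
print(round(corr, 3), pmse)
```

In a study of this kind the two values would be computed for each testing partition and then averaged.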
Data set 2.
| Environment | FGSE (m = all lines) | APSE (largest m) | APSE | APSE | APSE | APSE (smallest m) |
| Average correlation | | | | | | |
| E1 | 0.506 (0.046) | 0.501 (0.047) | 0.468 (0.063) | 0.425 (0.073) | 0.362 (0.060) | 0.266 (0.088) |
| E2 | 0.471 (0.068) | 0.461 (0.062) | 0.439 (0.066) | 0.407 (0.071) | 0.374 (0.060) | 0.283 (0.072) |
| E3 | 0.384 (0.046) | 0.384 (0.047) | 0.381 (0.059) | 0.359 (0.053) | 0.318 (0.064) | 0.267 (0.068) |
| E4 | 0.448 (0.051) | 0.439 (0.05) | 0.420 (0.048) | 0.398 (0.053) | 0.359 (0.050) | 0.302 (0.053) |
| Prediction mean squared error (PMSE) | | | | | | |
| E1 | 0.771 (0.074) | 0.776 (0.047) | 0.806 (0.075) | 0.848 (0.085) | 0.899 (0.086) | 0.957 (0.088) |
| E2 | 0.751 (0.08) | 0.761 (0.078) | 0.782 (0.081) | 0.809 (0.092) | 0.834 (0.077) | 0.891 (0.090) |
| E3 | 0.821 (0.085) | 0.817 (0.082) | 0.822 (0.098) | 0.837 (0.087) | 0.863 (0.087) | 0.892 (0.090) |
| E4 | 0.802 (0.098) | 0.811 (0.090) | 0.827 (0.096) | 0.844 (0.097) | 0.873 (0.096) | 0.912 (0.090) |
| Error variance | | | | | | |
| E1 | 0.523 (0.041) | 0.572 (0.038) | 0.656 (0.035) | 0.733 (0.037) | 0.819 (0.037) | 0.890 (0.040) |
| E2 | 0.587 (0.039) | 0.635 (0.041) | 0.707 (0.036) | 0.768 (0.037) | 0.840 (0.041) | 0.902 (0.046) |
| E3 | 0.602 (0.039) | 0.691 (0.043) | 0.768 (0.048) | 0.823 (0.041) | 0.877 (0.045) | 0.930 (0.048) |
| E4 | 0.598 (0.046) | 0.652 (0.044) | 0.720 (0.040) | 0.775 (0.041) | 0.833 (0.038) | 0.890 (0.044) |
| Time (s) | | | | | | |
| E1 | 17 | 13.7 | 11 | 10.9 | 9.25 | 8.6 |
| E2 | 17 | 13.7 | 11 | 10.9 | 9.25 | 8.6 |
| E3 | 17 | 13.7 | 11 | 10.9 | 9.25 | 8.6 |
| E4 | 17 | 13.7 | 11 | 10.9 | 9.25 | 8.6 |
FIGURE 3. Data set 2. Models FGSE (yellow, m = all lines) and APSE (blue, black, green, purple, and orange, corresponding to m = 4000, 2000, 1000, 500, and 100, respectively). (A) Average correlation between observed and predicted values for the FGSE and APSE models at different sizes of m (bars indicate 2 standard deviations); (B) average prediction mean squared error (PMSE) of the FGSE and APSE models at different sizes of m; (C) error variance of the FGSE and APSE models at different sizes of m; and (D) time in seconds to fit the FGSE and APSE models at different sizes of m.
Models FGGE and APGE, with m set to 25% of the original training set.
| Testing | Training | CORR | PMSE | Error variance | TIME (h) |
| FGGE, data set 1 | | | | | |
| Cycle 2014_2015 | Cycle 2013_2014 | 0.222 | 2.45 | 0.317 | 4.96 |
| Cycle 2015_2016 | Cycle 2013_2014, Cycle 2014_2015 | 0.328 | 0.525 | 0.287 | 11.10 |
| Cycle 2016_2017 | Cycle 2013_2014, Cycle 2014_2015, Cycle 2015_2016 | 0.328 | 0.480 | 0.275 | 23.72 |
| Cycle 2017_2018 | Cycle 2013_2014, Cycle 2014_2015, Cycle 2015_2016, Cycle 2016_2017 | 0.426 | NA | NA | NA |
| APGE, data set 1 | | | | | |
| Cycle 2014_2015 | Cycle 2013_2014 | 0.206 | 1.08 | 0.363 | 0.68 |
| Cycle 2015_2016 | Cycle 2013_2014, Cycle 2014_2015 | 0.347 | 0.408 | 0.309 | 2.80 |
| Cycle 2016_2017 | Cycle 2013_2014, Cycle 2014_2015, Cycle 2015_2016 | 0.321 | 0.517 | 0.29 | 5.08 |
| Cycle 2017_2018 | Cycle 2013_2014, Cycle 2014_2015, Cycle 2015_2016, Cycle 2016_2017 | 0.427 | 0.618 | 0.301 | 8.38 |
| FGGE, data set 2 | | | | | |
| E1 | E2, E3, E4 | -0.166 | 1.520 | 0.532 | 175 |
| E2 | E1, E3, E4 | 0.511 | 0.912 | 0.600 | 178 |
| E3 | E1, E2, E4 | 0.469 | 0.879 | 0.577 | 180 |
| E4 | E1, E2, E3 | 0.311 | 0.940 | 0.570 | 187 |
| APGE, data set 2 | | | | | |
| E1 | E2, E3, E4 | -0.188 | 1.54 | 0.607 | 70 |
| E2 | E1, E3, E4 | 0.491 | 0.942 | 0.71 | 72 |
| E3 | E1, E2, E4 | 0.445 | 0.887 | 0.70 | 73 |
| E4 | E1, E2, E3 | 0.281 | 0.960 | 0.651 | 82 |
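The reduction in computing time for data set 1 can be read off the table as a simple ratio. The sketch below assumes that the first block of rows for data set 1 is the full model (FGGE) and the second block is the approximation (APGE), which is an inference from the NA entries and the shorter times rather than something the table states. Cycle 2017_2018 is left out because the full-model time is NA. The variable names are hypothetical.

```python
# Fitting times in hours for data set 1, copied from the table above.
# Assumption: first block of rows = FGGE (full), second block = APGE (approximate).
times_h = {
    "Cycle 2014_2015": (4.96, 0.68),
    "Cycle 2015_2016": (11.10, 2.80),
    "Cycle 2016_2017": (23.72, 5.08),
}

# Ratio of full-model time to approximate-model time for each testing cycle.
speedup = {cycle: full / approx for cycle, (full, approx) in times_h.items()}

for cycle, ratio in speedup.items():
    print(f"{cycle}: FGGE/APGE time ratio = {ratio:.2f}")
```

Under that assumption the approximation is roughly 4 to 7 times faster on these three testing cycles, while the correlations in the two blocks stay close to each other.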
Estimated variance components for model APGE for data set 1 and data set 2.
| Testing | Training | Error variance | | |
| Data set 1 | | | | |
| Cycle 2014–2015 | Cycle 2013–2014 | 0.3624 | 0.4680 | 0.3300 |
| Cycle 2015–2016 | Cycle 2013–2014, Cycle 2014–2015 | 0.3087 | 0.2638 | 0.3337 |
| Cycle 2016–2017 | Cycle 2013–2014, Cycle 2014–2015, Cycle 2015–2016 | 0.2916 | 0.22705 | 0.2956 |
| Cycle 2017–2018 | Cycle 2013–2014, Cycle 2014–2015, Cycle 2015–2016, Cycle 2016–2017 | 0.3019 | 0.1886 | 0.2962 |
| Data set 2 | | | | |
| E1 | E2, E3, E4 | 0.6070 | 0.3953 | 0.5576 |
| E2 | E1, E3, E4 | 0.7102 | 0.3183 | 0.1120 |
| E3 | E1, E2, E4 | 0.7001 | 0.3053 | 0.1356 |
| E4 | E1, E2, E3 | 0.6510 | 0.2981 | 0.1985 |
FIGURE 1. (A) Average correlation over 20 random partitions (80% training, 20% testing) for data set 1, plotted against the proportion m/n, where m is the number of selected lines and n is the total number of observations (lines); (B) time in seconds for each sample, plotted against the same proportion m/n.
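The validation scheme in this caption, 20 random samples with 80% of the lines for training and 20% for testing, can be generated along the following lines. This is only a sketch of the partitioning step: the function name, the seed, and the toy value of n are hypothetical, and the source does not state how the authors drew their samples.

```python
import numpy as np

def random_partitions(n, n_rep=20, test_frac=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random splits of n lines."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * n))
    for _ in range(n_rep):
        perm = rng.permutation(n)            # fresh random ordering of the lines
        yield perm[n_test:], perm[:n_test]   # remaining 80% train, first 20% test

splits = list(random_partitions(n=100))      # toy n, not the size of data set 1
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))
```

Each pair of index arrays would then be used to fit a model on the training lines and to score the predictions on the testing lines, with the resulting correlations averaged over the 20 samples.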