Jaime Cuevas1, Osval Montesinos-López2, Philomin Juliana3, Carlos Guzmán3, Paulino Pérez-Rodríguez4, José González-Bucio1, Juan Burgueño3, Abelardo Montesinos-López5, José Crossa6.
Abstract
Kernel methods are flexible and easy to interpret and have been successfully used in genomic-enabled prediction of various plant species. Kernel methods used in genomic prediction comprise the linear genomic best linear unbiased predictor (GBLUP or GB) kernel and the Gaussian kernel (GK). In general, these kernels have been used with two statistical models: single-environment and genomic × environment (GE) models. Recently, near infrared spectroscopy (NIR) has been used as an inexpensive and non-destructive high-throughput phenotyping method for predicting unobserved line performance in plant breeding trials. In this study, we used a non-linear arc-cosine kernel (AK) that emulates deep learning artificial neural networks. We compared AK prediction accuracy with the prediction accuracy of GB and GK kernel methods in four genomic data sets, one of which also includes pedigree and NIR information. Results show that for all four data sets, AK and GK kernels achieved higher prediction accuracy than the linear GB kernel for the single-environment and GE multi-environment models. In addition, AK achieved similar or slightly higher prediction accuracy than the GK kernel. For all data sets, the GE model achieved higher prediction accuracy than the single-environment model. For the data set that includes pedigree, markers and NIR, results show that the NIR wavelengths alone achieved lower prediction accuracy than the genomic information alone; however, the pedigree plus NIR information achieved only slightly lower prediction accuracy than the marker plus NIR high-throughput data.
Keywords: GenPred; Genomic Best Unbiased Predictor (GBLUP, GB linear and non-linear kernel methods); Genomic Prediction; Genomic based prediction; Shared Data Resources; deep learning; genomic × environment interaction model; near infrared (NIR) high-throughput phenotype; single-environment model
Year: 2019 PMID: 31289023 PMCID: PMC6723142 DOI: 10.1534/g3.119.400493
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
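The three kernels named in the abstract can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the GB kernel is taken as the scaled cross-product of an already centered and standardized marker matrix, the Gaussian bandwidth is scaled by the median squared distance, and the arc-cosine kernel follows the commonly used degree-1 recursion in which each additional level (layer) is computed from the kernel of the previous level.

```python
import numpy as np

def gblup_kernel(X):
    """Linear (GB) kernel: X X' / p for an n x p marker matrix.
    Assumes X is already centered and standardized by column."""
    X = np.asarray(X, dtype=float)
    return X @ X.T / X.shape[1]

def gaussian_kernel(X, h=1.0):
    """Gaussian kernel (GK): exp(-h * d_ij^2 / median(d^2)).
    Scaling by the median squared distance is an assumption of this sketch."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at zero to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    scale = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-h * d2 / scale)

def arc_cosine_kernel(X, levels=1):
    """Arc-cosine kernel (AK) with the given number of levels (layers).
    Assumes no row of X is all zeros (otherwise the angle is undefined)."""
    X = np.asarray(X, dtype=float)
    K = X @ X.T  # inner products; the first pass below yields the level-1 kernel
    for _ in range(levels):
        norms = np.sqrt(np.diag(K))
        outer = np.outer(norms, norms)
        cos_theta = np.clip(K / outer, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # J(theta) = sin(theta) + (pi - theta) * cos(theta)
        K = outer * (np.sin(theta) + (np.pi - theta) * cos_theta) / np.pi
    return K
```

In this formulation the diagonal of the arc-cosine kernel is unchanged from one level to the next, so adding levels only reshapes the off-diagonal similarities between lines.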
WHEAT1 data set. Average Pearson’s correlations between observed and predicted values (and their standard deviation in parentheses) for seven methods for a single-environment model for 50 random partitions with 70% of the lines in the training set and 30% of the lines in the testing set. Methods GB, GK, and AK are the GBLUP, Gaussian Kernel, and Arc-Cosine Kernel, respectively. The remaining four methods correspond to the Arc-Cosine kernel model with 1-4 levels (layers). The best predictive model for each environment (E1-E4) is in boldface.
| Environment | GB | AK | Level | GK | ||||
|---|---|---|---|---|---|---|---|---|
| Single-environment model | ||||||||
| E1 | 0.490 (0.04) | 0.520 (0.04) | 0.536 (0.04) | 0.544 (0.04) | 0.551 (0.04) | 11 | ||
| E2 | 0.469 (0.05) | 0.474 (0.05) | 0.476 (0.05) | 0.477 (0.05) | 3 | 0.477 (0.05) | ||
| E3 | 0.378 (0.06) | 0.390 (0.05) | 0.400 (0.05) | 0.401 (0.05) | 0.409 (0.05) | 13 | 0.416 (0.05) | |
| E4 | 0.450 (0.05) | 0.470 (0.05) | 0.482 (0.05) | 0.491 (0.04) | 0.491 (0.04) | 11 | 0.506 (0.05) | |
+ Significant at the 0.05 probability level of the t-test for the hypothesis that the average of the correlation of kernel AK is superior to the mean of the correlation of kernel GB.
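The evaluation scheme described in the caption (50 random partitions, 70% training and 30% testing, with the mean and standard deviation of Pearson's correlation) can be sketched as follows. This is a hedged illustration only: the record does not state which fitting procedure produced the table, so kernel ridge regression with a fixed penalty `lam` is swapped in as a simple stand-in, and the function names and simulated data are hypothetical.

```python
import numpy as np

def kernel_ridge_predict(K, y, train, test, lam=1.0):
    """Predict y[test] from y[train] by kernel ridge regression on kernel K.
    A stand-in predictor, not the model used to produce the tables."""
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(K_tt, y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

def random_partition_accuracy(K, y, n_partitions=50, train_frac=0.7,
                              lam=1.0, seed=0):
    """Mean and SD of Pearson's correlation over random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    cors = []
    for _ in range(n_partitions):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        y_hat = kernel_ridge_predict(K, y, train, test, lam)
        cors.append(np.corrcoef(y[test], y_hat)[0, 1])
    cors = np.array(cors)
    return cors.mean(), cors.std(ddof=1), cors

# Illustrative use on simulated data (not the WHEAT1 data).
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 50))
y = X @ rng.standard_normal(50) + rng.standard_normal(100)
K = X @ X.T / X.shape[1]
mean_cor, sd_cor, cors = random_partition_accuracy(K, y)
print(f"{mean_cor:.3f} ({sd_cor:.2f})")  # same "mean (SD)" layout as the table
```

Keeping the vector of per-partition correlations makes it possible to compare two kernels on exactly the same partitions by reusing the same `seed`.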
Figure 1. WHEAT1 data set. Marginal likelihood for several levels (layers) for environments.
WHEAT1 data set. Average Pearson’s correlations between observed and predicted values (and their standard deviation in parentheses) for three methods for a GE multi-environment model for 50 random partitions with 70% of the lines in the training set and 30% of the lines in the testing set. Methods GB, GK, and AK are the GBLUP, Gaussian Kernel and Arc-Cosine Kernel, respectively, with six levels (layers). The best predictive model for each environment (E1-E4) is in boldface.
| Environment | GB | AK | Level | GK |
|---|---|---|---|---|
| GE multi-environment model | ||||
| E1 | 0.422 (0.05) | 6 | 0.482 (0.06) | |
| E2 | 0.537 (0.04) | 6 | 0.581 (0.04) | |
| E3 | 0.441 (0.05) | 6 | 0.494 (0.05) | |
| E4 | 0.485 (0.05) | 6 | 0.551 (0.05) | |
+ Significant at the 0.05 probability level of the t-test for the hypothesis that the average of the correlation of kernel AK is superior to the mean of the correlation of kernel GB.
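The footnote's one-sided comparison of mean correlations can be sketched in plain Python. The record does not say whether the t-test was paired or unpaired, so this sketch assumes a paired test over correlations obtained on the same random partitions; the function names are hypothetical, and the tail probability is obtained by numerical integration of the Student's t density rather than from a statistics library.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom.
    Integrates the density from 0 to |t| with Simpson's rule (steps is even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    upper = 0.5 - total * h / 3
    return upper if t >= 0 else 1 - upper

def paired_one_sided_t(cor_a, cor_b):
    """Test H1: mean(cor_a) > mean(cor_b) on paired correlations.
    Assumes the differences are not all identical (non-zero variance)."""
    d = [a - b for a, b in zip(cor_a, cor_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    t = mean / math.sqrt(var / n)
    return t, t_upper_tail(t, n - 1)
```

With 50 partitions the test has 49 degrees of freedom, and a difference would be flagged at the 0.05 level when the returned p-value is below 0.05.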
MAIZE2 data set. Average Pearson’s correlations between observed and predicted values (and their standard deviation in parentheses) for three methods for single-environment and GE multi-environment models for 50 random partitions with 70% of the lines in the training set and 30% of the lines in the testing set. Methods GB, GK, and AK are the GBLUP, Gaussian Kernel and Arc-Cosine Kernel, respectively. The best predictive model for each environment (E1-E5) is in boldface.
| Single-environment model | |||
|---|---|---|---|
| Environment | GB | AK | GK |
| E1 | 0.647 (0.07) | 0.733 (0.05) | |
| E2 | 0.384 (0.07) | 0.488 (0.07) | |
| E3 | 0.678 (0.03) | 0.718 (0.04) | |
| E4 | 0.368 (0.03) | 0.469 (0.05) | |
| E5 | 0.393 (0.07) | 0.522 (0.06) | |
Significant at the 0.05 probability level of the t-test for the hypothesis that the average of the correlation of kernel AK is superior to the mean of the correlation of kernel GK.
+ Significant at the 0.05 probability level of the t-test for the hypothesis that the average of the correlation of kernel AK is superior to the mean of the correlation of kernel GB.
MAIZE3 data set. Average Pearson’s correlations between observed and predicted values (and their standard deviation in parentheses) for three methods for single-environment and GE multi-environment models for 50 random partitions with 70% of the lines in the training set and 30% of the lines in the testing set. Methods GB, GK, and AK are the GBLUP, Gaussian Kernel and Arc-Cosine Kernel, respectively. The best predictive model for each environment is in boldface.
| Single-environment model | |||
|---|---|---|---|
| Environment | GB | AK | GK |
| E1 | 0.246 (0.05) | 0.271 | |
| E2 | 0.319 (0.04) | 0.320 (0.04) | |
| E3 | 0.294 (0.06) | 0.294 (0.06) | |
| E4 | 0.42 (0.05) | 0.421 (0.05) | |
Significant at the 0.05 probability level of the t-test for the hypothesis that the average of the correlation of kernel AK is superior to the mean of the correlation of kernel GK.
+ Significant at the 0.05 probability level of the t-test for the hypothesis that the average of the correlation of kernel AK is superior to the mean of the correlation of kernel GB.
WHEAT4 data set. Average Pearson’s correlations between observed and predicted values (and their standard deviation in parentheses) for three methods for a single-environment model for 50 random partitions with 80% of the lines in the training set and 20% of the lines in the testing set. Methods GB, GK, and AK are the GBLUP, Gaussian Kernel and Arc-Cosine Kernel, respectively. The three methods were applied to data from NIR1 (first derivative), NIR2 (second derivative), markers, pedigree and some combinations of these. The best predictive model for each type of data used is in boldface.
| Data used | GB | AK | GK |
|---|---|---|---|
| NIR1 | 0.349 (0.07) | 0.347 (0.07) | |
| NIR2 | 0.346 (0.07) | 0.354 (0.07) | |
| GENOMIC | 0.424 (0.07) | 0.454 (0.07) | |
| GENOMIC | 0.436 (0.07) | 0.456 (0.07) | |
| GENOMIC | 0.435 (0.07) | 0.455 (0.07) | |
| Pedigree | 0.396 (0.07) | — | — |
| GENOMIC | 0.437 (0.07) | 0.450 (0.07) | |
| Pedigree | 0.420 (0.07) | 0.413 (0.07) | |
| Pedigree | 0.418 (0.07) | ||
| GENOMIC | 0.448 (0.07) | 0.455 (0.07) | |
| GENOMIC | 0.448 (0.07) | 0.459 (0.07) |
+ Significant at the 0.05 probability level of the t-test for the hypothesis that the average of the correlation of kernel AK is superior to the mean of the correlation of kernel GB.
Figure 2. WHEAT4 data set. All spectra for wheat lines are depicted in the three figures; light colors are for each wheat line, while the strong color line in each graph is the average of all spectra. (a) absorbance raw data (red); (b) normalized absorbance (first derivative) (blue); (c) normalized second derivative (green).
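The first- and second-derivative spectra shown in panels (b) and (c) can be approximated with a short sketch. The record does not describe the preprocessing actually used, so plain finite differences (`numpy.gradient`) and column-wise standardization are assumptions standing in for whatever smoothing and normalization the authors applied; the function names are hypothetical.

```python
import numpy as np

def spectral_derivatives(spectra, wavelengths):
    """First and second derivatives of each spectrum along the wavelength axis.
    spectra: n_lines x n_wavelengths absorbance matrix (one row per line)."""
    spectra = np.asarray(spectra, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    d1 = np.gradient(spectra, wavelengths, axis=1)
    d2 = np.gradient(d1, wavelengths, axis=1)
    return d1, d2

def standardize_columns(M):
    """Center each wavelength column and scale it to unit standard deviation.
    Columns with zero variance are only centered."""
    M = np.asarray(M, dtype=float)
    sd = M.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return (M - M.mean(axis=0)) / sd
```

A standardized derivative matrix can then be treated like a marker matrix, for example by forming its scaled cross-product to obtain a linear kernel between lines.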