Laval Jacquin, Tuong-Vi Cao, Nourollah Ahmadi.
Abstract
One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods used for genomic prediction, and to compare some of these methods in the context of rice breeding for quantitative traits. Another objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods, such as Ridge regression [i.e., genomic best linear unbiased predictor (GBLUP)] and reproducing kernel Hilbert space (RKHS) regression, were reviewed. Ridge regression was then reformulated to show and emphasize the advantage of the kernel "trick" concept, exploited by kernel methods in the context of epistatic genetic architectures, over the parametric frameworks used by conventional methods. Several parametric and kernel methods, namely least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression, were then compared for their genomic predictive ability in the context of rice breeding, using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate for prediction, followed by GBLUP and LASSO. An R function that performs ridge regression BLUP (RR-BLUP) of marker effects, GBLUP and RKHS regression with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time, has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
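The abstract's central argument is that ridge regression on marker effects (RR-BLUP) can be rewritten so that the genotypes enter only through an n × n kernel matrix, via the identity (X'X + λI_p)^{-1} X' = X'(XX' + λI_n)^{-1}. Replacing the linear kernel XX' with a non-linear one (for example Gaussian) then gives RKHS regression, which can capture epistatic interactions without building interaction features explicitly. The Python/NumPy sketch below illustrates this under stated assumptions: all helper names are hypothetical and are not the KRMM API; the data are simulated; the shrinkage values and the Gaussian bandwidth are fixed by hand rather than estimated; the linear-kernel model is treated only as a GBLUP-like stand-in; and the intercept is handled by centring the response.

```python
import numpy as np

# Hypothetical helper names for illustration; this is NOT the KRMM package API.

def ridge_marker_effects(X, y, lam):
    """Primal ridge (RR-BLUP-style): beta = (X'X + lam I_p)^{-1} X'y, a p x p solve."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def kernel_ridge_weights(K, y, lam):
    """Dual (kernel) form: alpha = (K + lam I_n)^{-1} y, an n x n solve."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)


def linear_kernel(A, B):
    """Linear kernel A B' between the rows of A and the rows of B."""
    return A @ B.T


def gaussian_kernel(A, B, h):
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / h) for every row pair of A and B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / h)  # clip tiny negative round-off


# Simulated data (illustrative only): 0/1/2 genotype codes plus one epistatic pair.
rng = np.random.default_rng(0)
n_train, n_test, p = 100, 30, 500
X = rng.integers(0, 3, size=(n_train + n_test, p)).astype(float)
X -= X[:n_train].mean(axis=0)  # centre markers using training means only
effects = np.zeros(p)
effects[:10] = rng.normal(size=10)
y_all = X @ effects + 0.5 * X[:, 10] * X[:, 11] + rng.normal(scale=0.5, size=n_train + n_test)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y_all[:n_train], y_all[n_train:]
y_mean = y_train.mean()  # no intercept term in the models, so centre the response
yc = y_train - y_mean

# Assumed shrinkage values; they are kernel-specific and not comparable across kernels.
lam_linear, lam_gauss = 10.0, 0.5

# 1) Primal ridge on marker effects vs 2) the same model in kernel (dual) form.
beta = ridge_marker_effects(X_train, yc, lam_linear)
pred_primal = y_mean + X_test @ beta
alpha = kernel_ridge_weights(linear_kernel(X_train, X_train), yc, lam_linear)
pred_dual = y_mean + linear_kernel(X_test, X_train) @ alpha
print("primal and dual predictions agree:", np.allclose(pred_primal, pred_dual, atol=1e-6))

# 3) Kernel trick: swap in a Gaussian kernel; no explicit interaction features are built.
h = float(p)  # assumed bandwidth heuristic that scales with the number of markers
alpha_g = kernel_ridge_weights(gaussian_kernel(X_train, X_train, h), yc, lam_gauss)
pred_gauss = y_mean + gaussian_kernel(X_test, X_train, h) @ alpha_g

for name, pred in [("linear kernel (GBLUP-like)", pred_dual), ("Gaussian kernel (RKHS)", pred_gauss)]:
    print(name, "corr(predicted, observed) =", round(float(np.corrcoef(pred, y_test)[0, 1]), 3))
```

When markers far outnumber accessions, as in all three data sets here (16444 to 38390 SNPs for 167 to 230 accessions), the kernel form solves an n × n system instead of a p × p one, which is one practical reason for working in the dual.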
Keywords: epistasis; genomic prediction; kernel “trick”; non-parametric; parametric; semi-parametric
Year: 2016 PMID: 27555865 PMCID: PMC4977290 DOI: 10.3389/fgene.2016.00145
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
RPA means, with their associated standard errors within parentheses (.) and the SNR means within square brackets [.], for the 60 examined situations.
| Data set | Trait | | | | |
|---|---|---|---|---|---|
| Data set 1 | PH | 0.34 (0.11) [0.11] | 0.40 (0.08) [0.14] | 0.40 (0.08) [0.16] | 0.37 (0.07) [0.21] |
| 230 accessions | FL | 0.59 (0.07) [0.42] | 0.65 (0.06) [0.93] | 0.67 (0.06) [0.73] | 0.66 (0.07) [0.75] |
| 22691 SNP | AR | | | | |
| | NT | | | | |
| Data set 2 | SB | | | | |
| 167 accessions | RL | 0.39 (0.09) [0.29] | 0.53 (0.09) [0.39] | 0.54 (0.08) [0.33] | 0.54 (0.09) [0.40] |
| 16444 SNP | NR | | | | |
| | DR | | | | |
| | RS | 0.55 (0.08) [0.38] | 0.54 (0.09) [0.70] | 0.57 (0.07) [0.45] | 0.57 (0.10) [0.30] |
| | PH | 0.66 (0.07) [0.85] | 0.69 (0.06) [1.15] | 0.70 (0.05) [0.90] | 0.69 (0.06) [0.81] |
| Data set 3 | CD | 0.48 (0.11) [0.29] | 0.39 (0.09) [0.58] | 0.47 (0.09) [0.26] | 0.46 (0.09) [0.38] |
| 188 accessions | FE | | | | |
| 38390 SNP | NS | | | | |
| | SY | | | | |
| | NP | 0.64 (0.08) [0.85] | 0.70 (0.06) [0.80] | 0.68 (0.06) [0.62] | 0.67 (0.06) [0.65] |
Figure 1. Boxplots of RPA distributions associated with PH, FL, and AR for data set 1.
Figure 3. Boxplots of RPA distributions associated with NR, DR, and RS for data set 2.
Figure 5. Boxplots of RPA distributions associated with NS, SY, and NP for data set 3.