Xiaochun Sun, Ping Ma, Rita H. Mumm.
Abstract
Genomic selection (GS) procedures have proven useful for estimating breeding values and predicting phenotypes from genome-wide molecular marker information. However, high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We therefore propose a new nonparametric method, pRKHS, which combines features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert space (RKHS) regression, with one version for traits with no or low epistasis (pRKHS-NE) and another for traits with high epistasis (pRKHS-E). Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype nonparametrically and thus requires fewer genetic assumptions. SPCA reduces the number of markers needed for prediction by filtering out low-signal markers, with the optimal marker set determined by cross-validation. Principal components computed from the reduced marker matrix (called supervised principal components, SPCs) are included as independent variables in a smoothing spline ANOVA model fitted to the data. The new method was compared with methods currently popular in GS practice, specifically RR-BLUP, BayesA, and BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis affects trait expression. Beyond prediction, the new method also facilitates inference about the extent to which epistasis influences trait expression.
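The SPCA step described above (screen markers by their association with the phenotype, then take principal components of the reduced marker matrix) can be sketched as follows. This is a minimal illustration under stated assumptions: the screening statistic (absolute Pearson correlation), the 0/1/2 genotype coding, and power iteration for the first component are choices made here, not details taken from the paper, and all function names are hypothetical. In pRKHS the number of retained markers would be chosen by cross-validation and the SPCs passed to a smoothing spline ANOVA model; neither step is shown.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_markers(X: List[List[float]], y: Sequence[float], keep: int) -> List[int]:
    """Supervised filter (assumed statistic |r|): indices of the `keep` strongest markers."""
    p = len(X[0])
    score = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return sorted(range(p), key=lambda j: score[j], reverse=True)[:keep]


def leading_spc_scores(X: List[List[float]], cols: List[int], iters: int = 500) -> List[float]:
    """Scores on the first principal component of the centered reduced marker matrix."""
    n, m = len(X), len(cols)
    Z = [[row[j] for j in cols] for row in X]
    means = [sum(r[k] for r in Z) / n for k in range(m)]
    Z = [[r[k] - means[k] for k in range(m)] for r in Z]  # center each retained marker
    v = [1.0 / math.sqrt(m)] * m  # unit starting direction for power iteration
    for _ in range(iters):
        zv = [sum(a * b for a, b in zip(r, v)) for r in Z]               # Z v
        w = [sum(Z[i][k] * zv[i] for i in range(n)) for k in range(m)]  # Z^T Z v
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(r, v)) for r in Z]


# Toy data: six lines, three markers coded 0/1/2; marker 0 tracks the phenotype.
X = [[0, 0, 2], [1, 1, 0], [2, 2, 1], [0, 0, 1], [1, 2, 0], [2, 2, 2]]
y = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
cols = screen_markers(X, y, keep=2)
spc1 = leading_spc_scores(X, cols)
```

Further SPCs could be obtained the same way after deflating `Z`; that detail is omitted to keep the sketch short.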
Year: 2012 PMID: 23226325 PMCID: PMC3511520 DOI: 10.1371/journal.pone.0050604
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
For scenarios with no epistasis: Pearson correlation coefficients between estimated breeding value and true breeding value (rEBV:TBV) or phenotype (rEBV:PHE), obtained through ten-fold cross-validation using Cycle 0 (C0) and through prediction of Cycle 1 (C1), for simulated traits with heritability of 0.1, 0.2, 0.4, and 0.8, with each of the statistical methods.
| Heritability | Cycle | Method | rEBV:TBV ± SE | rEBV:PHE ± SE |
|---|---|---|---|---|
| h² = 0.1 | C0 | RR-BLUP | 0.474±0.015 | 0.174±0.019 |
| | C0 | BayesA | 0.451±0.016 | 0.170±0.023 |
| | C0 | BayesB | 0.475±0.015 | 0.180±0.020 |
| | C0 | RKHS-M | 0.350±0.021 | 0.103±0.018 |
| | C0 | pRKHS-E | 0.422±0.019 | 0.189±0.005 |
| | C0 | pRKHS-NE | 0.480±0.016 | 0.192±0.013 |
| | C1 | RR-BLUP | 0.329±0.017 | 0.127±0.018 |
| | C1 | BayesA | 0.307±0.020 | 0.124±0.017 |
| | C1 | BayesB | 0.338±0.017 | 0.134±0.018 |
| | C1 | RKHS-M | 0.252±0.014 | 0.066±0.016 |
| | C1 | pRKHS-E | 0.342±0.023 | 0.121±0.026 |
| | C1 | pRKHS-NE | 0.382±0.016 | 0.155±0.019 |
| h² = 0.2 | C0 | RR-BLUP | 0.572±0.019 | 0.235±0.018 |
| | C0 | BayesA | 0.568±0.015 | 0.230±0.014 |
| | C0 | BayesB | 0.582±0.018 | 0.244±0.010 |
| | C0 | RKHS-M | 0.442±0.012 | 0.179±0.018 |
| | C0 | pRKHS-E | 0.494±0.011 | 0.248±0.013 |
| | C0 | pRKHS-NE | 0.599±0.018 | 0.254±0.010 |
| | C1 | RR-BLUP | 0.470±0.019 | 0.289±0.015 |
| | C1 | BayesA | 0.431±0.010 | 0.265±0.011 |
| | C1 | BayesB | 0.479±0.020 | 0.298±0.013 |
| | C1 | RKHS-M | 0.363±0.018 | 0.235±0.019 |
| | C1 | pRKHS-E | 0.341±0.018 | 0.180±0.011 |
| | C1 | pRKHS-NE | 0.450±0.019 | 0.257±0.015 |
| h² = 0.4 | C0 | RR-BLUP | 0.785±0.014 | 0.421±0.018 |
| | C0 | BayesA | 0.697±0.017 | 0.354±0.016 |
| | C0 | BayesB | 0.799±0.016 | 0.427±0.015 |
| | C0 | RKHS-M | 0.614±0.017 | 0.352±0.012 |
| | C0 | pRKHS-E | 0.756±0.017 | 0.395±0.011 |
| | C0 | pRKHS-NE | 0.789±0.020 | 0.388±0.014 |
| | C1 | RR-BLUP | 0.614±0.013 | 0.425±0.011 |
| | C1 | BayesA | 0.529±0.013 | 0.361±0.015 |
| | C1 | BayesB | 0.622±0.013 | 0.433±0.020 |
| | C1 | RKHS-M | 0.535±0.022 | 0.384±0.023 |
| | C1 | pRKHS-E | 0.513±0.016 | 0.381±0.016 |
| | C1 | pRKHS-NE | 0.574±0.018 | 0.402±0.017 |
| h² = 0.8 | C0 | RR-BLUP | 0.827±0.009 | 0.729±0.006 |
| | C0 | BayesA | 0.763±0.012 | 0.673±0.004 |
| | C0 | BayesB | 0.831±0.009 | 0.735±0.008 |
| | C0 | RKHS-M | 0.768±0.011 | 0.698±0.009 |
| | C0 | pRKHS-E | 0.678±0.016 | 0.686±0.012 |
| | C0 | pRKHS-NE | 0.815±0.014 | 0.675±0.012 |
| | C1 | RR-BLUP | 0.744±0.012 | 0.674±0.014 |
| | C1 | BayesA | 0.664±0.021 | 0.601±0.022 |
| | C1 | BayesB | 0.752±0.011 | 0.682±0.013 |
| | C1 | RKHS-M | 0.675±0.010 | 0.620±0.010 |
| | C1 | pRKHS-E | 0.633±0.008 | 0.571±0.011 |
| | C1 | pRKHS-NE | 0.734±0.010 | 0.613±0.009 |
Average correlations ± SE were obtained from thirty replications of each simulation.
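Each cell in these tables summarizes thirty replicate correlations. The sketch below shows the two computations involved, assuming that "± SE" is the standard error of the mean across replicates (sample SD divided by the square root of the number of replicates); the note does not state the formula, and the function names and numbers are illustrative only.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between, e.g., estimated and true breeding values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_and_se(rs: Sequence[float]) -> Tuple[float, float]:
    """Mean of replicate correlations and its standard error (assumed SD / sqrt(n))."""
    return mean(rs), stdev(rs) / math.sqrt(len(rs))


# One replicate: correlate EBVs with TBVs (made-up values).
ebv = [0.2, 0.5, 0.1, 0.9]
tbv = [0.3, 0.4, 0.2, 1.0]
r_one = pearson(ebv, tbv)

# Thirty replicates would give thirty r values; three made-up ones are shown here.
m, se = mean_and_se([0.46, 0.47, 0.49])
```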
For scenarios with a low level of epistasis (10% of the epistatic interaction effects are nonzero): Pearson correlation coefficients between estimated breeding value and true breeding value (rEBV:TBV) or phenotype (rEBV:PHE), obtained through ten-fold cross-validation using Cycle 0 (C0) and through prediction of Cycle 1 (C1), for simulated traits with heritability of 0.1, 0.2, 0.4, and 0.8, with each of the statistical methods.
| Heritability | Cycle | Method | rEBV:TBV ± SE | rEBV:PHE ± SE |
|---|---|---|---|---|
| h² = 0.1 | C0 | RR-BLUP | 0.418±0.015 | 0.144±0.009 |
| | C0 | BayesA | 0.402±0.015 | 0.134±0.008 |
| | C0 | BayesB | 0.421±0.014 | 0.143±0.009 |
| | C0 | RKHS-M | 0.257±0.012 | 0.089±0.008 |
| | C0 | pRKHS-E | 0.433±0.012 | 0.169±0.018 |
| | C0 | pRKHS-NE | 0.419±0.015 | 0.142±0.015 |
| | C1 | RR-BLUP | 0.369±0.017 | 0.164±0.010 |
| | C1 | BayesA | 0.340±0.019 | 0.153±0.011 |
| | C1 | BayesB | 0.367±0.018 | 0.163±0.010 |
| | C1 | RKHS-M | 0.258±0.018 | 0.100±0.008 |
| | C1 | pRKHS-E | 0.394±0.021 | 0.168±0.005 |
| | C1 | pRKHS-NE | 0.358±0.017 | 0.159±0.006 |
| h² = 0.2 | C0 | RR-BLUP | 0.535±0.011 | 0.228±0.019 |
| | C0 | BayesA | 0.518±0.008 | 0.234±0.016 |
| | C0 | BayesB | 0.536±0.011 | 0.235±0.018 |
| | C0 | RKHS-M | 0.435±0.014 | 0.186±0.016 |
| | C0 | pRKHS-E | 0.542±0.010 | 0.237±0.015 |
| | C0 | pRKHS-NE | 0.540±0.010 | 0.245±0.019 |
| | C1 | RR-BLUP | 0.512±0.015 | 0.313±0.016 |
| | C1 | BayesA | 0.479±0.014 | 0.267±0.014 |
| | C1 | BayesB | 0.514±0.015 | 0.315±0.016 |
| | C1 | RKHS-M | 0.413±0.010 | 0.234±0.015 |
| | C1 | pRKHS-E | 0.484±0.014 | 0.336±0.006 |
| | C1 | pRKHS-NE | 0.481±0.014 | 0.326±0.011 |
| h² = 0.4 | C0 | RR-BLUP | 0.688±0.007 | 0.444±0.008 |
| | C0 | BayesA | 0.632±0.009 | 0.421±0.003 |
| | C0 | BayesB | 0.687±0.006 | 0.438±0.008 |
| | C0 | RKHS-M | 0.569±0.011 | 0.358±0.018 |
| | C0 | pRKHS-E | 0.696±0.009 | 0.448±0.008 |
| | C0 | pRKHS-NE | 0.681±0.011 | 0.434±0.008 |
| | C1 | RR-BLUP | 0.606±0.017 | 0.377±0.015 |
| | C1 | BayesA | 0.535±0.008 | 0.327±0.010 |
| | C1 | BayesB | 0.600±0.020 | 0.372±0.017 |
| | C1 | RKHS-M | 0.503±0.013 | 0.320±0.011 |
| | C1 | pRKHS-E | 0.605±0.021 | 0.372±0.020 |
| | C1 | pRKHS-NE | 0.615±0.016 | 0.384±0.015 |
| h² = 0.8 | C0 | RR-BLUP | 0.802±0.001 | 0.692±0.002 |
| | C0 | BayesA | 0.734±0.003 | 0.633±0.006 |
| | C0 | BayesB | 0.816±0.004 | 0.699±0.006 |
| | C0 | RKHS-M | 0.776±0.004 | 0.698±0.007 |
| | C0 | pRKHS-E | 0.809±0.012 | 0.694±0.012 |
| | C0 | pRKHS-NE | 0.821±0.007 | 0.701±0.010 |
| | C1 | RR-BLUP | 0.770±0.013 | 0.690±0.012 |
| | C1 | BayesA | 0.710±0.012 | 0.634±0.011 |
| | C1 | BayesB | 0.787±0.013 | 0.705±0.011 |
| | C1 | RKHS-M | 0.751±0.014 | 0.689±0.014 |
| | C1 | pRKHS-E | 0.775±0.013 | 0.693±0.010 |
| | C1 | pRKHS-NE | 0.797±0.014 | 0.712±0.012 |
Average correlations ± SE were obtained from thirty replications of each simulation.
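The simulated scenarios differ in heritability and in the fraction of epistatic interaction effects that are nonzero (none, 10%, or 50%). A toy generator for such a trait is sketched below. It assumes 0/1/2 genotype codes, standard-normal additive and pairwise interaction effects, interactions acting on the product of two genotype codes, and h² taken as var(G)/var(P) with G including the epistatic part. None of these choices is taken from the paper's simulation design, and `simulate_trait` is a hypothetical name.

```python
import random
from statistics import pvariance
from typing import List, Tuple


def simulate_trait(X: List[List[int]], qtl: List[int], frac_epistatic: float,
                   h2: float, seed: int = 1) -> Tuple[List[float], List[float]]:
    """Return (genetic values, phenotypes) for a toy additive-plus-epistatic trait."""
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    rng = random.Random(seed)
    additive = {j: rng.gauss(0.0, 1.0) for j in qtl}  # assumed N(0, 1) effects
    pairs = [(a, b) for i, a in enumerate(qtl) for b in qtl[i + 1:]]
    # Exactly this fraction of the pairwise interaction effects is nonzero.
    n_nonzero = round(frac_epistatic * len(pairs))
    epistatic = {pair: rng.gauss(0.0, 1.0) for pair in rng.sample(pairs, n_nonzero)}
    g = []
    for row in X:
        value = sum(additive[j] * row[j] for j in qtl)
        value += sum(e * row[a] * row[b] for (a, b), e in epistatic.items())
        g.append(value)
    # Residual variance chosen so that var(G) / (var(G) + var(E)) = h2.
    var_e = pvariance(g) * (1.0 - h2) / h2
    y = [gi + rng.gauss(0.0, var_e ** 0.5) for gi in g]
    return g, y


rng = random.Random(0)
X = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(2000)]
g, y = simulate_trait(X, qtl=[0, 3, 7, 11, 15], frac_epistatic=0.1, h2=0.4)
```

With many lines, the realized ratio var(G)/var(P) lands close to the requested h², up to sampling noise.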
For scenarios with a moderate level of epistasis (50% of the epistatic interaction effects are nonzero): Pearson correlation coefficients between estimated breeding value and true breeding value (rEBV:TBV) or phenotype (rEBV:PHE), obtained through ten-fold cross-validation using Cycle 0 (C0) and through prediction of Cycle 1 (C1), for simulated traits with heritability of 0.1, 0.2, 0.4, and 0.8, with each of the statistical methods.
| Heritability | Cycle | Method | rEBV:TBV ± SE | rEBV:PHE ± SE |
|---|---|---|---|---|
| h² = 0.1 | C0 | RR-BLUP | 0.372±0.021 | 0.175±0.022 |
| | C0 | BayesA | 0.363±0.020 | 0.158±0.023 |
| | C0 | BayesB | 0.336±0.016 | 0.141±0.015 |
| | C0 | RKHS-M | 0.173±0.018 | 0.119±0.013 |
| | C0 | pRKHS-E | 0.382±0.020 | 0.203±0.020 |
| | C0 | pRKHS-NE | 0.363±0.018 | 0.171±0.021 |
| | C1 | RR-BLUP | 0.309±0.013 | 0.182±0.011 |
| | C1 | BayesA | 0.327±0.019 | 0.192±0.009 |
| | C1 | BayesB | 0.298±0.019 | 0.188±0.010 |
| | C1 | RKHS-M | 0.157±0.015 | 0.139±0.008 |
| | C1 | pRKHS-E | 0.328±0.013 | 0.194±0.012 |
| | C1 | pRKHS-NE | 0.298±0.010 | 0.176±0.011 |
| h² = 0.2 | C0 | RR-BLUP | 0.487±0.022 | 0.172±0.020 |
| | C0 | BayesA | 0.444±0.022 | 0.175±0.017 |
| | C0 | BayesB | 0.507±0.024 | 0.184±0.025 |
| | C0 | RKHS-M | 0.331±0.026 | 0.192±0.024 |
| | C0 | pRKHS-E | 0.512±0.030 | 0.254±0.023 |
| | C0 | pRKHS-NE | 0.492±0.024 | 0.230±0.021 |
| | C1 | RR-BLUP | 0.416±0.020 | 0.282±0.011 |
| | C1 | BayesA | 0.408±0.017 | 0.256±0.010 |
| | C1 | BayesB | 0.416±0.008 | 0.299±0.011 |
| | C1 | RKHS-M | 0.295±0.011 | 0.214±0.005 |
| | C1 | pRKHS-E | 0.441±0.018 | 0.303±0.010 |
| | C1 | pRKHS-NE | 0.435±0.014 | 0.286±0.008 |
| h² = 0.4 | C0 | RR-BLUP | 0.526±0.016 | 0.263±0.015 |
| | C0 | BayesA | 0.520±0.015 | 0.261±0.021 |
| | C0 | BayesB | 0.557±0.017 | 0.300±0.019 |
| | C0 | RKHS-M | 0.427±0.017 | 0.306±0.021 |
| | C0 | pRKHS-E | 0.603±0.016 | 0.347±0.031 |
| | C0 | pRKHS-NE | 0.551±0.018 | 0.333±0.023 |
| | C1 | RR-BLUP | 0.504±0.022 | 0.311±0.018 |
| | C1 | BayesA | 0.462±0.017 | 0.285±0.014 |
| | C1 | BayesB | 0.511±0.021 | 0.315±0.017 |
| | C1 | RKHS-M | 0.347±0.021 | 0.267±0.014 |
| | C1 | pRKHS-E | 0.525±0.016 | 0.390±0.015 |
| | C1 | pRKHS-NE | 0.463±0.014 | 0.344±0.014 |
| h² = 0.8 | C0 | RR-BLUP | 0.680±0.009 | 0.407±0.007 |
| | C0 | BayesA | 0.599±0.008 | 0.324±0.009 |
| | C0 | BayesB | 0.697±0.011 | 0.420±0.008 |
| | C0 | RKHS-M | 0.584±0.011 | 0.561±0.012 |
| | C0 | pRKHS-E | 0.706±0.008 | 0.535±0.001 |
| | C0 | pRKHS-NE | 0.660±0.009 | 0.480±0.010 |
| | C1 | RR-BLUP | 0.612±0.013 | 0.298±0.029 |
| | C1 | BayesA | 0.596±0.014 | 0.283±0.031 |
| | C1 | BayesB | 0.637±0.023 | 0.320±0.028 |
| | C1 | RKHS-M | 0.475±0.022 | 0.308±0.053 |
| | C1 | pRKHS-E | 0.638±0.017 | 0.418±0.046 |
| | C1 | pRKHS-NE | 0.618±0.020 | 0.281±0.036 |
Average correlations ± SE were obtained from thirty replications of each simulation.
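The C0 results in these tables rely on ten-fold cross-validation. The sketch below shows one common way to build the folds, assuming lines are shuffled once with a fixed seed and dealt into ten near-equal folds; the paper's exact fold construction is not given here, and the function names are hypothetical. Each line is predicted exactly once, by a model fitted to the other nine folds.

```python
import random
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle line indices once, then deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_splits(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, held-out indices) for each of the k folds."""
    for test in kfold_indices(n, k, seed):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test


folds = kfold_indices(95, k=10)
```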
For each pRKHS scenario: the percentage of the total variation explained by the top three SPCs (%P1, %P2, and %P3), the number of influential markers included in each of these SPCs (MP1, MP2, and MP3), and the number of SPC interactions at three cosine thresholds.
| Scenario | %P1 | %P2 | %P3 | MP1 | MP2 | MP3 | SPC interactions (cosine > 0.2) | SPC interactions (cosine > 0.25) | SPC interactions (cosine > 0.3) |
|---|---|---|---|---|---|---|---|---|---|
| | 10.4–15.1 | 5.8–11.1 | 5.3–9.0 | 83–127 | 61–104 | 43–86 | 5–12 | 1–5 | 0–3 |
| | 12.2–17.7 | 5.5–10.8 | 5.2–8.1 | 124–136 | 59–71 | 56–85 | 3–11 | 1–6 | 0–4 |
| | 9.3–14.9 | 6.8–11.7 | 5.9–9.9 | 67–111 | 59–90 | 56–96 | 4–16 | 1–6 | 0–3 |
| | 10.4–15.3 | 5.8–11.0 | 5.3–9.1 | 76–124 | 53–89 | 48–87 | 5–20 | 1–7 | 0–1 |
| | 11.1–17.7 | 6.1–9.7 | 5.4–8.4 | 105–130 | 55–98 | 50–92 | 4–16 | 1–5 | 0–3 |
| | 11.9–16.5 | 5.6–11.8 | 5.1–8.2 | 110–125 | 66–85 | 43–88 | 4–12 | 2–7 | 1–4 |
| | 9.2–13.7 | 6.0–10.4 | 5.8–9.4 | 61–122 | 62–111 | 53–102 | 5–18 | 1–6 | 1–5 |
| | 11.2–13.0 | 5.6–10.6 | 5.1–9.5 | 69–118 | 54–77 | 44–94 | 6–20 | 2–8 | 1–5 |
| | 10.5–14.3 | 5.7–9.8 | 5.1–7.8 | 75–120 | 57–86 | 48–103 | 5–18 | 2–7 | 1–2 |
| | 12.0–18.7 | 6.4–10.2 | 5.7–7.0 | 131–137 | 54–95 | 37–71 | 3–17 | 2–7 | 1–4 |
| | 12.1–18.5 | 5.5–11.7 | 5.0–7.2 | 122–129 | 83–99 | 41–74 | 5–18 | 3–7 | 1–5 |
| | 11.2–18.3 | 5.8–10.5 | 5.1–8.6 | 76–126 | 52–107 | 45–96 | 6–21 | 2–9 | 2–5 |
Values are the lows and highs obtained across marker subsets ranging from 500 markers to all markers. Larger cosine values correspond to smaller p-values.
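Counting SPC interactions at the three cosine thresholds can be illustrated as below. This sketch assumes the diagnostic is the cosine between each fitted interaction term and the centered response, in the spirit of cosine diagnostics for smoothing spline ANOVA fits; the paper's exact definition may differ, and the component names and fitted values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def count_interactions(components: Dict[str, Sequence[float]], y: Sequence[float],
                       thresholds: Tuple[float, ...] = (0.2, 0.25, 0.3)) -> Dict[float, int]:
    """For each threshold, count interaction terms whose cosine with centered y exceeds it."""
    my = sum(y) / len(y)
    yc = [v - my for v in y]
    cos_vals = [cosine(vec, yc) for vec in components.values()]
    return {t: sum(1 for c in cos_vals if c > t) for t in thresholds}


y = [1.0, 2.0, 3.0, 4.0]
components = {  # hypothetical fitted values of three SPC-by-SPC interaction terms
    "SPC1:SPC2": [-1.5, -0.5, 0.5, 1.5],  # cosine 1.0
    "SPC1:SPC3": [1.0, -1.0, -1.0, 1.0],  # cosine 0.0
    "SPC2:SPC3": [2.5, -4.5, -3.5, 5.5],  # cosine about 0.269
}
counts = count_interactions(components, y)
```

A stricter (larger) threshold keeps fewer interactions, which matches the decreasing counts across the last three columns of the table.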
Figure 1. Mean percentage of variation (across the 12 simulation scenarios) explained by the top 18 SPCs with pRKHS, which together explain 70% of the total variation.
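Figure 1 and the %PC columns of the real-data tables describe the retained SPCs by the cumulative share of variation they explain. A minimal sketch, assuming the retained set is the smallest number of leading components whose variances (eigenvalues) reach a target fraction such as 70%; `n_spcs_for` is a hypothetical name, and the paper may set the target differently.

```python
from typing import Sequence


def n_spcs_for(variances: Sequence[float], target: float = 0.70) -> int:
    """Smallest k such that the k largest component variances reach `target` of the total."""
    total = sum(variances)
    cumulative = 0.0
    for k, v in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += v
        if cumulative / total >= target:
            return k
    return len(variances)
```

For component variances 5, 3, 1, 1 the cumulative shares are 50%, 80%, 90%, 100%, so a 70% target keeps two components.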
Application to real data: Pearson correlation coefficients between estimated breeding value (EBV) and phenotype, obtained from five-fold cross-validation (CV), for maize anthesis-silking interval (ASI) and grain yield (GY) with each of the six statistical methods.
| Trait | CV | Method | Marker number | %PC | Correlation |
|---|---|---|---|---|---|
| ASI | CV | RR-BLUP | | | 0.495 |
| | CV | BayesA | | | 0.388 |
| | CV | BayesB | | | 0.495 |
| | CV | RKHS-M | | | 0.554 |
| | CV | pRKHS-E | 700 | 70% | 0.520 |
| | CV | pRKHS-NE | 600 | 65% | 0.526 |
| GY | CV | RR-BLUP | | | 0.423 |
| | CV | BayesA | | | 0.392 |
| | CV | BayesB | | | 0.421 |
| | CV | RKHS-M | | | 0.447 |
| | CV | pRKHS-E | 1000 | 75% | 0.422 |
| | CV | pRKHS-NE | 900 | 65% | 0.425 |
For the pRKHS methods, the optimal number of markers contributing to phenotypic variation and the percentage of variation explained by the included SPCs are shown; results were averaged across five repeated fittings. The optimal cosine value for pRKHS-E was 0.3 across all datasets.
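The abstract states that the optimal marker set is determined by cross-validation. One plausible way to arrive at values like those in the Marker number and %PC columns is a grid search scored by cross-validated correlation, sketched below under that assumption. `select_settings` is hypothetical, and `toy_cv_score` is a made-up stand-in for a full cross-validation run.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def select_settings(cv_score: Callable[[int, float], float],
                    marker_grid: Iterable[int],
                    pc_grid: Iterable[float]) -> Optional[Tuple[int, float, float]]:
    """Return (marker count, SPC variance fraction, score) with the highest CV score."""
    best = None
    for m, pc in product(marker_grid, pc_grid):
        score = cv_score(m, pc)  # e.g. mean CV correlation for this setting
        if best is None or score > best[2]:
            best = (m, pc, score)
    return best


def toy_cv_score(m: int, pc: float) -> float:
    """Made-up score surface that peaks at 700 markers and 70% variance."""
    return -((m - 700) ** 2) / 1e6 - (pc - 0.70) ** 2


best = select_settings(toy_cv_score, [500, 600, 700, 800], [0.65, 0.70, 0.75])
```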
Application to real data: Pearson correlation coefficients between estimated breeding value (EBV) and phenotype for barley grain yield (GYD) and plant height (PHT) with each of the six statistical methods. Values for 2007 were obtained from ten-fold CV using the genotypes and phenotypes of the 2007 barley lines; values for 2008 and 2009 come from genotype-based prediction of different lines evaluated in those years.
| Trait | Year | Method | Marker number | %PC | Correlation |
|---|---|---|---|---|---|
| GYD | 2007 | RR-BLUP | | | 0.449 |
| | 2007 | BayesA | | | 0.448 |
| | 2007 | BayesB | | | 0.510 |
| | 2007 | RKHS-M | | | 0.260 |
| | 2007 | pRKHS-E | 1500 | 70% | 0.438 |
| | 2007 | pRKHS-NE | 800 | 75% | 0.538 |
| | 2008 | RR-BLUP | | | 0.104 |
| | 2008 | BayesA | | | 0.073 |
| | 2008 | BayesB | | | 0.108 |
| | 2008 | RKHS-M | | | -0.009 |
| | 2008 | pRKHS-E | 1500 | 70% | 0.295 |
| | 2008 | pRKHS-NE | 800 | 75% | 0.188 |
| | 2009 | RR-BLUP | | | 0.052 |
| | 2009 | BayesA | | | 0.085 |
| | 2009 | BayesB | | | 0.047 |
| | 2009 | RKHS-M | | | 0.130 |
| | 2009 | pRKHS-E | 1500 | 70% | -0.081 |
| | 2009 | pRKHS-NE | 800 | 75% | 0.148 |
| PHT | 2007 | RR-BLUP | | | 0.447 |
| | 2007 | BayesA | | | 0.446 |
| | 2007 | BayesB | | | 0.460 |
| | 2007 | RKHS-M | | | 0.514 |
| | 2007 | pRKHS-E | 1000 | 75% | 0.465 |
| | 2007 | pRKHS-NE | 1000 | 75% | 0.520 |
| | 2008 | RR-BLUP | | | -0.015 |
| | 2008 | BayesA | | | -0.006 |
| | 2008 | BayesB | | | -0.049 |
| | 2008 | RKHS-M | | | -0.083 |
| | 2008 | pRKHS-E | 1000 | 75% | 0.084 |
| | 2008 | pRKHS-NE | 1000 | 75% | 0.062 |
| | 2009 | RR-BLUP | | | 0.076 |
| | 2009 | BayesA | | | 0.111 |
| | 2009 | BayesB | | | 0.107 |
| | 2009 | RKHS-M | | | 0.191 |
| | 2009 | pRKHS-E | 1000 | 75% | 0.203 |
| | 2009 | pRKHS-NE | 1000 | 75% | 0.222 |
For the pRKHS methods, the optimal number of markers contributing to phenotypic variation and the percentage of variation explained by the included SPCs are shown; results were averaged across five repeated fittings. The optimal cosine value for pRKHS-E was 0.3 across all datasets.
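The across-year barley setting fits a model to one set of lines (2007) and predicts lines it has never seen (2008 and 2009). The sketch below illustrates that fit-then-predict pattern with Gaussian-kernel ridge regression, a simple form of RKHS regression. It is a swapped-in stand-in, not the smoothing spline ANOVA model used by pRKHS; the kernel, `lam`, `bandwidth`, and all function names are assumptions, and the inputs would in practice be SPC scores rather than the toy one-dimensional points used here.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, bandwidth: float) -> float:
    """k(a, b) = exp(-||a - b||^2 / bandwidth)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / bandwidth)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented copy; A is not modified
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_kernel_ridge(X_train: List[Vector], y_train: List[float], lam: float = 1.0,
                     bandwidth: float = 1.0) -> Callable[[List[Vector]], List[float]]:
    """Fit f(x) = mu + sum_i alpha_i k(x_i, x), where (K + lam I) alpha = y - mu."""
    mu = sum(y_train) / len(y_train)
    n = len(X_train)
    A = [[gaussian_kernel(X_train[i], X_train[j], bandwidth) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(A, [v - mu for v in y_train])

    def predict(X_new: List[Vector]) -> List[float]:
        return [mu + sum(a * gaussian_kernel(x, xt, bandwidth)
                         for a, xt in zip(alpha, X_train)) for x in X_new]

    return predict


# Fit on "training-year" lines, then predict new lines.
train_X = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 2.0, 3.0]
predict = fit_kernel_ridge(train_X, train_y, lam=1e-8)
```

Larger `lam` shrinks predictions toward the training mean, and lines far from every training line also receive roughly the mean, one intuition for why across-year correlations can be low.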