Nina Hofheinz, Matthias Frisch.
Abstract
Ridge regression with heteroscedastic marker variances provides an alternative to Bayesian genome-wide prediction methods. Our objectives were to suggest new methods to determine marker-specific shrinkage factors for heteroscedastic ridge regression and to investigate their properties with respect to computational efficiency and accuracy of estimated effects. We analyzed published data sets of maize, wheat, and sugar beet as well as simulated data with the new methods. Ridge regression with shrinkage factors that were proportional to single-marker analysis of variance estimates of variance components (i.e., RRWA) was the fastest method. It required computation times of less than 1 sec for medium-sized data sets, which have dimensions that are common in plant breeding. A modification of the expectation-maximization algorithm that yields heteroscedastic marker variances (i.e., RMLV) resulted in the most accurate marker effect estimates. It outperformed the homoscedastic ridge regression approach for best linear unbiased prediction in particular for situations with high marker density and strong linkage disequilibrium along the chromosomes, a situation that occurs often in plant breeding populations. We conclude that the RRWA and RMLV approaches provide alternatives to the commonly used Bayesian methods, in particular for applications in which computational feasibility or accuracy of effect estimates are important, such as detection or functional analysis of genes or planning crosses.
Keywords: GenPred; Shared data resources; genome-wide prediction; heteroscedastic marker variances; linkage disequilibrium; plant breeding populations; ridge regression
Year: 2014 PMID: 24449687 PMCID: PMC3962491 DOI: 10.1534/g3.113.010025
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
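The heteroscedastic ridge regression described in the abstract can be illustrated with a minimal Python sketch. It solves the ridge equations (X'X + diag(λ)) β = X'y on centered data, where every marker j has its own shrinkage factor λ_j. The function names (`hetero_ridge`, `shrinkage_from_weights`), the toy genotype scores, and the rule that converts per-marker weights and a preliminary heritability `h2` into shrinkage factors are assumptions made for illustration and are not taken from the authors' implementation. Only the standard library is used so that the sketch runs without extra packages.

```python
def center(values):
    """Subtract the mean so that no explicit intercept is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hetero_ridge(x_cols, y, lambdas):
    """Marker effects from (X'X + diag(lambdas)) beta = X'y on centered data.

    x_cols: one list of genotype scores per marker.
    lambdas: one shrinkage factor per marker (larger = stronger shrinkage).
    """
    xc = [center(col) for col in x_cols]
    yc = center(y)
    m = len(xc)
    xtx = [[sum(p * q for p, q in zip(xc[i], xc[j])) for j in range(m)]
           for i in range(m)]
    for j in range(m):
        xtx[j][j] += lambdas[j]  # marker-specific ridge penalty
    xty = [sum(p * q for p, q in zip(col, yc)) for col in xc]
    return solve(xtx, xty)


def shrinkage_from_weights(weights, h2, floor=1e-12):
    """Assumed rule: lambda_j = sigma_e^2 / sigma_j^2, where marker j receives
    the share w_j / sum(w) of the genetic variance implied by h2.
    Variances are expressed relative to the phenotypic variance."""
    total = sum(weights)
    return [(1.0 - h2) * total / (h2 * max(w, floor)) for w in weights]


if __name__ == "__main__":
    # Hypothetical toy data: 6 individuals, 2 markers, y driven by marker 1.
    x1 = [0, 1, 2, 0, 1, 2]
    x2 = [0, 0, 1, 1, 2, 2]
    y = [1, 3, 5, 1, 3, 5]
    print(hetero_ridge([x1, x2], y, [4.0, 4.0]))   # equal shrinkage
    print(hetero_ridge([x1, x2], y, [0.5, 50.0]))  # marker-specific shrinkage
```

With equal shrinkage factors the sketch reduces to ordinary (homoscedastic) ridge regression. Giving a marker a smaller factor lets its estimated effect stay closer to the least-squares value, while a large factor pulls the effect toward zero.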
Summary of GWP approaches organized by the assumption of marker variances in the present study
| Approach | Homoscedastic marker variances | Heteroscedastic marker variances | Reference/R Package |
|---|---|---|---|
| BLUP | x | | |
| rrBLUPM6 | x | | |
| RIR | x | | |
| BL | | x | |
| HEM | | x | |
| RMLA | | x | New approach |
| RMLV | | x | New approach |
| RRWA | | x | New approach |
GWP, genome-wide prediction; BLUP, best linear unbiased prediction; RIR, ridge regression employing preliminary estimates of the heritability; BL, Bayesian LASSO; HEM, heteroscedastic effects model; RMLA, estimation of the error and genetic variance components with restricted maximum likelihood and partitioning according to analysis of variance components; RMLV, modification of the restricted maximum likelihood procedure that yields heteroscedastic variances; RRWA, ridge regression with weighing factors according to analysis of variance components.
Computing time (sec) required for the estimation of marker effects with different GWP approaches
| Data set | RIR | BLUP | rrBLUPM6 | RMLV | RRWA | RMLA | BL | HEM |
|---|---|---|---|---|---|---|---|---|
| Marker variances | Homoscedastic | Homoscedastic | Homoscedastic | Heteroscedastic | Heteroscedastic | Heteroscedastic | Heteroscedastic | Heteroscedastic |
| Simulated data, 500 individuals | | | | | | | | |
| 330 markers | 0.03 | 0.16 | 0.91 | 5.07 | 0.05 | 0.16 | 5.14 | 39.92 |
| 810 markers | 0.05 | 3.18 | 1.55 | 50.30 | 0.13 | 3.38 | 7.99 | 49.56 |
| 1610 markers | 0.23 | 32.11 | 1.68 | 330.60 | 0.30 | 28.22 | 11.77 | 63.65 |
| 1135 SNP markers | 0.10 | 9.08 | 0.37 | 118.20 | 0.14 | 9.17 | 11.10 | 8.79 |
| 1717 DArT markers | 0.23 | 61.8 | 0.62 | 405.60 | 0.37 | 60.60 | 8.96 | 12.49 |
| 300 SNP markers | 0.01 | 0.12 | 0.35 | 3.72 | 0.04 | 0.11 | 5.51 | 3.69 |
For the maize data set, the trait GY-WW was investigated, for the wheat data set the trait GY, and for the sugar beet data set the trait SC. GWP, genome-wide prediction; RIR, ridge regression employing preliminary estimates of the heritability; BLUP, best linear unbiased prediction; RMLV, modification of the restricted maximum likelihood procedure that yields heteroscedastic variances; RRWA, ridge regression with weighing factors according to analysis of variance components; RMLA, estimation of the error and genetic variance components with restricted maximum likelihood and partitioning according to analysis of variance components; BL, Bayesian LASSO; HEM, heteroscedastic effects model; SNP, single-nucleotide polymorphism; DArT, diversity array technology; GY, grain yield; WW, well-watered; SC, sugar content.
Correlation between observed and predicted phenotypic values determined with cross validation for different traits in the maize, wheat, and sugar beet data sets
| Trait-Environment | BLUP | RMLV | RRWA | RMLA | BL | HEM |
|---|---|---|---|---|---|---|
| Marker variances | Homoscedastic | Heteroscedastic | Heteroscedastic | Heteroscedastic | Heteroscedastic | Heteroscedastic |
| MFL-WW | 0.36 | 0.28 | 0.35 (0.8) | 0.38 | 0.36 | 0.35 |
| MFL-SS | 0.45 | 0.28 | 0.38 (0.8) | 0.39 | 0.45 | 0.44 |
| FFL-WW | 0.31 | 0.27 | 0.32 (0.8) | 0.31 | 0.31 | 0.32 |
| FFL-SS | 0.51 | 0.35 | 0.46 (0.8) | 0.47 | 0.48 | 0.50 |
| ASI-WW | 0.51 | 0.35 | 0.50 (0.8) | 0.52 | 0.51 | 0.47 |
| ASI-SS | 0.51 | 0.35 | 0.44 (0.8) | 0.46 | 0.50 | 0.45 |
| GY-WW | 0.54 | 0.36 | 0.46 (0.9) | 0.50 | 0.54 | 0.52 |
| GY-SS | 0.43 | 0.19 | 0.34 (0.9) | 0.37 | 0.43 | 0.35 |
| GY-average | 0.65 | 0.54 | 0.66 (0.8) | 0.66 | 0.63 | 0.63 |
| DTH-average | 0.59 | 0.41 | 0.57 (0.9) | 0.60 | 0.58 | 0.55 |
| SC | 0.83 | 0.78 | 0.80 (0.9) | 0.80 | 0.83 | 0.82 |
| ML | 0.85 | 0.82 | 0.84 (0.4) | 0.86 | 0.86 | 0.85 |
For the RRWA approach, the preliminary heritability estimates are given in parentheses. BLUP, best linear unbiased prediction; RMLV, modification of the restricted maximum likelihood procedure that yields heteroscedastic variances; RRWA, ridge regression with weighing factors according to analysis of variance components; RMLA, estimation of the error and genetic variance components with restricted maximum likelihood and partitioning according to analysis of variance components; BL, Bayesian LASSO; HEM, heteroscedastic effects model; GY, grain yield; MFL, male flowering; WW, well-watered; SS, severe drought stress; FFL, female flowering; ASI, anthesis-silking interval; DTH, days to heading; SC, sugar content; ML, molasses loss.
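The correlations in the table above come from cross validation, which can be sketched in Python as follows. The sketch splits the individuals into folds, predicts each held-out fold from a model fitted on the remaining individuals, and correlates the pooled predictions with the observed values. The number of folds, the pooling of predictions across folds, the function names, and the single-predictor least-squares line used as a stand-in model are all assumptions for illustration; the source does not state the authors' cross-validation scheme.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_correlation(x, y, fit, predict, k=5, seed=0):
    """Correlation between observed y and out-of-fold predictions."""
    predicted = [None] * len(y)
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            predicted[i] = predict(model, x[i])
    return pearson(y, predicted)


def fit_line(xs, ys):
    """Stand-in model (hypothetical): least-squares line on one predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((v - mx) ** 2 for v in xs)
    slope = sum((v - mx) * (w - my) for v, w in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical toy data: a noisy linear relationship.
    rng = random.Random(1)
    xs = [float(i) for i in range(40)]
    ys = [0.5 * v + rng.gauss(0.0, 3.0) for v in xs]
    print(cv_correlation(xs, ys, fit_line, predict_line, k=5))
```

Any genome-wide prediction method can be plugged in through the `fit` and `predict` arguments, provided that `fit` returns a model object and `predict` maps that model and one individual's predictors to a predicted phenotype.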
Figure 1. Comparison of the estimated marker effects for grain yield (GY) in the wheat data set for the best linear unbiased prediction (BLUP), ridge regression with weighing factors according to analysis of variance components (RRWA), estimation of the error and genetic variance components with restricted maximum likelihood and partitioning according to analysis of variance components (RMLA), modification of the restricted maximum likelihood procedure that yields heteroscedastic variances (RMLV), and heteroscedastic effects model (HEM) approaches.
Figure 2. Marker effects (blue circles) estimated with different GWP approaches in the simulated data set plotted against marker locations [M] for the first chromosome. The positions of the simulated quantitative trait loci are symbolized by open red diamonds, ngen is the number of random intermating generations, and md is the marker distance [cM] of two adjacent markers.