Jeffrey B Endelman, Jean-Luc Jannink.
Abstract
The additive relationship matrix plays an important role in mixed model prediction of breeding values. For genotype matrix X (loci in columns), the product XX' is widely used as a realized relationship matrix, but the scaling of this matrix is ambiguous. Our first objective was to derive a proper scaling such that the mean diagonal element equals 1+f, where f is the inbreeding coefficient of the current population. The result is a formula involving the covariance matrix for sampling genomic loci, which must be estimated with markers. Our second objective was to investigate whether shrinkage estimation of this covariance matrix can improve the accuracy of breeding value (GEBV) predictions with low-density markers. Using an analytical formula for shrinkage intensity that is optimal with respect to mean-squared error, simulations revealed that shrinkage can significantly increase GEBV accuracy in unstructured populations, but only for phenotyped lines; there was no benefit for unphenotyped lines. The accuracy gain from shrinkage increased with heritability, but at high heritability (> 0.6) this benefit was irrelevant because phenotypic accuracy was comparable. These trends were confirmed in a commercial pig population with progeny-test-estimated breeding values. For an anonymous trait where phenotypic accuracy was 0.58, shrinkage increased the average GEBV accuracy from 0.56 to 0.62 (SE < 0.00) when using random sets of 384 markers from a 60K array. We conclude that when moderate-accuracy phenotypes and low-density markers are available for the candidates of genomic selection, shrinkage estimation of the relationship matrix can improve genetic gain.
Keywords: GenPred; Shared Data Resources; breeding value prediction; genomic selection; realized relationship matrix; shrinkage estimation
Year: 2012 PMID: 23173092 PMCID: PMC3484671 DOI: 10.1534/g3.112.004259
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
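The scaling and shrinkage ideas in the abstract can be illustrated with a minimal sketch. Several things here are assumptions, not taken from the record: genotypes are coded as allele dosages 0/1/2, the scaling is the commonly used division by twice the summed p(1 − p) over loci, the shrinkage target is the identity scaled by the mean diagonal, and the intensity `delta` is supplied by the caller because the abstract does not give the analytical formula. The function name `realized_relationship` is hypothetical.

```python
def realized_relationship(X, delta=0.0):
    """Sketch of a scaled, optionally shrunk realized relationship matrix.

    X     : list of n rows (lines), each with m allele dosages coded 0/1/2
            (this coding is an assumption of the sketch).
    delta : shrinkage intensity in [0, 1], chosen by the caller.
    """
    n, m = len(X), len(X[0])
    # Allele frequency at each locus.
    p = [sum(row[k] for row in X) / (2 * n) for k in range(m)]
    # Center each locus so that every column of W sums to zero.
    W = [[row[k] - 2 * p[k] for k in range(m)] for row in X]
    # Scaling constant; zero if every locus is monomorphic, so such
    # loci should be filtered out before calling this function.
    scale = 2 * sum(pk * (1 - pk) for pk in p)
    A = [[sum(a * b for a, b in zip(W[i], W[j])) / scale for j in range(n)]
         for i in range(n)]
    mean_diag = sum(A[i][i] for i in range(n)) / n
    # Shrink toward mean_diag * I: off-diagonals move toward zero while
    # the mean diagonal (and hence the estimate of f) is unchanged.
    return [[(1 - delta) * A[i][j] + (delta * mean_diag if i == j else 0.0)
             for j in range(n)] for i in range(n)]


# Four fully inbred lines (dosages 0 or 2) at four loci.
X = [[0, 2, 0, 2],
     [2, 0, 0, 2],
     [0, 2, 2, 0],
     [2, 0, 2, 0]]
A = realized_relationship(X)
f_hat = sum(A[i][i] for i in range(len(A))) / len(A) - 1  # mean diagonal - 1
A_shrunk = realized_relationship(X, delta=0.3)
```

For fully inbred lines the mean diagonal under this scaling is 2, which corresponds to f = 1; shrinkage leaves the mean diagonal unchanged and only pulls the off-diagonal relationships toward zero.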
Table 1 Populations
| Population | Lines (n) | SNPs | f | 1st PC | CV | Relative to pig (= 1) |
|---|---|---|---|---|---|---|
| Pig | 3534 | 52,843 | 0.03 | 0.06 | 4.7 | 1 |
| Maize | 274 | 44,431 | 0.97 | 0.05 | 1.3 | 1.01 |
| 2-row Barley | 383 | 2398 | 0.95 | 0.08 | 2.6 | 0.35 |
| 2+6-row Barley | 763 | 1884 | 0.97 | 0.32 | 9.0 | 0.06 |
| Rice | 407 | 31,443 | 0.96 | 0.34 | 7.2 | 0.05 |
f: Inbreeding coefficient, estimated from the relationship matrix.
1st PC: Fraction of total variance captured by the first principal component (PC).
CV: Coefficient of variation (1 = 100%) for the eigenvalues of the covariance matrix.
Last column: Quantities are relative to the pig population (= 1).
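The first-PC fraction and the eigenvalue CV described in the Table 1 footnotes can be computed from a list of eigenvalues as in the sketch below. The helper name `eigenvalue_summary` and the example eigenvalues are hypothetical, and the use of the population standard deviation (rather than the sample version) is an assumption, since the footnote does not say which was used.

```python
from statistics import mean, pstdev


def eigenvalue_summary(eigenvalues):
    """Summarize eigenvalue dispersion of a covariance matrix."""
    total = sum(eigenvalues)
    return {
        # Share of total variance carried by the largest eigenvalue.
        "first_pc_fraction": max(eigenvalues) / total,
        # Coefficient of variation: SD / mean (1 = 100%).
        "cv": pstdev(eigenvalues) / mean(eigenvalues),
    }


# Made-up eigenvalues, for illustration only.
summary = eigenvalue_summary([4.0, 2.0, 1.0, 1.0])
```

A population whose variance is concentrated in a few large eigenvalues gives a high CV, which is the dispersion that the Figure 2 caption associates with little shrinkage.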
Figure 3 Maximizing accuracy vs. minimizing MSE. At shrinkage intensities ranging from 0 to 0.7, with 0.05 increments, the relationship matrix was calculated for random sets of 384 markers. In each replicate, the MSE was calculated relative to the full marker relationship matrix ($\mathrm{MSE} = n^{-2}\lVert A_{384} - A_{\mathrm{full}}\rVert^{2}$), and GEBV accuracy was estimated using simulated phenotypes. The two curves (dashed = accuracy, solid = MSE) show the mean from 40 simulations (SE less than 3% of the mean).
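The MSE in the Figure 3 caption can be sketched in a few lines. This assumes the norm is the Frobenius norm (the caption does not name it), so the MSE is the average squared difference over all n × n entries; the function name `relationship_mse` and the toy matrices are hypothetical.

```python
def relationship_mse(A_sub, A_full):
    """Mean squared error between two n x n relationship matrices.

    Computes the squared Frobenius norm of (A_sub - A_full) divided by n^2;
    the choice of the Frobenius norm is an assumption of this sketch.
    """
    n = len(A_full)
    squared_error = sum(
        (a - b) ** 2
        for row_sub, row_full in zip(A_sub, A_full)
        for a, b in zip(row_sub, row_full)
    )
    return squared_error / n ** 2


# Toy 2 x 2 matrices, for illustration only.
A_full = [[2.0, 0.0], [0.0, 2.0]]
A_384 = [[1.5, 0.5], [0.5, 1.5]]
mse = relationship_mse(A_384, A_full)
```

Evaluating this quantity over a grid of shrinkage intensities, as the caption describes, would trace out the solid MSE curve.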
Figure 4 Prediction accuracy for simulated phenotypes in the maize population. The three curves show the difference between GEBV accuracy and phenotypic accuracy as a function of phenotypic accuracy (SE < 0.004 not shown). GEBV accuracy was highest using all markers, followed by 384 SNPs with shrinkage. All three prediction methods peaked when phenotypic accuracy was 0.3, while the accuracy gain due to shrinkage increased monotonically with phenotypic accuracy. Phenotypic accuracies between 0.4 and 0.6 represented a “sweet spot” for shrinkage: in this range, heritability was high enough for shrinkage to substantially improve GEBV accuracy but not so high that phenotypes were more accurate.
Figure 1 Histograms of entries in the realized relationship matrix for the 2-row and 2+6-row barley populations. The diagonal elements have a mean of 1 + f ≈ 2 for inbred lines, while the off-diagonal elements have a mean of −(1 + f)/n ≈ 0. The bimodal distribution of the off-diagonal elements reveals the highly structured nature of the 2+6-row barley population. The positive peak contains relationships between lines with the same row number, while the negative peak is between lines with different row numbers.
Figure 2 Shrinkage intensity to minimize the expected MSE. Each point is the mean from 20 random subsets of markers (SE < 0.01). As expected, the optimal shrinkage decreased as the number of markers increased. There was little shrinkage for the structured populations (rice, 2+6-row barley) because of their high eigenvalue dispersion (see CV in Table 1).
Prediction accuracies for pig traits
| Trait | Heritability | n | Phenotypic Accuracy | GEBV Accuracy, All Markers | GEBV Accuracy, 384 SNP + Shrinkage | GEBV Accuracy, 384 SNP, No Shrinkage |
|---|---|---|---|---|---|---|
| T3 | 0.38 | 3141 | 0.580 | 0.690 | 0.617 (0.002) | 0.561 (0.002) |
| | | 393 | – | 0.465 | 0.370 (0.007) | 0.370 (0.007) |
| T4 | 0.58 | 3152 | 0.751 | 0.809 | 0.718 (0.002) | 0.630 (0.002) |
| | | 382 | – | 0.569 | 0.469 (0.004) | 0.469 (0.004) |
| T5 | 0.62 | 3184 | 0.734 | 0.765 | 0.678 (0.003) | 0.584 (0.003) |
| | | 350 | – | 0.520 | 0.429 (0.012) | 0.429 (0.012) |
Heritability reported by Cleveland.
Accuracy = correlation with progeny-test-estimated breeding values.
Genomic-estimated breeding values (GEBV) calculated using all phenotyped individuals.
Within each trait, the top row is for individuals with a measured phenotype; the bottom row is for individuals without a phenotype.
Mean and SE based on 20 random sets of 384 markers.