Sidi Boubacar Ould Estaghvirou, Joseph O Ogutu, Torben Schulz-Streeck, Carsten Knaak, Milena Ouzunova, Andres Gordillo, Hans-Peter Piepho.
Abstract
BACKGROUND: In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. This quantity cannot be computed directly for real datasets, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. To estimate predictive accuracy indirectly, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four of which (1 to 4) use an estimate of heritability to divide predictive ability computed by cross-validation. Between them, the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy, and we compare their performance with that of four existing approaches (three indirect (1 to 3) and one direct (7)), using simulated true predictive accuracy as the benchmark and also comparing the methods with each other.
Year: 2013 PMID: 24314298 PMCID: PMC3879103 DOI: 10.1186/1471-2164-14-860
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
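The indirect estimate described in the abstract (predictive ability divided by the square root of a heritability estimate) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `pearson` and `indirect_accuracy`, the toy numbers, and the choice to return `None` when the heritability estimate is zero are assumptions made here, not details taken from the paper. The truncation to the interval [0, 1] follows the convention stated in the table and figure captions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def indirect_accuracy(predicted, observed, h2):
    """Predictive ability divided by sqrt(h2), truncated to [0, 1].

    Returns None when h2 <= 0 (an assumption of this sketch: the
    ratio is undefined for a zero heritability estimate).
    """
    if h2 <= 0:
        return None
    ability = pearson(predicted, observed)  # predictive ability
    return min(1.0, max(0.0, ability / math.sqrt(h2)))

# Hypothetical toy values, not data from the study.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [1.2, 1.9, 3.4, 3.8, 5.1]
acc = indirect_accuracy(predicted, observed, h2=0.5)
```

Because the ratio is not bounded by 1, a high predictive ability combined with a low heritability estimate yields a value above 1, which is why the truncation step matters in practice.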
The variance components for the AgReliant real maize data set estimated by RR-BLUP models assuming genotypes are correlated according to the linear variance model
| Marker ( | 0.2019 | 0.2019/10 |
| Block ( | 69.9089 | 69.9089 |
| Residual ( | 48.6728 | 48.6728 |
The variance components for the KWS-Synbreed real maize data set estimated by RR-BLUP models assuming genotypes are correlated according to the linear variance model
| Marker ( | 0.005892 | 0.005892/10 |
| Trial ( | 11.8285 | 11.8285 |
| Trial×Replicate ( | 3.3231 | 3.3231 |
| Trial×Replicate×Block ( | 6.3148 | 6.3148 |
| Tester×Non-genotyped | 34.5717 | 34.5717 |
| Residual ( | 53.8715 | 53.8715 |
Definition of the variables in the KWS-Synbreed dataset used to compute covariance parameters used in the simulations for Scenarios 3 and 4
| Tester | GRP | Z1 | Z2 | GENA | GENB | Description |
|---|---|---|---|---|---|---|
| T0 | C1-C6 | 0 | 0 | C1-C6 | 1 | Hybrid checks 1-6 |
| T0 | nT | 0 | 1 | nT1- nT4 | 1 | 4 lines, not genotyped, unknown tester |
| T0 | fT | 0 | 1 | fT1- fT66 | 1 | 16 lines, not genotyped, tested with a foreign tester |
| T1 | G0 | 0 | 1 | T11-T166 | 1 | 66 lines, not genotyped and tested to T1 |
| T2 | G0 | 0 | 1 | T21-T261 | 1 | 61 lines, not genotyped and tested to T2 |
| T1 | G1 | 1 | 0 | 1-682 | 1-682 | 682 lines, genotyped and tested to T1 in group G1 |
| T1 | G3 | 1 | 0 | 683-698 | 683-698 | 16 lines, genotyped and tested to T1 in group G3 |
GRP = grouping factor for checks (C1-C6), testers (nT, fT) and groups (G0-G3) of genotyped lines. Z1=1 for genotyped lines and 0 otherwise. Z2=1 for non-genotyped lines and 0 otherwise. GENA denotes all the individual genotypes. GENB=GENA for genotyped lines and 1 otherwise. GENB helps specify the vector of random effects of genotyped lines with a length that matches the dimension of the covariance matrix of genotypes.
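The coding rules in this note can be written out as a small Python helper. This is a hedged sketch: the function name `code_entry` and the `kind` labels are hypothetical, and the handling of hybrid checks (Z1 = 0, Z2 = 0, GENB = 1) is read off the first row of the table rather than stated in the note.

```python
def code_entry(gena, kind):
    """Return (Z1, Z2, GENB) for one entry.

    kind is one of "genotyped", "non_genotyped" or "check"
    (labels invented for this sketch).
    """
    if kind == "genotyped":
        # Z1 = 1 and GENB = GENA for genotyped lines.
        return (1, 0, gena)
    if kind == "non_genotyped":
        # Z2 = 1 and GENB = 1 for non-genotyped lines.
        return (0, 1, 1)
    if kind == "check":
        # Checks carry neither indicator in the table above.
        return (0, 0, 1)
    raise ValueError(f"unknown kind: {kind!r}")

# Example: genotyped line 683 from group G3.
z1, z2, genb = code_entry(683, "genotyped")
```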
Summary of the main properties of the seven methods
| 1 | Yes | Yes | Yes | Yes | Yes |
| 2 | Yes | Yes | Yes | Yes | Yes |
| 3 | Yes | Yes | Yes | Yes | Yes |
| 4 | Yes | Yes | Yes | Yes | Yes |
| 5 | Yes | No | - | - | No |
| 6 | No | No | - | Yes | Yes |
| 7 | No | No | - | - | No |
Methods 1 to 4, which require heritability to estimate predictive accuracy, are called indirect methods in the text, whereas Methods 5 to 7, which do not, are called direct methods. The symbol (-) means the question is not applicable to a particular method.
Descriptive statistics for the estimated true heritability for all the datasets out of a possible total of 1000 for which an estimate of heritability was available for all the five methods in each scenario
| Scenario | Statistic | M0 | M1 | M2 | M3 | M4 | M5 |
|---|---|---|---|---|---|---|---|
| 1 | H2 = 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | MIN | 0.56 | 0.09 | 0.16 | 0.16 | 0.16 | 0.50 |
| | MAX | 0.82 | 0.50 | 0.66 | 0.66 | 0.55 | 0.80 |
| | MEAN | 0.71 | 0.32 | 0.48 | 0.48 | 0.34 | 0.67 |
| | STD | 0.04 | 0.06 | 0.08 | 0.08 | 0.06 | 0.05 |
| | MSD | 0.000 | 0.160 | 0.061 | 0.061 | 0.143 | 0.004 |
| 2 | H2 = 0 | 0 | 152 | 152 | 0 | 0 | 0 |
| | MIN | 0.09 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | MAX | 0.73 | 0.30 | 0.46 | 0.46 | 0.28 | 0.63 |
| | MEAN | 0.42 | 0.09 | 0.15 | 0.18 | 0.08 | 0.33 |
| | STD | 0.11 | 0.07 | 0.11 | 0.10 | 0.05 | 0.11 |
| | †MSD | 0.000 | 0.128 | 0.094 | 0.083 | 0.128 | 0.025 |
| 3 | H2 = 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | MIN | 0.66 | 0.35 | 0.33 | 0.34 | 0.28 | 0.64 |
| | MAX | 0.79 | 0.62 | 0.61 | 0.61 | 0.53 | 0.78 |
| | MEAN | 0.73 | 0.51 | 0.49 | 0.50 | 0.40 | 0.72 |
| | STD | 0.02 | 0.04 | 0.04 | 0.04 | 0.03 | 0.02 |
| | MSD | 0.000 | 0.051 | 0.057 | 0.055 | 0.110 | 0.001 |
| 4 | H2 = 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | MIN | 0.36 | 0.00 | 0.00 | -0.55 | 0.02 | 0.23 |
| | MAX | 0.63 | 0.32 | 0.31 | 0.31 | 0.15 | 0.52 |
| | MEAN | 0.52 | 0.14 | 0.13 | 0.13 | 0.07 | 0.39 |
| | STD | 0.04 | 0.07 | 0.06 | 0.07 | 0.02 | 0.05 |
| | MSD | 0.000 | 0.151 | 0.155 | 0.152 | 0.202 | 0.020 |
Methods 1 to 4 but not 5 use cross-validation. M0 is the square of the true correlation between the predicted and the true simulated breeding values used as the benchmark for assessing the estimated heritability.
†MSD = mean squared deviation; H2 = 0 is the number of datasets for which the estimated heritability was zero. *The number of the equation used in the text is in parentheses.
Descriptive statistics for predictive accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) by scenario
| Scenario | Statistic | M0 | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | N | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 |
| | MIN | 0.750 | 0.327 | 0.265 | 0.265 | 0.384 | 0.707 | 0.316 | 0.750 |
| | MEAN | 0.843c | 0.877a | 0.727e | 0.727e | 0.858b | 0.819d | 0.663f | 0.840c |
| | MAX | 0.908 | 1.000 | 1.000 | 1.000 | 1.000 | 0.893 | 0.884 | 0.899 |
| | STD | 0.024 | 0.109 | 0.104 | 0.104 | 0.093 | 0.028 | 0.084 | 0.023 |
| | MSD | 0.000 | 0.013 | 0.025 | 0.025 | 0.009 | 0.002 | 0.040 | 0.001 |
| | Q1 | 0.829 | 0.807 | 0.661 | 0.661 | 0.802 | 0.803 | 0.612 | 0.826 |
| | Median | 0.846 | 0.890 | 0.723 | 0.724 | 0.862 | 0.822 | 0.665 | 0.841 |
| | Q3 | 0.860 | 0.983 | 0.793 | 0.793 | 0.923 | 0.839 | 0.716 | 0.856 |
| 2 | N | 839 | 839 | 839 | 839 | 839 | 839 | 839 | 839 |
| | MIN | 0.31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.06 | 0.00 | 0.08 |
| | MEAN | 0.65a | 0.63c | 0.50e | 0.50e | 0.64ab | 0.58d | 0.46f | 0.64bc |
| | MAX | 0.85 | 1.00 | 1.00 | 1.00 | 1.00 | 0.79 | 1.00 | 0.82 |
| | STD | 0.09 | 0.29 | 0.26 | 0.26 | 0.26 | 0.10 | 0.20 | 0.09 |
| | MSD | 0.000 | 0.083 | 0.092 | 0.092 | 0.069 | 0.02 | 0.081 | 0.011 |
| | Q1 | 0.61 | 0.42 | 0.31 | 0.31 | 0.47 | 0.53 | 0.33 | 0.59 |
| | Median | 0.66 | 0.64 | 0.48 | 0.48 | 0.68 | 0.59 | 0.47 | 0.65 |
| | Q3 | 0.71 | 0.91 | 0.67 | 0.67 | 0.85 | 0.64 | 0.59 | 0.70 |
| 3 | N | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 |
| | MIN | 0.81 | 0.60 | 0.61 | 0.61 | 0.69 | 0.80 | 0.54 | 0.78 |
| | MEAN | 0.85a | 0.72c | 0.73c | 0.73c | 0.81b | 0.85a | 0.64d | 0.81b |
| | MAX | 0.89 | 0.88 | 0.90 | 0.89 | 0.96 | 0.88 | 0.78 | 0.84 |
| | STD | 0.01 | 0.04 | 0.04 | 0.04 | 0.04 | 0.01 | 0.04 | 0.01 |
| | MSD | 0.0000 | 0.0193 | 0.0169 | 0.0176 | 0.0036 | 0.0002 | 0.0477 | 0.0017 |
| | Q1 | 0.85 | 0.69 | 0.70 | 0.70 | 0.79 | 0.84 | 0.61 | 0.81 |
| | Median | 0.85 | 0.72 | 0.73 | 0.73 | 0.81 | 0.85 | 0.64 | 0.81 |
| | Q3 | 0.86 | 0.75 | 0.76 | 0.76 | 0.84 | 0.85 | 0.66 | 0.82 |
| 4 | N | 955 | 955 | 955 | 955 | 955 | 955 | 955 | 955 |
| | MIN | 0.60 | 0.14 | 0.14 | 0.14 | 0.15 | 0.48 | 0.24 | 0.52 |
| | MEAN | 0.72a | 0.32e | 0.33e | 0.33e | 0.36d | 0.62c | 0.63b | 0.64b |
| | MAX | 0.79 | 0.52 | 0.53 | 0.52 | 0.56 | 0.72 | 0.93 | 0.72 |
| | STD | 0.03 | 0.06 | 0.06 | 0.06 | 0.07 | 0.04 | 0.09 | 0.03 |
| | MSD | 0.000 | 0.160 | 0.157 | 0.158 | 0.130 | 0.012 | 0.015 | 0.007 |
| | Q1 | 0.70 | 0.29 | 0.29 | 0.29 | 0.32 | 0.60 | 0.57 | 0.62 |
| | Median | 0.72 | 0.32 | 0.33 | 0.33 | 0.37 | 0.62 | 0.64 | 0.64 |
| | Q3 | 0.74 | 0.36 | 0.37 | 0.37 | 0.41 | 0.65 | 0.70 | 0.66 |
All the methods except 5 and 7 use cross-validation. M0 is the correlation between the predicted and the true simulated breeding values used as the benchmark for assessing the estimated predictive accuracy. N is the number of data sets out of a possible total of 1000 for which estimates were available for all the seven methods. Means for pairs of methods within each scenario with the same superscript letter are not significantly different at the 5% level of significance based on the t-test.
†MSD = mean squared deviation; Q1 is the lower quartile and Q3 is the upper quartile. *The number of the equation used in the text is in parentheses.
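The summary statistics reported in this table can be reproduced for any set of estimates with a short Python sketch. Several points are assumptions made here rather than details confirmed by the source: MSD is taken as the mean of squared differences between each truncated estimate and the benchmark M0 for the same dataset, quartiles use the default method of `statistics.quantiles`, and the function names `clip01` and `summarize` and the toy inputs are invented.

```python
import statistics

def clip01(x):
    """Set estimates below 0 to 0 and estimates above 1 to 1."""
    return min(1.0, max(0.0, x))

def summarize(estimates, benchmark):
    """Descriptive statistics for truncated estimates against a benchmark."""
    est = [clip01(e) for e in estimates]
    # Mean squared deviation from the benchmark, paired by dataset.
    msd = sum((e - b) ** 2 for e, b in zip(est, benchmark)) / len(est)
    q1, median, q3 = statistics.quantiles(est, n=4)
    return {
        "N": len(est),
        "MIN": min(est),
        "MEAN": statistics.mean(est),
        "MAX": max(est),
        "STD": statistics.stdev(est),
        "MSD": msd,
        "Q1": q1,
        "Median": median,
        "Q3": q3,
    }

# Hypothetical toy values, not data from the study.
estimates = [-0.1, 0.5, 0.7, 1.2]
benchmark = [0.0, 0.5, 0.6, 1.0]
summary = summarize(estimates, benchmark)
```

Under this definition the benchmark compared with itself has an MSD of exactly zero, which matches the 0.000 entries in the M0 column.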
Figure 1. Box-and-whisker plots of predictive accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for all the seven methods in each of the four scenarios.
Figure 2. Frequency histograms for the true accuracy versus the estimated predictive accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for all the seven methods in each of the four scenarios.
Figure 3. Scatter plots of estimated predictive accuracy against the true simulated accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for all the methods in each scenario. The 1:1 (y = x) line is superimposed for comparison.
Figure 4. Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 1.
Figure 5. Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 2.
Figure 6. Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 3.
Figure 7. Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 4.
Correlation between predictive accuracies (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for pairs of the seven methods by scenario
| Scenario | Method | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|---|
| 1 | M1 | 1.00 | | | | | | |
| | M2 | 0.94 | 1.00 | | | | | |
| | M3 | 0.94 | 1.00 | 1.00 | | | | |
| | M4 | 0.90 | 0.89 | 0.89 | 1.00 | | | |
| | M5 | -0.02 | 0.04 | 0.04 | -0.18 | 1.00 | | |
| | M6 | 0.81 | 0.84 | 0.84 | 0.96 | -0.06 | 1.00 | |
| | M7 | -0.02 | 0.04 | 0.04 | -0.18 | 1.00 | -0.07 | 1.00 |
| 2 | M1 | 1.00 | | | | | | |
| | M2 | 0.97 | 1.00 | | | | | |
| | M3 | 0.97 | 1.00 | 1.00 | | | | |
| | M4 | 0.82 | 0.78 | 0.78 | 1.00 | | | |
| | M5 | 0.21 | 0.17 | 0.17 | 0.04 | 1.00 | | |
| | M6 | 0.70 | 0.67 | 0.67 | 0.91 | -0.02 | 1.00 | |
| | M7 | 0.26 | 0.22 | 0.22 | 0.11 | 0.95 | 0.03 | 1.00 |
| 3 | M1 | 1.00 | | | | | | |
| | M2 | 1.00 | 1.00 | | | | | |
| | M3 | 1.00 | 1.00 | 1.00 | | | | |
| | M4 | 0.91 | 0.91 | 0.91 | 1.00 | | | |
| | M5 | -0.11 | -0.12 | -0.12 | -0.32 | 1.00 | | |
| | M6 | 0.83 | 0.83 | 0.83 | 0.98 | -0.28 | 1.00 | |
| | M7 | -0.11 | -0.12 | -0.12 | -0.32 | 1.00 | -0.28 | 1.00 |
| 4 | M1 | 1.00 | | | | | | |
| | M2 | 1.00 | 1.00 | | | | | |
| | M3 | 1.00 | 1.00 | 1.00 | | | | |
| | M4 | 0.99 | 0.99 | 0.99 | 1.00 | | | |
| | M5 | 0.59 | 0.59 | 0.59 | 0.58 | 1.00 | | |
| | M6 | 0.77 | 0.77 | 0.77 | 0.79 | 0.04 | 1.00 | |
| | M7 | 0.59 | 0.59 | 0.59 | 0.58 | 1.00 | 0.05 | 1.00 |
The number of simulated datasets, out of a possible total of 1000, for which estimates of predictive accuracy were available for each pair of methods was taken as the minimum for the pair.
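One way to compute such a pairwise correlation when some estimates are missing is sketched below in Python. It assumes that missing estimates are marked with `None` and that only datasets with both estimates present enter the correlation; the source does not spell out this handling, and the function name `pairwise_corr` and the toy values are hypothetical.

```python
import math

def pairwise_corr(a, b):
    """Pearson correlation over datasets where both estimates exist.

    Returns (r, n), where n is the number of datasets used.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical toy values; None marks a dataset with no estimate.
method_a = [0.1, 0.2, None, 0.4, 0.5]
method_b = [0.2, 0.4, 0.6, None, 1.0]
r, n = pairwise_corr(method_a, method_b)
```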