Haroldo H R Neves, Roberto Carvalheiro, Sandra A Queiroz.
Abstract
BACKGROUND: The availability of high-density panels of SNP markers has opened new perspectives for marker-assisted selection strategies, such that genotypes for these markers are used to predict the genetic merit of selection candidates. Because the number of markers is often much larger than the number of phenotypes, marker effect estimation is not a trivial task. The objective of this research was to compare the predictive performance of ten different statistical methods employed in genomic selection, by analyzing data from a heterogeneous stock mice population.
Year: 2012 PMID: 23134637 PMCID: PMC3563460 DOI: 10.1186/1471-2156-13-100
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Available information* on the genetic architecture of the traits under study
| Trait | Nº QTL | Variance explained (%) | Largest QTL (%) | |
|---|---|---|---|---|
| %CD8+ | 17 | 36.3 | 8.00 | 0.89 |
| CD4+/CD8+ | 11 | 33.1 | 11.90 | 0.80 |
| W6W | 19 | 38.3 | 3.20 | 0.74 |
| WGS | 10 | 20.6 | 2.40 | 0.30 |
| BL | 6 | 16.7 | 3.10 | 0.13 |
Trait = weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+). Nº QTL = number of QTL mapped; variance explained = proportion of the variance explained by those QTL (in %); largest QTL = proportion of the variance explained by the QTL with the largest effect (in %). *Information published in Valdar et al. [10] and Valdar et al. [11].
Summary statistics* pertaining to phenotypic data** employed in cross-validation
| Trait | Split | N | Min | Ave | Max | Mean (training) | SD (training) | Mean (testing) | SD (testing) |
|---|---|---|---|---|---|---|---|---|---|
| W6W | within | 1925 | 1059 | 1061.9 | 1066 | −0.155 | 1.96 | −0.188 | 1.95 |
| WGS | within | 1917 | 1056 | 1059.6 | 1068 | 0.001 | 0.04 | 0.002 | 0.04 |
| BL | within | 1840 | 1013 | 1017.9 | 1035 | −0.002 | 0.40 | −0.006 | 0.40 |
| %CD8+ | within | 1407 | 774 | 778.6 | 785 | 0.010 | 4.34 | 0.122 | 4.36 |
| CD4+/CD8+ | within | 1403 | 772 | 774.5 | 781 | 0.003 | 0.07 | 0.004 | 0.07 |
| W6W | across | 1925 | 1059 | 1067.6 | 1081 | −0.162 | 1.95 | −0.180 | 1.96 |
| WGS | across | 1917 | 1057 | 1063.3 | 1076 | 0.001 | 0.04 | 0.002 | 0.04 |
| BL | across | 1840 | 1014 | 1022.3 | 1030 | −0.002 | 0.40 | −0.005 | 0.40 |
| %CD8+ | across | 1407 | 775 | 780.9 | 791 | −0.149 | 4.23 | 0.319 | 4.43 |
| CD4+/CD8+ | across | 1403 | 774 | 782.2 | 799 | 0.003 | 0.07 | 0.003 | 0.07 |
*N = total number of phenotypic records; Min, Ave and Max = minimum, average and maximum size of the training set; Mean and SD = mean and standard deviation of the adjusted records in the training and testing sets (averaged across replicates). **Traits considered: weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+). Split = splitting strategy in cross-validation (within- or across-family).
REML estimates of variance components (and related parameters) for traits of a heterogeneous stock mice population
| Trait | σ²u | SE | σ²c | SE | σ²e | SE | h² | SE |
|---|---|---|---|---|---|---|---|---|
| W6W | 3.915 | 29.836 | 1.719 | 13.100 | 3.E-05 | 9.E-04 | 0.695 | 0.030 |
| WGS | 8.E-04 | 2.E-04 | 1.E-03 | 1.E-04 | 9.E-04 | 1.E-04 | 0.295 | 0.069 |
| BL | 0.036 | 0.012 | 0.039 | 0.007 | 0.148 | 0.009 | 0.161 | 0.051 |
| %CD8+ | 19.370 | 2.851 | 1.990 | 0.471 | 0.357 | 1.505 | 0.892 | 0.101 |
| CD4+/CD8+ | 5.E-03 | 7.E-04 | 6.E-04 | 1.E-04 | 4.E-04 | 4.E-04 | 0.825 | 0.081 |
Trait = weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+).
σ²u: additive genetic variance; σ²c: variance due to the random environmental effect of cage; σ²e: residual variance; h²: heritability; SE: standard error of the estimate in the preceding column.
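The footnote does not state how h² is formed from the variance components. A minimal sketch, assuming h² = σ²u / (σ²u + σ²c + σ²e), that is, additive variance over a phenotypic variance that includes the cage effect; the dictionary simply re-enters the REML point estimates above, and the function name `heritability` is illustrative.

```python
# REML point estimates (sigma2_u, sigma2_c, sigma2_e) re-entered from the table above.
REML = {
    "W6W": (3.915, 1.719, 3e-05),
    "WGS": (8e-04, 1e-03, 9e-04),
    "BL": (0.036, 0.039, 0.148),
    "%CD8+": (19.370, 1.990, 0.357),
    "CD4+/CD8+": (5e-03, 6e-04, 4e-04),
}


def heritability(sigma2_u, sigma2_c, sigma2_e):
    """ASSUMED definition: additive variance over total phenotypic variance,
    where the phenotypic variance includes the cage (common environment) effect."""
    return sigma2_u / (sigma2_u + sigma2_c + sigma2_e)


if __name__ == "__main__":
    for trait, components in REML.items():
        print(f"{trait}: h2 = {heritability(*components):.3f}")
```

Under this assumption the script prints 0.695, 0.296, 0.161, 0.892 and 0.833. The W6W, BL and %CD8+ values match the table; the WGS and CD4+/CD8+ values differ slightly because their variance components are shown to one significant figure.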
Figure 1. Predictive ability* of the different methods in within-family predictions for five traits in a mice population. *Average of ten replicates. Bars sharing the same letter are not significantly different (P > 0.05). Traits: weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+).
Figure 2. Predictive ability* of the different methods in across-family predictions for five traits in a mice population. *Average of ten replicates. Bars sharing the same letter are not significantly different (P > 0.05). Traits: weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+).
Bias* of genomic predictions from different methods, obtained for five traits of a mice population
| Splitting | Method | | | | | |
|---|---|---|---|---|---|---|
| Within | POL | −4% | 1% | −1% | 1% | 1% |
| | emBayesB | −62% | 24% | 2% | −42% | −52% |
| | RR_GBLUP | −9% | 8% | 13% | 2% | 4% |
| | SS_BY | −18% | 22% | - | 5% | 5% |
| | SS_ABS | −35% | 4% | −30% | 5% | 17% |
| | RKHS | −1% | 1% | −1% | 1% | 2% |
| | SVR | −4% | 3% | −6% | 5% | 4% |
| | BayesCpi | −26% | 66% | 81% | −568% | −91% |
| | BayesC | −43% | 263% | 69% | −397% | −420% |
| | LASSO | −85% | 53% | 2% | −73% | −73% |
| | RF | −1% | 1% | 1% | 0% | 1% |
| Across | POL | - | - | - | - | - |
| | emBayesB | −51% | 5% | 3% | −58% | −87% |
| | RR_GBLUP | −31% | 8% | 8% | −2% | 14% |
| | SS_BY | −44% | 18% | - | −1% | 17% |
| | SS_ABS | −54% | −7% | −38% | −16% | 31% |
| | RKHS | −23% | 4% | −4% | 9% | 10% |
| | SVR | −24% | 6% | −8% | 14% | 10% |
| | BayesCpi | −15% | 83% | 46% | −719% | −196% |
| | BayesC | 23% | 280% | 115% | −239% | −359% |
| | LASSO | −116% | 40% | 7% | −70% | −98% |
| | RF | −13% | 2% | −1% | 5% | 5% |
*Average of ten replicates. Bias was measured as the average difference between observed and predicted phenotypes in the testing set and is presented as a proportion of the standard deviation of each trait (in %). Trait = weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+). Splitting = splitting strategy in cross-validation (within- or across-family).
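The bias definition in the footnote can be written as a short function. This is a sketch, assuming the sign convention observed minus predicted (as worded) and leaving the trait standard deviation as a caller-supplied argument, since the footnote does not say whether it is the training-set, testing-set or overall SD; `bias_percent` and the toy inputs are hypothetical.

```python
from statistics import mean


def bias_percent(observed, predicted, trait_sd):
    """Mean of (observed - predicted) over the testing set, in % of the trait SD.

    ASSUMPTION: sign convention is observed minus predicted, and trait_sd is
    supplied by the caller because the source does not say which SD was used.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty inputs")
    if trait_sd <= 0:
        raise ValueError("trait_sd must be positive")
    return 100.0 * mean(o - p for o, p in zip(observed, predicted)) / trait_sd


if __name__ == "__main__":
    # Hypothetical testing set: every prediction is 0.5 units too low.
    print(bias_percent([2.0, 3.0, 4.0], [1.5, 2.5, 3.5], trait_sd=2.0))  # 25.0
```

A positive value means predictions are, on average, lower than the observed phenotypes under this sign convention.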
Inflation* of genomic predictions from different methods, obtained for five traits of a mice population
| Splitting | Method | | | | | |
|---|---|---|---|---|---|---|
| Within | POL | 1.21 | 0.91 | 0.89 | 1.03 | 0.97 |
| | emBayesB | 1.43 | 1.57 | 3.70 | 1.65 | 1.55 |
| | RR_GBLUP | 0.95 | 0.70 | 0.88 | 0.66 | 0.75 |
| | SS_BY | 0.79 | 0.66 | - | 0.60 | 0.72 |
| | SS_ABS | 0.72 | 0.49 | 0.49 | 0.64 | 0.72 |
| | RKHS | 1.08 | 0.95 | 0.89 | 1.06 | 1.26 |
| | SVR | 1.24 | 0.91 | 0.68 | 1.16 | 1.18 |
| | BayesCpi | 0.55 | 0.29 | 0.40 | 0.17 | 0.29 |
| | BayesC | 0.18 | 0.12 | 0.21 | 0.19 | 0.20 |
| | LASSO | 0.93 | 1.01 | 1.08 | 1.00 | 1.07 |
| | RF | 1.48 | 1.10 | 0.78 | 1.12 | 1.11 |
| Across | POL | - | - | - | - | - |
| | emBayesB | 0.46 | 0.90 | 1.47 | 1.43 | 1.33 |
| | RR_GBLUP | 0.37 | 0.49 | 0.52 | 0.46 | 0.52 |
| | SS_BY | 0.30 | 0.46 | - | 0.42 | 0.52 |
| | SS_ABS | 0.29 | 0.30 | 0.32 | 0.50 | 0.52 |
| | RKHS | 0.55 | 0.95 | 0.75 | 1.21 | 1.36 |
| | SVR | 0.61 | 0.89 | 0.62 | 1.42 | 1.27 |
| | BayesCpi | 0.22 | 0.21 | 0.14 | 0.17 | 0.24 |
| | BayesC | 0.07 | 0.08 | 0.11 | 0.18 | 0.15 |
| | LASSO | 0.38 | 0.71 | 0.56 | 0.98 | 1.01 |
| | RF | 0.72 | 1.22 | 0.65 | 1.27 | 1.22 |
*Average of ten replicates. Inflation was measured as the slope of the regression of observed phenotypes on predicted phenotypes in the testing set.
Trait = weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+). Splitting = splitting strategy in cross-validation (within- or across-family).
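The inflation statistic is the ordinary least-squares slope of observed on predicted values. A minimal sketch follows; under the usual reading in genomic prediction, a slope close to 1 indicates well-scaled predictions, a slope below 1 that predictions are over-dispersed (inflated), and a slope above 1 that they are under-dispersed. The function name `inflation_slope` and the toy data are hypothetical.

```python
from statistics import mean


def inflation_slope(observed, predicted):
    """Slope b of the least-squares regression observed = a + b * predicted."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(predicted), mean(observed)
    # Covariance and variance numerators (the 1/n factors cancel in the ratio).
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    sxx = sum((x - mx) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predictions have zero variance")
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical testing set: predictions are twice as dispersed as observations.
    print(inflation_slope([0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 3.0, 4.0]))  # 0.5
```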
Normalized root-mean-square error (NRMSE)* of genomic predictions from different methods, obtained for five traits of a mice population
| Splitting | Method | | | | | |
|---|---|---|---|---|---|---|
| Within | POL | 0.099 | 0.114 | 0.123 | 0.138 | 0.145 |
| | emBayesB | 0.140 | 0.122 | 0.124 | 0.165 | 0.165 |
| | RR_GBLUP | 0.103 | 0.114 | 0.123 | 0.147 | 0.136 |
| | SS_BY | 0.112 | 0.122 | - | 0.158 | 0.144 |
| | SS_ABS | 0.117 | 0.120 | 0.132 | 0.161 | 0.143 |
| | RKHS | 0.098 | 0.112 | 0.122 | 0.127 | 0.131 |
| | SVR | 0.099 | 0.112 | 0.123 | 0.128 | 0.130 |
| | BayesCpi | 0.166 | 0.214 | 0.268 | 1.252 | 0.502 |
| | BayesC | 0.534 | 0.877 | 0.251 | 1.035 | 1.080 |
| | LASSO | 0.172 | 0.137 | 0.127 | 0.204 | 0.196 |
| | RF | 0.102 | 0.112 | 0.123 | 0.124 | 0.119 |
| Across | POL | - | - | - | - | - |
| | emBayesB | 0.154 | 0.132 | 0.126 | 0.199 | 0.230 |
| | RR_GBLUP | 0.128 | 0.121 | 0.125 | 0.184 | 0.170 |
| | SS_BY | 0.141 | 0.126 | - | 0.193 | 0.175 |
| | SS_ABS | 0.145 | 0.130 | 0.140 | 0.188 | 0.180 |
| | RKHS | 0.122 | 0.119 | 0.124 | 0.152 | 0.152 |
| | SVR | 0.122 | 0.119 | 0.125 | 0.155 | 0.153 |
| | BayesCpi | 0.163 | 0.243 | 0.240 | 1.519 | 0.593 |
| | BayesC | 0.637 | 0.747 | 0.275 | 0.896 | 1.091 |
| | LASSO | 0.201 | 0.142 | 0.129 | 0.213 | 0.225 |
| | RF | 0.119 | 0.119 | 0.124 | 0.144 | 0.130 |
*Average of ten replicates. Lower values indicate better overall fit.
Trait = weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+). Splitting = splitting strategy in cross-validation (within- or across-family).
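The caption does not define the normalizing denominator of NRMSE. The sketch below assumes the root-mean-square error between observed and predicted testing-set phenotypes divided by the range of the observed values, which is one common convention; other choices (for example, dividing by the mean or the SD) would change the scale of the numbers. `nrmse` is a hypothetical helper.

```python
import math


def nrmse(observed, predicted):
    """Root-mean-square error divided by the range of the observed values.

    ASSUMPTION: range normalisation; the source does not name the denominator.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need equal-length, non-empty inputs")
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    spread = max(observed) - min(observed)
    if spread == 0:
        raise ValueError("observed values have zero range")
    return rmse / spread


if __name__ == "__main__":
    # Hypothetical testing set: each prediction is off by exactly 1 unit.
    print(nrmse([0.0, 10.0], [1.0, 9.0]))  # 0.1
```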
Summary statistics* associated with distributions of estimated marker effects (within-family splitting)
| Trait | Statistic | | | | | | |
|---|---|---|---|---|---|---|---|
| BL | t1000 | 0.83 | 1.00 | 0.52 | 1.00 | 1.00 | 1.00 |
| | t500 | 0.75 | 1.00 | 0.35 | 1.00 | 0.99 | 0.97 |
| | t100 | 0.53 | 0.99 | 0.12 | 0.97 | 0.52 | 0.76 |
| | t20 | 0.34 | 0.96 | 0.04 | 0.68 | 0.18 | 0.43 |
| | \|g\| > 0 | 9457 | 4076 | 9820 | 199 | 623 | 541 |
| | kurt | 284.0 | 2650.1 | 0.6 | 496.3 | 37.7 | 503.7 |
| CD4+/CD8+ | t1000 | 1.00 | 1.00 | 0.53 | 1.00 | 1.00 | 0.96 |
| | t500 | 0.99 | 1.00 | 0.37 | 1.00 | 0.99 | 0.88 |
| | t100 | 0.92 | 0.99 | 0.13 | 0.97 | 0.57 | 0.60 |
| | t20 | 0.64 | 0.97 | 0.04 | 0.71 | 0.21 | 0.32 |
| | \|g\| > 0 | 9559 | 3938 | 9820 | 228 | 577 | 1172 |
| | kurt | 407.2 | 2790.4 | 1.0 | 597.7 | 48.6 | 319.9 |
| %CD8+ | t1000 | 0.91 | 1.00 | 0.53 | 1.00 | 1.00 | 0.98 |
| | t500 | 0.83 | 1.00 | 0.36 | 1.00 | 0.99 | 0.93 |
| | t100 | 0.50 | 0.99 | 0.13 | 0.97 | 0.54 | 0.69 |
| | t20 | 0.21 | 0.96 | 0.04 | 0.71 | 0.19 | 0.39 |
| | \|g\| > 0 | 9820 | 3792 | 9820 | 203 | 603 | 754 |
| | kurt | 36.1 | 3009.2 | 0.7 | 553.6 | 41.2 | 443.6 |
| W6W | t1000 | 0.51 | 1.00 | 0.52 | 1.00 | 1.00 | 1.00 |
| | t500 | 0.35 | 1.00 | 0.36 | 1.00 | 0.99 | 1.00 |
| | t100 | 0.12 | 1.00 | 0.12 | 0.93 | 0.54 | 0.85 |
| | t20 | 0.04 | 1.00 | 0.04 | 0.49 | 0.19 | 0.46 |
| | \|g\| > 0 | 9820 | 4711 | 9820 | 325 | 612 | 456 |
| | kurt | 0.7 | 1094.8 | 0.6 | 218.0 | 39.9 | 155.5 |
| WGS | t1000 | 1.00 | 1.00 | 0.52 | 1.00 | 1.00 | 1.00 |
| | t500 | 1.00 | 1.00 | 0.35 | 1.00 | 0.99 | 0.98 |
| | t100 | 1.00 | 1.00 | 0.12 | 0.95 | 0.54 | 0.77 |
| | t20 | 0.93 | 1.00 | 0.04 | 0.58 | 0.19 | 0.40 |
| | \|g\| > 0 | 1716 | 4623 | 9820 | 260 | 617 | 655 |
| | kurt | 877.0 | 1998.6 | 0.6 | 310.9 | 40.1 | 134.7 |
*Average of 10 replicates. t1000, t500, t100 and t20 = proportion of the variance accounted for by the markers (varM) that is explained by the markers with the largest 1000, 500, 100 and 20 absolute effects, respectively, where $\mathrm{var}_M = \sum_i 2p_i(1-p_i)\hat{g}_i^2$, in which $\hat{g}_i$ and $p_i$ are the estimated effect and the allele frequency of the ith marker, respectively. |g| > 0 = number of markers with a non-null estimated effect. kurt = excess kurtosis of the distribution of estimated marker effects. Trait = weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+).
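The statistics in this table can be computed from a vector of estimated marker effects and allele frequencies. A minimal sketch, assuming var_M = Σ 2p_i(1 − p_i)ĝ_i² (the per-marker variance under Hardy-Weinberg proportions) and the population-moment estimator of excess kurtosis; the source does not say which kurtosis estimator was used, and all function names and toy inputs are hypothetical.

```python
def variance_shares(effects, freqs, top_k=(1000, 500, 100, 20)):
    """Share of var_M explained by the k markers with the largest |effect|.

    ASSUMPTION: var_M = sum_i 2 * p_i * (1 - p_i) * g_i**2.
    """
    if len(effects) != len(freqs):
        raise ValueError("effects and freqs must have the same length")
    contrib = [2 * p * (1 - p) * g * g for g, p in zip(effects, freqs)]
    total = sum(contrib)
    if total == 0:
        raise ValueError("all marker contributions are zero")
    # Rank markers by absolute estimated effect, largest first.
    order = sorted(range(len(effects)), key=lambda i: abs(effects[i]), reverse=True)
    return {k: sum(contrib[i] for i in order[:k]) / total for k in top_k}


def count_nonzero(effects, tol=0.0):
    """Number of markers whose |estimated effect| exceeds tol (the |g| > 0 row)."""
    return sum(1 for g in effects if abs(g) > tol)


def excess_kurtosis(values):
    """Population (moment) excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    g = [2.0, -1.0, 0.0, 0.0]  # hypothetical estimated effects
    p = [0.5, 0.5, 0.5, 0.5]   # hypothetical allele frequencies
    print(variance_shares(g, p, top_k=(1, 2)))  # {1: 0.8, 2: 1.0}
    print(count_nonzero(g))                      # 2
```

Large positive excess kurtosis, as in several columns of the table, indicates that a few markers carry large effects while most are shrunk toward zero.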
Figure 3. Average computing time* required for model training with the different statistical methods. *Average of ten replicates. Elapsed times were measured during model training, which used the information available in the reference set, to compute genomic predictions for five traits in a mice population.
Figure 4. Expected and realized accuracy of genomic predictions with RR_GBLUP. Expected accuracies were calculated according to Daetwyler et al. (2010), using two approximations for the number of independent chromosome segments (Me): Me1 = 2NeL/ln(4NeL) or Me2 = 2NeL, where Ne is the effective population size and L is the genome length (in Morgans). realized = realized accuracy of GBLUP (average of ten replicates). Five traits were considered: weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+). Two prediction scenarios were considered: within-family (_within) and across-family (_across) predictions.
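The caption quotes the two Me approximations but not the accuracy formula itself. A minimal sketch, assuming the deterministic expression commonly attributed to Daetwyler et al., r = sqrt(N h² / (N h² + Me)), with N the number of training records and h² the heritability. The values 1062 and 0.695 echo the W6W within-family average training size and REML heritability reported above; `NE` and `L` are hypothetical placeholders, because Ne and genome length are not given in this section.

```python
import math


def effective_segments(ne, length_morgans):
    """The two approximations for Me quoted in the Figure 4 caption."""
    x = ne * length_morgans
    if 4 * x <= 1:
        raise ValueError("4 * Ne * L must exceed 1 for the log to be positive")
    me1 = 2 * x / math.log(4 * x)  # Me1 = 2NeL / ln(4NeL)
    me2 = 2 * x                    # Me2 = 2NeL
    return me1, me2


def expected_accuracy(n_train, h2, me):
    """ASSUMED Daetwyler-type formula: r = sqrt(N h2 / (N h2 + Me))."""
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + me))


if __name__ == "__main__":
    NE, L = 100.0, 20.0  # hypothetical placeholders, not values from the study
    for label, me in zip(("Me1", "Me2"), effective_segments(NE, L)):
        print(label, round(expected_accuracy(1062, 0.695, me), 3))
```

Because Me2 is larger than Me1 for any realistic Ne and L, it always yields the lower (more conservative) expected accuracy.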
Summary of the results* of the test for equality of predictive ability across groups
| Method | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| emBayesB | 0.42 | 0.30 | 0.26 | 0.28 | 0.05 | 0.22 | 0.47 | 0.33 | 0.06 | 0.03 |
| RR_GBLUP | 0.35 | 0.41 | 0.27 | 0.19 | 0.11 | 0.30 | 0.36 | 0.28 | 0.10 | 0.09 |
| SS_BY | 0.32 | 0.36 | 0.16 | 0.13 | 0.07 | 0.40 | 0.30 | 0.20 | 0.09 | 0.03 |
| SS_ABS | 0.39 | 0.34 | 0.28 | 0.04 | 0.17 | 0.20 | 0.28 | 0.34 | 0.06 | 0.06 |
| RKHS | 0.38 | 0.33 | 0.33 | 0.24 | 0.06 | 0.30 | 0.41 | 0.28 | 0.28 | 0.06 |
| SVR | 0.41 | 0.34 | 0.33 | 0.25 | 0.08 | 0.28 | 0.39 | 0.35 | 0.30 | 0.05 |
| BayesCpi | 0.37 | 0.28 | 0.27 | 0.28 | 0.07 | 0.31 | 0.38 | 0.25 | 0.12 | 0.07 |
| BayesC | 0.34 | 0.44 | 0.25 | 0.27 | 0.08 | 0.29 | 0.37 | 0.26 | 0.19 | 0.06 |
| LASSO | 0.45 | 0.29 | 0.24 | 0.24 | 0.10 | 0.30 | 0.36 | 0.38 | 0.10 | 0.03 |
| RF | 0.40 | 0.35 | 0.32 | 0.16 | 0.07 | 0.29 | 0.62 | 0.27 | 0.16 | 0.13 |
*P-values for the chi-square test for equality of predictive ability across groups (average of 10 replicates). In order to reduce the influence of discrepant replicates, the average of log10(p-value) was calculated for each trait and method, and then back-transformed to the original scale. Animals were clustered into 4 groups based on genetic distance between them.
Trait = weight at 6 weeks (W6W), weight growth slope (WGS), body length (BL), percentage of CD8+ cells (%CD8+), ratio between CD4+ and CD8+ cells (CD4+/CD8+). Splitting = splitting strategy in cross-validation (within- or across-family).
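The footnote's averaging rule (mean of log10 p-values across replicates, then back-transformed) is equivalent to taking the geometric mean of the replicate p-values. A minimal sketch; `back_transformed_mean_p` and the replicate values are hypothetical.

```python
import math


def back_transformed_mean_p(p_values):
    """Average log10(p) over replicates, then back-transform to the p scale.

    The result equals the geometric mean of the replicate p-values.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    mean_log = sum(math.log10(p) for p in p_values) / len(p_values)
    return 10.0 ** mean_log


if __name__ == "__main__":
    reps = [0.01, 1.0]  # hypothetical replicate p-values
    print(back_transformed_mean_p(reps))  # 0.1
```

For these two replicates the arithmetic mean would be 0.505, so the choice of averaging scale can change the summary substantially when replicates disagree.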