Abstract
The large number of markers in genome-wide prediction demands the use of methods with regularization and model comparison based on some hold-out test prediction error measure. In quantitative genetics, it is common practice to calculate the Pearson correlation coefficient (r²) as a standardized measure of the predictive accuracy of a model. Based on arguments from the bias-variance trade-off theory in statistical learning, we show that shrinkage of the regression coefficients (i.e., QTL effects) reduces the prediction mean squared error (MSE) by introducing model bias compared with the ordinary least squares method. We also show that the LASSO and the adaptive LASSO (ALASSO) can reduce the model bias and prediction MSE by adding model variance. In an application of ridge regression, the LASSO, and ALASSO to a simulated example based on results for 9,723 SNPs and 3,226 individuals, the best model selected was the LASSO when r² was used as a measure. However, when model selection was based on test MSE and the coefficient of determination R², the ALASSO proved to be the best method. Hence, use of r² may lead to selection of the wrong model and therefore also nonoptimal ranking of phenotype predictions and genomic breeding values. Instead, we propose use of the test MSE for model selection and R² as a standardized measure of the accuracy.
Keywords: accuracy; bias–variance trade-off; coefficient of determination; genomic selection; model comparison
Year: 2019 PMID: 31632436 PMCID: PMC6781837 DOI: 10.3389/fgene.2019.00899
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
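The contrast the abstract draws between r² and MSE/R² can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic (not the QTLMAS2010 data), the function names are hypothetical, and R² is taken to be 1 − MSE/VAR[y], a common definition that this excerpt does not spell out. The point is only that the squared correlation is unchanged when predictions are rescaled and shifted, whereas MSE and R² penalize the distortion.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared prediction error on the test set."""
    return float(np.mean((y - y_hat) ** 2))

def squared_correlation(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)

def coef_determination(y, y_hat):
    """Assumed definition: R^2 = 1 - MSE / VAR[y]."""
    return 1.0 - mse(y, y_hat) / float(np.var(y))

# Synthetic test phenotypes and two sets of predictions (illustrative only).
rng = np.random.default_rng(0)
y = rng.normal(loc=50.0, scale=10.0, size=500)
good = y + rng.normal(scale=5.0, size=500)   # noisy but well calibrated
distorted = 3.0 * good + 20.0                # same ordering, wrong scale and offset

for name, y_hat in [("good", good), ("distorted", distorted)]:
    print(name,
          "r2 =", round(squared_correlation(y, y_hat), 3),
          "MSE =", round(mse(y, y_hat), 1),
          "R2 =", round(coef_determination(y, y_hat), 3))
```

Both prediction sets give the same squared correlation, while MSE and R² differ sharply, which is why a correlation-based measure alone cannot detect badly scaled predictions.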
Mean squared error (MSE), predictive correlation accuracy (r²), coefficient of determination (R²), covariance between test phenotypes and predicted test phenotypes (COV[y, ŷ]), and variance of predicted test phenotypes (VAR[ŷ]) for ridge regression (RR), LASSO and adaptive LASSO (ALASSO), evaluated on the simulated QTLMAS2010 data.
| Method | MSE | r² | R² | COV[y, ŷ] | VAR[ŷ] |
|---|---|---|---|---|---|
| RR | 83.07 | 0.300 | 0.291 | 32.22 | 29.54 |
| LASSO | 65.73 | 0.460 | 0.439 | 44.30 | 36.41 |
| ALASSO | 64.52 | 0.455 | 0.449 | 50.68 | 48.17 |
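As a rough consistency check on the reported values, the sketch below recomputes the squared correlation from the other columns. It rests on two assumptions that this excerpt does not state explicitly: that R² = 1 − MSE/VAR[y], so the test phenotype variance can be backed out as MSE/(1 − R²), and that the squared correlation equals COV[y, ŷ]² / (VAR[y] · VAR[ŷ]). The numeric columns are read in the order the caption lists them; the function names are hypothetical.

```python
# Values copied from the table: (MSE, r^2, R^2, COV[y, y_hat], VAR[y_hat]).
reported = {
    "RR":     (83.07, 0.300, 0.291, 32.22, 29.54),
    "LASSO":  (65.73, 0.460, 0.439, 44.30, 36.41),
    "ALASSO": (64.52, 0.455, 0.449, 50.68, 48.17),
}

def implied_var_y(mse, r2_det):
    # Assumption: R^2 = 1 - MSE / VAR[y], hence VAR[y] = MSE / (1 - R^2).
    return mse / (1.0 - r2_det)

def squared_correlation(cov, var_y, var_y_hat):
    # Squared Pearson correlation expressed through covariance and variances.
    return cov ** 2 / (var_y * var_y_hat)

recomputed = {}
for method, (mse, r2, r2_det, cov, var_hat) in reported.items():
    var_y = implied_var_y(mse, r2_det)
    recomputed[method] = (var_y, squared_correlation(cov, var_y, var_hat))
    print(f"{method}: implied VAR[y] = {var_y:.1f}, "
          f"recomputed r2 = {recomputed[method][1]:.3f}, reported r2 = {r2}")
```

Under these assumptions the implied VAR[y] is nearly identical across the three methods, as expected for a shared test set, and the recomputed squared correlations agree with the reported ones up to rounding.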
Ranking of the 10 best individuals from the simulated QTLMAS2010 data based on ŷ for RR, LASSO and ALASSO, using minimum MSE and maximum predictive correlation accuracy (r²) as model selection measures.
| Method/selection statistic | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 | Rank 6 | Rank 7 | Rank 8 | Rank 9 | Rank 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| RR/min[MSE] | 2,586 | 2,772 | 2,977 | 3,050 | 3,195 | 3,056 | 2,756 | 2,738 | 2,821 | 3,184 |
| RR/max[r²] | 2,586 | 2,772 | 3,195 | 2,977 | 3,050 | 3,184 | 2,589 | 2,821 | 2,756 | 2,738 |
| LASSO/min[MSE] | 2,967 | 2,820 | 2,586 | 2,809 | 3,050 | 2,977 | 3,195 | 2,582 | 2,688 | 2,765 |
| LASSO/max[r²] | 2,967 | 2,820 | 2,809 | 2,688 | 2,582 | 2,586 | 3,195 | 3,050 | 2,977 | 2,972 |
| ALASSO/min[MSE] | 2,820 | 2,582 | 2,586 | 2,809 | 3,050 | 2,832 | 3,195 | 3,006 | 2,589 | 2,817 |
| ALASSO/max[r²] | 2,820 | 2,582 | 2,809 | 2,586 | 3,050 | 3,195 | 2,832 | 3,006 | 2,817 | 2,972 |
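One way to quantify how much the two selection criteria disagree is to compare the top-10 lists pairwise. The sketch below is an illustration only, not part of the original analysis: it copies the individual IDs from the ranking table (treated as plain integers) and counts, for each method, how many individuals appear in both lists and how many occupy the same rank. The helper name `compare_rankings` is hypothetical.

```python
# Individual IDs copied from the ranking table, in rank order 1..10.
top10 = {
    "RR": {
        "min_mse": [2586, 2772, 2977, 3050, 3195, 3056, 2756, 2738, 2821, 3184],
        "max_r2":  [2586, 2772, 3195, 2977, 3050, 3184, 2589, 2821, 2756, 2738],
    },
    "LASSO": {
        "min_mse": [2967, 2820, 2586, 2809, 3050, 2977, 3195, 2582, 2688, 2765],
        "max_r2":  [2967, 2820, 2809, 2688, 2582, 2586, 3195, 3050, 2977, 2972],
    },
    "ALASSO": {
        "min_mse": [2820, 2582, 2586, 2809, 3050, 2832, 3195, 3006, 2589, 2817],
        "max_r2":  [2820, 2582, 2809, 2586, 3050, 3195, 2832, 3006, 2817, 2972],
    },
}

def compare_rankings(a, b):
    """Return (individuals present in both lists, positions with the same individual)."""
    shared = len(set(a) & set(b))
    same_position = sum(x == y for x, y in zip(a, b))
    return shared, same_position

summary = {
    method: compare_rankings(lists["min_mse"], lists["max_r2"])
    for method, lists in top10.items()
}

for method, (shared, same_position) in summary.items():
    print(f"{method}: {shared} of 10 shared, {same_position} at the same rank")
```

For every method the two criteria share nine of the ten selected individuals, but only the first two ranks always coincide, so the choice of selection measure mainly reorders candidates rather than replacing them.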