Literature DB >> 30400768

Semi-parametric estimates of population accuracy and bias of predictions of breeding values and future phenotypes using the LR method.

Abstract

BACKGROUND: Cross-validation tools are used increasingly to validate and compare genetic evaluation methods but analytical properties of cross-validation methods are rarely described. There is also a lack of cross-validation tools for complex problems such as prediction of indirect effects (e.g. maternal effects) or for breeding schemes with small progeny group sizes.
RESULTS: We derive the expected value of several quadratic forms by comparing genetic evaluations including "partial" and "whole" data. We propose statistics that compare genetic evaluations including "partial" and "whole" data based on differences in means, covariance, and correlation, and term the use of these statistics "method LR" (from linear regression). Contrary to common belief, the regression of true on estimated breeding values is (on expectation) lower than 1 for small or related validation sets, due to family structures. For validation sets that are sufficiently large, we show that these statistics yield estimators of bias, slope or dispersion, and population accuracy for estimated breeding values. Similar results hold for prediction of future phenotypes although we show that estimates of bias, slope or dispersion using prediction of future phenotypes are sensitive to incorrect heritabilities or precorrection for fixed effects. We present an example for a set of 2111 Brahman beef cattle for which, in repeated partitioning of the data into training and validation sets, there is very good agreement of statistics of method LR with prediction of future phenotypes.
CONCLUSIONS: Analytical properties of cross-validation measures are presented. We present a new method named LR for cross-validation that is automatic, easy to use, and which yields the quantities of interest. The method compares predictions based on partial and whole data, which results in estimates of accuracy and biases. Prediction of observed records may yield biased results due to precorrection or use of incorrect heritabilities.

Entities: Chemical Disease Gene Species

Mesh：

Year: 2018 PMID： 30400768 PMCID： PMC6219059 DOI： 10.1186/s12711-018-0426-6

Source DB: PubMed Journal: Genet Sel Evol ISSN： 0999-193X Impact factor: 4.297

Background

Models for genetic evaluation are an oversimplification of reality that usually holds only in the short run and in closely-related populations. Their properties are rarely well known, which can lead to unexpected results. For instance, initial applications of genomic predictions of breeding values (GEBV) in dairy cattle led to biases, with young “genomic” selected bulls with high GEBV being overpredicted, as verified by posterior progeny testing [1-3]. As a result, further use of GEBV in the dairy industry required extensive cross-validation and a more formal analytical framework [4-6]. The introduction of new methods for genetic or genomic evaluation raises the question of model choice (comparing across models) and model quality (features of a particular model). Thus, we need tools to rank, understand and quantify the behavior of prediction models in an “animal breeding” context. The need for these tools has dramatically increased with the implementation of genomic selection, given its built-in encouragement to take riskier decisions such as selection of unproven young candidates, in particular in dairy cattle. The method that is most commonly used to check genomic predictions is some form of cross-validation, a test that was rarely used in pedigree-based genetic evaluation studies, which relied primarily on progeny testing (but see [7, 8]). In genomic prediction, cross-validation studies are indeed the norm [4, 9, 10]. Cross-validation tests rely on either one of two approaches: (1) comparing (G)EBV or predicted phenotypes to (pre-corrected) observed phenotypes, deregressed proofs, or yield deviations [9]; or (2) comparing (G)EBV to highly accurate EBV from progeny testing. Another approach, which is in between the two above approaches, is based on daughter yield deviations (DYD; [6]), which are close to highly accurate EBV if heritability is high and the number of daughters is large. Cross-validation is very useful but there are some concerns about the quality or adequacy of these approaches for several reasons, including: (a) the need to pre-correct phenotypes; (b) the growing difficulty to obtain unbiased estimates of DYD with the increasing use of non-progeny tested bulls selected based on GEBV; and (c) their inadequacy for indirect predictions such as those of maternal effects, for which there is no direct observation related to the animal. Apparent contradictions exist, such as lower accuracy of GEBV than that of pedigree EBV [5, 11], or accuracies higher than 1 for lowly heritable traits. For a detailed review of cross-validation in animal breeding and its metrics, we refer the reader to our review [12]. Cross-validation is a good tool but has some limitations as discussed above. Thus, there is an increasing need for a simple general tool that can be used in several complex scenarios, including for traits with a low heritability (e.g. reproductive and fitness traits), for indirectly observed traits (random regression coefficients, maternal effects, GxE interactions), and with limited size of progeny test groups (e.g., pigs). Here, we propose to complement cross-validation approaches with semiparametric procedures based on the classical theory of genetic evaluation. Semiparametric procedures based on the mixed model equations are appealing because they combine theory, which we know is approximately and/or asymptotically correct, with model-free evidence from data. In the 1990’s, there was some effort to develop such procedures [13]. Reverter et al. [14] showed that the amount of change in EBV from one genetic evaluation to the next (i.e., with the arrival of “new” data) was predictable. In parallel, bias in across-country predictions [15, 16] led to the introduction of the Interbull tests [17], which draw on a similar idea. This family of methods has been used to check unbiasedness of predictions and, in the case of the Interbull tests, relies heavily on the availability of progeny tests based on large numbers of daughters. In this work, we draw on analytical results from [14] and present theoretical features of semi-parametric procedures, namely method LR (from “linear regression”). These procedures are a series of statistics, which describe the change of predictions from “old” to “recent” evaluations that can be used to compute and compare population accuracies and biases of (genomic) predictions. We also explore analytical properties of the ability to predict future phenotypes, sometimes called “predictivity”. Then, we illustrate the method with deterministic results for simple designs and for experimental beef cattle data. This work proposes estimates of the “population” accuracy, which is the correlation between true (TBV) and estimated breeding values (EBV) across individuals in a population. Population accuracy is relevant to compare the predictive ability of models and to maximize genetic progress. This work does not propose methods to estimate individual accuracies, which are a measure of the risk when choosing a particular animal for breeding [18].

Methods: analytical developments

We propose to test the quality of evaluation methods using cross-validation tests based on successive EBV of a set of “focal” individuals (a validation cohort). These “focal” individuals can be the whole population [14, 19] or a set of “focal” individuals of interest, such as “genomic” candidates for selection [6]. We will use the convention that refers to a scalar, the variance of a random element from a single realization of random vector (in other words, where n is the size of ), whereas refers to the variance–covariance matrix of elements in during conceptual repetitions. We use a similar convention for and , which are scalars that represent the covariance and correlation across elements in and , whereas is a matrix.

Definition of population accuracy, bias, and dispersion

Let be the true breeding value (TBV) and an estimated breeding value (EBV) of a single individual. The classical definition of accuracy is the correlation for one individual across conceptual repeated sampling [20], which is a measure of the expected magnitude of the change in EBV with increasing information. Accuracies are also used to forecast genetic progress in a selection scheme [18, 21, 22]. This use applies to large unrelated populations, and made sense at the time of selection-index based selection (e.g. selecting boars based on family information). However, for the joint evaluation of all animals, the relevant measure according to Bijma [18] is “the correlation between true i.e. TBV and EBV in the candidates for selection, which is a property of a population, not of an individual”. This “population accuracy” (we will use this term hereafter) is the correlation across a series of individuals. Accordingly, bias is defined as the difference of means and dispersion as the slope of the regression of on : . Indeed, in practice, proxies to these empirical measures are used in cross-validation studies. In other words, accuracy measures the ability to rank individuals within the focal set of individuals, taking the possible relatedness within the sample into account [23, 24], as well as the buildup of the Bulmer effect that reduces genetic variance and makes evaluation more difficult [18, 25]. Note that the three quantities accuracy, bias, and dispersion are defined as scalars, i.e.and have distributions, i.e. over conceptual repetitions have themselves a mean and a variance. We also use indicators of (self-)relationships and of genetic variances within the sample. If the relationship matrix across focal individuals is , then we use where is the inbreeding coefficient and is the relationship between individuals ( can be understood as coancestry), and the bar operators imply averages, i.e. is the average across elements of . The statistic was used by [26] to describe the decrease in genetic variance due to relationships in a related but unselected population. For selected populations, even of infinite size, there is a further decrease in genetic variance due to the Bulmer effect [18, 27], and we will use where is the reduction due to selection and is the genetic variance at equilibrium in a population under selection. The equivalence between Henderson’s [28] results for the decrease in genetic variance in a selected population and was shown (in simplified settings) by [27, 29].

Statistics to test the quality of evaluation methods in brief

Consider successive evaluations with “partial” and “whole” data ( and , respectively), which is based on the use of “old” () and “recent + old” ( from “whole”) phenotype data, respectively. Note that in the following, and have the same dimension and may be a subset of “focal” individuals (e.g. the young candidates for selection) or the number of animals in the entire dataset (i.e., in the relationship matrix). In general, the breeder is concerned with the population accuracy of candidates for selection, because higher population accuracy of selection candidates implies greater genetic progress. Typically, focal individuals have no phenotype (or offspring phenotyped) in but have phenotype (or offspring phenotyped) in , but this is not a requirement for the proposed method. Reverter et al. [14] described the amount of change that is expected in consecutive genetic evaluations of individuals as a function of their respective accuracies, and they proposed statistics to check biases in genetic evaluations. The proposed criteria were very beneficial because (1) they do not require knowledge of the TBV, only the EBV from successive evaluations, and (2) they do not require knowledge of adjustment factors to pre-correct phenotypes. In general, assumptions are: , and . Henderson [28] proved that even in the presence of selection, which when coupled with the results in [14] yields . Intuitively, this holds if “old” errors in prediction () are uncorrelated with “new” information, which in turn holds if the model takes selection correctly into account. Another assumption, which will be shown later in this paper, is that the set of focal individuals is sufficiently large and “diverse” (for instance, there are several full-sib families and not just one). The derivations of Reverter et al. [14] referred to the individual case (e.g. ) and not to sets of individuals (e.g. ) that are used for cross-validation. We extend their results as shown below, which leads to the following main results. The statistic , has an expected value of 0 if the evaluation is unbiased. The regression of EBV obtained with “whole” () data on EBV estimated with “partial” () data has an expectation, if there is no over/under dispersion. The correlation of EBV based on partial and whole data, , is a function of their respective accuracies (), with an expected value , where is the population accuracy (correlation between TBV and EBV across animals). The covariance of EBV based on partial and whole data is a function of the squared accuracy (reliability) of the partial EBV, , . The slope of the regression of EBV based on partial on EBV based on whole data, is, on expectation, a function of the respective accuracies that is, the expectation of the slope is proportional to the relative increase in average reliabilities from EBV based on partial to EBV based on whole data.

Proofs of the adequacy of the statistics

In the following, we prove that the statistics described above are related to bias, slope and accuracies. We make repeated use of the following results for biquadratic forms [30]: consider random vectors , such that . The expectation of the quadratic form is . Empirical (co)variances (scalars) are quadratic forms, for instance where is the centering matrix [31]. Considering our problem, we make the hypothesis that the two genetic evaluations (e.g. males before (“partial”) and after (“whole”) progeny testing) have different means: Note that the meaning of the “expected mean of EBV ” is unclear under selection. For instance, the last generation is expected to have means higher than 0, but these means will differ for males (heavily selected) and females (less selected). We will assume that the focal individuals include sets of animals that are comparable, i.e. under repeated sampling they have the same average genetic level. For instance, if 1% of the elite females and 10% of the elite bulls are selected, offspring from these animals should have on average the same genetic level across conceptual repetitions of the breeding scheme; the actual animals that are selected will differ but the genetic gain will on average be the same. We also assume (as commonly done) that, because of selection, ( is a relationship matrix) instead of the non-selection case, . This implies that Henderson’s [28] description of decrease in variance due to the Bulmer effect is correct and that it can be summarized by a single parameter [18, 25, 27]. Following classical notation: and , and the matrix of prediction error variances and covariances (PEV and PEC) can be described as [14, 28].where and are matrices of PEV and PEC for the partial and whole analysis, respectively. These expressions assume that genetic evaluation deals correctly with the decrease in genetic variance due to selection [28, 32] in which case . From here, we derive expectations of several possible quadratic forms that are combined to produce estimators of bias, dispersion and accuracy. In principle, genetic evaluation does not need to be based on mixed models (statistics can be computed regardless of the procedure), but our results only hold if the variances and covariances of estimators and true values are as described above. Average inbreeding and relationships in are also needed. Ideally, the evaluation is based on conditional means such that the properties described before hold. Precision of the estimators of accuracy and bias depends on the distributional properties of the EBV and TBV, which can be derived when assuming multivariate normality but we have not attempted to do so herein.

Averages of estimated breeding values to estimate bias

It is straightforward to show that , . Thus, is a direct measure of bias.

Quadratic forms of estimated breeding values

For the method R of covariance estimation, it is recommended that the dispersion (relationship) matrix ( is included in the quadratic forms, especially in the presence of selection [33]: However, these weighted quadratic forms lead to estimators that are difficult to understand. Hence, in the following, we will use “unweighted” quadratic forms. The quadratic form using not-centered , has expectation:where is the number of individuals, is the average self-relationship, is the average prediction error variance and is the genetic variance. It is worth noting that the classical definition of individual accuracy is based on [20]. Thus, the expression above for is a function of individual expected average reliabilities , but also of means , . To remove dependence of the quadratic form above on means, it makes sense to use centered and :where is the centering matrix [31]. By its properties, and , and thereforeis a function of the average self-relationships minus the average relationship between individuals, , and PEV minus PEC. Inclusion of relationships between individuals results in the corresponding reduction in genetic variance due to inbreeding to be accounted for, i.e., if as usual , then , which shows the decrease in genetic variance [26, 34, 35]. Similarly, considers the fact that estimates of are correlated across individuals (the so-called “co-reliabilities” [25]), showing that there is little value in having high individual accuracy if predictors are correlated across individuals. The remaining quadratic forms needed for our developments are:and In the remainder of this paper, we assume that the expectation of a ratio of quadratic forms is equal to the ratio of the expectations. The “Appendix” shows that this holds when the number of individuals included in the statistics is large (several hundred or more) or when they are not structured into very large sibships. Otherwise, as shown in the “Appendix”, both the true regression coefficient and its estimator have an expectation less than 1, even when the model is perfect and the EBV have the right dispersion.

Quadratic forms of estimated and true breeding values

Here, we give an alternative definition of the population accuracy, i.e. the expected correlation of EBV and TBV in a sample, as a ratio of quadratic forms: Using and , this has expectation: The denominator corresponds to the expected genetic variance in the focal population and takes the reduction in variance due to relationships and selection () into account. With all these elements, we can compute the expectation of the derived statistics, as done in the following.

Derivation of statistics to test the quality of evaluation methods

Comparison of means of EBV from whole and EBV from partial data

Regression of EBV from whole data on EBV from partial data

The regression is composed of two quadratic forms. When assuming that the expectation of the ratio is equal to the ratio of the expectations, Note that this expectation involves PEV and off-diagonal PEC. Importantly, it must hold that (as usually assumed).

Correlation of EBV from whole and EBV from partial data

This statistic is composed of three quadratic forms and assuming that the square root of the expectation is equal to the expectation of the root, it gives: Therefore, is a direct estimator of the increase in population accuracy of EBV from partial to whole data, .

Estimation of accuracy from the covariance of EBV based on whole and EBV based on partial data

We can get a direct estimator of accuracy (and not of ratios of accuracies) from , from which we can derive the statistic:with expectation as follows Thus, is a direct estimate of squared population accuracy of EBV based on partial data, which we call since it is an estimator of the squared accuracy (a squared correlation) based on the covariance between and . This statistic requires an estimator of that can be obtained by modelling the selection scheme [18] or be explicitly estimated [34].

Regression of EBV from partial data on EBV from whole data

with expectation:which is a function of squared population accuracies, i.e. . In addition, , although and need not be equal for single realizations, i.e. for the analysis of one particular dataset.

Effect of over/underdispersion of breeding values on statistics

Statistics used to compute slopes and accuracies deal well with regular bias () because the and are centered. However, overdispersion (inflation) of EBV is a frequent phenomenon [6]. To consider a simple case, assume that EBV based on partial and whole data are uniformly scaled by regression coefficients and , with (i.e., there is more overdispersion with less data or with old data), resulting in:yielding, e.g., The regression of EBV from whole on partial data, , yields on expectation , which is not equal to 1 but equal to the ratio of dispersions. Thus, a value of (as often observed for genomic predictions) may indicate overdispersion of EBV based on partial data but also underdispersion of EBV based on whole data. The reverse regression of EBV from partial on whole data, yields on expectation , which is a ratio of dispersions and reliabilities. Finally, the correlation has the following expected value:retrieving a ratio of accuracies. Thus, the statistic (correlation of “whole data” and “partial data” EBV) is an estimator of change in accuracy and is not affected by this very simplistic form of overdispersion. Note that equivalent biases result when , i.e. when there is more overdispersion with more data or with recent data. Thus, regression of EBV from whole on partial data informs about over/underdispersion, regression of EBV from partial on whole data can be interpreted as a function of accuracies, and the correlation of EBV from partial and whole data is useful as a ratio of accuracies.

Predictivity: correlation of EBV with precorrected data

A very common strategy in cross-validation tests is to compare predictions of EBV with precorrected phenotypes for the predicted individuals [9, 10], i.e. using , where is the precorrected “new” data available in the whole data. It is, however, not clear whether this is a valid estimator of accuracy and what the effect of precorrection is. Here we derive some results that show that the use of precorrected data can be problematic in some cases: many levels of the main environmental effect or wrong variance components. Precorrected data are obtained with the whole dataset using , where is typically a BLUE estimator of fixed effects. In fact, for [36], which leads to:where is the PEV of fixed effects in obtained from analysis of the whole dataset. Now, we will consider only new data that are not in the partial dataset and assume for simplicity one record per individual. We further assume that the new data are only affected by a single fixed effect (say contemporary group), such that: The covariance of EBV with can be obtained as follows: Because by orthogonality, ([37] equation 5.28), and where , and , the latter because EBV based on partial data do not influence (again, assuming there is no effect of selection). Therefore, , which yields the following expectations for individuals:which is equal to 1 as expected, and the correlation is equal to:with expectation: Thus, the cross-validation correlation of EBV with precorrected phenotypes depends on population accuracy, heritability, and errors in estimates of fixed effects. If fixed effects are estimated with high precision and off-diagonals (both in relationships and in PEV) are negligible, then: If we divide the square of this by the population heritability (i.e. in the selected population, not in the base population):and therefore, Thus, if there has been no selection, we can estimate accuracy from cross-validation as: where is heritability in the base population [9]. However, if there has been selection, using underestimates population accuracy because and . Using the “dairy” example in [18], and , such that . If the observed , this yields (biased) and (correct) . The latter can, in turn, be translated to an “unselected accuracy” of 0.82 [18, 27]. There is a second and not negligible source of bias due to . For a single fixed effect, matrix contains (the variance of the estimate of the effect that affects the -th record) on the diagonal and on off-diagonals (the covariance of the estimates of the effects that affect the -th and -th records). We will assume that covariances of estimates across levels of the fixed effect are negligible (this is not true if relatives are spread across fixed effects). For a balanced design with records in , records for each of the levels of the fixed effect, and with records ordered within level, the structure of is:where . Also, we will assume that , in other words, relationships add little information to estimates of the fixed effect. This results in and , which results in . Plugging this expression in above and ignoring off-diagonals results in: This results in overestimation of the accuracy of , which does not disappear with high values of n. Thus, if there are several levels of the fixed effect, the estimate of the cross-validation accuracy will have an upward bias, which is greater for a smaller number of records per contemporary group. This bias is due to the assumption that the precorrection is perfect. For instance, for and 25 contemporary groups, the bias is an extra 5% apparent accuracy. For “large” and “not small”, bias is approximately , i.e., inversely proportional to the size of the contemporary group, which does not disappear with increasing .

Comparison with current Interbull validation procedures

The Interbull method [6] uses a simple regression that can be written as , where are daughter yield deviations (computed with the whole dataset) and act as pseudo-data for bulls. Elements of are assumed to be independent across bulls with variance inversely proportional to the equivalent number of daughters (this can be viewed as DYD having different heritabilities across bulls). Thus, this setting is similar to the previous section on predictivity. The above proofs apply and the expected value of is 1, although, using does not yield a correct estimate of , i.e. bias, unless . Also, the expected value of is , where is the average reliability of the EBV of bulls based on progeny. Here, as in the analysis on predictivity, off-diagonals are ignored, which should not affect results if progeny numbers are large enough.

Markers considered as “new” data: pedigree BLUP and (SS)GBLUP

The addition of marker genotypes to a pedigree-based BLUP genetic evaluation can also be viewed as having “more data”, e.g. on a correlated trait [38, 39]. Thus, a way to check the increase in accuracy from adding marker information (e.g. from BLUP to GBLUP) is to view the data with marker genotypes as “whole” and the data without markers as “partial”. Using to refer to EBV with markers and to EBV without markers, this yields:i.e., the lower the correlation between genomic EBV and pedigree EBV, the higher the extra accuracy from genomic data. This assumes that , as assumed by [39], which sounds reasonable but has been formally proved only for a single marker that is fitted as a correlated trait [38]. The procedure above uses the same phenotypes for the evaluations with either or . An alternative procedure may be to compare the increase in accuracy from “partial” to “whole” in both approaches. In this case, to compare EBV from a genomic-based method (GBLUP or SSGBLUP) with EBV from a pedigree-based method (PBLUP), we suggest the following procedure: Compute EBV with all data (“whole”) using the method that is deemed to be optimal; we will assume that this is GBLUP. Choose a cutoff date and create a partial dataset by setting phenotypes after cutoff date to missing; Compute GEBV based on the partial data using GBLUP; For “focal” individuals (i.e., the validation cohort): compute statistics , , and that describe respectively bias, dispersion and accuracy of EBV from GBLUP; Compute PEBV based on “partial data” and using PBLUP; Compute statistics , , and that describe respectively bias, dispersion and accuracy of PEBV from PBLUP; The statistic quantifies the inverse of the relative increase in accuracy from PBLUP to GBLUP in the partial data; The statistic quantifies the inverse of the relative increase in accuracy from PBLUP to GBLUP in the whole data.

Data: example using beef cattle data

Animal population, genotypes and phenotypes

The statistics described above were tested in a real-life dataset. We used genetic and phenotypic resources (for details see Table 1) from Brahman cows (N = 995) and bulls (N = 1116) that have been widely described in the recent literature [40-42]. Yearling body weight (YWT) computed from the average of all body weights recorded between 300 and 420 days of age was used as the phenotype. The 2111 Brahman cattle were genotyped using either the Illumina BovineSNP50 BeadChip (Illumina Inc., San Diego, CA; [43]) or the BovineHD panel (Illumina Inc., San Diego, CA) that includes more than 770,000 single nucleotide polymorphisms (SNPs). Animals that were genotyped using the lower density array had their genotypes imputed to higher-density, as described previously [44]. The imputation was performed on 30 iterations of BEAGLE [45], using 519 individuals genotyped using the BovineHD chip as reference. After imputation, we retained genotypes on 729,068 SNPs, of which 651,253 were mapped to autosomal chromosomes and had a minor allele frequency (MAF) higher than 1% and were used to build the genomic relationship matrix (GRM) according to Method 1 in [46].

Table 1

Summary statistics for age and body weight (YWT) in yearling records used in the beef cattle data example

Sex	N	Variable	Mean	SD	Min.	Max.
Cows	995	Age (days)	361.77	12.68	323	400
Cows	995	BWT (kg)	209.73	30.54	115	299
Bulls	1116	Age (days)	359.10	20.54	302	416
Bulls	1116	BWT (kg)	243.71	29.17	138	353

Summary statistics for age and body weight (YWT) in yearling records used in the beef cattle data example

Procedure to generate partial datasets and cross-validation statistics

The data described above comprised the whole dataset. One thousand partial datasets were generated by setting a random 50% of records missing. It is worth noting that these animals are contemporaries (the resource population spans a few years and animals are not descendants from each other) and, therefore, there are no issues related to selection. A simple breeding value mixed-model was used for the analysis of YWT with the fixed effects of contemporary group (combination of sex, year and location), age of dam at birth in year classes, and age at measurement as a covariate, and the random additive polygenic effects and residuals as random effects. Variance components estimates and BLUPs of breeding values were obtained using the Qxpak5 software [47]. All datasets were analyzed using both the pedigree-based numerator relationship matrix (NRM) and the SNP-based genomic relationship matrix (GRM). Table 2 lists the 16 statistics that were used to compare EBV from the whole and partial datasets. Note that in order to highlight the impact of the data partitioning, some of these statistics were computed separately for the individuals in the whole and the partial datasets, in the same context as ‘reference’ and ‘validation’ individuals, respectively. For instance, is the correlation between the EBV obtained using the whole dataset and the EBV obtained using the partial dataset, but computed only by using the validation individuals that have missing phenotypes in the partial dataset, i.e. the random 50% individuals with omitted phenotypes in the ‘partial’ dataset. The EBV of those animals are predicted in the partial dataset using parent average (i.e. using pedigree) or using genomic information from relatives (i.e. using the GRM). In the whole dataset, they are predicted using own records.

Table 2

Set of 16 statistics used to compare predictions based on the whole and partial beef cattle datasets

Statistic	Description
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h^{2}$$\end{document}h2	REML estimate of heritability for each ‘Partial’ dataset (each random 50% missing)
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{w,p}$$\end{document}bw,p	Regression of whole on partial EBV (expectation of 1.0)
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{w,p}^{r}$$\end{document}bw,pr	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{w,p}$$\end{document}bw,p computed within reference samples (i.e. Those with phenotypes maintained in the creation of the partial sample)
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{w,p}^{v}$$\end{document}bw,pv	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{w,p}$$\end{document}bw,p computed within validation samples (i.e. Those with phenotypes treated as missing in the creation of the partial sample)
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{p,w}$$\end{document}bp,w	Regression of partial on whole EBV (expectation depends on accuracies)
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{p,w}^{r}$$\end{document}bp,wr	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{p,w}$$\end{document}bp,w computed within reference samples
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{p,w}^{v}$$\end{document}bp,wv	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{p,w}$$\end{document}bp,w computed within validation samples
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho_{w,p}$$\end{document}ρw,p	Correlation between whole and partial EBV (expectation depends on accuracies)
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho_{w,p}^{r}$$\end{document}ρw,pr	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho_{w,p}$$\end{document}ρw,p computed within reference samples
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho_{w,p}^{v}$$\end{document}ρw,pv	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho_{w,p}$$\end{document}ρw,p computed within validation samples
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r\left( {y_{r} ,\hat{u}_{r} } \right)$$\end{document}ryr,u^r	Correlation between the partial EBV and the adjusted phenotypes for the reference samples
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r\left( {y_{v} ,\hat{u}_{v} } \right)$$\end{document}ryv,u^v	Correlation between the partial EBV and the adjusted phenotypes for the validation samples (NB. This is the conventional measure of accuracy in cross-validation genomic selection studies)
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d_{w,p}^{r}$$\end{document}dw,pr	Difference between whole and partial EBV (in absolute value) computed within reference samples
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d_{w,p}^{v}$$\end{document}dw,pv	Difference between whole and partial EBV (in absolute value) computed within validation samples
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Vd_{w,p}^{r}$$\end{document}Vdw,pr	Variance of the difference between whole and partial EBV computed within reference samples
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Vd_{w,p}^{v}$$\end{document}Vdw,pv	Variance of the difference between whole and partial EBV computed within validation samples

Set of 16 statistics used to compare predictions based on the whole and partial beef cattle datasets

Results

Table 3 provides summary metrics (mean, standard deviation, minimum and maximum) for the 16 statistics across the 1000 partial datasets obtained using the NRM and the GRM. The means are also presented in the bar diagram of Fig. 1. Notable changes from using NRM versus GRM were the 66.5% increase in the estimated heritability (from 0.260 to 0.433), the 21.4% increase in (from 0.550 to 0.668) and the 4.1-fold increase in (from 0.076 to 0.312).

Table 3

Statistic	Pedigree-based NRM				SNP-based GRM
Statistic	Mean	SD	Min.	Max.	Mean	SD	Min.	Max.
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h^{2}$$\end{document}h2	0.260	0.021	0.211	0.371	0.433	0.044	0.316	0.598
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{w,p}$$\end{document}bw,p	0.957	0.064	0.741	1.206	0.961	0.083	0.718	1.275
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{w,p}^{r}$$\end{document}bw,pr	0.970	0.059	0.763	1.180	0.954	0.077	0.729	1.231
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{w,p}^{v}$$\end{document}bw,pv	0.925	0.082	0.688	1.272	0.975	0.099	0.685	1.372
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{p,w}$$\end{document}bp,w	0.751	0.077	0.522	1.189	0.710	0.066	0.519	0.967
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{p,w}^{r}$$\end{document}bp,wr	1.079	0.090	0.840	1.541	0.955	0.079	0.730	1.238
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{p,w}^{v}$$\end{document}bp,wv	0.423	0.056	0.253	0.743	0.462	0.046	0.329	0.667
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho_{w,p}$$\end{document}ρw,p	0.751	0.024	0.665	0.809	0.823	0.013	0.772	0.864
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho_{w,p}^{r}$$\end{document}ρw,pr	0.909	0.013	0.859	0.943	0.952	0.006	0.934	0.967
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho_{w,p}^{v}$$\end{document}ρw,pv	0.550	0.035	0.425	0.637	0.668	0.021	0.584	0.736
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r\left( {y_{r} ,\hat{u}_{r} } \right)$$\end{document}ryr,u^r	0.849	0.012	0.804	0.892	0.898	0.015	0.852	0.944
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r\left( {y_{v} ,\hat{u}_{v} } \right)$$\end{document}ryv,u^v	0.076	0.022	0.011	0.156	0.312	0.021	0.227	0.373
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d_{w,p}^{r}$$\end{document}dw,pr	2.253	0.266	1.684	3.902	2.905	0.288	2.344	4.476
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d_{w,p}^{v}$$\end{document}dw,pv	3.865	0.167	3.441	4.422	6.726	0.216	5.932	7.575
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Vd_{w,p}^{r}$$\end{document}Vdw,pr	8.303	1.988	4.585	24.081	13.798	2.977	8.839	32.127
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Vd_{w,p}^{v}$$\end{document}Vdw,pv	23.893	2.003	19.174	30.920	73.330	4.676	57.355	91.677

Fig. 1

Mean value for the 16 statistics across the 1000 partial (random 50%) beef cattle datasets obtained using either the pedigree-based NRM or the SNP-based GRM. Double-ended arrows indicate ± 1 standard deviation (SD). Refer to Tables 2 and 3 for a description of the statistics and the actual values, respectively

Summary metrics (mean, standard deviation, minimum and maximum) for the 16 statistics across the 1000 partial datasets (each one setting a random 50% as missing phenotypes) and obtained using either the pedigree-based NRM or the SNP-based GRM Mean value for the 16 statistics across the 1000 partial (random 50%) beef cattle datasets obtained using either the pedigree-based NRM or the SNP-based GRM. Double-ended arrows indicate ± 1 standard deviation (SD). Refer to Tables 2 and 3 for a description of the statistics and the actual values, respectively Figure 2 presents a heatmap of the correlation matrix among the 16 statistics obtained using the NRM and the GRM. The individual values are provided in Additional file 1: Tables S1 and S2. We observed a strong negative correlation ( in all cases) between the heritability estimates and the regressions of EBV from whole on EBV from partial data (i.e. , and ). This is consistent with the expectation of over- and under-dispersion for regression values < 1.0 and > 1.0, respectively.

Fig. 2

Heatmap of the correlation matrix among the 16 statistics obtained using the pedigree-based NRM (left panel) and the SNP-based GRM (right panel). Refer to Table 2 for a description of the statistics and to Supplementary Tables 1 and 2 for the actual correlation values One metric of interest is the correlation of EBV with precorrected phenotype (i.e., denoted here as ) since this is one of the most frequent measures of accuracy in cross-validation studies. Quite encouraging is the high correlation observed between and the regressions of EBV from whole on EBV from partial data (i.e. , and ), which ranged from 0.604 to 0.746, as well as the high positive correlation of with the correlations between “whole’ on “partial’ (i.e. , and ), with a maximum correlation of 0.806 between and . These results illustrate that the proposed metrics, particularly , are also estimators of the accuracy of EBV based on the partial (earlier) data (termed in our algebraical derivations). Striking is the novel finding of the strong negative correlation of (where are “reference” animals in the training dataset) with . The former is bound to be high since it reflects the prediction’s goodness of fit when computed on the data that is used to build the prediction, and averaged to 0.849 and 0.898 when using the NRM and the GRM, respectively (Table 3). However, the negative correlation of with indicates that when the breeding value model is particularly good at fitting the reference (‘old’) data (reflected in part by a high heritability estimate), this strong fitting ability disappears when applied to the validation (‘new’) data, which seems to imply overfitting (by chance). Indeed, a very strong correlation ( = 0.933) was observed between the estimate of heritability and , and a moderately strong negative correlation (r = − 0.543) between the estimate of heritability and (Fig. 2 and Additional file 1: Table S2). Importantly, these problematic relationships were not observed with either or . Finally, we explored the changes in ‘consecutive predictions’, which are represented here by the move from partial (old) to whole (new) data. We used the absolute difference between predictions (statistics and ) and the variance of the difference of predictions ( and ) and explored the relationships of these with the previous 12 statistics. Please note the strong negative correlation (r = − 0.838) between and . This contrasts with the not so strong correlation (r = − 0.548) between and (Fig. 3).

Fig. 3

Scatter plot of the relationship of the absolute difference in EBV between the whole and partial datasets () with the correlation of the EBV based on the partial data with the adjusted phenotypes (; left panel) and the correlation between EBV based on the whole and partial data (; right panel) across the 1000 partial (random 50%) beef cattle datasets Between two competing measures of accuracy, the measure that is more closely related to changes in predictions will be preferred. Based on this and our results, we conclude that is better than .

Discussion

Thompson [13] outlined and discussed methods for the statistical validation of genetic models for genetic evaluation [14, 17, 48]. He emphasized the need for the statistical models to be based on genetic considerations. Today, different genetic considerations (e.g. oligogenic vs polygenic models) may lead to different prediction models, in particular in the area of genomic selection. Thus, the question “which model is best?” is today more important than ever. In this work, we attempt to provide quantitative geneticists with a set of tools to make their own decisions. Why do animal breeders aim at having predictions that are unbiased in both senses, i.e. and ? Practically, to avoid suboptimal “biased” decisions, e.g. choosing too few or too many, or simply the wrong set of, young animals. Theoretically, best predictors, defined as conditional expectations, have optimal selection properties [49, 50], and therefore we should aim for models (not necessarily linear) that yield such best predictors. In practice, unbiasedness is a property that holds on expectation: for any real dataset, from one evaluation to the next, there will be small deviations; for instance, may differ from 0 just because of small noises. However, it is important to ascertain if these deviations are large (and affect the practice of selection) or not. In selection, the expected genetic gain at the stage of selection is , i.e. the average EBV of the “s” selected animals minus the average EBV of the “c” animals candidates to selection. To avoid surprises (over- or under-estimation of selected animals), we need , i.e., the estimate of the mean and the true mean of breeding values should be the same for selected animals. For this to hold, we must avoid two kinds of systematic errors: bias (wrong estimate of genetic trend) and over-/under-dispersion, which is often incorrectly referred to as “bias” in animal breeding literature. If selection is by truncation on EBV, the true mean after selection, under multivariate normality, is , where is the mean of all selection candidates and is the genetic gain. This genetic mean is (implicitly) predicted before selection as . For to hold, we need an unbiasedness condition (i.e. among all selection candidates) and a second condition that is . The latter condition, however, only holds if , which amounts to the regression coefficient to be 1. However, the equality holds under quite restrictive conditions [30, 33]. In a frequentist context, Henderson [28, 32] proved that selection can be ignored if the model is correct, selection is contained “in the data”, and under the assumption of multivariate normality. In a Bayesian context, Sorensen et al. [34] proved that selection can be ignored if the evaluation model is correct. However, models are rarely correct, at most they are robust. In particular, the widely used animal model that includes unknown parent groups [51] is biased by construction, because genetic groups are due to genetic selection but fitted as fixed effects, which ignores established genetic theory [52]. It may be argued that for the results in [14] to hold (roughly, future errors in prediction are not correlated to current errors in prediction), future data does not need to depend on past data. This is, however, not the case if there is selection: unborn progeny of unselected animals do not yield data. In principle, models should consider selection correctly, if all information is included. A counterexample where, old data affect future errors of prediction is as follows. Consider EBV () of a young bull based on one record of the dam, with , and a subsequent EBV based on progeny records () but not on maternal performance. Then, , which is not equal to because the dam performance was not included in . If there is no selection, there is no problem, but this is rarely the case, and it is actually selection that creates bias due to an increase in the genetic level of the trait and a reduction in genetic variance. Thus, we see the process of estimation of accuracy and bias of EBV by our proposed method LR as a double process. First, checking of the model in order to have a model that empirically has the “best” properties (estimation of bias); and then, estimation of its accuracy. We propose the following two-step praxis approach. First, to ascertain as best as possible that models are empirically unbiased using the statistics and which should have values 0 and 1, respectively—perhaps using, if not all, many animals (as in the original paper of Reverter et al. [14]). Second, for all models that are empirically unbiased, accuracies can be compared based on the proposed statistics, which rely on unbiasedness. Still, there is a problem in method LR, and in all methods that rely on linear regression of “predictands” (pseudo-TBV from accurate progeny testing, less accurate EBV or precorrected records) on “predictors” (typically EBV). As shown in the “Appendix”, due to family structure and the not complete accuracy of EBV, the true value of the regression of TBV on EBV, i.e. the “true” , has an expectation lower than 1, . Accordingly, regression of “whole data” EBV (or of precorrected data) on “partial data” may seem to indicate bias: because for the “true” , . In other words, EBV may appear to be over-dispersed when they are actually not, which holds for method LR and for any other similar method such as “predictivity” or the Interbull tests, since it is a fundamental property of the crude regression of a vector of TBV on a vector of EBV . It seems relevant to assess, in practice, the extent of this inequality , since evaluations are often scaled such that (actually its estimate) is equal to 1, which implies that EBV may be too much deflated after the scaling. However, we will not address these points here, since this should be the subject of a simulation study that goes far beyond this paper. The deviation of from 1 is important if the cohort, or focal group, is small and related, and it does not depend on the quality of the “predictand”. Therefore, our recommendation is to use large cohorts for validation. This bias inherent to cross-validation analysis deserves further examination in future studies. Fixing the models to observe constraints on estimated bias should be based on rigorous genetic or statistical arguments (i.e. re-estimating variance components and heritabilities), rather than quick fixing procedures such as multiplying by constants, manipulating relationships or changing hyper-parameters of prior distributions. For instance, [53] found empirically that equaling statistics of and provided unbiased predictions, but this has a genetic interpretation of modelling selection and drift from the base to the genotyped population [54, 55]. In the analysis of genetic trend for litter size in pigs, Sorensen et al. [48] also emphasized “forward” cross-validation for model checking, using what we called in this paper “predictivity”, instead of relying solely on model-based predictions. Recently, Putz et al. [56] tested by simulation several methods to validate accuracies by cross-validation. They reported poor performance of comparisons of (in our notation) and , without realizing that is not an estimator of accuracy but of ratios of accuracies. In addition, they did not simulate selection, in which case theoretical accuracy is equal to validation accuracies. We have shown that precorrection of phenotypes using whole data may bias the result of predictivity. This is particularly relevant for small contemporary groups such as in dairy or beef cattle as opposed to, say, sheep or aquaculture species. Some measure of error in precorrection due to estimation of contemporary groups should be reported in cross-validation results. Although the ranking of methods should be similar, estimates of population accuracies may be biased. Comparing and , as we propose in this work, might be a better option, although it involves more parametric assumptions. One final consideration involves discussing the difference between population and individual accuracy. Quoting [18]: “For response to selection, the [population] accuracy should reflect the correlation between true and EBV in the candidates for selection, which is a property of a population, not of an individual. For the stability of EBV, the accuracy should reflect the standard error of an EBV, which can be defined for a single individual.” Our work deals with population accuracies, not with individual accuracies. The former are useful for model selection and for genetic gain; the latter are useful for individual decisions. The population accuracy is not a function of individual accuracies. For instance, consider full sibs that are evaluated by using parent average and for which their parents are known exactly: individual accuracy is 0.71. However, population accuracy is 0, since all full-sibs have exactly the same parent average. Thus, population accuracies involve both individual reliabilities and co-reliabilities [24, 25].

Conclusions

In this paper, we present properties of cross-validation measures obtained from successive genetic evaluations. These measures allow estimation of population accuracies and biases, which are of interest to quantitative geneticists in general, and animal and plant breeders in particular. We hope that with these tools, researchers can report and compare competing prediction models, in particular for complex cases such as for lowly heritable traits or for indirect genetic values such as maternal effects. Additional file 1: Tables S1 and S2. Correlation among the 16 statistics employed in the cross-validation study of the beef cattle dataset using the pedigree-based NRM or the SNP-based GRM (NB: These are the values used to generate the left and right panels of Fig. 2

37 in total

1. Best linear unbiased estimation and prediction under a selection model.

Authors: C R Henderson
Journal: Biometrics Date: 1975-06 Impact factor: 2.571

2. Marker-assisted selection for commercial crossbred performance.

Authors: J C M Dekkers
Journal: J Anim Sci Date: 2007-05-15 Impact factor: 3.159

3. Efficient methods to compute genomic predictions.

Authors: P M VanRaden
Journal: J Dairy Sci Date: 2008-11 Impact factor: 4.034

4. Application of genomic evaluation to dairy cattle in New Zealand.

Authors: A M Winkelman; D L Johnson; B L Harris
Journal: J Dairy Sci Date: 2014-10-30 Impact factor: 4.034

5. Genomic inbreeding depression for climatic adaptation of tropical beef cattle.

Authors: A Reverter; L R Porto-Neto; M R S Fortes; P Kasarapu; M A R de Cara; H M Burrow; S A Lehnert
Journal: J Anim Sci Date: 2017-09 Impact factor: 3.159

6. Three methods to validate the estimation of genetic trend for dairy cattle.

Authors: D Boichard; B Bonaiti; A Barbat; S Mattalia
Journal: J Dairy Sci Date: 1995-02 Impact factor: 4.034

7. Computation of deregressed proofs for genomic selection when own phenotypes exist with an application in French show-jumping horses.

Authors: A Ricard; S Danvy; A Legarra
Journal: J Anim Sci Date: 2012-12-10 Impact factor: 3.159

Review 8. Invited review: Genomic selection in dairy cattle: progress and challenges.

Authors: B J Hayes; P J Bowman; A J Chamberlain; M E Goddard
Journal: J Dairy Sci Date: 2009-02 Impact factor: 4.034

9. Detection of quantitative trait loci in Bos indicus and Bos taurus cattle using genome-wide association studies.

Authors: Sunduimijid Bolormaa; Jennie E Pryce; Kathryn E Kemper; Ben J Hayes; Yuandan Zhang; Bruce Tier; William Barendse; Antonio Reverter; Mike E Goddard
Journal: Genet Sel Evol Date: 2013-10-29 Impact factor: 4.297

10. Genomic correlation: harnessing the benefit of combining two unrelated populations for genomic selection.

Authors: Laercio R Porto-Neto; William Barendse; John M Henshall; Sean M McWilliam; Sigrid A Lehnert; Antonio Reverter
Journal: Genet Sel Evol Date: 2015-11-02 Impact factor: 4.297

39 in total

1. Genomic selection in American mink (Neovison vison) using a SSGBLUP model for size and quality traits graded on live mink.

Authors: Trine M Villumsen; Guosheng Su; Bernt Guldbrandtsen; Torben Asp; Mogens S Lund
Journal: J Anim Sci Date: 2021-01-08 Impact factor: 3.159

2. Beef trait genetic parameters based on old and recent data and its implications for genomic predictions in Italian Simmental cattle.

Authors: Alberto Cesarani; Jorge Hidalgo; Andre Garcia; Lorenzo Degano; Daniele Vicario; Yutaka Masuda; Ignacy Misztal; Daniela Lourenco
Journal: J Anim Sci Date: 2020-08-01 Impact factor: 3.159

3. Genomic study of the resilience of buffalo cows to a negative energy balance.

Authors: Francisco Ribeiro de Araujo Neto; Jessica Cristina Gonçalves Dos Santos; Cherlynn Daniela da Silva Arce; Rusbel Raul Ascpilcueta Borquis; Daniel Jordan Abreu Dos Santos; Katia Cylene Guimarães; André Vieira do Nascimento; Henrique Nunes de Oliveira; Humberto Tonhati
Journal: J Appl Genet Date: 2022-01-25 Impact factor: 3.240

4. Prediction ability for growth and maternal traits using SNP arrays based on different marker densities in Nellore cattle using the ssGBLUP.

Authors: Juan Diego Rodriguez Neira; Elisa Peripolli; Maria Paula Marinho de Negreiros; Rafael Espigolan; Rodrigo López-Correa; Ignacio Aguilar; Raysildo B Lobo; Fernando Baldi
Journal: J Appl Genet Date: 2022-02-08 Impact factor: 3.240

Review 5. Association Studies and Genomic Prediction for Genetic Improvements in Agriculture.

Authors: Qianqian Zhang; Qin Zhang; Just Jensen
Journal: Front Plant Sci Date: 2022-06-02 Impact factor: 6.627

6. Short communication: investigation of the feasibility of genomic selection in Icelandic Cattle.

Authors: Egill Gautason; Goutam Sahana; Guosheng Su; Baldur Helgi Benjamínsson; Guðmundur Jóhannesson; Bernt Guldbrandtsen
Journal: J Anim Sci Date: 2021-07-01 Impact factor: 3.159

7. Composite genotypes of progestogen-associated endometrial protein gene and their association with composition and quality of dairy cattle milk.

Authors: Magdalena Kolenda; Beata Sitkowska; Dariusz Kamola; Barry D Lambert
Journal: Anim Biosci Date: 2021-02-15

8. The utility of genomic prediction models in evolutionary genetics.

Authors: Suzanne E McGaugh; Aaron J Lorenz; Lex E Flagel
Journal: Proc Biol Sci Date: 2021-08-04 Impact factor: 5.530

9. Multiple Country and Breed Genomic Prediction of Tick Resistance in Beef Cattle.

Authors: Fernando Flores Cardoso; Oswald Matika; Appolinaire Djikeng; Ntanganedzeni Mapholi; Heather M Burrow; Marcos Jun Iti Yokoo; Gabriel Soares Campos; Claudia Cristina Gulias-Gomes; Valentina Riggio; Ricardo Pong-Wong; Bailey Engle; Laercio Porto-Neto; Azwihangwisi Maiwashe; Ben J Hayes
Journal: Front Immunol Date: 2021-06-23 Impact factor: 7.561

10. Predicting the accuracy of genomic predictions.

Authors: Jack C M Dekkers; Hailin Su; Jian Cheng
Journal: Genet Sel Evol Date: 2021-06-29 Impact factor: 4.297