Literature DB >> 24314298

Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding.

Sidi Boubacar Ould Estaghvirou, Joseph O Ogutu¹, Torben Schulz-Streeck, Carsten Knaak, Milena Ouzunova, Andres Gordillo, Hans-Peter Piepho.

Abstract

BACKGROUND: In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other.
RESULTS: The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best.
CONCLUSIONS: The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Year: 2013 PMID： 24314298 PMCID： PMC3879103 DOI： 10.1186/1471-2164-14-860

Source DB: PubMed Journal: BMC Genomics ISSN： 1471-2164 Impact factor: 3.969

Background

Genomic selection (GS) is a method for predicting genomic breeding values using molecular markers covering the whole genome [1-3]. GS is fast becoming popular in plant and animal breeding [1,4,5], because of recent advances in high-throughput marker technologies and accompanying reduction in the costs of genotyping. The performance of genomic selection (GS) procedures is often assessed by k-fold cross-validation (CV) [6]. Accurate evaluation of the performance of genomic selection is difficult in practice because true breeding values are typically unknown. As result, simulation modeling is often used to generate breeding values as a basis for assessing the accuracy of genomic prediction [1]. Once the true breeding values are available, the accuracy of genomic prediction can be expressed as the correlation between the true and the predicted breeding values. In this paper, we use simulated true breeding values to directly compute the true correlation (accuracy) between the true and the predicted breeding values as a benchmark for evaluating the performance of seven contending methods. Four of the seven methods (Methods 1 to 4) first estimate heritability H2[7] and then divide the cross-validation sample correlation between the predicted breeding values () and the observed phenotypic values (p), predictive ability, by the square root of heritability H2[8,9] to obtain an estimate of predictive accuracy . The remaining three methods (Methods 5 to 7) estimate the predictive accuracy directly without having to first estimate heritability, even though Method 5 also estimates heritability. Here, we investigate the relative merits of the seven methods for estimating predictive accuracy using simulated breeding values. For five of the seven methods for estimating predictive accuracy (Methods 1, 2, 3, 4 and 6), we comparatively evaluate their predictive accuracies using three-fold cross-validation. Of the seven methods, two direct methods (Methods 5 and 6) and one indirect method (Method 4) for estimating predictive accuracy are proposed and described here for the first time whereas the remaining four methods were obtained from the literature. Methods 1 to 3 assume uncorrelated genotypes in the model for estimating heritability but assume correlated genotypes in the model for estimating predictive ability.

Methods

We denote the standard deviation of a sample with s and that of a population with σ and the sample and population variance of the true genetic breeding values g with and , respectively. Further, we denote with r, , ρ and ρ the sample correlation, the sample correlation between the BLUP of g and the observed “phenotypes” p, the population correlation and the population correlation between the true genetic breeding values g and the observed “phenotypes” p, respectively. Also, we use , , and to denote the sample correlation between the true and the predicted genetic breeding values, the sample covariance between the true and the predicted breeding values, and the sample variance of the predicted breeding value and the phenotypic sample variance, respectively. In this paper, the sample will generally refer to a trial with n genotypes, real or simulated, assumed to have been obtained from an infinite population of genotypes.

True correlation

The true correlation is given by the correlation between the true (g) and the predicted () breeding values where is the covariance between the true and the predicted breeding values. Further, where denotes the estimated mean of g (i = 1, …, n) and where denotes the estimated mean of , are sample variances of the true and the predicted breeding values, respectively. We take the unobservable correlation to be the main quantity of interest to the breeder or geneticist. Seven alternative procedures are evaluated, by simulation, regarding the accuracy and precision with which they are able to estimate

Two-stage approaches

We consider the case of a trial conducted in a single location. The analysis can be done in two stages [10]. The model for the observed plot data can be written as where y is the vector of the observed phenotypic values, μ is a vector containing the adjusted genotype means to be estimated from a model in which genotype enters as a fixed effect and X1 is an associated design matrix and f combines all the fixed, random design and error effects (replicates, blocks, etc.).

The first stage of the two-stage approaches

At the first stage, means (μ) for the testcross genotypes are estimated using model (5) and submitted to the second stage. The adjusted means of the standard varieties are excluded from the dataset before submission to the second stage.

The second stage of the two-stage approaches

The adjusted means from the first stage are used in the second stage to predict the true breeding values g. The second stage model is given by where is the adjusted mean of the i-th genotype (estimates of μ = φ + g from the first stage), φ is the general effect or mean, g is the random effect of the i-th genotype and e is the residual error associated with , e = (e1, …, e ) assumed to follow . The random vector g = (g1, …, g ) is modelled by a linear random regression on the SNP markers with random regression coefficients or marker effects u = (u1, …, u ) as where Z is the matrix of SNP marker covariates, , I is the p -dimensional identity matrix and is the variance of marker effects. Thus, we simulated the random SNP marker effects as random draws from a normal distribution with zero mean and variance . Genotyping of all the genotypes was done by SNP markers (275 for the AgReliant dataset and 11646 for the KWS Synbreed dataset, see below) and the information stored in a matrix Z = {z }. The marker covariate z for the i-th genotype (i = 1, 2, …, n) and the k-th marker (k = 1, 2, …, p) for biallelic SNP markers with alleles A1 and A2 was set to 1 for A1A1, -1 for A2A2 and to 0 for A1A2, A2A1 and missing values. Thus, the genotypic effect g has variance where Z is the transpose of Z. Alternatively, for computing some of the heritability measures, we also fitted model (6) with , i.e., assuming that genotypic effects are uncorrelated for Methods 1 to 3, where is the genetic variance and I is the n-dimensional identity matrix. In general, when fitting model 6, assuming a linear variance-covariance model for G as defined in (8) it can sometimes happen that the estimated marker variance () is negative, yet it should not be. To ensure that the estimated is nonnegative it is necessary to specify a zero lower boundary constraint for . This can be accomplished using the lower b=value-list option of the parms statement of the MIXED procedure when using the SAS system.

Simulation of datasets

To simulate block and plot effects using variance components from the real maize (Zea mays) dataset provided by AgReliant we generated a dataset with 177 genotypes distributed over 10 incomplete blocks per replicate, each with 18 plots according to an alpha-design with two replicates. An appropriate model for this design must have an effect for the complete replicates, and another effect for the incomplete blocks, nested within replicates. We therefore simulated the field trial data according to an alpha design [11] using the model: where y is the yield of the i-th genotype in the j-th block nested within the k-th complete replicate, φ is the general effect or mean, γ is the fixed effect of the k-th complete replicate, b is the random effect of the j-th block nested within the k-th complete replicate, g is the random effect of the i-th genotype, and e is the residual plot error associated with y . We similarly simulated the block and plot effects using variance components estimated from the real maize dataset with 698 genotypes provided by KWS according to an alpha-design with two replicates based on model (9). The complete simulated datasets contained true genetic, block and plot effects. We used these datasets to compute the true correlation between the predicted and the true breeding values, true heritability as the square of the correlation between the predicted and the true breeding values and estimates of heritability for each of four different indirect methods. We considered four simulation scenarios defined by the parameters in Tables 1 and 2.

Table 1

The variance components for the AgReliant real maize data set estimated by RR-BLUP models assuming genotypes are correlated according to the linear variance model

Variance components	Scenario 1	Scenario 2
Marker (σu2)	0.2019	0.2019/10
Block (σb2)	69.9089	69.9089
Residual (σe2)	48.6728	48.6728

Table 2

The variance components for the KWS-Synbreed real maize data set estimated by RR-BLUP models assuming genotypes are correlated according to the linear variance model

Variance components	Scenario 3	Scenario 4
Marker (σu2)	0.005892	0.005892/10
Trial (σl2)	11.8285	11.8285
Trial×Replicate (σr2)	3.3231	3.3231
Trial×Replicate×Block (σb2)	6.3148	6.3148
Tester×Non-genotyped σg22 lines×GRP (σg2)	34.5717	34.5717
Residual (σe2)	53.8715	53.8715

The variance components for the AgReliant real maize data set estimated by RR-BLUP models assuming genotypes are correlated according to the linear variance model The variance components for the KWS-Synbreed real maize data set estimated by RR-BLUP models assuming genotypes are correlated according to the linear variance model We used two real datasets to get marker information and estimates of the marker, block and error variance components, which we needed to simulate the true breeding values and phenotypic data, assuming correlated genotypes (Tables 1 and 2). For Scenarios 2 and 4 we divided the marker variance for Scenarios 1 and 3 by 10, respectively, to obtain smaller estimates of heritability (Tables 1 and 2).

The AgReliant maize dataset

The first data set we used was a small dataset provided by AgReliant Genetics. It consisted of 177 doubled haploid maize lines derived from biparental crosses. The hybrid performance for kernel dry weight per plot of testcross genotypes was assessed with the same common tester using an unreplicated augmented trial design with incomplete blocks. Although the testcross genotypes were tested in six locations in one year, not all testcross genotypes were tested in each location. Furthermore, three to five incomplete blocks, each having one single row of plots, were used per location. Standard varieties connected the different blocks in the sense that they allowed estimation of the inter-block variance and separation of the block from the error variance. The standard varieties themselves were not used in predicting g but were used merely to facilitate analysis of the testcross genotypes. Markers with more than 20% missing values, or more than 5% heterozygous genotypes, or with minor allele frequency less than 2.5% were discarded [10]. We used the data for only one of the six locations with the RR-BLUP model to obtain variance components needed to simulate the random marker, block and plot effects for Scenarios 1 and 2. The selected location had a single unreplicated trial, 5 blocks, 2 checks and 177 lines, all of which were genotyped. Since the two checks had markers, just like all the other genotypes, they were treated in the exact same way as the other genotypes in the RR-BLUP model. The RR-BLUP model assumed a linear variance model for the correlation between the genotypes: where y is the yield of the i-th genotype in the j-th block, φ is the general effect or mean, b is the random effect of the j-th block, the random vector g = (g1, …, g ) is modeled as in equation 7 and also has variance . The terms Z, Z , u and are defined as in equations 7 and 8 whereas e is the residual plot error associated with y .

The KWS-Synbreed maize dataset

The second data set was extracted for one location from a larger data set provided by KWS for the Synbreed project [12]. It had a total of 900 doubled haploid maize lines of which 698 testcrosses were genotyped while the remaining 202 were not, six hybrid checks and five line checks. The genotypes were crossed with four testers (Table 3). The testcross genotypes were tested using 9 trials each laid out according to a 10×10 lattice square design with incomplete blocks. Each trial had two replicates. There were a total of 1800 observations, 38 of which had no yield measurements.

Table 3

Definition of the variables in the KWS-Synbreed dataset used to compute covariance parameters used in the simulations for Scenarios 3 and 4

Tester	GRP	Z1	Z2	GENA	GENB	Description of GENA
T₀	C₁-C₆	0	0	C₁-C₆	1	Hybrid checks 1-6
T₀	nT	0	1	nT₁- nT₄	1	4 lines, not genotyped, unknown tester
T₀	fT	0	1	fT₁- fT₆₆	1	16 lines, not genotyped, tested with a foreign tester
T₁	G₀	0	1	T1₁-T1₆₆	1	66 lines, not genotyped and tested to T1
T₂	G₀	0	1	T2₁-T2₆₁	1	61 lines, not genotyped and tested to T2
T₁	G₁	1	0	1-682	1-682	682 lines, genotyped and tested to T1 in group G1
T₁	G₃	1	0	683-698	683-698	16 lines, genotyped and tested to T1 in group G3

GRP = grouping factor for checks (C1-C6), testers (nT, fT) and groups (G0-G3) of genotyped lines. Z1=1 for genotyped lines and 0 otherwise. Z2=1 for non-genotyped lines and 0 otherwise. GENA denotes all the individual genotypes. GENB=GENA for genotyped lines and 1 otherwise. GENB helps specify the vector of random effects of genotyped lines with a length that matches the dimension of the covariance matrix of genotypes.

Definition of the variables in the KWS-Synbreed dataset used to compute covariance parameters used in the simulations for Scenarios 3 and 4 GRP = grouping factor for checks (C1-C6), testers (nT, fT) and groups (G0-G3) of genotyped lines. Z1=1 for genotyped lines and 0 otherwise. Z2=1 for non-genotyped lines and 0 otherwise. GENA denotes all the individual genotypes. GENB=GENA for genotyped lines and 1 otherwise. GENB helps specify the vector of random effects of genotyped lines with a length that matches the dimension of the covariance matrix of genotypes. Fitting a linear variance model (7 and 8) to these data requires using a variance-covariance matrix of dimension n1 × n1, where n1 is the number of genotyped lines. The vector of effects of genotyped lines must therefore be of dimension n1. This presents a challenge for the KWS- data set because the vector of random effects of all the genotypes (g) contains both the vector of effects of the n1 genotyped lines (g1) plus the vector of the effects of the n2 non-genotyped lines (g2) and so has a larger dimension (n1+n2) than n1. To facilitate fitting the linear variance model for the genotyped lines we proceed as follows. First, we create a dummy variable in our dataset (Z1, i =1,…,n1, m=1,…, 11 groups) equal to one for genotyped lines and zero otherwise. Second, we create a variable called GENA in the dataset with a unique level for each of the genotyped and the non-genotyped lines. Third, we create a second variable called GENB equal to GENA for the genotyped lines but equal to the level for any one of the genotyped lines for all the non-genotyped lines. Thus, the variable GENB has n1 levels, corresponding to the n1 genotyped lines. For example in Table 3, the variable GENB, whose effect is modelled by g1, has been set equal to 1, 2,…, 698 for the 698 genotyped lines and to 1, the label for the first genotyped line, for all the 202 ungenotyped lines. The genetic effect g1 of the i-th genotyped line will be represented in the mixed model by the term Z1g1. This term will become zero for all the records corresponding to the non-genotyped lines, because for these records we have set Z1=0. This ensures that the number of random genotypic effects to be predicted for g1 equals the dimension of the linear variance-covariance matrix (n1). The non-genotyped lines therefore make no contribution at all to the estimated variance-covariance matrix of the genotypes. They are, in other words, switched off. The vector of random effects for the genotyped lines g1 is modelled by RR-BLUP as g1 = Zu with , where Z is the n1 × p design matrix for SNP markers for the n1 genotyped lines and u = (u1, …, u ) is the vector of p random SNP marker effects, with . I is the p-dimensional identity matrix and is the variance of SNP marker effects. We represent the genetic effects g2 of the i-th non-genotyped line in a similar fashion as for g1, i.e., we use the term Z2g2, where Z2 is a dummy variable that is equal to one for all the non-genotyped lines and equal to zero for all the genotyped lines. The effects g2 are assumed to be independent normally and distributed with variance . The following mixed model, assuming that the genotyped lines are correlated according to the RR-BLUP model (10), was used to estimate the variance components used in the simulations for Scenarios 3 and 4: where y is the response of the i-th genotype in the j-th block nested within the k-th replicate in the l-th trial in the m-th group tested against the n-th tester. φ is the general effect, t is the random effect of the l-th trial, assumed iid , r is the random effect of the k-th replicate nested within the l-th trial, assumed iid , b is the random effect of the j-th block nested within the k-th replicate in the l-th trial, assumed iid , δ is the fixed effect of the m-th group of checks, testers and genotypes, τ denotes the effect of the n-th tester (Table 3) and e is the residual plot error associated with y and is assumed to be iid , where is the error variance. To implement the model using the REML package PROC MIXED of the SAS System [13], the random genotypic effects were coded using the variables defined in Table 3. The random genotypic effect of the i-th genotyped line in the m-th group, Z1g1, was coded as (Z1*TS*GRP*GENB) using the variables tester (TS), group (GRP), genotypes (GENB), and Z1, where the last variable was specified as a quantitative variable, while the first three variables were declared as categorical variables (using the CLASS statement). The variable Z1 corresponds to the switch variable Z1 in the model (11). The random effect Z2g2 of the i-th non-genotyped line in the m-th group was coded as (Z2*TS*GRP*GENA) using the variables tester (TS), group (GRP) and genotypes (GENA) described in Table 3. Z2 is a second quantitative variable corresponding to the switch variable Z2.

Estimating predictive accuracy from predictive ability and heritability

Four of the seven approaches indirectly estimate predictive accuracy as the correlation between the predicted genetic and phenotypic values , called predictive ability, divided by the square root of heritability (H2) [14], separately for each of 15 three-fold cross-validation replicates. Predictive ability can be estimated from cross-validation as A key assumption of the approach is that [8], which implies that the genotypic effect estimate is not correlated with environmental components in the phenotypic value p. This suggests [8] that in practice we can replace (1) with The other assumption is that , so that An important question is how to estimate heritability H2. The fact that the definition of used in requires a marker-based model for g suggests that the same model should be used for defining heritability H2. The problem in practice is that the true model is not known, so that different methods for genomic selection (GS) are usually applied and their predictions compared empirically via CV [15]. To make any progress, some model must be chosen for defining predictive accuracy, and if the chosen model is close to the model for some GS method, then that same method would potentially be preferred for the estimation of predictive accuracy. Moreover, some methods for GS do not have an explicit underlying model. This is the case for some methods, for example, in the machine learning realm. The difficulty in choosing a model for heritability H2 makes it hard to devise an unambiguous definition for H2. Thus, any estimate of predictive accuracy should be taken only as a rough indication of precision. We suggest that the model underlying ridge regression BLUP be used to define heritability H2. This is because this method is the most commonly used one for GS and has been shown to have good properties. It is based on a specific mixed model, so an estimate of H2 can be obtained in various ways [16].

Cross-validation

We used cross-validation (CV) to obtain an estimate of the correlation between the predicted breeding values and the observed phenotypic values , which we needed to compute predictive accuracy for Methods 1, 2, 3, 4 and 6. We used a three-fold cross-validation to evaluate predictive accuracy for both datasets because of the small number of genotypes (177) in the AgReliant dataset. The dataset with the adjusted means for the testcross genotypes was split into three random subsamples, one of which was held out as a validation set at a time. The remaining two subsamples were combined into one training set. The three-fold CV procedure was replicated five times, yielding a total of 15 replicate datasets. We then fitted a ridge regression model (7) to each of the 15 replicate validation and training sets. We next computed predictive ability, the correlation between the predicted breeding values and the phenotypic values across all the genotypes. This procedure was repeated for each of the 1000 datasets simulated for each of the four scenarios. The correlation between the predicted and the true breeding values , the predictive accuracy, was computed by dividing predictive ability by the square root of estimated heritability for each of the four indirect methods. The estimates of predictive accuracy were compared using the simulated true breeding values. Moreover, we directly computed predictive accuracy using Methods 5, 6 and 7 as described below. Table 4 summarizes the key properties of the seven methods.

Table 4

Summary of the main properties of the seven methods

Method	Estimates heritability?	Requires heritability to estimate predictive accuracy?	Model for heritability assumes uncorrelated genotypes?	Model for predictive ability assumes correlated genotypes?	Requires cross validation?
1	Yes	Yes	Yes	Yes	Yes
2	Yes	Yes	Yes	Yes	Yes
3	Yes	Yes	Yes	Yes	Yes
4	Yes	Yes	Yes	Yes	Yes
5	Yes	No	-	-	No
6	No	No	-	Yes	Yes
7	No	No	-	-	No

Methods 1 to 4 that require heritability to estimate predictive accuracy are called indirect methods whereas Methods 5 to 7 that do not are called direct methods in the text. The symbol (-) means the question is not applicable for a particular model.

Summary of the main properties of the seven methods Methods 1 to 4 that require heritability to estimate predictive accuracy are called indirect methods whereas Methods 5 to 7 that do not are called direct methods in the text. The symbol (-) means the question is not applicable for a particular model.

Methods for estimating predictive accuracy

We used the following five methods (Methods 1 to 5) to estimate heritability (Table 5). The first approach is the standard method for estimating heritability that is commonly used by plant breeders [12]. The second and the third approaches are modifications of the ad hoc measure based on BLUE and BLUP [16]. The fourth approach is based on a new proposal for estimating heritability. This approach uses the ratio of the expected value of the genetic variance to the expected value of phenotypic variance. The fifth approach is our second new method for estimating heritability without cross-validation using similar ideas to those used in computing the ad hoc measures of H2 (Table 5). Methods 1 to 3 assume that the genotypes are not correlated, while Methods 4 and 5 assume correlated effects according to the model underlying the RR-BLUP. We then used the quantity to compute predictive accuracy, where H is the square root of heritability computed using each of the first four methods (Methods 1 to 4) only. This is because even though Method 5 also computes heritability, it calculates predictive accuracy directly, similar to Methods 6 and 7. The three Methods 5, 6 and 7 were thus used to estimate predictive accuracy directly without first dividing predictive ability by the square root of heritability. The three direct methods assume correlated effects of genotypes according to the model underlying RR-BLUP.

Table 5

			Methods
		M0	*M1(15)	M2(16)	M3(17)	M4(21)	M5(24)
Scenario	^†Statistic	Htrue2	H^m12	H^m22	H^m32	H^m42	H^m52
1	H^2=0	0	0	0	0	0	0
	MIN	0.56	0.09	0.16	0.16	0.16	0.50
	MAX	0.82	0.50	0.66	0.66	0.55	0.80
	MEAN	0.71	0.32	0.48	0.48	0.34	0.67
	STD	0.04	0.06	0.08	0.08	0.06	0.05
	MSD	0.000	0.160	0.061	0.061	0.143	0.004
2	H^2=0	0	152	152	0	0	0
	MIN	0.09	0.00	0.00	0.00	0.00	0.00
	MAX	0.73	0.30	0.46	0.46	0.28	0.63
	MEAN	0.42	0.09	0.15	0.18	0.08	0.33
	STD	0.11	0.07	0.11	0.10	0.05	0.11
	^†MSD	0.000	0.128	0.094	0.083	0.128	0.025
3	H^2=0	0	0	0	0	0	0
	MIN	0.66	0.35	0.33	0.34	0.28	0.64
	MAX	0.79	0.62	0.61	0.61	0.53	0.78
	MEAN	0.73	0.51	0.49	0.50	0.40	0.72
	STD	0.02	0.04	0.04	0.04	0.03	0.02
	MSD	0.000	0.051	0.057	0.055	0.110	0.001
4	H^2=0	0	0	0	0	0	0
	MIN	0.36	0.00	0.00	-0.55	0.02	0.23
	MAX	0.63	0.32	0.31	0.31	0.15	0.52
	MEAN	0.52	0.14	0.13	0.13	0.07	0.39
	STD	0.04	0.07	0.06	0.07	0.02	0.05
	MSD	0.000	0.151	0.155	0.152	0.202	0.020

Methods 1 to 4 but not 5 use cross-validation. M0 is the square of the true correlation between the predicted and the true simulated breeding values used as the benchmark for assessing the estimated heritability.

†MSD=Mean squared deviation and H2 = 0 is the number of datasets for which the estimated heritability was zero. *The number of the equation used in the text is in parenthesis.

Descriptive statistics for the estimated true heritability for all the datasets out of a possible total of 1000 for which an estimate of heritability was available for all the five methods in each scenario Methods 1 to 4 but not 5 use cross-validation. M0 is the square of the true correlation between the predicted and the true simulated breeding values used as the benchmark for assessing the estimated heritability. †MSD=Mean squared deviation and H2 = 0 is the number of datasets for which the estimated heritability was zero. *The number of the equation used in the text is in parenthesis. Method 1: Standard measure Plant breeders often compute heritability for a single trial using where is the genetic variance, r is the number of replicates and σ2 is the variance of plot error [12]. This estimator is valid for randomized complete block designs, but is an ad hoc approximation for incomplete block designs. Also, the estimator is not applicable with spatial methods of analysis [17]. Method 2: A measure proposed by [16] that uses the BLUEs and is computed as where is the mean variance of a difference of two adjusted genotypic means (BLUE) and is the genetic variance estimated from (6) assuming independent genotypic effects. Method 3: An ad hoc measure proposed by [18] that is based on BLUP assuming independent genotypic effects and is computed as where is the mean variance of a difference of the BLUP of two genotypic effects . We used the BLUP of μ = φ + g as the phenotypic data in the mixed model for Method 3. Further details on the computation of and are in Additional file 1. Method 4: A new proposed measure for estimating heritability The sample variance of the true genetic breeding value g1 (3) can be written as where , I is the n-dimensional identity matrix and J = n × n is a square matrix of ones. In a similar way, we may represent the phenotypic sample variance as where is the vector of observed phenotypes. The observed phenotype of the i-th genotype is the adjusted mean computed for this genotype for a single location. We cannot compute the sample variance of g for real data because these are not observed. But if we assume a mixed model with linear variance structure for g taking the form , where g is the vector of g of all tested entries (i = 1, …, n), and is the SNP marker variance, then we can compute the expected value of the sample variance of g in equation 18. From standard results on the expected value of quadratic forms [12] we have Thus, we may define heritability as the expected genetic sample variance over the expected phenotypic sample variance: The expected value is now derived. The variance of phenotypes (i.e., adjusted means p) is given bywhere R is the variance-covariance matrix of the error term in equation (6). Therefore An estimate of this is easy to compute, as is an estimate of , by plugging in estimates for the variance-components in G and R. For illustration of the new Method 4, we consider three special cases in Additional file 2. Method 5: A new direct method for estimating This method uses the RR-BLUP of g as the “phenotype” to compute an alternative estimator of predictive accuracy unlike that produced by the methods that require cross-validation and use the adjusted means as the phenotypes. Heritability can be computed as the square root of the estimated predictive accuracy. Using equation 7, we have and Thus, BLUP(g = Zu) = GV- 1Qp, where Q = I - 1(1 V- 11)- 11 V- 1. Or in an even more compact form with C = GV- 1Q. Now consider the sample correlation of the true genotypic value g = Zu and its estimator , the BLUP of g. The sample covariance is given by where P is defined as in equation 18. We cannot compute this sample covariance directly, because g is not observable. But we can compute its expected value as follows: In the same vein, we find for the sample variances of the true (g) and the predicted () breeding values: The sample true correlation from equation 1 is then given by We want to estimate the expected value of this correlation. Approximately, we have Note that the correlation involves a function of three correlated random variables () and we must acknowledge that the expected value of a function of random variables is not usually exactly equal to the same function evaluated at the expected values of the random variables [19]. Some improvement of the approximation may be possible using the delta method, but we will not pursue this here. Instead, the closeness of the approximation (23) will be investigated in our simulations. From a practical point of view, the advantage of (23) is its simplicity. With this approximation, we may replace the expected values with their explicit expressions to yield the estimated predictive accuracy: Method 6: A further new direct method for estimating Our objective is to estimate by evaluating (14). It is straightforward to compute and in (14) directly from the data (p) and the predicted breeding values (). The only difficulty is the estimation of . If we could observe the g of all entries, we would compute their sample variance , just as we compute the sample variance and the sample covariance . Note that we similarly computed the sample correlation between the predicted breeding values and the observed phenotypic values p for each cross-validation replicates. Our proposed estimator becomes where was computed using equation 23. A Problem with this method as well as with Methods 1, 2, 3 and 4 is that can exceed one. We have therefore presented values of truncated at 1 in the main body of the paper and the untruncated values in Additional file 3 (Table 6 and Additional file 3: Table S1).

Table 6

Descriptive statistics for predictive accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) by scenario

		Methods
Scenario	^†Statistic	M0	^*M1(15)	M2(16)	M3(17)	M4(21)	M5(24)	M6(25)	M7(35)
		rg,g^	rg^,pH^m1	rg^,pH^m2	rg^,pH^m3	rg^,pH^m4	H^m5	rg,g^,m6	ρ^m7
1	N	1000	1000	1000	1000	1000	1000	1000	1000
	MIN	0.750	0.327	0.265	0.265	0.384	0.707	0.316	0.750
	MEAN	0.843^c	0.877^a	0.727 ^e	0.727^e	0.858^b	0.819^d	0.663^f	0.840^c
	MAX	0.908	1.000	1.000	1.000	1.000	0.893	0.884	0.899
	STD	0.024	0.109	0.104	0.104	0.093	0.028	0.084	0.023
	MSD	0.000	0.013	0.025	0.025	0.009	0.002	0.040	0.001
	Q1	0.829	0.807	0.661	0.661	0.802	0.803	0.612	0.826
	Median	0.846	0.890	0.723	0.724	0.862	0.822	0.665	0.841
	Q3	0.860	0.983	0.793	0.793	0.923	0.839	0.716	0.856
2	N	839	839	839	839	839	839	839	839
	MIN	0.31	0.00	0.00	0.00	0.00	0.06	0.00	0.08
	MEAN	0.65^a	0.63^c	0.50^e	0.50^e	0.64^ab	0.58^d	0.46^f	0.64^bc
	MAX	0.85	1.00	1.00	1.00	1.00	0.79	1.00	0.82
	STD	0.09	0.29	0.26	0.26	0.26	0.10	0.20	0.09
	MSD	0.000	0.083	0.092	0.092	0.069	0.02	0.081	0.011
	Q1	0.61	0.42	0.31	0.31	0.47	0.53	0.33	0.59
	Median	0.66	0.64	0.48	0.48	0.68	0.59	0.47	0.65
	Q3	0.71	0.91	0.67	0.67	0.85	0.64	0.59	0.70
3	N	1000	1000	1000	1000	1000	1000	1000	1000
	MIN	0.81	0.60	0.61	0.61	0.69	0.80	0.54	0.78
	MEAN	0.85^a	0.72^c	0.73^c	0.73^c	0.81^b	0.85^a	0.64^d	0.81^b
	MAX	0.89	0.88	0.90	0.89	0.96	0.88	0.78	0.84
	STD	0.01	0.04	0.04	0.04	0.04	0.01	0.04	0.01
	MSD	0.0000	0.0193	0.0169	0.0176	0.0036	0.0002	0.0477	0.0017
	Q1	0.85	0.69	0.70	0.70	0.79	0.84	0.61	0.81
	Median	0.85	0.72	0.73	0.73	0.81	0.85	0.64	0.81
	Q3	0.86	0.75	0.76	0.76	0.84	0.85	0.66	0.82
4	N	955	955	955	955	955	955	955	955
	MIN	0.60	0.14	0.14	0.14	0.15	0.48	0.24	0.52
	MEAN	0.72^a	0.32^e	0.33^e	0.33^e	0.36^d	0.62^c	0.63^b	0.64^b
	MAX	0.79	0.52	0.53	0.52	0.56	0.72	0.93	0.72
	STD	0.03	0.06	0.06	0.06	0.07	0.04	0.09	0.03
	MSD	0.000	0.160	0.157	0.158	0.130	0.012	0.015	0.007
	Q1	0.70	0.29	0.29	0.29	0.32	0.60	0.57	0.62
	Median	0.72	0.32	0.33	0.33	0.37	0.62	0.64	0.64
	Q3	0.74	0.36	0.37	0.37	0.41	0.65	0.70	0.66

All the methods except 5 and 7 use cross-validation. M0 is the correlation between the predicted and the true simulated breeding values used as the benchmark for assessing the estimated predictive accuracy. N is the number of data sets out of a possible total of 1000 for which estimates were available for all the seven methods. Means for pairs of methods within each scenario with the same superscript letter are not significantly different at the 5% level of significance based on the t-test.

†MSD=Mean squared deviation, Q1 is the lower quartile and Q3 is the upper quartile. * The number of the equation used in the text is in parenthesis.

Descriptive statistics for predictive accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) by scenario All the methods except 5 and 7 use cross-validation. M0 is the correlation between the predicted and the true simulated breeding values used as the benchmark for assessing the estimated predictive accuracy. N is the number of data sets out of a possible total of 1000 for which estimates were available for all the seven methods. Means for pairs of methods within each scenario with the same superscript letter are not significantly different at the 5% level of significance based on the t-test. †MSD=Mean squared deviation, Q1 is the lower quartile and Q3 is the upper quartile. * The number of the equation used in the text is in parenthesis. Method 7: This method is commonly used in animal breeding [20-22] We used the linear mixed model: where X is the design matrix for the fixed effects, β is the vector of regression coefficients for the fixed effects, is the random marker effects with variance , the residual errors e = (e1, …, e ) are assumed to follow with variance and Z is the marker matrix. The random vector g = (g1, …, g ) is obtained from a linear regression on the random marker (SNP) effects u , i.e. u = (u1, …, u ) as A common approach to the evaluation of predictive accuracy (ρ) in animal breeding is the use of the squared correlation between the true and the predicted breeding values (ρ2), called reliability [20]. We used the approach of [20] as implemented by [16]. The calculation of ρ2 requires a solution to the mixed model equations [21] given by [16] where ()- denotes a generalized inverse of the coefficient matrix of the MME [23]. If the variance-covariance matrix of the random effects u and the genetic effects g = Zu is given by where and then it follows that [24] where The distribution of g and is then multivariate normal with zero mean and variance-covariance matrix where [25]. The REML estimate of M can be computed from a mixed model package by noting that After substituting for M, equation 32 reduces to from which the reliability for each genotype is calculated as where we extract only the corresponding diagonal elements from the block matrices var(g), and . The reliability of all genotypes in each dataset is then estimated by and the accuracy by its square root, where n is the total number of genotypes in the data set. The SAS (version 9.3) code used to simulate the phenotypic data and fit all the seven models is provided in Additional file 4.

Evaluation of simulated data

We used the two-stage analysis and the methods described above to estimate the correlations between the predicted and the true breeding values . We computed the true heritability as the square of the correlation between the predicted and the true breeding values and estimated heritability using the five different approaches. We then computed the ratio of the expected value of the genetic variance to the expected value of the phenotypic variance and used this to compute heritability based on the new proposed method (Method 4) for estimating heritability. Moreover, we estimated the adjusted least square means of genotypes from the first stage using simulated data. The adjusted least square means were used in the second stage in cross-validations as the phenotypic data. Also, the variance-covariance matrix of the adjusted means was used to compute an ad hoc measure of heritability according to equation 16. Because the KWS maize dataset had many genotypes (n = 698) the mixed models were computationally very demanding to fit. So, for example, the slowest method, Method 6, took only 17.5 hrs to fit all the 1000 small data sets in one scenario in SAS Version 9.3 running on a 64-bit Windows 7 workstation with 8 GB RAM and an Intel Core Quad 2.66, but it took 192 hrs to fit the same model to 1000 large data sets. Hence, we first estimated the start values for the variance-components of the mixed models using the hpmixed procedure of SAS to enhance computational efficiency.

Comparing heritabilities

We computed true heritability as the square of the correlation between the predicted and the simulated true breeding values: To compare the true heritabilities H2 with their estimates computed using each of the four different indirect methods (Methods 1, 2, 3, and 4) and Method 5 we computed the mean squared deviation (MSD) of each estimate from the true heritability for each simulated dataset and method combination as where N is the total number of the simulated datasets and j= (1, 2, …, N) denotes the j-th simulated data set. Moreover, we computed descriptive statistics for the true and estimated heritabilities.

Comparing predictive accuracies

For each simulated dataset we computed the “true” correlation (accuracy) as the correlation between the predicted and the simulated true breeding values and compared this with estimates of the same correlation computed using each of the seven different methods. To compare the true correlation with its seven estimates we computed the mean squared deviation (MSD) of each estimate from the true correlation for each simulated dataset and method combination where N is the total number of the simulated datasets and j= (1, 2, …, N) denotes the j-th simulated dataset. We recall here that MSD integrates information on (i) the correlation between the predicted and the true accuracy, (ii) the slope of the regression of the predicted against the true accuracy and (iii) the bias between the predicted and the true accuracy [26]. Nevertheless, the correlation and bias between the predicted and the simulated true accuracies are displayed or can readily be inferred from Figures 1, 2 and 3 and Additional file 3: Figures S1-S3 in the supplementary materials. Further, we calculated descriptive statistics for the true correlation and all its seven estimates. We also compared the estimated predictive accuracies between pairs of the seven methods.

Figure 1

Box Whisker plot for predictive accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for all the seven methods in each of the four scenarios.

Figure 2

Frequency histograms for the true accuracy versus the estimated predictive accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for all the seven methods in each of the four scenarios.

Figure 3

Scatter plots of estimated predictive accuracy against the true simulated accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for all the methods in each scenario. The 1:1 (y = x) line is superimposed for comparison.

Box Whisker plot for predictive accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for all the seven methods in each of the four scenarios. Frequency histograms for the true accuracy versus the estimated predictive accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for all the seven methods in each of the four scenarios. Scatter plots of estimated predictive accuracy against the true simulated accuracy (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for all the methods in each scenario. The 1:1 (y = x) line is superimposed for comparison. For each scenario we compared the simulated true and the estimated predictive accuracies for the seven methods using t-tests (α = 5%) adjusted for multiplicity using simulation adjustment. The t-tests were derived from a mixed model with fixed effects for method and scenario and their interaction and a random effect for simulation replicates nested within scenarios [6].

Results

Heritability

Heritability was estimated by Methods 1 to 5 only. The estimated heritability was closer to its true simulated value for Methods 2, 3 and 5 than for Methods 1 and 4 in terms of its minimum, maximum, mean, standard deviation and mean squared deviation for all the four scenarios. All the five methods (Methods 1 to 5) underestimate the minimum, maximum and the mean true heritability in all the scenarios. Method 5 produced estimates closest to the true heritability for all the four scenarios. Across scenarios based on the same data set, the estimated heritability tended to be closer to its true value in Scenario 1 than in 2 and in Scenario 3 than in 4 (Table 5), implying that reducing the genetic variance by a factor of 10 in scenarios 2 and 4 reduced the accuracy of estimated heritability.

Predictive accuracy

In general, all the seven methods produced reasonable estimates of predictive accuracy across all the four scenarios. The estimated predictive accuracy was more precise for Scenarios 3 and 4, based on the large dataset, than for Scenarios 1 and 2, for all the seven methods (Table 6, Figures 1, 2 and 3). Reducing the genetic variance by a factor of 10 while leaving all the other variance components unchanged degrades the precision of the estimated predictive accuracy more for the smaller of the two datasets (Table 6, Figures 1, 2 and 3 and Additional file 3: Figures S1-S4). Methods 5 and 7 were the best overall and gave the least biased and most precise estimates of predictive accuracy, most notably for Scenarios 1, 3 and 4 (Table 6 and Figures 1, 2 and 3). Even so, Method 5 tended to be more precise than Method 7 for both the scenarios (3 and 4) based on the larger dataset. All estimates of predictive accuracy for Methods 5 and 7 were between 0 and 1 for all scenarios. Also, the models for Methods 5 and 7 converged for all the 1000 simulated datasets in all the scenarios (Tables 5 and 6, Additional file 3: Figures S1-S2). Unlike Methods 5 and 7, the performances of the other methods were more varied across scenarios. In particular, there was a greater tendency for the estimated predictive accuracy to exceed one (“overshooting”) or to be smaller than zero (“undershooting”) and a higher frequency of failure of the algorithms used to fit the RR-BLUP models to converge (Additional file 3: Table S1, Additional file 3: Figures S1-S8). For Scenario 1 overshooting was most frequent and severe (greater in magnitude) for Method 1 (21.8%, n=1000, range 1.0004-1.8960) followed by Methods 4 (6.8%, range 1.0015-1.1917), 2 (1.5%, range 1.0022-1.4152) and 3 (1.5%, range 1.0018-1.4153). In contrast, all estimates of predictive accuracy for Methods 5, 6 and 7 fell between 0 and 1. Moreover, all the seven models converged for all the 1000 simulated datasets (Table 6 and Additional file 3: Table S1). Compared to Scenario 1, overshooting and undershooting were more common for Methods 1, 2, 3, 4 and 6 in Scenario 2. Generally, overshooting was more serious (greater in absolute magnitude) than undershooting, particularly for Methods 1 (n=164 datasets), 2 (n=65), 3 (n=65) and 4 (n=113) (Table 6, Additional file 3: Figures S1, S4 and S6). The problem of overshooting or undershooting did not occur for any method in Scenario 3. Although the problem of overshooting still persisted for Methods 1 (n=95), 2 (n=104), 3 (n=87) and 4 (n=163) in Scenario 4, these methods had three further drawbacks in Scenario 4. The first was the breakdown of the method for computing predictive accuracy because estimated heritability was zero, most especially for Methods 1 (n=27) and 2 (n=27). The second was that the genetic variance estimate was zero for Method 3 (n=27). The third was the failure of the mixed model used to compute predictive ability required by Methods 1, 2, 3, 4 and 6 to converge (n=18). As a result, predictive accuracy could not be computed for 45 datasets for each of the Methods 1, 2 and 3 in Scenario 4. Undershooting was rather rare by comparison and was noted only for Method 3 (n=17) in Scenario 4 (Additional file 3: Figures S2, S4 and S8). Aside from overshooting and undershooting, deviation of the predictive accuracy from its expected values was also caused by overestimation and underestimation. Considering only the values of predictive accuracy between 0 and 1, the seven methods tended to underestimate the true accuracy across all the four scenarios. Underestimation tended to be more severe for Methods 1, 2, 3, 4 and 6 than Methods 5 and 7. This was most evident in Scenario 4. By comparison, overestimation was far less common (Table 6, Figures 2 and 3, Additional file 3: Figure S2-S4). For 11 datasets from Scenario 2 the estimated true accuracy was smaller than zero because the estimated variances of the BLUPs were zero and the estimated genetic variances were nearly zero. We expect plant breeders to discard genotypes for which the estimated genetic variance is nearly zero in making selection decisions in real applications. Consequently, we excluded the 11 simulated datasets with negative true accuracies from all the comparisons to mimic what plant breeders do in practice. There was therefore no estimated true accuracy to compare with the corresponding estimated predictive accuracy for all the seven methods. Similarly excluded from all the comparisons were four further datasets in Scenario 2 for which the mixed models for estimating the true accuracy did not converge. A comparison of the performances of the methods showed that methods with similar performances clustered into two distinct groups in each of the four scenarios. One of the two groups identified by regressing the estimated predictive accuracies for each pair of the seven methods on each other comprised Methods 1, 2, 3, 4, and 6 whereas the other consisted of Methods 5 and 7 (Table 6 and Additional file 3: Table S2, Figures 4, 5, 6 and 7 and Additional file 3: Figures S5-S8). Results of the t-tests reaffirmed this general pattern besides showing that the estimated true accuracies for Methods 5 and 7 are generally closer to the true predictive accuracy than the estimates for the other methods, especially for the two scenarios based on the large data set (Table 6 and Additional file 3: Table S1).

Figure 4

Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 1.

Figure 5

Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 2.

Figure 6

Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 3.

Figure 7

Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 4.

Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 1. Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 2. Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 3. Scatter plots comparing all the estimated predictive accuracies for pairs of the seven tested methods for Scenario 4. We expected the predictive accuracies of pairs of methods that accurately estimate the true predictive accuracy to be positively and not negatively correlated with each other. However, correlations between estimated predictive accuracies for some pairs of the seven methods were, surprisingly, negative even though the predictive accuracies for the individual methods were themselves high and positive (Table 7 and Additional file 3: Table S2, Figures 4, 5, 6 and 7 and Additional file 3: Figure S5-S8). This was the case, for example, for Method 4 versus 5 and 7 in Scenario 1, and for Method 5 versus 6 in Scenarios 2 and 3. This is due to the fact that some Methods (e.g., 5 and 7) tended to produce large values of predictive accuracy while others (e.g., Methods 4 and 6) tended to produce small values when the estimated genetic variance was very small because of the way the genetic variance enters the denominators of the estimators used by these methods to compute predictive accuracy.

Table 7

Correlation between predictive accuracies (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for pairs of the seven methods by scenario

Scenario	Method	M1	M2	M3	M4	M5	M6	M7
1	M1	1.00
	M2	0.94	1.00
	M3	0.94	1.00	1.00
	M4	0.90	0.89	0.89	1.00
	M5	-0.02	0.04	0.04	-0.18	1.00
	M6	0.81	0.84	0.84	0.96	-0.06	1.00
	M7	-0.02	0.04	0.04	-0.18	1.00	-0.07	1.00
2	M1	1.00
	M2	0.97	1.00
	M3	0.97	1.00	1.00
	M4	0.82	0.78	0.78	1.00
	M5	0.21	0.17	0.17	0.04	1.00
	M6	0.70	0.67	0.67	0.91	-0.02	1.00
	M7	0.26	0.22	0.22	0.11	0.95	0.03	1.00
3	M1	1.00
	M2	1.00	1.00
	M3	1.00	1.00	1.00
	M4	0.91	0.91	0.91	1.00
	M5	-0.11	-0.12	-0.12	-0.32	1.00
	M6	0.83	0.83	0.83	0.98	-0.28	1.00
	M7	-0.11	-0.12	-0.12	-0.32	1.00	-0.28	1.00
4	M1	1.00
	M2	1.00	1.00
	M3	1.00	1.00	1.00
	M4	0.99	0.99	0.99	1.00
	M5	0.59	0.59	0.59	0.58	1.00
	M6	0.77	0.77	0.77	0.79	0.04	1.00
	M7	0.59	0.59	0.59	0.58	1.00	0.05	1.00

The number of simulated datasets out of a possible total of 1000 for which estimates of predictive were available for each pair of methods was taken as the minimum for the pair.

Correlation between predictive accuracies (estimates less than 0 were set to 0 whereas estimates greater than 1 were set to 1) for pairs of the seven methods by scenario The number of simulated datasets out of a possible total of 1000 for which estimates of predictive were available for each pair of methods was taken as the minimum for the pair.

Discussion

One possible definition of heritability [27] is based on the squared correlation between “genotype” and “phenotype”. In our current work, we use various ad hoc measures of heritability. The one proposed by [18] assesses the squared correlations between BLUPs of genotypic values and the true genotypic values using Method 3. Another measure, proposed by [16], considers the correlation between adjusted means (BLUEs of genotypic values) and the true genotypic values using Method 2. So, one may argue that these different measures use different definitions of “the phenotype”. In this paper, we use estimates of heritability to compute “predictive accuracy” as , where is the correlation between genomic selection estimators of the true breeding values g and the “phenotypic values” in the validation set and H is the square root of heritability. In this application, the “phenotypes” are adjusted means (BLUEs), so estimators of the square root of heritability H that take BLUEs to be the phenotype are appropriate. It is important to note that is itself an estimator of the square root of heritability if we take the RR-BLUPs as the “phenotypes”. An important property of the procedures we consider (this provides estimators of H by taking RR-BLUPs to be the phenotypes) is that is obtained from cross-validation. Alternatively, we also estimate heritability H2 (or H) without cross-validation using similar ideas to those used for the ad hoc measures of heritability H2 and for the “direct method” (Method 5). The quantity (24) provides an alternative estimator of heritability when the RR-BLUP of the genetic variance (g) is considered as the “phenotype”. This estimator () was the most accurate of the five we used to compute heritability. But in our cross-validations, the phenotypes are always the adjusted means. So, we just use (24) as an alternative estimator of predictive accuracy. The strikingly poor performances of Methods 1 to 3 as indicated by all the estimated mean heritabilities falling below the minimum of the true heritability may seem surprising at first sight but in the case of Methods 1 to 3 may reflect the fact that these three methods assume that genotypes are uncorrelated. If this is true then we would expect the performance of these methods to improve considerably if the true heritability used as the benchmark were also estimated using a model that assumes uncorrelated genotypes. To test this expectation, we re-calculated the true heritability assuming that the genotypes are uncorrelated for each of the four scenarios by setting all the covariances in the variance-covariance matrix of genotypes to zero. Using the new benchmark led, as expected, to a much better agreement between the new true heritability and its estimates by Methods 1 to 3 (Additional file 3: Table S2). This demonstrates compellingly that Methods 1 to 3 are all reasonably good at estimating heritability when genotypes are not correlated but are severely biased downwards when they are. The downward bias implies, furthermore, that Methods 1 to 3 are not suited for estimating heritability used to divide predictive ability to obtain predictive accuracy in genomic prediction for which genotypes are typically assumed to be correlated. By contrast, Method 5 that assumes correlated genotypes performs much better at estimating heritability even though the estimated heritability is merely a by-product and not needed for estimating predictive accuracy. Method 4 differs from Methods 1 to 3 in assuming that the genotypes are correlated, unlike the latter methods that assume that genotypes are independent. This suggest that there must be yet another reason that the estimated mean heritability for Method 4 falls below the minimum expected true heritability estimated assuming that the genotypes are correlated. We can think of two plausible explanations for this discrepancy both of which apply equally to all Methods 1 to 4. The first is that all the heritability estimators for Methods 1 to 4 are constructed as ratios of the genetic and the phenotypic variances such that the greater is the phenotypic variance the smaller is the estimated heritability, whereas we defined the true heritability in terms of the squared correlation between true and estimated genotypic effect. The second relates to the observation that despite its poor estimates of heritability relative to those for Methods 1 to 3, the estimated predictive accuracy for Method 4 is generally closer to the estimated true accuracy than the estimates for Methods 1 to 3. This is quite intriguing because Methods 1 to 4 calculate predictive accuracy as the ratio of predictive ability to heritability. Since all the four methods use the exact same values of predictive ability and since Method 4 yields somewhat poorer estimates of heritability than Methods 1 to 3, we would not logically expect Method 4 to produce better estimates of predictive accuracy. That it actually did suggests that the reliability of approximating predictive accuracy by dividing predictive ability by the square root of heritability is questionable, at the very least for the specific configurations of the phenotypic and genotypic variances we used in our simulation scenarios. We compared the performances of seven methods for estimating predictive accuracy in genomic selection using 1000 datasets simulated according to an alpha-design for each of four scenarios based on genetic and residual variance estimates calculated from two real datasets. The results show that, of the seven methods, a new proposed method (Method 5) and a method which is well established in animal breeding programs (Method 7, [18]), consistently gave the least biased, most precise and stable estimates of predictive accuracy across all the four scenarios. Method 5 was at least as good as or better than Method 7 for estimating predictive accuracy. The other methods performed somewhat inconsistently across scenarios and suffered varying degrees of overshooting, undershooting and convergence problems. All the methods were more likely to underestimate than overestimate the true predictive accuracy when only datasets for which the estimated predictive accuracies fell between 0 and 1 were considered. The 0–1 truncation of the estimated predictive accuracy reflects the fact that we should not use the ratio of predictive ability to heritability () in practice if it does not fall within the interval [0, 1]. In summary, Methods 5 and 7 had the best performance followed by Method 4. Method 6 was the third best whereas Methods 1, 2 and 3 had rather similar and the worst performance. Methods 5 and 7 had rather similar performance in all scenarios despite the theoretical expectation that Method 5 should do better than Method 7 for the scenarios with small sample size. This expectation arises from the fact that whereas Method 7 assumes that the focal genotypes are derived from an infinite target population, Method 5 assumes that the sampled genotypes arise from a finite population. Consequently, the two methods may be expected to perform well for large sample sizes and Method 5 to perform better than Method 7 in small sample situations. The similar performance of Methods 5 and 7 is therefore tentative and its generality will be explored for a wider range of sample sizes in a sequel to this paper focusing on the influence of sample size on the predictive performance of the seven methods. Although their empirical performances in the simulations were often reasonable, Methods 1, 2 and 3 involve dividing two quantities computed using two different models with conflicting assumptions. Specifically, they involve dividing predictive ability computed from an RR-BLUP model, assuming that genotypes are correlated, by the square root of heritability computed from a model assuming that genotypes are uncorrelated. The computation of predictive ability using the model assuming that the genotypes are correlated, when heritability is computed assuming uncorrelated genotypes, would seem unavoidable when using RR-BLUP. This is because genomic breeding values cannot be predicted using RR-BLUP if the genotypes are assumed to be independent. This theoretical inconsistency undermined the performance of these three methods in several instances when the genetic variance estimated assuming uncorrelated genotypes was zero or nearly zero. This rendered predictive accuracy inestimable for Methods 1, 2 and 3 in these cases. Despite the inferior performance of Methods 1 to 3, linked to the inconsistency in the definitions of their numerators and denominators, relative to the methods that estimate predictive accuracy directly, this theoretical inconsistency has not deterred plant breeders from using these approaches. In fact, Methods 1 to 3 are the most frequently used by plant breeders. These three methods are not very similar by construction, despite the similarity of their performance. Hence, the similar performance could not necessarily have been anticipated a priori. Accordingly, by studying the properties of these methods alongside those of the other contending methods, we have ascertained whether and when their theoretical inconsistency may lead to inferior performance. Our findings suggest that these methods should be used with care, especially when the genetic variance is very small, so that predictive accuracy is likely to be either inestimable or overestimated. Although rare and probably relatively simple, there are instances in which the theoretical inconsistency is immaterial. For example, an independent model can be obtained in such simple special cases as a doubled haploid population, resulting from a single cross [2], if the dependent model is a conditional model that considers genetic variance-covariance conditioning on the marker genotypes whereas the marker genotypes are taken as random variables. In complex pedigrees, however, the unconditional model will also involve correlations, so that the impact of the theoretical inconsistency is more likely to be consequential. We emphasize also that while our simulation results hold for the RR-BLUP model, their applicability to other models than RR-BLUP remains to be investigated.

Influence of the size of genetic variance

The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in estimates of predictive accuracy. The estimated predictive accuracy was closer to its true value for all the methods in Scenarios 1 and 3 than in Scenarios 2 and 4 for all the 1000 datasets, because the simulated genetic variances for Scenarios 1 and 3 were 10 times larger than those for Scenarios 2 and 4. When the simulated genetic variance was small (Scenarios 2 and 4), there was a higher likelihood of obtaining extremely high values of estimated genetic variances than when a higher genetic variance was simulated (Scenarios 1 and 3). Methods 1, 2 and 3 were the most sensitive to variation in the estimated genetic variance because they all divide predictive ability by the square root of heritability to obtain predictive accuracy and hence break down when genetic variance estimate is zero, since the estimated heritability is then also zero. For Method 3, in particular, predictive accuracy becomes infinitely large when the estimated genetic variance is zero and extremely large when this variance is very small. Since Method 4 also computes predictive accuracy in the same way as Methods 1, 2 and 3 do, it also breaks down when the estimated genetic variance and hence estimated heritability is zero.

Influence of the number of genotypes

Increasing the number of genotypes, for example from 177 for Scenarios 1 and 2, to 698 for Scenarios 3 and 4, increased the time required to compute predictive accuracy by all the seven methods from a few minutes to several days, most notably for the methods that require cross-validation (Methods 1, 2, 3, 4 and 6).

Simplicity of implementation of methods

Methods 1, 2, 3, 4 and 6 that use cross-validation are computationally much more intensive to implement than Methods 5 and 7 that do not involve cross-validation. As a result, Methods 5 and 7 are the simplest and computationally most efficient to implement of the seven methods. This argues for their routine use in assessing predictive accuracy in genomic selection studies. Considering only estimated correlations between zero and one, Methods 4 and 6 gave the best estimates of predictive accuracy among the five methods that use cross-validation, followed by Methods 3, 2 and 1, in that order.

Design considerations

We considered a single trial laid out as an alpha-design in a single location for simplicity. Hence, a more extensive simulation with more trials, locations and trial designs would be required to establish the generality of our results. Further, we considered only two data sets with different numbers of genotypes (177 and 698) and markers (275 and 11646). However, variation in the number of genotypes and markers probably also affects the estimated predictive accuracy and thus also merit further investigation. Some breeders compute heritability using methods assuming that the datasets are balanced and that the genotypes are independent. Four of the seven methods (Methods 4, 5, 6 and 7) relax these restrictive assumptions by allowing for both balanced and unbalanced datasets as well as for independent and correlated genotypes.

Conclusions

Methods 5 and 7 were the most computationally efficient to implement and gave consistently the most accurate, robust and stable estimates of predictive accuracy of the seven methods across all the four scenarios. These properties argue for their routine use in assessing predictive accuracy in genomic selection studies. Among the five methods that use cross-validation, Methods 4 and 6 performed better than Methods 1, 2 and 3 but were clearly inferior to Methods 5 and 7. Both the genetic variance and the number of genotypes exerted strong influences on predictive accuracy. Thus, predictive accuracy was higher for the larger data set. Furthermore, reducing the genetic variance degraded predictive accuracy much more for the smaller of the two data sets. We are investigating the influences of genetic variance and the number of genotypes on predictive accuracy in genomic selection in greater detail in a sequel to this paper.

Abbreviations

GS: Genomic selection; RR-BLUP: Ridge-regression BLUP; BLUE: Best linear unbiased estimation; BLUP: Best linear unbiased prediction; REML: Restricted maximum likelihood; SNP: Single nucleotide polymorphism; CV: Cross-validation; H2: Heritability; MME: Mixed model equations; : BLUP of the genetic effects g; p: the BLUE of genotype means (“phenotypes”); S: Sample standard deviation; : Sample variance of the true genetic breeding values g; : Sample variance of the predicted breeding value; : Phenotypic sample variance; σ: Population standard deviation; : Population variance of the true genetic breeding values g; r: Sample correlation; : Sample correlation between the true and the predicted breeding values; : Sample correlation between the BLUP of g and the observed “phenotypes” p; ρ: Population correlation; ρg,p: Population correlation between the true genetic breeding values g and the observed “phenotypes” p; : Sample covariance between the true and predicted breeding values.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SBOE and JOO participated in the design of the study, conducting the simulations, interpreting the results, drafting and editing the manuscript. TSS participated in conducting the simulations. CK and MO conceived and designed the field trials and supervised the collection of the Synbreed data set. AG conceived and designed the field trials and supervised the collection of the AgReliant data set. HPP conceived the study, participated in its design, writing and editing the manuscript and oversaw the project. All authors read and approved the final manuscript.

Additional file 1

How to compute the average variance of a difference from the variance-covariance matrix of adjusted means. Click here for file

Additional file 2

Three special cases for the new Method 4. Click here for file

Additional file 3

Descriptive statistics for predictive accuracy by scenario (Table S1), Descriptive statistics for the estimated true heritability assuming that genotypes are not correlated for each of the four scenarios (Table S2), Box Whisker plot of all the predictive accuracies for scenario 2 (Figure S1), Box Whisker plot of all the predictive accuracies for scenario 4 (Figure S2), Frequency histograms for the true (green) versus the estimated (white) predictive accuracy for all the seven methods and four scenarios (Figure S3), Scatter plots of estimated predictive accuracy against the true accuracy for all the seven methods and four scenarios (Figure S4), Scatter plots comparing the estimated predictive accuracies for pairs of the seven tested methods for scenario 1 (Figure S5), Scatter plots comparing the estimated predictive accuracies for pairs of the seven tested methods for scenario 2 (Figure S6), Scatter plots comparing the estimated predictive accuracies for pairs of the seven tested methods for scenario 3 (Figure S7) and Scatter plots comparing the estimated predictive accuracies for pairs of the seven tested methods for scenario 4 (Figure S8). Click here for file

Additional file 4

SAS (version 9.3) code used to simulate phenotypic data and implement all the seven methods. Click here for file

11 in total

1. Prediction of total genetic value using genome-wide dense marker maps.

Authors: T H Meuwissen; B J Hayes; M E Goddard
Journal: Genetics Date: 2001-04 Impact factor: 4.562

2. Marker-assisted selection using ridge regression.

Authors: J C Whittaker; R Thompson; M C Denham
Journal: Genet Res Date: 2000-04 Impact factor: 1.588

3. Computing heritability and selection response from unbalanced plant breeding trials.

Authors: Hans-Peter Piepho; Jens Möhring
Journal: Genetics Date: 2007-11 Impact factor: 4.562

4. Prediction of response to marker-assisted and genomic selection using selection index theory.

Authors: J C M Dekkers
Journal: J Anim Breed Genet Date: 2007-12 Impact factor: 2.380

Review 5. Genomic selection.

Authors: M E Goddard; B J Hayes
Journal: J Anim Breed Genet Date: 2007-12 Impact factor: 2.380

6. Efficient methods to compute genomic predictions.

Authors: P M VanRaden
Journal: J Dairy Sci Date: 2008-11 Impact factor: 4.034

7. Genome-based prediction of testcross values in maize.

Authors: Theresa Albrecht; Valentin Wimmer; Hans-Jürgen Auinger; Malena Erbe; Carsten Knaak; Milena Ouzunova; Henner Simianer; Chris-Carolin Schön
Journal: Theor Appl Genet Date: 2011-04-20 Impact factor: 5.699

8. Comparisons of single-stage and two-stage approaches to genomic selection.

Authors: Torben Schulz-Streeck; Joseph O Ogutu; Hans-Peter Piepho
Journal: Theor Appl Genet Date: 2012-08-19 Impact factor: 5.699

9. The impact of genetic relationship information on genomic breeding values in German Holstein cattle.

Authors: David Habier; Jens Tetens; Franz-Reinhold Seefried; Peter Lichtner; Georg Thaller
Journal: Genet Sel Evol Date: 2010-02-19 Impact factor: 4.297

10. Performance of genomic selection in mice.

Authors: Andrés Legarra; Christèle Robert-Granié; Eduardo Manfredi; Jean-Michel Elsen
Journal: Genetics Date: 2008-08-30 Impact factor: 4.562

14 in total

Review 1. Genomic approaches to selection in outcrossing perennials: focus on essential oil crops.

Authors: David Kainer; Robert Lanfear; William J Foley; Carsten Külheim
Journal: Theor Appl Genet Date: 2015-08-04 Impact factor: 5.699

2. Heritability in Plant Breeding on a Genotype-Difference Basis.

Authors: Paul Schmidt; Jens Hartung; Jörn Bennewitz; Hans-Peter Piepho
Journal: Genetics Date: 2019-06-27 Impact factor: 4.562

3. Genetic Variance Partitioning and Genome-Wide Prediction with Allele Dosage Information in Autotetraploid Potato.

Authors: Jeffrey B Endelman; Cari A Schmitz Carley; Paul C Bethke; Joseph J Coombs; Mark E Clough; Washington L da Silva; Walter S De Jong; David S Douches; Curtis M Frederick; Kathleen G Haynes; David G Holm; J Creighton Miller; Patricio R Muñoz; Felix M Navarro; Richard G Novy; Jiwan P Palta; Gregory A Porter; Kyle T Rak; Vidyasagar R Sathuvalli; Asunta L Thompson; G Craig Yencho
Journal: Genetics Date: 2018-03-07 Impact factor: 4.562

4. Influence of outliers on accuracy estimation in genomic prediction in plant breeding.

Authors: Sidi Boubacar Ould Estaghvirou; Joseph O Ogutu; Hans-Peter Piepho
Journal: G3 (Bethesda) Date: 2014-10-01 Impact factor: 3.154

5. Genomic Prediction of Yield Traits in Single-Cross Hybrid Rice (Oryza sativa L.).

Authors: Marlee R Labroo; Jauhar Ali; M Umair Aslam; Erik Jon de Asis; Madonna A Dela Paz; M Anna Sevilla; Alexander E Lipka; Anthony J Studer; Jessica E Rutkoski
Journal: Front Genet Date: 2021-06-30 Impact factor: 4.599

6. The importance of phenotypic data analysis for genomic prediction - a case study comparing different spatial models in rye.

Authors: Angela-Maria Bernal-Vasquez; Jens Möhring; Malthe Schmidt; Manfred Schönleben; Chris-Carolin Schön; Hans-Peter Piepho
Journal: BMC Genomics Date: 2014-08-04 Impact factor: 3.969

Method	Estimates heritability?	Requires heritability to estimate predictive accuracy?	Model for heritability assumes uncorrelated genotypes?	Model for predictive ability assumes correlated genotypes?	Requires cross validation?
1	Yes	Yes	Yes	Yes	Yes
2	Yes	Yes	Yes	Yes	Yes
3	Yes	Yes	Yes	Yes	Yes
4	Yes	Yes	Yes	Yes	Yes
5	Yes	No	-	-	No
6	No	No	-	Yes	Yes
7	No	No	-	-	No

Method	Estimates heritability?	Requires heritability to estimate predictive accuracy?	Model for heritability assumes uncorrelated genotypes?	Model for predictive ability assumes correlated genotypes?	Requires cross validation?
1	Yes	Yes	Yes	Yes	Yes
2	Yes	Yes	Yes	Yes	Yes
3	Yes	Yes	Yes	Yes	Yes
4	Yes	Yes	Yes	Yes	Yes
5	Yes	No	-	-	No
6	No	No	-	Yes	Yes
7	No	No	-	-	No

Method	Estimates heritability?	Requires heritability to estimate predictive accuracy?	Model for heritability assumes uncorrelated genotypes?	Model for predictive ability assumes correlated genotypes?	Requires cross validation?
1	Yes	Yes	Yes	Yes	Yes
2	Yes	Yes	Yes	Yes	Yes
3	Yes	Yes	Yes	Yes	Yes
4	Yes	Yes	Yes	Yes	Yes
5	Yes	No	-	-	No
6	No	No	-	Yes	Yes
7	No	No	-	-	No