Single-step genomic evaluation of Russian dairy cattle using internal and external information.

Andrei A Kudinov¹,²,³, Esa A Mäntysaari¹, Timo J Pitkänen¹, Ekaterina I Saksa³, Gert P Aamand⁴, Pekka Uimari², Ismo Strandén¹.

Abstract

Genomic data are widely used in predicting the breeding values of dairy cattle. The accuracy of genomic prediction depends on the size of the reference population and how related the candidate animals are to it. For populations with limited numbers of progeny-tested bulls, the reference populations must include cows and data from external populations. The aim of this study was to implement state-of-the-art single-step genomic evaluations for milk and fat yield in Holstein and Russian Black & White cattle in the Leningrad region (LR, Russia), using only a limited number of genotyped animals. We complemented internal information with external pseudo-phenotypic and genotypic data of bulls from the neighbouring Danish, Finnish and Swedish Holstein (DFS) population. Three data scenarios were used to perform single-step GBLUP predictions in the LR dairy cattle population. The first scenario was based on the original LR reference population, which constituted 1,080 genotyped cows and 427 genotyped bulls. In the second scenario, the genotypes of 414 bulls related to the LR from the DFS population were added to the reference population. In the third scenario, LR data were further augmented with pseudo-phenotypic data from the DFS population. The inclusion of foreign information increased the validation reliability of the milk yield by up to 30%. Suboptimal data recording practices hindered the improvement of fat yield. We confirmed that the single-step model is suitable for populations with a low number of genotyped animals, especially when external information is integrated into the evaluations. Genomic prediction in populations with a low number of progeny-tested bulls can be based on data from genotyped cows and on the inclusion of genotypes and pseudo-phenotypes from the external population. 
This approach increased the validation reliability of the implemented single-step model in the milk yield, but shortcomings in the LR data recording scheme prevented improvements in fat yield.

Keywords: dairy cattle; genomic prediction; multi-country genomic evaluation; single-step GBLUP

Year: 2021 PMID: 34841597 PMCID: PMC9299785 DOI: 10.1111/jbg.12660

Source DB: PubMed Journal: J Anim Breed Genet ISSN: 0931-2668 Impact factor: 3.271

INTRODUCTION

Genomic information has been successfully used in predicting dairy cattle breeding values during the last decade (VanRaden, 2020). In the original genomic evaluation approach, breeding values of the candidate animals were predicted using information derived from the genotyped animals of the reference population (Meuwissen et al., 2001). Multiple studies have shown that the reliability of genomic prediction depends on the size and structure of the reference population (Goddard, 2009; Goddard & Hayes, 2009). Large commercial dairy breeding schemes initiated their reference populations by genotyping all progeny-tested bulls and some elite cows. However, this approach is challenging in small populations because only a few progeny-tested bulls may be available, and historical DNA samples of the bulls often are not. The obvious way to attain sufficient prediction accuracy in small populations with a limited number of progeny-tested bulls is to enlarge the reference population with genotyped cows (Ding et al., 2013; Li et al., 2014). This approach increases genotyping costs: because the estimated breeding values (EBVs) of cows typically have low reliability, more genotyped animals are needed to reach the same accuracy as with progeny-tested bulls, whose EBVs are typically highly reliable (Daetwyler et al., 2008). In addition, the heritability of a trait affects the optimal size of the reference population; the lower the heritability, the more genotyped animals are needed in the reference population (Goddard & Hayes, 2009). Furthermore, genomic prediction accuracy depends on the model used. The single-step genomic BLUP (ssGBLUP) approach (Aguilar et al., 2010; Christensen & Lund, 2010) may yield more accurate genomic predictions than the two-step approach when the population has a limited number of genotyped animals (Christensen et al., 2012; Song et al., 2019).
Low genomic prediction reliability in a population with a limited number of genotyped and progeny‐tested animals can be enhanced by including data from external sources (Přibyl et al., 2013; VanRaden, 2012). Thus, a joint reference population can be created where countries benefit uni‐ or bilaterally from the data sharing. Several reported examples of EBVs and genomic data exchange between countries have shown significant increases in the reliability of genomic predictions (Jorjani et al., 2012; Lund et al., 2011; Ma et al., 2014). Even though several countries routinely make joint traditional and genomic evaluations (e.g. Denmark, Finland and Sweden) (Lidauer et al., 2015), most dairy evaluation systems are unwilling to share recorded data from cows and will only disseminate EBV from internal evaluations. In such circumstances, the inclusion of foreign bull EBVs with corresponding reliabilities into national or internal reference populations has become a common practice (Přibyl et al., 2013; Vandenplas et al., 2014). EBVs of foreign genotyped bulls can be attained from the multi‐trait across‐country evaluations (MACE, Interbull, Uppsala, Sweden). Several methods to include external EBV into internal evaluations have been developed (Bonaiti & Boichard, 1995; Luštrek et al., 2021; Přibyl et al., 2013; Täubert et al., 2000; Vandenplas et al., 2014; VanRaden, 2001, 2012). Vandenplas et al. (2014) described a unified approach for combining external EBV with internal data and pedigree information with further extension to genomic information (Vandenplas et al., 2017). The blended information from multiple sources was shown to be free from double counting the internal information. The method avoids the overestimation of reliabilities and can be used in genomic prediction models to include external data. 
Contemporary comparison is the current official dairy bull evaluation method in Russia, but state-of-the-art animal model evaluations have already been proposed (Kudinov et al., 2017, 2018). In 2016, the Leningrad Region (LR) (Figure 1) Committee on Agriculture and Fishery (Saint Petersburg, Russia) initiated a research and development project to apply genomic evaluations of production traits using BLUP methodology. The region's largest dairy cattle population consists of animals with an admixture of Holstein (HOL) and Russian Black & White (RBW) breeds kept in 49 herds, with an average herd size of approximately 1,000 cows (Kudinov et al., 2018). Because the number of progeny-tested bulls was low, genotyping cows was the only way to increase the reference population. During 2015-2017, the starting pool of genotyped animals was created from 427 bulls and 1,080 cows. Such a small number of reference animals was expected to limit genomic prediction reliability for candidate animals. Thus, it was proposed to improve reliability by including genomic and pseudo-phenotypic information from the neighbouring Danish, Finnish and Swedish Holstein (DFS) populations.

FIGURE 1

Map of the northeastern part of the Baltic Sea. The Leningrad region of the Russian Federation is highlighted in dark grey (the plot was created in R with the ggplot2 software package; Wickham, 2016).

The aim of this study was to test the feasibility of ssGBLUP for HOL and RBW cattle in the LR with only a small number of genotyped cows and bulls. We also tested the effect of including genomic and pseudo-phenotypic information from HOL bulls from DFS on the prediction ability of the genomic model. Single-step genomic evaluations were computed using three scenarios: (a) phenotypes and genotypes of LR animals only, (b) including additional bull genotypes from the DFS and (c) further adding external MACE EBVs at the DFS scale.

MATERIALS AND METHODS

Leningrad region data

Phenotypic data, as described in Kudinov et al. (2018), included 363,833 records of 305-day milk and fat yields from 159,069 highly related HOL and RBW cows in 49 breeding herds. Some animals had records up to the fifth lactation. The pedigree included 221,001 animals born between 1960 and 2015. For variance component estimation, the data were truncated to records from the first three lactations only, because of the small number of records in later lactations. Correspondingly, the pedigree was pruned to include only animals informative for the truncated data. The final data included 319,509 observations and 206,356 pedigree animals.

Nordic data

The full ancestral pedigree was traced for bulls present in both the LR and the joint Nordic cattle genetic evaluations (NAV; Denmark, Finland, Sweden). The information for the 486 bulls was extracted from the MACE EBVs published by Interbull in December 2018 at the NAV scale. The reliabilities of the MACE EBVs and the heritabilities of the LR model were used to derive effective daughter contributions (EDC) using reverse reliability estimation, as described in Taskinen et al. (2014). Using the calculated EDC and full pedigree information, the MACE EBVs were converted into deregressed daughter performances (DRP) using a matrix deregression procedure (Jairath et al., 1998; Strandén & Mäntysaari, 2010).
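To illustrate the link between reliability and EDC, the sketch below inverts the single-animal relation rel = EDC / (EDC + λ), with λ = (4 − h²)/h². This is a simplification for intuition only: it ignores information from relatives, whereas the reverse reliability estimation of Taskinen et al. (2014) used in the study accounts for the pedigree. The function names are hypothetical.

```python
def lambda_daughters(h2: float) -> float:
    """Variance ratio for daughter-based information: (4 - h2) / h2."""
    return (4.0 - h2) / h2


def reliability_from_edc(edc: float, h2: float) -> float:
    """Single-animal reliability implied by an EDC, ignoring relatives."""
    return edc / (edc + lambda_daughters(h2))


def edc_from_reliability(rel: float, h2: float) -> float:
    """Invert rel = EDC / (EDC + lambda); a simplification, not the
    pedigree-aware reverse reliability estimation used in the study."""
    if not 0.0 <= rel < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return lambda_daughters(h2) * rel / (1.0 - rel)


# Example: a bull with MACE reliability 0.90 under h2 = 0.21
edc = edc_from_reliability(0.90, 0.21)
print(round(edc, 1))
```

Because relatives are ignored, this approximation attributes all of a bull's reliability to his own daughters and will generally overstate the EDC compared with a pedigree-aware method.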

Genotypes

The LR data included single nucleotide polymorphism (SNP) marker genotypes from 1,080 cows and 427 bulls provided by repositories of the Russian Research Institute of Farm Animal Genetics and Breeding and LLC Laboratory Genome (Saint-Petersburg, Russia). Both Illumina BovineSNP50v2 and IDBv3 arrays (Illumina, San Diego, USA) were used for genotyping. Genotyped cows were from 13 LR herds. The average (SD) number of samples per herd was 82 (21). The DFS data had 414 bull genotypes from the Illumina BovineSNP50 chip provided by NAV. The DFS genotypes were imputed and had passed quality control in the official NAV HOL genomic evaluations (https://www.nordicebv.info/). The LR and DFS genotypes were synchronized to have identical reading patterns (i.e. coding). Imputation was performed to unify genotypes and fill in missing markers. Quality control of genotypes was performed using the following criteria: call rate >95% and minor allele frequency >5%. After processing, 43,194 SNP markers remained for the genomic prediction.
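The two quality-control thresholds can be sketched as a per-SNP filter. The text does not say whether call rate was applied per SNP or per animal, so the per-SNP reading here is an assumption, as are the 0/1/2 genotype coding, the use of NaN for missing calls and the function name.

```python
import numpy as np


def qc_snp_mask(genotypes, min_call_rate=0.95, min_maf=0.05):
    """Return a boolean mask of SNPs that pass both thresholds.

    genotypes: animals x SNPs array coded 0/1/2 (allele counts),
    with np.nan for missing calls (assumed coding).
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    call_rate = called.mean(axis=0)          # share of animals with a call
    n_called = called.sum(axis=0)
    allele_sum = np.nansum(g, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        freq = allele_sum / (2.0 * n_called)  # frequency of the counted allele
    maf = np.minimum(freq, 1.0 - freq)
    return (call_rate > min_call_rate) & (maf > min_maf)


# Toy data: SNP 0 passes, SNP 1 is monomorphic, SNP 2 has a 90% call rate
toy = np.ones((20, 3))
toy[:, 1] = 0.0
toy[:2, 2] = np.nan
print(qc_snp_mask(toy))
```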

Validation of model fit

Two reduced data sets were created for calculating the validation reliability and bias of genomic enhanced breeding values (GEBV). For bull validation, the milk and fat records from the last four production years (2012-2015) were removed from the data, as described in Mäntysaari et al. (2010). An exception was made for genotyped cows that were not closely related to the validation bulls (i.e. not daughters, granddaughters, or sibs) and represented contemporary groups (herd-year-season) with at least five animals. The records from these cows were kept to avoid exhausting the training set. The bull validation test set included 48 bulls with EDCs greater than 20 in the full data, but EDC = 0 in the reduced data. For the cow validation test, records from the last production year (2015) were excluded. There were 221 test cows that had no records in the reduced data but at least one record in the full data. The full data were used to calculate daughter yield deviations (DYDs) for the bulls and yield deviations (YDs) for the cows (VanRaden & Wiggans, 1991) using a corresponding ssGBLUP model. Bias was estimated by (G)EBV overdispersion, that is, the regression coefficient $b_1$ in the validation regression model

$$(D)YD = b_0 + b_1\,GEBV,$$

and by the average difference between GEBV and (D)YD. The DYD observations for bull $i$ were weighted using $w_i$, calculated as $w_i = EDC_i/(EDC_i + \lambda_1)$, where $\lambda_1 = (4 - h^2)/h^2$. The YD observations for cow $j$ were weighted using parameter $w_j$, calculated as $w_j = ERC_j/(ERC_j + \lambda_2)$, where $\lambda_2 = (1 - h^2)/h^2$, and $ERC_j$ is the effective record contribution (Přibyl et al., 2013) of cow $j$. Validation reliability ($R^2$) was calculated as the squared correlation between (D)YD and the reduced data GEBV divided by the mean $w$. The within-herd heritability was calculated using the formula

$$h^2 = \frac{\sigma^2_a}{\sigma^2_a + \sigma^2_{pe} + \sigma^2_e},$$

where $\sigma^2_a$, $\sigma^2_{pe}$ and $\sigma^2_e$ are the genetic, permanent environment and residual variances, respectively.
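The validation statistics described above can be sketched as a weighted regression. The weight form w = info / (info + λ), with info being EDC for bulls or ERC for cows, and the use of a weighted correlation are assumptions of this sketch; the function and key names are hypothetical.

```python
import numpy as np


def validation_stats(yd, gebv, info, lam):
    """Weighted validation regression (D)YD = b0 + b1 * GEBV.

    info: EDC (bulls) or ERC (cows); lam: (4 - h2)/h2 for bulls
    or (1 - h2)/h2 for cows. Weights are w = info / (info + lam).
    """
    yd = np.asarray(yd, dtype=float)
    gebv = np.asarray(gebv, dtype=float)
    info = np.asarray(info, dtype=float)
    w = info / (info + lam)

    # Weighted least squares for intercept and slope
    X = np.column_stack([np.ones_like(gebv), gebv])
    xtw = X.T * w
    b0, b1 = np.linalg.solve(xtw @ X, xtw @ yd)

    # Weighted squared correlation, divided by the mean weight
    mean_g = np.average(gebv, weights=w)
    mean_y = np.average(yd, weights=w)
    cov = np.average((gebv - mean_g) * (yd - mean_y), weights=w)
    var_g = np.average((gebv - mean_g) ** 2, weights=w)
    var_y = np.average((yd - mean_y) ** 2, weights=w)
    r2 = cov ** 2 / (var_g * var_y) / w.mean()

    return {"b0": float(b0), "b1": float(b1), "r2": float(r2),
            "mean_w": float(w.mean()),
            "mean_diff": float(np.mean(gebv - yd))}
```

Dividing by the mean weight rescales the squared correlation for the fact that (D)YD is itself an imperfect measure of the true breeding value, so the resulting value can exceed the raw squared correlation.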

Statistical model

The repeatability animal model presented in Kudinov et al. (2018) was modified by including a herd-by-sire interaction random effect (Wiggans et al., 1988) and a respective variance component ($\sigma^2_{hs}$). Variance components and breeding values were estimated using the model:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_1\mathbf{a} + \mathbf{Z}_2\mathbf{p} + \mathbf{Z}_3\mathbf{s} + \mathbf{e},$$

where $\mathbf{y}$ is a vector of milk or fat yield records, $\mathbf{b}$ is a vector of the fixed effects, $\mathbf{a}$, $\mathbf{p}$ and $\mathbf{s}$ are vectors of random animal breeding values, permanent environmental and herd-by-sire interaction effects, respectively, $\mathbf{X}$ is a design matrix relating fixed effects to the records, $\mathbf{Z}_1$, $\mathbf{Z}_2$ and $\mathbf{Z}_3$ are design matrices relating the random effects to the records, and $\mathbf{e}$ is a vector of the residual effects. The random effects were assumed to follow $\mathbf{a} \sim N(\mathbf{0}, \mathbf{A}\sigma^2_a)$, $\mathbf{p} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{pe})$, $\mathbf{s} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{hs})$ and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$, where matrix $\mathbf{A}$ is the pedigree-based relationship matrix and $\mathbf{I}$ are identity matrices. Days open by age of calving by lactation and herd-year-season were the fixed effects (Kudinov et al., 2018). No breed effect was used in the model, as active crossbreeding of RBW cows with HOL bulls began in the late 1970s, before the data sampling point. The 218 originally proposed unknown parent groups (UPGs; Kudinov et al., 2018) were revised and reduced to 54 because of the low number of observations per group. Rearranged groups were based on origin (domestic or foreign), selection path and 5-year time intervals.

Genomic evaluation

The mixed-model equations (MME) of the original ssGBLUP (Aguilar et al., 2010; Christensen & Lund, 2010) included a joint relationship matrix $\mathbf{H}$ and its inverse:

$$\mathbf{H}^{-1} = \mathbf{A}^{-1} + \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1} \end{bmatrix},$$

where $\mathbf{G}$ is the genomic relationship matrix, $\mathbf{A}$ is the pedigree-based relationship matrix and $\mathbf{A}_{22}$ is a subblock of the $\mathbf{A}$ matrix including genotyped animals only. The UPGs were accounted for in an augmented inverse relationship matrix, as presented in Matilainen et al. (2018) and Misztal et al. (2013). In that matrix, $\mathbf{Q}$ includes the proportions of contributions each animal receives from the UPGs, $\mathbf{Q}_1$ and $\mathbf{Q}_2$ are the submatrices of $\mathbf{Q}$ corresponding to the non-genotyped and genotyped animals, respectively, and $\mathbf{A}^{ij}$ is the submatrix of $\mathbf{A}^{-1}$, with a superscript ($i$ or $j$) value of 1 for the non-genotyped and a value of 2 for the genotyped animals. Inbreeding coefficients were used in the calculations of the inverse pedigree-based relationship matrices $\mathbf{A}^{-1}$ and $\mathbf{A}_{22}^{-1}$. We assumed that genotypes could describe 90% of the genetic variance, and thus the genomic relationships were regressed towards the pedigree relationships as:

$$\mathbf{G}_w = (1 - w)\,\mathbf{G} + w\,\mathbf{A}_{22},$$

where $w = 0.1$ represents the residual polygenic proportion, and $\mathbf{G} = \mathbf{M}_{101}\mathbf{M}_{101}'/s$, with $\mathbf{M}_{101}$ an $n$ by $m$ marker matrix with the genotypes coded by {−1, 0, 1} (that is, assuming allele frequencies of 0.5), $m$ the number of SNP markers and $n$ the number of genotyped animals. The scaling factor $s$ was used to ensure that the average of the diagonal of the genomic relationship matrix is equal to the average of the diagonal of the $\mathbf{A}_{22}$ matrix.
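The construction of the genomic relationship matrix and of the ssGBLUP inverse can be sketched with dense NumPy arrays. This is a toy-scale illustration under stated assumptions: the scaling factor is taken as trace(MM′)/trace(A22), which is what matching the average diagonals implies, the UPG augmentation is left out, and all function names are hypothetical. The study itself used specialised software (HGinv, MiX99).

```python
import numpy as np


def blended_g(m101, a22, w=0.1):
    """G from {-1, 0, 1} genotypes (allele frequencies of 0.5 assumed),
    scaled so that mean diag(G) equals mean diag(A22), then regressed
    towards A22 with polygenic proportion w."""
    m = np.asarray(m101, dtype=float)
    a22 = np.asarray(a22, dtype=float)
    mmt = m @ m.T
    s = np.trace(mmt) / np.trace(a22)   # assumed form of the scaling factor
    g = mmt / s
    return (1.0 - w) * g + w * a22


def h_inverse(a_inv, a22, g_w, genotyped_idx):
    """H^-1 = A^-1 plus (G_w^-1 - A22^-1) in the genotyped block.
    UPGs are not handled in this sketch."""
    h_inv = np.array(a_inv, dtype=float)  # copy, leave the input untouched
    block = np.linalg.inv(g_w) - np.linalg.inv(a22)
    h_inv[np.ix_(genotyped_idx, genotyped_idx)] += block
    return h_inv


# Toy pedigree: sire (0) and dam (1) unrelated, offspring (2);
# animals 1 and 2 are genotyped for four SNPs.
a = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]])
idx = [1, 2]
a22 = a[np.ix_(idx, idx)]
m101 = np.array([[1, 0, -1, 1], [1, -1, 0, 1]])
g_w = blended_g(m101, a22, w=0.1)
h_inv = h_inverse(np.linalg.inv(a), a22, g_w, idx)
```

If the genomic and pedigree relationships among genotyped animals agree exactly, the added block is zero and H⁻¹ reduces to A⁻¹, which is a convenient sanity check for any implementation.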

Integration of external information

For integration of MACE DFS information into the LR evaluation, we used the method presented in Vandenplas et al. (2014, 2017). For notational simplicity, the MME are described here with only the animal breeding value and the fixed effects. In these equations, subscript N pertains to the DFS MACE evaluation: a diagonal matrix holds the effective record contribution (ERC) increase for the bulls due to the DFS information and zeros for the cows, and a vector holds the DRPs from the DFS MACE evaluation. The remaining terms are the residual (co)variance matrix and the solution vector of breeding values. Both the ERC matrix and the DRP vector were padded with zeros for the other animals in the pedigree. The ERCs were back-solved from reliabilities in the DFS evaluations with the reverse reliability estimation, as in Pitkänen et al. (2018). A separate herd-year-season fixed effect class was used for the bull pseudo-observations to reflect the different base of the DFS compared to the LR evaluations. The LR is not part of the MACE evaluation; thus, the external data were free from internal data and no actions to avoid double counting of information were needed.
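One way to picture the integration is to treat each DRP as a pseudo-record on the bull's breeding value, weighted by its ERC. The sketch below is an assumption-laden simplification, not the authors' exact equations: it uses a single-trait model with homogeneous residual variance, multiplies the MME through by that variance, and omits the separate fixed-effect class for pseudo-observations. All names are hypothetical.

```python
import numpy as np


def solve_mme_with_external(X, Z, y, h_inv, d_n, y_n, var_a, var_e):
    """Solve y = Xb + Za + e with var(a) = H * var_a, var(e) = I * var_e,
    adding external pseudo-records y_n with weights d_n on the animal
    equations (zeros for animals without external information)."""
    X, Z, y = (np.asarray(v, dtype=float) for v in (X, Z, y))
    d_n = np.asarray(d_n, dtype=float)
    y_n = np.asarray(y_n, dtype=float)
    lam = var_e / var_a
    p = X.shape[1]
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.asarray(h_inv) + np.diag(d_n)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y + d_n * y_n])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]


# Toy case: three unrelated animals, records on animals 0 and 1 only
X = np.ones((2, 1))
Z = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y = np.array([10.0, 12.0])
b_hat, a_hat = solve_mme_with_external(
    X, Z, y, np.eye(3),
    d_n=[0.0, 0.0, 50.0], y_n=[0.0, 0.0, 3.0],
    var_a=1.0, var_e=3.0,
)
```

In this toy case animal 2 has no internal records, so its solution is driven entirely by the pseudo-record and shrinks towards zero by the factor d / (d + λ).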

Evaluation scenarios

The LR ssGBLUP evaluations were implemented and tested using three scenarios. In the first scenario, named ssLR, only LR phenotypic and genomic data were used. In the second scenario, named ssLRg, ssLR was upgraded with DFS bull genotypes. In the third scenario (ssLRdfs), the ssLR was upgraded with the DFS genotypes and the DRPs from the MACE evaluations. Thus, ssLRg included more genomic information than ssLR, while ssLRdfs included more phenotypic information than ssLRg.

Software

Pedigree pruning and calculation of inbreeding coefficients and the relationship submatrix were performed using the RelaX2 program (Strandén & Vuori, 2006). Variance components were estimated with restricted maximum likelihood (REML) (Patterson & Thompson, 1971) in the DMU software (Madsen et al., 2010) using the AI-REML algorithm. Unification of the DNA arrays and imputation of missing alleles were performed using the FImpute v. 2.2 software (Sargolzaei et al., 2014). The $\mathbf{G}^{-1}$ and B matrices were computed using the HGinv v. 0.87 program (Strandén & Mäntysaari, 2018). The EDC, ERC and DRP, and finally the (G)EBV and (D)YD computations were performed in the MiX99 software (Strandén & Lidauer, 1999).

RESULTS

The estimated genetic ($\sigma^2_a$), herd-sire ($\sigma^2_{hs}$), permanent environment ($\sigma^2_{pe}$) and residual ($\sigma^2_e$) variance components with respective SEs for milk yield were 330,735 ± 6,571, 80,532 ± 3,023, 274,195 ± 8,741 and 955,257 ± 3,352, respectively. For the fat yield, the estimated variance components were 451 ± 9 ($\sigma^2_a$), 118 ± 4 ($\sigma^2_{hs}$), 300 ± 11 ($\sigma^2_{pe}$) and 1,393 ± 5 ($\sigma^2_e$). Thus, the estimated heritability (0.21 ± 0.005) was the same for the milk and fat yields. The average GEBV of milk yield (kg) by the birth year of bulls with EDCs above 20 for the three ssGBLUP models is presented in Figure 2. Each GEBV trend was adjusted to the same base level by centring the mean GEBV of all cows born in 2010 to zero. The genetic trends from ssLR and ssLRg had similar shapes, with an average annual genetic progress of 40 kg in 1995-2010. ssLRdfs showed a larger annual genetic trend (60 kg). Similar patterns were observed for the fat yield trends; the estimated annual genetic gains were 1.2 kg for ssLR and ssLRg and 1.9 kg for ssLRdfs (Figure 3).
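As a quick check of the reported heritability, the variance components can be plugged into the within-herd definition from the Methods, that is, genetic variance over the sum of genetic, permanent environment and residual variances (the herd-sire variance is left out of the denominator). The function name is hypothetical.

```python
def within_herd_h2(var_a: float, var_pe: float, var_e: float) -> float:
    """Genetic variance over genetic + permanent environment + residual."""
    return var_a / (var_a + var_pe + var_e)


milk_h2 = within_herd_h2(330_735, 274_195, 955_257)
fat_h2 = within_herd_h2(451, 300, 1_393)
print(round(milk_h2, 2), round(fat_h2, 2))  # prints 0.21 0.21
```

Both traits round to 0.21, matching the reported estimate.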

FIGURE 2

Average genomic breeding value (GEBV) of bulls by birth year for milk yield (kg). Black line with triangles (ssLR) denotes the ssGBLUP model using Leningrad region (LR) phenotypes and genotypes; green line with snowflakes (ssLRg) denotes ssGBLUP using LR phenotypes, genotypes and Nordic (DFS) genotypes; blue line with circles (ssLRdfs) denotes ssGBLUP using LR phenotypes and genotypes, and DFS genotypes and deregressed EBVs (DRPs).

FIGURE 3

Average genomic breeding value (GEBV) of bulls by birth year for fat yield (kg). Black line with triangles (ssLR) denotes the ssGBLUP model using Leningrad region (LR) phenotypes and genotypes; green line with snowflakes (ssLRg) denotes ssGBLUP using LR phenotypes, genotypes and Nordic (DFS) genotypes; blue line with circles (ssLRdfs) denotes ssGBLUP using LR phenotypes and genotypes, and DFS genotypes and deregressed EBVs (DRPs).

Genetic trends in milk and fat yields for the cows are presented in Figures 4 and 5, respectively. The average annual predicted genetic gain in milk yield was identical using either ssLR or ssLRg (50 kg), but the annual genetic gain was 55 kg when using ssLRdfs. For fat yield, the annual genetic gain based only on LR data was 1.7 kg, compared to a genetic gain of 1.9 kg obtained with the augmented data.

FIGURE 4

Average genomic breeding value (GEBV) of cows by birth year for milk yield (kg). Black line with triangles (ssLR) denotes the ssGBLUP model using Leningrad region (LR) phenotypes and genotypes; green line with snowflakes (ssLRg) denotes ssGBLUP using LR phenotypes, genotypes and Nordic (DFS) genotypes; blue line with circles (ssLRdfs) denotes ssGBLUP using LR phenotypes and genotypes, and DFS genotypes and deregressed EBVs (DRPs).

FIGURE 5

Average genomic breeding value (GEBV) of cows by birth year for fat yield (kg). Black line with triangles (ssLR) denotes the ssGBLUP model using Leningrad region (LR) phenotypes and genotypes; green line with snowflakes (ssLRg) denotes ssGBLUP using LR phenotypes, genotypes and Nordic (DFS) genotypes; blue line with circles (ssLRdfs) denotes ssGBLUP using LR phenotypes and genotypes, and DFS genotypes and deregressed EBVs (DRPs).

Table 1 shows the validation statistics of GEBV of milk yield based on DYD and YD for bulls and cows, respectively. The highest validation reliability R² was observed for ssLRdfs: 0.30 for bulls and 0.42 for cows. Including DFS genotypes (ssLRg) in bull validation did not increase R² compared to ssLR. For cows, R² was higher in ssLR (0.38) than in ssLRg (0.36). Regression coefficients (b₁) of DYD on GEBV for ssLR and ssLRg were similar and below one (0.78 and 0.80). ssLRdfs gave the lowest b₁ (0.58). For cows, the highest b₁ was from ssLR (1.69) and the lowest (1.14) from ssLRdfs.

TABLE 1

Bull and cow validation results of milk yield by the three single-step GBLUP models in the Leningrad Region Russian Black & White and Holstein population

Model ^a	Bulls (42 animals)			Cows (221 animals)
	E(GEBV−DYD) ^b	2 × b₁	R²	E(GEBV−YD)	b₁	R²
ssLR	529	0.78	0.21	65	1.69	0.38
ssLRg	557	0.80	0.21	91	1.55	0.36
ssLRdfs	748	0.58	0.30	113	1.14	0.42

Genomic enhanced breeding values (GEBV) and (daughter) yield deviations (D)YD were from the validation animals.

^a ssLR = model with Leningrad region genomic and phenotypic data, ssLRg = ssLR and Nordic (DFS) genomic data, ssLRdfs = ssLRg and DFS bulls EDCs.

^b E(GEBV−DYD) = difference between GEBV and DYD, b₁ = regression coefficient, R² = validation reliability.

Results of a linear regression of fat yield (D)YD on GEBV are given in Table 2. For bulls, the highest validation reliability (0.18) was obtained with both ssLRg and ssLRdfs. The difference in R² values between ssLR and the other models was 0.01. For cows, the increased amount of foreign information decreased validation reliability; the highest R² of 0.41 was achieved with LR data only (ssLR). The R² value reduced by 0.07 units in ssLRg and an additional 0.13 units in ssLRdfs. For fat yield, b₁ was larger than that obtained from the milk yield. For bulls, b₁ values were 0.64 and 0.68 for ssLR and ssLRg, respectively, but lower for ssLRdfs (0.41). For cows, b₁ values were above one for ssLR and ssLRg (1.86 and 1.67, respectively). In ssLRdfs, b₁ was below one (0.89).

TABLE 2

Bull and cow validation results of fat yield from the three single-step GBLUP models in the Leningrad Region Russian Black & White and Holstein population

Model ^a	Bulls (42 animals)			Cows (217 animals)
	E(GEBV−DYD) ^b	2 × b₁	R²	E(GEBV−YD)	b₁	R²
ssLR	18	0.64	0.17	6	1.86	0.41
ssLRg	19	0.68	0.18	7	1.67	0.34
ssLRdfs	27	0.41	0.18	7	0.89	0.21

Genomic enhanced breeding values (GEBV) and (daughter) yield deviations (D)YD were from the validation animals.

^a ssLR = model with Leningrad region genomic and phenotypic data, ssLRg = ssLR and Nordic (DFS) genomic data, ssLRdfs = ssLRg and DFS bulls EDCs.

^b E(GEBV−DYD) = difference between GEBV and DYD, b₁ = regression coefficient, R² = validation reliability.

DISCUSSION

Genomic selection has been successfully applied in the animal breeding of various species (Stock & Reents, 2013). A particularly large impact has been on the dairy cattle industry, where genomic prediction has reduced the generation interval by approximately 2.6 years and has increased the number of candidate bulls at AI stations up to 70% (Mäntysaari et al., 2020; VanRaden, 2020). Despite the attractiveness and benefits of genomic selection, it cannot be implemented in all cattle populations due to small population sizes, low numbers of progeny‐tested bulls, or other limited resources. In our study, we implemented the ssGBLUP model for the admixed population of HOL and RBW cattle from LR with a limited number of progeny‐tested bulls. The local genotyped animals were mostly cows. To improve prediction accuracy, we added bull genotypes from the neighbouring HOL DFS population and finally increased the number of progeny‐tested animals in the reference population by integrating external deregressed NAV‐scale MACE EBVs. The set of reference animals in LR was expected to be too small to perform genomic prediction with prediction reliability as high as that reported for large USA HOL population (Wiggans et al., 2017). One approach to enlarging a reference population is to include genomic data from an external population. However, the success of this approach depends on the availability of phenotypic data for these animals in the internal population and the genetic distance between the pooled and the target populations (Lund et al., 2014). Based on our results, a gradual upgrade of the data by adding genotypes of bulls with genetic ties to both data sets did not sufficiently increase the size of the reference population. However, we did not include EBVs of bulls with ERCs less than 20 in the DFS data. Nonetheless, including 414 DFS genotypes improved the prediction accuracy of the milk yield for bulls. 
Comparable to our study, the correlation between milk yield direct genomic values (DGVs) and DRPs was 0.26 in a small Chinese HOL reference population with 85 genotyped bulls and 2,862 genotyped cows (Ma et al., 2014). The correlation between unweighted DYDs and GEBVs was higher in the current study; the lowest value (0.38) was attained using LR data and genotypes only. The highest R² for milk yield was obtained with the model where DRP pseudo-phenotypes of DFS bulls derived from Interbull MACE evaluations were blended with LR data. Similarly, in Přibyl et al. (2013), R² increased after Interbull DRP pseudo-phenotypes were blended into Czech HOL genomic evaluations. However, integration of the traditional MACE evaluations should be performed with caution, as genomic pre-selection bias has been observed in bulls born after 2009 (Patry & Ducrocq, 2011). The favourable effect of including DFS information on R² for milk yield was not observed for fat yield. Pitfalls in the recording of fat yield phenotypes, such as variability in milk analysis systems (Kudinov et al., 2017), may explain this. Including the external pseudo-phenotypes caused bias in the unstable prediction approach, leading to a discrepancy between the expected and observed breeding values. From this viewpoint, it is important to understand whether the various sources of phenotypic data are equally relevant and accurate before using them in the blending procedure. When recording errors are more prominent in the internal data than in the external data, a bias similar to what we observed in fat yield is also expected in the prediction. Another explanation for the reduced accuracy of the blended method is that the validation test did not fit the ssLR model perfectly. As shown by Legarra and Reverter (2018), a reciprocal of the size of contemporary groups may generate upward bias in R² due to the decreasing size of contemporary groups. Hence, the bias may have been reduced after the integration of DFS data.
We computed (results not presented) the reliability and linear regression of GEBVs estimated using full and reduced data (Legarra & Reverter, 2018). The obtained R² increased as more data were added, for both milk and fat yields. However, any validation results from bulls must be considered with caution, as only 42 candidates were used. The advantages of ssGBLUP can be seen when the number of genotyped animals is too modest to allow for accurate estimates from multi-step genomic prediction (Amaya-Martínez et al., 2020; Christensen et al., 2012). Multi-step genetic evaluations with integrated external information may create biased predictions because of the extra step used for blending the external information (Guarini et al., 2019). In such a case, ssGBLUP provides less biased predictions because genomic and pedigree-based information are used simultaneously (Přibyl et al., 2013). The limitations of ssGBLUP mostly concern the compatibility of the A and G matrices, originating from an inconsistency in the base population definition for the genotyped and ungenotyped animals (Legarra et al., 2014). Proper consideration of UPGs in ssGBLUP is important for compatibility between the A and G matrices. Pedigree completeness is critical to the compatibility between the G and A₂₂ matrices (Misztal et al., 2010, 2013). A promising way to solve the compatibility of the G and A matrices is to fit the ancestral structure of the population by so-called metafounders, as presented by Legarra et al. (2015). The method per se represents a fusion of Christensen's idea (Christensen et al., 2012) to construct G with 0.5 allele frequencies and an extension of the pedigree by related and inbred pseudofounders (Legarra et al., 2015). These relationships are accounted for through a Gamma (Γ) matrix. The method has provided promising results when used in multiple-breed pedigrees (Xiang et al., 2017) and simulated data (Garcia-Baccino et al., 2017).
However, implementation of the metafounder approach may be challenging when a population has breeds with high admixture (Kudinov et al., 2020). We tested the metafounder approach for the population in our current study but did not observe an improvement in validation reliability. We used nine metafounders to describe the ancestral structure of the population. All diagonal elements of the Γ matrix, along with the off-diagonal elements, were very close to each other, except for the highly different old RBW groups. The absence of validation reliability improvement with the metafounder approach may be due to the small number of genotyped animals, most of which were born during the last two decades. The self-relationships of metafounders associated with old birth years were mostly computed using sporadic genotypes of historic RBW and Dutch HOL bulls. The metafounder approach may become attractive when the number of genotyped animals increases noticeably. The total genetic effect in the ssGBLUP model depends on pedigree (A) and genomic (G) relationships. The weight of genomic information in ssGBLUP can be changed by including a polygenic proportion variable (w), which weighs variation due to markers and pedigree information in the genomic relationship matrix. Commonly used values for w range from 10% to 30% (Ma et al., 2014; Matilainen et al., 2018; Přibyl et al., 2013). We tested both 10% and 30% for w, and observed only a small difference in the cross-validation results. Thus, only results with the 10% proportion are presented. The random herd-by-sire interaction effect applied in the model is the same as that originally used by Kudinov et al. (2018) in LR data. A random herd-by-sire interaction effect is also used in routine evaluations in the USA (VanRaden & Wiggans, 1991). The herds in LR are large, but their management and the origin of breeding animals may vary significantly.
We observed that imported semen from the top North American bulls was used in only a fraction of the LR farms. Interactions between sire and herd may occur in such situations or when the best bulls are only used in a few top herds (Dimov et al., 1995). Several animal model and ssGBLUP test runs performed using the model without the herd‐by‐sire random effect showed lower reliability and higher bias in EBV prediction.
The natural future direction to improve prediction accuracy in the LR data is to use test‐day records instead of 305‐day records and potentially even consider balancing investment into phenotyping and genotyping (Obšteter et al., 2021).
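The validation based on full and reduced data mentioned earlier in this discussion (Legarra & Reverter, 2018) can be sketched as a regression of full‐data GEBVs on reduced‐data GEBVs for the candidate animals. The sketch below is illustrative only: the function name `lr_validation` and the toy values are hypothetical, and it reports only the basic statistics (bias, regression slope and intercept, and R²), not the complete set of statistics of the method.

```python
import numpy as np

def lr_validation(gebv_full, gebv_reduced):
    """Compare GEBVs of candidates from full and reduced data.

    Regresses full-data GEBVs on reduced-data GEBVs. A slope near 1
    suggests little over- or under-dispersion, and a bias near 0
    suggests the reduced evaluation is unbiased on average.
    """
    full = np.asarray(gebv_full, dtype=float)
    reduced = np.asarray(gebv_reduced, dtype=float)
    if full.shape != reduced.shape:
        raise ValueError("both GEBV vectors must have the same length")
    slope, intercept = np.polyfit(reduced, full, 1)  # highest power first
    r = np.corrcoef(reduced, full)[0, 1]
    return {
        "bias": reduced.mean() - full.mean(),
        "slope": slope,
        "intercept": intercept,
        "r2": r ** 2,
    }

# Hypothetical GEBVs for five candidate animals
toy_reduced = [120.0, 85.0, 240.0, -30.0, 150.0]
toy_full = [140.0, 60.0, 260.0, -10.0, 130.0]
print(lr_validation(toy_full, toy_reduced))
```

With only a few dozen candidates, as with the 42 bulls here, all of these statistics have large sampling errors, which is the reason for the caution expressed above.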

CONCLUSIONS

Single‐step genomic prediction was successfully implemented for the LR data. The reference population included more genotyped cows than bulls because the number of progeny‐tested bulls was low in the LR. Including EBVs and genotypes from the Nordic HOL population in the LR ssGBLUP evaluation created one of the largest dairy reference populations among Russian regions. This joint reference population improved the prediction accuracy for milk yield but not for fat yield. The prediction accuracy of breeding values can be improved through better recording of phenotypes and pedigrees, and by drastically increasing the number of genotyped cows and progeny‐tested bulls.

CONFLICT OF INTEREST

The authors declare that they have no competing interests.

AUTHOR CONTRIBUTIONS

AAK was responsible for data editing, all data analyses and writing. The experiment was designed by EAM and IS. IS, EAM and PU were responsible for supervision, paper writing and interpretation of results. TJP developed an algorithm to prepare Nordic data for blending. EIS was responsible for Leningrad region data and genotypes, and for contact with Leningrad region farmers and breeding organizations. GPA was responsible for Nordic data and genotypes. PU, EIS and TJP provided input on data editing and the methods used. All authors read and approved the final manuscript.

38 in total

1. Single-step methods for genomic evaluation in pigs.

Authors: O F Christensen; P Madsen; B Nielsen; T Ostersen; G Su
Journal: Animal Date: 2012-04-05 Impact factor: 3.240

2. Evidence of biases in genetic evaluations due to genomic preselection in dairy cattle.

Authors: C Patry; V Ducrocq
Journal: J Dairy Sci Date: 2011-02 Impact factor: 4.034

3. Genomic evaluation of Brown Swiss dairy cattle with limited national genotype data and integrated external information.

Authors: B Luštrek; J Vandenplas; G Gorjanc; K Potočnik
Journal: J Dairy Sci Date: 2021-03-06 Impact factor: 4.034

4. Improving the accuracy of genomic prediction in Chinese Holstein cattle by using one-step blending.

Authors: Xiujin Li; Sheng Wang; Ju Huang; Leyi Li; Qin Zhang; Xiangdong Ding
Journal: Genet Sel Evol Date: 2014-10-14 Impact factor: 4.297

5. Genetic evaluation of dairy goats for milk and fat yield with an animal model.

Authors: G R Wiggans; J W Van Dijk; I Misztal
Journal: J Dairy Sci Date: 1988-05 Impact factor: 4.034

6. Symposium review: How to implement genomic selection.

Authors: P M VanRaden
Journal: J Dairy Sci Date: 2020-04-22 Impact factor: 4.034

7. Symposium review: Single-step genomic evaluations in dairy cattle.

Authors: E A Mäntysaari; M Koivula; I Strandén
Journal: J Dairy Sci Date: 2020-04-22 Impact factor: 4.034

8. A common reference population from four European Holstein populations increases reliability of genomic predictions.

Authors: Mogens S Lund; Adrianus P W de Roos; Alfred G de Vries; Tom Druet; Vincent Ducrocq; Sébastien Fritz; François Guillaume; Bernt Guldbrandtsen; Zenting Liu; Reinhard Reents; Chris Schrooten; Franz Seefried; Guosheng Su
Journal: Genet Sel Evol Date: 2011-12-12 Impact factor: 4.297