Florian Herry1, David Picard Druet2, Frédéric Hérault2, Amandine Varenne3, Thierry Burlot3, Pascale Le Roy2, Sophie Allais4. 1. NOVOGEN, 5 rue des compagnons, Secteur du Vau Ballier, 22960 Plédran, France; PEGASE, INRAE, Agrocampus Ouest, 35590 Saint-Gilles, France. 2. PEGASE, INRAE, Agrocampus Ouest, 35590 Saint-Gilles, France. 3. NOVOGEN, 5 rue des compagnons, Secteur du Vau Ballier, 22960 Plédran, France. 4. PEGASE, INRAE, Agrocampus Ouest, 35590 Saint-Gilles, France. Electronic address: sophie.allais@agrocampus-ouest.fr.
Abstract
With the availability of the 600K Affymetrix Axiom high-density (HD) single nucleotide polymorphism (SNP) chip, genomic selection has been implemented in broiler and layer chicken. However, the cost of this SNP chip is too high to genotype all selection candidates. A solution is to develop a low-density SNP chip, at a lower price, and to impute all missing markers. But to routinely implement this solution, the impact of imputation on genomic evaluation accuracy must be studied. It is also interesting to study the consequences of the use of low-density SNP chips in genomic evaluation accuracy. In this perspective, the interest of using imputation in genomic selection was studied in a pure layer line. Two low-density SNP chip designs were compared: an equidistant methodology and a methodology based on linkage disequilibrium. Egg weight, egg shell color, egg shell strength, and albumen height were evaluated with single-step genomic best linear unbiased prediction methodology. The impact of imputation errors or the absence of imputation on the ranking of the male selection candidates was assessed with a genomic evaluation based on ancestry. Thus, genomic estimated breeding values (GEBV) obtained with imputed HD genotypes or low-density genotypes were compared with GEBV obtained with the HD SNP chip. The relative accuracy of GEBV was also investigated by considering as reference GEBV estimated on the offspring. A limited reordering of the breeders, selected on a multitrait index, was observed. Spearman correlations between GEBV on HD genotypes and GEBV on low-density genotypes (with or without imputation) were always higher than 0.94 with more than 3K SNP. For the genetically closer, top 150 individuals for a specific trait, with imputation, the reordering was reduced with correlation higher than 0.94 with more than 3K SNP. 
Without imputation, the correlations remained lower than 0.85 with less than 3K and 16K SNP for equidistant and linkage disequilibrium methodology, respectively. The differences in GEBV correlations between both methodologies were never significant. The conclusions were the same for all studied traits.
The availability of single nucleotide polymorphisms (SNP) enabled the development of high-throughput genotyping technologies, leading to the use in layer and broiler breeding of the 600K Affymetrix Axiom high-density (HD) genotyping array developed by Kranis et al. (2013). Genomic selection as described by Meuwissen et al. (2001) has since been implemented in many livestock species with different statistical methods such as genomic best linear unbiased prediction (GBLUP) methods (Legarra et al., 2009, Goddard et al., 2011) or Bayesian methods (Meuwissen et al., 2001, Xu, 2003, Habier et al., 2009). From a reference population with genotypes and phenotypes, it is possible to estimate the genomic value of the genotyped selection candidates with or without phenotype. The main objective is to choose, among the selection candidates of generation N, the best breeders for one or more traits to produce the individuals of generation N + 1. In addition, compared with genetic selection, genomic selection may increase the genetic gain through the decrease in generation interval, most particularly for species with a long generation interval, through the increase in selection intensity obtained by genotyping many selection candidates, and through the increase in evaluation accuracy.
However, the high cost of such HD SNP chips is still a problem for all livestock species. To reduce the cost of genomic selection, low-density SNP chips can be developed. The idea is to select a subset of markers from the HD SNP chip and to impute the genotypes at missing markers. 
Three main methods to select the marker panel have been developed: (1) selection of a subset of SNP chosen at regular intervals along each chromosome, taking into account or not the minor allele frequency (MAF) of the selected SNP (Habier et al., 2009, Weigel et al., 2009, Zhang et al., 2011, Cleveland and Hickey, 2013, Wang et al., 2013, Herry et al., 2018), (2) selection of a subset of SNP having high effects on different traits of interest (Weigel et al., 2009, Zhang et al., 2011), or (3) selection of a subset of SNP based on linkage disequilibrium (LD) between markers (Herry et al., 2018). The latter method was studied because of the particularities of the Gallus gallus genome (International Chicken Genome Sequencing Consortium, 2004) and the particular structure of the avian LD (Megens et al., 2009, Qanbari et al., 2010, Hérault et al., 2018).
Factors influencing imputation accuracy as well as the relation between imputation accuracy and genomic evaluation of the selection candidates are well documented. Theoretically, owing to imputation errors, genomic evaluation accuracy with imputed genotypes is expected to be lower than with HD genotypes. The literature confirms this for very-low-density SNP chips (from a few SNP to 3K SNP), for which genomic evaluation accuracy decreases with a decrease, sometimes limited, in imputation accuracy (Weigel et al., 2009, Weigel et al., 2010, Mulder et al., 2012, Cleveland and Hickey, 2013, Raoul et al., 2017). But concerning intermediate low-density SNP chips (between 6K and 20K SNP), other studies showed that the impact of imputation errors was very limited (Weigel et al., 2010, VanRaden et al., 2011, VanRaden et al., 2012, Moghaddar et al., 2015, Wang et al., 2016). 
However, few studies about the impact of imputation on genomic evaluation have been carried out on chickens (Wang et al., 2013).
In addition, several studies showed that for traits affected by few large QTL, genomic evaluations are more sensitive to imputation errors. This was shown by Habier et al. (2009) and Zhang et al. (2011) in simulation studies and confirmed by Chen et al. (2014) on real data. The latter authors showed, in Holstein bulls, that the accuracy of direct genomic value for milk fat percentage, a trait affected by few large QTL, decreased by 34% via GBLUP using imputed genotypes. Conversely, they showed that the accuracy of direct genomic value for the somatic cell score (SCS), a trait affected by many small QTL, decreased only by 15%. In layer chickens, most of the studied traits are affected by many small QTL. This could indicate that genomic evaluation would not be severely impacted by imputation errors.
Finally, most studies investigated the impact of imputation on genomic evaluation accuracy, but only a few focused on the impact of the use of medium-density SNP chips (Su et al., 2012, Moghaddar et al., 2015) or low-density SNP chips (Weigel et al., 2009, Harris and Johnson, 2010) without imputation on genomic evaluation.
The main objective of a breeding company is to select its breeders. It therefore needs to describe the consequences, in terms of loss of selection response and genetic progress, of using low-density SNP chips, by investigating whether the ranking of its best candidates would be modified. Thus, focusing on 4 generations of a pure line of laying hens, the first objective of this study was to investigate the impact of imputation errors on genomic evaluation with an evaluation based on ancestry of the candidates of the second generation with true HD genotyping or imputed HD genotyping. The second objective was to study the impact of a direct use of low-density SNP chips, without imputation, on genomic evaluation. 
To do so, the same genomic evaluation of the candidates based on ancestry was compared between true HD genotyping and low-density genotyping without imputation. Then, to get closer to the true breeding values of the candidates, their genomic estimated breeding values (GEBV) were estimated with a genomic evaluation using optimal information (phenotypes on descendants). Thus, the third objective was to assess the relative accuracy of genomic evaluation by comparing the GEBV of the candidates of the second generation with optimal information (phenotypes on their descendants of the third and fourth generations) and their GEBV based on ancestry with imputed HD genotyping. Finally, the imputed HD genotyping of the candidates was replaced by their low-density genotyping without imputation. Therefore, the fourth objective was to assess the relative accuracy of genomic evaluation of the candidates without imputation.
Material and methods
Ethics Statement
All blood sampling was carried out as part of the commercial and selection activities of Novogen. The animals studied are therefore not to be considered as experimental animals per se, as defined in European Union directive 2010/63 and subsequent national application texts. As a consequence, we did not seek ethical review and approval for this study, as it does not include the use of experimental animals. All animals were reared in compliance with national regulations pertaining to livestock production and as per the procedures approved by the French veterinary services.
Animals
All animals studied were detailed in Herry et al. (2018). They consisted of a commercial pure line of Rhode Island laying hens. This line was created and selected by Novogen (Plédran, France). The population studied comprised 21,475 chickens split into 4 generations. Each generation was divided into 3 batches, and a new batch was bred every 6 mo from 2010 to 2015 (Figure 1).
Figure 1
Population structure of the RI line. RI, Rhode Island.
Concerning the laying hens, phenotypic data were recorded from 60 to 90 wk of age, when birds were bred in individual cages. Each record was associated with a laying hen. There were 75,121 measures recorded for 7,983 birds. Finally, the sires were bred in individual cages.
Genomic selection was implemented in 2015 on males of this line. However, females were still selected based on pedigree and performances and not with genomic selection. Thus, this study concerned male selection candidates. In addition, among the different parameters studied and detailed in the next section, the relative accuracy of genomic selection was investigated. To calculate this relative accuracy, it is necessary to have a set of male selection candidates with information on their offspring. These male selection candidates were the 67 male breeders of generation G1.
Genotyping
Genotyping is briefly described here and detailed in the study by Herry et al. (2018). A total of 2,370 animals were genotyped for 580,961 SNP using the 600K Affymetrix Axiom HD genotyping array (Kranis et al., 2013).
Based on the fifth annotation release of the Gallus gallus genome (Warren et al., 2017), these SNP were distributed on macrochromosomes (1–5), intermediate chromosomes (6–10), microchromosomes (11–28 and 33), one linkage group (LGE64), the 2 sex chromosomes Z and W, as well as a group of 3,724 SNP with unknown location.
Genotypes were filtered through 6 successive steps (Table 1) including individual call rate (<95%), MAF (<0.05), SNP call rate (<95%), and Hardy–Weinberg equilibrium (P < 10⁻⁴). SNP with unknown location or located on sex chromosome W, as well as the animals showing pedigree incompatibilities, were removed. Most of the removed SNP were discarded because their MAF was zero. Finally, 300,351 SNP and 2,362 individuals remained available for the analyses.
Table 1
Summary of the different steps of quality control.
Genotypes filtration | RI line
Individual call rate (<95%) | 8
MAF (=0) | 204,122
MAF (<0.05) | 54,650
SNP call rate (<95%) | 7,541
Hardy–Weinberg equilibrium (P < 10⁻⁴) | 12,538
SNP with unknown location or on chromosome W | 1,759
Pedigree incompatibility | 0
SNP retained for analyses | 300,351
Animals retained for analyses | 2,362
Abbreviations: MAF, minor allele frequency; RI, Rhode Island; SNP, single nucleotide polymorphism.
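The call-rate and MAF filters summarized in Table 1 can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 allele counts with -1 for missing calls, which is a coding chosen here for illustration and not stated in the study; the function names are hypothetical, and the Hardy–Weinberg, map-location, and pedigree checks are deliberately left out.

```python
import numpy as np

def snp_call_rate(geno):
    """Fraction of non-missing genotypes per SNP (columns); -1 codes a missing call."""
    return (geno >= 0).mean(axis=0)

def minor_allele_frequency(geno):
    """MAF per SNP from 0/1/2 allele counts, ignoring missing calls."""
    masked = np.where(geno >= 0, geno, np.nan)
    p = np.nanmean(masked, axis=0) / 2.0
    return np.minimum(p, 1.0 - p)

def qc_filter(geno, min_ind_call=0.95, min_snp_call=0.95, min_maf=0.05):
    """Return boolean masks of the animals and SNP that pass the filters.

    Animals are filtered first on individual call rate; SNP call rate and
    MAF are then computed on the retained animals only.
    """
    keep_ind = (geno >= 0).mean(axis=1) >= min_ind_call
    retained = geno[keep_ind]
    keep_snp = (snp_call_rate(retained) >= min_snp_call) & (
        minor_allele_frequency(retained) >= min_maf
    )
    return keep_ind, keep_snp
```

The default thresholds mirror those of Table 1 (95% call rates and a MAF of 0.05); the order of the steps in this sketch is an assumption.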
Low-Density SNP Chip Design
Several low-density SNP chips were previously designed in silico by selecting a subset of SNP from the HD SNP chip (Herry et al., 2018).
An equidistant (EQ) methodology was studied by selecting SNP at regular physical intervals (in bp) along each chromosome. For each interval, the SNP with the highest MAF was selected or, in case of equal MAF, the one located furthest to the left. Twelve low-density “equi” SNP chips were designed according to this method with different SNP densities: 1K, 2K, 3K, 4K, 5K, 7.5K, 10K, 15K, 20K, 30K, 40K, and 50K SNP.
An LD methodology was studied considering the particular structure of the chicken LD. Low-density SNP chips were designed using the SS4I software (Hérault et al., 2016). This software groups SNP into clusters according to a chosen LD threshold. For each cluster, the SNP with the highest MAF was selected and used as representative of this cluster. Nine low-density “LD” SNP chips were designed with different LD thresholds: 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8.
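The equidistant selection rule described above can be sketched as follows for a single chromosome. This is only an illustration under stated assumptions: the function name, the use of fixed-width windows starting at position 0, and the input layout (parallel lists of positions and MAF) are hypothetical and are not taken from the software used in the study.

```python
def select_equidistant(positions, maf, interval_bp):
    """Pick one SNP per physical interval on one chromosome.

    Within each interval of width interval_bp, keep the SNP with the
    highest MAF; if several SNP share that MAF, keep the leftmost one.
    Returns the indices of the selected SNP, ordered by position.
    """
    best = {}  # interval index -> index of the SNP currently selected
    for i, (pos, freq) in enumerate(zip(positions, maf)):
        window = pos // interval_bp
        j = best.get(window)
        if j is None or freq > maf[j] or (freq == maf[j] and pos < positions[j]):
            best[window] = i
    return sorted(best.values(), key=lambda i: positions[i])
```

For example, with 1,000-bp windows, two SNP at 400 and 900 bp that share the highest MAF of their window resolve to the SNP at 400 bp.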
Imputation Accuracy
In our study, the selection candidates were the 580 sires of the second generation (G1) with simulated low-density genotyping. The selection candidates were imputed from the HD genotyping of the 447 sires of the first generation (G0). These 447 individuals were the fathers or the fathers' half-brothers of the selection candidates. Thus, the selection candidates were directly related to them.
For each low-density SNP chip designed, imputation accuracy of the selection candidates was previously assessed as the mean correlation between true and imputed genotypes (Herry et al., 2018). Pearson correlations were calculated one SNP at a time across all the candidates. The mean correlation was then estimated over the 300,351 per-SNP correlations. The mean correlations obtained were subsequently compared for the different low-density SNP chips and/or scenarios, using Student t tests with a type 1 error rate of 0.1%.
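The per-SNP accuracy measure described above can be sketched in a few lines. The sketch assumes genotypes stored as animals-by-SNP matrices of 0/1/2 allele counts; the function names are hypothetical, and the error rate shown (share of genotype calls that differ) is one plausible definition, since the study does not spell out how its error rate was computed.

```python
import numpy as np

def per_snp_correlation(true_geno, imputed_geno):
    """Pearson correlation between true and imputed genotypes, one SNP at a time."""
    t = np.asarray(true_geno, dtype=float)
    m = np.asarray(imputed_geno, dtype=float)
    t = t - t.mean(axis=0)
    m = m - m.mean(axis=0)
    num = (t * m).sum(axis=0)
    den = np.sqrt((t ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den  # NaN where a SNP shows no variation

def imputation_summary(true_geno, imputed_geno):
    """Mean per-SNP correlation and share of genotype calls that differ."""
    r = per_snp_correlation(true_geno, imputed_geno)
    mean_r = float(np.nanmean(r))
    error_rate = float(np.mean(np.asarray(true_geno) != np.asarray(imputed_geno)))
    return mean_r, error_rate
```

SNP that are monomorphic among the candidates yield an undefined correlation and are skipped by the mean in this sketch; how such SNP were handled in the study is not stated.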
Measurement of Traits
Four distinct traits were studied in this article. They are named as per the Animal Trait Ontology for Livestock (Atol Ontology, 2012). From 60 to 75 wk, egg production was recorded each day for all individuals. These were individual data. A total of 75,121 eggs concerning 7,983 birds were measured from G0 to G3.
One egg was collected per layer and per week, between 60 and 75 wk, for all layers. These eggs were then transferred to Zootests (Ploufragan, France) to study egg quality traits. The first step was to measure the egg weight (EW, in g). Then, 3 components of egg shell color were measured using a Minolta Chroma Meter (Konica Minolta, Nieuwegein, Netherlands): redness (a*), yellowness (b*), and lightness (L*) of the egg shell. Egg shell color (ESC) was then calculated from these 3 components. The next step consisted of measuring egg shell strength (ESS, in N) by using a compression machine to evaluate the shell static stiffness. ESS corresponded to the maximum force recorded before fracturing the shell. Finally, each egg was broken and albumen height (AH) was measured using a tripod.
Genomic Evaluation Strategies
Egg weight, ESC, ESS, and AH were evaluated with single-step GBLUP methodology (Legarra et al., 2009) using BLUPF90 programs (Misztal et al., 2002).
The first part aimed to investigate the impact of imputation errors on genomic evaluations (Figure 2A). To do so, a genomic evaluation based on ancestry “Anc_HD” was performed using true HD genotyping of the 447 G0 sires and selection candidates (G1) and phenotypes of the G0. A second genomic evaluation based on ancestry “Anc_Imputed” was performed using the same data for the 447 G0 sires and imputed HD genotyping of the selection candidates (G1) from the simulated low-density SNP chips previously designed. For each low-density SNP chip and for each trait, Spearman correlations, which quantify the reordering of the selection candidates, were calculated between true “Anc_HD” GEBV and “Anc_Imputed” GEBV. Spearman correlations were calculated for the top 150 individuals from G1 for each trait. They were limited to the top 150 males to better describe the consequences of imputation errors on the reordering of these individuals, and thus on the loss of selection response and on genetic progress. The objective was to identify the good candidates and to successfully rank them among themselves. We did not focus on the ranking of the lower-ranked candidates. Spearman correlations were also calculated for the 67 breeders from G1 having at least 10 offspring in G2.
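The reordering measure used for the top 150 candidates can be sketched as below. Two assumptions are made for illustration: the top individuals are chosen on the reference (“Anc_HD”) GEBV, which the text does not state explicitly, and ties are ignored, which is reasonable for continuous GEBV. The function names are hypothetical.

```python
import numpy as np

def ranks(x):
    """Rank values from 0 (smallest) to n - 1 (largest); ties are not averaged."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(len(x))
    return r

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

def top_n_spearman(gebv_ref, gebv_test, n=150):
    """Spearman correlation restricted to the n best animals on the reference GEBV."""
    gebv_ref = np.asarray(gebv_ref, dtype=float)
    gebv_test = np.asarray(gebv_test, dtype=float)
    top = np.argsort(gebv_ref)[::-1][:n]
    return spearman(gebv_ref[top], gebv_test[top])
```

A value close to 1 means the two evaluations rank the best candidates in nearly the same order; lower values indicate more reordering within that group.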
Figure 2
Summary of all different genomic evaluation strategies studied: Impact of imputation errors (A), impact of the absence of imputation (B), impact of imputations on relative accuracy (C), impact of a direct use of low density SNP chips on relative accuracy (D). SNP, single nucleotide polymorphism.
Then, concerning the second objective, imputed HD genotyping of the candidates was replaced by their low-density genotyping without imputation, which simulated the direct use of the different low-density SNP chips without imputation (Figure 2B). This part also implied the use of low-density genotyping without imputation for the reference population. For each low-density SNP chip and for each trait, Spearman correlations were calculated between the same true “Anc_HD” GEBV and the “Anc_Not_Imputed” GEBV obtained with low-density genotyping (without imputation). These correlations were calculated for the same 67 breeders of G1 and the top 150 individuals from G1 for each trait.
The third objective was to study the attainable relative accuracy with imputation (Figure 2C). To calculate this relative accuracy, it is necessary to have a set of male selection candidates with information on their offspring. On the one hand, males do not have their own phenotypes and only a few of them have daughter records. Thus, information from them is limited. On the other hand, G2 had 662 genotyped females with their own performances, some of them with progeny records. They would provide a more reliable validation set, with GEBV using all available information fairly close to the true breeding values. However, females were still selected based on pedigree and performances and not with genomic selection. Thus, this study focused on male selection candidates. To get closer to the true breeding values for the males, a genomic evaluation “Full_HD” of the G1 candidates was performed with all available information (phenotypes and genotypes) from G0 to G3. 
These “Full_HD” GEBV were closer to the true breeding values of the G1 candidates, which cannot be calculated. They represented the maximum relative accuracy attainable with this genomic evaluation using all information and were calculated only for the 67 G1 breeders that had at least 10 offspring in G2. Then, these “Full_HD” GEBV were compared by Pearson correlations with the previous GEBV based on ancestry “Anc_Imputed” with imputed HD genotyping of the breeders, for each simulated low-density SNP chip.
Finally, imputed HD genotyping of the candidates was replaced again by their low-density genotyping without imputation. The “Full_HD” GEBV of the 67 G1 breeders were compared by Pearson correlations with their GEBV obtained with low-density genotyping without imputation (“Anc_Not_Imputed” GEBV). The fourth objective was thus to investigate the impact of a direct use of low-density SNP chips without imputation on the relative accuracy of genomic evaluation (Figure 2D).
The 4 traits were jointly estimated according to a classical multitrait animal model: $y = \mu + Xb + Zg + e$, where $y$ is a vector of the 4 traits of each individual, $\mu$ is the vector of means of each trait, $b$ is a vector of fixed effects including batches, battery, and position in the battery, $g$ is a vector of genomic breeding values, and $e$ is a vector of random residual effects. $X$ and $Z$ are design matrices relating phenotypes to fixed effects and to genomic breeding values, respectively. It is assumed that $g \sim N(0, H \otimes W)$, where $H$ is the genetic relationship matrix combining SNP information and pedigree data (Legarra et al., 2009) and $W$ is the matrix of variance and covariance of the genomic breeding values of the 4 traits. Finally, $e \sim N(0, I \otimes R)$, where $I$ is the identity matrix and $R$ is the matrix of residual variance and covariance of the 4 traits.
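To make the relationship matrix that combines SNP information and pedigree data more concrete, the block below recalls the form in which its inverse is usually written in single-step evaluations. The notation is assumed here for illustration: H denotes the combined matrix, A the pedigree relationship matrix for all animals, A_22 its block for the genotyped animals, and G the genomic relationship matrix. Any scaling or blending of G applied in a given software run is not specified in this study and is left out.

```latex
\[
H^{-1} = A^{-1} +
\begin{bmatrix}
0 & 0 \\
0 & G^{-1} - A_{22}^{-1}
\end{bmatrix}
\]
```

Only the block for genotyped animals differs from the pedigree-based inverse, which is how genotype changes in the candidates (true HD, imputed HD, or low density) propagate to their GEBV through G.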
Software
FImpute V2.2 (Sargolzaei et al., 2014) was used to impute the selection candidates with low-density genotyping to HD genotyping from the individuals of G0 with HD genotyping.
The scenario with all available information (Full_HD) was used to estimate the genetic parameters of the model. Remlf90 (Misztal et al., 2002) was used to estimate the genetic and residual variance components. Once these components were fixed, all the genomic evaluations based on ancestry were performed with Blupf90. The variance components were compared with components estimated with a pedigree-based model using all phenotypes. They were highly correlated (Picard Druet et al., 2020).
Results and discussion
All the results concerning imputation accuracy were presented in the study by Herry et al. (2018), but the evolution of the mean correlations between true and imputed genotypes for the 2 methodologies is recalled in Figure 3. For both methodologies, the mean correlation increased with the number of SNP on the low-density SNP chips. Better imputation accuracies were obtained with the LD methodology at an equivalent SNP density. The differences observed in mean correlation between the 2 methodologies were all significant. In addition, for the EQ methodology at a very low density of 1K SNP, the mean correlation was 0.7098, indicating markedly degraded imputation accuracy. This corresponded to a genotype imputation error rate of 18.5%.
Figure 3
Mean correlations between true and imputed genotypes according to the number of SNP on low-density SNP chips for EQ and LD methodologies. EQ, equidistant; LD, linkage disequilibrium; SNP, single nucleotide polymorphism.
These results were consistent with those found in the literature (Dassonneville et al., 2012, Carvalheiro et al., 2014) where an increase in the number of SNP in the low-density SNP chip led to better imputations.
Impact of Imputation Errors
The impact of imputation errors was investigated by comparing the results of a genomic evaluation based on ancestry with true HD genotyping or with imputed HD genotyping. Only the results for EW are shown, for ease of reading and because the results for the other traits were similar.
Results for the Top 150 Individuals
For both methodologies (Figure 4A), Spearman correlations between “Anc_HD” GEBV and “Anc_Imputed” GEBV increased with SNP density. Indeed, for the LD0.05 and LD0.8 SNP chips, the mean correlations were 0.8661 and 0.9931, respectively. For the 3Kequi and 20Kequi SNP chips, they were 0.9045 and 0.9885, respectively. These results are in agreement with the imputation accuracies obtained with the different low-density SNP chips. The mean correlation between evaluations increased with imputation accuracy, which is consistent with the literature. Moghaddar et al. (2015) showed, for Merino sheep, that the mean correlations between GEBV based on true genotypes (50K) and GEBV based on imputed genotypes (50K imputed from 12K) increased with imputation accuracies.
Figure 4
Spearman correlations between GEBV based on ancestry obtained with true HD genotyping and GEBV based on ancestry obtained with imputed HD genotyping. Results are shown for egg weight and for the top 150 individuals (A) or the 67 breeders (B) according to the number of SNP on the low density SNP chip for both methodologies. GEBV, genomic estimated breeding value; HD, high density; SNP, single nucleotide polymorphism.
For both methodologies, with more than 5K SNP, the mean correlations were above 0.90, indicating a rather limited reranking of the best individuals for EW. However, for the 1Kequi SNP chip, the mean correlation was 0.7833, indicating a substantial reordering of the best individuals for EW.
Finally, at an equivalent density of 3K SNP, the EQ methodology seemed to give higher results than the LD methodology, with mean GEBV correlations of 0.9045 and 0.8661 for the 3Kequi and LD0.05 SNP chips, respectively. But the difference was not significant because the standard errors were ± 0.04 for both SNP chips. At a density of 20K SNP, both methodologies were equivalent, with mean GEBV correlations of 0.9885 and 0.9931 for the 20Kequi and LD0.8 SNP chips, respectively. However, as seen previously, the LD methodology gave better imputation accuracies. Thus, higher imputation accuracies with the LD methodology did not translate into better mean correlations between GEBV compared with the EQ methodology. This could be due to the methodology itself. Indeed, Harris and Johnson (2010) and Weigel et al. (2010) reported that an EQ methodology was better to get good genomic evaluation results for traits controlled by many small QTL, which is the case for the 4 traits studied. On the contrary, genomic evaluations concerning traits controlled by few large QTL were more sensitive to the EQ methodology, which was consequently not the most appropriate methodology for such traits. 
Moreover, ssGBLUP methodology assumes the same variance for each SNP (Legarra et al., 2009) and consequently would favor the EQ methodology. Finally, another reason could be the nature of the imputation errors. Some imputation errors from LD SNP chips could degrade GEBV estimation more than imputation errors from EQ SNP chips. The EQ methodology would then be more robust than the LD methodology in case of imputation errors.
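The comparison of the 3Kequi and LD0.05 chips can be checked with simple arithmetic on the reported values. The sketch below is only an approximation under stated assumptions: the test actually used for GEBV correlations is not described in this section, and the two estimates are treated as independent and approximately normal even though they come from the same animals. The function name is hypothetical.

```python
import math

def difference_z(r1, se1, r2, se2):
    """Difference between two estimates, its standard error, and the z ratio.

    Assumes independent, approximately normal estimates (an assumption
    made for this illustration only).
    """
    diff = r1 - r2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return diff, se_diff, diff / se_diff

# Reported mean correlations for 3Kequi and LD0.05, each with a standard error of 0.04
diff, se_diff, z = difference_z(0.9045, 0.04, 0.8661, 0.04)
print(f"difference = {diff:.4f}, SE = {se_diff:.4f}, z = {z:.2f}")
```

Under these assumptions the difference of about 0.038 is smaller than its own standard error, so the z ratio stays well below the usual 1.96 threshold, in line with the absence of significance reported above.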
Results for the Breeders
Spearman correlations between “Anc_HD” GEBV and “Anc_Imputed” GEBV were also calculated for the 67 G1 breeders having at least 10 offspring in the next generation G2. For both methodologies (Figure 4B), Spearman correlations increased with SNP density. Indeed, for the LD0.05 and LD0.8 SNP chips, the mean GEBV correlations were 0.9777 and 0.9979, respectively. For the 3Kequi and the 20Kequi SNP chips, the results were 0.9771 and 0.9972, respectively. Thus, the results were higher than the results for the top 150 individuals. This is due to the distribution of the 67 breeders, which were not the best breeders of G1 for EW but the best for a set of selection criteria. This was confirmed by plotting the distribution of HD GEBV estimated on ancestry with true HD genotyping for all G1 candidates (Figure 5). The 67 breeders (in red on the plot) were well distributed among the 580 individuals of G1, which reduced the reordering of the individuals.
Figure 5
Normal distribution of all G1 selection candidates according to their HD GEBV of egg weight (EW) estimated on ancestry with true HD genotyping. Red dots represent the 67 G1 breeders, green dots represent the top 150 individuals for EW, and blue dots represent the other selection candidates. GEBV, genomic estimated breeding value; HD, high density.
The results also showed that as soon as SNP density exceeded 2K SNP, good mean correlations (above 0.95) could be obtained, indicating a very limited reranking of the individuals. With only 5K SNP imputed to the HD SNP chip, mean correlations higher than 0.98 could be reached.
However, with the 1Kequi SNP chip, the mean GEBV correlation was lower than 0.95. This decrease in correlation was also illustrated by Cleveland and Hickey (2013) in pigs. They used only 450 SNP imputed to the Illumina PorcineSNP60 BeadChip, which resulted in a decrease in correlation to 0.866 (for an imputation accuracy of 0.914). Thus, when SNP density is decreased too much, the reduced imputation accuracies can have negative consequences on genomic evaluations.
Finally, our results did not show any difference between EQ and LD methodologies.
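The contrast between the top 150 individuals and the 67 breeders can be illustrated with a small simulation. Everything in it is synthetic and assumed for illustration: the GEBV are random normal draws, the noise level of 0.3 stands in for imputation-induced error, and the 67 animals are sampled at random to mimic breeders spread over the whole distribution. None of these values come from the study's data.

```python
import numpy as np

def ranks(x):
    """Rank values from 0 (smallest) to n - 1 (largest); ties are not averaged."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(len(x))
    return r

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

rng = np.random.default_rng(0)
n_candidates = 580

# Synthetic reference GEBV and a noisy version standing in for a degraded evaluation
reference_gebv = rng.normal(size=n_candidates)
noisy_gebv = reference_gebv + rng.normal(scale=0.3, size=n_candidates)

# Group 1: the 150 best animals on the reference GEBV (narrow range of values)
top150 = np.argsort(reference_gebv)[::-1][:150]
# Group 2: 67 animals spread over the whole distribution
spread67 = rng.choice(n_candidates, size=67, replace=False)

rho_top = spearman(reference_gebv[top150], noisy_gebv[top150])
rho_spread = spearman(reference_gebv[spread67], noisy_gebv[spread67])
print(f"top 150: {rho_top:.3f}, spread 67: {rho_spread:.3f}")
```

With the same amount of noise, the rank correlation is lower within the tightly grouped top animals than within a group spread across the distribution, which is the mechanism invoked above to explain the higher correlations observed for the breeders.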
Impact of the Absence of Imputation
Given the good results of genomic evaluations with imputed genotyping, the impact of the absence of imputation was studied. Only the results for EW are shown, to simplify the reading and because the results for the other traits were similar.
For the top 150 individuals, for both methodologies (Figure 6A), the Spearman correlation between “Anc_HD” GEBV and “Anc_Not_Imputed” GEBV increased with SNP density. Indeed, the mean correlations for the 3Kequi and the 20Kequi SNP chips were 0.8507 and 0.9379, respectively. For the LD0.05 and the LD0.8 SNP chips, they were 0.7816 and 0.8658, respectively. Zhang et al. (2011) had shown in simulation studies that, compared with a genomic evaluation performed with the HD SNP chip, the results of genomic evaluations performed with low-density SNP chips without imputation also decreased. With an effective population size of 100, heritability of 0.5, 241 QTL, and a SNP chip of 10K markers, the relative accuracy of the GBLUP evaluation decreased from 0.88 with 5K markers to 0.69 with only 200 markers.
Figure 6
Spearman correlations between GEBV based on ancestry obtained with true HD genotyping and GEBV based on ancestry obtained with low-density genotyping (without imputation). Results are shown for egg weight and for the top 150 individuals (A) or the 67 breeders (B) according to the number of SNP on the low-density SNP chip for both methodologies. GEBV, genomic estimated breeding value; HD, high density; SNP, single nucleotide polymorphism.
For both methodologies, there was a substantial decrease in mean correlations compared with the results of the genomic evaluations performed with imputed HD genotyping. For the 1Kequi and the 50Kequi SNP chips, both imputed, the results were 0.7833 and 0.9964, respectively. Without imputation, the results were 0.6261 and 0.9503, respectively. Similarly, for the LD0.05 and the LD0.8 SNP chips with imputation, the results were 0.8661 and 0.9931, respectively. Without imputation, the results decreased to 0.7816 and 0.8658, respectively. Furthermore, from 20K SNP, the results for the EQ methodology seemed to level off at a mean correlation of about 0.95, whereas with imputation, the mean correlations were higher than 0.99. Thus, imputation markedly increased the mean correlations, mainly for very low-density SNP chips. In addition, these results indicate that the ranking of the best 150 individuals of G1 for EW obtained without imputation was quite different from the ranking obtained with HD genotyping. The lower results obtained for very low SNP density indicated that a small number of SNP may not be sufficient to accurately rank individuals having very close genomes.
Finally, at equivalent SNP density, a tendency to get higher results with the EQ methodology was observed. Indeed, at 3K SNP, the difference in mean correlation between the 3Kequi and LD0.05 SNP chips was equal to 0.07. The same difference was obtained between the 20Kequi and LD0.8 SNP chips. Such differences were higher than with imputation but were not significant.
However, with the LD methodology without imputation, the correlations always remained lower than 0.90 for the top 150 individuals, whatever the SNP density. The differences between methodologies are consistent with the genetic determinism of the 4 traits, as explained in the previous part (Harris and Johnson, 2010; Weigel et al., 2010). In addition, the EQ methodology covered all chromosomes more evenly than the LD methodology (Herry et al., 2018). With the LD methodology, there were some gaps on chromosomes where no SNP was selected for the low-density SNP chips. With the EQ methodology, the number of gaps was reduced, or at least their size was smaller.
Spearman correlations between “Anc_HD” GEBV and “Anc_Not_Imputed” GEBV were also calculated for the 67 breeders (Figure 6B). For both methodologies, Spearman correlations increased with SNP density. At equivalent SNP density, the results for the 3Kequi and 20Kequi SNP chips were 0.9484 and 0.9802, respectively. For the LD0.05 and LD0.8 SNP chips, the results were 0.9349 and 0.9665, respectively. Compared with the results for the top 150 individuals, the results were better for the 67 breeders, as shown previously in the scenario with imputation. Finally, for an SNP density higher than 3K, the mean correlations were higher than 0.94 for both methodologies, indicating a rather limited reordering of the 67 breeders. In cattle, Weigel et al. (2009) showed that, of the top 500 bulls selected from progeny testing, 306 were truly selected with 32K SNP chosen from the Illumina BovineSNP50 BeadChip. With 2K equally spaced SNP, 292 bulls were chosen. With only 500 equally spaced SNP, 247 bulls were chosen. This illustrates that, compared with the HD SNP chip, the reordering of the individuals remained limited even with few SNP.
Compared with the results obtained with imputation, there was a slight decrease in correlations with “Anc_HD” GEBV.
Indeed, for the 1Kequi and 50Kequi SNP chips, the results were 0.9316 (±0.0451) and 0.9983 (±0.0072) with imputation, respectively, and 0.8718 (±0.0608) and 0.9815 (±0.0238) without imputation, respectively. Similarly, for the LD0.05 and the LD0.8 SNP chips, the results were 0.9777 (±0.0261) and 0.9979 (±0.0080) with imputation, respectively, and 0.9349 (±0.0440) and 0.9665 (±0.0318) without imputation, respectively. Thus, the differences observed for both methodologies were not significant, and the results were still high whichever SNP chip was used. These results were rather different from those obtained by Aliloo et al. (2018). They showed in cattle, for 1,034 individuals, that correlations between HD GEBV (on 777K genotypes) and GEBV based on imputed HD genotyping were significantly higher than without imputation. Indeed, according to their MAF within interval method, which was the closest to our EQ methodology, using 4,013 and 25,410 SNP imputed to 777K SNP resulted in correlations of 0.9398 and 0.9927, respectively. These results decreased dramatically without imputation, with correlations of 0.6485 and 0.8598, respectively. Such a large decrease was not observed in our study.
Finally, the differences observed between the 2 methodologies were also not significant. Consequently, the simpler EQ methodology seems to be sufficient to get good genomic evaluation results for traits controlled by many small QTL, which is the case for the 4 traits studied.
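The comparison of selected bulls reported by Weigel et al. (2009) in this section (how many of the top 500 remain selected with fewer SNP) is a top-k overlap count, which complements a rank correlation. The sketch below shows one way to compute such a count; the function names, the six identifiers, and the GEBV values are purely illustrative assumptions and do not come from the study.

```python
def top_k(scores, k):
    """Return the set of the k identifiers with the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def top_k_overlap(reference, candidate, k):
    """Count individuals present in the top k of both evaluations."""
    return len(top_k(reference, k) & top_k(candidate, k))


# Hypothetical GEBV for six individuals under two genotyping scenarios.
hd_gebv = {"a": 2.1, "b": 1.8, "c": 1.5, "d": 0.9, "e": 0.4, "f": -0.3}
low_density_gebv = {"a": 1.9, "b": 1.2, "c": 1.6, "d": 1.4, "e": 0.2, "f": -0.1}

shared = top_k_overlap(hd_gebv, low_density_gebv, k=3)
print(f"{shared} of the top 3 are selected under both evaluations")
```

Here the HD top 3 is {a, b, c} and the low-density top 3 is {a, c, d}, so 2 individuals are shared. A high overlap means that the selection decision itself changes little, even if the ranking within the selected group is reshuffled.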
Impact of Imputation on Relative Accuracy of Genomic Evaluation
The impact of imputation on the attainable relative accuracy of genomic evaluations was studied by comparing a genomic evaluation “Full_HD” of the 67 G1 breeders, using all available information (phenotypes and genotypes) from generation G0 to G3, with GEBV of the G1 breeders based on ancestry with imputed HD genotyping (“Anc_Imputed” GEBV), for each low-density SNP chip. Only the results for EW are shown, to simplify the reading and because the results for the other traits were similar.
For the EQ methodology, a slight increase in Pearson correlations was observed from very low-density SNP chips to 20K SNP (Figure 7). Indeed, for the 1Kequi and the 20Kequi SNP chips, the mean correlations were 0.4472 and 0.4854, respectively. For the LD methodology, however, the results were rather stable, with mean correlations of 0.4917 and 0.4875 for the LD0.05 and LD0.8 SNP chips, respectively. For both methodologies, the results varied slightly up to 20K SNP. They became steady for the EQ methodology from 20K to higher SNP densities. Finally, for both methodologies, the correlations of “Anc_Imputed” GEBV with “Full_HD” GEBV were not significantly different from the correlation between true HD GEBV based on ancestry and “Full_HD” GEBV. This mean correlation was 0.4848 and corresponded to the theoretical maximum attainable value. The standard error for each low-density SNP chip was ±0.11, indicating no significant difference from the theoretical maximum value. For information purposes, the mean correlations for ESC, ESS, and AH were 0.2618 ± 0.12, 0.4027 ± 0.11, and 0.4802 ± 0.11, respectively. This is consistent with the previous results showing a very slight impact of imputation errors on the GEBV of the 67 breeders estimated on ancestry. For both methodologies, from a density of 5K SNP imputed to the HD SNP chip, the mean correlations between “Anc_HD” GEBV and “Anc_Imputed” GEBV were higher than 0.98. These results are also in agreement with the literature.
Indeed, Harris and Johnson (2010) showed that in cattle, from 5K to 1000K SNP, the increase in correlations between true phenotypes and predicted phenotypes was very limited (0.62–0.65). VanRaden et al. (2012) showed that for 28 traits tested in cattle, the estimated genomic reliability was on average 61.1% with 300K SNP and decreased to only 60.7% when they used 45K SNP. In the study by Wellman et al. (2013), 768 SNP imputed to the Illumina PorcineSNP60 BeadChip (60K SNP) led to a negligible loss in genomic evaluation accuracy. Likewise, Chen et al. (2014) estimated in cattle that the accuracy of genomic prediction with observed 50K or imputed 50K (from 6K) genotypes was 0.61 for milk yield and 0.62 for SCS.
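The standard errors of about ±0.11 quoted above for correlations estimated on 67 breeders can be put in perspective with a short calculation. The text does not state how these standard errors were obtained; the sketch below assumes the classical large-sample approximation SE(r) = sqrt((1 − r²) / (n − 2)), which may differ from the authors' method. Under this assumption, the reported correlations and n = 67 give values that round to the reported standard errors.

```python
import math


def correlation_standard_error(r, n):
    """Approximate standard error of a Pearson correlation r on n pairs.

    Assumed formula (not stated in the text): sqrt((1 - r^2) / (n - 2)).
    """
    return math.sqrt((1.0 - r * r) / (n - 2))


n_breeders = 67

# Mean correlations with "Full_HD" GEBV reported in the text, by trait.
reported = {"EW": 0.4848, "ESC": 0.2618, "ESS": 0.4027, "AH": 0.4802}

for trait, r in reported.items():
    se = correlation_standard_error(r, n_breeders)
    print(f"{trait}: r = {r:.4f}, approximate SE = {se:.2f}")

# Difference between the 1Kequi result and the HD reference for EW,
# compared with one approximate standard error.
difference = 0.4848 - 0.4472
se_ew = correlation_standard_error(0.4848, n_breeders)
print(f"difference = {difference:.4f}, below one SE: {difference < se_ew}")
```

With only 67 breeders, a difference of a few hundredths between two correlations is well within one standard error, which is why differences between the low-density SNP chips and the HD reference are reported as not significant.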
Figure 7
Pearson correlations between “Full_HD” GEBV based on offspring with true HD genotyping and GEBV based on ancestry with imputed HD genotyping. Results are shown for egg weight and for the 67 G1 breeders according to the number of SNP on the low-density SNP chip for both methodologies. GEBV, genomic estimated breeding value; HD, high density; SNP, single nucleotide polymorphism.
However, a decrease in relative accuracy was observed with the 1Kequi SNP chip, with a mean correlation of 0.4472. The largest decrease was observed for AH, where the mean correlation for the 1Kequi SNP chip was 0.4045 (±0.11) and the theoretical maximum value was 0.4802. The significance of this difference could not be established, but the decrease was expected because the results regarding the impact of imputation accuracy showed a mean correlation of 0.9316 for the 1Kequi SNP chip. Other studies showed that decreasing the SNP density too much has consequences on genomic evaluation accuracy. Raoul et al. (2017) illustrated this point in Merino sheep, where using only 500 or 250 SNP imputed to the Illumina OvineSNP50 BeadChip resulted in a decrease in accuracy from 0.53 (with the HD SNP chip) to 0.45 and 0.38, respectively. Wellman et al. (2013) showed that 384 SNP imputed to the Illumina PorcineSNP60 BeadChip led to a loss of 3% in genomic evaluation accuracy. Likewise, Chen et al. (2014) showed that the accuracy of genomic prediction decreased from 0.61 to 0.49 for milk yield and from 0.62 to 0.53 for SCS with 50K genotypes imputed from 384 SNP.
Consequently, we can conclude that the effects of imputation errors on GEBV relative accuracy were very limited, even if slightly more pronounced at very low densities.
Impact of the Direct Use of Low Density SNP Chips Without Imputation on Relative Accuracy of Genomic Evaluation
The impact of the direct use of low-density SNP chips on the relative accuracy of genomic evaluation was studied by comparing the “Full_HD” GEBV of the G1 breeders with GEBV of the G1 breeders based on ancestry with low-density genotyping without imputation (“Anc_Not_Imputed” GEBV), for each low-density SNP chip. For both methodologies, only the results for EW are shown, to simplify the reading and because the results for the other traits were similar.
Both methodologies were rather stable, with slight variations in Pearson correlations up to 20K SNP (Figure 8). The results for the 3Kequi and 20Kequi SNP chips were 0.4471 and 0.4675, respectively. For the LD0.05 and LD0.8 SNP chips, the correlations were 0.4583 and 0.4888, respectively. However, the standard errors associated with these results were ±0.11, and the correlation between the “Full_HD” GEBV and the HD GEBV based on ancestry was 0.4848. This indicates that the differences observed between the low-density SNP chips, and consequently between the 2 methodologies, were not significant. These results are in agreement with the previous results showing a very slight impact of the absence of imputation on the GEBV of the 67 breeders estimated on ancestry. However, the result for the 1Kequi SNP chip was 0.4018 (±0.11). This lower but nonsignificant result was also expected because the correlation between “Anc_HD” GEBV and “Anc_Not_Imputed” GEBV was lower (0.8718 ± 0.0608) than those obtained with higher SNP densities. This was the case for all traits studied.
Figure 8
Pearson correlations between “Full_HD” GEBV based on offspring with true HD genotyping and GEBV based on ancestry with low-density genotyping (without imputation). Results are shown for egg weight and for the 67 G1 breeders according to the number of SNP on the low-density SNP chip for both methodologies. GEBV, genomic estimated breeding value; HD, high density; SNP, single nucleotide polymorphism.
The results found in the literature are mixed. Moghaddar et al. (2015) showed in Merino sheep that the accuracy of genomic prediction based on observed 50K genotypes was 0.446 for postweaning weight (PWW) and 0.219 for postweaning eye muscle depth (PW_EMD). Based on genotypes imputed from 12K to 50K, with imputation accuracy between 0.88 and 0.99, the accuracy of genomic prediction was 0.443 for PWW and 0.219 for PW_EMD. Based on observed 12K genotypes, the accuracy was 0.412 for PWW and 0.205 for PW_EMD. Thus, the results were slightly better with imputation than with a direct use of the 12K genotypes without imputation, but in both cases, there was no dramatic decrease in genomic prediction accuracy despite a large gap in SNP density between the HD and low-density chips. Weigel et al. (2009) had a gap in SNP density closer to ours, but the results were rather different. The correlation between the results from progeny testing and the genomic result with an HD SNP chip of 32K was 0.612. With 300, 1K, and 2K equally spaced SNP, the results were 0.253, 0.422, and 0.539, respectively. Contrary to the results of Moghaddar et al. (2015), there was a significant decrease in their results with the use of low-density SNP chips without imputation. In 2010, the same authors showed that their results were better with imputation.
Finally, for an SNP density higher than 3K, using low-density SNP chips without imputation led to results as good as those obtained with the HD SNP chip itself.
Conclusions
This study showed a very limited reordering of the breeders, selected on a multitrait index, with low-density genotyping (with or without imputation) instead of HD genotyping. Indeed, Spearman correlations between GEBV on HD genotyping and GEBV on low-density genotyping were always higher than 0.94 with more than 3K SNP. For the top 150 individuals, who are genetically closer than the breeders, the reordering was somewhat greater. Thus, the correlations between GEBV with HD genotyping and GEBV with low-density genotyping remained lower than 0.85 with fewer than 3K SNP with the EQ methodology and fewer than 16K SNP (LD0.6) with the LD methodology. The differences in GEBV correlations between the 2 methodologies were never significant but seemed to indicate that the simpler EQ methodology was sufficient to obtain similar results.
Thus, directly using low-density SNP chips designed with the EQ methodology with more than 5K SNP could give good genomic evaluation results and could be a cost-effective solution for genomic selection. However, only 4 traits were studied. These 4 traits were controlled by many small QTL, which explains why the EQ methodology was more appropriate than the LD methodology for genomic evaluation with ssGBLUP, whereas the opposite was observed for imputation accuracy. Further investigations on other traits with different genetic architectures should be conducted.
Finally, as shown by Habier et al. (2009), there could be a decrease in genomic evaluation accuracy over the generations with low-density genotyping. Avoiding such a decrease, which could be detrimental to genomic selection, would require HD genotyping of the birds selected at each generation. In addition, in our study, only the males were genotyped, but having both parents genotyped could lead to higher genomic evaluation accuracies.
Authors: C Wang; D Habier; B L Peiris; A Wolc; A Kranis; K A Watson; S Avendano; D J Garrick; R L Fernando; S J Lamont; J C M Dekkers Journal: Poult Sci Date: 2013-07 Impact factor: 3.352
Authors: K A Weigel; G de los Campos; O González-Recio; H Naya; X L Wu; N Long; G J M Rosa; D Gianola Journal: J Dairy Sci Date: 2009-10 Impact factor: 4.034
Authors: P M VanRaden; D J Null; M Sargolzaei; G R Wiggans; M E Tooker; J B Cole; T S Sonstegard; E E Connor; M Winters; J B C H M van Kaam; A Valentini; B J Van Doormaal; M A Faust; G A Doak Journal: J Dairy Sci Date: 2012-10-11 Impact factor: 4.034
Authors: Andreas Kranis; Almas A Gheyas; Clarissa Boschiero; Frances Turner; Le Yu; Sarah Smith; Richard Talbot; Ali Pirani; Fiona Brew; Pete Kaiser; Paul M Hocking; Mark Fife; Nigel Salmon; Janet Fulton; Tim M Strom; Georg Haberer; Steffen Weigend; Rudolf Preisinger; Mahmood Gholami; Saber Qanbari; Henner Simianer; Kellie A Watson; John A Woolliams; David W Burt Journal: BMC Genomics Date: 2013-01-28 Impact factor: 3.969