Hsin-Yuan Tsai, Oswald Matika, Stefan McKinnon Edwards, Roberto Antolín-Sánchez, Alastair Hamilton, Derrick R Guy, Alan E Tinch, Karim Gharbi, Michael J Stear, John B Taggart, James E Bron, John M Hickey, Ross D Houston.
Abstract
Genomic selection uses genome-wide marker information to predict breeding values for traits of economic interest, and is more accurate than pedigree-based methods. The development of high density SNP arrays for Atlantic salmon has enabled genomic selection in selective breeding programs, alongside high-resolution association mapping of the genetic basis of complex traits. However, in sibling testing schemes typical of salmon breeding programs, trait records are available on many thousands of fish with close relationships to the selection candidates. Therefore, routine high density SNP genotyping may be prohibitively expensive. One means of reducing genotyping cost is the use of genotype imputation, where selected key animals (e.g., breeding program parents) are genotyped at high density, and the majority of individuals (e.g., performance tested fish and selection candidates) are genotyped at much lower density, followed by imputation to high density. The main objectives of the current study were to assess the feasibility and accuracy of genotype imputation in the context of a salmon breeding program. The specific aims were: (i) to measure the accuracy of genotype imputation using medium (25 K) and high (78 K) density mapped SNP panels, by masking varying proportions of the genotypes and assessing the correlation between the imputed genotypes and the true genotypes; and (ii) to assess the efficacy of imputed genotype data in genomic prediction of key performance traits (sea lice resistance and body weight). Imputation accuracies of up to 0.90 were observed using the simple two-generation pedigree dataset, and moderately high accuracy (0.83) was possible even with very low density SNP data (∼250 SNPs). The performance of genomic prediction using imputed genotype data was comparable to using true genotype data, and both were superior to pedigree-based prediction.
These results demonstrate that the genotype imputation approach used in this study can provide a cost-effective method for generating robust genome-wide SNP data for genomic prediction in Atlantic salmon. Genotype imputation approaches are likely to form a critical component of cost-efficient genomic selection programs to improve economically important traits in aquaculture.
Keywords: GenPred; Genomic Selection; Shared Data Resources; aquaculture; disease resistance; imputation
Year: 2017 PMID: 28250015 PMCID: PMC5386885 DOI: 10.1534/g3.117.040717
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Table 1. The SNP genotype densities used for the imputation analyses
| Original SNP Panel Used to Genotype All Animals | Genotypes Masked to Mimic LD SNP Panels in Offspring (%) | Number of SNPs in LD SNP Panels in Offspring |
|---|---|---|
| High density (78 K) | 90 | 7836 |
| High density (78 K) | 99 | 784 |
| Medium density (25 K) | 90 | 2563 |
| Medium density (25 K) | 99 | 256 |
The original SNP panels were either high density (HD) or medium density (MD). Genotypes from these panels were masked in all, or a proportion, of the offspring to mimic genotyping with various low density (LD) panels.
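The masking step can be illustrated with a minimal Python sketch. It assumes genotypes are coded as 0/1/2 allele dosages in an animals × SNPs matrix, that the retained LD SNPs are drawn at random, and that masked entries are marked with -1. None of these choices is stated in the source, and the function name `mask_genotypes` and the toy data are hypothetical.

```python
import numpy as np

def mask_genotypes(genotypes, mask_fraction, seed=0):
    """Mimic an LD panel: keep a random subset of SNP columns, mark the rest missing (-1).

    Assumption: SNPs are retained at random; the study may have chosen them differently.
    """
    rng = np.random.default_rng(seed)
    n_snps = genotypes.shape[1]
    n_keep = round(n_snps * (1 - mask_fraction))          # SNPs left on the LD panel
    keep = np.sort(rng.choice(n_snps, size=n_keep, replace=False))
    masked = np.full_like(genotypes, -1)                  # start with everything missing
    masked[:, keep] = genotypes[:, keep]                  # restore the retained SNPs
    return masked, keep

# Toy data (not from the study): 10 offspring, 1000 SNPs, 90% of genotypes masked.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(10, 1000))
masked, keep = mask_genotypes(genotypes, mask_fraction=0.90)
```

With 90% masking, one tenth of the SNPs remain, which mirrors the relationship between the panel sizes in the table (for example, 2563 SNPs retained from the 25 K panel).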
Figure 1. The effect of minor allele frequency on imputation accuracy. The plot shows the imputation accuracy for the MD SNP panel with the two different LD SNP panel densities (90% SNPs masked = 2563 SNPs; 99% SNPs masked = 256 SNPs), plotted against the minor allele frequency of the SNPs using a local regression fit.
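For readers unfamiliar with the x-axis quantity, minor allele frequency can be computed from genotype dosages as in the short sketch below. It assumes 0/1/2 coding with no missing values; the function name and toy matrix are illustrative and not taken from the study.

```python
import numpy as np

def minor_allele_frequency(genotypes):
    """Per-SNP minor allele frequency from 0/1/2 dosages (rows = animals, columns = SNPs)."""
    p = genotypes.mean(axis=0) / 2.0      # frequency of the counted allele
    return np.minimum(p, 1.0 - p)         # the rarer of the two alleles

# Toy data: 4 animals, 3 SNPs.
genotypes = np.array([[0, 1, 2],
                      [0, 1, 2],
                      [1, 0, 2],
                      [0, 2, 2]])
maf = minor_allele_frequency(genotypes)
```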
Table 2. Summary of genotype imputation accuracy
| SNP Panel | Offspring Genotyping Strategy | Imputation Accuracy, 90% of Genotypes Masked | Imputation Accuracy, 99% of Genotypes Masked |
|---|---|---|---|
| High density (78 K) | 100% LD | 0.85 | 0.76 |
| High density (78 K) | 75% LD and 25% HD | 0.90 | 0.85 |
| Medium density (25 K) | 100% LD | 0.76 | 0.62 |
| Medium density (25 K) | 75% LD and 25% MD | 0.85 | 0.75 |
The correlation between true genotypes and imputed genotypes is presented based on genotype data from the HD SNP platform (78 K) and the MD SNP platform (25 K), with either 90 or 99% of genotypes masked in the offspring to mimic LD SNP platforms (Table 1). The proportion of offspring genotyped for the LD SNP platforms was either 100 or 75%.
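The accuracy measure described here, the correlation between true and imputed genotypes, can be sketched as follows. This is a minimal illustration assuming 0/1/2 dosage coding and a Pearson correlation computed per animal or per SNP; the source does not state how the correlations were aggregated into the table values, and the function name `imputation_accuracy` is hypothetical.

```python
import numpy as np

def imputation_accuracy(true_geno, imputed_geno, axis=1):
    """Pearson correlation between true and imputed dosages.

    axis=1 gives one value per animal (row); axis=0 gives one value per SNP (column).
    Returns NaN where either vector has no variation.
    """
    t = true_geno.astype(float)
    m = imputed_geno.astype(float)
    t = t - t.mean(axis=axis, keepdims=True)      # centre true genotypes
    m = m - m.mean(axis=axis, keepdims=True)      # centre imputed genotypes
    num = (t * m).sum(axis=axis)
    den = np.sqrt((t ** 2).sum(axis=axis) * (m ** 2).sum(axis=axis))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)

# Toy data (not from the study): 3 animals, 4 masked SNPs.
true_geno = np.array([[0, 1, 2, 1],
                      [2, 2, 0, 0],
                      [1, 0, 1, 2]])
imputed_geno = np.array([[0, 1, 2, 1],    # imputed perfectly
                         [2, 1, 0, 0],    # one genotype wrong
                         [1, 1, 1, 1]])   # no variation, correlation undefined
per_animal = imputation_accuracy(true_geno, imputed_geno, axis=1)
```

Computing the correlation per animal gives the kind of distribution summarised in a histogram of individual accuracies, while `axis=0` gives per-SNP values that could be plotted against minor allele frequency.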
Figure 2. Variation of imputation accuracies across individual animals for the MD SNP panel. The histograms show bins of imputation accuracy (x-axis), and the number of animals in those bins (y-axis) for the two different LD SNP panel densities (90% SNPs masked = 2563 SNPs; 99% SNPs masked = 256 SNPs).
Figure 3. Breeding value prediction accuracies for (A) sea lice resistance and (B) body weight, calculated using (i) the pedigree (PBLUP), compared to genomic prediction using (ii) the 256 SNP LD panel only, (iii) the 256 SNP LD panel imputed to 25 K SNPs (with all parents and 25% of offspring genotyped on the MD SNP panel), and (iv) the true genotypes for the 25 K MD SNP panel. For comparison, the accuracy of breeding value prediction under scenario (iv) is shown by the blue dashed line, and the corresponding accuracy under scenario (i) by the red dashed line.