J Friedrich1, R Antolín1, S M Edwards1, E Sánchez-Molano1, M J Haskell2, J M Hickey1, P Wiener1.
Abstract
The dog is a valuable model species for the genetic analysis of complex traits, and the use of genotype imputation in dogs will be an important tool for future studies. It is of particular interest to analyse the effect of factors like single nucleotide polymorphism (SNP) density of genotyping arrays and relatedness between dogs on imputation accuracy due to the acknowledged genetic and pedigree structure of dog breeds. In this study, we simulated different genotyping strategies based on data from 1179 Labrador Retriever dogs. The study involved 5826 SNPs on chromosome 1 representing the high density (HighD) array; the low-density (LowD) array was simulated by masking different proportions of SNPs on the HighD array. The correlations between true and imputed genotypes for a realistic masking level of 87.5% ranged from 0.92 to 0.97, depending on the scenario used. A correlation of 0.92 was found for a likely scenario (10% of dogs genotyped using HighD, 87.5% of HighD SNPs masked in the LowD array), which indicates that genotype imputation in Labrador Retrievers can be a valuable tool to reduce experimental costs while increasing sample size. Furthermore, we show that genotype imputation can be performed successfully even without pedigree information and with low relatedness between dogs in the reference and validation sets. Based on these results, the impact of genotype imputation was evaluated in a genome-wide association analysis and genomic prediction in Labrador Retrievers.
Keywords: genome-wide association studies; genomic prediction; imputation accuracy; low-density array design; pedigree information; reference set
Year: 2018 PMID: 29974966 PMCID: PMC6055857 DOI: 10.1111/age.12677
Source DB: PubMed Journal: Anim Genet ISSN: 0268-9146 Impact factor: 3.169
Overview of scenarios
| Name | n Ref/n Val | SNPsmasked (%) | Pedigree | Reference set |
|---|---|---|---|---|
| Ref90 | 1062/117 | 87.5 | Yes | Random |
| Ref90NoPed | 1062/117 | 87.5 | No | Random |
| Ref50 | 590/589 | 87.5 | Yes | Random |
| Ref50NoPed | 590/589 | 87.5 | No | Random |
| Ref10 | 117/1062 | 50–98.4 | Yes | Random |
| Ref10NoPed | 117/1062 | 87.5 | No | Random |
| REL | 206/454 | 87.5 | Yes | Systematic |
| REL‐C | 206/454 | 87.5 | Yes | Random |
n Ref, number of dogs in the reference set; n Val, number of dogs in the validation set; SNPsmasked, proportion of SNPs in the high‐density array that were masked to generate the low‐density array; random, dogs randomly grouped into reference and validation sets; systematic, dogs in the validation set with at least one half‐sibling in the reference set.
For Ref10, the masked proportion was increased step‐wise by masking half of the remaining SNPs at each step, generating multiple low‐density arrays with 50%, 75%, 87.5%, 93.8%, 96.9% and 98.4% masked SNPs.
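The sequence of masking levels listed for Ref10 follows from repeated halving of the unmasked SNPs, which can be checked with a short sketch. The function name `masking_levels` is hypothetical, and the remaining-SNP counts are only approximate because the record does not state how fractional counts were rounded.

```python
# Masking levels for the Ref10 low-density arrays: each step masks half of
# the SNPs that were still unmasked after the previous step.
N_HIGHD_SNPS = 5826  # SNPs on chromosome 1 in the high-density array (from the abstract)


def masking_levels(n_steps: int) -> list[float]:
    """Return the masked proportion (in %) after 1..n_steps halvings."""
    return [100.0 * (1.0 - 0.5 ** k) for k in range(1, n_steps + 1)]


levels = masking_levels(6)
for pct in levels:
    # Approximate only: the rounding rule used for SNP counts is an assumption.
    remaining = N_HIGHD_SNPS * (1.0 - pct / 100.0)
    print(f"{pct:.1f}% masked -> about {remaining:.0f} SNPs left")
```

After k halvings the masked proportion is 1 − 0.5^k, so six steps give the six levels in the footnote once rounded to one decimal place.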
Animal‐wise imputation accuracy by scenario
| Scenario | % correct (average) | % correct (SD) | corr (average) | corr (SD) |
|---|---|---|---|---|
| Ref90 | 98.626 | 1.677 | 0.948 | 0.078 |
| Ref90NoPed | 98.553 | 1.674 | 0.946 | 0.078 |
| Ref50 | 98.390 | 1.819 | 0.939 | 0.088 |
| Ref50NoPed | 98.315 | 1.817 | 0.938 | 0.088 |
| Ref10 | 97.432 | 2.359 | 0.916 | 0.095 |
| Ref10NoPed | 97.373 | 2.351 | 0.915 | 0.095 |
| REL | 98.792 | 1.213 | 0.972 | 0.035 |
| REL‐C | 97.668 | 2.265 | 0.926 | 0.086 |
% correct, proportion of correctly imputed genotypes; corr, correlation between true and imputed genotypes.
Statistics were calculated across all 10 replicates for the particular scenarios except for REL, for which there were no replicates.
Dogs were randomly grouped into the reference and the validation sets, and 87.5% of genotypes were masked in the high‐density array to generate the low‐density array; Ref90, 90% of dogs in the reference set; Ref50, 50% of dogs in the reference set; Ref10, 10% of dogs in the reference set; NoPed, imputation for that scenario was run without pedigree information.
Dogs in the reference set (31%) had at least one half‐sibling in the validation set (REL; 69%). The REL‐C controls had the same number of dogs as REL, but dogs were selected at random for the reference and the validation sets. In REL and REL‐C, 87.5% of genotypes were also masked in the high‐density array to generate the low‐density array.
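The two animal‐wise accuracy measures reported in the table can be illustrated with a minimal sketch. It assumes genotypes are coded as allele counts (0, 1, 2) and that the correlation is a Pearson correlation; neither detail is stated in this record, and the function names and toy genotypes below are hypothetical, not the study's data or software.

```python
from math import sqrt


def percent_correct(true_gt: list[int], imputed_gt: list[int]) -> float:
    """Percentage of SNPs whose imputed genotype equals the true genotype."""
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return 100.0 * matches / len(true_gt)


def pearson_corr(x: list[float], y: list[float]) -> float:
    """Pearson correlation; undefined (raises) if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant genotypes")
    return cov / sqrt(var_x * var_y)


# Toy genotypes for one animal (assumed 0/1/2 coding); one of eight SNPs is wrong.
true_genotypes = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_genotypes = [0, 1, 2, 1, 0, 1, 1, 1]

pct = percent_correct(true_genotypes, imputed_genotypes)
corr = pearson_corr(true_genotypes, imputed_genotypes)
print(f"% correct = {pct:.1f}, corr = {corr:.3f}")
```

Computing both measures per animal and then taking the mean and SD across animals would give summaries of the kind shown in the table. A correlation is more sensitive than % correct to errors at SNPs where the animal carries the less common genotype, which is one reason the two columns can rank scenarios slightly differently.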
Figure 1. Animal‐wise imputation accuracy vs. SNP density of the low‐density array. Boxplots (maximum, 75% quartile, median, 25% quartile, minimum) show animal‐wise accuracy measurements: (a) the correctly imputed genotypes (% correct) and (b) correlation between true genotypes and imputed genotypes (corr) vs. different levels of masking of the high‐density array to generate the low‐density array (for which 10% of dogs were randomly grouped into the reference set and the remaining 90% into the validation set, scenario Ref10).
Figure 2. Marker‐wise correlation between true genotypes and imputed genotypes vs. the minor allele frequency of masked SNPs for different proportions of masked SNPs in the low‐density array (for which 10% of dogs were randomly grouped into the reference set and the remaining 90% into the validation set, scenario Ref10).
Figure 3. Effect sizes of SNPs for the trait Norberg Angle right calculated by a GWAS using the true genotypes (GWAS real) and imputed genotypes (GWAS imputed).