Christina M Dauben, Christine Große-Brinkhaus, Esther M Heuß, Hubert Henne, Ernst Tholen.
Abstract
Next-generation sequencing is a promising approach for the detection of causal variants within previously identified quantitative trait loci. Because of the costs of re-sequencing experiments, this application is currently mainly restricted to subsets of animals from already genotyped populations. Imputation from a lower to a higher marker density could represent a useful complementary approach. An analysis of the literature shows that several strategies are available to select animals for re-sequencing. This study demonstrates an animal selection workflow under practical conditions. Our approach considers different data sources and limited resources such as budget and availability of sampling material. The workflow combines previously described approaches and makes use of genotype and pedigree information from a Landrace and a Large White population. Genotypes were phased and haplotypes were estimated with AlphaPhase. Then, AlphaSeqOpt was used to optimize the selection of animals for re-sequencing so that it reflects the existing diversity of haplotypes. AlphaSeqOpt and ENDOG were used to select individuals based on pedigree information, taking into account key animals that represent the genetic diversity of the populations. After the best selection criteria were determined, a subset of 57 animals was selected for subsequent re-sequencing. To evaluate the advantage of this procedure, imputation accuracy was assessed by setting a subset of single nucleotide polymorphism (SNP) chip genotypes to missing. Accuracy values were compared to those of alternative selection scenarios, and the results showed the clear benefits of a targeted selection within this practice-driven approach. Imputation of low-frequency markers in particular benefits from the combined approach described here. Accuracy was increased by up to 12% compared to a randomized or exclusively haplotype-based selection of sequencing candidates.
Year: 2022 PMID: 35183111 PMCID: PMC8858453 DOI: 10.1186/s12711-022-00706-w
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Fig. 1 Workflow for the evaluation of imputation accuracies. Number of animals and SNP genotypes after quality control included in imputation steps; HD: high marker density, LD: low marker density, LR: Landrace, LW: Large White, LD1: imputation scenario 1 (imputation of 10,000 SNPs), LD2: imputation scenario 2 (imputation of 50% of the SNP set)
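The two masking scenarios named in this caption can be illustrated with a small sketch. It assumes genotypes coded 0/1/2 in a NumPy array and assumes that the masked SNPs are drawn at random; the rule actually used to choose them is not stated here. The function `mask_snps` and the toy dimensions are hypothetical.

```python
import numpy as np

def mask_snps(genotypes, n_mask, rng):
    """Return a float copy with n_mask randomly chosen SNP columns set to NaN."""
    n_snps = genotypes.shape[1]
    masked_cols = rng.choice(n_snps, size=n_mask, replace=False)
    masked = genotypes.astype(float)  # astype returns a new array
    masked[:, masked_cols] = np.nan
    return masked, np.sort(masked_cols)

rng = np.random.default_rng(0)
# Toy data (hypothetical): 20 animals x 200 SNPs, coded as 0/1/2 allele counts.
geno = rng.integers(0, 3, size=(20, 200))

# LD1-like scenario: a fixed number of SNPs is set to missing.
ld1, ld1_cols = mask_snps(geno, n_mask=46, rng=rng)
# LD2-like scenario: 50% of the SNP set is set to missing.
ld2, ld2_cols = mask_snps(geno, n_mask=geno.shape[1] // 2, rng=rng)
```

The masked arrays would then be passed to an imputation program, and the original values in the masked columns serve as the truth for the accuracy calculation.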
Fig. 2 Distribution of candidates among generations and breeds. Results from the H approach n = 100, P1 approach n = 100 and P2 approach n = 148 (LR)/117 (LW); Generation 1: most recent generation
Fig. 3 Venn diagram of animals selected by the haplotype-based (H) and pedigree-based (P) approaches. Total number of animals per method: H approach n = 100, P1 approach (AlphaSeqOpt [25]) n = 100, P2 approach (ENDOG v.4.8 [26]) n = 148 (LR)/117 (LW)
Distribution among breeds and generations of animals selected for re-sequencing using the combined (C) approach
| Generation | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Landrace | 8 | 8 | 7 | 4 | 1 | 0 |
| Large White | 8 | 12 | 5 | 2 | 1 | 1 |
Generation 1: most recent generation of animals
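As a quick consistency check, the per-generation counts in this table can be summed per breed and overall. The short sketch below only re-adds the numbers printed above; the variable names are illustrative.

```python
# Animals selected by the combined (C) approach, generations 1 to 6.
selected = {
    "Landrace": [8, 8, 7, 4, 1, 0],
    "Large White": [8, 12, 5, 2, 1, 1],
}

per_breed = {breed: sum(counts) for breed, counts in selected.items()}
total = sum(per_breed.values())
print(per_breed, total)
```

The totals are 28 Landrace and 29 Large White animals, which adds up to the 57 animals reported in the abstract.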
Imputation accuracy (r) of masked chip genotype data from lower (LD1, LD2) to higher marker density in Landrace (LR) and Large White (LW) using different reference panels
| Breed | Reference panel | Imputation scenario | MAF 1–3% r | MAF 1–3% Min. | MAF 1–3% Max. | MAF 3–5% r | MAF 3–5% Min. | MAF 3–5% Max. | MAF >5% r | MAF >5% Min. | MAF >5% Max. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | com | LD1 | 86.53% | 83.97% | 88.95% | 86.30% | 84.51% | 87.73% | 86.30% | 84.51% | 87.73% |
| LR | com | LD2 | 83.63% | 82.26% | 85.23% | 83.60% | 82.00% | 85.77% | 93.13% | 92.88% | 93.29% |
| LR | ran | LD1 | 74.13% | 63.52% | 84.26% | 75.86% | 66.82% | 81.95% | 86.81% | 71.26% | 88.66% |
| LR | ran | LD2 | 75.35% | 70.80% | 80.38% | 77.65% | 73.68% | 81.18% | 87.83% | 87.23% | 88.28% |
| LR | hap | LD1 | 82.81% | 80.38% | 85.43% | 84.64% | 81.24% | 86.83% | 94.10% | 93.88% | 94.52% |
| LR | hap | LD2 | 81.21% | 78.81% | 83.31% | 82.14% | 80.74% | 84.30% | 92.98% | 92.79% | 93.20% |
| LW | com | LD1 | 92.68% | 92.47% | 92.85% | 88.68% | 86.37% | 89.97% | 93.94% | 93.70% | 94.16% |
| LW | com | LD2 | 86.52% | 85.22% | 88.23% | 86.20% | 84.93% | 87.47% | 92.71% | 92.59% | 92.90% |
| LW | ran | LD1 | 83.30% | 76.37% | 88.20% | 82.24% | 76.15% | 86.73% | 89.90% | 77.65% | 90.86% |
| LW | ran | LD2 | 79.22% | 73.35% | 83.88% | 78.68% | 74.33% | 83.88% | 88.35% | 87.41% | 88.94% |
| LW | hap | LD1 | 84.59% | 81.45% | 87.05% | 87.36% | 85.79% | 89.46% | 93.97% | 93.77% | 94.17% |
| LW | hap | LD2 | 83.37% | 81.34% | 85.35% | 85.46% | 84.05% | 86.54% | 92.86% | 92.66% | 93.03% |
Imputation accuracy (r): correlation between true and imputed genotypes, com: combined sample, ran: random sample, hap: haplotype sample, LD1: 10,000 SNPs (23.1%) were set to missing, LD2: 50% of the SNPs were set to missing, MAF: Minor allele frequency
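The accuracy measure defined in this footnote can be sketched as follows. The example assumes genotypes coded 0/1/2, computes a Pearson correlation per SNP across animals, and averages it within the three MAF classes of the table. Whether the study computed r per SNP or per animal, and how its Min. and Max. columns were obtained, is not stated here, so those choices are assumptions. The function names and the toy data are hypothetical.

```python
import numpy as np

# MAF classes as in the table; the last class is open-ended (MAF cannot exceed 0.5).
MAF_CLASSES = {"MAF 1-3%": (0.01, 0.03), "MAF 3-5%": (0.03, 0.05), "MAF >5%": (0.05, np.inf)}

def minor_allele_frequency(genotypes):
    """MAF per SNP (column) for genotypes coded as 0/1/2 allele counts."""
    p = genotypes.mean(axis=0) / 2.0
    return np.minimum(p, 1.0 - p)

def accuracy_by_maf(true_geno, imputed_geno):
    """Mean per-SNP Pearson r between true and imputed genotypes, per MAF class."""
    freqs = minor_allele_frequency(true_geno)
    result = {}
    for label, (lo, hi) in MAF_CLASSES.items():
        rs = []
        for j in np.flatnonzero((freqs >= lo) & (freqs < hi)):
            t, i = true_geno[:, j], imputed_geno[:, j]
            if t.std() == 0 or i.std() == 0:
                continue  # correlation is undefined for a constant column
            rs.append(np.corrcoef(t, i)[0, 1])
        result[label] = float(np.mean(rs)) if rs else None
    return result

# Toy data (hypothetical): 100 animals, one SNP in each MAF class.
true_geno = np.zeros((100, 3), dtype=int)
true_geno[:4, 0] = 1    # 4 heterozygotes, MAF 0.02
true_geno[:8, 1] = 1    # 8 heterozygotes, MAF 0.04
true_geno[:40, 2] = 1   # 40 heterozygotes, MAF 0.20
imputed = true_geno.copy()
imputed[0, 0] = 0       # a single imputation error in the rarest SNP

acc = accuracy_by_maf(true_geno, imputed)
```

In this toy case a single wrongly imputed genotype lowers r noticeably for the rare SNP while the other two classes stay at 1, which illustrates why accuracy for low-frequency markers is sensitive to the composition of the reference panel.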
Fig. 4 Imputation accuracy using the combined sample and the random sample as reference panels in Landrace (a) and Large White (b)
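The comparison between the combined and the random reference panel can be made concrete with the tabulated values for the lowest MAF class. The sketch below only subtracts numbers copied from the accuracy table; the dictionary layout is illustrative.

```python
# Imputation accuracy r (%) for MAF 1-3%, copied from the accuracy table.
accuracy = {
    ("LR", "LD1"): {"com": 86.53, "ran": 74.13},
    ("LR", "LD2"): {"com": 83.63, "ran": 75.35},
    ("LW", "LD1"): {"com": 92.68, "ran": 83.30},
    ("LW", "LD2"): {"com": 86.52, "ran": 79.22},
}

# Gain of the combined over the random panel, in percentage points.
gain_vs_random = {key: round(v["com"] - v["ran"], 2) for key, v in accuracy.items()}
best = max(gain_vs_random, key=gain_vs_random.get)
print(best, gain_vs_random[best])
```

The largest difference is 12.40 percentage points (Landrace, scenario LD1), which appears consistent with the gain of up to 12% stated in the abstract.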