Mehar S Khatkar, Gerhard Moser, Ben J Hayes, Herman W Raadsma.
Abstract
BACKGROUND: We investigated strategies and factors affecting the accuracy of imputing genotypes from lower-density SNP panels (Illumina 3K, 7K, Affymetrix 15K and 25K, and evenly spaced subsets) up to one medium-density (Illumina 50K) and one high-density (Illumina 800K) SNP panel. We also evaluated the utility of imputed genotypes on the accuracy of genomic selection using Australian Holstein-Friesian cattle data from 2727 and 845 animals genotyped with the 50K and 800K SNP chips, respectively. Animals were divided into reference and test sets (genotyped with higher- and lower-density SNP panels, respectively) for evaluating the accuracies of imputation. For the accuracy of genomic selection, a comparison of direct genetic values (DGV) was made by dividing the data into training and validation sets under a range of imputation scenarios.
Year: 2012 PMID: 23043356 PMCID: PMC3531262 DOI: 10.1186/1471-2164-13-538
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Description of different SNP chips and SNP subset panels
| Panel | SNP chip | SNPs on panel | SNPs used | Comment |
|---|---|---|---|---|
| 15K | 15K (ParAllele/Affymetrix) | 15,036 | 205 SNPs from BTA20 | |
| 25K | 25K (Affymetrix) | 25,068 | 328 SNPs from BTA20 | |
| 50K | Illumina BovineSNP50 BeadChip | 54,001 | 42,136 | |
| 3K | Illumina BovineSNP50 BeadChip | 3,000 | 3,000 | Evenly spaced Subset of 50K |
| 5K | Illumina BovineSNP50 BeadChip | 5,000 | 5,000 | Evenly spaced Subset of 50K |
| 10K | Illumina BovineSNP50 BeadChip | 10,000 | 10,000 | Evenly spaced Subset of 50K |
| 20K | Illumina BovineSNP50 BeadChip | 20,000 | 20,000 | Evenly spaced Subset of 50K |
| 35K | Illumina BovineSNP50 BeadChip | 35,000 | 35,000 | Evenly spaced Subset of 50K |
| BovineLD 7K | Illumina BovineLD BeadChip | 6,909 | 6,662 | |
| Bovine3K | Illumina Bovine3K BeadChip | 2,900 | 2,500 | |
| 800K | Illumina 800K BovineHD beadChip | 786,799 | 610,879 | |
| 800K-imputed | Illumina 800K BovineHD beadChip | 786,799 | 610,879 | Imputed best guess genotypes |
| 800K-dosage | Illumina 800K BovineHD beadChip | 786,799 | 610,879 | Imputed dosage for B-allele |
| 49K | Illumina BovineSNP50 BeadChip | 54,001 | 49,394 | Common SNP between 800K and 50K chip |
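The last rows of the table distinguish imputed best-guess genotypes from imputed B-allele dosages. The following minimal sketch shows how the two codings are commonly derived from per-genotype posterior probabilities, assuming each imputed genotype comes as three probabilities P(AA), P(AB), P(BB). The function name and the example values are hypothetical, and the source does not state that the authors derived their values exactly this way.

```python
import numpy as np


def best_guess_and_dosage(probs):
    """Derive two genotype codings from posterior probabilities.

    probs: array-like of shape (n_genotypes, 3) holding P(AA), P(AB), P(BB).
    This layout is an assumption made for illustration.
    """
    probs = np.asarray(probs, dtype=float)
    # Best guess: the most probable genotype class, coded as the
    # number of B alleles (0 = AA, 1 = AB, 2 = BB).
    best_guess = probs.argmax(axis=1)
    # Dosage: the expected number of B alleles, a value in [0, 2].
    dosage = probs[:, 1] + 2.0 * probs[:, 2]
    return best_guess, dosage


# Hypothetical posterior probabilities for three imputed genotypes.
example = [
    [0.90, 0.10, 0.00],
    [0.20, 0.50, 0.30],
    [0.05, 0.15, 0.80],
]
best, dose = best_guess_and_dosage(example)
print(best)
print(dose)
```

The best-guess coding discards imputation uncertainty, whereas the dosage keeps it as a fractional allele count.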
Composition of reference and test sets for evaluating imputation accuracy up to 50K
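| Scenario | SNP panel | Reference set (n) | Reference set (%) | Reference animals | Test set (n) | Test set (%) | Test animals | Total (n) |
|---|---|---|---|---|---|---|---|---|
| 1 | 50K | 1363 | 50 | bulls | 1364 | 50 | bulls+cows | 2727 |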
| 2 | 50K | 681 | 25 | bulls | 2046 | 75 | bulls+cows | 2727 |
| 3 | 50K | 272 | 10 | bulls | 2455 | 90 | bulls+cows | 2727 |
| 4 | 50K | 136 | 5 | bulls | 2591 | 95 | bulls+cows | 2727 |
| 5 | 50K | 27 | 1 | key bulls | 2700 | 99 | bulls+cows | 2727 |
| 6 | 50K | 2205 | 81 | all bulls | 522 | 19 | all cows | 2727 |
| 7 | 50K | 522 | 19 | all cows | 2205 | 81 | all bulls | 2727 |
| 8 | 50K | 1753 | 80 | training set bulls | 452 | 20 | test set young bulls | 2205 |
The total number of animals (2,727) consisted of 2,205 bulls and 522 cows.
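As a quick consistency check, the percentages in the table above can be recomputed from the animal counts. This sketch only re-derives numbers already shown; the dictionary and function names are hypothetical.

```python
# (reference n, test n) per scenario, copied from the table above.
SCENARIOS = {
    1: (1363, 1364),
    2: (681, 2046),
    3: (272, 2455),
    4: (136, 2591),
    5: (27, 2700),
    6: (2205, 522),
    7: (522, 2205),
    8: (1753, 452),
}


def split_percentages(n_reference, n_test):
    """Return rounded percentages of animals in the reference and test sets."""
    total = n_reference + n_test
    return round(100 * n_reference / total), round(100 * n_test / total)


for scenario, (n_ref, n_test) in SCENARIOS.items():
    ref_pct, test_pct = split_percentages(n_ref, n_test)
    print(scenario, n_ref + n_test, ref_pct, test_pct)
```

Scenarios 1 to 7 sum to 2,727 animals, while scenario 8 uses only the 2,205 bulls.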
Figure 1. Comparison of the 2-tiered and 3-tiered imputation frameworks. The 2-tiered framework is composed of a top tier (reference panel) and a lower tier (test panel). Three separate test panels (bottom tier) using three SNP densities, viz. Bovine3K, BovineLD 7K and 50K, were analysed. In the 3-tiered framework an additional panel of 2205 samples with 50K genotypes is included as a middle tier.
Figure 2. Mean allelic error rate (%) of three imputation methods using different proportions of animals in the reference and test sets for varying SNP density (3K-35K evenly spaced) in the test set. The results shown are for chromosome 20.
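The evenly spaced test-set panels can be mimicked by keeping every k-th marker of the denser panel and masking the rest. The sketch below assumes spacing by marker rank along the chromosome; the source does not say whether spacing was by rank or by physical position, and the function names and the missing-value code -1 are hypothetical.

```python
import numpy as np


def evenly_spaced_subset(n_total, n_keep):
    """Indices of n_keep markers spread evenly over n_total ordered markers."""
    if not 0 < n_keep <= n_total:
        raise ValueError("n_keep must be between 1 and n_total")
    # Integer arithmetic gives strictly increasing indices when n_keep <= n_total.
    return (np.arange(n_keep) * n_total) // n_keep


def mask_to_subset(genotypes, keep_idx, missing=-1):
    """Copy a genotype matrix (animals x markers) and mask all non-kept markers."""
    masked = np.full_like(genotypes, missing)
    masked[:, keep_idx] = genotypes[:, keep_idx]
    return masked


# Toy data: 4 animals, 1000 ordered markers, keep a 10% subset.
rng = np.random.default_rng(1)
dense = rng.integers(0, 3, size=(4, 1000))
keep = evenly_spaced_subset(dense.shape[1], 100)
sparse = mask_to_subset(dense, keep)
print(keep[:5], int((sparse != -1).sum(axis=1)[0]))
```

The masked matrix plays the role of the low-density test set, and the untouched matrix provides the true genotypes for scoring the imputation.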
Mean allelic error rate of imputing SNP genotypes between different SNP chips obtained with IMPUTE2
| Imputation scenario | Animals in test set (%) | Total animals | Reference animals | Test animals | Total SNPs | SNPs imputed | SNPs imputed (%) | Mean allelic error rate (%) |
|---|---|---|---|---|---|---|---|---|
| 15K by 50K | 25 | 1419 | 1065 | 354 | 1529 | 205 | 13 | 0.80 |
| 15K by 50K | 50 | 1419 | 710 | 709 | 1529 | 205 | 13 | 0.95 |
| 15K by 50K | 75 | 1419 | 355 | 1064 | 1529 | 205 | 13 | 1.40 |
| 50K by 15K | 25 | 1419 | 1065 | 354 | 1529 | 1324 | 87 | 2.85 |
| 50K by 15K | 50 | 1419 | 710 | 709 | 1529 | 1324 | 87 | 3.15 |
| 50K by 15K | 75 | 1419 | 355 | 1064 | 1529 | 1324 | 87 | 4.25 |
| 25K by 50K | 25 | 431 | 324 | 107 | 1652 | 328 | 20 | 1.50 |
| 25K by 50K | 50 | 431 | 216 | 215 | 1652 | 328 | 20 | 1.85 |
| 25K by 50K | 75 | 431 | 108 | 323 | 1652 | 328 | 20 | 2.75 |
| 50K by 25K | 25 | 431 | 324 | 107 | 1652 | 1324 | 80 | 2.75 |
| 50K by 25K | 50 | 431 | 216 | 215 | 1652 | 1324 | 80 | 2.75 |
| 50K by 25K | 75 | 431 | 108 | 323 | 1652 | 1324 | 80 | 4.55 |
The results are shown for three SNP chips viz. 15K, 25K and 50K and chromosome 20.
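This excerpt reports allelic error rates without defining them. The sketch below assumes a commonly used definition: with genotypes coded as B-allele counts (0, 1, 2), the absolute difference between true and imputed genotype counts the wrongly imputed alleles, and the rate is that count divided by the total number of alleles. The function name and toy values are hypothetical.

```python
import numpy as np


def allelic_error_rate(true_geno, imputed_geno):
    """Percentage of alleles imputed incorrectly.

    Both inputs hold genotypes coded 0/1/2 for the imputed markers only,
    with the same shape (animals x markers).
    """
    true_geno = np.asarray(true_geno)
    imputed_geno = np.asarray(imputed_geno)
    if true_geno.shape != imputed_geno.shape:
        raise ValueError("genotype arrays must have the same shape")
    # A heterozygote called as a homozygote is one wrong allele;
    # opposite homozygotes are two wrong alleles.
    wrong_alleles = np.abs(true_geno - imputed_geno).sum()
    total_alleles = 2 * true_geno.size
    return 100.0 * wrong_alleles / total_alleles


# Toy example: 2 animals x 3 markers, 3 wrong alleles out of 12.
true_g = [[0, 1, 2], [2, 2, 1]]
imputed_g = [[0, 1, 1], [2, 0, 1]]
print(allelic_error_rate(true_g, imputed_g))
```

Under this definition the allelic error rate is at most the genotype error rate, because a half-wrong genotype counts as one error out of two alleles.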
Figure 3. Mean allelic error rate (%) of imputing high-density SNPs (800K) using 49K SNPs in the test set, comparing two methods of imputation across all autosomes.
Figure 4. Mean allelic error rate (%) of imputing high-density SNPs (800K) using different numbers of SNPs in the test set by the 2-tiered and 3-tiered approaches. Scenario (a) included 425 reference and 420 test animals; scenario (b) included 41 reference and 420 test animals. In the 3-tiered approach, an additional set of 2205 bulls with 50K data is included as a middle tier in both scenarios (a) and (b). The results shown are for chromosome 20.
Accuracy of prediction of direct genomic value (DGV) for 5 dairy traits based on Bovine3K, BovineLD 7K, 50K, imputed up to 50K, imputed up to 800K and imputed 800K-dosage
| 50K | - | 0.540 | 0.527 | 0.499 | 0.224 | 0.251 |
| Subset Bovine3K | - | 0.444 | 0.464 | 0.429 | 0.187 | 0.200 |
| Subset BovineLD 7K | - | 0.481 | 0.516 | 0.443 | 0.186 | 0.232 |
| 50K-imputed (Test imputed using Bovine3K) [A] | 3.86 | 0.533 | 0.523 | 0.496 | 0.200 | 0.244 |
| 50K-imputed (Test imputed using BovineLD) [A] | 2.30 | 0.546 | 0.531 | 0.507 | 0.214 | 0.246 |
| 50K-imputed (Train & Test imputed using Bovine3K) [B] | 5.52 | 0.505 | 0.515 | 0.481 | 0.207 | 0.245 |
| 50K-imputed (Train & Test imputed using BovineLD) [B] | 3.06 | 0.530 | 0.524 | 0.492 | 0.209 | 0.248 |
| 800K-imputed [C] | - | 0.558 | 0.530 | 0.526 | 0.232 | 0.256 |
| 800K-dosage [C] | - | 0.554 | 0.525 | 0.520 | 0.229 | 0.253 |
[A] Genotypes of 452 young bulls with a subset of the original SNPs were imputed (using IMPUTE2) up to 50K using 1753 bulls as the reference set. Hence, for DGV prediction, the entire test set (452 young bulls) had imputed genotypes and all the training bulls (1753) had actual 50K genotypes.
[B] Genotypes of 2055 bulls with a subset of the original SNPs were imputed (using IMPUTE2) up to 50K using 136 bulls as the reference set. Hence, for DGV prediction, the entire test set (452 young bulls) and 1617 bulls out of the training set of 1753 bulls had imputed genotypes.
[C] Data on 2205 bulls genotyped for 50K were imputed using IMPUTE2 up to 800K using 845 cows genotyped on 800K as the reference.
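To illustrate how a DGV accuracy of the kind tabulated above can be obtained, the sketch below fits marker effects by ridge regression on a training set and reports the correlation between predicted DGV and observed values in a validation set. This is an assumption-laden toy: the excerpt does not state the authors' prediction model or their accuracy measure, the data are simulated, and all function names and the shrinkage value are hypothetical. The genotype matrix may hold 0/1/2 calls or fractional dosages.

```python
import numpy as np


def ridge_snp_effects(genotypes, phenotypes, shrinkage):
    """Estimate marker effects by ridge regression on centred genotypes."""
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    lhs = Xc.T @ Xc + shrinkage * np.eye(X.shape[1])
    effects = np.linalg.solve(lhs, Xc.T @ (y - y_mean))
    return effects, x_mean, y_mean


def predict_dgv(genotypes, effects, x_mean, y_mean):
    """Predicted value = training mean + centred genotypes times marker effects."""
    return y_mean + (np.asarray(genotypes, dtype=float) - x_mean) @ effects


def prediction_accuracy(predicted, observed):
    """Pearson correlation between predicted and observed values."""
    return float(np.corrcoef(predicted, observed)[0, 1])


# Simulated toy data: 200 training and 50 validation animals, 100 SNPs.
rng = np.random.default_rng(0)
n_train, n_valid, n_snp = 200, 50, 100
geno = rng.integers(0, 3, size=(n_train + n_valid, n_snp))
true_effects = rng.normal(0.0, 0.1, size=n_snp)
pheno = geno @ true_effects + rng.normal(0.0, 1.0, size=n_train + n_valid)

effects, x_mean, y_mean = ridge_snp_effects(geno[:n_train], pheno[:n_train], shrinkage=50.0)
dgv_valid = predict_dgv(geno[n_train:], effects, x_mean, y_mean)
accuracy = prediction_accuracy(dgv_valid, pheno[n_train:])
print(round(accuracy, 3))
```

Replacing the validation genotypes with imputed best-guess calls or dosages before calling `predict_dgv` would mirror the imputation scenarios compared in the table.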