Ziqing Weng, Zhe Zhang, Xiangdong Ding, Weixuan Fu, Peipei Ma, Chonglong Wang, Qin Zhang.
Abstract
Missing genotypes are a common feature of high-density SNP datasets obtained using SNP chip technology, and they are likely to decrease the accuracy of genomic selection. This problem can be circumvented by imputing the missing genotypes with estimated genotypes. When implementing imputation, the criteria used for SNP data quality control and whether to perform imputation before or after data quality control need to be considered. In this paper, we compared six strategies of imputation and quality control, using different imputation methods, different quality control criteria and different orders of imputation and quality control, on a real dataset of milk production traits in Chinese Holstein cattle. The results demonstrated that, no matter what imputation method and quality control criteria were used, strategies with imputation before quality control performed better than strategies with imputation after quality control in terms of accuracy of genomic selection. The different imputation methods and quality control criteria did not significantly influence the accuracy of genomic selection. We concluded that performing imputation before quality control could increase the accuracy of genomic selection, especially when the rate of missing genotypes is high and the reference population is small.
Year: 2012 PMID: 22958449 PMCID: PMC3436610 DOI: 10.1186/2049-1891-3-6
Source DB: PubMed Journal: J Anim Sci Biotechnol ISSN: 1674-9782
Table 1. Descriptive statistics and accuracies of the EBVs of three milk production traits
| Traits | Mean (range) | Standard deviation | Mean reliability (range) |
|---|---|---|---|
| Milk yield | 379.36 (-1,667.00 to 2,552.00) | 608.65 | 0.63 (0.50 to 0.71) |
| Fat percentage | -0.07 (-0.90 to 0.91) | 0.27 | 0.52 (0.41 to 0.70) |
| Protein percentage | -0.01 (-0.42 to 0.32) | 0.10 | 0.52 (0.41 to 0.70) |
Table 2. Datasets in the reference population generated using six different imputation and quality control strategies
| Strategy¹ | No. of animals | No. of SNPs |
|---|---|---|
| Imputation before quality control | | |
| S1: Impute A - QC3% | 2,092 | 45,072 |
| S2: Impute B - QC3% | 2,092 | 45,614 |
| S3: Impute A - QC5% | 2,092 | 43,710 |
| S4: Impute A - QC0 | 2,092 | 53,973 |
| Imputation after quality control | | |
| S5: Impute A - QC3% | 2,021 | 43,481 |
| S6: Impute A - QC5% | 2,021 | 41,866 |
¹ Impute A: imputation with findhap v1 [9]; Impute B: missing genotypes were directly replaced with the heterozygote.
Abbreviations: QC3%, SNP MAF > 3%; QC5%, SNP MAF > 5%; QC0, no requirement for MAF.
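The interaction between the order of imputation and the MAF criterion can be sketched on a single toy SNP. The block below is a minimal illustration, not the authors' pipeline: the 0/1/2 genotype coding, the function names and the toy data are all assumptions introduced here, only the MAF criterion from the abbreviations above is modelled, and only Impute B (heterozygote replacement) is implemented; findhap (Impute A) is not reproduced.

```python
from typing import Dict, List, Optional

# Assumed coding (not from the paper): a genotype is the count of one allele
# (0, 1 or 2), and None marks a missing call.
Genotype = Optional[int]

# MAF thresholds as given in the abbreviations; None means "no requirement".
QC_THRESHOLDS: Dict[str, Optional[float]] = {"QC3%": 0.03, "QC5%": 0.05, "QC0": None}

# Hypothetical SNP: 1 heterozygote, 18 homozygotes and 1 missing call.
TOY_SNP: List[Genotype] = [1] + [0] * 18 + [None]


def impute_heterozygote(snp: List[Genotype]) -> List[Genotype]:
    """Impute B: replace every missing call with the heterozygote (1)."""
    return [1 if g is None else g for g in snp]


def minor_allele_frequency(snp: List[Genotype]) -> float:
    """Minor allele frequency computed over the non-missing calls only."""
    called = [g for g in snp if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_maf(snp: List[Genotype], threshold: Optional[float]) -> bool:
    """Keep the SNP if its MAF exceeds the threshold (always keep when None)."""
    if threshold is None:
        return True
    return minor_allele_frequency(snp) > threshold


if __name__ == "__main__":
    for label, threshold in QC_THRESHOLDS.items():
        qc_first = passes_maf(TOY_SNP, threshold)
        impute_first = passes_maf(impute_heterozygote(TOY_SNP), threshold)
        print(f"{label}: kept with QC first={qc_first}, kept with imputation first={impute_first}")
```

In this toy case the SNP has a MAF of 1/38 before imputation and 2/40 after it, so under QC3% it is retained only when imputation is done first. This shows only that the order can change which SNPs survive the filter; it makes no claim about why the SNP counts in the table differ.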
Table 3. Estimated heritabilities (h²) and accuracies of the GEBVs, measured as correlations between the GEBVs and the conventional EBVs in the validation population, using different imputation and quality control strategies¹
| Trait | h² | Accuracy S1 | Accuracy S2 | Accuracy S3 | Accuracy S4 | Accuracy S5 | Accuracy S6 |
|---|---|---|---|---|---|---|---|
| Milk yield | 0.36 | 0.65 | 0.65 | 0.65 | 0.64 | 0.64 | 0.63 |
| Fat percentage | 0.41 | 0.74 | 0.74 | 0.74 | 0.74 | 0.73 | 0.72 |
| Protein percentage | 0.23 | 0.58 | 0.58 | 0.58 | 0.56 | 0.57 | 0.56 |
¹ S1-S6 represent the different imputation and quality control strategies as described in the footnotes to Table 2.
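The accuracy reported in the table is a correlation between GEBVs and conventional EBVs in the validation population. A minimal sketch of that calculation follows, assuming the Pearson correlation is meant (the record does not name the coefficient); the function name and the breeding values are made up for illustration and are not data from the study.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences of values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical milk-yield breeding values for five validation animals.
    gebv = [120.0, -35.5, 410.2, 15.0, 260.8]
    ebv = [150.0, -80.0, 390.0, 60.0, 300.0]
    print(f"accuracy (correlation of GEBV with EBV): {pearson_correlation(gebv, ebv):.2f}")
```

Under this definition, each accuracy column would come from applying the same calculation to the GEBVs predicted with the corresponding strategy's dataset.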