Steven G Larmer, Mehdi Sargolzaei, Luiz F Brito, Ricardo V Ventura, Flávio S Schenkel.
Abstract
BACKGROUND: Accurate imputation plays a major role in genomic studies in the livestock industry, where the number of genotyped or sequenced animals is limited by cost. This study explored methods to create an ideal reference population for imputation to next-generation sequencing data in cattle.
Keywords: Cattle genomics; Genomic clustering; Genotype imputation; Sequencing data
Year: 2017 PMID: 29281958 PMCID: PMC5746022 DOI: 10.1186/s12863-017-0588-1
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Summary statistics of clusters from three clustering algorithms
| Scenario | Number of clusters | Minimum cluster size | Maximum cluster size |
|---|---|---|---|
| ADMIXTURE | 4 | 190 | 346 |
| PLINK Genotype | 5 | 168 | 280 |
| PLINK Haplotype | 5 | 172 | 278 |
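PLINK's clustering groups animals by pairwise identity-by-state (IBS). The following is a minimal, self-contained sketch of that idea, not the authors' pipeline or PLINK's own implementation. It assumes genotypes coded 0/1/2 and complete-linkage agglomerative clustering on IBS distance; `ibs_similarity` and `complete_linkage` are hypothetical helper names. ADMIXTURE, which fits an ancestry model, is not illustrated here.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Mean identity-by-state (IBS) between two animals with 0/1/2 genotypes.

    At each SNP the animals share 2 - |g1 - g2| alleles; the total is scaled to [0, 1].
    """
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))


def complete_linkage(genotypes, n_clusters):
    """Agglomerative clustering on IBS distance (1 - similarity) with complete linkage."""
    n = len(genotypes)
    dist = {}
    for i, j in combinations(range(n), 2):
        d = 1.0 - ibs_similarity(genotypes[i], genotypes[j])
        dist[(i, j)] = dist[(j, i)] = d

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between clusters is the largest pairwise distance.
            d = max(dist[(i, j)] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge b into a (a < b, so deleting b is safe)
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


genotypes = [
    [0, 0, 1, 2],
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [2, 2, 1, 0],
    [2, 2, 0, 0],
]
print(complete_linkage(genotypes, 2))  # [[0, 1, 2], [3, 4]]
```

In a real analysis each cluster would then serve as the reference population for the animals assigned to it; the pairwise loop here is quadratic and only suited to toy inputs.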
Accuracy of imputation from 50 K to sequence, with either no clustering or one of the clustering algorithms used to determine the reference population for imputation from 50 K to 777 K
| Scenario | Overall R² | Time per animal (h:mm:ss) | Concordance | Concordance, MAF < 0.05 |
|---|---|---|---|---|
| All | 0.819 | 0:03:47 | 0.934 | 0.974 |
| ADMIXTURE | 0.810 | 0:03:48 | 0.931 | 0.973 |
| PLINK Genotype | 0.807 | 0:03:37 | 0.930 | 0.973 |
| PLINK Haplotype | 0.810 | 0:03:44 | 0.931 | 0.973 |
Accuracy of imputation from 50 K to sequence, using either no clustering (ALL) or PLINK haplotype clustering (PLINKH) to determine the reference population for imputation from 777 K to sequence, combined with either reference population (ALL or PLINKH) for imputation from 50 K to 777 K
| Reference, 50 K to 777 K | Reference, 777 K to sequence | Overall R² | Time per animal (h:mm:ss) | Concordance | Concordance, MAF < 0.05 |
|---|---|---|---|---|---|
| ALL | ALL | 0.819 | 0:03:47 | 0.935 | 0.974 |
| PLINKH | ALL | 0.810 | 0:03:44 | 0.931 | 0.973 |
| ALL | PLINKH | 0.817 | 0:02:32 | 0.935 | 0.973 |
| PLINKH | PLINKH | 0.810 | 0:02:28 | 0.932 | 0.973 |
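Time per animal is reported as h:mm:ss, so the saving from clustering in the 777 K to sequence step is easiest to see in seconds. The short sketch below does that arithmetic on values taken from the table above; `to_seconds` and `percent_reduction` are hypothetical helper names.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def percent_reduction(baseline, new):
    """Percentage reduction in time of `new` relative to `baseline`."""
    b, n = to_seconds(baseline), to_seconds(new)
    return 100 * (b - n) / b


print(round(percent_reduction("0:03:47", "0:02:32"), 1))  # ALL/PLINKH vs ALL/ALL: 33.0
print(round(percent_reduction("0:03:47", "0:02:28"), 1))  # PLINKH/PLINKH vs ALL/ALL: 34.8
```

Clustering only the 777 K to sequence step (ALL/PLINKH) cuts time per animal from 227 s to 152 s, about a third, while overall R² moves from 0.819 to 0.817.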
Fig. 1 Imputation accuracy per animal from 777 K to sequence, using either all animals as the reference or only animals within the same PLINK haplotype cluster
Per-breed difference in concordance when PLINK clustering was used to define the reference population for one or two imputation steps (directly from low density, or from low to medium to high density), compared with using all animals as the reference, and the proportion of animals with improved accuracy under each approach
| Breed | PLINK, 1 step: mean change in accuracy | PLINK, 1 step: proportion of animals improved | PLINK, 2 steps: mean change in accuracy | PLINK, 2 steps: proportion of animals improved |
|---|---|---|---|---|
| Alberta Composite | 0.035 | 0.714 | 0.004 | 0.714 |
| Angus | 0.005 | 0.727 | 0.004 | 0.727 |
| Red Angus | 0.000 | 0.000 | 0.00 | 0.400 |
| Ayrshire | −0.001 | 0.800 | −0.002 | 1.000 |
| BeefBooster | 0.005 | 0.000 | 0.022 | 0.000 |
| Brown Swiss | 0.021 | 0.917 | 0.020 | 0.917 |
| Charolais | 0.060 | 0.500 | 0.062 | 0.000 |
| Gelbvieh | 0.001 | 1.000 | 0.044 | 1.000 |
| Guelph Composite | 0.000 | 0.200 | 0.003 | 0.200 |
| Hereford | 0.000 | 0.429 | 0.002 | 0.286 |
| Holstein | −0.007 | 0.500 | −0.005 | 0.362 |
| Red and White Holstein | 0.001 | 0.250 | 0.005 | 0.000 |
| Jersey | −0.014 | 0.833 | −0.016 | 1.000 |
| Limousin | 0.054 | 0.667 | 0.063 | 0.500 |
| Montbeliarde | −0.012 | 0.833 | −0.003 | 1.000 |
| Normande | −0.001 | 1.000 | −0.002 | 1.000 |
| Simmental | −0.028 | 0.500 | −0.026 | 0.568 |
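Each breed row summarizes per-animal differences in two ways: the mean change and the share of animals that improved. A minimal sketch of that summary, assuming one (breed, accuracy with all animals, accuracy with clustered reference) record per animal and counting an animal as improved only when its accuracy strictly increases; `breed_summary` and the example records are hypothetical.

```python
from collections import defaultdict


def breed_summary(records):
    """Summarize per-animal accuracy changes by breed.

    records: iterable of (breed, acc_all_reference, acc_clustered_reference).
    Returns {breed: (mean change in accuracy, proportion of animals improved)}.
    """
    diffs = defaultdict(list)
    for breed, acc_all, acc_cluster in records:
        diffs[breed].append(acc_cluster - acc_all)
    return {
        breed: (sum(d) / len(d), sum(x > 0 for x in d) / len(d))
        for breed, d in diffs.items()
    }


# Hypothetical per-animal accuracies, for illustration only.
records = [
    ("Angus", 0.90, 0.92),
    ("Angus", 0.95, 0.94),
    ("Jersey", 0.80, 0.80),
]
print(breed_summary(records))
```

The two statistics answer different questions: a few large gains can lift the mean even when most animals in a breed do slightly worse, and vice versa.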
Fig. 3 Imputation accuracy measured by genotype concordance by position on chromosome 12, when imputation was carried out using all animals as the reference or using clusters based on PLINK haplotypes
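A positional profile like this is commonly obtained by averaging per-SNP concordance within windows along the chromosome. The binning used for the figure is not stated, so this sketch assumes non-overlapping windows of a hypothetical `window_bp` width and per-SNP concordance already computed across animals.

```python
from collections import defaultdict


def windowed_concordance(positions, snp_concordance, window_bp=1_000_000):
    """Mean per-SNP concordance in non-overlapping windows along one chromosome.

    positions: base-pair position of each SNP.
    snp_concordance: share of animals imputed correctly at each SNP.
    Returns {window start (bp): mean concordance}, ordered by position.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, c in zip(positions, snp_concordance):
        start = (pos // window_bp) * window_bp  # left edge of the window holding this SNP
        sums[start] += c
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


print(windowed_concordance([100, 900_000, 1_500_000], [0.9, 1.0, 0.8]))
```

Plotting the two reference scenarios on the same windows makes regional differences, rather than genome-wide averages, visible.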
Fig. 2 Actual vs. predicted accuracy for all animals, using (a) the full prediction model or (b) a simple reduced prediction model