Roberto Carvalheiro, Solomon A Boison, Haroldo H R Neves, Mehdi Sargolzaei, Flavio S Schenkel, Yuri T Utsunomiya, Ana Maria Pérez O'Brien, Johann Sölkner, John C McEwan, Curtis P Van Tassell, Tad S Sonstegard, José Fernando Garcia.
Abstract
BACKGROUND: Genotype imputation from low-density (LD) to high-density single nucleotide polymorphism (SNP) chips is an important step before applying genomic selection, since denser chips tend to provide more reliable genomic predictions. Imputation methods rely partially on linkage disequilibrium between markers to infer unobserved genotypes. Bos indicus cattle (e.g. Nelore breed) are characterized, in general, by lower levels of linkage disequilibrium between genetic markers at short distances, compared to taurine breeds. Thus, it is important to evaluate the accuracy of imputation to better define which imputation method and chip are most appropriate for genomic applications in indicine breeds.
Year: 2014 PMID: 25927950 PMCID: PMC4192291 DOI: 10.1186/s12711-014-0069-1
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Pedigree information of genotyped animals
| Category | Number of individuals |
|---|---|
| Individuals in pedigree | 9631 |
| Sires | 1536 |
| Dams | 6125 |
| Individuals with progeny | 7661 |
| Individuals with no progeny | 1970 |
| Individuals with only known sire | 17 |
| Individuals with only known dam | 1464 |
| Individuals with known sire and dam | 5067 |
| Founders | 3083 |
| Founders with no progeny | 350 |
Genomic relationship statistics between reference and validation sets
| Reference/validation¹ (reference n; validation n) | Statistic² | Minimum | Maximum | Mean | Median |
|---|---|---|---|---|---|
| Sire/sire (793; 202) | Maxr | 0.0661 | 0.6241 | 0.4353 | 0.4677 |
| | Mean10 | 0.0513 | 0.3795 | 0.2017 | 0.1970 |
| Sire/dam (793; 1247) | Maxr | 0.0392 | 0.6316 | 0.2813 | 0.2744 |
| | Mean10 | 0.0333 | 0.3877 | 0.1351 | 0.1271 |
¹Sire/sire: validation set composed of the 202 younger sires; sire/dam: validation set composed of 1247 dams; the same reference set of 793 sires was used in both cases. ²Maxr: maximum genomic relationship between each animal in the validation set and all the animals in the reference set; Mean10: average of the top 10 genomic relationships between each animal in the validation set and all the animals in the reference set.
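The two relatedness statistics defined in the footnote can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `relatedness_stats`, the parameter `top_n`, and the toy relationship values are all hypothetical, and the input is assumed to be one row of a validation-by-reference genomic relationship matrix.

```python
def relatedness_stats(g_row, top_n=10):
    """Return (Maxr, Mean10-style average) for one validation animal.

    g_row: genomic relationships between this validation animal and every
    animal in the reference set (hypothetical input layout).
    """
    if not g_row:
        raise ValueError("need at least one reference animal")
    ranked = sorted(g_row, reverse=True)   # largest relationships first
    top = ranked[:top_n]                   # the top_n closest reference animals
    return ranked[0], sum(top) / len(top)


# Toy relationships for a single validation animal (made-up values,
# not taken from the study).
toy_row = [0.02, 0.51, 0.10, 0.26, 0.03, 0.12,
           0.08, 0.25, 0.01, 0.05, 0.24, 0.06]
maxr, mean10 = relatedness_stats(toy_row)
```

Applying the function to every validation animal and then summarizing across animals would give per-set summaries of the kind reported in the table.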
Number (Nb) of SNPs shared with the HD chip, for different SNP chips
| SNP chip | Acronym¹ | Nb of SNPs shared with HD | Nb of SNPs shared with HD after QC² |
|---|---|---|---|
| Illumina® BovineHD | HD | 777962 | 439595 |
| Illumina® BovineLD | 7 K | 6637 | 4086 |
| Illumina® Bovine SNP50 v2 | 50 K | 49345 | 21014 |
| GeneSeek® Genomic Profiler 20 K - Indicine | GGP20Ki | 19493 | 13450 |
| GeneSeek® Genomic Profiler 75 K - Indicine | GGP75Ki | 73941 | 56169 |
| Customized 15K_e | 15K_e | 15144 | 15144 |
| Customized 15K_em | 15K_em | 15173 | 15173 |
| Customized 15K_el | 15K_el | 15173 | 15173 |
| Customized 15K_eml | 15K_eml | 15173 | 15173 |
| Customized 11K_eml add-on 7 K | 11a7 K | 17841 | 15290 |
| Customized 17K_eml add-on 7 K | 17a7 K | 24121 | 21570 |
| Customized 27K_eml add-on 7 K | 27a7 K | 33942 | 31391 |
| Customized 48K_eml add-on 7 K | 48a7 K | 55141 | 52590 |
¹As described in the section “SNP chips” of “Methods”. ²QC: quality control of the genotypes.
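The after-QC counts above appear to determine how many HD SNPs remain to be imputed for each chip. The sketch below assumes that this number is simply the HD count after QC minus the chip's shared count after QC; that assumption is not stated in the table, but it reproduces the SNP counts and percentages listed for the FImpute analyses (e.g. 435509 and 99.1 for the 7 K chip). The names `HD_AFTER_QC`, `shared_after_qc` and `snps_to_impute` are illustrative.

```python
HD_AFTER_QC = 439595  # BovineHD SNPs after QC (from the table above)

# Chip acronym -> SNPs shared with HD after QC (from the table above)
shared_after_qc = {
    "7 K": 4086,
    "50 K": 21014,
    "GGP20Ki": 13450,
    "GGP75Ki": 56169,
}


def snps_to_impute(shared, hd_total=HD_AFTER_QC):
    """Return (number of HD SNPs absent from the chip, percent of HD SNPs).

    Assumes every HD SNP that is not on the lower-density chip is imputed.
    """
    n = hd_total - shared
    return n, round(100.0 * n / hd_total, 1)


targets = {chip: snps_to_impute(n) for chip, n in shared_after_qc.items()}
```

For the 7 K chip this gives 439595 − 4086 = 435509 SNPs, or 99.1% of the HD panel after QC.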
Figure 1. Study design of the imputation analyses using FImpute and BEAGLE. A reference set of 793 sires and a validation set with 202 young sires (sire:sire) or 1247 dams (sire:dam), with (Ped) or without (Ped0) pedigree information and different lower-density chips; numbers in brackets correspond to the number given to the analysis.
Average (standard deviation) imputation accuracy, for different imputation analyses using FImpute
| Analysis¹ | Chip² | Nb of imputed SNPs (% of HD SNPs after QC) | CORR³ | PERC⁴ |
|---|---|---|---|---|
| 1 | 7 K | 435509 (99.1) | 0.9257 (0.0346) | 90.56 (4.09) |
| 2 | 50 K | 418581 (95.2) | 0.9783 (0.0136) | 97.14 (1.76) |
| 3 | GGP20Ki | 426145 (96.9) | 0.9771 (0.0143) | 96.96 (1.87) |
| 4 | GGP75Ki | 383426 (87.2) | 0.9922 (0.0056) | 98.93 (0.76) |
| 5 | 15K_e | 424451 (96.6) | 0.9784 (0.0135) | 97.15 (1.75) |
| 6 | 15K_em | 424422 (96.5) | 0.9820 (0.0120) | 97.58 (1.61) |
| 7 | 15K_el | 424422 (96.5) | 0.9763 (0.0138) | 96.87 (1.77) |
| 8 | 15K_eml | 424422 (96.5) | 0.9840 (0.0107) | 97.85 (1.43) |
| 9 | 11a7 K | 424305 (96.5) | 0.9823 (0.0117) | 97.63 (1.54) |
| 10 | 17a7 K | 418025 (95.1) | 0.9864 (0.0093) | 98.17 (1.24) |
| 11 | 27a7 K | 408204 (92.9) | 0.9897 (0.0072) | 98.60 (0.97) |
| 12 | 48a7 K | 387005 (88.0) | 0.9931 (0.0049) | 99.05 (0.67) |
¹Imputation analyses using FImpute (considering family information) and 202 young sires as the validation set; the numbers of each analysis refer to those in brackets from Figure 1. ²As described in the section “SNP chips” of “Methods”. ³CORR: Pearson’s correlation between imputed and observed genotypes. ⁴PERC: percentage of correctly imputed genotypes.
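The two accuracy measures defined in the footnote can be illustrated as follows. This is a minimal sketch assuming genotypes are coded as allele counts (0, 1, 2) and that the observed and imputed vectors are aligned; the function names and the toy genotypes are hypothetical, and the sketch does not reproduce how the study aggregated results across animals or SNPs.

```python
from math import sqrt


def pearson_corr(observed, imputed):
    """CORR: Pearson's correlation between observed and imputed genotypes."""
    n = len(observed)
    if n != len(imputed) or n < 2:
        raise ValueError("need two aligned vectors with at least 2 genotypes")
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_o * var_i)


def percent_correct(observed, imputed):
    """PERC: percentage of genotypes imputed exactly as observed."""
    if len(observed) != len(imputed) or not observed:
        raise ValueError("need two aligned, non-empty vectors")
    matches = sum(o == i for o, i in zip(observed, imputed))
    return 100.0 * matches / len(observed)


# Toy genotypes for one animal (made-up values): one of ten calls is wrong.
observed = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
imputed = [0, 1, 2, 1, 0, 2, 1, 2, 0, 2]
corr = pearson_corr(observed, imputed)
perc = percent_correct(observed, imputed)
```

Unlike PERC, CORR depends on the variance of the genotypes, so the two measures can rank chips slightly differently when allele frequencies differ.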
Average (standard deviation) imputation accuracy, using FImpute with or without pedigree (Ped) information
| Analyses¹ | Chip² | CORR with pedigree (Ped) | CORR without pedigree (Ped0) |
|---|---|---|---|
| 1 and 13 | 7 K | 0.9257 (0.0346) | 0.9164 (0.0351) |
| 2 and 14 | 50 K | 0.9783 (0.0136) | 0.9781 (0.0132) |
| 8 and 15 | 15K_eml | 0.9840 (0.0107) | 0.9832 (0.0113) |
| 9 and 16 | 11a7 K | 0.9823 (0.0117) | 0.9819 (0.0120) |
¹Imputation analyses using FImpute software and 202 younger sires as the validation set; the numbers of each analysis refer to those in brackets from Figure 1; the first and the second numbers refer to analyses with and without pedigree information, respectively. ²As described in the section “SNP chips” of “Methods”.
Average (standard deviation) imputation accuracy, using dams or young sires as validation set
| Analyses¹ | Chip² | CORR, dams as validation set | CORR, young sires as validation set |
|---|---|---|---|
| 17 and 1 | 7 K | 0.8791 (0.0474) | 0.9257 (0.0346) |
| 18 and 2 | 50 K | 0.9603 (0.0190) | 0.9783 (0.0136) |
| 19 and 3 | GGP20Ki | 0.9566 (0.0211) | 0.9771 (0.0143) |
| 20 and 4 | GGP75Ki | 0.9846 (0.0082) | 0.9922 (0.0056) |
| 21 and 8 | 15K_eml | 0.9680 (0.0164) | 0.9840 (0.0107) |
| 22 and 9 | 11a7 K | 0.9658 (0.0173) | 0.9823 (0.0117) |
| 23 and 12 | 48a7 K | 0.9864 (0.0070) | 0.9931 (0.0049) |
¹Imputation analyses using FImpute (considering family information) and different validation sets; the numbers of each analysis refer to those in brackets from Figure 1; the first and the second numbers refer to analyses using dams or young sires as validation set, respectively. ²As described in the section “SNP chips” of “Methods”.
Summary statistics of imputation accuracy, using BEAGLE and FImpute
| Analysis¹ | Validation set | Chip² | Minimum | Maximum | Mean | SD |
|---|---|---|---|---|---|---|
| 24 (1) | Young sire | 7 K | 0.7525 (0.8003) | 0.9717 (0.9845) | 0.8982 (0.9257) | 0.0392 (0.0346) |
| 25 (3) | Young sire | GGP20Ki | 0.8603 (0.8988) | 0.9951 (0.9963) | 0.9614 (0.9771) | 0.0225 (0.0143) |
| 26 (4) | Young sire | GGP75Ki | 0.9142 (0.9568) | 0.9986 (0.9990) | 0.9842 (0.9922) | 0.0120 (0.0056) |
| 27 (8) | Young sire | 15K_eml | 0.8788 (0.9211) | 0.9976 (0.9981) | 0.9714 (0.9840) | 0.0183 (0.0107) |
| 28 (9) | Young sire | 11a7 K | 0.8773 (0.9163) | 0.9979 (0.9975) | 0.9697 (0.9823) | 0.0190 (0.0117) |
| 29 (12) | Young sire | 48a7 K | 0.9214 (0.9628) | 0.9989 (0.9992) | 0.9860 (0.9931) | 0.0111 (0.0049) |
| 30 (17) | Dam | 7 K | 0.6969 (0.7096) | 0.9576 (0.9656) | 0.8501 (0.8791) | 0.0441 (0.0474) |
| 31 (19) | Dam | GGP20Ki | 0.8124 (0.8357) | 0.9874 (0.9923) | 0.9321 (0.9566) | 0.0288 (0.0211) |
| 32 (20) | Dam | GGP75Ki | 0.8645 (0.9291) | 0.9946 (0.9976) | 0.9692 (0.9846) | 0.0198 (0.0082) |
| 33 (21) | Dam | 15K_eml | 0.8296 (0.8711) | 0.9904 (0.9954) | 0.9456 (0.9680) | 0.0254 (0.0164) |
| 34 (22) | Dam | 11a7 K | 0.8249 (0.8640) | 0.9893 (0.9951) | 0.9430 (0.9658) | 0.0260 (0.0173) |
| 35 (23) | Dam | 48a7 K | 0.8677 (0.9363) | 0.9954 (0.9980) | 0.9715 (0.9864) | 0.0193 (0.0073) |
¹Results of imputation analyses using BEAGLE or FImpute (between brackets) and different validation sets (young sires and dams); the numbers of each analysis refer to those from Figure 1. ²As described in the section “SNP chips” of “Methods”. SD = standard deviation.
Figure 2. Accuracy of imputation (CORR) as a function of genomic relatedness (Mean10), using BEAGLE and FImpute. Results are from the imputation analyses using dams as the validation set and the 7 K (top) or 48a7 K (bottom) chip. Solid lines are second-order polynomial (top) and linear (bottom) regressions.
Figure 3. Variation of SNP-wise imputation accuracy* and linkage disequilibrium along bovine chromosome 1. Top: SNP-wise correlation between imputed and observed genotypes (CORR) is plotted against the genomic coordinates (in Mb) for SNPs located on chromosome 1, which was divided into windows of about 50 consecutive markers; windows with the lowest (A) and highest (B) average imputation accuracies are highlighted. Middle: Heatmap representing the extent of linkage disequilibrium (r²) in window A (51 markers located between 44.71 and 44.91 Mb; averages for accuracy, MAF and r² were 0.390, 0.195 and 0.103, respectively). Bottom: Heatmap representing the extent of r² in window B (48 markers located between 69.40 and 69.49 Mb; averages for accuracy, MAF and r² were 1.000, 0.270 and 0.321, respectively). *To illustrate the amount of variation observed in SNP-wise imputation accuracy on a single chromosome, the results from Analysis 9 (Figure 1) are presented (i.e. using the 11a7 K chip and FImpute with pedigree information to impute genotypes of young sires).
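The windowing described in the caption can be sketched as below: SNP-wise accuracies ordered by map position are cut into consecutive windows, each window is averaged, and the lowest and highest windows are located. This is only an illustration under simplifying assumptions: windows are fixed at 50 markers (the caption says "about 50", and the reported windows hold 51 and 48 markers), the final window may be shorter, and the function name `window_means` and all accuracy values are hypothetical.

```python
def window_means(values, size=50):
    """Average position-ordered values over consecutive windows of `size`.

    The last window may contain fewer than `size` values.
    """
    if size < 1:
        raise ValueError("window size must be at least 1")
    means = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        means.append(sum(chunk) / len(chunk))
    return means


# Toy SNP-wise accuracies along a chromosome (made-up values):
# three full windows followed by a partial window of 20 markers.
toy_corr = [0.99] * 50 + [0.40] * 50 + [1.00] * 50 + [0.95] * 20
means = window_means(toy_corr)

# Indices of the windows with the lowest and highest average accuracy.
lowest = min(range(len(means)), key=means.__getitem__)
highest = max(range(len(means)), key=means.__getitem__)
```

The same helper could be applied to per-SNP MAF or pairwise r² summaries to obtain window averages like those quoted in the caption.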