Ricardo V Ventura, Stephen P Miller, Ken G Dodds, Benoit Auvray, Michael Lee, Matthew Bixley, Shannon M Clarke, John C McEwan.
Abstract
BACKGROUND: Genotype imputation is a key element in implementing genomic selection in the New Zealand sheep industry, but many factors can influence imputation accuracy. Our objective was to provide practical guidance on implementing imputation strategies in a multi-breed sheep population genotyped with three single nucleotide polymorphism (SNP) panels: 5K, 50K and HD (600K SNPs).
Year: 2016 PMID: 27663120 PMCID: PMC5035503 DOI: 10.1186/s12711-016-0244-7
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Fig. 1 Distribution of animals genotyped with the 50K and HD panels, according to (a) main breed composition and (b) birth year
Imputation scenarios with HD genotypes using different groups of purebred and crossbred animals
| Scenario^a | Number of reference animals^b | Number of imputed animals | Description of reference animals | Density of reference panel^c | Imputed group breed^d | Density of panel of imputed animals |
|---|---|---|---|---|---|---|
| 1_5K50K | 500 | 116 | Romney | 50K | Romney | 5K |
| 1B_5KHD_1STEP | 500 | 116 | Romney | HD | Romney | 5K |
| 1B_5KHD_2STEP | 17,000 + 500 | 116 | Romney | HD | Romney | 5K |
| 2_50KHD | 500 | 116 | Romney | HD | Romney | 50K |
| 3_5K50K | 469 | 116 | Romney, excluding 31 animals related to the imputed group | 50K | Romney | 5K |
| 3B_5KHD | 469 | 116 | Romney, excluding 31 animals related to the imputed group | HD | Romney | 5K |
| 4_50KHD | 469 | 116 | Romney, excluding 31 animals related to the imputed group | HD | Romney | 50K |
| 5_5K50K | 500 (R) + 100 (P) | 116 | Romney + Perendale | 50K | Romney | 5K |
| 5B_5KHD | 500 (R) + 100 (P) | 116 | Romney + Perendale | HD | Romney | 5K |
| 6_50KHD | 500 (R) + 100 (P) | 116 | Romney + Perendale | HD | Romney | 50K |
^a Imputation scenarios were from 5K to 50K (the 50K genotypes were a subset of the HD panel), from 5K to HD and from 50K to HD
^b Two-step imputation: first from 5K to 50K using all genotyped animals as the reference population (N = 17,000), then from the imputed 50K genotypes to HD using 500 animals as the reference population
^c The oldest animals in each scenario were used as the reference population
^d The youngest animals in each scenario were imputed
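Footnotes c and d describe an age-based split: the oldest animals form the reference population and the youngest are imputed. A minimal Python sketch of that split, assuming each animal is described by a hypothetical (ID, birth year) tuple and that ties in birth year are broken by ID (the source does not say how ties were handled):

```python
from typing import List, Tuple

# Hypothetical record layout: (animal ID, birth year)
Animal = Tuple[str, int]


def split_by_age(
    animals: List[Animal], n_reference: int, n_imputed: int
) -> Tuple[List[Animal], List[Animal]]:
    """Return (reference, imputed): the oldest n_reference and youngest n_imputed animals."""
    if n_reference + n_imputed > len(animals):
        raise ValueError("not enough animals for the requested split")
    # Sort oldest first; ties in birth year are broken by ID (an assumption).
    ordered = sorted(animals, key=lambda a: (a[1], a[0]))
    reference = ordered[:n_reference]
    # Slice from the end explicitly so that n_imputed == 0 yields an empty list.
    imputed = ordered[len(ordered) - n_imputed:]
    return reference, imputed


if __name__ == "__main__":
    herd = [("a", 2010), ("b", 2008), ("c", 2012), ("d", 2009)]
    ref, imp = split_by_age(herd, n_reference=2, n_imputed=1)
    print([a[0] for a in ref], [a[0] for a in imp])  # ['b', 'd'] ['c']
```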
Imputation scenarios with 50K genotypes using different groups of purebred and crossbred animals
| Scenario^a | Number of reference animals^b | Number of imputed animals | Description of reference animals | Imputed group breed^c |
|---|---|---|---|---|
| 7_5K50K | 466 | 500 | Romney | Romney |
| 8_5K50K | 933 | 500 | Romney | Romney |
| 9_5K50K | 1860 | 500 | Romney | Romney |
| 10_5K50K | 2860 | 500 | Romney | Romney |
| 11_5K50K | 4862 | 500 | Romney | Romney |
| 12_5K50K | 933 | 200 | Romney | Composite |
| 13_5K50K | 1000 (R) + 893 (C) | 200 | Romney + Coopworth | Composite |
| 14_5K50K | 1000 (R) + 893 (C) + 500 (P) + 500 (T) | 200 | Romney + Coopworth + Perendale + Texel | Composite |
| 15_5K50K | 710 | 500 | Primera | Romney |
| 16_5K50K | 710 (P) + 933 (R) | 500 | Primera + Romney (Scenario 8) | Romney |
| 17_5K50K | 710 (P) + 1860 (R) | 500 | Primera + Romney (Scenario 9) | Romney |
| 18_5K50K | 350 | 200 | Primera | Primera |
| 19_5K50K | 506 | 200 | Primera | Primera |
| 20_5K50K | 350 (P) + 77 (S,PD) | 200 | Primera + Suffolk + Poll Dorset | Primera |
| 21_5K50K | 506 (P) + 77 (S,PD) | 200 | Primera + Suffolk + Poll Dorset | Primera |
| 22_5K50K | 470 | 300 | Coopworth | Coopworth |
| 23_5K50K | 951 | 300 | Coopworth | Coopworth |
| 24_5K50K | 951 (C) + 933 (R) | 300 | Coopworth + Romney | Coopworth |
^a Imputation scenarios were from 5K to 50K (original 50K panel)
^b The oldest animals in each scenario were used as the reference population
^c The youngest animals in each scenario were imputed
Imputation scenarios from 5K to 50K (50K original) using two types of reference population
| Scenario | Number of reference animals | Number of imputed animals | Description of reference animals^d | Imputed group breed^b |
|---|---|---|---|---|
| 25_5K50K | 15,443^a and 4564^b | 218 | All breeds/Romney | Romney 100 % |
| 26_5K50K | 15,443^a and 4326^b | 142 | All breeds/Romney | Romney < 65 % |
| 27_5K50K | 4256 | 1000^c | Romney | Romney |
| 28_5K50K | 15,443^a and 2324^b | 250 | All breeds/Coopworth | Coopworth 100 % |
| 29_5K50K | 15,443^a and 2279^b | 250 | All breeds/Coopworth | Coopworth < 70 % |
| 30_5K50K | 15,443^a and 640^b | 250 | All breeds/Perendale | Perendale > 95 % |
| 31_5K50K | 15,443^a and 138^b | 172 | All breeds/Composites | Composites > 50 % and < 95 % |
^a Fixed reference population that included 15,443 genotyped animals from all breeds
^b Within-breed/group reference population; some groups contained only a small number of genotyped animals
^c 1000 animals were defined as the imputed set to optimize the calculation of per-SNP r2 imputation accuracy
^d Two types of reference population were used: (1) a fixed reference population that included a large number of animals from all breeds and (2) a within-group reference population
Accuracy of genotype imputation and computing time for BEAGLE and FIMPUTE algorithms
| Scenario | CR_F^a | r2_F^b | Run time_F (m:s) | CR_B^c | r2_B^d | Run time_B (h:m:s) | Mean Top10^e | Min Top10^e | Max Top10^e |
|---|---|---|---|---|---|---|---|---|---|
| 1_5K50K | 86.98 | 78.75 | 00:57 | 83.80 | 73.80 | 02:16:25 | 0.115 | 0.034 | 0.234 |
| 1B_5KHD_1STEP | 87.61 | 80.73 | 06:51 | 84.10 | 74.00 | 23:12:35 | 0.115 | 0.034 | 0.234 |
| 1B_5KHD_2STEP | 93.28 | 89.6 | – | NA | NA | NA | 0.115 | 0.034 | 0.234 |
| 2_50KHD | 97.56 | 96.2 | 07:42 | 96.98 | 95.42 | 21:55:35 | 0.115 | 0.034 | 0.234 |
| 3_5K50K | 84.35 | 74.15 | 00:53 | 82.12 | 70.94 | 03:15:10 | 0.090 | 0.033 | 0.179 |
| 3B_5KHD | 85.3 | 76.85 | 06:43 | 82.23 | 71.12 | 27:17:35 | 0.090 | 0.033 | 0.179 |
| 4_50KHD | 97.25 | 95.71 | 07:11 | 96.63 | 94.91 | 12:33:02 | 0.090 | 0.033 | 0.179 |
| 5_5K50K | 87.19 | 78.98 | 01:08 | 83.58 | 73.37 | 03:18:52 | 0.097 | 0.037 | 0.252 |
| 5B_5KHD | 87.68 | 80.81 | 08:45 | 83.99 | 76.00 | 25:16:22 | 0.097 | 0.037 | 0.252 |
| 6_50KHD | 98.06 | 97.01 | 09:14 | Failed | Failed | Failed | 0.097 | 0.037 | 0.252 |
^a CR_F = concordance rate using the FIMPUTE software
^b r2_F = squared Pearson correlation using the FIMPUTE software
^c CR_B = concordance rate using the BEAGLE software
^d r2_B = squared Pearson correlation using the BEAGLE software
^e Mean Top10, Min Top10 and Max Top10 = mean, min and max relationship among the 10 most related animals between the reference and imputed sets
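The two accuracy measures defined in footnotes a to d can be computed per animal from true and imputed genotypes. A minimal Python sketch, assuming genotypes are coded as 0/1/2 allele counts and both measures are reported as percentages, as in the tables; how FIMPUTE or BEAGLE output would be parsed into these vectors is not shown:

```python
from typing import Sequence


def concordance_rate(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Percentage of SNPs whose imputed genotype (0/1/2) equals the true genotype."""
    if len(true) != len(imputed) or not true:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true, imputed))
    return 100.0 * matches / len(true)


def squared_pearson(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Squared Pearson correlation between true and imputed genotypes, times 100."""
    if len(true) != len(imputed) or not true:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    n = len(true)
    mean_t = sum(true) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true, imputed))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return 100.0 * cov * cov / (var_t * var_i)


if __name__ == "__main__":
    true = [0, 1, 2, 2, 1, 0]
    imputed = [0, 1, 2, 1, 1, 0]  # one error at the fourth SNP
    print(round(concordance_rate(true, imputed), 2))  # 83.33
    print(round(squared_pearson(true, imputed), 2))   # 79.41
```

The concordance rate can look high at low-MAF SNPs simply because most genotypes are the common homozygote, whereas r2 is centred on the mean genotype; this is consistent with r2 being lower than CR in every row of the tables.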
Fig. 5 Imputation accuracy using large fixed or within-group reference populations. Imputation from 5K to 50K (Scenarios 25–31) using the FIMPUTE software under different scenarios and two types of reference population: (i) a fixed reference population containing a large number of animals from all breeds and (ii) a within-group reference population. The x-axis represents the imputed individuals sorted from highest to lowest accuracy. (a) Scenario 25, (b) Scenario 26, (c) Scenario 28, (d) Scenario 29, (e) Scenario 30, (f) Scenario 31
Fig. 6 Imputation accuracy in relation to the connectivity between each imputed animal and the reference set. For each animal in the imputed set, the number of Mendelian inconsistencies with every animal in the reference set was calculated, and the average over the 10 reference animals with the fewest Mendelian inconsistencies is presented (AVTOP10_5K and AVTOP10_50K). This measure is compared with the per-animal imputation accuracy from 5K to 50K (Scenario 1_5K50K) and from 5K to HD (Scenario 1B_5KHD_1STEP) using the FIMPUTE software for purebred Romney animals. (a) AVTOP10_5K calculated using the 5K panel before imputation. (b) AVTOP10_50K calculated using the 50K panel. The x-axis represents the average number of Mendelian inconsistencies and the y-axis the imputation accuracy per animal measured by concordance rate (CR)
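The AVTOP10 connectivity measure in Fig. 6 can be sketched as follows. This assumes that a Mendelian inconsistency between two animals is counted as an opposing homozygote (one animal coded 0 and the other 2 at the same SNP), which is impossible for a parent-offspring pair, and that missing genotypes are coded with a hypothetical value of -1 and skipped. The exact counting rule used in the study is not stated in this excerpt.

```python
import heapq
from typing import Sequence

MISSING = -1  # hypothetical code for a missing genotype


def opposing_homozygotes(a: Sequence[int], b: Sequence[int]) -> int:
    """Count SNPs where one animal is homozygous 0 and the other homozygous 2."""
    return sum(
        1
        for x, y in zip(a, b)
        if x != MISSING and y != MISSING and abs(x - y) == 2
    )


def av_top10(
    target: Sequence[int], reference_panel: Sequence[Sequence[int]], k: int = 10
) -> float:
    """Average inconsistency count over the k reference animals closest to target."""
    if not reference_panel:
        raise ValueError("reference panel is empty")
    counts = [opposing_homozygotes(target, ref) for ref in reference_panel]
    lowest = heapq.nsmallest(k, counts)  # fewer than k if the panel is small
    return sum(lowest) / len(lowest)


if __name__ == "__main__":
    target = [0, 2, 1, 0]
    panel = [[0, 2, 1, 0], [2, 0, 1, 0], [2, 2, 1, 2], [MISSING, 0, 1, 2]]
    print(av_top10(target, panel, k=2))  # 1.0
```

A low AVTOP10 value indicates that the imputed animal has close relatives in the reference set.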
Rare allele imputation accuracy (r2) for different ranges of MAF
| MAF | Number of SNPs | r2^a |
|---|---|---|
| 0 < MAF ≤ 0.0001 | 35 | 0 |
| 0.0001 < MAF ≤ 0.001 | 96 | 6.6 |
| 0.001 < MAF ≤ 0.01 | 625 | 38.9 |
| 0.01 < MAF ≤ 0.05 | 2360 | 57.8 |
^a Allelic imputation accuracy (r2) for Scenario 27_5K50K, in which 1000 Romney animals were imputed using a within-breed reference set of 4256 animals
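Per-SNP r2 values can be summarized by MAF bin as in the table above. A minimal Python sketch, assuming genotypes are stored per SNP as 0/1/2 vectors across the imputed animals and that bins are half-open intervals (lo, hi] matching the table. MAF values are passed in rather than computed, because with 1000 imputed animals the smallest non-zero MAF would be 0.0005, so the lowest bin was presumably based on a larger population. SNPs whose r2 is undefined (constant genotype vector) are excluded here, which is an assumption; the study may have handled them differently, so the SNP counts need not match the table.

```python
import math
from bisect import bisect_left
from typing import List, Sequence, Tuple

# Bin edges from the table: (0, 1e-4], (1e-4, 1e-3], (1e-3, 1e-2], (1e-2, 0.05]
BIN_EDGES = [0.0, 0.0001, 0.001, 0.01, 0.05]


def r2_percent(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation times 100; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return 100.0 * cov * cov / (vx * vy)


def mean_r2_by_maf_bin(
    mafs: Sequence[float],
    true_by_snp: Sequence[Sequence[int]],
    imputed_by_snp: Sequence[Sequence[int]],
) -> List[Tuple[float, float, int, float]]:
    """Return (lower edge, upper edge, number of SNPs, mean r2) for each MAF bin."""
    n_bins = len(BIN_EDGES) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for maf, t, i in zip(mafs, true_by_snp, imputed_by_snp):
        if not BIN_EDGES[0] < maf <= BIN_EDGES[-1]:
            continue  # outside the rare-allele range covered by the table
        b = bisect_left(BIN_EDGES, maf) - 1  # index of the (lo, hi] bin holding maf
        r2 = r2_percent(t, i)
        if math.isnan(r2):
            continue  # undefined r2 is excluded (an assumption)
        sums[b] += r2
        counts[b] += 1
    return [
        (
            BIN_EDGES[b],
            BIN_EDGES[b + 1],
            counts[b],
            sums[b] / counts[b] if counts[b] else float("nan"),
        )
        for b in range(n_bins)
    ]
```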
Fig. 2 Imputation accuracy assessed by alternative approaches. (a) Imputation from 5K to 50K (Scenarios 1_5K50K and 3_5K50K) and from 50K to HD (Scenarios 2_50KHD and 4_50KHD) using the FIMPUTE software for purebred Romney animals (the suffix “KEY” refers to the 31 animals that are highly related to the group of imputed animals). (b) Imputation from 5K to 50K (Scenarios 1_5K50K and 5_5K50K) and from 50K to HD (Scenarios 2_50KHD and 6_50KHD) using the FIMPUTE software for purebred Romney animals after including Perendale animals in the reference set. (c) Imputation from 5K to 50K using the FIMPUTE software for purebred Romney animals with the Primera group as the reference set (Scenario 15_5K50K) and after adding Romney animals to the reference set (Scenarios 16_5K50K and 17_5K50K). Scenario 9_5K50K is included for comparison with within-breed imputation. (d) Imputation from 5K to HD using the FIMPUTE software for purebred Romney animals by one- or two-step imputation (Scenarios 1B_5KHD_1STEP and 1B_5KHD_2STEP). The x-axis represents the imputed individuals sorted from highest to lowest accuracy
Fig. 3 Imputation from 5K to 50K using the FIMPUTE software for purebred Romney animals (Scenarios 7_5K50K to 11_5K50K). The x-axis represents the imputed individuals sorted from highest to lowest accuracy
Fig. 4 Imputation accuracy when combining different breeds. (a) Imputation from 5K to 50K (Scenarios 18_5K50K to 21_5K50K) using the FIMPUTE software for purebred Primera animals after enlarging the reference set within group or adding animals from other breeds. (b) Imputation from 5K to 50K (Scenarios 22_5K50K to 24_5K50K) using the FIMPUTE software for Coopworth animals after enlarging the reference set within group or adding Romney animals. (c) Imputation of composite animals from 5K to 50K using the FIMPUTE software with alternative reference populations (Scenarios 12_5K50K to 14_5K50K). The x-axis represents the imputed individuals sorted from highest to lowest accuracy
Fig. 7 Imputation accuracy per chromosome and at both chromosome ends. (a) Squared Pearson correlation measure of imputation accuracy (r2) across chromosomes after imputation from 5K to 50K for Romney sheep using the FIMPUTE software (Scenario 27_5K50K). (b) Squared Pearson correlation measure of imputation accuracy (r2) at both ends of each chromosome; each end is covered by 100 markers, and accuracy is defined as the average r2 over those 100 markers
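Panel b of Fig. 7 averages per-SNP r2 over the 100 outermost markers at each chromosome end. A minimal sketch, assuming per-SNP results are available as hypothetical (chromosome, position, r2) tuples and that the ends are defined by physical position; on a chromosome with fewer than 200 markers the two ends would overlap.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-SNP record: (chromosome, base-pair position, r2)
SnpResult = Tuple[str, int, float]


def chromosome_accuracy(
    snps: Iterable[SnpResult], n_end: int = 100
) -> Dict[str, Tuple[float, float, float]]:
    """Map each chromosome to (mean r2, mean r2 of first n_end SNPs, mean r2 of last n_end SNPs)."""
    if n_end < 1:
        raise ValueError("n_end must be at least 1")
    by_chrom: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for chrom, pos, r2 in snps:
        by_chrom[chrom].append((pos, r2))
    summary = {}
    for chrom, rows in by_chrom.items():
        rows.sort()  # order markers by position along the chromosome
        values = [r2 for _, r2 in rows]
        summary[chrom] = (fmean(values), fmean(values[:n_end]), fmean(values[-n_end:]))
    return summary


if __name__ == "__main__":
    demo = [("1", pos, r2) for pos, r2 in zip(range(1, 6), [10.0, 20.0, 30.0, 40.0, 50.0])]
    print(chromosome_accuracy(demo, n_end=2))  # {'1': (30.0, 15.0, 45.0)}
```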
Accuracy of genotype imputation from 5K to 50K and computing time when using the FIMPUTE software
| Scenario | CR_F^a | r2_F^b | Run time_F (m:s) | Mean Top10^c | Min Top10^c | Max Top10^c |
|---|---|---|---|---|---|---|
| 7_5K50K | 74.82 | 57.79 | 01:15 | 0.058 | 0.011 | 0.178 |
| 8_5K50K | 77.10 | 61.64 | 02:14 | 0.076 | 0.036 | 0.210 |
| 9_5K50K | 84.42 | 74.05 | 03:33 | 0.135 | 0.054 | 0.310 |
| 10_5K50K | 87.55 | 79.29 | 05:47 | 0.152 | 0.052 | 0.394 |
| 11_5K50K | 91.06 | 85.38 | 09:08 | 0.177 | 0.054 | 0.398 |
| 12_5K50K | 60.93 | 35.25 | 02:04 | 0.085 | 0.055 | 0.168 |
| 13_5K50K | 66.69 | 44.25 | 03:32 | 0.095 | 0.056 | 0.338 |
| 14_5K50K | 72.12 | 52.44 | 05:47 | 0.123 | 0.056 | 0.349 |
| 15_5K50K | 51.82 | 17.89 | 01:28 | 0.004 | 0.003 | 0.006 |
| 16_5K50K | 75.18 | 58.25 | 03:07 | 0.117 | 0.052 | 0.259 |
| 17_5K50K | 84.07 | 73.41 | 05:04 | 0.153 | 0.058 | 0.335 |
| 18_5K50K | 92.21 | 86.78 | 00:44 | 0.140 | 0.091 | 0.183 |
| 19_5K50K | 95.10 | 91.90 | 01:01 | 0.042 | 0.001 | 0.270 |
| 20_5K50K | 92.8 | 87.77 | 00:55 | 0.045 | 0.001 | 0.187 |
| 21_5K50K | 95.32 | 92.26 | 01:11 | 0.066 | 0.002 | 0.270 |
| 22_5K50K | 77.53 | 62.36 | 01:07 | 0.070 | 0.022 | 0.211 |
| 23_5K50K | 88.46 | 80.92 | 02:05 | 0.167 | 0.023 | 0.370 |
| 24_5K50K | 87.99 | 80.14 | 03:34 | 0.204 | 0.055 | 0.417 |
^a CR_F = concordance rate when using the FIMPUTE software
^b r2_F = squared Pearson correlation when using the FIMPUTE software
^c Mean Top10, Min Top10 and Max Top10 = mean, min and max relationship among the 10 most related animals between the reference and imputed sets
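The Top10 relationship summaries are not fully specified in these footnotes. One plausible reading, analogous to the AVTOP10 measure described for Fig. 6, is sketched below: for each imputed animal, average its genomic relationships with its 10 most related reference animals, then report the mean, minimum and maximum of those averages across the imputed set. The input layout (one list of relationships per imputed animal) is a hypothetical assumption, and the study's definition may differ.

```python
import heapq
from statistics import fmean
from typing import Sequence, Tuple


def top10_summary(
    relationships: Sequence[Sequence[float]], k: int = 10
) -> Tuple[float, float, float]:
    """Return (mean, min, max) across imputed animals of each animal's mean top-k relationship.

    relationships[i][j] is the genomic relationship between imputed animal i
    and reference animal j (hypothetical input layout).
    """
    if not relationships:
        raise ValueError("no imputed animals")
    # Average of the k largest relationships for each imputed animal.
    per_animal = [fmean(heapq.nlargest(k, row)) for row in relationships]
    return fmean(per_animal), min(per_animal), max(per_animal)
```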
Accuracy of genotype imputation with the FIMPUTE software using two types of reference population
| Scenario^a | CRAll^c | r2All^c | CRW^d | r2W^d | MeanA^e | MinA^e | MaxA^e | MeanW^e | MinW^e | MaxW^e |
|---|---|---|---|---|---|---|---|---|---|---|
| 25_5K50K | 93.39 | 89.38 | 89.16 | 82.17 | 0.023 | 0.079 | 0.467 | 0.145 | 0.049 | 0.376 |
| 26_5K50K | 95.45 | 92.10 | 82.05 | 70.47 | 0.267 | 0.096 | 0.432 | 0.185 | 0.077 | 0.355 |
| 27_5K50K^b | 89.07 | 82.06 | – | – | 0.180 | 0.055 | 0.401 | – | – | – |
| 28_5K50K | 89.94 | 84.01 | 89.80 | 83.27 | 0.250 | 0.100 | 0.427 | 0.200 | 0.050 | 0.384 |
| 29_5K50K | 96.24 | 93.12 | 87.55 | 79.76 | 0.283 | 0.085 | 0.426 | 0.201 | 0.075 | 0.387 |
| 30_5K50K | 87.89 | 81.23 | 88.32 | 80.55 | 0.215 | 0.100 | 0.535 | 0.162 | 0.061 | 0.310 |
| 31_5K50K | 90.16 | 82.17 | 65.05 | 41.57 | 0.243 | 0.109 | 0.413 | 0.03 | 0.001 | 0.260 |
^a Genotype imputation was from 5K to 50K using two types of reference population: (i) a fixed reference population containing a large number of animals from all breeds and (ii) a within-group reference population
^b Scenario defined for the calculation of per-SNP r2, using 1000 animals as the imputed set
^c CRAll and r2All = concordance rate and squared Pearson correlation, respectively, using the FIMPUTE software when a large set of animals from all breeds was defined as the reference population
^d CRW and r2W = concordance rate and squared Pearson correlation, respectively, using the FIMPUTE software when the within-group population was defined as the reference population
^e MeanA, MinA, MaxA, MeanW, MinW and MaxW = mean, min and max relationship among the 10 most related animals between the reference and imputed sets, for the all-breed (A) or within-group (W) reference population
Fig. 8 MDS cluster plot illustrating the genetic relationships (based on SNP-derived genomic distances) between animals of each group or breed, used to describe the genetic structure of the groups/breeds and to better define the imputation scenarios