Mario L Piccoli, José Braccini, Fernando F Cardoso, Medhi Sargolzaei, Steven G Larmer, Flávio S Schenkel.
Abstract
BACKGROUND: Strategies were investigated for imputing genotypes from the Illumina-Bovine3K, Illumina-BovineLD (6K), BeefLD-GGP (8K), a non-commercial 15K and IndicusLD-GGP (20K) panels to either the Illumina-BovineSNP50 (50K) or the Illumina-BovineHD (777K) SNP panel, and for imputing from the 50K, GGP-IndicusHD (90iK) and GGP-BeefHD (90tK) panels to the 777K panel. Imputation of low-density (<50K) genotypes to the 777K panel was carried out in either one or two steps. Imputation of ungenotyped parents (n = 37 sires) with four or more offspring to the 50K panel was also assessed. There were 2,946 Braford, 664 Hereford and 88 Nellore animals, of which 71, 59 and 88, respectively, were genotyped with the 777K panel, while all others had 50K genotypes. The reference population comprised 2,735 animals for the 50K panel and 175 bulls for the 777K panel. The low-density panels were simulated by masking genotypes in the 50K or 777K panel for animals born in 2011. Analyses were performed using both Beagle and FImpute software. Genotype imputation accuracy was measured by the concordance rate and the allelic R² between true and imputed genotypes.
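The masking approach and the two accuracy measures named above can be illustrated with a short sketch. It assumes genotypes coded 0/1/2, scores only the SNPs masked for one animal, and takes allelic R² to be the squared Pearson correlation between true and imputed codes; the record does not say whether the measures were computed per animal or per SNP. The function names and toy data are hypothetical, and the imputation step itself (Beagle or FImpute in the study) is replaced by a made-up imputed vector.

```python
def simulate_low_density(genotypes, keep_idx):
    """Mask (set to None) every SNP not on the simulated low-density panel."""
    keep = set(keep_idx)
    return [g if i in keep else None for i, g in enumerate(genotypes)]


def concordance_rate(true, imputed):
    """Proportion of imputed genotypes identical to the true genotypes."""
    return sum(t == m for t, m in zip(true, imputed)) / len(true)


def allelic_r2(true, imputed):
    """Squared Pearson correlation between true and imputed 0/1/2 codes."""
    n = len(true)
    mt, mm = sum(true) / n, sum(imputed) / n
    cov = sum((t - mt) * (m - mm) for t, m in zip(true, imputed))
    vt = sum((t - mt) ** 2 for t in true)
    vm = sum((m - mm) ** 2 for m in imputed)
    if vt == 0 or vm == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov * cov / (vt * vm)


def evaluate(true, imputed, low_density):
    """Score only the SNPs that were masked on the low-density panel."""
    masked = [i for i, g in enumerate(low_density) if g is None]
    t = [true[i] for i in masked]
    m = [imputed[i] for i in masked]
    return concordance_rate(t, m), allelic_r2(t, m)


true = [0, 1, 2, 1, 0, 2, 1, 1]             # hypothetical true genotypes
ld = simulate_low_density(true, [0, 3, 6])  # keep 3 of 8 SNPs
imputed = [0, 1, 2, 1, 0, 1, 1, 2]          # hypothetical imputation output
cr, r2 = evaluate(true, imputed, ld)
print(f"concordance = {cr:.3f}, allelic R2 = {r2:.3f}")
```

Restricting both measures to masked SNPs avoids inflating accuracy with genotypes that were observed rather than imputed.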
Year: 2014 PMID: 25543517 PMCID: PMC4300607 DOI: 10.1186/s12863-014-0157-9
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Summary statistics of genotyped animals and pedigree structure of the 50K and the 777K SNP panels
| | Braford | Hereford | Nellore |
|---|---|---|---|
| 50K SNP panel | | | |
| Total genotyped animals | 2,946 | 664 | 88 |
| Sires | 39 | 29 | 6 |
| Dams | 76 | 21 | 0 |
| Offspring | 2,831 | 614 | 82 |
| Offspring with sire and/or dam genotyped (%) | 22.81 | 32.68 | 12.50 |
| Average number of offspring per sire | 15.28 ± 17.38 | 6.76 ± 6.46 | 1.83 ± 0.90 |
| Smallest and largest number of offspring per sire | 1-76 | 1-26 | 1-3 |
| Average number of offspring per dam | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 |
| Offspring with sire and/or dam unknown (%) | 69.86 | 48.04 | 18.18 |
| 777K SNP panel | | | |
| Total genotyped animals | 71 | 59 | 88 |
| Sires | 8 | 3 | 5 |
| Dams | 0 | 0 | 0 |
| Offspring | 63 | 56 | 83 |
| Offspring with sire and/or dam genotyped (%) | 25.35 | 8.47 | 10.23 |
| Average number of offspring per sire | 2.25 ± 1.09 | 1.67 ± 0.94 | 1.80 ± 0.98 |
| Smallest and largest number of offspring per sire | 1-4 | 1-3 | 1-3 |
| Average number of offspring per dam | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Offspring with sire and/or dam unknown (%) | 53.52 | 38.98 | 18.18 |
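The summary rows in this table are internally consistent in a way that hints at how they were computed. For example, (39 sires × 15.28 + 76 dams × 1.00) / 2,946 ≈ 22.81%, the Braford share of offspring with a genotyped sire and/or dam, and the Hereford 777K row (3 sires, 1-3 offspring, mean 1.67) only fits offspring counts of 1, 1 and 3, whose population standard deviation is 0.94. The sketch below reproduces that style of summary from a pedigree. It assumes offspring are counted only when both they and their sire are genotyped and that the reported SDs are population SDs; the function name and toy herd are hypothetical.

```python
from statistics import mean, pstdev


def pedigree_summary(genotyped, sire, dam):
    """genotyped: set of genotyped animal IDs.
    sire, dam: dicts mapping animal ID -> parent ID (absent or None if unknown).
    Counts genotyped offspring per genotyped sire and the share of genotyped
    animals that have a genotyped sire and/or dam."""
    per_sire = {}
    with_parent = 0
    for animal in genotyped:
        s, d = sire.get(animal), dam.get(animal)
        if s in genotyped:
            per_sire[s] = per_sire.get(s, 0) + 1
        if s in genotyped or d in genotyped:
            with_parent += 1
    counts = list(per_sire.values())
    return {
        "sires": len(counts),
        # population SD (pstdev), which matches the reported values
        "offspring_per_sire": (round(mean(counts), 2), round(pstdev(counts), 2)),
        "range": (min(counts), max(counts)),
        "pct_with_genotyped_parent": round(100 * with_parent / len(genotyped), 2),
    }


# Hypothetical herd: sires S1-S3 with 1, 1 and 3 genotyped offspring.
genotyped = {"S1", "S2", "S3", "A1", "A2", "A3", "A4", "A5"}
sire = {"A1": "S1", "A2": "S2", "A3": "S3", "A4": "S3", "A5": "S3"}
print(pedigree_summary(genotyped, sire, dam={}))
```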
Number of SNPs on each simulated panel before and after quality control for imputation to 50K or 777K SNP panels
| Panel | Abbreviation | Number of SNPs on panel | SNPs after QC¹, imputation to 50K | SNPs after QC¹, imputation to 777K |
|---|---|---|---|---|
| Illumina Bovine3K | 3K | 2,900 | 2,321 | 2,359 |
| Illumina BovineLD | 6K | 6,909 | 6,205 | 6,216 |
| Beef LD GeneSeek Genomic Profiler | 8K | 8,762 | 7,033 | 7,478 |
| 15K panel² | 15K | 14,195 | 12,304 | 12,345 |
| Indicus LD GeneSeek Genomic Profiler | 20K | 19,721 | 7,320 | 16,047 |
| Illumina BovineSNP50 | 50K | 54,609 | 43,247 | 43,247 |
| GeneSeek Genomic Profiler Indicus HD | 90iK | 74,085 | - | 55,819 |
| GeneSeek Genomic Profiler Beef HD | 90tK | 76,992 | - | 61,445 |
| Illumina BovineHD | 777K | 787,799 | - | 587,620 |
¹The SNP quality control included a GenCall score ≥ 0.15, a call rate ≥ 0.90 and a Hardy-Weinberg equilibrium P ≥ 10⁻⁶, as well as removal of SNPs on non-autosomal chromosomes and of SNPs not in common with the reference panel;
²Non-commercial panel. The 15K panel was created from the Beef LD GeneSeek Genomic Profiler (8K) panel by adding SNPs selected for a minor allele frequency greater than 0.23 and linkage disequilibrium below 0.088, preferably located evenly spaced between adjacent SNPs of the 8K panel.
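The quality-control thresholds in footnote 1 can be sketched for a single SNP as follows. This sketch assumes the GenCall threshold is applied per genotype call (calls scoring below 0.15 become missing) and uses a 1-df chi-square test for Hardy-Weinberg equilibrium as a stand-in, since the footnote does not say which HWE test was used; exact tests are often preferred for rare alleles. The chromosome and reference-overlap filters are not shown, and all function names are hypothetical.

```python
import math


def call_rate(calls):
    """Fraction of non-missing genotype calls (None = missing) for one SNP."""
    return sum(c is not None for c in calls) / len(calls)


def hwe_pvalue(calls):
    """Chi-square (1 df) Hardy-Weinberg test on 0/1/2 genotype counts.
    Missing calls are ignored; monomorphic SNPs return 1.0."""
    g = [c for c in calls if c is not None]
    n = len(g)
    obs = [g.count(0), g.count(1), g.count(2)]
    p = (2 * obs[0] + obs[1]) / (2 * n)  # frequency of the allele coded 0
    q = 1 - p
    exp = [n * p * p, 2 * n * p * q, n * q * q]
    if min(exp) == 0:
        return 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def passes_qc(calls, gencall, min_gc=0.15, min_call_rate=0.90, min_hwe_p=1e-6):
    """Apply the footnote thresholds to one SNP across animals."""
    # Assumption: calls with a GenCall score below min_gc are set to missing.
    kept = [c if s >= min_gc else None for c, s in zip(calls, gencall)]
    if call_rate(kept) < min_call_rate:
        return False
    return hwe_pvalue(kept) >= min_hwe_p


in_hwe = [0] * 25 + [1] * 50 + [2] * 25  # 25/50/25: exactly in HWE
no_hets = [0] * 50 + [2] * 50            # no heterozygotes at all
print(passes_qc(in_hwe, [0.9] * 100), passes_qc(no_hets, [0.9] * 100))
```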
Imputation scenarios used in the study
| Source panels | Imputed to | Algorithm | Pedigree | Nellore genotypes in reference | Steps |
|---|---|---|---|---|---|
| 3K, 6K, 8K, 15K, 20K | 50K | FImpute | Yes | Yes | One-step |
| | | | | No | |
| | | | No | Yes | |
| | | | | No | |
| | | Beagle | No | Yes | |
| | | | | No | |
| 3K, 6K, 8K, 15K, 20K | 777K | FImpute | Yes | Yes | One-step |
| | | | No | | |
| | | | | | Two-step |
| | | Beagle | No | | |
| 50K, 90iK, 90tK | 777K | FImpute | Yes | Yes | One-step |
| | | | No | | |
| | | Beagle | No | | |
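Read together with the scenario footnotes elsewhere in the record, the scenarios above form a small factorial design. The sketch below enumerates it under two assumptions drawn from the tables: Beagle is run only without pedigree, and imputation to 777K uses only scenarios that include the Nellore genotypes (the one-step versus two-step dimension is not modelled). The resulting counts match the n column of the mean and SD table: 20 FImpute and 10 Beagle scenarios for imputation to 50K, 16 and 8 for imputation to 777K. The helper name is hypothetical.

```python
from collections import Counter


def scenario_grid(panels, nellore_options):
    """List (panel, algorithm, scenario) runs: FImpute with and without pedigree,
    Beagle without pedigree, for each Nellore option (True = NE, False = NNE)."""
    runs = []
    for panel in panels:
        for nellore in nellore_options:
            ne = "NE" if nellore else "NNE"
            runs.append((panel, "FImpute", f"{ne}-P"))
            runs.append((panel, "FImpute", f"{ne}-NP"))
            runs.append((panel, "Beagle", f"{ne}-NP"))
    return runs


to_50k = scenario_grid(["3K", "6K", "8K", "15K", "20K"], [True, False])
to_777k = scenario_grid(["3K", "6K", "8K", "15K", "20K", "50K", "90iK", "90tK"], [True])

for name, runs in [("50K", to_50k), ("777K", to_777k)]:
    print(name, Counter(r[1] for r in runs), Counter(r[2] for r in runs))
```

Each panel therefore contributes 6 scenarios for imputation to 50K and 3 for imputation to 777K, which is the per-panel n reported in the same table.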
Overall computing run time in minutes for the different imputation scenarios
|
|
|
| ||||
|---|---|---|---|---|---|---|
|
|
|
|
|
|
| |
|
| ||||||
| 3K | 2 | 6 | 41 | 39 | 2280 | 2131 |
| 6K | 3 | 7 | 46 | 45 | 828 | 772 |
| 8K | 3 | 7 | 45 | 45 | 808 | 656 |
| 15K | 3 | 9 | 48 | 48 | 328 | 317 |
| 20K | 3 | 7 | 37 | 42 | 708 | 622 |
|
| ||||||
| 3K | 16 (17,24) | - | 4 (5,8) | - | 64 (224,41) | - |
| 6K | 17 (23,24) | - | 4 (19,21) | - | 49 (238,33) | - |
| 8K | 17 (23,24) | - | 3 (20,23) | - | 45 (177,34) | - |
| 15K | 15 (24,23) | - | 8 (20,23) | - | 40 (127,42) | - |
| 20K | 17 (23,23) | - | 9 (20,23) | - | 44 (161,42) | - |
| 50K | 3 | - | 11 | - | 29 | - |
| 90iK | 17 | - | 11 | - | 25 | - |
| 90tK | 17 | - | 10 | - | 33 | - |
¹Run time based on 10 parallel jobs on a computer with four 6-core processors (Intel Xeon X5690 @ 3.47 GHz) and 128 GB of memory running x86-64 GNU/Linux;
²Scenarios for imputation: (NE-P) using Nellore genotypes in the reference population and considering pedigree information; (NNE-P) not using Nellore genotypes in the reference population and considering pedigree information; (NE-NP) using Nellore genotypes in the reference population and not using pedigree information; (NNE-NP) not using Nellore genotypes in the reference population and not using pedigree information;
³2,735 (or 2,647 when Nellore genotypes were not used) animals in the reference population and 963 animals in the imputation population;
⁴Values outside the brackets refer to the one-step imputation. The reference and imputation populations comprised 175 and 43 animals, respectively;
⁵Values inside the brackets refer to the two-step imputation. The reference population comprised 3,567 animals for imputation from the low-density panels to the 50K SNP panel and 175 animals for imputation from the 50K to the 777K SNP panel. The imputation population comprised 43 animals.
Mean and standard deviation (SD) of concordance rate and allelic R² calculated for different algorithms, panel densities and scenarios for imputation to both the 50K and 777K SNP panels
| | n | Concordance rate, mean | Concordance rate, SD | Allelic R², mean | Allelic R², SD |
|---|---|---|---|---|---|
| Imputation to the 50K SNP panel | | | | | |
| Algorithm | | | | | |
| Beagle | 10 | 0.927 | 0.042 | 0.890 | 0.067 |
| FImpute | 20 | 0.943 | 0.038 | 0.912 | 0.061 |
| Panel | | | | | |
| 3K | 6 | 0.864 | 0.011 | 0.787 | 0.016 |
| 6K | 6 | 0.946 | 0.008 | 0.919 | 0.011 |
| 8K | 6 | 0.952 | 0.008 | 0.927 | 0.011 |
| 15K | 6 | 0.973 | 0.006 | 0.962 | 0.008 |
| 20K | 6 | 0.953 | 0.008 | 0.929 | 0.011 |
| Scenario | | | | | |
| NE-P | 5 | 0.943 | 0.041 | 0.913 | 0.065 |
| NE-NP | 10 | 0.935 | 0.041 | 0.901 | 0.066 |
| NNE-P | 5 | 0.943 | 0.042 | 0.912 | 0.067 |
| NNE-NP | 10 | 0.935 | 0.042 | 0.901 | 0.066 |
| Imputation to the 777K SNP panel | | | | | |
| Algorithm | | | | | |
| Beagle | 8 | 0.895 | 0.040 | 0.826 | 0.066 |
| FImpute | 16 | 0.921 | 0.035 | 0.866 | 0.059 |
| Panel | | | | | |
| 3K¹ | 3 | 0.838 | 0.017 | 0.728 | 0.025 |
| 6K¹ | 3 | 0.898 | 0.016 | 0.829 | 0.025 |
| 8K¹ | 3 | 0.902 | 0.017 | 0.836 | 0.026 |
| 15K¹ | 3 | 0.918 | 0.017 | 0.863 | 0.027 |
| 20K¹ | 3 | 0.903 | 0.017 | 0.837 | 0.026 |
| 50K | 3 | 0.930 | 0.016 | 0.882 | 0.025 |
| 90iK | 3 | 0.952 | 0.010 | 0.919 | 0.016 |
| 90tK | 3 | 0.955 | 0.009 | 0.925 | 0.014 |
| Scenario | | | | | |
| NE-P | 8 | 0.9199 | 0.037 | 0.865 | 0.062 |
| NE-NP | 16 | 0.9082 | 0.039 | 0.846 | 0.065 |
| Steps | | | | | |
| One-step | 15 | 0.8064 | 0.884 | 0.674 | 0.147 |
| Two-step | 15 | 0.8920 | 0.032 | 0.819 | 0.053 |
¹Means and standard deviations for the two-step analysis.
Analysis of variance performed on the average concordance rate and allelic R² of the animals in the imputation population from each scenario for imputation from low-density panels to the 50K SNP panel
| Level | Concordance rate (transformed¹) | Grouping³ | Level | Allelic R² (transformed¹) | Grouping³ |
|---|---|---|---|---|---|
| Algorithm⁴ | | | | | |
| FImpute | 1.340 | a | FImpute | 1.283 | a |
| Beagle | 1.306 | b | Beagle | 1.244 | b |
| Panel⁵ | | | | | |
| 15K | 1.402 | a | 15K | 1.368 | a |
| 20K | 1.347 | b | 20K | 1.295 | b |
| 8K | 1.345 | c | 8K | 1.292 | c |
| 6K | 1.332 | d | 6K | 1.276 | d |
| 3K | 1.189 | e | 3K | 1.085 | e |
| Scenario⁶ | | | | | |
| NE-P | 1.323 | a | NE-P | 1.264 | a |
| NNE-P | 1.323 | a | NE-NP | 1.263 | a |
| NE-NP | 1.323 | a | NNE-P | 1.264 | a |
| NNE-NP | 1.322 | a | NNE-NP | 1.262 | a |
| Algorithm × Panel | | | | | |
| FImpute - 15K | 1.420 | a | FImpute - 15K | 1.388 | a |
| Beagle - 15K | 1.384 | b | Beagle - 15K | 1.347 | b |
| FImpute - 20K | 1.365 | c | FImpute - 20K | 1.316 | c |
| FImpute - 8K | 1.362 | d | FImpute - 8K | 1.312 | d |
| FImpute - 6K | 1.349 | e | FImpute - 6K | 1.295 | e |
| Beagle - 20K | 1.330 | f | Beagle - 20K | 1.275 | f |
| Beagle - 8K | 1.328 | f | Beagle - 8K | 1.272 | f |
| Beagle - 6K | 1.316 | g | Beagle - 6K | 1.257 | g |
| FImpute - 3K | 1.204 | h | FImpute - 3K | 1.104 | h |
| Beagle - 3K | 1.174 | i | Beagle - 3K | 1.067 | i |
¹Concordance rate and allelic R² were arcsine square root transformed for the analyses;
²Interactions Algorithm × Scenario and Panel × Scenario were not statistically significant (P > 0.05);
³Different letters within a group indicate a statistically significant difference between means (P < 0.05);
⁴The algorithm used was either FImpute v.2.2 [11] or Beagle v.3.3 [8];
⁵3K, 6K, 8K, 15K and 20K are low-density panels;
⁶Scenarios for imputation to the 50K SNP panel: (NE-P) using Nellore genotypes in the reference population and considering pedigree information; (NNE-P) not using Nellore genotypes in the reference population and considering pedigree information; (NE-NP) using Nellore genotypes in the reference population and not using pedigree information; (NNE-NP) not using Nellore genotypes in the reference population and not using pedigree information.
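Footnote 1 states that concordance rates and allelic R² were arcsine square-root transformed before the analysis of variance. Assuming the usual form y = arcsin(√p) in radians (consistent with every tabulated value lying below π/2 ≈ 1.571), a minimal sketch of the transform and its inverse is:

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion p in [0, 1], in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.asin(math.sqrt(p))


def back_transform(y):
    """Inverse transform, mapping y in [0, pi/2] back to a proportion."""
    return math.sin(y) ** 2


# Back-transform the FImpute concordance value (1.340) from the table above.
print(round(back_transform(1.340), 3))
```

Back-transforming the FImpute value of 1.340 gives roughly 0.948, close to the untransformed FImpute mean concordance of 0.943 in the mean and SD table. The two need not agree exactly, because the mean of transformed values is not the transform of the mean.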
Figure 1. Concordance rate of imputation to the 50K panel in different concordance rate bins, averaged over scenarios of imputation from the alternative low-density panels (3K, 6K, 8K, 15K and 20K) to the 50K SNP panel. a) Using FImpute; b) using Beagle.
Figure 2. Concordance rate of imputation to the 50K panel for all BTAs (Bos taurus autosomes) and scenarios. a) Using FImpute; b) using Beagle.
Figure 3. Concordance rate of imputation by MAF class, averaged over scenarios of imputation from the alternative low-density panels (3K, 6K, 8K, 15K and 20K) to the 50K SNP panel. Within a group of columns, different letters indicate a statistically significant difference (P < 0.05).
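Figures 3 and 6 break concordance down by minor allele frequency (MAF) class. The class boundaries are not given in this record, so the ones below are hypothetical, as is computing MAF from the true genotypes of the imputed animals; the sketch pools genotype matches over all SNPs that fall in a class. Function names and data are illustrative only.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from 0/1/2 genotype codes."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def concordance_by_maf(true, imputed, edges):
    """true/imputed: one list of genotype codes per SNP (across animals).
    edges: ascending MAF class boundaries, e.g. [0.0, 0.1, 0.2, 0.3, 0.4, 0.5].
    Returns {(lower, upper): concordance rate pooled over SNPs in that class}."""
    tally = {}
    for t_snp, i_snp in zip(true, imputed):
        f = minor_allele_frequency(t_snp)
        for lo, hi in zip(edges, edges[1:]):
            # Classes are [lo, hi), except the last one, which also includes hi.
            if lo <= f < hi or (hi == edges[-1] and f == hi):
                hits, total = tally.get((lo, hi), (0, 0))
                hits += sum(a == b for a, b in zip(t_snp, i_snp))
                tally[(lo, hi)] = (hits, total + len(t_snp))
                break
    return {cls: hits / total for cls, (hits, total) in tally.items()}


# Hypothetical data: two SNPs genotyped on four animals.
true = [[0, 1, 2, 1], [0, 0, 0, 1]]
imputed = [[0, 1, 1, 1], [0, 0, 0, 1]]
print(concordance_by_maf(true, imputed, [0.0, 0.25, 0.5]))
```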
Analysis of variance performed on the average concordance rate and allelic R² of the animals in the imputation population from each scenario for imputation from lower-density panels to the 777K SNP panel
| Level | Concordance rate (transformed¹) | Grouping⁴ | Level | Allelic R² (transformed¹) | Grouping⁴ |
|---|---|---|---|---|---|
| Algorithm⁵ | | | | | |
| FImpute | 1.291 | a | FImpute | 1.203 | a |
| Beagle | 1.244 | b | Beagle | 1.145 | b |
| Panel⁶ | | | | | |
| 90tK | 1.351 | a | 90tK | 1.286 | a |
| 90iK | 1.343 | b | 90iK | 1.275 | b |
| 50K | 1.295 | c | 50K | 1.210 | c |
| 15K | 1.273 | d | 15K | 1.181 | d |
| 20K | 1.247 | e | 20K | 1.146 | e |
| 8K | 1.245 | e | 8K | 1.144 | e |
| 6K | 1.239 | e | 6K | 1.135 | e |
| 3K | 1.150 | f | 3K | 1.013 | f |
| Scenario⁷ | | | | | |
| NE-NP | 1.269 | a | NE-NP | 1.175 | a |
| NE-P | 1.267 | b | NE-P | 1.172 | b |
| Algorithm × Panel | | | | | |
| FImpute - 90tK | 1.370 | a | FImpute - 90tK | 1.309 | a |
| FImpute - 90iK | 1.364 | a | FImpute - 90iK | 1.301 | a |
| Beagle - 90tK | 1.331 | b | Beagle - 90tK | 1.262 | b |
| FImpute - 50K | 1.322 | b | Beagle - 90iK | 1.249 | b |
| Beagle - 90iK | 1.322 | b | FImpute - 50K | 1.244 | b |
| FImpute - 15K | 1.300 | c | FImpute - 15K | 1.215 | c |
| FImpute - 20K | 1.271 | d | Beagle - 50K | 1.176 | d |
| FImpute - 8K | 1.269 | d | FImpute - 20K | 1.176 | d |
| Beagle - 50K | 1.269 | d | FImpute - 8K | 1.174 | d |
| FImpute - 6K | 1.262 | d | FImpute - 6K | 1.165 | d |
| Beagle - 15K | 1.245 | e | Beagle - 15K | 1.146 | e |
| Beagle - 20K | 1.222 | f | Beagle - 20K | 1.115 | f |
| Beagle - 8K | 1.221 | f | Beagle - 8K | 1.114 | f |
| Beagle - 6K | 1.215 | f | Beagle - 6K | 1.106 | f |
| FImpute - 3K | 1.169 | g | FImpute - 3K | 1.039 | g |
| Beagle - 3K | 1.130 | h | Beagle - 3K | 0.988 | h |
¹Concordance rate and allelic R² were arcsine square root transformed for the analyses;
²Interaction effects Algorithm × Scenario and Panel × Scenario were not statistically significant (P > 0.05);
³3K, 6K, 8K, 15K and 20K are low-density panels that were imputed in two steps (first to the 50K and then to the 777K SNP panel);
⁴Different letters within a group indicate a statistically significant difference between means (P < 0.05);
⁵The algorithm used was either FImpute v.2.2 [11] or Beagle v.3.3 [8];
⁶3K, 6K, 8K, 15K, 20K, 50K, 90iK and 90tK are panels of lower density than the 777K SNP panel;
⁷Scenarios for imputation to the 777K SNP panel: (NE-P) using Nellore genotypes in the reference population and considering pedigree information; (NE-NP) using Nellore genotypes in the reference population and not using pedigree information.
Figure 4. Concordance rate of imputation to the 777K panel in different concordance rate bins, averaged over scenarios of imputation from the alternative lower-density panels (3K, 6K, 8K, 15K, 20K, 50K, 90iK and 90tK) to the 777K SNP panel. a) Using FImpute; b) using Beagle.
Figure 5. Concordance rate of imputation to the 777K panel for all BTAs and scenarios. a) Using FImpute; b) using Beagle.
Analysis of variance performed on the average concordance rate and allelic R² of the animals in the imputation population from each scenario for imputation to the 777K SNP panel in one or two steps
| Level | Concordance rate (transformed¹) | Grouping³ | Level | Allelic R² (transformed¹) | Grouping³ |
|---|---|---|---|---|---|
| Steps⁴ | | | | | |
| Two-step | 1.231 | a | Two-step | 1.125 | a |
| One-step | 1.110 | b | One-step | 0.997 | b |
| Algorithm⁵ | | | | | |
| FImpute | 1.202 | a | FImpute | 1.080 | a |
| Beagle | 1.140 | b | Beagle | 0.997 | b |
| Panel⁶ | | | | | |
| 15K | 1.236 | a | 15K | 1.130 | a |
| 20K | 1.229 | b | 20K | 1.120 | a |
| 8K | 1.180 | c | 8K | 1.052 | b |
| 6K | 1.167 | d | 6K | 1.034 | c |
| 3K | 1.042 | e | 3K | 0.855 | d |
| Scenario⁷ | | | | | |
| NE-NP | 1.171 | a | NE-NP | 1.038 | a |
| NE-P | 1.170 | a | NE-P | 1.038 | a |
| Steps × Algorithm | | | | | |
| Two-step - FImpute | 1.254 | a | Two-step - FImpute | 1.154 | a |
| Two-step - Beagle | 1.208 | b | Two-step - Beagle | 1.095 | b |
| One-step - FImpute | 1.149 | c | One-step - FImpute | 1.006 | c |
| One-step - Beagle | 1.072 | d | One-step - Beagle | 0.898 | d |
| Steps × Panel | | | | | |
| Two-step - 15K | 1.274 | a | Two-step - 15K | 1.183 | a |
| Two-step - 20K | 1.247 | b | Two-step - 20K | 1.147 | b |
| Two-step - 8K | 1.246 | b | Two-step - 8K | 1.145 | b |
| Two-step - 6K | 1.239 | b | Two-step - 6K | 1.136 | b |
| One-step - 20K | 1.210 | c | One-step - 20K | 1.094 | c |
| One-step - 15K | 1.198 | c | One-step - 15K | 1.078 | c |
| Two-step - 3K | 1.149 | d | Two-step - 3K | 1.013 | d |
| One-step - 8K | 1.114 | e | One-step - 8K | 0.960 | e |
| One-step - 6K | 1.094 | f | One-step - 6K | 0.932 | e |
| One-step - 3K | 0.936 | g | One-step - 3K | 0.696 | f |
¹Concordance rate and allelic R² were arcsine square root transformed for the analyses;
²Interaction effects Steps × Scenario, Algorithm × Panel, Algorithm × Scenario and Panel × Scenario were not statistically significant (P > 0.05);
³Different letters within a group indicate a statistically significant difference between means (P < 0.05);
⁴One-step is imputation directly from the low-density panels to the 777K SNP panel; two-step is imputation from the low-density panels to the 50K SNP panel, followed by imputation from the 50K to the 777K SNP panel;
⁵The algorithm used was either FImpute v.2.2 [11] or Beagle v.3.3 [8];
⁶3K, 6K, 8K, 15K and 20K are low-density panels;
⁷Scenarios for imputation to the 777K SNP panel: (NE-P) using Nellore genotypes in the reference population and considering pedigree information; (NE-NP) using Nellore genotypes in the reference population and not using pedigree information.
Figure 6. Concordance rate of imputation by MAF class. a) Averaged over scenarios of imputation from the alternative lower-density panels (3K, 6K, 8K, 15K, 20K, 50K, 90iK and 90tK) to the 777K SNP panel; b) averaged over scenarios of two-step imputation from the low-density panels (3K, 6K, 8K, 15K and 20K) to the 777K SNP panel. Within a group of columns, different letters indicate a statistically significant difference (P < 0.05).