Andrew Whalen, Gregor Gorjanc, Roger Ros-Freixedes, John M Hickey.
Abstract
BACKGROUND: In this paper, we review the performance of various hidden Markov model-based imputation methods in animal breeding populations. Traditionally, pedigree and heuristic-based imputation methods have been used for imputation in large animal populations due to their computational efficiency, scalability, and accuracy. Recent advances in the area of human genetics have increased the ability of probabilistic hidden Markov model methods to perform accurate phasing and imputation in large populations. These advances may enable these methods to be useful for routine use in large animal populations, particularly in populations where pedigree information is not readily available.
Year: 2018 PMID: 30223768 PMCID: PMC6142395 DOI: 10.1186/s12711-018-0416-8
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Fig. 1 Genotype imputation accuracy of four diploid HMM algorithms based on simulated data. Unless otherwise noted, there were 1000 high-density (HD) SNPs per chromosome, 200 low-density (LD) SNPs per chromosome, 100 dams genotyped at high-density, and complete overlap between the high-density arrays of generations 1 and 2 and those of generations 3 and 4. We varied (a) the number of dams genotyped at high-density, (b) the number of individuals in the population, (c) the number of SNPs in the low-density array, and (d) the amount of overlap between the high-density array of generations 1 and 2 and those of generations 3 and 4.
Fig. 2 Genotype imputation accuracy of a combination of pre-phasing and haploid HMM methods based on simulated data. Unless otherwise noted, there were 1000 high-density (HD) SNPs per chromosome, 200 low-density (LD) SNPs per chromosome, 100 dams genotyped at high-density, and complete overlap between the high-density arrays of generations 1 and 2 and those of generations 3 and 4. We varied (a) the number of dams genotyped at high-density, (b) the number of individuals in the population, (c) the number of SNPs in the low-density array, (d) the amount of overlap between the high-density array of generations 1 and 2 and those of generations 3 and 4, and (e) the number of high-density SNPs per chromosome, keeping the ratio between high- and low-density constant (5:1).
Run time and accuracy for diploid imputation, phasing, and haploid imputation methods in the default simulated data scenario
| Phasing method | Imputation method | HD phasing (s) | LD phasing (s) | Imputation (s) | Total (s) | Accuracy |
|---|---|---|---|---|---|---|
| | IMPUTE2 | | | 42,796 | 42,796 | 0.861 |
| | Beagle v4.0 | | | 23,042 | 23,042 | 0.901 |
| | MaCH | | | 21,998 | 21,998 | 0.944 |
| | fastPHASE | | | 28,892 | 28,892 | 0.941 |
| HAPI-UR | IMPUTE2 | 117 | 14 | 149 | 280 | 0.964 |
| HAPI-UR | Minimac3 | 117 | 14 | 62 | 193 | 0.967 |
| HAPI-UR | Beagle v4.1 | 117 | 14 | 78 | 209 | 0.793 |
| Eagle2 | IMPUTE2 | 1361 | 207 | 148 | 1717 | 0.988 |
| Eagle2 | Minimac3 | 1361 | 207 | 55 | 1623 | 0.988 |
| Eagle2 | Beagle v4.1 | 1361 | 207 | 79 | 1647 | 0.794 |
| SHAPEIT2 | IMPUTE2 | 8495 | 1175 | 150 | 9820 | 0.979 |
| SHAPEIT2 | Minimac3 | 8495 | 1175 | 58 | 9728 | 0.977 |
| SHAPEIT2 | Beagle v4.1 | 8495 | 1175 | 77 | 9747 | 0.792 |
The run time is given in seconds separately for phasing and imputation steps and as a total
HD high-density, LD low-density
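The total run time in the table is the sum of the HD phasing, LD phasing, and imputation times. A minimal Python sketch of that arithmetic follows, using values copied from the table for the three Minimac3 pipelines and for diploid MaCH. The dictionary layout and the names `STEP_TIMES_S`, `DIPLOID_MACH_TOTAL_S`, and `total_time` are illustrative assumptions, not part of the paper. Note that the Eagle2 + IMPUTE2 step times add up to 1716 s against a reported total of 1717 s, presumably because the steps were rounded separately.

```python
# Step times in seconds, copied from the table: (HD phasing, LD phasing, imputation).
# The data layout and all names here are hypothetical, chosen for illustration.
STEP_TIMES_S = {
    ("HAPI-UR", "Minimac3"): (117, 14, 62),
    ("Eagle2", "Minimac3"): (1361, 207, 55),
    ("SHAPEIT2", "Minimac3"): (8495, 1175, 58),
}

# Total time of diploid imputation with MaCH, from the same table.
DIPLOID_MACH_TOTAL_S = 21998


def total_time(steps):
    """Return the total run time as the sum of the per-step times."""
    return sum(steps)


totals = {pair: total_time(steps) for pair, steps in STEP_TIMES_S.items()}

for (phaser, imputer), total in totals.items():
    # Ratio of the diploid MaCH run time to this pipeline's total run time.
    ratio = DIPLOID_MACH_TOTAL_S / total
    print(f"{phaser} + {imputer}: {total} s ({ratio:.1f}x shorter than diploid MaCH)")
```

The ratio compares wall-clock totals only; it says nothing about accuracy, which the table reports in a separate column.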
Run time and accuracy for phasing and haploid imputation methods in the real dataset scenario
| Phasing method | Imputation method | HD phasing (h) | LD phasing (h) | Imputation (h) | Total (h) | Accuracy |
|---|---|---|---|---|---|---|
| HAPI-UR | IMPUTE2 | 11.53 | 43.09 | 5.63 | 60.25 | 0.997 |
| HAPI-UR | Minimac3 | 11.53 | 43.09 | 2.27 | 56.89 | 0.995 |
| HAPI-UR | Beagle v4.1 | 11.53 | 43.09 | 2.69 | 57.31 | 0.939 |
| Eagle2 | IMPUTE2 | 4.48 (8 cores) | 2.37 (8 cores) | 5.63 | 12.48 | 0.827 |
| Eagle2 | Minimac3 | 4.48 (8 cores) | 2.37 (8 cores) | 2.21 | 9.06 | 0.992 |
| Eagle2 | Beagle v4.1 | 4.48 (8 cores) | 2.37 (8 cores) | 4.19 | 11.04 | 0.925 |
The run time is given in hours separately for phasing and imputation steps and as a total. Eagle2 was run distributed across 8 compute cores; HAPI-UR was run on a single core
HD high-density, LD low-density