Rasmus Froberg Brøndum, Bernt Guldbrandtsen, Goutam Sahana, Mogens Sandø Lund, Guosheng Su.
Abstract
BACKGROUND: The advent of low-cost next-generation sequencing has made it possible to sequence a large number of dairy and beef bulls, which can be used as a reference for imputation of whole-genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high-density SNP marker panel to whole-genome sequence level. The data comprised 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole-genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Red bulls had previously been typed with the bovine high-density (HD) SNP panel and were used for validation. We investigated how enlarging the reference population by combining data across breeds affected imputation accuracy, and compared the accuracy and speed of IMPUTE2 and BEAGLE using either genotype probability reference data or pre-phased reference data. All analyses were done on bovine autosome 29 (BTA29) using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel.
Year: 2014 PMID: 25164068 PMCID: PMC4152568 DOI: 10.1186/1471-2164-15-728
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Number of animals with whole-genome sequence and high-density genotype information used in the study
|  | Holstein | Jersey | RDC | Brown Swiss | Total |
|---|---|---|---|---|---|
|  | 40 | 27 | 52 | 16 | 135 |
|  | 92 | 15 | 0 | 0 | 107 |
| HD genotyped (validation) | 16 | 27 | 29 | 0 | 72 |
Mean and standard deviation (SD) of correlation between true and imputed genotype dosage for Holstein (HOL), Jersey (JER) and Nordic Red (RDC)
| Method | HOL mean | HOL SD | JER mean | JER SD | RDC mean | RDC SD |
|---|---|---|---|---|---|---|
| BEAGLE/BEAGLE pre-phasing (Single breed) | 0.87 | 0.32 | 0.82 | 0.38 | 0.76 | 0.39 |
| BEAGLE/BEAGLE pre-phasing | 0.88 | 0.32 | 0.87 | 0.32 | 0.86 | 0.30 |
| BEAGLE/SHAPEIT2 pre-phasing | 0.88 | 0.30 | 0.86 | 0.32 | 0.85 | 0.30 |
| BEAGLE/Genotype probabilities | 0.90 | 0.27 | 0.89 | 0.28 | 0.87 | 0.27 |
| IMPUTE2/BEAGLE pre-phasing | 0.90 | 0.20 | 0.85 | 0.23 | 0.86 | 0.20 |
| IMPUTE2/SHAPEIT2 pre-phasing | 0.90 | 0.20 | 0.84 | 0.23 | 0.86 | 0.21 |
| IMPUTE2/Genotype probabilities | 0.90 | 0.20 | 0.84 | 0.22 | 0.87 | 0.18 |
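The accuracy measure in the table above, the correlation between true and imputed genotype dosage, can be illustrated with a minimal sketch. It assumes the correlation is a Pearson correlation computed per variant across validation animals, and that dosage is the expected count of one allele derived from three genotype probabilities; the record does not spell out either detail. The function names and the five-animal data are hypothetical.

```python
from math import sqrt


def dosage(probs):
    """Expected allele count (0 to 2) from probabilities (P(AA), P(AB), P(BB))."""
    _p_aa, p_ab, p_bb = probs
    return p_ab + 2.0 * p_bb


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one variant, five validation animals.
true_genotypes = [0, 1, 2, 1, 0]  # observed allele counts
imputed_probs = [
    (0.9, 0.1, 0.0),
    (0.2, 0.7, 0.1),
    (0.0, 0.1, 0.9),
    (0.1, 0.8, 0.1),
    (0.8, 0.2, 0.0),
]

imputed_dosages = [dosage(p) for p in imputed_probs]
r = pearson(true_genotypes, imputed_dosages)
print(f"per-variant accuracy r = {r:.3f}")
```

Under this reading, the means and SDs in the table would summarise such per-variant correlations over all imputed variants. Variants that are monomorphic in the validation animals have no defined correlation, which is why the sketch returns NaN in that case.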
Figure 1. Imputation accuracy along BTA29 for Holstein (HOL), Jersey (JER) and Nordic Red (RDC). Imputed data was obtained using BEAGLE or IMPUTE2 with genotype probability data in the reference.
Figure 2. Distribution of minor allele frequency for sequence markers on BTA29. Minor allele frequencies are calculated based on the 242 sequenced animals.
Figure 3. Imputation accuracy versus minor allele frequency (MAF). Imputation accuracies are averaged in bins of 1% of MAF.
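The binning described for Figure 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: MAF is taken as the smaller of the two allele frequencies computed from allele counts, bins are 1% wide, and a MAF of exactly 0.5 is placed in the last bin. All names and example values are hypothetical.

```python
from collections import defaultdict


def minor_allele_frequency(genotypes):
    """MAF from allele counts (0, 1 or 2 copies per animal)."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1.0 - freq)


def mean_accuracy_by_maf_bin(mafs, accuracies, bin_width=0.01):
    """Average accuracy per MAF bin; keys are bin indices (0 covers 0% to 1%)."""
    last_bin = int(round(0.5 / bin_width)) - 1
    sums = defaultdict(float)
    counts = defaultdict(int)
    for maf, acc in zip(mafs, accuracies):
        # round() guards against float artefacts such as 0.03 / 0.01 < 3
        index = min(int(round(maf / bin_width, 9)), last_bin)
        sums[index] += acc
        counts[index] += 1
    return {index: sums[index] / counts[index] for index in sorted(sums)}


# Hypothetical per-variant MAFs and imputation accuracies.
example_mafs = [0.005, 0.008, 0.123, 0.5]
example_accuracies = [0.4, 0.6, 0.9, 1.0]
binned = mean_accuracy_by_maf_bin(example_mafs, example_accuracies)
for index, mean_acc in binned.items():
    print(f"MAF bin {index}% to {index + 1}%: mean accuracy {mean_acc:.2f}")
```

Averaging within narrow bins makes the dependence of accuracy on allele frequency visible without plotting every variant individually.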
Computation times for phasing and imputation procedures
| Procedure | Approximate CPU time (hour:min) |
|---|---|
| Phasing reference (N = 242) | |
| BEAGLE | 02:30 |
| SHAPEIT2 (4 cores) | 52:00 |
| Imputing one individual (ref: N = 241, validation: N = 1) | |
| BEAGLE with phased reference | 00:50 |
| BEAGLE with un-phased reference | 02:50 |
| IMPUTE2 with phased reference | 00:05 |
| IMPUTE2 with un-phased reference | 41:40 |
Computations were done on a Unix computer cluster with Intel Xeon X5670/X5677 processors.
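To make the per-individual timings easier to compare, the approximate CPU times from the table can be converted to minutes and expressed as ratios. The sketch below uses only the values reported in the table; the helper name `to_minutes` and the dictionary keys are illustrative.

```python
def to_minutes(hhmm):
    """Convert an 'hour:min' string such as '41:40' to total minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# Approximate CPU time for imputing one individual, as reported in the table.
cpu_time = {
    "BEAGLE, phased reference": "00:50",
    "BEAGLE, un-phased reference": "02:50",
    "IMPUTE2, phased reference": "00:05",
    "IMPUTE2, un-phased reference": "41:40",
}
minutes = {name: to_minutes(value) for name, value in cpu_time.items()}

# How much slower BEAGLE is than IMPUTE2 when both use a phased reference.
beagle_vs_impute2_phased = (
    minutes["BEAGLE, phased reference"] / minutes["IMPUTE2, phased reference"]
)
# Cost of an un-phased reference relative to a phased one, per program.
impute2_unphased_vs_phased = (
    minutes["IMPUTE2, un-phased reference"] / minutes["IMPUTE2, phased reference"]
)
beagle_unphased_vs_phased = (
    minutes["BEAGLE, un-phased reference"] / minutes["BEAGLE, phased reference"]
)

print(beagle_vs_impute2_phased, impute2_unphased_vs_phased, beagle_unphased_vs_phased)
```

On these figures, IMPUTE2 with a phased reference is about 10 times faster than BEAGLE with a phased reference, while switching IMPUTE2 to an un-phased reference increases its CPU time roughly 500-fold (about 3.4-fold for BEAGLE). These ratios exclude the one-off cost of phasing the reference, which the table reports separately.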