Aniek C Bouwman, Roel F Veerkamp.
Abstract
BACKGROUND: The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for the accuracy of imputation from a high-density SNP chip to whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken breeders, who have to choose wisely how to distribute their sequencing effort over all the breeds or lines they evaluate. Sequence data from cattle breeds were used because relatively many individuals from several breeds have currently been sequenced within the 1,000 Bull Genomes project. The advantage of whole-genome sequence data is that it carries the causal mutations, but the question is whether the causal variants can be imputed accurately. This study therefore focussed on the imputation accuracy of variants with low minor allele frequency and of breed-specific variants.
Year: 2014 PMID: 25277486 PMCID: PMC4189672 DOI: 10.1186/s12863-014-0105-8
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Figure 1. Cross-validation (CV) scheme for each scenario, where each block represents a group of 20 animals. The 100 Holstein individuals were divided into 5 groups of 20 animals each, and each group was used as the validation set once in each scenario. In the reference population, the numbered blocks 1 to 5 represent the same 5 groups of 20 Holstein animals as in the validation sets; BSW blocks are groups of 20 Brown Swiss animals; JER blocks are groups of 20 Jersey animals; RDC blocks are groups of 20 Nordic Red Dairy Cattle animals.
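The grouping described in the Figure 1 caption can be illustrated with a minimal Python sketch. It only shows how 100 animals split into 5 groups of 20, each serving as the validation set once; the animal IDs, the random shuffling, the seed, and the function names are hypothetical and not taken from the study, and the breed composition of each scenario's reference population is not modelled.

```python
import random


def make_cv_folds(animal_ids, n_folds=5, seed=0):
    """Shuffle the animals and split them into n_folds equally sized groups.

    Random assignment with a fixed seed is an assumption made for this sketch.
    """
    ids = list(animal_ids)
    if len(ids) % n_folds != 0:
        raise ValueError("number of animals must be divisible by n_folds")
    random.Random(seed).shuffle(ids)
    size = len(ids) // n_folds
    return [ids[i * size:(i + 1) * size] for i in range(n_folds)]


def cv_splits(folds):
    """Yield (validation, remaining) pairs; each group is the validation set once."""
    for k, validation in enumerate(folds):
        remaining = [a for j, fold in enumerate(folds) if j != k for a in fold]
        yield validation, remaining


# Hypothetical IDs for the 100 Holstein individuals.
holstein = [f"HOL{i:03d}" for i in range(1, 101)]
folds = make_cv_folds(holstein)
splits = list(cv_splits(folds))

for k, (validation, remaining) in enumerate(splits, start=1):
    print(f"CV {k}: {len(validation)} validation animals, "
          f"{len(remaining)} other Holstein animals")
```

The 80 animals left over in each split correspond to the four Holstein groups that are not being validated; which of them (and which other breeds) enter the reference population depends on the scenario.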
Average imputation accuracy from the bovine 777 K SNP chip to whole-genome sequence on chromosomes 1 and 29

| | Chr 1, HOL20 | Chr 1, MIX80 | Chr 1, HOL80 | Chr 1, MIX140 | Chr 29, HOL20 | Chr 29, MIX80 | Chr 29, HOL80 | Chr 29, MIX140 |
|---|---|---|---|---|---|---|---|---|
| Sequence (n) | 1,912,451 | 1,912,451 | 1,912,451 | 1,912,451 | 670,773 | 670,773 | 670,773 | 670,773 |
| 777 K chip (n) | 41,868 | 41,868 | 41,868 | 41,868 | 13,556 | 13,556 | 13,556 | 13,556 |
| No variation in reference set (n)¹ | 1,178,683 | 710,480 | 894,633 | 616,808 | 385,137 | 221,137 | 283,516 | 187,904 |
| No variation observed in validation set (n)¹ | −² | 468,203 | 284,050 | 561,875 | −² | 164,000 | 101,621 | 197,233 |
| No variation imputed in validation set (n)¹ | 19,484 | 1,005 | 1,077 | 649 | 19,284 | 4,139 | 4,267 | 3,681 |
| Obtained overall imputation accuracy (n) | 672,416 | 690,895 | 690,823 | 691,251 | 252,796 | 267,941 | 267,813 | 268,399 |
| Average overall imputation accuracy (r) | 0.70 | 0.83 | 0.88 | 0.89 | 0.59 | 0.74 | 0.80 | 0.82 |
| Standard deviation of r | 0.32 | 0.27 | 0.25 | 0.24 | 0.37 | 0.32 | 0.29 | 0.28 |

¹ No variation was present in the genotype dosages of at least one of the 5 corresponding cross-validation sets; therefore, the imputation accuracy (a correlation) could not be computed.

² In scenario HOL20 the reference sets were the same as the validation sets; therefore, all variants without variation in at least one cross-validation reference set are the same as the variants without variation in the observed genotypes of the validation sets.
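The footnotes above explain that accuracy is a correlation and is undefined when genotypes or dosages show no variation. A minimal sketch of that per-variant calculation follows, assuming the accuracy is the Pearson correlation between observed genotypes (coded 0/1/2) and imputed dosages across validation animals; the function name, the coding, and the toy values are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def imputation_accuracy(observed, imputed):
    """Pearson correlation between observed genotypes and imputed dosages.

    Returns NaN when either vector has no variation, because the
    correlation is undefined in that case.
    """
    observed = np.asarray(observed, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    if observed.shape != imputed.shape:
        raise ValueError("observed and imputed must have the same shape")
    if observed.size < 2:
        return float("nan")
    if observed.max() == observed.min() or imputed.max() == imputed.min():
        return float("nan")
    return float(np.corrcoef(observed, imputed)[0, 1])


# Toy data for one variant in five validation animals (made-up values).
observed = [0, 1, 2, 1, 0]
dosages = [0.1, 0.9, 1.8, 1.2, 0.0]
print(imputation_accuracy(observed, dosages))

# A monomorphic variant in the validation set yields NaN, so it would be
# counted in one of the "No variation" rows instead of receiving an r value.
print(imputation_accuracy([0, 0, 0, 0, 0], dosages))
```

Returning NaN rather than 0 keeps such variants out of the averaged accuracy, which matches how the table reports them as separate counts.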
Figure 2. Imputation accuracy of variants plotted against the minor allele frequency. Imputation accuracy of variants on chromosome 1 (A) and chromosome 29 (B) for HOL20 (dot-dash line), MIX80 (dotted line), HOL80 (dashed line), and MIX140 (solid line) plotted against the minor allele frequency (MAF) in Holstein. The lines were fitted with a generalized additive model with integrated smoothness estimation using the imputation accuracy over all 5 cross-validations.
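Average imputation accuracy (r) of SNPs and short insertions and deletions (indels) on chromosomes 1 and 29

| Scenario | Chr 1 SNPs (n) | Chr 1 SNPs (r) | Chr 1 indels (n) | Chr 1 indels (r) | Chr 29 SNPs (n) | Chr 29 SNPs (r) | Chr 29 indels (n) | Chr 29 indels (r) |
|---|---|---|---|---|---|---|---|---|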
| HOL20 | 630,092 | 0.71 | 42,324 | 0.56 | 238,473 | 0.60 | 14,323 | 0.47 |
| MIX80 | 646,872 | 0.84 | 44,023 | 0.71 | 252,571 | 0.75 | 15,370 | 0.64 |
| HOL80 | 646,796 | 0.88 | 44,027 | 0.76 | 252,439 | 0.81 | 15,374 | 0.70 |
| MIX140 | 647,200 | 0.89 | 44,051 | 0.78 | 252,996 | 0.83 | 15,403 | 0.73 |
Figure 3. Persistency of phase across breeds. Persistency of phase between Holstein and Brown Swiss (solid line), Holstein and Jersey (dashed line), and Holstein and Nordic Red Dairy Cattle (dotted line) on chromosome 1.
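This record does not state how persistency of phase was computed. A commonly used definition is the correlation, across marker pairs, of the signed linkage disequilibrium statistic r measured in each of two populations. The sketch below implements that definition under the assumption of phased haplotypes coded 0/1; the function names and toy haplotypes are hypothetical.

```python
import numpy as np


def signed_r(allele_a, allele_b):
    """Signed LD correlation r between two markers from 0/1 haplotype alleles."""
    a = np.asarray(allele_a, dtype=float)
    b = np.asarray(allele_b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return float("nan")  # r is undefined for a monomorphic marker
    return float(np.corrcoef(a, b)[0, 1])


def persistency_of_phase(haps_1, haps_2, marker_pairs):
    """Correlation of signed r between two breeds over the given marker pairs.

    haps_1, haps_2: arrays of shape (n_haplotypes, n_markers), same marker order.
    marker_pairs: iterable of (i, j) marker index pairs.
    """
    haps_1 = np.asarray(haps_1)
    haps_2 = np.asarray(haps_2)
    r1 = np.array([signed_r(haps_1[:, i], haps_1[:, j]) for i, j in marker_pairs])
    r2 = np.array([signed_r(haps_2[:, i], haps_2[:, j]) for i, j in marker_pairs])
    keep = ~(np.isnan(r1) | np.isnan(r2))
    if keep.sum() < 2 or r1[keep].std() == 0 or r2[keep].std() == 0:
        return float("nan")
    return float(np.corrcoef(r1[keep], r2[keep])[0, 1])


# Toy haplotypes (8 haplotypes x 4 markers); values are made up.
breed_a = np.array([
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
])
# A second "breed" in which the phase of marker 0 is reversed.
breed_b = breed_a.copy()
breed_b[:, 0] = 1 - breed_b[:, 0]

pairs = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(persistency_of_phase(breed_a, breed_a, pairs))
print(persistency_of_phase(breed_a, breed_b, pairs))
```

In practice the marker pairs would be grouped by physical distance and the correlation computed per distance bin, which is what a curve against distance would show.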
Average imputation accuracy for scenarios HOL80 and MIX80 per category of minor allele frequency

| MAF range | Minor alleles (n)¹ | Variants (n) | MAF (HOL80) | r (HOL80) | MAF (MIX80) | r (MIX80) |
|---|---|---|---|---|---|---|
| > 0.1 | > 16 | 525,575 | 0.27237 | 0.92 | 0.26094 | 0.90 |
| 0.05625-0.1 | 9-16 | 141,708 | 0.07655 | 0.78 | 0.09741 | 0.71 |
| 0.03125-0.05 | 5-8 | 89,926 | 0.04076 | 0.67 | 0.06922 | 0.57 |
| 0.01875-0.025 | 3-4 | 51,491 | 0.02193 | 0.57 | 0.05642 | 0.51 |
| 0.0125 | 2 | 22,135 | 0.01250 | 0.44 | 0.05434 | 0.51 |
| 0.00625 | 1 | 18,384 | 0.00625 | 0.20 | 0.05014 | 0.49 |

¹ Number of minor alleles present in the HOL80 reference population at the corresponding MAF range.
Average imputation accuracy (r) and average minor allele frequency (MAF) of the reference population for scenarios HOL80 and MIX80 for variants on chromosome 1 per category of MAF range in the HOL80 reference population. Results are only shown for one cross-validation, but are similar for all cross-validations on chromosome 1.
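The MAF categories in the table follow directly from the minor allele count: with 80 diploid animals in the HOL80 reference population there are 160 alleles, so one copy corresponds to 1/160 = 0.00625 and 16 copies to 0.1. A short sketch of that arithmetic is given below; the function names are illustrative, and the bin labels are copied from the table.

```python
def maf_from_count(minor_allele_count, n_animals=80):
    """Minor allele frequency for a given allele count in diploid animals."""
    return minor_allele_count / (2 * n_animals)


def maf_category(minor_allele_count):
    """Return the table's MAF category for a count in the HOL80 reference (80 animals)."""
    bins = [
        (1, 1, "0.00625"),
        (2, 2, "0.0125"),
        (3, 4, "0.01875-0.025"),
        (5, 8, "0.03125-0.05"),
        (9, 16, "0.05625-0.1"),
    ]
    for low, high, label in bins:
        if low <= minor_allele_count <= high:
            return label
    if minor_allele_count > 16:
        return "> 0.1"
    raise ValueError("minor allele count must be at least 1")


for count in (1, 2, 3, 4, 5, 8, 9, 16, 17):
    print(count, maf_from_count(count), maf_category(count))
```

The same mapping applies to every table that bins variants by MAF in the HOL80 reference population.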
Average imputation accuracy for scenarios HOL80 and MIX140 per category of minor allele frequency

| MAF range | Minor alleles (n)¹ | Variants (n) | MAF (HOL80) | r (HOL80) | MAF (MIX140) | r (MIX140) |
|---|---|---|---|---|---|---|
| > 0.1 | > 16 | 525,647 | 0.27237 | 0.92 | 0.27159 | 0.93 |
| 0.05625-0.1 | 9-16 | 143,313 | 0.07639 | 0.78 | 0.08784 | 0.79 |
| 0.03125-0.05 | 5-8 | 94,021 | 0.04073 | 0.66 | 0.05596 | 0.69 |
| 0.01875-0.025 | 3-4 | 56,235 | 0.02187 | 0.56 | 0.03884 | 0.62 |
| 0.0125 | 2 | 25,698 | 0.01250 | 0.42 | 0.03216 | 0.56 |
| 0.00625 | 1 | 25,101 | 0.00625 | 0.19 | 0.02385 | 0.45 |

¹ Number of minor alleles present in the HOL80 reference population at the corresponding MAF range.
Average imputation accuracy (r) and average minor allele frequency (MAF) of the reference population for scenarios HOL80 and MIX140 for variants on chromosome 1 per category of MAF range in the HOL80 reference population. Results are only shown for one cross-validation, but are similar for all cross-validations on chromosome 1.
Average imputation accuracy of Holstein-specific variants for HOL80 and MIX80 per category of MAF

| MAF range | Minor alleles (n)¹ | Variants (n) | MAF (HOL80) | r (HOL80) | MAF (MIX80) | r (MIX80) |
|---|---|---|---|---|---|---|
| > 0.1 | > 16 | 15,736 | 0.16529 | 0.87 | 0.04369 | 0.78 |
| 0.05625-0.1 | 9-16 | 23,057 | 0.07367 | 0.79 | 0.02289 | 0.63 |
| 0.03125-0.05 | 5-8 | 14,825 | 0.04061 | 0.63 | 0.01046 | 0.30 |
| 0.01875-0.025 | 3-4 | 9,185 | 0.02152 | 0.54 | 0.00855 | 0.24 |
| 0.0125 | 2 | 2,616 | 0.01250 | 0.38 | 0.00679 | 0.13 |
| 0.00625 | 1 | 1,663 | 0.00625 | 0.18 | 0.00625 | 0.14 |

¹ Number of minor alleles present in the HOL80 reference population at the corresponding MAF range.
Average imputation accuracy (r) and average minor allele frequency (MAF) of the reference population for scenarios HOL80 and MIX80 for Holstein specific variants on chromosome 1 per category of MAF range in the HOL80 reference population. Results are only shown for one cross-validation, but are similar for all cross-validations on chromosome 1.
Average imputation accuracy of Holstein-specific variants for HOL80 and MIX140 per category of MAF

| MAF range | Minor alleles (n)¹ | Variants (n) | MAF (HOL80) | r (HOL80) | MAF (MIX140) | r (MIX140) |
|---|---|---|---|---|---|---|
| > 0.1 | > 16 | 15,741 | 0.16528 | 0.87 | 0.09452 | 0.89 |
| 0.05625-0.1 | 9-16 | 24,641 | 0.07293 | 0.80 | 0.04167 | 0.81 |
| 0.03125-0.05 | 5-8 | 18,872 | 0.04046 | 0.63 | 0.02312 | 0.64 |
| 0.01875-0.025 | 3-4 | 13,865 | 0.02143 | 0.51 | 0.01224 | 0.54 |
| 0.0125 | 2 | 6,139 | 0.01250 | 0.33 | 0.00714 | 0.34 |
| 0.00625 | 1 | 8,266 | 0.00625 | 0.15 | 0.00357 | 0.15 |

¹ Number of minor alleles present in the HOL80 reference population at the corresponding MAF range.
Average imputation accuracy (r) and average minor allele frequency (MAF) of the reference population for scenarios HOL80 and MIX140 for Holstein specific variants on chromosome 1 per category of MAF range in the HOL80 reference population. Results are only shown for one cross-validation, but are similar for all cross-validations on chromosome 1.