Yifan Jiang¹, Hailiang Song², Hongding Gao³, Qin Zhang⁴, Xiangdong Ding¹.
Abstract
Genotype imputation from BeadChip to whole-genome sequencing (WGS) data is a cost-effective way to obtain genotypes at WGS variants. Beagle, one of the most popular imputation programs, has been widely used for genotype inference in humans and non-human species. However, few studies have systematically and comprehensively compared the performance of Beagle versions and parameter settings in farm animals. Here, we investigated the imputation performance of three representative versions of Beagle (Beagle 4.1, Beagle 5.0, and Beagle 5.4) and the setting of the effective population size (Ne) parameter in three species (cattle, pigs, and chickens). Six scenarios were designed to explore the impact of key factors on imputation performance. The results showed that the default Ne (1,000,000) is not suitable for livestock and poultry when the reference panel is small or the target panel is genotyped with a low-density array, reducing accuracy by 2.47% to 10.45%. Beagle 5 significantly reduced computation time (by 4.66- to 13.24-fold) without loss of accuracy. In addition, a large combined reference panel or a high-density chip gave higher imputation accuracy, especially for variants with low minor allele frequency (MAF). Finally, the measures of imputation accuracy were highly significantly correlated with one another for variants with MAF ≥ 0.05.
Keywords: accuracy; imputation; livestock; poultry; whole genome sequencing
Year: 2022 PMID: 36092888 PMCID: PMC9459117 DOI: 10.3389/fgene.2022.963654
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1 Framework of the imputation.
Number of SNPs used across chromosomes under different panels in cattle.
| Chr | Chr length (bp) | Reference panel: ref350 | Reference panel: ref1555 | Target panel: 50 K | Target panel: 150 K | Target panel: 777 K | IMP sites (ref350): 50 K | IMP sites (ref350): 150 K | IMP sites (ref350): 777 K | IMP sites (ref1555): 50 K | IMP sites (ref1555): 150 K | IMP sites (ref1555): 777 K |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr1 | 158,337,067 | 1,265,065 | 3,068,377 | 3,067 | 6,781 | 39,186 | 1,262,134 | 1,258,329 | 1,229,993 | 3,065,312 | 3,061,599 | 3,029,193 |
| chr7 | 112,638,659 | 847,075 | 2,063,921 | 2,064 | 5,386 | 28,133 | 845,108 | 841,843 | 822,367 | 2,061,858 | 2,058,537 | 2,035,791 |
| chr21 | 71,599,096 | 562,318 | 1,387,487 | 1,296 | 3,071 | 17,712 | 561,083 | 559,274 | 546,263 | 1,386,192 | 1,384,416 | 1,369,777 |
| chr29 | 51,505,224 | 470,173 | 1,101,854 | 962 | 2,190 | 12,038 | 469,260 | 467,998 | 458,834 | 1,100,892 | 1,099,665 | 1,089,816 |
| Total | 394,080,046 | 3,144,631 | 7,621,639 | 7,389 | 17,428 | 97,069 | 3,137,585 | 3,127,444 | 3,057,457 | 7,614,254 | 7,604,217 | 7,524,577 |
Chr, chromosome; IMP sites, imputed sites, i.e., the loci used to calculate imputation accuracy.
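The footnote defines IMP sites only loosely. The sketch below assumes that the imputed (and scored) sites are the reference-panel positions that are not genotyped on the target chip; the function names are hypothetical. Under that assumption, IMP sites can slightly exceed reference minus target when some chip SNPs are absent from the reference panel. For example, for chr1 with ref350 and the 50 K chip, 1,265,065 - 3,067 = 1,261,998, whereas the table reports 1,262,134.

```python
# Sketch (assumption): IMP sites = reference-panel positions not on the target chip.
# Positions are plain integers here; real data would be read from VCF files.

def imputed_sites(reference_positions, target_positions):
    """Sorted positions present in the reference panel but absent from the chip."""
    return sorted(set(reference_positions) - set(target_positions))


def summarize(reference_positions, target_positions):
    """Count reference, target, and imputed sites, mirroring the table's columns."""
    return {
        "reference": len(set(reference_positions)),
        "target": len(set(target_positions)),
        "imp_sites": len(imputed_sites(reference_positions, target_positions)),
    }


if __name__ == "__main__":
    # Toy data: the chip covers 3 reference sites plus 1 site (999) missing from the reference.
    ref = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
    chip = [200, 500, 800, 999]
    print(summarize(ref, chip))  # {'reference': 10, 'target': 4, 'imp_sites': 7}
```

Because the off-reference chip site (999) does not reduce the imputed set, `imp_sites` here is 7 rather than 10 - 4 = 6, which is the same pattern seen in the cattle table.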
Number of SNPs used across chromosomes in pigs and chickens.
| Species | Chr | Chr length (bp) | Reference panel | Target panel | IMP sites |
|---|---|---|---|---|---|
| Pig | chr1 | 315,321,322 | 2,756,826 | 5,014 | 2,751,812 |
| | chr6 | 157,765,593 | 1,782,136 | 3,693 | 1,778,443 |
| | chr12 | 63,588,571 | 893,925 | 2,138 | 891,787 |
| | chr18 | 61,220,071 | 879,515 | 1,439 | 878,076 |
| | Total | 597,895,557 | 6,312,402 | 12,284 | 6,300,118 |
| Chicken | chr1 | 196,202,544 | 7,158,664 | 9,841 | 80,339 |
| | chr3 | 111,302,122 | 4,079,325 | 5,506 | 44,859 |
| | chr6 | 35,467,016 | 1,479,613 | 2,117 | 17,537 |
| | chr28 | 4,974,273 | 190,787 | 534 | 4,187 |
| | Total | 347,945,955 | 12,908,389 | 17,998 | 146,922 |
Chr, chromosome; IMP sites, imputed sites, i.e., the loci used to calculate imputation accuracy.
Scenarios used to evaluate imputation performance.
| Scenario | Description | Species | Target panel | Reference panel | Software | Ne |
|---|---|---|---|---|---|---|
| S1 | Effects of Beagle version and Ne setting on imputation accuracy in three species | Cattle | 100 Holstein (50, 150, 777 K) | ref350, ref1555 | Beagle4.1, Beagle5.0, Beagle5.4 | 100, 1,000, 5,000, 10,000, 20,000, 50,000, 100,000, 1,000,000 |
| | | Pig | 25 Asian pigs + 25 European pigs (80 K) | 359 pigs | Beagle4.1, Beagle5.0, Beagle5.4 | 100, 1,000, 5,000, 10,000, 20,000, 50,000, 100,000, 1,000,000 |
| | | Chicken | 450 yellow-feather dwarf broiler chickens (60 K) | 355 chickens | Beagle4.1, Beagle5.0, Beagle5.4 | 100, 1,000, 5,000, 10,000, 20,000, 50,000, 100,000, 1,000,000 |
| S2 | Effects of chip density and reference panel size on imputation accuracy | Cattle | 100 Holstein (50, 150, 777 K) | ref350, ref1555 | Beagle4.1, Beagle5.0, Beagle5.4 | 100,000 |
| S3 | Imputation accuracy by minor allele frequency | Cattle | 100 Holstein (50, 150, 777 K) | ref350, ref1555 | Beagle5.4 | 100,000 |
| S4 | Relationships among measures of imputation accuracy (Acc, Cor, AR2, DR2) | Cattle | 100 Holstein (50, 150, 777 K) | ref350, ref1555 | Beagle4.1 (for AR2), Beagle5.4 | 100,000 |
| S5 | Effect of the genetic relationship between target and reference panels on imputation accuracy | Cattle | 27 Chinese yellow cattle (50, 150, 777 K) and 100 Holstein (50, 150, 777 K) | ref350, ref1555 | Beagle5.4 | 100,000 |
| S6 | Computation time | Cattle | 100 Holstein (50, 150, 777 K) | ref350, ref1555 | Beagle4.1, Beagle5.0, Beagle5.4 | 100,000 |
Ne, effective population size; AR2, allelic R-squared; DR2, dosage R-squared; Acc, genotype concordance; Cor, correlation.
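Scenario S1 amounts to re-running Beagle once per Ne value. A minimal sketch of building those runs in Python is shown below; `gt=`, `ref=`, `out=` and `ne=` are Beagle's key=value command-line arguments, while the jar name, VCF file names, and output prefixes are hypothetical placeholders. The commands are only constructed and printed, not executed, and Ne is always passed explicitly rather than relying on a version's default.

```python
# Sketch: one Beagle command per Ne value in the S1 grid.
# File names below are hypothetical; gt=, ref=, out=, ne= are Beagle arguments.
import shlex

NE_GRID = [100, 1_000, 5_000, 10_000, 20_000, 50_000, 100_000, 1_000_000]


def beagle_command(jar, target_vcf, reference_vcf, out_prefix, ne):
    """Return an argv list for one Beagle imputation run with an explicit Ne."""
    return [
        "java", "-jar", jar,
        f"gt={target_vcf}",      # target (chip) genotypes
        f"ref={reference_vcf}",  # phased WGS reference panel
        f"out={out_prefix}",     # output file prefix
        f"ne={ne}",              # effective population size parameter
    ]


def ne_sweep(jar, target_vcf, reference_vcf, label):
    """Build one command per Ne value, encoding Ne in the output prefix."""
    return [
        beagle_command(jar, target_vcf, reference_vcf, f"{label}.ne{ne}", ne)
        for ne in NE_GRID
    ]


if __name__ == "__main__":
    for argv in ne_sweep("beagle.jar", "holstein_50k.vcf.gz", "ref350.vcf.gz", "cattle_50k_ref350"):
        print(shlex.join(argv))
        # To actually run a job: subprocess.run(argv, check=True)
```

Keeping each command as an argument list (rather than a shell string) makes it safe to pass directly to `subprocess.run`; a real pipeline would also need Java heap settings appropriate to the panel size.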
FIGURE 2 Principal component analysis (PCA) of the population structure of the three farm animal species (cattle, pigs, and chickens). (A) 1,682 sequenced cattle from Run 5 of the 1000 Bull Genomes Project. (B) 409 sequenced pigs from the Genome Variation Map database. (C) 335 sequenced chickens. GJF, green jungle fowl; RJF, red jungle fowl; YFDB, yellow-feather dwarf broiler. Different colors and symbols represent different classes.
FIGURE 3 Imputation accuracy in cattle for three BeadChip densities, two reference population sizes, and three Beagle versions across a range of effective population size (Ne) settings. (A) Accuracy measured as genotype concordance (Acc). (B) Accuracy measured as correlation (Cor). (C, D) As in (A) and (B), respectively, after removing sites with minor allele frequency < 0.05.
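The two accuracy measures in Figure 3 can be sketched for a single imputed variant as follows. This assumes genotypes coded 0/1/2 (alternative-allele count), Acc as the fraction of individuals with a correctly imputed genotype, and Cor as the Pearson correlation between true and imputed codes across individuals; whether the study computed Cor per variant or per individual is not stated here. The MAF used for the < 0.05 filter is assumed to come from the true genotypes.

```python
# Sketch: Acc (genotype concordance) and Cor (Pearson correlation) for one variant,
# plus the MAF >= 0.05 filter used in panels C and D. Coding 0/1/2 is assumed.
import math


def concordance(true_gt, imputed_gt):
    """Fraction of individuals whose imputed genotype equals the true genotype."""
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def pearson(x, y):
    """Pearson correlation; returns nan when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def minor_allele_frequency(true_gt):
    """MAF of a diploid sample from 0/1/2 genotype codes."""
    p = sum(true_gt) / (2 * len(true_gt))
    return min(p, 1 - p)


def passes_maf_filter(true_gt, threshold=0.05):
    """Keep a variant only if its MAF is at least `threshold`."""
    return minor_allele_frequency(true_gt) >= threshold


if __name__ == "__main__":
    truth = [0, 1, 2, 1, 0, 0, 1, 2]
    imputed = [0, 1, 2, 0, 0, 0, 1, 2]
    print(concordance(truth, imputed))  # 0.875
    print(round(pearson(truth, imputed), 3))
    print(passes_maf_filter(truth))  # True
```

Concordance is easy to inflate at rare variants (imputing every individual as the common homozygote already scores well), which is one reason correlation-based measures and MAF filtering are reported alongside it.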
FIGURE 4 Imputation accuracy by minor allele frequency (MAF) class. SNPs were divided into MAF bins of width 0.01. AR2, allelic R-squared; DR2, dosage R-squared; Acc, genotype concordance; Cor, correlation.
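Figure 4 groups variants into MAF bins of width 0.01. A minimal sketch of that binning is below, assuming each variant is summarized as a `(maf, accuracy)` pair and that a bin's value is the mean accuracy of its variants. Since MAF lies in [0, 0.5], there are 50 bins, and MAF = 0.5 is placed in the last one.

```python
# Sketch: mean accuracy per 0.01-wide MAF bin (input format is an assumption).
from collections import defaultdict

BIN_WIDTH = 0.01
N_BINS = 50  # MAF is in [0, 0.5]


def maf_bin(maf):
    """Index of the bin containing `maf`; MAF = 0.5 goes into the last bin."""
    # Small epsilon guards against float error, e.g. 0.29 / 0.01 == 28.999999999999996.
    return min(int(maf / BIN_WIDTH + 1e-9), N_BINS - 1)


def mean_accuracy_by_bin(variants):
    """Map each non-empty bin's lower edge to the mean accuracy of its variants."""
    sums, counts = defaultdict(float), defaultdict(int)
    for maf, acc in variants:
        b = maf_bin(maf)
        sums[b] += acc
        counts[b] += 1
    return {round(b * BIN_WIDTH, 2): sums[b] / counts[b] for b in sorted(sums)}


if __name__ == "__main__":
    data = [(0.003, 0.90), (0.007, 0.80), (0.015, 0.95), (0.29, 0.99), (0.5, 1.0)]
    print(mean_accuracy_by_bin(data))
```

The epsilon matters: without it, a variant with MAF exactly 0.29 would be floored into the 0.28 bin.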
FIGURE 5 Spearman correlations among the three measures of imputation accuracy and minor allele frequency (MAF). (A) All sites. (B) Sites with MAF ≥ 0.05.
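Figure 5 reports Spearman correlations between per-variant measures, with and without the MAF ≥ 0.05 filter. The sketch below computes Spearman's coefficient as the Pearson correlation of average ranks (tied values share the mean of their ranks), which is the standard tie-aware definition; the input layout of parallel per-variant lists is an assumption.

```python
# Sketch: Spearman correlation (Pearson on average ranks) with an optional MAF filter.
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation; nan if either variable is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def spearman_filtered(maf, x, y, threshold=0.05):
    """Spearman correlation restricted to variants with MAF >= threshold."""
    keep = [i for i, m in enumerate(maf) if m >= threshold]
    return spearman([x[i] for i in keep], [y[i] for i in keep])
```

For example, `spearman_filtered(maf, acc, dr2)` would correspond to panel B for one pair of measures, and `spearman(acc, dr2)` to panel A, under the assumptions above.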
FIGURE 6 Genotype concordance calculated at the individual level for 100 Holstein and 27 Chinese yellow cattle.
The imputation accuracy for 27 Chinese yellow cattle.
| Breed | Individual number | Imputation accuracy |
|---|---|---|
| Menggu | 2 | 0.953 |
| Yanbian | 2 | 0.941 |
| Hasake | 2 | 0.919 |
| Xizang | 1 | 0.888 |
| Qinchuan | 2 | 0.871 |
| Luxi | 2 | 0.869 |
| Guanling | 2 | 0.837 |
| Dengchuan | 2 | 0.832 |
| Wenling | 2 | 0.808 |
| Dehong | 2 | 0.805 |
| Dabieshan | 2 | 0.802 |
| Fujian | 2 | 0.802 |
| Liping | 2 | 0.789 |
| Nanyang | 2 | 0.768 |
Imputation accuracy was measured as genotype concordance (Acc). Imputation was performed from the 150 K chip to WGS with ref1555 using Beagle 5.2 with Ne = 100,000.
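The breed-level values above are presumably summaries of individual-level concordance (Figure 6). A minimal sketch of one way to produce such a table is shown below, assuming each breed's value is the mean of its individuals' concordance; the function names and the toy records are hypothetical and are not the study's data.

```python
# Sketch: individual-level concordance, then a per-breed mean sorted from best to worst.
from collections import defaultdict


def individual_concordance(true_gts, imputed_gts):
    """Fraction of imputed sites at which one individual's genotype is correct."""
    matches = sum(t == i for t, i in zip(true_gts, imputed_gts))
    return matches / len(true_gts)


def breed_means(records):
    """records: iterable of (breed, individual_acc); returns breed -> mean Acc, best first."""
    sums, counts = defaultdict(float), defaultdict(int)
    for breed, acc in records:
        sums[breed] += acc
        counts[breed] += 1
    means = {b: sums[b] / counts[b] for b in sums}
    return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical toy records, for illustration only.
    toy = [("BreedA", 0.90), ("BreedA", 0.80), ("BreedB", 0.95)]
    for breed, acc in breed_means(toy).items():
        print(breed, round(acc, 3))
```

With only one or two animals per breed, as in the table, these means are sensitive to single individuals, so differences between adjacent breeds should be read cautiously.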
FIGURE 7 Computation time for each imputation.