Zhen Wang1, Zhenyang Zhang1, Zitao Chen1, Jiabao Sun1, Caiyun Cao1, Fen Wu1, Zhong Xu2, Wei Zhao3, Hao Sun4, Longyu Guo3, Zhe Zhang5, Qishan Wang6, Yuchun Pan7.
Abstract
Pigs are not only a major meat source worldwide but are also commonly used as an animal model for studying complex human traits. Large haplotype reference panels have been used to facilitate efficient phasing and imputation of relatively sparse genome-wide microarray chips and low-coverage sequencing data. Using the imputed genotypes in downstream analyses, such as GWASs, TWASs, eQTL mapping and genomic prediction (GS), is beneficial for obtaining novel findings. However, there is still a lack of publicly available, high-quality pig reference panels with large sample sizes and high diversity, which greatly limits the application of genotype imputation in pigs. In response, we built the Pig Haplotype Reference Panel (PHARP) database. PHARP provides a reference panel of 2012 pig haplotypes at 34 million SNPs constructed using whole-genome sequence data from more than 49 studies of 71 pig breeds. It also provides Web-based analytical tools that allow researchers to carry out phasing and imputation consistently and efficiently. PHARP is freely accessible at http://alphaindex.zju.edu.cn/PHARP/index.php. We demonstrate its applicability for pig commercial 50 K SNP arrays by accurately imputing 2.6 billion genotypes at a concordance rate of 0.971 in 81 Large White pigs (~17× sequencing coverage). We also applied our reference panel to impute low-density SNP chip data to high density for three GWASs and found novel significantly associated SNPs that might be causal variants.
Year: 2022 PMID: 35879321 PMCID: PMC9314402 DOI: 10.1038/s41598-022-15851-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Schematic diagram of the pig haplotype reference panel’s construction, imputation accuracy evaluation, implementation platform and applications. (A) Data resources and processing steps used to construct the PHARP. (B) Imputation accuracy estimation of PHARP on multiple test datasets. (C) Imputation platform development. (D) Applications of PHARP in GWASs, GS and other potential studies such as eQTL mapping and TWASs.
Figure 2. Imputation accuracy under different scenarios. (A) Mimicking three popular pig commercial chips (50 K, 60 K, and 80 K) using three datasets by masking all variants (only autosomes were used) except those on the chips; the held-out genotypes were considered as ‘real’ to calculate the CR and r² values. (B) Boxplot of imputation accuracy estimated by mimicking the imputed panel with different densities of SNPs on chromosome 1 using test datasets 1, 2 and 3 (see Supplementary Fig. S2 for plots of the remaining autosomes). (C) Boxplot of the imputation accuracy estimated by mimicking 50 K chip genotypes from dataset 1 using different sizes of reference panels constructed by randomly extracting samples from 1006 individuals or 115 Large White pigs (repeated 5 times; the different sizes of reference panel are marked with different colors). (D) Mimicking the 50 K chip genotypes from datasets 1 and 3 and using reference panels constructed by extracting samples according to pig breed (LW, Large White, n = 114; DU, Duroc, n = 85). (E) The imputation accuracies of the different MAF bins ((0, 0.02], (0.02, 0.05], (0.05, 0.1], (0.1, 0.2], (0.2, 0.3], (0.4, 0.5]) estimated by mimicking the 50 K chip genotypes using dataset 1. (F) The imputation accuracy estimated from dataset 4 using our reference panel and that from Animal-ImputeDB. Dataset 1, Large White pig breed, LW, n = 81; dataset 2, Jiaxinghei pig breed, JXH, n = 54; dataset 3, Duroc pig breed, DU, n = 299; dataset 4, Duroc pig breed, n = 20; pigs were genotyped by both a 50 K chip and ELC.
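The two accuracy measures named in panel (A) can be illustrated with a minimal sketch. It assumes that CR is the fraction of held-out genotypes imputed identically and that r² is the squared Pearson correlation between held-out and imputed allele counts; these are common definitions, not necessarily the exact ones used by the authors. The function names and the toy genotype vectors are hypothetical.

```python
from statistics import mean


def concordance_rate(true_gt, imputed_gt):
    """Fraction of held-out genotypes (coded 0/1/2) that were imputed identically.

    Assumed definition; pairs with a missing value (None) are skipped.
    """
    pairs = [(t, i) for t, i in zip(true_gt, imputed_gt)
             if t is not None and i is not None]
    if not pairs:
        raise ValueError("no comparable genotype pairs")
    return sum(t == i for t, i in pairs) / len(pairs)


def squared_correlation(true_gt, imputed_gt):
    """Squared Pearson correlation between held-out and imputed allele counts.

    Returns NaN when either vector has no variation (e.g. a monomorphic SNP).
    """
    pairs = [(t, i) for t, i in zip(true_gt, imputed_gt)
             if t is not None and i is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two comparable genotype pairs")
    mean_t = mean(t for t, _ in pairs)
    mean_i = mean(i for _, i in pairs)
    cov = sum((t - mean_t) * (i - mean_i) for t, i in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_i = sum((i - mean_i) ** 2 for _, i in pairs)
    if var_t == 0 or var_i == 0:
        return float("nan")
    return cov * cov / (var_t * var_i)


# Toy data (hypothetical): one SNP, eight animals, one genotype imputed wrongly.
held_out = [0, 1, 2, 1, 0, 2, 1, 1]
imputed = [0, 1, 2, 1, 0, 2, 0, 1]

print("CR  =", concordance_rate(held_out, imputed))
print("r^2 =", squared_correlation(held_out, imputed))
```

In a masking experiment of the kind described in the caption, such functions would be applied per SNP (or per animal) to the variants removed before imputation, and the resulting values summarised, for example per MAF bin.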
Figure 3. Association signals for growth phenotypes before and after imputation. Association test statistics on the −log10(P-value) scale (y-axis) are plotted for each SNP position (x-axis) for the trait of backfat thickness at an age of 180 days (A), from Zhang et al., and at 100 kg (B), from Fu et al. To simplify the plot, only the variants with a P-value less than 1.08 × 10⁻⁴ are shown, and they are colored according to the annotated genes. The black-labeled genes are reported in the original paper, and the blue-labeled genes are novel genes detected after imputation. Examples of potential causal variants (marked by blue asterisks) in the SNRPC (C), GRM4 (D) and PACSIN1 (E) genes. Each dot represents a variant, whose LD (r²) with the chip SNP (marked by blue diamonds) or the one with the lowest P-value (marked by a black circle) is indicated by the colour of the dot. The two horizontal lines divide SNPs with P-values < 2.05 × 10⁻⁶ and < 1.08 × 10⁻⁴ (A), and P-values < 6.46 × 10⁻⁷ and < 1.86 × 10⁻⁵ (B).
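As a rough illustration of how the two horizontal lines in panel (A) partition the plotted variants, the sketch below converts P-values to the −log10 scale and assigns each to a band. The threshold values are taken from the caption; the band labels, the function names and the example P-values are hypothetical and chosen only for illustration.

```python
import math

# Thresholds for panel (A), as given in the caption.
UPPER_LINE_P = 2.05e-6   # stricter threshold
LOWER_LINE_P = 1.08e-4   # looser threshold; also the plotting cut-off


def neg_log10(p_value):
    """Transform a P-value to the -log10 scale used on the y-axis."""
    if not 0 < p_value <= 1:
        raise ValueError("P-value must be in (0, 1]")
    return -math.log10(p_value)


def band(p_value, upper=UPPER_LINE_P, lower=LOWER_LINE_P):
    """Return which band of the plot a variant falls in (labels are hypothetical)."""
    if p_value < upper:
        return "above_upper_line"
    if p_value < lower:
        return "between_lines"
    return "not_plotted"


# Hypothetical P-values for three variants.
for p in (3.0e-8, 5.0e-5, 2.0e-3):
    print(f"P = {p:.1e}  -log10(P) = {neg_log10(p):.2f}  band = {band(p)}")
```

For panel (B), the same functions could be called with `upper=6.46e-7` and `lower=1.86e-5`.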