Haoqiang Ye1, Zipeng Zhang2, Duanyang Ren1, Xiaodian Cai1, Qianghui Zhu1, Xiangdong Ding2, Hao Zhang1, Zhe Zhang1, Jiaqi Li1.
Abstract
Reference population size is an important factor affecting genomic prediction, so combining different populations is an attractive way to improve prediction ability. However, in pigs, simply combining multiple reference populations does not increase prediction accuracy as much as expected. This may be due to differences in linkage disequilibrium (LD) patterns between populations. In this study, we used imputed whole-genome sequencing (WGS) data to construct LD-based haplotypes for genomic prediction in a combined population, and explored the impact of single-nucleotide polymorphism (SNP) density, variant representation (SNPs or haplotype alleles), and reference population size on the prediction accuracy for reproduction traits. Our results showed that genomic best linear unbiased prediction (GBLUP) using the WGS data can improve prediction accuracy in multi-population but not within-population prediction. Both the haplotype method using the 80 K chip data and GBLUP gave higher genomic prediction accuracy in multi-population (3.4-5.9%) than within-population (1.2-4.3%). More importantly, we found that the haplotype method based on the WGS data gave better genomic prediction performance in multi-population, and that building haploblocks in this scenario with a low LD threshold (r² = 0.2-0.3) produced an optimal set of variables for reproduction traits in the Yorkshire pig population. Our results suggest that either the haplotype method based on the chip data or GBLUP (the individual SNP method) based on the WGS data is beneficial for genomic prediction in multi-population, while combining the haplotype method with the WGS data is a better strategy for multi-population genomic evaluation.
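To make the idea of LD-based haploblocks concrete, here is a minimal Python sketch. It is not the authors' pipeline: the block-building rule (extend a block while the r² between adjacent SNPs stays at or above the threshold), the function names `r2`, `build_blocks`, and `haplotype_alleles`, and the toy haplotype matrix are all assumptions made for illustration.

```python
import numpy as np

def r2(a, b):
    """Squared correlation between two 0/1 haplotype allele vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0  # monomorphic SNP: LD is undefined, treated as 0 here
    return float(np.corrcoef(a, b)[0, 1] ** 2)

def build_blocks(haps, threshold):
    """Greedy partition (assumed rule): extend the current block while
    the r^2 between adjacent SNPs is at least `threshold`."""
    blocks = [[0]]
    for j in range(1, haps.shape[1]):
        if r2(haps[:, j - 1], haps[:, j]) >= threshold:
            blocks[-1].append(j)
        else:
            blocks.append([j])
    return blocks

def haplotype_alleles(haps, block):
    """Distinct haplotype alleles (allele strings) observed within one block."""
    return sorted({"".join(map(str, row)) for row in haps[:, block]})

# Toy phased data (hypothetical): 8 haplotypes x 5 SNPs, alleles coded 0/1.
haps = np.array([
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0],
])

for threshold in (0.2, 0.3):
    blocks = build_blocks(haps, threshold)
    print(threshold, blocks, [haplotype_alleles(haps, b) for b in blocks])
```

In this toy data the lower threshold merges more SNPs into fewer, longer blocks, each carrying more distinct haplotype alleles. That is the trade-off a threshold such as r² = 0.2-0.3 controls.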
Keywords: combined populations; genomic prediction; haplotype; linkage disequilibrium; whole-genome sequencing
Year: 2022 PMID: 35754827 PMCID: PMC9218795 DOI: 10.3389/fgene.2022.843300
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. Population analysis between LM and XD. (A) Principal component analysis (PCA) for the two populations. (B) Genetic structure analysis for the two populations.
Summary of statistics between the two populations.
| Population | Trait | Number of individuals | Counts of observations | Mean | SD | Birth year | Genotyped animals |
|---|---|---|---|---|---|---|---|
| LM | NBA | 5,907 | 19,660 | 9.83 | 3.03 | 2004 to 2016 | 1,641 |
| LM | TNB | 5,907 | 19,660 | 10.85 | 3.06 | 2004 to 2016 | 1,641 |
| XD | NBA | 4,842 | 18,369 | 9.88 | 2.94 | 2004 to 2015 | 762 |
| XD | TNB | 4,842 | 18,369 | 10.35 | 2.95 | 2004 to 2015 | 762 |
NBA: total number born alive; TNB: litter size.
FIGURE 2. Summary of single and combined population haplotype statistics at the chip data level. The bar plots show the counts of blocked SNPs, haploblocks, and haplotype alleles at different LD threshold values.
FIGURE 3. Genomic prediction accuracy of all scenarios with different r² thresholds. The left side shows the trend of accuracy predicted using the 80 K chip data, and the right side shows the trend using the WGS data. The colored lines represent different prediction methods: blue, orange, and purple lines represent GBLUP, GHBLUP, and GH + GBLUP, respectively. The solid lines represent the prediction accuracy trend of the combined population, while the dotted lines represent the single population.
FIGURE 4. Regression coefficient of all scenarios with different r² thresholds. The left side shows the trend of the regression coefficient using the 80 K chip data, and the right side shows the trend using the WGS data. The colored lines represent different prediction methods: blue, orange, and purple lines represent GBLUP, GHBLUP, and GH + GBLUP, respectively. The solid lines represent the trend of the combined population, while the dotted lines represent the single population.
The comparison of prediction performance of the two methods.
| Val | Ref | Trait | Method | Acc | Regression coefficient | Genetic variance | Residual variance |
|---|---|---|---|---|---|---|---|
| LM | LM | NBA | GBLUP | 0.453 | 0.808 | 1.643 | 0.569 |
| LM | LM | NBA | GHBLUP_SNP1 | 0.453 | 0.807 | 1.642 | 0.569 |
| LM | Combined | NBA | GBLUP | 0.459 | 0.857 | 1.193 | 0.491 |
| LM | Combined | NBA | GHBLUP_SNP1 | 0.458 | 0.857 | 1.193 | 0.491 |
| LM | LM | TNB | GBLUP | 0.450 | 0.801 | 2.285 | 0.751 |
| LM | LM | TNB | GHBLUP_SNP1 | 0.450 | 0.801 | 2.284 | 0.751 |
| LM | Combined | TNB | GBLUP | 0.460 | 0.861 | 1.634 | 0.624 |
| LM | Combined | TNB | GHBLUP_SNP1 | 0.460 | 0.861 | 1.634 | 0.624 |
| XD | XD | NBA | GBLUP | 0.392 | 0.888 | 0.601 | 0.260 |
| XD | XD | NBA | GHBLUP_SNP1 | 0.392 | 0.888 | 0.601 | 0.260 |
| XD | Combined | NBA | GBLUP | 0.387 | 0.740 | 1.193 | 0.504 |
| XD | Combined | NBA | GHBLUP_SNP1 | 0.387 | 0.740 | 1.193 | 0.504 |
| XD | XD | TNB | GBLUP | 0.431 | 0.880 | 0.806 | 0.259 |
| XD | XD | TNB | GHBLUP_SNP1 | 0.431 | 0.880 | 0.806 | 0.259 |
| XD | Combined | TNB | GBLUP | 0.439 | 0.785 | 1.636 | 0.653 |
| XD | Combined | TNB | GHBLUP_SNP1 | 0.440 | 0.785 | 1.635 | 0.653 |
Val: validation population; Ref: reference population; Acc: prediction accuracy; NBA: total number born alive; TNB: litter size; GHBLUP_SNP1: the method that treats each single SNP as a haplotype.
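The near-identical GBLUP and GHBLUP_SNP1 rows are what one would expect if coding each SNP as its own haplotype leaves the genomic relationship matrix unchanged. The Python sketch below illustrates this under stated assumptions: the scaling (mean diagonal set to 1), the `relationship` helper, and the random genotypes are hypothetical and are not taken from the paper.

```python
import numpy as np

def relationship(counts):
    """Center allele counts per column, then scale so the mean diagonal is 1
    (an assumed normalization, chosen only for this illustration)."""
    z = counts - counts.mean(axis=0)
    g = z @ z.T
    return g / np.mean(np.diag(g))

rng = np.random.default_rng(0)
# Hypothetical genotypes: 6 animals x 10 SNPs, coded as 0/1/2 allele counts.
snp = rng.integers(0, 3, size=(6, 10)).astype(float)

# Single-SNP "haplotypes": each SNP contributes two haplotype alleles,
# with counts (2 - x) for allele 0 and x for allele 1.
hap = np.hstack([2.0 - snp, snp])

g_snp = relationship(snp)   # SNP-based relationship matrix
g_hap = relationship(hap)   # haplotype-allele-based relationship matrix
same = np.allclose(g_snp, g_hap)
print("relationship matrices match:", same)
```

After centering, the column for allele 0 is the negative of the column for allele 1, so the haplotype coding doubles the cross-product matrix and the scaling removes that factor. Small differences in the table, such as 0.459 versus 0.458, could come from rounding or implementation details not described in this excerpt.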