Sohyoung Won1, Jong-Eun Park2, Ju-Hwan Son2, Seung-Hwan Lee3, Byeong Ho Park2, Mina Park2, Won-Chul Park2, Han-Ha Chai2, Heebal Kim1,4,5, Jungjae Lee6, Dajeong Lim2.
Abstract
Genomic prediction is an effective way to estimate genomic breeding values from genetic information using statistical methods such as best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs can improve the accuracy of genomic prediction, because a quantitative trait locus is more likely to be in strong linkage disequilibrium (LD) with a cluster of markers than with an individual marker. To make haplotypes efficient in genomic prediction, finding optimal ways to define haplotypes is essential. In this study, 770K or 50K SNP chip data were collected from a Hanwoo (Korean cattle) population consisting of 3,498 cattle. Using the SNP chip data, haplotypes were defined in three different ways, based on 1) the number of SNPs included, 2) the length of haplotypes (bp), and 3) agglomerative hierarchical clustering based on LD. To compare the methods in parallel, haplotypes defined by all methods were set to have comparable sizes: 5, 10, 20 or 50 SNPs on average per haplotype. A linear mixed model using haplotypes to calculate the covariance matrix was applied to test the prediction accuracy of each haplotype size. A conventional SNP-based linear mixed model was also tested to evaluate the performance of the haplotype sets in genomic prediction. Carcass weight (CWT), eye muscle area (EMA) and backfat thickness (BFT) were used as the phenotypes. Using haplotypes generally increased accuracy compared with the conventional SNP-based model for CWT and EMA, but gave little or no increase in accuracy for BFT. LD clustering-based haplotypes, specifically those with five SNPs, showed the highest prediction accuracy for CWT and EMA, whereas the highest accuracy for BFT was obtained with length-based haplotypes of five SNPs. The maximum gain in accuracy was 1.3% from cross-validation and 4.6% from forward validation for EMA, suggesting that genomic prediction accuracy can be increased by using haplotypes. However, the improvement from using haplotypes may depend on the trait of interest. In addition, when the numbers of alleles generated by the haplotype-defining methods were compared, clustering by LD generated the fewest alleles, thereby reducing computational costs. Therefore, finding optimal ways to define haplotypes and using the haplotype alleles as markers can improve the accuracy of genomic prediction.
Keywords: Hanwoo; accuracy; best linear unbiased prediction; genomic prediction; haplotype; hierarchical clustering; linkage disequilibrium
Year: 2020 PMID: 32211021 PMCID: PMC7067973 DOI: 10.3389/fgene.2020.00134
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Summary statistics of the phenotypes used for the study.
| Trait | Minimum | 1st Qt. | Median | Mean | 3rd Qt. | Maximum |
|---|---|---|---|---|---|---|
| CWT | 197 | 335 | 374 | 377.5789 | 415 | 623 |
| EMA | 42 | 77 | 84 | 84.85138 | 92 | 126 |
| BFT | 1 | 7 | 10 | 11.02117 | 14 | 39 |
CWT, carcass weight (kg); EMA, eye muscle area (cm2); BFT, backfat thickness (mm).
Haplotype and allele statistics of each haplotype defining method at different sizes.
| Statistic | 5 SNPs | 10 SNPs | 20 SNPs | 50 SNPs |
|---|---|---|---|---|
| SNP count-based haplotypes | | | | |
| Number of haplotype alleles | 1,303,861 | 1,877,160 | 2,713,296 | 3,710,659 |
| Number of haplotypes | 111,123 | 55,554 | 27,768 | 11,099 |
| Average number of SNPs per haplotype | 5 | 10 | 20 | 50 |
| Average number of alleles per haplotype | 11.73349 | 33.78983 | 97.71305 | 334.3237 |
| Minimum SNPs in haplotypes | 5 | 10 | 20 | 50 |
| Maximum SNPs in haplotypes | 5 | 10 | 20 | 50 |
| Length-based haplotypes | | | | |
| Number of haplotype alleles | 1,364,861 | 1,867,261 | 2,621,574 | 3,581,059 |
| Number of haplotypes | 97,061 | 54,163 | 27,797 | 11,196 |
| Average number of SNPs per haplotype | 5.725038 | 10.25936 | 19.99057 | 49.63183 |
| Average number of alleles per haplotype | 14.06188 | 34.47484 | 94.31140 | 319.8516 |
| Minimum SNPs in haplotypes | 2 | 2 | 2 | 2 |
| Maximum SNPs in haplotypes | 29 | 47 | 71 | 136 |
| LD clustering-based haplotypes | | | | |
| Number of haplotype alleles | 1,277,525 | 1,764,074 | 2,472,637 | 3,358,562 |
| Number of haplotypes | 111,123 | 55,554 | 27,768 | 11,099 |
| Average number of SNPs per haplotype | 5.000567 | 10.00248 | 20.01145 | 50.06559 |
| Average number of alleles per haplotype | 11.49649 | 31.75422 | 89.04628 | 302.6004 |
| Minimum SNPs in haplotypes | 1 | 1 | 1 | 1 |
| Maximum SNPs in haplotypes | 114 | 131 | 141 | 213 |
K is the number of clusters and N is the number of total SNPs.
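The haplotype and allele counts reported for the SNP count-based method can be illustrated with a small sketch. It assumes phased haplotypes stored as strings of `0`/`1` alleles and drops trailing SNPs that do not fill a whole block; both choices, and the function name `count_based_haplotypes`, are assumptions made for illustration rather than details taken from the study.

```python
from collections import Counter


def count_based_haplotypes(haplotypes, block_size):
    """Split phased haplotypes into consecutive blocks of `block_size` SNPs.

    `haplotypes` is a list of equal-length strings of '0'/'1' alleles, one
    string per phased chromosome copy (hypothetical input format).
    Returns one Counter per block, mapping haplotype allele -> copy count.
    Trailing SNPs that do not fill a whole block are dropped (an assumption).
    """
    n_snps = len(haplotypes[0])
    n_blocks = n_snps // block_size
    blocks = []
    for b in range(n_blocks):
        start = b * block_size
        end = start + block_size
        blocks.append(Counter(h[start:end] for h in haplotypes))
    return blocks


# Made-up example: four phased chromosome copies, ten SNPs each.
phased = ["0101100110", "0101100111", "1101000110", "0101100110"]
blocks = count_based_haplotypes(phased, block_size=5)

n_haplotypes = len(blocks)                   # number of haplotype blocks
n_alleles = sum(len(c) for c in blocks)      # distinct alleles over all blocks
print(n_haplotypes, n_alleles, n_alleles / n_haplotypes)
```

The three printed values correspond to the "Number of haplotypes", "Number of haplotype alleles" and "Average number of alleles per haplotype" rows for one haplotype size.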
Figure 1. Genomic prediction accuracies from five repetitions of five-fold cross-validation. Prediction accuracies obtained with haplotypes of various sizes defined by the different methods and with individual SNPs were compared for CWT, BFT and EMA, respectively. The black lines on the bars show the standard errors of the prediction accuracies. Accuracies were calculated as the correlation coefficients between GEBVs and pre-corrected phenotypes.
Figure 2. Genomic prediction accuracies from forward validation. Prediction accuracies obtained with haplotypes of various sizes defined by the different methods and with individual SNPs were compared for CWT, BFT and EMA, respectively. Accuracies were calculated as the correlation coefficients between GEBVs and pre-corrected phenotypes.
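The accuracy measure in both captions is a correlation between predicted breeding values and pre-corrected phenotypes. The sketch below shows how such accuracies could be collected over five repetitions of five-fold cross-validation. The `predict` callback stands in for the mixed model and is hypothetical, as are the simulated data and the simple regression used as a placeholder predictor.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def repeated_kfold(n, k=5, repeats=5, seed=0):
    """Yield (train, test) index lists for `repeats` shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


def cv_accuracies(phenotypes, predict, k=5, repeats=5, seed=0):
    """Correlation between predictions and held-out phenotypes per fold."""
    accs = []
    for train, test in repeated_kfold(len(phenotypes), k, repeats, seed):
        gebv = predict(train, test)  # hypothetical model callback
        accs.append(pearson(gebv, [phenotypes[i] for i in test]))
    return accs


# Simulated data: one covariate x and a noisy phenotype y (not real data).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(50)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]


def toy_predict(train, test):
    # Placeholder for the mixed model: regression through the origin.
    slope = sum(x[i] * y[i] for i in train) / sum(x[i] ** 2 for i in train)
    return [slope * x[i] for i in test]


accs = cv_accuracies(y, toy_predict)
mean_acc = sum(accs) / len(accs)
se = math.sqrt(sum((a - mean_acc) ** 2 for a in accs) / (len(accs) - 1) / len(accs))
print(len(accs), round(mean_acc, 3), round(se, 3))
```

Forward validation would replace the random folds with a single split in which older animals form the training set and younger animals the test set.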
P-values of paired t-tests comparing prediction accuracies using individual SNPs and haplotypes defined by different methods and sizes.
| | 5 SNPs | 10 SNPs | 20 SNPs | 50 SNPs |
|---|---|---|---|---|
| | 0.002** | 0.01* | 0.21 | 0.98 |
| | 0.0008** | 0.03* | 0.23 | 0.92 |
| | 0.0005** | 0.005** | 0.09 | 0.98 |
| | 0.00004** | 0.004** | 0.12 | 0.81 |
| | 0.00007** | 0.007** | 0.09 | 0.58 |
| | 0.0002** | 0.002** | 0.07 | 0.86 |
| | 0.64 | 0.67 | 0.77 | 0.99 |
| | 0.07 | 0.20 | 0.74 | 0.99 |
| | 0.77 | 0.52 | 0.43 | 1.00 |
* and ** indicate significance at α = 0.05 and α = 0.01, respectively.
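A paired t-test compares two methods on the same validation folds by testing whether the mean of the per-fold accuracy differences is zero. The sketch below computes only the t statistic and its degrees of freedom with the standard library; the accuracy values are made up for illustration, and the function name `paired_t` is hypothetical. Converting the statistic to a P-value requires the Student t distribution, for example from a statistics package.

```python
import math


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    d = [x - y for x, y in zip(a, b)]          # per-fold differences
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)   # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Made-up per-fold accuracies for a haplotype model and an SNP model.
hap_acc = [0.50, 0.52, 0.49, 0.53, 0.51]
snp_acc = [0.49, 0.50, 0.49, 0.51, 0.50]

t_stat, df = paired_t(hap_acc, snp_acc)
print(round(t_stat, 3), df)
```

Pairing matters here because both models are evaluated on identical folds, so fold-to-fold variation cancels out in the differences.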
Estimated heritabilities using haplotypes defined by different methods and sizes and using individual SNPs.
| Average number of SNPs | ||||
|---|---|---|---|---|
| 0.39 | 0.39 | 0.41 | 0.43 | |
| 0.38 | 0.39 | 0.40 | 0.42 | |
| 0.39 | 0.39 | 0.41 | 0.43 | |
| 0.36 | ||||
| 0.33 | 0.34 | 0.35 | 0.38 | |
| 0.33 | 0.34 | 0.35 | 0.38 | |
| 0.33 | 0.34 | 0.36 | 0.38 | |
| 0.43 | ||||
| 0.45 | 0.46 | 0.48 | 0.52 | |
| 0.44 | 0.45 | 0.47 | 0.50 | |
| 0.44 | 0.45 | 0.46 | 0.50 | |
| 0.43 | ||||