Clemens Falker-Gieske¹, Iulia Blaj², Siegfried Preuß³, Jörn Bennewitz³, Georg Thaller², Jens Tetens¹,⁴.
Abstract
In order to gain insight into the genetic architecture of economically important traits in pigs and to derive suitable genetic markers to improve these traits in breeding programs, many studies have been conducted to map quantitative trait loci. Shortcomings of these studies were low mapping resolution, large confidence intervals for quantitative trait locus positions, and large linkage disequilibrium blocks. Here, we overcome these shortcomings by pooling four large F2 designs to produce smaller linkage disequilibrium blocks and by resequencing the founder generation at high coverage and the F1 generation at low coverage for subsequent imputation of the F2 generation to whole genome sequencing marker density. This led to the discovery of more than 32 million variants, 8 million of which had not previously been reported. Pooling the four F2 designs enabled us to perform a joint genome-wide association study, which led to the identification of numerous significantly associated variant clusters on chromosomes 1, 2, 4, 7, 17, and 18 for the growth and carcass traits average daily gain, back fat thickness, meat fat ratio, and carcass length. We not only confirmed previously reported quantitative trait loci but also discovered new ones. As a result, several new candidate genes are discussed, among them BMP2 (bone morphogenetic protein 2), which we recently identified in a related study. Variant effect prediction revealed that 15 high-impact variants for the traits back fat thickness, meat fat ratio, and carcass length were among the statistically significantly associated variants.
Keywords: Genome wide association study; Imputation; Meat; Variant calling; Whole genome sequencing; carcass and production traits
Year: 2019 PMID: 31296617 PMCID: PMC6723123 DOI: 10.1534/g3.119.400452
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Per-cross information on the sequenced individuals (F0 and F1) and the SNP-array-genotyped individuals (F2). F0 and F1 animals served as the reference panel for imputing the F2 generation to sequence level for the subsequent genome-wide association analyses.
| Cross/Generation | F0 | F1 | F2 |
|---|---|---|---|
| Piétrain x (Large White x Landrace)/Large White | 13 | 55 | 1750 |
| Meishan x Piétrain | 8 | 19 | 304 |
| Wild Boar x Piétrain | 6 | 17 | 291 |
| Wild Boar x Meishan | 1 | 0 | 312 |
Four founders are shared among crosses.
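As a quick consistency check, the column totals of the table can be compared with the animal numbers given in the genotyping workflow (24 founders, 91 F1, 2,657 F2). The sketch below assumes that each of the four shared founders is listed in exactly two crosses, so that it is counted once too often in the F0 total; the source does not state this explicitly.

```python
# Animal counts per cross, copied from the table above.
crosses = {
    "Piétrain x (Large White x Landrace)/Large White": {"F0": 13, "F1": 55, "F2": 1750},
    "Meishan x Piétrain": {"F0": 8, "F1": 19, "F2": 304},
    "Wild Boar x Piétrain": {"F0": 6, "F1": 17, "F2": 291},
    "Wild Boar x Meishan": {"F0": 1, "F1": 0, "F2": 312},
}

# Assumption: each shared founder appears in exactly two crosses,
# so it is double-counted once in the F0 column total.
SHARED_FOUNDERS = 4


def generation_total(generation: str) -> int:
    """Sum the animals of one generation over all crosses."""
    return sum(counts[generation] for counts in crosses.values())


f0_entries = generation_total("F0")
unique_founders = f0_entries - SHARED_FOUNDERS
print(f0_entries, unique_founders, generation_total("F1"), generation_total("F2"))
# prints: 28 24 91 2657
```

Under that assumption the 28 F0 table entries reduce to the 24 sequenced founders, and the F1 and F2 totals match the 91 and 2,657 animals named in the workflow.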
Figure 1. Genotyping workflow. Twenty-four founder animals were sequenced at high coverage; variants were called with GATK 4.0 and phased with Beagle 5.0. Ninety-one F1 animals were sequenced at low coverage, and variants were called with GATK 3.8 and BCFtools mpileup. The F1 dataset was imputed using Beagle 4.0 and pedigree information, with the phased founders as a reference panel for haplotype structure. The imputed F1 dataset was then merged with the F0 variant call dataset and phased with Beagle 5.0. Finally, the 2,657 chip-genotyped F2 individuals were imputed to WGS level with Beagle 4.0 and pedigree information, using the merged and phased founder/imputed-F1 dataset as the reference panel.
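The imputation chain in Figure 1 can be summarized as four Beagle runs. The sketch below only assembles command lines and does not execute them. All jar and file names are hypothetical; the `gt=`, `ref=`, `ped=` and `out=` keys follow Beagle's key=value convention (`ped=` is a Beagle 4.0 option). Supplying the low-coverage F1 data through `gt=` is an assumption: genotype likelihoods could instead have been used. Variant calling (GATK, BCFtools) and the F0/F1 merge are not shown.

```python
# Hypothetical jar and VCF names; only the key=value argument style
# (gt=, ref=, ped=, out=) follows Beagle's command-line convention.
BEAGLE4_JAR = "beagle4.jar"  # a Beagle 4.0 release (supports ped=)
BEAGLE5_JAR = "beagle5.jar"  # a Beagle 5.0 release


def beagle_command(jar: str, **params: str) -> list[str]:
    """Build, but do not run, a Beagle command line from key=value parameters."""
    return ["java", "-jar", jar] + [f"{key}={value}" for key, value in params.items()]


# Beagle writes its result to <out>.vcf.gz, which is why later steps
# reference e.g. "f0_phased.vcf.gz".
workflow = [
    # 1. Phase the high-coverage founder (F0) calls.
    beagle_command(BEAGLE5_JAR, gt="f0_calls.vcf.gz", out="f0_phased"),
    # 2. Impute the low-coverage F1 calls, using the pedigree and phased F0 as reference.
    beagle_command(BEAGLE4_JAR, gt="f1_calls.vcf.gz", ref="f0_phased.vcf.gz",
                   ped="pedigree.ped", out="f1_imputed"),
    # 3. Phase the merged F0 + imputed F1 data (the merge itself is not shown).
    beagle_command(BEAGLE5_JAR, gt="f0_f1_merged.vcf.gz", out="f0_f1_phased"),
    # 4. Impute the chip-genotyped F2 animals to sequence level.
    beagle_command(BEAGLE4_JAR, gt="f2_chip.vcf.gz", ref="f0_f1_phased.vcf.gz",
                   ped="pedigree.ped", out="f2_imputed"),
]

for step, command in enumerate(workflow, start=1):
    print(f"step {step}: {' '.join(command)}")
```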
Average distance between variants discovered in the founder population. Twenty-four F0 animals were sequenced at high coverage, and the average distance between variants (SNPs and INDELs) was calculated per chromosome.
| Chromosome | Avg. distance (bp) | SD (bp) |
|---|---|---|
| 1 | 105.78729 | 196.8571 |
| 2 | 84.83889 | 201.4813 |
| 3 | 79.13779 | 183.627 |
| 4 | 80.16339 | 174.2767 |
| 5 | 73.37176 | 177.8913 |
| 6 | 85.34639 | 214.5176 |
| 7 | 79.12655 | 175.2067 |
| 8 | 78.90639 | 164.8448 |
| 9 | 79.61324 | 166.9157 |
| 10 | 56.5826 | 141.8648 |
| 11 | 67.6446 | 139.1412 |
| 12 | 65.41473 | 190.7484 |
| 13 | 102.70483 | 209.4908 |
| 14 | 83.58366 | 158.1296 |
| 15 | 93.02928 | 187.2321 |
| 16 | 71.18928 | 155.2467 |
| 17 | 65.91122 | 163.1281 |
| 18 | 76.95204 | 149.7542 |
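Per-chromosome spacing statistics like those above can be derived from the sorted positions of the called variants. A minimal sketch, assuming distances are measured between adjacent variant start positions and that the SD is the sample standard deviation (the source does not specify either detail); the positions in the example are made up.

```python
import statistics


def variant_spacing(positions: list[int]) -> tuple[float, float]:
    """Mean and sample SD (bp) of distances between adjacent variants on one chromosome."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return statistics.mean(gaps), statistics.stdev(gaps)


# Hypothetical variant positions (bp) on one chromosome.
mean_bp, sd_bp = variant_spacing([300, 100, 260, 150])
print(f"mean {mean_bp:.2f} bp, SD {sd_bp:.2f} bp")
```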
Identification of local imputation inaccuracies. Chip data from each of the 24 founders were imputed using the remaining 23 founder animals as the reference panel. The coefficient of determination (R²) was calculated for each variant and then averaged per chromosome for SSC1, SSC2, SSC4, SSC7, SSC17, and SSC18.
| Chromosome | Average R2 | SD |
|---|---|---|
| 1 | 0.28 | 0.32 |
| 2 | 0.22 | 0.29 |
| 4 | 0.25 | 0.30 |
| 7 | 0.25 | 0.31 |
| 17 | 0.18 | 0.25 |
| 18 | 0.29 | 0.32 |
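One plausible reading of the per-variant R² above is the squared Pearson correlation between the sequence-derived genotype (coded 0/1/2) and the imputed dosage across the left-out founders, averaged over the variants of a chromosome. The sketch below implements that reading; the genotype coding and the exclusion of monomorphic variants, for which R² is undefined, are assumptions.

```python
import math


def squared_correlation(true_dosage: list[float], imputed_dosage: list[float]) -> float:
    """Squared Pearson correlation between true and imputed genotypes for one variant."""
    n = len(true_dosage)
    mean_t = sum(true_dosage) / n
    mean_i = sum(imputed_dosage) / n
    sxy = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_dosage, imputed_dosage))
    sxx = sum((t - mean_t) ** 2 for t in true_dosage)
    syy = sum((i - mean_i) ** 2 for i in imputed_dosage)
    if sxx == 0 or syy == 0:
        return math.nan  # monomorphic in truth or imputation: R2 undefined
    return sxy * sxy / (sxx * syy)


def chromosome_average(r2_values: list[float]) -> float:
    """Average R2 over the variants of one chromosome, skipping undefined values."""
    valid = [r for r in r2_values if not math.isnan(r)]
    return sum(valid) / len(valid) if valid else math.nan


# Hypothetical genotypes of five left-out animals at one variant.
print(squared_correlation([0, 1, 2, 2, 0], [0.1, 0.9, 1.7, 2.0, 0.3]))
```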
Figure 2. Manhattan plots of the −log10 p-values for the association of variants with the traits (A) average daily gain (ADG), (B) back fat thickness (BFT), (C) meat to fat ratio (MFR), and (D) carcass length (CRCL). Variants with p > 0.001 were omitted from the plots.
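The plotting filter described in the caption can be sketched as a small transformation from association results to plot coordinates. In this minimal sketch the input layout (chromosome, position, p-value) is hypothetical, and each chromosome's length is approximated by its largest observed position rather than the assembly length.

```python
import math


def manhattan_points(results, max_p=0.001):
    """Turn (chromosome, position_bp, p_value) tuples into Manhattan-plot coordinates.

    Variants with p > max_p are dropped, as in the plots. Chromosomes are laid end to
    end; each chromosome's length is approximated by its largest position in the input
    (an assumption, real chromosome lengths could be used instead).
    """
    chrom_end = {}
    for chrom, pos, _ in results:
        chrom_end[chrom] = max(chrom_end.get(chrom, 0), pos)
    offset, running = {}, 0
    for chrom in sorted(chrom_end):
        offset[chrom] = running
        running += chrom_end[chrom]
    return [
        (offset[chrom] + pos, -math.log10(p))
        for chrom, pos, p in results
        if p <= max_p
    ]


# Hypothetical results: the second variant is dropped because p > 0.001.
print(manhattan_points([(1, 100, 1e-8), (1, 200, 0.5), (2, 50, 1e-4)]))
```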
Top associated genes for average daily gain (ADG), back fat thickness (BFT), meat to fat ratio (MFR), and carcass length (CRCL) identified in the GWAS. Genes containing or near the top five variants in each cluster are listed with chromosome (SSC) and cluster numbers.
| Trait | SSC | Cluster no./SSC | Genes |
|---|---|---|---|
| ADG | 2 | 1 | |
| | 4 | 1 | |
| | 7 | 2 | |
| BFT | 1 | 1 | |
| | 2 | 7 | |
| | 4 | 1 | |
| | 7 | 24 | |
| MFR | 1 | 3 | |
| | 2 | 8 | |
| | 4 | 3 | |
| | 5 | 1 | |
| | 7 | 1 | |
| | 18 | 6 | |
| CRCL | 1 | 1 | |
| | 7 | 52 | |
| | 17 | 8 | |
Figure 3. Cluster overlap on (A) SSC2, (B) SSC4, and (C) SSC7 for all traits (average daily gain (ADG), red; back fat thickness (BFT), green; meat to fat ratio (MFR), purple; carcass length (CRCL), blue). Cluster heights correspond to the −log10 p-value of the top variant within each cluster.
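Whether clusters of two traits overlap, and how high each is drawn, can be expressed with simple interval logic. A minimal sketch, assuming each cluster is represented by the positions of its first and last variant and the p-value of its top variant; how clusters were actually delimited is not described here, and the example clusters are hypothetical.

```python
import math
from itertools import combinations
from typing import NamedTuple


class Cluster(NamedTuple):
    trait: str
    start: int    # bp position of the first variant in the cluster
    end: int      # bp position of the last variant in the cluster
    top_p: float  # smallest p-value in the cluster

    @property
    def height(self) -> float:
        """Plotted height: -log10 p-value of the cluster's top variant."""
        return -math.log10(self.top_p)


def overlapping_pairs(clusters: list[Cluster]) -> list[tuple[Cluster, Cluster]]:
    """Pairs of clusters from different traits whose bp intervals intersect."""
    return [
        (a, b)
        for a, b in combinations(clusters, 2)
        if a.trait != b.trait and a.start <= b.end and b.start <= a.end
    ]


# Hypothetical clusters on one chromosome: only ADG and BFT overlap.
clusters = [
    Cluster("ADG", 1_000, 5_000, 1e-6),
    Cluster("BFT", 4_000, 9_000, 1e-10),
    Cluster("CRCL", 20_000, 25_000, 1e-7),
]
for a, b in overlapping_pairs(clusters):
    print(a.trait, b.trait, round(a.height, 1), round(b.height, 1))
```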
Figure 4. Concordance and discordance of variants between the traits average daily gain (ADG), back fat thickness (BFT), meat to fat ratio (MFR), and carcass length (CRCL). The Venn diagram contains the statistically significant variants. Intersections give the number of variants shared between traits; numbers outside the intersections are variants found exclusively for a single trait.
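The region counts of such a Venn diagram follow from assigning each significant variant to the exact combination of traits for which it is significant. A minimal sketch with hypothetical variant IDs:

```python
def venn_regions(significant: dict[str, set[str]]) -> dict[frozenset[str], int]:
    """Count variants in each Venn region, i.e. per exact combination of traits."""
    regions: dict[frozenset[str], int] = {}
    for variant in set().union(*significant.values()):
        traits = frozenset(t for t, ids in significant.items() if variant in ids)
        regions[traits] = regions.get(traits, 0) + 1
    return regions


# Hypothetical significant-variant sets per trait.
example = {
    "ADG": {"v1", "v2"},
    "BFT": {"v2", "v3"},
    "MFR": {"v3"},
    "CRCL": {"v4"},
}
for traits, count in venn_regions(example).items():
    print(sorted(traits), count)
```

Each variant is counted in exactly one region, so the region counts sum to the number of distinct significant variants.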
Results of variant effect prediction for the production traits average daily gain (ADG), back fat thickness (BFT), meat to fat ratio (MFR), and carcass length (CRCL). Variants that remained significant after Bonferroni correction were analyzed.
| Predicted effect | ADG | ADG % | BFT | BFT % | MFR | MFR % | CRCL | CRCL % |
|---|---|---|---|---|---|---|---|---|
| Missense variant | 2 | 0.1580 | 962* | 0.6523* | 58 | 0.1893 | 787* | 0.4750* |
| Frameshift variant | 0 | 0 | 0 | 0 | 0* | 0* | 6* | 0.0036* |
| Start lost | 0 | 0 | 1 | 0.0007 | 0 | 0 | 0 | 0 |
| Stop gained | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.0006 |
| Inframe deletion | 0 | 0 | 1* | 0.0007* | 0 | 0 | 2 | 0.0012 |
| Intron variant | 936 | 73.9336 | 116815 | 79.2090 | 21556* | 70.3525* | 131590 | 79.4275 |
| 5 prime UTR variant | 0 | 0 | 229* | 0.1553* | 89 | 0.2905 | 277* | 0.1672* |
| 3 prime UTR variant | 8 | 0.6319 | 1160* | 0.7866* | 1242 | 4.0535 | 1543* | 0.9314* |
| Upstream gene variant | 50 | 3.9494 | 5680* | 3.8514* | 2195* | 7.1638* | 5300* | 3.1991* |
| Downstream gene variant | 44* | 3.4755* | 5791* | 3.9267* | 3442 | 11.2337 | 6893* | 4.1606* |
| Frameshift variant, splice region variant | 0 | 0 | 2 | 0.0014 | 0 | 0 | 0 | 0 |
| Missense variant, splice region variant | 0 | 0 | 41* | 0.0278* | 0 | 0 | 75* | 0.0453* |
| Splice region variant, non coding transcript exon variant | 0 | 0 | 2 | 0.0014 | 3* | 0.0098* | 5 | 0.0030 |
| Splice region variant, 3 prime UTR variant | 0 | 0 | 3* | 0.0020* | 3* | 0.0098* | 0 | 0 |
| Splice region variant, intron variant, non coding transcript variant | 0 | 0 | 2* | 0.0014* | 4 | 0.0131 | 20* | 0.0121* |
| Splice region variant, intron variant | 0 | 0 | 426* | 0.2889* | 41* | 0.1338* | 489* | 0.2952* |
| Splice region variant, synonymous variant | 0 | 0 | 21 | 0.0142 | 22* | 0.0718* | 28* | 0.0169* |
| Splice donor variant | 0 | 0 | 36 | 0.0244* | 1* | 0.0033 | 37 | 0.0223 |
| Intergenic variant | 109 | 8.6098 | 3318 | 2.2498 | 644 | 2.1018 | 9909 | 5.9811 |
| Synonymous variant | 0 | 0 | 2837 | 1.9237 | 214* | 0.6984* | 2751 | 1.6605 |
| Intron variant, non coding transcript variant | 117* | 9.2417* | 9636 | 6.5339 | 1060* | 3.4595* | 5759* | 3.4761* |
| Non coding transcript exon variant | 0 | 0 | 514* | 0.3485* | 66 | 0.2154 | 200* | 0.1207* |
| Start lost, start retained variant, 5 prime UTR variant | 0 | 0 | 0 | 0 | 0 | 0 | 1* | 0.0006* |
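The percentage columns express each consequence count as a share of all counts for that trait. For ADG, the non-zero counts sum to 1,266 and reproduce the listed percentages. The sketch below shows this, together with the usual Bonferroni per-test threshold (α divided by the number of tests). The number of tests used in the study is not given here, so the one million in the example is hypothetical.

```python
def consequence_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Express each consequence count as a percentage of the trait's total."""
    total = sum(counts.values())
    return {consequence: 100 * n / total for consequence, n in counts.items()}


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Non-zero ADG counts copied from the table above.
adg_counts = {
    "Missense variant": 2,
    "Intron variant": 936,
    "3 prime UTR variant": 8,
    "Upstream gene variant": 50,
    "Downstream gene variant": 44,
    "Intergenic variant": 109,
    "Intron variant, non coding transcript variant": 117,
}

for consequence, pct in consequence_percentages(adg_counts).items():
    print(f"{consequence}: {pct:.4f}%")

# Hypothetical number of tests, for illustration only.
print(bonferroni_threshold(0.05, 1_000_000))
```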
Statistically significant high-impact variants discovered in the genome-wide association studies for the production traits average daily gain (ADG), back fat thickness (BFT), meat to fat ratio (MFR), and carcass length (CRCL).
| Trait | High impact consequence | Variant | Position (bp) | Gene | Gene name |
|---|---|---|---|---|---|
| BFT | Start lost | SSC7:rs319855624 | 32544657 | | chromosome 7 C6orf89 homolog |
| | Frameshift variant, splice region variant | SSC7:._504514 | 32606375 | | peptidase inhibitor 16 |
| | | SSC7:._504513 | 32606373 | | peptidase inhibitor 16 |
| | Splice donor variant | SSC7:rs80834233 | 29157904 | | dystonin |
| | | SSC7:rs327743463 | 28571665 | | DNA primase subunit 2 |
| MFR | Splice donor variant | SSC2:rs1110687780 | 11630410 | | transcobalamin 1 |
| CRCL | Start lost, start retained variant, 5 prime UTR variant | SSC7:rs793752812 | 23958518 | | neuraminidase 1 |
| | Stop gained | SSC7:rs334442580 | 87783592 | | |
| | Frameshift variant | SSC7:._1165873 | 97574140 | | ATP binding cassette subfamily D member 4 |
| | | SSC7:rs693811701 | 48561663 | | aurora kinase A-like |
| | | SSC7:._1068730 | 87783712 | | |
| | | SSC7:._1068731 | 87783718 | | |
| | Splice donor variant | SSC7:rs80834233 | 29157904 | | dystonin |
| | | SSC7:rs327743463 | 28571665 | | DNA primase subunit 2 |
| | | SSC7:rs331245426 | 80150975 | | lysophosphatidylcholine acyltransferase 4 |
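A variant counts as high impact when at least one of its predicted consequence terms belongs to the HIGH impact class of Ensembl VEP. A minimal sketch of that check, assuming table labels map to Sequence Ontology terms by lower-casing and replacing spaces with underscores; the term set lists the long-standing HIGH terms, and more recent VEP releases may add further ones.

```python
# Sequence Ontology terms that Ensembl VEP classifies as HIGH impact
# (recent VEP releases may list additional HIGH terms).
HIGH_IMPACT_TERMS = {
    "transcript_ablation",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
    "transcript_amplification",
}


def to_so_term(label: str) -> str:
    """Turn a table label such as 'Splice donor variant' into 'splice_donor_variant'."""
    return label.strip().lower().replace(" ", "_")


def is_high_impact(consequence: str) -> bool:
    """True if any comma-separated consequence term is a HIGH-impact term."""
    return any(to_so_term(term) in HIGH_IMPACT_TERMS for term in consequence.split(","))


for label in ("Frameshift variant, splice region variant", "Missense variant"):
    print(label, "->", is_high_impact(label))
```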
Most significant Gene Ontology (GO) terms from DAVID for the top associated genes identified in the genome-wide association studies for the traits back fat thickness (BFT), meat to fat ratio (MFR), and carcass length (CRCL).
| Trait | Category | Term | Genes |
|---|---|---|---|
| BFT | MF | GO:0005509 calcium ion binding | |
| MFR | BP | GO:0007186 G-protein coupled receptor signaling pathway | |
| | CC | GO:0016021 integral component of membrane | |
| | CC | GO:0005886 plasma membrane | |
| | MF | GO:0004930 G-protein coupled receptor activity | |
| | MF | GO:0004984 olfactory receptor activity | |
| | MF | GO:0005549 odorant binding | |
| CRCL | BP | GO:0001666 response to hypoxia | |
| | BP | GO:0008283 cell proliferation | |
| | CC | GO:0045177 apical part of cell | |
| | CC | GO:0031410 cytoplasmic vesicle | |
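DAVID ranks terms by the EASE score, which its documentation describes as a modified Fisher exact p-value in which one gene is removed from the list hits. The sketch below computes a one-sided hypergeometric (Fisher) tail and a conservative variant that counts one fewer list hit. This is an approximation in the spirit of EASE, not a reproduction of DAVID's exact contingency-table layout. All input numbers are hypothetical.

```python
from math import comb


def hypergeometric_upper_tail(k: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """P(X >= k) when list_total genes are drawn from pop_total genes, pop_hits of which carry the term."""
    denominator = comb(pop_total, list_total)
    numerator = sum(
        comb(pop_hits, i) * comb(pop_total - pop_hits, list_total - i)
        for i in range(max(k, 0), min(list_total, pop_hits) + 1)
    )
    return numerator / denominator


def ease_like_score(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """Conservative tail probability in the spirit of DAVID's EASE score: one fewer list hit."""
    return hypergeometric_upper_tail(list_hits - 1, list_total, pop_hits, pop_total)


# Hypothetical term: 3 of 50 list genes vs. 20 of 2,000 background genes.
print(hypergeometric_upper_tail(3, 50, 20, 2000), ease_like_score(3, 50, 20, 2000))
```

Removing one hit always makes the score larger (more conservative), which penalizes terms supported by very few genes.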
Figure 5. Imputation accuracy on SSC2 between positions 1,250,000 and 2,000,000 bp. IGF2 is located between 1,469,183 and 1,496,417 bp.