Faisal Ramzan (1,2), Mehmet Gültas (1,3), Hendrik Bertram (1), David Cavero (4), Armin Otto Schmitt (1,3).
Abstract
Genome-wide association studies (GWAS) are a well-established methodology for identifying genomic variants and genes that are responsible for traits of interest in all branches of the life sciences. Despite the long time this methodology has had to mature, the reliable detection of genotype-phenotype associations is still a challenge for many quantitative traits, mainly because of the large number of genomic loci with weak individual effects on the trait under investigation. Thus, it can be hypothesized that many genomic variants that have a small, but real, effect remain unnoticed in many GWAS approaches. Here, we propose a two-step procedure to address this problem. In the first step, cubic splines are fitted to the test statistic values, and genomic regions with spline peaks that are higher than expected by chance are considered quantitative trait loci (QTL). In the second step, the single nucleotide polymorphisms (SNPs) in these QTLs are prioritized with respect to the strength of their association with the phenotype using a Random Forests approach. As a case study, we apply our procedure to real data sets and find trustworthy numbers of genomic variants and genes, some of them novel, that are involved in various egg quality traits.
Keywords: Random Forests; boruta; egg weight; eggshell strength; genome wide association studies; signal detection
Year: 2020 PMID: 32764260 PMCID: PMC7465705 DOI: 10.3390/genes11080892
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
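The second step of the procedure ranks the SNPs inside each QTL with a Random Forests approach, and the keywords name Boruta. The sketch below shows the core Boruta idea in simplified form, assuming scikit-learn's `RandomForestRegressor` as the forest: in every round each SNP column gets a shuffled "shadow" copy, and a SNP scores a hit when its importance beats the best shadow importance. It is not the authors' pipeline; full Boruta also rejects features iteratively and corrects for multiple testing, and the helper names, the fixed number of rounds and the one-sided binomial test are illustrative assumptions.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_like_selection(X, y, n_iter=20, n_estimators=100, alpha=0.05, seed=0):
    """Simplified Boruta-style screen (hypothetical helper, not full Boruta).

    X: (n_animals, n_snps) genotype matrix; y: phenotype vector.
    Returns column indices whose hit count is improbably high if every round
    were a fair coin flip (one-sided binomial test; no multiple-testing
    correction and no iterative feature removal, unlike real Boruta).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_snps = X.shape[1]
    hits = np.zeros(n_snps, dtype=int)
    for it in range(n_iter):
        # Shadow features: each column shuffled independently, which keeps its
        # genotype frequencies but destroys any link to the phenotype.
        shadows = np.apply_along_axis(rng.permutation, 0, X)
        forest = RandomForestRegressor(n_estimators=n_estimators, random_state=seed + it)
        forest.fit(np.hstack([X, shadows]), y)
        importance = forest.feature_importances_
        # A hit means the real SNP beats the best shadow in this round.
        hits += importance[:n_snps] > importance[n_snps:].max()
    return [j for j in range(n_snps) if binom_upper_tail(int(hits[j]), n_iter) < alpha]
```

Shadow features give a data-driven importance baseline, so no fixed importance cut-off has to be chosen by hand.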
Figure 1. Step-by-step representation of the peak detection method. (A) Distribution of the test statistic values along a chromosome segment. (B) The red line indicates the cubic spline fitted to the test statistic values, which are represented by black dots. (C) The same cubic spline curve as in (B) without the points, with a rescaled y-axis. (D) Dashed lines represent the inflection points of the curve. A pair consisting of a left (blue) and a right inflection point constitutes a peak.
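Figure 1 describes the peak detection: a cubic spline is fitted to the test statistics along a chromosome, and each peak is bounded by a left and a right inflection point. A minimal NumPy sketch of that idea follows. It assumes a least-squares cubic regression spline with evenly spaced knots (knot placement and `n_knots` are illustrative assumptions) and replaces the paper's test of whether a peak is higher than expected by chance with a plain `min_height` cut-off; `spline_peaks` is a hypothetical helper name.

```python
import numpy as np


def _basis(x, knots):
    """Truncated power basis of a cubic regression spline."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def _basis_d2(x, knots):
    """Second derivative of every column of _basis."""
    cols = [np.zeros_like(x), np.zeros_like(x), np.full_like(x, 2.0), 6.0 * x]
    cols += [6.0 * np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)


def spline_peaks(pos, stat, n_knots=20, grid_size=2001, min_height=0.0):
    """Fit a cubic spline to (pos, stat) and return peaks as (start, end).

    A peak runs from a left inflection point (f'' turns from + to -) to the
    next right inflection point (f'' turns from - to +). min_height is an
    assumed stand-in for the paper's significance criterion.
    """
    pos = np.asarray(pos, dtype=float)
    stat = np.asarray(stat, dtype=float)
    lo, hi = pos.min(), pos.max()
    x = (pos - lo) / (hi - lo)  # rescale positions to [0, 1] for stability
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]  # interior knots only
    coef, *_ = np.linalg.lstsq(_basis(x, knots), stat, rcond=None)

    grid = np.linspace(0.0, 1.0, grid_size)
    fit = _basis(grid, knots) @ coef
    sign = np.sign(_basis_d2(grid, knots) @ coef)

    peaks, left = [], None
    for i in range(grid_size - 1):
        if sign[i] > 0 and sign[i + 1] < 0:
            left = i + 1  # curve becomes concave down: left inflection point
        elif sign[i] < 0 and sign[i + 1] > 0 and left is not None:
            right = i  # curve becomes concave up again: right inflection point
            if fit[left:right + 1].max() >= min_height:
                # map grid positions back to the original coordinate scale
                peaks.append((lo + grid[left] * (hi - lo), lo + grid[right] * (hi - lo)))
            left = None
    return peaks
```

On real data, `pos` would hold the SNP positions of one chromosome and `stat` the per-SNP test statistics; the SNPs between a peak's boundaries then form the candidate QTL.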
Figure 2. Manhattan and Q-Q plots for eggshell strength at time point 1 (ESS1), eggshell strength at time point 2 (ESS2) and egg weight at 36 weeks of age (EW36). In the Manhattan plots (A–C), the horizontal red and green lines denote the genome-wide significance threshold (p-value = … for ESS1 and ESS2 and … for EW36) and the suggestive significance threshold (p-value = … for ESS1 and ESS2 and … for EW36), respectively. The −log10 of the observed p-value for each single nucleotide polymorphism (SNP) is given on the y-axis, and its position on the chromosome on the x-axis. In the Q-Q plots (D–F), the observed −log10-transformed p-values are plotted against the expected −log10-transformed p-values. GIF stands for genomic inflation factor.
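The genomic inflation factor shown in the Q-Q plots is commonly computed as the median of the observed 1-df chi-square statistics divided by the median of the chi-square distribution with one degree of freedom (about 0.4549). The standard-library sketch below assumes two-sided p-values from 1-df tests and uses `(i + 0.5) / n` plotting positions for the expected p-values, which is one common convention; the function names are hypothetical.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def chi2_1df_from_p(p):
    """1-df chi-square statistic corresponding to a two-sided p-value."""
    z = -_STD_NORMAL.inv_cdf(p / 2.0)  # upper p/2 quantile of N(0, 1)
    return z * z


def genomic_inflation_factor(pvalues):
    """lambda = median observed chi2 / median of chi2(1 df) (about 0.4549)."""
    expected_median = _STD_NORMAL.inv_cdf(0.75) ** 2
    return median(chi2_1df_from_p(p) for p in pvalues) / expected_median


def qq_points(pvalues):
    """(expected, observed) -log10 p-value pairs, largest first, for a Q-Q plot."""
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    # Expected null p-values use the (i + 0.5) / n plotting positions.
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))
```

A value close to 1 indicates little systematic inflation of the test statistics, while values clearly above 1 point to population structure or other confounding.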
Significant peaks as defined in Phase 4 of our analysis framework and corresponding quantitative trait loci (QTLs) for ESS1 and ESS2.
| Chromosome | No. of SNPs | Start Position (bp) | End Position (bp) | No. of Genes | Trait |
|---|---|---|---|---|---|
| 2 | 204 | 147,575,318 | 148,273,465 | 3 | ESS1 |
| 9 | 66 | 21,762,694 | 21,953,310 | 0 | ESS1 |
| 9 | 82 | 21,777,888 | 22,001,729 | 0 | ESS2 |
| 10 | 75 | 6,517,673 | 6,728,897 | 4 | ESS1 |
| 10 | 86 | 9,922,422 | 10,054,824 | 2 | ESS1 |
| 10 | 60 | 10,715,120 | 10,818,097 | 3 | ESS2 |
| 10 | 61 | 11,245,585 | 11,351,799 | 1 | ESS2 |
| 12 | 112 | 10,948,518 | 11,227,521 | 2 | ESS1 |
| 15 | 42 | 4,908,007 | 5,006,688 | 7 | ESS1 |
| 15 | 43 | 6,193,090 | 6,273,778 | 3 | ESS2 |
| 18 | 38 | 1,722,586 | 1,836,741 | 2 | ESS1 |
| 20 | 51 | 7,589,607 | 7,717,177 | 1 | ESS1 |
| 20 | 46 | 7,599,368 | 7,711,505 | 1 | ESS2 |
Figure 3. Genomic region on chromosome 18 associated with eggshell strength at time point 1 (ESS1). (A) Linkage disequilibrium (LD) structure inside and around a significant peak. The dotted red lines mark the boundaries of the peak. Each point represents a single nucleotide polymorphism (SNP), and its color shows the strength of LD between the top SNP inside the peak and that SNP. The diamond-shaped points inside the peak depict the robust SNPs. The x-axis shows the SNP positions on the chromosome, and the y-axis shows the Wald statistic values obtained from the single-SNP genome-wide association study (GWAS) analysis. (B,C) Effects of the different genotypes of the two leading SNPs identified in the combined framework for ESS, and their significance (…).
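Panel (A) of Figure 3 colors each SNP by its LD with the top SNP of the peak. The record does not state which LD measure was used; a frequent approximation when only unphased genotypes are available is the squared Pearson correlation of allele dosages (0/1/2), sketched below with NumPy. Taking the SNP with the largest statistic as the top SNP, and the helper name, are assumptions.

```python
import numpy as np


def ld_r2_with_top_snp(genotypes, stats):
    """Squared dosage correlation between the top SNP and every SNP.

    genotypes: (n_animals, n_snps) array of allele counts (0, 1, 2).
    stats: one association statistic per SNP (e.g. Wald statistics); the SNP
    with the largest value is assumed to be the top SNP.
    Returns (index of the top SNP, r^2 array; NaN for monomorphic SNPs).
    """
    g = np.asarray(genotypes, dtype=float)
    top = int(np.argmax(stats))
    centered = g - g.mean(axis=0)
    sum_sq = (centered**2).sum(axis=0)
    cross = centered[:, top] @ centered  # covariance numerators with the top SNP
    with np.errstate(divide="ignore", invalid="ignore"):
        r2 = cross**2 / (sum_sq[top] * sum_sq)
    return top, r2
```

Haplotype-based r² from phased data can differ from this genotype-based value, so the sketch is only an approximation of what such LD plots usually show.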
Significant peaks as defined in Phase 4 of our analysis framework and corresponding QTLs for EW.
| Chromosome | No. of SNPs | Start Position (bp) | End Position (bp) | No. of Genes |
|---|---|---|---|---|
| 1 | 304 | 167,931,038 | 169,505,140 | 25 |
| 4 | 205 | 17,189,770 | 18,080,445 | 9 |
| 4 | 143 | 21,319,808 | 21,849,558 | 3 |
| 4 | 136 | 77,317,446 | 78,081,369 | 4 |
| 12 | 39 | 2,849,562 | 3,010,032 | 7 |
| 13 | 49 | 8,495,533 | 8,608,578 | 6 |
| 14 | 58 | 7,023,793 | 7,188,250 | 4 |
| 15 | 41 | 11,193,342 | 11,309,808 | 8 |
| 15 | 35 | 11,419,957 | 11,514,516 | 3 |
| 18 | 30 | 1,057,714 | 1,136,220 | 1 |
| 18 | 28 | 1,179,899 | 1,238,583 | 0 |
Figure 4. Three genomic regions on chromosome 4 associated with egg weight (EW). (A) Linkage disequilibrium (LD) structure inside and around the significant peaks. The dotted red lines mark the boundaries of the peaks. Each point represents a single nucleotide polymorphism (SNP), and its color shows the strength of LD between the top SNP inside each peak and that SNP. The diamond-shaped points inside the peaks depict the robust SNPs. The x-axis shows the SNP positions on the chromosome, and the y-axis shows the Wald statistic values obtained from the single-SNP GWAS analysis. (B–D) Effects of the different genotypes of the three leading SNPs identified for EW, and their significance (…).
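Panels (B,C) of Figure 3 and (B–D) of Figure 4 compare the phenotype across the genotype classes of the leading SNPs. As an illustration of how such genotype effects can be summarized, the sketch below computes the mean phenotype per genotype class and the classical additive (a) and dominance (d) effects derived from those means. It is an assumption-based example: the significance test used in the figures is not reproduced, and `genotype_effects` is a hypothetical name.

```python
import numpy as np


def genotype_effects(dosage, phenotype):
    """Mean phenotype per genotype class plus additive and dominance effects.

    dosage: copies (0, 1, 2) of the counted allele per animal.
    a = half the difference between the two homozygote means;
    d = deviation of the heterozygote mean from the homozygote midpoint.
    a and d are None unless all three genotype classes are present.
    """
    g = np.asarray(dosage)
    y = np.asarray(phenotype, dtype=float)
    means = {k: float(y[g == k].mean()) for k in (0, 1, 2) if np.any(g == k)}
    a = d = None
    if len(means) == 3:
        a = (means[2] - means[0]) / 2.0
        d = means[1] - (means[0] + means[2]) / 2.0
    return means, a, d
```

A d close to 0 suggests a mainly additive SNP effect, whereas a large |d| relative to |a| hints at dominance.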