Bart J G Broeckx1, Thomas Derrien2, Stéphanie Mottier2, Valentin Wucher2, Edouard Cadieu2, Benoît Hédan2, Céline Le Béguec2, Nadine Botherel2, Kerstin Lindblad-Toh3,4, Jimmy H Saunders5, Dieter Deforce6, Catherine André2, Luc Peelman7, Christophe Hitte8.
Abstract
Genome-wide association studies (GWAS) are widely used to identify loci associated with phenotypic traits in the domestic dog, which has emerged as a model for Mendelian and complex traits. However, a disadvantage of GWAS is that it always requires subsequent fine-mapping or sequencing to pinpoint causal mutations. Here, we performed whole exome sequencing (WES) and canine high-density (cHD) SNP genotyping of 28 dogs from 3 breeds to compare the SNP and linkage disequilibrium characteristics, together with the power and mapping precision, of exome-guided GWAS (EG-GWAS) versus cHD-based GWAS. Using simulated phenotypes, we showed that EG-GWAS has higher power than cHD to detect associations within target regions and less power outside target regions, with power being further influenced by sample size and SNP density. We analyzed two real phenotypes (hair length and furnishing) that are fixed in certain breeds to characterize the mapping precision for the known causal mutations. EG-GWAS identified the associated exonic and 3'UTR variants within the FGF5 and RSPO2 genes, respectively, with only a few samples per breed. In conclusion, we demonstrated that EG-GWAS can identify loci associated with Mendelian phenotypes both within and across breeds.
Year: 2017 PMID: 29142306 PMCID: PMC5688105 DOI: 10.1038/s41598-017-15947-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Distribution of distance between subsequent SNPs for the exome-1.0 and canine high-density array (chromosome 1). Only those SNPs that passed the filters for linkage disequilibrium calculations were used (sufficiently polymorphic, sufficient call rate; see methods section). Distances are expressed in bp.
Figure 2. Relation between linkage disequilibrium, gene annotation, SNP density and distance between tagSNPs on chromosome 1. (a) Whole exome sequencing (WES)- and canine high-density array (cHD)-specific informative SNP count per bin (bin size: 1 Mb) relative to position. (b) Overview of RefSeq Genes track (blue) and Ensembl Gene Predictions track (brown) density relative to position. (c) WES- and cHD-specific linkage disequilibrium (measured in r²) relative to position. (d) Relation between r² and distance between subsequent SNPs. In each graph, lines are obtained with LOWESS (locally weighted scatterplot smoothing).
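As an illustration of the r² measure plotted in panels (c) and (d), the sketch below computes r² between two SNPs as the squared Pearson correlation of their genotype allele counts. This is a common approximation for unphased genotype data and is an assumption here; the source does not state which estimator or software was used. The function name `genotype_r2` and the 0/1/2 coding are hypothetical.

```python
import math


def genotype_r2(x, y):
    """Squared Pearson correlation between two SNPs.

    x, y: genotypes coded as allele counts (0, 1, 2); None marks a missing call.
    Assumption: this dosage-based r2 stands in for the LD measure in the figure.
    """
    # Keep only samples that were called at both SNPs.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two samples called at both SNPs")

    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)

    # A monomorphic SNP has zero variance, so r2 is undefined.
    if var_x == 0 or var_y == 0:
        return math.nan
    return (cov * cov) / (var_x * var_y)
```

Monomorphic SNPs return NaN, which mirrors the caption's restriction to sufficiently polymorphic markers.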
Figure 3. The effect of subsampling SNPs on r² relative to position. From the original 9541 SNPs on chromosome 1, random subsampling was performed, reducing the number of SNPs from 9000 to 1500 in 6 steps of 1500. In each step, 10 subsets were randomly sampled (without replacement). The number of SNPs that were polymorphic is depicted in the graph. At 4500 SNPs, WES and cHD had an equal number of informative tagSNPs (WES: 3310 SNPs, cHD: 3365 SNPs). Lines are obtained with LOWESS (locally weighted scatterplot smoothing).
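The subsampling scheme in this caption can be sketched as follows, assuming that "6 steps of 1500" means the subset sizes 9000, 7500, 6000, 4500, 3000 and 1500. The function name `subsample_snps`, the fixed seed and the use of integer indices as SNP identifiers are illustrative assumptions, not details from the source.

```python
import random


def subsample_snps(snp_ids, sizes=(9000, 7500, 6000, 4500, 3000, 1500),
                   replicates=10, seed=0):
    """Draw `replicates` random subsets of each size, without replacement.

    snp_ids: list of SNP identifiers (here, hypothetical integer indices).
    Returns a dict mapping subset size -> list of sorted subsets.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return {
        size: [sorted(rng.sample(snp_ids, size)) for _ in range(replicates)]
        for size in sizes
    }


# 9541 placeholder identifiers, matching the SNP count given for chromosome 1.
subsets = subsample_snps(list(range(9541)))
```

Each subset would then be passed to the r² calculation, with the smoothed curves compared across sizes.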
Figure 4. Power and distance between causal and tagSNPs for the exome-1.0 and canine high-density 170k array. (a) Boxplots showing the power to detect the association when a signal is located inside the target regions and outside the target regions. (b) Boxplots showing distance between the most significant SNP and the causal SNP when the signal is located inside or outside the target regions, respectively. (c,d) Boxplots showing power and distance to detect a non-exonic signal inside WES bins with a high informative SNP density (corresponding to the 85th percentile or higher, threshold: ≥48 SNPs/Mb) and a low informative SNP density (corresponding to at most the 15th percentile, threshold: ≤4 SNPs/Mb). (e) Boxplots showing power to detect a signal in long intergenic non-coding RNAs (lincRNAs). (f) Effect of sample size reduction on power to detect a monogenic recessive trait. Subsampling was performed stepwise, from 14 down to 6 samples and for each step, at least 20% of all possible permutations of samples were performed. The bottom and top of the boxplot represent the first (Q1) and third quartile (Q3), while the horizontal line in the boxplot represents the median. Whiskers represent 1.5 times the interquartile range (Q3-Q1).
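The boxplot elements described at the end of this caption can be computed as in the sketch below. Quartile conventions differ between plotting packages and the source does not say which one was used, so the `inclusive` method and the name `boxplot_summary` are assumptions; the fences are the limits Q1 − 1.5 × IQR and Q3 + 1.5 × IQR that bound the whiskers.

```python
import statistics


def boxplot_summary(values):
    """Return quartiles and 1.5 * IQR whisker limits for a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }


summary = boxplot_summary([1, 2, 3, 4, 5])
```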
Characteristics of the closest significant SNP for each method and each phenotype. For each method, the SNP location, its p-value and rank are provided for the tagSNP that is closest to the causal mutation. The columns labelled “Genotype distribution of cases” and “Genotype distribution of controls” detail the number of times a specific genotype (AA, AB or BB for di-allelic markers with alleles A and B) occurred for cases and controls, respectively. Although the EG-GWAS tagSNP was closer to the causal mutation for both phenotypes, its result was less significant for furnishing because that tagSNP was only called in 8 out of 16 cases. This higher variability in call rate was expected based on earlier reports[5,10] (see methods). Nevertheless, for EG-GWAS, the genotypes of cases and controls are perfectly separated, whereas for cHD, there is an overlap of 1 sample for furnishing and 5 samples for hair length. Manhattan plots are presented in Suppl. Fig. S4.
| Phenotype | Technique | SNP | P-value | Genotype distribution of cases (AA/AB/BB) | Genotype distribution of controls (AA/AB/BB) | Distance (kb) | Rank |
|---|---|---|---|---|---|---|---|
| Hair length | EG-GWAS | chr32:4509367 | 1.87e-06 | 6/0/0 | 0/1/17 | 0 | 1 |
| Hair length | cHD | chr32:4299533 | 0.002 | 4/2/0 | 0/3/19 | 200 | 32 |
| Furnishing | EG-GWAS | chr13:8611728 | 1.858e-05 | 8/0/0 | 0/0/12 | 1 | 3 |
| Furnishing | cHD | chr13:8635445 | 4.601e-07 | 1/0/15 | 12/0/0 | 25 | 1 |
EG-GWAS = Exome-guided genome-wide association studies.
cHD = canine High-Density SNP array.
P-value = Bonferroni corrected p-value of closest significant SNP.
Distance = Distance between closest significant SNP and causal variant (in kb).
Rank = position of the SNP when all SNPs within a method are ranked from most significant to least significant.
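The P-value and Rank footnotes can be illustrated with a minimal sketch, assuming the standard Bonferroni correction (raw p-value multiplied by the number of tests, capped at 1) and ranking from most to least significant. The function name `bonferroni_and_rank` and the SNP identifiers are hypothetical, and the source does not describe how ties were handled.

```python
def bonferroni_and_rank(raw_p):
    """raw_p: dict mapping SNP identifier -> uncorrected p-value.

    Returns (corrected, ranks): Bonferroni-corrected p-values and
    1-based ranks, where rank 1 is the most significant SNP.
    """
    n_tests = len(raw_p)
    corrected = {snp: min(1.0, p * n_tests) for snp, p in raw_p.items()}
    # Rank on the raw p-values so that SNPs capped at 1.0 keep their order.
    ordered = sorted(raw_p, key=raw_p.get)
    ranks = {snp: position for position, snp in enumerate(ordered, start=1)}
    return corrected, ranks


# Hypothetical SNP identifiers and p-values, for illustration only.
corrected, ranks = bonferroni_and_rank({"snp_a": 0.01, "snp_b": 0.0001, "snp_c": 0.5})
```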