Jackson Peter, Anne Friedrich, Gianni Liti, Joseph Schacherer.
Abstract
With the advent of high-throughput sequencing technologies, genome-wide association studies (GWAS) have become a powerful paradigm for dissecting the genetic origins of observed phenotypic variation. We recently completely sequenced the genomes of 1011 Saccharomyces cerevisiae isolates, laying a strong foundation for GWAS. To assess the feasibility and the limits of this approach, we performed extensive simulations using five selected subpopulations as well as the total set of 1011 genomes. We measured the ability to detect the causal genetic variants involved in Mendelian and more complex traits using a linear mixed model approach. The results showed that population structure is well accounted for and is not the main problem when the sample size is high enough. While the genetic determinant of a Mendelian trait is easily mapped in all studied subpopulations, discrepancies are seen between datasets when performing GWAS on a complex trait, in terms of detection, false positive rate and false negative rate. Finally, we performed GWAS on the different defined subpopulations using a real quantitative trait (resistance to copper sulfate) and showed the feasibility of this approach. The performance of each dataset depends simultaneously on several factors such as sample size, relatedness and population evolutionary history. This article is part of the theme issue 'Genetic basis of adaptation and speciation: from loci to causative mutations'.
Keywords: complex trait; genome-wide association study; population genomics; variant mapping; yeast
Year: 2022 PMID: 35634920 PMCID: PMC9149792 DOI: 10.1098/rstb.2020.0514
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.671
Datasets description.
| datasets | individuals | π | polymorphic positions | singletons | biallelic polymorphic positions, no missing | biallelic polymorphic positions, no missing, MAF > 5% |
|---|---|---|---|---|---|---|
| 1011 strains | 1011 | 0.0044 | 1 625 809 | 509 011 | 1 346 007 | 82 869 |
| mixed origins | 71 | 0.0032 | 142 093 | 3959 | 97 690 | 81 030 |
| mosaic region 3 | 113 | 0.0042 | 496 841 | 174 079 | 365 433 | 72 807 |
| sake | 47 | 0.0008 | 100 257 | 14 548 | 84 197 | 21 489 |
| sampled diversity | 133 | 0.0049 | 935 060 | 506 761 | 720 709 | 66 299 |
| European wine | 323 | 0.001 | 284 342 | 105 123 | 218 789 | 14 164 |
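The last two columns of the table count sites that are biallelic, have no missing calls, and (for the final column) have a minor allele frequency (MAF) above 5%. A minimal sketch of such a per-site filter is given below. It assumes a simplified coding that is not taken from the source: one allele per isolate, 0 for the reference allele, 1 for the alternative allele, and `None` for a missing call. The function names are hypothetical, and this is not the authors' pipeline.

```python
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Return the MAF of one biallelic site, or None if the site is unusable.

    Assumed coding (hypothetical): 0 = reference allele, 1 = alternative
    allele, None = missing call, with one allele per isolate.
    """
    if not genotypes or any(g is None for g in genotypes):
        return None  # empty site, or it fails the "no missing" criterion
    alt_freq = sum(genotypes) / len(genotypes)
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(genotypes: Sequence[Optional[int]],
                   maf_threshold: float = 0.05) -> bool:
    """Keep a site only if it has no missing calls and MAF > threshold."""
    maf = minor_allele_frequency(genotypes)
    return maf is not None and maf > maf_threshold
```

Under this coding, a singleton (one isolate carrying the alternative allele) in a dataset of 100 isolates has a MAF of 0.01 and is dropped by the 5% filter, which is one reason the final column can be much smaller than the singleton-rich columns before it.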
Figure 1. Overview of the six datasets used in this study. (a) Phylogenetic relationships between the 1011 S. cerevisiae isolates, illustrated by a neighbour-joining tree constructed with all biallelic SNPs in the population [4]. Branches of the four phylogenetic clusters are highlighted with different colours, while the isolates from the sampled diversity dataset are marked with a blue circle. (b) Distribution of the minor allele frequency of the polymorphic positions within the six considered datasets.
Figure 2. Mapping of Mendelian traits. (a) Distribution of the false positive rate observed across the subpopulations for the GWAS performed on 1000 simulated Mendelian traits. A Kruskal-Wallis test indicates that the samples do not share the same distribution. We also tested whether the sake distribution was significantly different compared to all the other datasets using a Mann-Whitney-Wilcoxon test (**p-value < 2 × 10⁻⁸; ***p-value < 2 × 10⁻¹⁶). (b) Minor allele frequency distribution of the false negative, false positive and true positive variants detected by GWAS with the sake subpopulation, for which a bias toward variants with MAF = 0.49 is observed.
Figure 3. Mapping of complex traits. (a) Distribution of the minor allele frequency (MAF) of the false negative (FN) and true positive (TP) variants detected by GWAS among the 1000 simulated complex traits across the subpopulations. (b) Distribution of the absolute effect size of the FN and TP variants detected by GWAS among the 1000 simulated complex traits across the subpopulations.
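The figure captions sort variants into true positives (TP), false positives (FP) and false negatives (FN). In a simulation the causal variants are known, so the sorting can be sketched as simple set operations. The sketch below assumes exact matching: a significant variant counts as a TP only if it is itself a simulated causal variant. That matching rule, the function name and the variant identifiers are illustrative assumptions; the source does not state how the authors matched detected variants to causal ones.

```python
from typing import Dict, Set


def classify_hits(causal: Set[str], significant: Set[str]) -> Dict[str, Set[str]]:
    """Split variants of one simulated trait into TP, FP and FN sets.

    causal:      variant IDs given a non-zero effect in the simulation
    significant: variant IDs reported as associated by the GWAS
    Assumption: exact ID matching, with no window around causal variants.
    """
    return {
        "TP": causal & significant,   # causal and detected
        "FP": significant - causal,   # detected but not causal
        "FN": causal - significant,   # causal but missed
    }


# Hypothetical variant IDs, for illustration only.
result = classify_hits(
    causal={"chrI:100", "chrII:250", "chrIV:75"},
    significant={"chrI:100", "chrVII:900"},
)
counts = {label: len(variants) for label, variants in result.items()}
```

Here `counts` is `{"TP": 1, "FP": 1, "FN": 2}`. Repeating this over many simulated traits gives per-dataset distributions of the kind the captions describe.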