Elizabeth K Engle1, Justin C Fay2. 1. Molecular Genetics and Genomics Program, Washington University, St. Louis, Missouri 63108. 2. Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University, St. Louis, Missouri 63108 jfay@genetics.wustl.edu.
Abstract
Estimates of the fraction of nucleotide substitutions driven by positive selection vary widely across different species. Accounting for different estimates of positive selection has been difficult, in part because selection on polymorphism within a species is known to obscure a signal of positive selection among species. While methods have been developed to control for the confounding effects of negative selection against deleterious polymorphism, the impact of balancing selection on estimates of positive selection has not been assessed. In Saccharomyces cerevisiae, there is no signal of positive selection within protein coding sequences as the ratio of nonsynonymous to synonymous polymorphism is higher than that of divergence. To investigate the impact of balancing selection on estimates of positive selection, we examined five genes with high rates of nonsynonymous polymorphism in S. cerevisiae relative to divergence from S. paradoxus. One of the genes, the high-affinity zinc transporter ZRT1, showed an elevated rate of synonymous polymorphism indicative of balancing selection. The high rate of synonymous polymorphism coincided with nonsynonymous divergence among three haplotype groups, among which we found no detectable differences in ZRT1 function. Our results implicate balancing selection in one of five genes exhibiting a large excess of nonsynonymous polymorphism in yeast. We conclude that balancing selection is a potentially important factor in estimating the frequency of positive selection across the yeast genome.
The frequency of adaptive substitutions driven by positive selection is central to our understanding of molecular evolution and divergence among species. The neutral theory assumes that most substitutions are effectively neutral and generates predictions that can be tested based on patterns of molecular evolution (Fay and Wu 2003). While many individual genes have been found to deviate from neutral patterns of evolution, the overall impact of positive selection across the genome remains a contentious issue (Hahn 2008; Sella ; Nei ; Fay 2011).
Genome-wide comparisons of polymorphism vs. divergence have been the primary means of estimating the frequency of positive selection among species. The McDonald-Kreitman (MK) test (McDonald and Kreitman 1991) has been used to estimate the frequency of positive selection within protein coding sequences based on an elevated ratio of nonsynonymous to synonymous divergence relative to that of polymorphism. However, applications of the MK test to plant, animal, and microbial genomes have revealed substantial differences in estimates of positive selection among species, ranging from zero to over half of all amino acid substitutions (Fay 2011). While the frequency of positive selection may differ due to a species’ effective population size and species-specific selective pressures (Bachtrog 2008; Gossmann ; Siol ; Slotte ; Gossmann ), estimating the frequency of positive selection during divergence among species depends on controlling for the effects of selection on polymorphism within species (Fay and Wu 2001; Bierne and Eyre-Walker 2004; Hughes ).
Estimates of the frequency of positive selection can be influenced by a number of factors that can make it difficult to detect adaptation when it is present. Slightly deleterious polymorphisms segregate at low frequencies due to weak negative selection and can increase the nonsynonymous-to-synonymous polymorphism ratio to a greater extent than that of divergence. As a consequence, deleterious polymorphism can obscure evidence of positive selection (Fay ; Bierne and Eyre-Walker 2004; Charlesworth and Eyre-Walker 2008; Eyre-Walker and Keightley 2009). Methods have been developed to account for the effects of low-frequency deleterious polymorphism, but, even so, there are still some species with little or no evidence of positive selection (Fay 2011).
A number of other factors can influence the detection of positive selection through their effects on slightly deleterious polymorphism. Such factors include mating system as well as population size and structure. For example, a decrease in population size can increase the abundance of slightly deleterious polymorphism within a species and obscure evidence of positive selection among species (Eyre-Walker 2002). In humans, there is little evidence for an excess of nonsynonymous divergence, yet it has been estimated that up to 40% of amino acid substitutions could have been driven by positive selection without being detected (Eyre-Walker and Keightley 2009). Controlling for these additional factors is often difficult as it requires specific knowledge of the species being examined and its population history.
Another factor that has received less attention but can also influence estimates of positive selection is balancing selection (Wright and Andolfatto 2008). The maintenance of multiple nonsynonymous polymorphisms within a species by balancing selection could increase the genome-wide ratio of nonsynonymous to synonymous polymorphism within a species above the ratio of nonsynonymous-to-synonymous divergence among species. Elevated rates of nonsynonymous polymorphism may also occur because of local adaptation (Charlesworth ). If an appreciable number of genes are involved in adaptive divergence between different populations of the same species, the genome-wide frequency of positive selection among species could be substantially underestimated.
The yeast Saccharomyces cerevisiae is one species with little or no evidence of positive selection based on the MK test (Doniger ; Liti ; Elyashiv ). In contrast to other species that lack evidence of positive selection (Foxe ; Gossmann ; Gossmann ), its large effective population size ensures the efficient removal of weakly deleterious mutations and the ability to fix weakly advantageous mutations. However, S. cerevisiae also exhibits strong population structure, potentially facilitated by its low rate of outcrossing, i.e., mating between unrelated parents (Ruderfer ), low rate of migration, or local adaptation to the diverse array of environments from which it has been isolated (Fay and Benavides 2005). Genome-wide patterns of population structure have revealed a number of genetically differentiated groups, including strains originating from sake in Japan, vineyards in Europe, and oak trees in North America (Liti ; Schacherer ). While these groups may have arisen as a result of geographic barriers, they might also have arisen as a consequence of domestication or adaptation to human-modified environments (Fay and Benavides 2005). However, even when these groups are taken into consideration and examined separately, the ratios of nonsynonymous to synonymous polymorphism within or between groups are higher than the ratio of nonsynonymous to synonymous divergence among species (Elyashiv ).
In this study, we tested the hypothesis that genes with a large excess of nonsynonymous polymorphism are under balancing selection. We reasoned that such genes have a disproportionate effect on estimates of positive selection and should be considered separately if under balancing selection. We examined five genes that were previously shown to contain a large excess of nonsynonymous to synonymous polymorphism (Doniger ; Liti ). To distinguish between purifying selection and balancing selection on nonsynonymous polymorphism, we examined rates of synonymous polymorphism, as negative selection is expected to decrease linked neutral variation, whereas balancing selection is expected to increase linked neutral variation (Charlesworth ). We found that one of the genes, ZRT1, showed a significantly elevated rate of synonymous polymorphism based on the Hudson-Kreitman-Aguade (HKA) test (Hudson ), consistent with balancing selection. Our results show that a large number of amino acid polymorphisms can occur at certain loci under balancing selection.
Materials and Methods
Polymorphism and divergence data
Data were collected for five genes that were previously found to exhibit an excess of nonsynonymous polymorphism in two studies (Doniger ; Liti ) and 30 randomly selected control genes, using 36 S. cerevisiae strains with genome sequence data (Supporting Information, Table S1). Twenty-seven of the genome sequences were accessed through a BLAST server (www.moseslab.csb.utoronto.ca/sgrp/), and the other 9 were accessed through the Saccharomyces Genome Database (www.yeastgenome.org). For each gene, sequences homologous to the coding region of the reference genome (S288C) were aligned using Clustal X version 2.0 software (Larkin ). Strains with sequences that were <99% and <90% of the S288C sequence length for the neutral and selected genes, respectively, were removed. The cutoff of <90% was used to accommodate IRA2, for which few strains had BLAST hits covering the entire 9.2 kb of coding sequence present in the reference genome. Strains were removed from a gene analysis if a polymorphism led to an internal stop codon (4 cases), while unique single-base insertions were considered sequencing errors and the base was removed from the sequence (10 cases). A small number of heterozygous sites were present within the strains Vin13, VL3, and LalvinAQ23. At these sites, we randomly selected one of the two observed nucleotides to represent the position. Divergence was measured by comparison to the CBS432 strain of Saccharomyces paradoxus.
The final dataset included an average of 29.9 strain alleles per gene, ranging from 23 to 36, and the five-gene set included an average of 21.4 strain alleles per gene, ranging from 18 to 24. Eight of the control genes were removed from analysis because they had few strains with sufficient sequence coverage or multiple strains with frameshifts. One gene, RPS28B, was removed because it showed evidence of introgression between species (Doniger ). The final dataset is available in File S1.
Population genetic analysis
MK tests were conducted using the number of nonsynonymous and synonymous polymorphic sites and fixed differences calculated using DnaSP version 5.10.01 software (Librado and Rozas 2009). The weighted neutrality index (Stoletzki and Eyre-Walker 2011) was estimated by the equation:
$$NI_{TG} = \frac{\sum_i D_{si} P_{ni} / (P_{si} + D_{si})}{\sum_i P_{si} D_{ni} / (P_{si} + D_{si})}$$
where P and D are the number of polymorphic sites and fixed differences, respectively; subscripts s and n indicate synonymous and nonsynonymous changes, respectively; and i indicates the ith gene.
HKA tests were conducted using maximum likelihood HKA test (MLHKA version 2.0) software (Wright and Charlesworth 2004) with rates of synonymous polymorphism and divergence obtained from DnaSP (Table S2). Each of the five genes found to be significant by the MK test was compared to 21 control genes, covering 22,974 sites, using the MLHKA test. The program was run with a chain length of 100,000 iterations for all analyses.
For analysis of the region surrounding ZRT1, from  through  (∼11 kb), we downloaded S. cerevisiae and S. paradoxus strain sequences from the Saccharomyces Genome Resequencing Project (Liti ). A sliding window analysis of polymorphism and divergence was calculated with DnaSP using 36 S. cerevisiae strains and 1 S. paradoxus strain, CBS432, with gaps in the alignment excluded. Because of difficulty in aligning the 4.67-kb noncoding region between ADH4 and ZRT1, we used only ∼200 bases downstream of  and 800 bases upstream of , where we were confident of alignment.
Bootstrapped neighbor-joining trees for ZRT1 and the concatenated control gene set were constructed using MEGA5 and pairwise gap removal (Tamura ).
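The per-gene and weighted neutrality indexes can be illustrated with a short calculation. The sketch below assumes the weighted form attributed to Stoletzki and Eyre-Walker (2011), NI_TG = Σ[D_si P_ni / (P_si + D_si)] / Σ[P_si D_ni / (P_si + D_si)], and uses the counts reported in Table 1 for the five selected genes. The function and variable names are illustrative and are not taken from DnaSP or any other software used in the study.

```python
from typing import Dict, Tuple

# (Pn, Ps, Dn, Ds) for the five selected genes, copied from Table 1.
COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "IRA2": (69, 108, 131, 735),
    "OPT2": (35, 20, 18, 187),
    "PEP1": (48, 45, 168, 320),
    "SAS10": (12, 7, 41, 120),
    "ZRT1": (55, 45, 12, 62),
}


def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> float:
    """Per-gene neutrality index: (Pn/Ps) / (Dn/Ds)."""
    return (pn / ps) / (dn / ds)


def weighted_neutrality_index(counts: Dict[str, Tuple[int, int, int, int]]) -> float:
    """Weighted index across genes; each gene is weighted by 1 / (Ps + Ds)."""
    numerator = sum(ds * pn / (ps + ds) for pn, ps, dn, ds in counts.values())
    denominator = sum(ps * dn / (ps + ds) for pn, ps, dn, ds in counts.values())
    return numerator / denominator


if __name__ == "__main__":
    for gene, c in COUNTS.items():
        print(f"{gene}: NI = {neutrality_index(*c):.1f}")
    print(f"Weighted NI (5 genes) = {weighted_neutrality_index(COUNTS):.2f}")
```

Values above 1 indicate an excess of nonsynonymous polymorphism relative to divergence. Weighting by the total number of synonymous changes keeps genes with few synonymous sites from dominating the combined estimate.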
Strain construction and phenotype analysis
ZRT1 was deleted in YJF186 (YPS163 background, Mat a, HO::dsdAMX4, ) by using the kanMX deletion cassette (Wach ). Three ZRT1 alleles were integrated into this strain at the URA3 locus by amplifying the entire gene region, including 878 bases of the 5′ noncoding region and the entire 195 bases of the 3′ noncoding region, as well as 186 bases of the 3′ gene , using primers with homology to pRS306, and transforming the product along with the yeast integrative plasmid pRS306 (Sikorski and Hieter 1989). Integration of these constructs at the URA3 locus was achieved by selection on plates lacking uracil, and each transformant was confirmed by PCR. The alleles were found to have between 1 and 3 mutations. However, most alleles had only single synonymous changes or changes within the 5′ or 3′ regions, and no mutations were shared among the alleles, including the replicated transformants. These mutations were considered not functional because of the lack of any phenotypic effects. Wild-type (YJF186) and ZRT1 deletion strains were integrated with the empty plasmid pRS306 as a control.
Experiments comparing growth under low zinc conditions were conducted using low zinc media (LZM) composed of 0.17% yeast nitrogen base without amino acids, (NH4)2SO4, or zinc (MP Biomedicals); 0.5% (NH4)2SO4; 20 mM trisodium citrate, pH 4.2; 2% glucose; 1 mM Na2EDTA; 25 µM MnCl2; and 10 µM FeCl3, as previously described (Gitan ; Gitan ). Strains were grown overnight in LZM, washed, diluted to a starting optical density (OD) of 0.05 (absorbance at 600 nm) in fresh LZM with 0.1 mM of ZnCl2, and then grown for 20 hr in a plate reader at 30°C with shaking at 1200 rpm (iEMS model 1400; Thermo Lab Systems, Helsinki, Finland). For each strain, the maximum OD was determined after normalization to the initial cell concentration. For each construct and controls, 3–9 independent transformants were phenotyped.
Rates of fermentation were measured using grape juice. Strains were grown overnight in Reserve Chardonnay grape juice (Winexpert, Port Coquitlam, BC, Canada), washed, and diluted to a starting OD of 0.1 in fresh grape juice or grape juice with metal chelators (20 mM trisodium citrate, pH 4.2, and 1 mM Na2EDTA). Fermentation was conducted in 250-ml flasks sealed with airlocks and incubated at room temperature, out of direct sunlight, without shaking. Flasks were weighed daily to determine CO2 loss and shaken once daily, immediately following measurement. Four independent transformants were examined for each construct.
Results
Identification of genes exhibiting an excess of amino acid polymorphism
From two previous independent genome-wide screens based on the McDonald-Kreitman (MK) test, we identified five genes that were significant (P < 0.001) in both studies (Doniger ; Liti ). The five genes were IRA2, OPT2, PEP1, SAS10, and ZRT1. All of the genes showed a ratio of nonsynonymous to synonymous polymorphism (Pn/Ps) that was more than twofold greater than that of divergence (Dn/Ds). We repeated the MK test using a strain set composed of 36 strains for which genome sequences are publicly available (see Materials and Methods and Table S1), of which 29 were not included in previous screens. All five genes retained significance according to the MK test results (P < 0.05, Bonferroni corrected) (Table 1). As a control, we randomly selected 21 genes from those not significant in either previous study. Only two of the control genes were significant according to the MK test (P < 0.05, Bonferroni corrected) (Table S2); both had a Pn/Ps ratio greater than their Dn/Ds ratio.
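The MK test compares the counts of nonsynonymous and synonymous changes within species to those between species in a 2 × 2 table. The sketch below is one way to do this, using a two-sided Fisher's exact test written with the standard library only. The text does not state which statistic DnaSP applied to these data, so the P values from this sketch need not match the published ones exactly. The function names are illustrative, and the example counts are those reported for ZRT1 and PEP1 in Table 1.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


def mk_test(pn: int, ps: int, dn: int, ds: int) -> float:
    """MK test as a test of independence between class (n/s) and type (P/D)."""
    return fisher_exact_two_sided(pn, ps, dn, ds)


if __name__ == "__main__":
    print(f"ZRT1: P = {mk_test(55, 45, 12, 62):.2e}")
    print(f"PEP1: P = {mk_test(48, 45, 168, 320):.2e}")
```

A Bonferroni correction of the kind described in the text would multiply each P value by the number of genes tested before comparing it with 0.05.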
Table 1
McDonald-Kreitman (MK) test results
Gene(s)      Number of Sites  Pn(a)  Ps(a)  Dn(b)  Ds(b)  MK P Value  Neutrality Index
IRA2         9117             69     108    131    735    0.000000    3.6
OPT2         2436             35     20     18     187    0.000000    18.2
PEP1         4293             48     45     168    320    0.002240    2.0
SAS10        1680             12     7      41     120    0.002186    5.0
ZRT1         1113             55     45     12     62     0.000000    6.3
5 Genes(c)   18,639           219    225    370    1424   -           3.8
21 Genes(c)  22,815           158    260    711    1623   -           1.3
(a) Pn and Ps are the number of nonsynonymous (Pn) and synonymous (Ps) polymorphic sites.
(b) Dn and Ds are the number of nonsynonymous (Dn) and synonymous (Ds) fixed differences.
(c) The 5 genes and 21 control genes are described in the text.
The two gene sets exhibited marked differences in their neutrality indexes (Table 1). The weighted neutrality index (Stoletzki and Eyre-Walker 2011) is 3.80 for the five selected genes and 1.33 for the 21 control genes. The high neutrality index of the five genes, indicating an excess of nonsynonymous polymorphism, is unlikely to be a consequence of selective pressure on synonymous sites, as average codon bias is similar between the two groups, and the five genes have nearly equal numbers of changes to preferred and unpreferred codons for both polymorphism, 63 and 58, respectively, and divergence, 414 and 397, respectively. Thus, the five-gene set is highly enriched for genes with an excess of nonsynonymous polymorphism relative to that of divergence.
Balancing selection in ZRT1
A high ratio of nonsynonymous to synonymous polymorphism can result from slightly deleterious nonsynonymous mutations that contribute to polymorphism but not divergence or from a recent loss of functional constraint. In either scenario, the rate of synonymous polymorphism should not be affected. Alternatively, a high rate of nonsynonymous polymorphism can result from balancing selection on multiple nonsynonymous alleles. If balancing selection is responsible for the elevated rate of nonsynonymous polymorphism, rates of linked synonymous polymorphism should also be elevated (Charlesworth ).
To test whether the rate of synonymous polymorphism was elevated in any of the five genes, we used the HKA test (Hudson ). Using a maximum likelihood MLHKA test, we found that only ZRT1 showed a significantly elevated rate of synonymous polymorphism in comparison to that of the control gene set (Table S3). Figure 1 shows that of all the genes we tested, ZRT1 was characterized by an exceptionally high rate of synonymous polymorphism. One gene in the control gene set appeared to be an outlier, characterized by both a high rate of synonymous polymorphism and a low rate of synonymous divergence. This outlier gene is nominally significant for an excess of synonymous polymorphism relative to the remaining neutral gene set, by MLHKA test results (P = 0.0132). However, removal of this gene from the neutral gene set only increased the significance of ZRT1. Thus, of the five genes, only ZRT1 exhibits evidence of balancing selection.
Figure 1
Synonymous polymorphism vs. divergence. Synonymous nucleotide diversity (πsyn) vs. synonymous nucleotide divergence (Ksyn) is shown for the five selected genes (red), the 21 control genes (black), and the three genes neighboring ZRT1 (blue). Nucleotide diversity was measured by the average number of pairwise differences among strains of S. cerevisiae, and nucleotide divergence was measured by differences between S. cerevisiae and S. paradoxus.
Balancing selection on ZRT1 is also supported by patterns of nonsynonymous and synonymous divergence among strains. A neighbor-joining tree of ZRT1 shows that all of the alleles, except for the EC1118 allele, cluster into three groups distinguished by multiple nonsynonymous and synonymous differences (Figure 2). With the exception of the wine strain group, the groups showed no close correspondence to the source from which each strain was obtained or to a neighbor-joining tree generated from the concatenated control gene set (Figure 2 and Figure S1). Of the 111 polymorphic sites used to generate the tree, 86 can be placed on a single branch without homoplasy, of which 24 nonsynonymous and 17 synonymous changes occurred on one of the four main internal branches (Figure 2). In comparison, 62 synonymous but only 12 nonsynonymous differences separate S. cerevisiae from S. paradoxus. Thus, the high ratio of nonsynonymous to synonymous polymorphism is not limited to external branches, as would be expected if most nonsynonymous polymorphisms were deleterious.
Figure 2
Neighbor-joining tree of ZRT1 is shown along with bootstrap values greater than 90% (gray). S. cerevisiae strains are color coded by class (see color key). The position of the branch leading to S. paradoxus (dashed line) is not drawn to scale. The number of nonsynonymous/synonymous/complex (two changes within a codon) changes unique to each of the four main lineages are listed along the respective branches. Inset (at right) shows the unrooted neighbor-joining tree of the concatenated 21-control gene set drawn to the same scale and for the same strains as the ZRT1 tree.
The presence of intermediate-frequency alleles, many of which contribute to the unique grouping of alleles, also supports balancing selection. For synonymous sites, results of Tajima’s D test were positive across all strains (D = 0.718, P > 0.10) but negative within each of the three strain groups (M22 group D = −1.265; YPS163 group D = −0.59894; and S288C group D = −0.61182; all P > 0.10). In comparison, the average D result of the control gene set was −0.294, and only four of the genes had positive D values greater than 0.3. These results further highlight the unique pattern of variation present in ZRT1.
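To make the sign of Tajima's D concrete, the sketch below computes D from an alignment using the standard formula, which contrasts the mean number of pairwise differences with the number of segregating sites scaled by a1 = Σ 1/i. It is a minimal illustration under stated assumptions: the toy sequences are invented for the example, the function name is hypothetical, and the code assumes gap-free sequences of equal length. It is not the DnaSP implementation used for the values reported above.

```python
from itertools import combinations
from math import sqrt
from typing import List


def tajimas_d(seqs: List[str]) -> float:
    """Tajima's D for equal-length, gap-free aligned sequences."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites.
    seg = sum(1 for j in range(length) if len({s[j] for s in seqs}) > 1)
    if seg == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))


if __name__ == "__main__":
    # Hypothetical toy data: two diverged haplotype groups (intermediate frequencies).
    two_groups = ["AAAA", "AAAA", "AAAA", "TTTT", "TTTT", "TTTT"]
    # Hypothetical toy data: every variant is a singleton (rare alleles).
    singletons = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT", "AAAA"]
    print(tajimas_d(two_groups), tajimas_d(singletons))
```

Pooling diverged haplotype groups yields many intermediate-frequency variants and a positive D, whereas an excess of rare variants gives a negative D. This mirrors the contrast described above between the pooled sample and each strain group.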
Regional variation around ZRT1
The elevated rate of synonymous polymorphism in ZRT1 could be a consequence of balancing selection on amino acid polymorphism but also could be caused by selection on its promoter or on adjacent genes. To determine whether the signal of balancing selection extends into adjacent genes and gene regions, we applied the MLHKA test to  and , the two genes adjacent to ZRT1. Only  was significant in comparison to the control gene set (MLHKA test, P = 0.0007) (Table S3) and was characterized by both high rates of polymorphism and low rates of divergence at synonymous sites (Figure 1). Hence, we also tested , the next gene adjacent to , and found no significant departure from neutrality as measured by the MLHKA test. Additionally, none of the three adjacent genes that were examined showed a significant excess of amino acid polymorphism as measured by the MK test (Table S2).
To more precisely track the signal of balancing selection within and around ZRT1, we used a sliding window analysis of polymorphism to divergence, including both coding and noncoding regions. Figure 3 shows that the highest rate of polymorphism occurred within the coding region of ZRT1 and extended into its 5′ noncoding region. The overall rate of polymorphism is much lower in the two adjacent genes  and . In , the rate of divergence is also quite low and likely contributes to the significance of the MLHKA test. Interestingly, a portion of  had a very low rate of polymorphism, whereas its more distal portion had another peak of polymorphism. Based on the sliding window analysis and the MLHKA test results, the signature of balancing selection appears to be concentrated at the ZRT1 locus.
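A sliding window analysis of polymorphism can be sketched as follows. The default window size (200 bp) and step size (50 bp) follow the Figure 3 legend, and alignment columns containing gaps are skipped, as described in Materials and Methods. The function names and the toy alignment are assumptions made for illustration. The analysis in the study was run in DnaSP, whose handling of windows and gaps may differ in detail.

```python
from itertools import combinations
from typing import List, Tuple


def nucleotide_diversity(seqs: List[str], start: int, end: int) -> float:
    """Mean pairwise differences per site over columns [start, end), skipping gap columns."""
    cols = [j for j in range(start, end) if all(s[j] != "-" for s in seqs)]
    if not cols:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(s[j] != t[j] for j in cols) for s, t in pairs)
    return diffs / (len(pairs) * len(cols))


def sliding_window(
    seqs: List[str], window: int = 200, step: int = 50
) -> List[Tuple[int, float]]:
    """Return (window start, nucleotide diversity) for each full window."""
    length = len(seqs[0])
    return [
        (start, nucleotide_diversity(seqs, start, start + window))
        for start in range(0, length - window + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical toy alignment: variation is confined to the second half.
    toy = ["AAAAAAAA", "AAAAAAAA", "AAAATTTT", "AAAATTTT"]
    for start, pi in sliding_window(toy, window=4, step=4):
        print(start, round(pi, 3))
```

Divergence can be tracked with the same windows by comparing each S. cerevisiae sequence to the outgroup sequence instead of comparing sequences within the species.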
Figure 3
Sliding window analysis of polymorphism and divergence within and around ZRT1. The sliding window plot includes ZRT1 and three neighboring genes, with their positions and orientations indicated below the graph. Polymorphism (solid line) and divergence from S. paradoxus (dashed line) are shown for a window size of 200 bp and step size of 50 bp. A break is shown between ADH4 and ZRT1 where ~3600 intergenic bases were excluded because of uncertainty in the alignment with S. paradoxus. The average nucleotide diversity of synonymous sites for the control gene set is indicated by the gray horizontal line.
We next examined the degree to which polymorphism within ZRT1 is independent of polymorphism within adjacent genes. There is ample evidence for recombination within and around ZRT1. Across the entire region, from  through , there have been a minimum of 26 recombination events based on the four-gamete test (Hudson and Kaplan 1985). As expected in the presence of recombination, the genealogies of  and  differed from that of  (Figure S2), although all three genes showed a similar grouping of wine strains.  was the most similar to  but had less divergence. As measured by the HKA test,  showed significantly elevated rates of polymorphism compared to  (P = 0.0087) but not  or  (P > 0.05).
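The four-gamete test can be sketched in a few lines. Under an infinite-sites assumption, finding all four two-site haplotypes at a pair of biallelic sites implies at least one recombination event between them. A lower bound on the number of events is then obtained by counting incompatible intervals that do not overlap. The code below uses a greedy interval count in the spirit of Hudson and Kaplan (1985). The function names and toy haplotypes are illustrative, and this is not the program used to obtain the minimum of 26 events reported above.

```python
from typing import List, Tuple


def incompatible_pairs(haps: List[str]) -> List[Tuple[int, int]]:
    """Site pairs (i, j), i < j, at which all four two-site gametes are observed."""
    length = len(haps[0])
    # Only biallelic sites are informative for the four-gamete test.
    biallelic = [j for j in range(length) if len({h[j] for h in haps}) == 2]
    pairs = []
    for x, i in enumerate(biallelic):
        for j in biallelic[x + 1:]:
            if len({(h[i], h[j]) for h in haps}) == 4:
                pairs.append((i, j))
    return pairs


def min_recombination_events(haps: List[str]) -> int:
    """Greedy count of non-overlapping incompatible intervals (a lower bound)."""
    count, last_end = 0, -1
    # Visit intervals by right endpoint; keep one only if it starts at or
    # after the end of the last interval kept.
    for i, j in sorted(incompatible_pairs(haps), key=lambda p: p[1]):
        if i >= last_end:
            count += 1
            last_end = j
    return count


if __name__ == "__main__":
    # Hypothetical toy haplotypes: sites 0-1 and sites 1-2 each show four gametes.
    toy = ["AAA", "ATT", "TAA", "TTA", "AAT"]
    print(incompatible_pairs(toy), min_recombination_events(toy))
```

Because recurrent mutation can also produce four gametes, and many recombination events leave no trace, the count is best read as a conservative minimum.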
ZRT1 alleles confer no detectable phenotype differences
If selection has acted on ZRT1, then different alleles of ZRT1 should confer different phenotypes. ZRT1 is a high-affinity zinc transporter that is activated only when zinc levels are very low and facilitates growth under limiting zinc conditions (Zhao and Eide 1996). We compared the effects of three ZRT1 alleles integrated into a strain in which the endogenous gene was deleted. The three alleles were from the S288C (laboratory), M22 (wine), and YPS163 (nature) strains and were selected as representatives from the three major groups of strains (Figure 2). While deletion of ZRT1 resulted in a significant growth defect in zinc-limiting conditions and each of the three alleles rescued the growth deficiency, we found no significant difference among the three alleles for maximum growth (Figure 4) or growth rate (not shown).
Figure 4
Effects of strain-specific ZRT1 alleles on growth under low-zinc conditions. The maximum cell density under low-zinc conditions is shown for YPS163 with an unmodified ZRT1 allele (WildType), the same strain with a deletion of ZRT1 (ZRT1 deletion), and three ZRT1 alleles integrated into the ZRT1 deletion strain (YPS163, S288C, and M22). The three integrated alleles represent alleles from the three major strain groupings based on ZRT1. Error bars show the 95% confidence interval of the mean.
In addition to its requirements for growth, zinc is an essential cofactor for many enzymes, including alcohol dehydrogenase, and has been shown to influence rates of fermentation (De Nicola and Walker 2011). To test whether the ZRT1 alleles affect rates of fermentation, we measured CO2 release during fermentation of grape juice into wine. In the presence of metal chelators, deletion of ZRT1 had a dramatic effect on the rate of fermentation, but no differences were found among the three alleles tested (analysis of variance [ANOVA], P > 0.05) (Figure 5). No differences in rates of fermentation were found among any of the four strains in grape juice without chelators at any of the time points (ANOVA, P > 0.05).
Figure 5
Effects of strain-specific ZRT1 alleles on fermentation rate. Fermentation rate, measured by CO2 release (grams per hour), for strains grown in grape juice containing metal chelators. All strains have ZRT1 deleted, and three have either an S288C (orange), YPS163 (yellow), or M22 (green) allele of ZRT1 inserted at the URA3 locus. Lines show the average of four replicates, each from an independent transformation. Standard deviations are not shown for clarity and average between 0.0028 and 0.0056 for the four strains.
The ability of each allele to rescue the deletion phenotypes indicates that none of the 38 amino acid polymorphisms that distinguish these three alleles caused a substantial loss of function as measured by growth or fermentation rate under the conditions assayed.
Discussion
Application of the MK test to a variety of species has revealed substantial differences in the estimated frequency of positive selection on protein coding sequences (Fay 2011). While differences in effective population size are capable of explaining some of the differences among species (Eyre-Walker and Keightley 2009; Gossmann ; Halligan ; Siol ; Slotte ; Gossmann ), a small effective population cannot explain the absence of evidence for positive selection in yeast. In this study, we examined whether balancing selection can explain the high rate of nonsynonymous polymorphism observed in a small set of genes exhibiting a disproportionately large excess of nonsynonymous polymorphism, as this could obscure evidence of positive selection in S. cerevisiae. We showed that one of the five genes tested exhibited a significantly elevated rate of synonymous polymorphism, indicative of balancing selection. While patterns of polymorphism and divergence around suggest that nonsynonymous polymorphism within itself is the most likely target of balancing selection, we found no functional differences among three alleles, using two different phenotype assays. Our results illustrate how balancing selection might obscure a signal of positive selection.Evidence for balancing selection on is based on an elevated rate of synonymous polymorphism as measured by the HKA test (Figure 1 and Table S3), a high ratio of polymorphism to divergence that is centered on (Figure 3), an increased frequency of intermediate frequency alleles, and the coincidence of multiple synonymous and nonsynonymous changes that distinguish three groups of strains (Figure 2). However, it is worth noting that balancing selection in the general sense (i.e., selective maintenance of distinct alleles) can result from temporal or spatial variation in selection coefficients as well as heterozygote advantage. 
Local adaptation, which can result from spatial variation in selection coefficients, also provides an explanation for the presence of multiple nonsynonymous differences among alleles. Our results cannot distinguish between these different forms of selection; rather, they distinguish them from patterns that can be explained by population structure, loss of selective constraint, and selection on adjacent genes.
In S. cerevisiae, there is extensive population structure related to both geographic origin and the ecological source from which each strain was isolated (Fay and Benavides 2005; Liti ; Schacherer ), which are frequently correlated with one another. Such groups include sake strains from Japan, oak tree strains from North America, and strains isolated from Europe or vineyards. The neighbor-joining tree of the 21 control genes generally recapitulates these previously defined groups. While the ZRT1 tree bears some resemblance to that of the control gene set, particularly the vineyard group, the three main groups of strains that differ at ZRT1 are not obviously related by either their geographic origin or the ecological source from which they were isolated. More importantly, population structure by itself does not explain the elevated rates of polymorphism at ZRT1 relative to the 21-control-gene set. In support of selection acting at ZRT1, we observed negative Tajima’s D values for most of the control gene set but a positive Tajima’s D value at ZRT1, consistent with balancing selection.
Loss of functional constraint or weak negative selection is another explanation for an excess of nonsynonymous polymorphism as measured by the MK test. Genome-wide estimates in yeast suggest that much of the nonsynonymous polymorphism may be weakly deleterious (Elyashiv ). In the case of ZRT1, we cannot exclude the possibility that some of the nonsynonymous polymorphisms are neutral or slightly deleterious. 
However, two lines of evidence indicate that at least some of the nonsynonymous changes within ZRT1 have been under balancing selection. First, many neutral and most deleterious polymorphisms are expected to be rare and present only in a small number of strains. While 78% of nonsynonymous alleles are at less than 10% frequency in the 21-control-gene set, only 38% of nonsynonymous polymorphisms are at less than 10% frequency in ZRT1. In addition, of the nonsynonymous changes that are specific to one or more lineages, 55% are positioned along the four internal branches that distinguish the three major groups of alleles. Second, neither loss of constraint nor negative selection on nonsynonymous polymorphism should increase variation at linked synonymous sites.
While patterns of variation within and around ZRT1 indicated that it is the most likely target of balancing selection, selection on linked sites could have influenced observed patterns of variation at ZRT1. Patterns of polymorphism within , the adjacent gene, indicated no excess of synonymous or nonsynonymous polymorphism. However, may have experienced a recent selective sweep as there is evidence of positive selection during S. cerevisiae and S. paradoxus divergence, both within its coding region and within the intergenic region between and (Sawyer ; Engle and Fay 2012). Patterns of polymorphism within the adjacent genes and are more complex. Both genes show rates of synonymous polymorphism that are higher than those of the 21-control-gene set, except for the outlier gene . shows regions with high and low polymorphism levels, but the region closest to has the lower rate of polymorphism (Figure 3). shows a significantly elevated rate of synonymous polymorphism relative to divergence by the HKA test. Yet, in comparison to other regions (Figure 3) the significance of appears to be a partial consequence of the low rate of synonymous divergence. 
These observations, combined with the ample evidence for recombination within the region, indicate that while sites within and may have also been under selection, selection on linked sites in adjacent regions is unlikely to be solely responsible for the high rate of synonymous polymorphism at ZRT1.
Relevant to the possibility of selection on adjacent genes, there are functional links between and . and are both activated by in zinc-limiting conditions (Lyons ), and is an alcohol dehydrogenase that may help conserve zinc or work more efficiently under zinc-limiting conditions (Bird ; De Nicola ), or during fermentation of sugars to ethanol (Zhao and Bai 2012). Interestingly, the closest homologs of outside of those present within the sensu stricto Saccharomyces species are from two distantly related species commonly found in wine fermentations, Lachancea thermotolerans and Zygosaccharomyces rouxii (Combina ; Romancino ), rather than other more closely related species, suggesting that may have been introgressed into the ancestral lineage of the Saccharomyces species. The subtelomeric physical location of in S. cerevisiae is consistent with other genes acquired by horizontal gene transfer (Hall ; Muller and McCusker 2011) and genes likely to be involved with adaptations to specific environments (Brown ).
Phenotypic effects of ZRT1 alleles
We found that the three distinct ZRT1 alleles conferred no detectable phenotypic differences from one another. Although the alleles were not integrated at the endogenous locus, each was able to fully rescue the deletion phenotype (Figure 4). This result indicates that under the conditions tested, none of the nonsynonymous differences among the three alleles caused a substantial loss of function. The lack of phenotypic differences among the alleles implies either that the alleles are functionally equivalent to one another, and so are not involved in balancing selection, or that the lack of a discernible phenotype is a consequence of the conditions tested or of an effect too small to be detected. For example, ZRT1, as a metal transporter, could also influence fitness due to transport of other metals, such as cadmium (Gitan ; Gomes ; Gitan ).
Prevalence of balancing selection
Of the five genes that exhibited an excess of nonsynonymous polymorphism according to MK test results, only ZRT1 showed evidence of balancing selection. The excess of nonsynonymous polymorphism in the other four genes is most likely a consequence of loss of functional constraint or slightly deleterious polymorphism. Interestingly, alleles of , a GTPase that negatively regulates RAS signaling, are responsible for numerous environment-specific differences in gene expression across the genome (Smith and Kruglyak 2008), and alleles of also have been shown to affect high temperature growth (Parts ). However, does not show an excess of synonymous polymorphism as measured by the HKA test.
The prevalence of balancing selection across the entire yeast genome is more difficult to assess. The observation that rates of nonsynonymous and synonymous polymorphism are correlated with one another provides some evidence for the possibility of weak balancing selection throughout the yeast genome (Cutter and Moses 2011). However, genes with high rates of synonymous polymorphism do not show a tendency toward an excess of nonsynonymous polymorphism (Kendall’s tau = −0.15) (Figure 6) as predicted by the MK test using the data of Liti . The challenge to interpreting genome-wide evidence for balancing selection is that many cases of balancing selection may be difficult to detect. First, the effect of balancing selection on linked variation decreases as a function of the rate of recombination; nucleotide diversity is 1 + 1/(4Nr(1 − F)) relative to a neutral locus, where N is the effective population size, r is the rate of recombination, and F is the inbreeding coefficient (Charlesworth ). Using a rate of recombination of 3.5 × 10^−6/bp, a rate of outcrossing of 2 × 10^−5/generation, and an effective population size of 1.6 × 10^7 cells (Ruderfer ), we expect diversity to be increased by factors of 10 and 2 at distances of 13 bp and 113 bp, respectively, from a site under balancing selection. 
Gene conversion is expected to narrow this window even further (Andolfatto and Nordborg 1998). Second, balancing selection must act over many generations, on the order of the effective population size (Navarro ), to noticeably influence linked neutral variation. However, the ability to detect balancing selection may be increased if there are multiple selected sites at a single locus, which might be the case for genes identified by a high rate of nonsynonymous polymorphism. Thus, it is hard to rule out the possibility that balancing selection has inflated the rate of nonsynonymous polymorphism across many genes without generating a strong effect on linked synonymous sites.
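The expected increase in diversity near a balanced site can be sketched numerically. The short Python example below is an illustration under stated assumptions, not the calculation used in the study: it reads r in 1 + 1/(4Nr(1 − F)) as the per-bp recombination rate multiplied by the distance in bp, and it converts the outcrossing rate t to an inbreeding coefficient with the standard equilibrium relation for partial selfing, F = (1 − t)/(1 + t). The function names are hypothetical.

```python
def inbreeding_coefficient(outcrossing_rate):
    """Equilibrium inbreeding coefficient under partial selfing (assumed relation):
    F = (1 - t) / (1 + t), where t is the outcrossing rate."""
    return (1.0 - outcrossing_rate) / (1.0 + outcrossing_rate)


def diversity_inflation(distance_bp, n_e, rec_per_bp, outcrossing_rate):
    """Diversity relative to a neutral locus, 1 + 1 / (4 N r (1 - F)).

    Assumption: r is the per-bp recombination rate times the distance (bp)
    between the neutral site and the site under balancing selection.
    """
    f = inbreeding_coefficient(outcrossing_rate)
    scaled_recombination = 4.0 * n_e * rec_per_bp * distance_bp * (1.0 - f)
    return 1.0 + 1.0 / scaled_recombination


# Parameter values quoted in the text.
N_E = 1.6e7          # effective population size (cells)
REC_PER_BP = 3.5e-6  # recombination rate per bp
OUTCROSSING = 2e-5   # outcrossing rate per generation

for distance in (13, 113):
    ratio = diversity_inflation(distance, N_E, REC_PER_BP, OUTCROSSING)
    print(f"{distance} bp: diversity inflated {ratio:.1f}-fold")
```

Under these assumptions the sketch gives roughly 9.6-fold and 2.0-fold at 13 bp and 113 bp, in line with the factors of about 10 and 2 quoted above, and it shows how quickly the signal decays with distance.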
Figure 6
No abundance of genes exhibiting high rates of synonymous polymorphism and an excess of nonsynonymous polymorphism. The rate of synonymous polymorphism, measured by the number of synonymous single nucleotide polymorphisms (SNPs) per codon, compared to the observed (obs.) minus the expected (exp.) number of nonsynonymous SNPs (data are from Liti ). The expected number of nonsynonymous SNPs was derived from Pn − Ps × (Dn/Ds), where Pn and Ps are the number of nonsynonymous and synonymous SNPs, respectively, and Dn and Ds are the number of nonsynonymous and synonymous fixed differences, respectively. Genes with nonsynonymous polymorphism below −10 or above 10 are shown as points at the values of −10 or 10, respectively. The red point is ZRT1.
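The quantity plotted on the vertical axis of Figure 6 can be illustrated with a short Python sketch. It assumes the subscripted reading of the caption, in which the expected number of nonsynonymous SNPs is Ps × (Dn/Ds) and the plotted value is the observed count Pn minus that expectation. The function names and the example counts are hypothetical and are not taken from the Liti data.

```python
def excess_nonsynonymous(pn, ps, dn, ds):
    """Observed minus expected nonsynonymous SNPs for one gene.

    pn, ps: counts of nonsynonymous and synonymous SNPs
    dn, ds: counts of nonsynonymous and synonymous fixed differences
    Returns None when ds is zero, because the expectation is undefined.
    """
    if ds == 0:
        return None
    expected = ps * dn / ds
    return pn - expected


def clip_for_plot(value, lower=-10.0, upper=10.0):
    """Limit a value to the plotting range described in the caption."""
    return max(lower, min(upper, value))


# Hypothetical counts for a single gene, for illustration only.
excess = excess_nonsynonymous(pn=12, ps=20, dn=10, ds=40)
print(excess)                 # 12 - 20 * (10 / 40) = 7.0
print(clip_for_plot(25.0))    # values above 10 are drawn at 10
```

A positive value indicates more nonsynonymous polymorphism than expected from divergence, which is the pattern that motivated the choice of the five genes examined here.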
Why is there little evidence of adaptive evolution within the yeast genome?
An important and persistent question in genome-wide estimates of adaptive evolution based on the MK test is why some species show high rates of adaptive evolution whereas others, such as yeast, do not. A small effective population size is one explanation, as adaptive substitutions are expected to be more infrequent and deleterious polymorphism more common. This provides a reasonable explanation for the absence of signal in humans and many plant species (Eyre-Walker and Keightley 2009; Gossmann ; Halligan ; Siol ; Slotte ; Gossmann ). However, it does not explain the lack of signal in yeast, which has a large effective population size, on the order of 10^7 for S. paradoxus (Tsai ) and S. cerevisiae (Ruderfer ). The rate of outcrossing may also be relevant to detecting selection in yeast. Selfing helps purge recessive deleterious alleles but also limits recombination between different haplotypes. Despite the presence of selfing in yeast, S. cerevisiae exhibits an excess of rare nonsynonymous polymorphism indicative of deleterious alleles and a rapid decay in levels of linkage disequilibrium, an observation that can be attributed to its exceptionally high rate of recombination even if outcrossing is rare. Thus, there is no obvious aspect of S. cerevisiae diversity that distinguishes it from outcrossing species. Furthermore, both a selfing and an outcrossing species of Arabidopsis show no signal of adaptive evolution (Foxe ). As it stands, neither population size nor selfing provides a particularly compelling explanation for why yeast show little adaptive evolution based on the MK test.
In the present study, we considered the possibility that balancing selection obscured patterns of positive selection in yeast. While we focused only on a small number of genes exhibiting a large excess of nonsynonymous polymorphism, we found one that exhibited evidence of balancing selection. 
Thus, while our results highlight the need to consider balancing selection in estimating the frequency of positive selection in yeast, it remains difficult to assess the impact of this consideration. We conclude that balancing selection is a potentially important factor in estimating the frequency of positive selection in yeast. While not emphasized here, it is also important to consider whether adaptive evolution is rare and estimates of positive selection are inflated in other species (Fay 2011).