Literature DB >> 32543001

Disagreement in F_ST estimators: A case study from sex chromosomes.

William J Gammerdinger¹, Melissa A Toups¹, Beatriz Vicoso¹.

Abstract

Sewall Wright developed FST for describing population differentiation and it has since been extended to many novel applications, including the detection of homomorphic sex chromosomes. However, there has been confusion regarding the expected estimate of FST for a fixed difference between the X- and Y-chromosome when comparing males and females. Here, we attempt to resolve this confusion by contrasting two common FST estimators and explain why they yield different estimates when applied to the case of sex chromosomes. We show that this difference is true for many allele frequencies, but the situation characterized by fixed differences between the X- and Y-chromosome is among the most extreme. To avoid additional confusion, we recommend that all authors using FST clearly state which estimator of FST their work uses.

Entities: Disease Gene Species

Keywords: zzm321990Fzzm321990STzzm321990; sex chromosomes

Mesh：

Year: 2020 PMID： 32543001 PMCID： PMC7689734 DOI： 10.1111/1755-0998.13210

Source DB: PubMed Journal: Mol Ecol Resour ISSN： 1755-098X Impact factor: 7.090

BACKGROUND

Genetic sex determination is common in both plants and animals (Bachtrog et al., 2014) and the pair of chromosomes where the sex‐determination gene resides are referred to as the sex chromosomes. Sex chromosomes are hypothesized to often emerge from autosomes once they have acquired a novel mutation for sex determination (Abbott, Nordén, & Hansson, 2017; Bachtrog et al., 2014). Linked, sexually antagonistic alleles can help to drive a novel sex‐determination allele to a higher frequency in the population (van Doorn & Kirkpatrick, 2010) and mechanisms that reduce recombination between sexually antagonistic loci and the novel sex‐determination locus are selectively favoured (Charlesworth, Charlesworth, & Marais, 2005; Rice, 1987). Due to the reduction in recombination, deleterious mutations accumulate and gradually decay the gene content within this region (Bachtrog, 2013; Blaser, Grossen, Neuenschwander, & Perrin, 2012). In some systems, large‐scale deletions or expansions of repetitive elements occur and lead to heteromorphic sex chromosomes (Bachtrog, 2013; Charlesworth et al., 2005). As a result of this process, sex chromosomes exist on a spectrum between harbouring a single nucleotide polymorphism (SNP) responsible for sex determination with no reduction in recombination from the surrounding region, as seen in fugu (Kamiya et al., 2012), to the highly decayed and heteromorphic sex chromosomes observed in many eutherian mammals (Bellott et al., 2014; Cortez et al., 2014). Two common methods have been developed to identify sex chromosomes at different points on this spectrum using next‐generation sequencing data sets. In the more advanced stages of sex chromosome evolution the X‐ and Y‐chromosome share little genomic content. As a result, short reads from the Y‐chromosome align poorly to an X‐chromosome reference, resulting in a higher coverage in females than in males. These differences in coverage between males and females can be used to detect putatively nonrecombining regions of sex chromosomes (Fraïsse, Picard, & Vicoso, 2017; Huylmans, Toups, Macon, Gammerdinger, & Vicoso, 2019; Pal & Vicoso, 2015; Roesti, Moser, & Berner, 2013; Vicoso & Bachtrog, 2011, 2013, 2015; Vicoso, Kaiser, & Bachtrog, 2013). In the less advanced stages of sex chromosome evolution, the X‐ and Y‐chromosome differ only by a few base substitutions. Therefore, short reads from the X‐ and Y‐chromosome align in nearly equal proportions to an X‐chromosome reference and the identification of sex chromosomes instead relies on differences in allele frequencies in males and females (Böhne et al., 2019; Conte, Gammerdinger, Bartie, Penman, & Kocher, 2017; Dixon, Kitano, & Kirkpatrick, 2018; Fontaine et al., 2017; Gammerdinger, Conte, Acquah, Roberts, & Kocher, 2014; Gammerdinger, Conte, Baroiller, D’Cotta, & Kocher, 2016; Gammerdinger, Conte, Sandkam, Penman, & Kocher, 2019; Gammerdinger, Conte, Sandkam, & Ziegelbecker, 2018; Toups, Rodrigues, Perrin, & Kirkpatrick, 2018; Veltsos et al., 2019). Typically, SNPs are first identified among subpopulations of males and females, and regions with high levels of genetic differentiation between males and females are presumed to be sex‐linked. This genetic differentiation between males and females is often described in terms of F ST. F ST is a relative measure of population differentiation (Cruickshank & Hahn, 2014) and was outlined along with other F‐statistics by Sewall Wright (Wright, 1949). Estimates of F ST have been used for many novel applications, such as examining parallel adaptation in sticklebacks (Hohenlohe, Bassham, Etter, & Cresko, 2010), introgression in canaries (Lopes et al., 2016) and local adaptation in high‐altitude populations of Tibetans (Peng et al., 2011; Xu et al., 2011). Recent work has used estimates of F ST to identify and describe the divergence between relatively homomorphic sex chromosomes (Bergero, Gardner, Bader, Yong, & Charlesworth, 2019; Böhne et al., 2019; Conte et al., 2017; Dixon et al., 2018; Fontaine et al., 2017; Gammerdinger et al., 2014, 2016, 2018, 2019; Kirkpatrick & Guerrero, 2014; Natri, Shikano, & Merilä, 2013; Rodrigues & Dufresnes, 2017; Toups et al., 2018; Veltsos et al., 2019). However, there is a discrepancy within the literature regarding the expected estimate of F ST for a fixed difference between the X‐ and Y‐chromosome. When comparing males and females for an allele that is either fixed on the X‐ or Y‐chromosome of an XY pair, some studies expect an F ST estimate of (Gammerdinger et al., 2014, 2016, 2018, 2019; Kirkpatrick & Guerrero, 2014; Toups et al., 2018), while other studies expect an F ST estimate of 0.5 (Böhne et al., 2019; Fontaine et al., 2017; Rodrigues & Dufresnes, 2017). The difference in these expectations is not typically justified, nor is the specific estimator of F ST employed stated, leading to some confusion in the field. A recent study highlighted inconsistencies between different estimators of F ST (Berner, 2019) and in particular pointed out that, for an SNP fixed on the Y‐chromosome, one common estimator of F ST yields a value of (Nei, 1973) while another yields 0.5 (Weir & Cockerham, 1984). How other popular estimators behave for an SNP that is alternatively fixed between the X‐ and Y‐chromosome, and, importantly, why these discrepancies arise, have yet to be systematically reviewed. Here, we aim to clarify why such discrepancies in the expected estimates of F ST can arise when comparing males and females for alternatively fixed alleles between the X‐ and Y‐chromosome. Note that this difference in expectations of F ST is symmetric for ZW systems, so this analysis will only describe an XY system. Importantly, while this analysis focuses on the specific case of sex chromosomes, we also illustrate that the difference between F ST estimators can be substantial for a wide range of allele frequencies, making direct comparisons of F ST estimates between studies problematic in many contexts. Last, we apply a variety of population genetics software packages, which often generically refer to F ST, to estimate F ST for alternatively fixed alleles between the X‐ and Y‐chromosome under various sampling schemes. Because these programs use different estimators and corrections for sample size and composition, a diverse range of expected F ST values can be recovered (0.16–0.67) and, as a result, further complicates the interpretation of experimental studies that use F ST to assess sex chromosome differentiation.

METHODS

We evaluated estimators of F ST across different commonly used software packages, including vcftools version 0.1.15 (Danecek et al., 2011), arlequin 3.5 (Excoffier & Lischer, 2010), genepop 1.0.5 (Rousset, 2008), popgenome 2.61 (Pfeifer, Wittelsbürger, Ramos‐Onsins, & Lercher, 2014), hierfstat 0.04‐29 (Goudet, 2005), diversity 1.9.90 (Keenan, McGinnity, Cross, Crozier, & Prodöhl, 2013) and dnasp 6.12.03 (Rozas et al., 2017). Programs that used R were run on R 3.5.1 (R Core Team, 2016). Scripts for software packages that did not have a GUI are provided in File S1. During our use of these software packages, we analysed the effect of sample sizes on F ST estimators. To perform this analysis, we created mock VCF files containing 20 fixed differences between the X‐ and Y‐chromosome for males and females (File S1) following the defined VCF format (Danecek et al., 2011). When necessary, we used pgdspider (Lischer & Excoffier, 2012) to convert our mock VCF files into fasta, arlequin, fstat and genepop formats.

WRIGHT’S F ST

Since Wright introduced F ST (Wright, 1949), it has been unclear if this definition represents a parameter or an estimate of the parameter (Hahn, 2018; Holsinger & Weir, 2009). Nonetheless, F ST for a biallelic system is most traditionally described as:where is the variance in the allele frequency for p and and are the average allele frequencies across the subpopulations for p and q, respectively (Hedrick, 2005; Weir & Cockerham, 1984). When comparing the sex chromosomes of males and females, we will define subpopulation 1 to be males and subpopulation 2 to be females, while p is the frequency of the allele on the X‐chromosome and q is the frequency of the allele on the Y‐chromosome (Table 1). When using Equation 1 and the values in Table 1, the resulting F ST is .

TABLE 1

Description of values for p and q in males and females

	Subpopulation 1 (males)	Subpopulation 2 (females)	Average of the subpopulations	F _ST
p (frequency of the allele fixed on the X‐chromosome)	p ₁ = 0.5	p₂ = 1	p¯ = 0.75	0.333¯
q (frequency of the allele fixed on the Y‐chromosome)	q ₁ = 0.5	q₂ = 0	q¯ = 0.25	0.333¯

Description of values for p and q in males and females

ESTIMATORS OF F ST

In practice, the parameter values for allele frequencies are unknown and thus many methods have been proposed to estimate F ST. Here, we contrast two common estimators of F ST, which we will denote as and . The difference between these estimators has been previously discussed by others (Bhatia, Patterson, Sankararaman, & Price, 2013; Charlesworth, 1998), but not specifically in the context of sex chromosomes. While some estimators handle multiple alleles and multiple subpopulations, we will be considering only the biallelic state for two subpopulations. We make these simplifications because they provide a direct comparison between estimators and they reflect a situation with alternatively fixed alleles on the X‐ and Y‐chromosome when comparing males and females. Also, we will focus on the example of a fixed difference between the X‐ and Y‐chromosome because it is the fundamental component of these elevated estimates of F ST. However, F ST estimates for various degrees of difference in the allele frequencies between males and females, going from equal frequencies in both sexes to alternatively fixed differences between the X‐ and Y‐chromosome, can be found in Figure S1. In an empirical study, the identification of an XY system would typically show a region overrepresented with these fixed differences between the X‐ and Y‐chromosome.

Nei's estimator of F ST

One estimator of F ST comes from Nei (1973) and is often referred to as G ST. G ST uses heterozygosity data to estimate F ST. G ST quantifies the difference between the total heterozygosity of the population and the average heterozygosity of the subpopulations and normalizes this difference by the total heterozygosity of the population. It was defined by Nei (1973) as: When considering two alleles in two subpopulations, this estimator can be simplified to: Thus, by comparing males and females for an allele that is alternatively fixed on the X‐ and Y‐chromosome using Nei’s (1973) estimator, the expected estimate of F ST is . A similar estimator of F ST, called γ ST, is generally used in the context of haplotypes and estimates F ST using nucleotide diversity. Nucleotide diversity, π, is the mean number of nucleotide differences between two randomly selected sequences from a population. γ ST is described as the difference between the total nucleotide diversity of the population and the average nucleotide diversity of the subpopulations normalized to the total nucleotide diversity of the population (Nei, 1982). Nei (1982) defined γ ST as:where π T the total nucleotide diversity of the population and π S is the mean of the subpopulations’ nucleotide diversities. Notably, when considering a single, biallelic SNP, G ST and γ ST are equivalent (Nei, 1982). As a result, we will describe nucleotide diversities in terms of p and q, since nucleotide diversities and heterozygosities are equivalent for a SNP. We introduce γ ST because it will lead to the most direct comparison between Nei’s (1973) estimator and Hudson, Slatkin, et al. (1992) estimator in the next section. π T estimates nucleotide diversity from and , the mean of p and q across the subpopulations, respectively. Importantly, π T makes comparisons between all alleles in the whole population. When considering a biallelic SNP in two subpopulations, π T in Nei’s (1982) estimator can be simplified to: Using the values from Table 1, we can estimate π T as 0.375. The nucleotide diversity, π, of each subpopulation can be computed using Nei and Li’s (1979) definition for this statistic and averaged together to become π S. For a biallelic SNP in two subpopulations, π S can be simplified to: When utilizing the values in Table 1, π S is 0.25 and therefore Equation 4 recovers an expected estimate of F ST to be .

Hudson, Slatkin and Maddison's estimator of F ST

A second, alternative estimator of F ST comes from Hudson, Slatkin, et al. (1992). This estimator considers the difference in the average nucleotide diversity between subpopulations and the average subpopulation nucleotide diversity and then normalizes this difference to the average nucleotide diversity between subpopulations. This estimator of F ST is defined as:where π W is similar to Nei’s (1982) π S, except π W excludes pairwise comparisons of haplotypes against themselves and is thus dependent on the subpopulation sample sizes. π W can be expressed as:where n 1 and n 2 are the number of diploid individuals sampled from subpopulation 1 and 2, respectively. (A full derivation that can be found in the supplementary information of Bhatia et al. (2013). Note that in the Bhatia et al. (2013) derivation, n 1 and n 2 represent allele counts not diploid individual counts as are used here). As the subpopulation sample sizes go to infinity, π W will approach π S and thus the difference between π S and π W is often negligible with large subpopulation sample sizes. π B is an alternative estimator of nucleotide diversity, defined by Nei and Li (1979) as π, and it can be quite numerically different from π T. π is the mean number of nucleotide differences between two randomly selected DNA sequences, each of which is drawn from separate subpopulations. In the case of a biallelic SNP in two subpopulations, Hudson, Slatkin, et al. (1992) estimator for π B can be rewritten as: The values in Table 1 yield an estimate of π B to be 0.5. As subpopulation sample sizes approach infinity, estimating F ST for a biallelic locus in two subpopulations with this estimator can be written as: As subpopulation sizes go to infinity and using either Equation 7 or 10 with the values in Table 1, we arrive at an estimate of F ST approaching 0.5. F ST estimates for finite sample sizes using this estimator are demonstrated in Figure 1. Importantly, regardless of the subpopulation sample sizes employed, Nei’s (1973) estimator and Hudson, Slatkin, et al. (1992) estimator are always quite different for the case of sex chromosomes (Figure 1).

FIGURE 1

Various estimates of F ST for a fixed difference between the X‐ and Y‐chromosome when (a) using equal subpopulation sample sizes for two subpopulations, males and females, and (b) using unequal subpopulation sample sizes for the two subpopulations, males and females, while keeping the total sample size constant

Why is there a difference in the expected estimate of FST?

By comparing Equations 4 and 7 with large subpopulation sample sizes, it is clear that the important difference in Nei’s (1973, 1982) estimator and Hudson, Slatkin, et al. (1992) estimator arises in how they handle π T and π B. Nei’s (1982) π T uses two randomly drawn sequences from the population as a whole, while Hudson, Slatkin, et al. (1992) π B requires that the two randomly drawn sequences be from the separate subpopulations. Figure 2 illustrates the difference between these two estimators for (a) a biallelic SNP present on an autosome in two subpopulations and for (b) sex chromosomes when comparing males and females. Figure 3 shows the F ST estimates produced from Nei’s (1973) estimator and Hudson, Slatkin, et al. (1992) estimator when considering infinitely large subpopulation sample sizes, as well as the difference between these two estimators. Interestingly, the regions corresponding to an SNP that is alternatively fixed between the X‐ and Y‐chromosome are among the regions where the difference in these estimators is highest. However, these estimators can differ substantially across the range of plausible allele frequencies observed in two subpopulations. For example, SNPs that are not yet fixed differences between the X‐ and Y‐chromosome also show this discordance between F ST estimators (Figure 3; Figure S1). Additionally, the difference in these two F ST estimates when p 1 is 0.20, p 2 is 0.80 and both subpopulation samples sizes are 20 is slightly larger than the difference produced by sex chromosomes. This importantly illustrates that the disagreement in F ST estimators is not the byproduct of the unique scenario of sex chromosomes, but is a disagreement that many researchers using F ST estimators should consider.

FIGURE 2

FIGURE 3

Visualizations of Nei (1973), Hudson, Slatkin, et al. (1992) and the difference between the two estimators. (a) Estimates of F ST using the Nei (1973) estimator with white being no differentiation and dark blue being complete differentiation. (b) Estimates of F ST using the Hudson, Slatkin, et al. (1992) estimator given infinitely large subpopulation sizes with white being no differentiation and dark blue being complete differentiation. (c) A heatmap of the difference between Hudson, Slatkin, et al. (1992) estimator and Nei’s (1973) estimator for F ST (Hudson, Slatkin, et al. (1992) minus Nei (1973)) given infinitely large subpopulation sample sizes and the allele frequencies of p in subpopulations 1 and 2. Warmer colours show more difference between the estimators, while cooler colours show less difference between the estimators. Because the assignment of p and q along with subpopulation 1 and 2 is arbitrary, we have placed black boxes at all of the locations that could fit the description of a fixed difference between the X‐ and Y‐chromosome and provided an arrow to the scenario we outlined in Table 1. Dotted lines show an F ST estimate equal to 0.1, dashed lines show an F ST estimate equal to 0.5 and solid lines show an F ST estimate equal to 0.9. Black dotted, dashed and solid lines are used to signify Nei’s (1973) estimator in panels (a) and (c), while purple dotted, dashed and solid lines are used to signify Hudson, Slatkin, et al. (1992) estimator in panels (b) and (c)

Comparison of the nonzero components of π B in Hudson, Slatkin, et al. (1992) estimator and π T in Nei’s (1973) estimator for biallelic SNPs in (a) two subpopulations and (b) an XY system. Each bar under the alleles represents a nonzero comparison that occurs in the formulation of π B or π T. The curly bracket beneath females in the sex chromosome comparison illustrates that females are homomorphic for this allele despite being diploid and thus only one nonzero comparison is made Visualizations of Nei (1973), Hudson, Slatkin, et al. (1992) and the difference between the two estimators. (a) Estimates of F ST using the Nei (1973) estimator with white being no differentiation and dark blue being complete differentiation. (b) Estimates of F ST using the Hudson, Slatkin, et al. (1992) estimator given infinitely large subpopulation sizes with white being no differentiation and dark blue being complete differentiation. (c) A heatmap of the difference between Hudson, Slatkin, et al. (1992) estimator and Nei’s (1973) estimator for F ST (Hudson, Slatkin, et al. (1992) minus Nei (1973)) given infinitely large subpopulation sample sizes and the allele frequencies of p in subpopulations 1 and 2. Warmer colours show more difference between the estimators, while cooler colours show less difference between the estimators. Because the assignment of p and q along with subpopulation 1 and 2 is arbitrary, we have placed black boxes at all of the locations that could fit the description of a fixed difference between the X‐ and Y‐chromosome and provided an arrow to the scenario we outlined in Table 1. Dotted lines show an F ST estimate equal to 0.1, dashed lines show an F ST estimate equal to 0.5 and solid lines show an F ST estimate equal to 0.9. Black dotted, dashed and solid lines are used to signify Nei’s (1973) estimator in panels (a) and (c), while purple dotted, dashed and solid lines are used to signify Hudson, Slatkin, et al. (1992) estimator in panels (b) and (c)

Additional corrections to F ST

There are several estimators of F ST that attempt to provide corrections for sampling biases and can cause further deviations from the expected estimate of F ST. Some estimators, such as that proposed by Hudson, Slatkin, et al. (1992), change with the total number of individuals sampled (Hudson, Boos, & Kaplan, 1992; Hudson, Slatkin, et al., 1992; Nei & Chesser, 1983) (Figure 1a; Table 2). Additionally, some estimators change as the proportion of individuals sampled from each subpopulation changes even when the total number of individuals sampled is held constant (Hudson, Boos, et al., 1992; Hudson, Slatkin, et al., 1992; Nei & Chesser, 1983; Weir & Cockerham, 1984) (Figure 1b; Table 2).

TABLE 2

Package (version)	Option	5 Males and 5 females	10 Males and 10 females	20 Males and 20 females	5 Males and 15 females	15 Males and 5 females	Referenced estimator
vcftools (0.1.15)	weir‐fst‐pop	0.5	0.5	0.5	0.667	0.4	Weir and Cockerham, (1984)
popgenome (2.61)	F_ST.stats: nucleotide.F_ST	0.444	0.474	0.487	0.444	0.483	Hudson, Slatkin, et al. (1992)
	F_ST.stats: nuc.F_ST.pairwise	0.444	0.474	0.487	0.444	0.483	Hudson, Slatkin, et al. (1992)
	F_ST.stats: Nei.G_ST	0.333	0.333	0.333	0.333	0.333	Nei (1973)
	F_ST.stats: Nei.G_ST.pairwise	0.333	0.333	0.333	0.333	0.333	Nei (1973)
	F_ST.stats: Hudson.H_ST	0.296	0.316	0.325	0.45	0.163	Hudson, Boos, et al. (1992) ^a
	F_ST.stats: Hudson.G_ST	0.286	0.310	0.322	0.378	0.195	Hudson, Boos, et al. (1992) ^b ^, ^c
diversity (1.9.90)	diffCalc(fst = TRUE)	0.5	0.5	0.5	0.667	0.4	Weir and Cockerham (1984)
diversity (1.9.90)	diffCalc()	0.286	0.310	0.322	0.3023	0.3023	Nei and Chesser (1983)
hierfstat (0.04–29)	pairwise.fst	0.333	0.333	0.333	0.429	0.2	Nei (1973) ^c
	genet.dist(method = Nei87)	0.5	0.5	0.5	0.5	0.5	Nei (1987) ^d
	pairwise.neifst	0.5	0.5	0.5	0.5	0.5	Nei (1987) ^d
	basic.stats(fst)	0.333	0.333	0.333	0.333	0.333	Nei (1987)
	genet.dist(method = WC84)	0.5	0.5	0.5	0.667	0.4	Weir and Cockerham, (1984)
	pairwise.WCfst	0.5	0.5	0.5	0.667	0.4	Weir and Cockerham, (1984)
genepop (1.0.5)	Fst	0.5	0.5	0.5	0.667	0.4	Weir and Cockerham, (1984)
arlequin (3.5)	Compute pairwise FST	0.444	0.474	0.487	0.647	0.362	Excoffier, Smouse, and Quattro (1992)
dnasp (6.12.03)	Gene Flow and Genetic Differentiation: GST	0.286	0.310	0.322	0.378	0.194	Nei (1973) ^b ^, ^c
	Gene Flow and Genetic Differentiation: GammaSt	0.333	0.333	0.333	0.429	0.2	Nei (1982) ^c
	Gene Flow and Genetic Differentiation: Fst	0.444	0.474	0.487	0.444	0.483	Hudson, Slatkin, et al. (1992)

This implementation appears to use a weighting factor.

These estimates are most consistent with Nei and Chesser (1983), which is also discussed in Hudson, Boos, et al. (1992).

These metrics appear to use a weighting factor, while Nei (1982) and Nei and Chesser (1983) state that in most practices the subpopulations can be assumed to be weighted equally.

The referenced estimator is consistent with F′ST in Nei (1987).

Software packages for estimating F ST and their estimates using mock input. These input files contained fixed differences between the X‐ and Y‐chromosome for various sample sizes of males and females This implementation appears to use a weighting factor. These estimates are most consistent with Nei and Chesser (1983), which is also discussed in Hudson, Boos, et al. (1992). These metrics appear to use a weighting factor, while Nei (1982) and Nei and Chesser (1983) state that in most practices the subpopulations can be assumed to be weighted equally. The referenced estimator is consistent with F′ST in Nei (1987). As similarly pointed out by Berner (2019), Weir and Cockerham’s (1984) estimator appears to respond dramatically to unequal numbers of males and females. As the proportion of the males in a constant sample size increases, the total variance of the sample increases and thus decreases the estimate of F ST (Figure 1b). Regardless, in the case of a fixed difference between the X‐ and Y‐chromosome, is defined as 0.75. Thus, there is little need to correct for subpopulation sample sizes because this differentiation is similar whether a single male and female are analysed or a very large number of each sex are considered. However, this type of sample size correction may be applicable when considering SNPs that are more frequent on the Y‐chromosome but not yet fixed or if it is unknown whether an SNP is alternatively fixed between the X‐ and Y‐chromosome. An additional correction considers the number of subpopulations sampled (Hedrick, 2005; Weir & Cockerham, 1984). This correction is related to an infinite island model that assumes that the researcher is sampling a few subpopulations from a larger metapopulation. Because there are only two subpopulations, males and females, these corrections are probably unsuitable in this context. Additionally, it has also been pointed out that F ST underestimates differentiation at highly polymorphic loci, such as microsatellites (Charlesworth, 1998; Hedrick, 2005; Meirmans & Hedrick, 2011). Some estimators are particularly concerned with correcting for this bias (Hedrick, 2005; Meirmans & Hedrick, 2011); however, this correction for highly polymorphic loci is unlikely to be necessary for the biallelic locus in question. While some of these corrections are probably inappropriate, authors may be using them as some software packages refer to their implementation generically as F ST (Table 2). Table 2 and Figure 1 highlight the wide range of results that researchers could get for estimating F ST depending on the subpopulation sample sizes and estimator employed. One may argue that any large deviation away from zero in F ST estimates is sufficient enough evidence for sex chromosomes. However, estimates of F ST that include the various aforementioned corrections may never reach or 0.5 (Table 2) and thus the expected maximum estimate of F ST for a particular data set should ideally be considered and stated. Otherwise, deviations from the theoretical maximum due to these corrections could lead to an erroneous interpretation that there are no fixed differences between the X‐ and Y‐chromosome. While the particular case of variants on sex chromosomes leads to some of the largest differences between these estimators, substantial differences can occur under alternative scenarios as well, and it would often be helpful to know how much of the variance between studies is driven by how F ST is estimated. For instance, whether sexually antagonistic selection can explain the range of F ST values that are found between males and females of different species (Cheng & Kirkpatrick, 2016; Flanagan & Jones, 2017; Lucotte, Laurent, Heyer, Ségurel, & Toupance, 2016; Wright et al., 2018; Wright, Rogers, Fumagalli, Cooney, & Mank, 2019) has recently been the subject of debate (Kasimatis, Nelson, & Phillips, 2017; Kasimatis, Ralph, & Phillips, 2019). While Kasimatis et al. (2019) compare Wright's F ST to Weir and Cockerham's estimator of F ST, the variability introduced by the various estimators used in the previously cited experimental work (Hudson, Slatkin, et al., 1992; Nei, 1986; Weir, 1996; Wright, 1949) was not considered. In the future, we strongly urge researchers to justify their estimator, so that appropriate F ST estimators are employed and estimates from various studies can be comparable.

CONCLUSIONS

When considering fixed differences between the X‐ and Y‐chromosome, we conclude that it is appropriate to use Nei’s (1973) estimator since it is most consistent with the work of Wright and others. However, both Nei’s (1973) and Hudson, Slatkin, et al. (1992) estimators are useful estimators of differentiation and there could be questions, such as those regarding polymorphisms that are not fully linked to the X‐ or Y‐chromosome, which are better answered with different estimators that implement some of the previously mentioned corrections. Moving forward, we encourage researchers to state which estimator they choose, their rationale for that choice and what the expected estimate of F ST is for the data set they are investigating.

AUTHOR CONTRIBUTIONS

W.J.G. conceived the commentary, drafted the manuscript and created the figures. M.A.T. and B.V. aided in drafting the manuscript and contributed intellectually to the commentary. M.A.T. also ran the software packages for the estimates of F ST in Table 2.

CONFLICTS OF INTEREST

The authors declare no conflicts of interest. Fig S1 Click here for additional data file.

65 in total

1. PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs.

Authors: H E L Lischer; L Excoffier
Journal: Bioinformatics Date: 2011-11-21 Impact factor: 6.937

2. A statistical test for detecting geographic subdivision.

Authors: R R Hudson; D D Boos; N L Kaplan
Journal: Mol Biol Evol Date: 1992-01 Impact factor: 16.240

3. genepop'007: a complete re-implementation of the genepop software for Windows and Linux.

Authors: François Rousset
Journal: Mol Ecol Resour Date: 2008-01 Impact factor: 7.090

4. DEFINITION AND ESTIMATION OF FIXATION INDICES.

Authors: Masatoshi Nei
Journal: Evolution Date: 1986-05 Impact factor: 3.694

5. Sex-chromosome turnovers induced by deleterious mutation load.

Authors: Olivier Blaser; Christine Grossen; Samuel Neuenschwander; Nicolas Perrin
Journal: Evolution Date: 2012-10-11 Impact factor: 3.694

6. DnaSP 6: DNA Sequence Polymorphism Analysis of Large Data Sets.

Authors: Julio Rozas; Albert Ferrer-Mata; Juan Carlos Sánchez-DelBarrio; Sara Guirao-Rico; Pablo Librado; Sebastián E Ramos-Onsins; Alejandro Sánchez-Gracia
Journal: Mol Biol Evol Date: 2017-12-01 Impact factor: 16.240

7. Numerous transitions of sex chromosomes in Diptera.

Authors: Beatriz Vicoso; Doris Bachtrog
Journal: PLoS Biol Date: 2015-04-16 Impact factor: 8.029

8. Mammalian Y chromosomes retain widely expressed dosage-sensitive regulators.

Authors: Daniel W Bellott; Jennifer F Hughes; Helen Skaletsky; Laura G Brown; Tatyana Pyntikova; Ting-Jan Cho; Natalia Koutseva; Sara Zaghlul; Tina Graves; Susie Rock; Colin Kremitzki; Robert S Fulton; Shannon Dugan; Yan Ding; Donna Morton; Ziad Khan; Lora Lewis; Christian Buhay; Qiaoyan Wang; Jennifer Watt; Michael Holder; Sandy Lee; Lynne Nazareth; Jessica Alföldi; Steve Rozen; Donna M Muzny; Wesley C Warren; Richard A Gibbs; Richard K Wilson; David C Page
Journal: Nature Date: 2014-04-24 Impact factor: 49.962

9. Using conventional F-statistics to study unconventional sex-chromosome differentiation.

Authors: Nicolas Rodrigues; Christophe Dufresnes
Journal: PeerJ Date: 2017-04-27 Impact factor: 2.984

10. The Origin of a New Sex Chromosome by Introgression between Two Stickleback Fishes.

Authors: Groves Dixon; Jun Kitano; Mark Kirkpatrick
Journal: Mol Biol Evol Date: 2019-01-01 Impact factor: 16.240

5 in total

1. The functional genetic architecture of egg-laying and live-bearing reproduction in common lizards.

Authors: Hans Recknagel; Madeleine Carruthers; Andrey A Yurchenko; Mohsen Nokhbatolfoghahai; Nicholas A Kamenos; Maureen M Bain; Kathryn R Elmer
Journal: Nat Ecol Evol Date: 2021-10-07 Impact factor: 15.460

2. Dioecy and chromosomal sex determination are maintained through allopolyploid speciation in the plant genus Mercurialis.

Authors: Melissa A Toups; Beatriz Vicoso; John R Pannell
Journal: PLoS Genet Date: 2022-07-06 Impact factor: 6.020

3. ZW sex-chromosome evolution and contagious parthenogenesis in Artemia brine shrimp.

Authors: Marwan Elkrewi; Uladzislava Khauratovich; Melissa A Toups; Vincent Kiplangat Bett; Andrea Mrnjavac; Ariana Macon; Christelle Fraisse; Luca Sax; Ann Kathrin Huylmans; Francisco Hontoria; Beatriz Vicoso
Journal: Genetics Date: 2022-09-30 Impact factor: 4.402

4. Disagreement in F_ST estimators: A case study from sex chromosomes.

Authors: William J Gammerdinger; Melissa A Toups; Beatriz Vicoso
Journal: Mol Ecol Resour Date: 2020-07-06 Impact factor: 7.090

5. Sex-Specific Selection Drives the Evolution of Alternative Splicing in Birds.

Authors: Thea F Rogers; Daniela H Palmer; Alison E Wright
Journal: Mol Biol Evol Date: 2021-01-23 Impact factor: 16.240

5 in total