Literature DB >> 25567922

Linkage disequilibrium estimates of contemporary N e using highly variable genetic markers: a largely untapped resource for applied conservation and evolution.

Abstract

Genetic methods are routinely used to estimate contemporary effective population size (N e) in natural populations, but the vast majority of applications have used only the temporal (two-sample) method. We use simulated data to evaluate how highly polymorphic molecular markers affect precision and bias in the single-sample method based on linkage disequilibrium (LD). Results of this study are as follows: (1) Low-frequency alleles upwardly bias [Formula: see text], but a simple rule can reduce bias to <about 10% without sacrificing much precision. (2) With datasets routinely available today (10-20 loci with 10 alleles; 50 individuals), precise estimates can be obtained for relatively small populations (N e < 200), and small populations are not likely to be mistaken for large ones. However, it is very difficult to obtain reliable estimates for large populations. (3) With 'microsatellite' data, the LD method has greater precision than the temporal method, unless the latter is based on samples taken many generations apart. Our results indicate the LD method has widespread applicability to conservation (which typically focuses on small populations) and the study of evolutionary processes in local populations. Considerable opportunity exists to extract more information about N e in nature by wider use of single-sample estimators and by combining estimates from different methods.

Entities: Chemical Disease Gene Species

Keywords: bias; computer simulations; confidence intervals; effective population size; microsatellites; precision; temporal method

Year: 2009 PMID： 25567922 PMCID： PMC3352464 DOI： 10.1111/j.1752-4571.2009.00104.x

Source DB: PubMed Journal: Evol Appl ISSN： 1752-4571 Impact factor: 5.183

Introduction

Effective population size (Ne) is widely regarded as one of the most important parameters in both evolutionary biology (Charlesworth 2009) and conservation biology (Nunney and Elam 1994; Frankham 2005), but it is notoriously difficult to estimate in nature. Logistical challenges that constrain the ability to collect enough demographic data to calculate Ne directly have spurred interest in genetic methods that can provide estimates of this key parameter, based on measurements of genetic indices that are affected by Ne (reviewed by Wang 2005). Although some early proponents suggested that indirect genetic estimates of Ne would only be useful in cases where the natural population was so large it could not be counted effectively, it was subsequently pointed out that these methods have much greater power if population size is small. Indeed, the rapid increase in applications in recent years has been fueled largely by those interested in conservation issues or the study of evolutionary processes in local populations that often are small (Schwartz et al. 1999, 2007; Leberg 2005; Palstra and Ruzzante 2008). Estimates of contemporary effective size (roughly, Ne that applies to the time period encompassed by the sampling effort) can be based on either a single sample (Hill 1981; Pudovkin et al. 1996) or two samples (Krimbas and Tsakas 1971;Nei and Tajima 1981). The two-sample (temporal) method, which depends on random changes in allele frequency over time, has been by far the most widely applied, and it was the only method considered in a recent meta-analysis of genetic estimates of Ne in natural populations (Palstra and Ruzzante 2008). This is a curious result, given that every temporal estimate requires at least two samples that could each be used to provide a separate, single-sample estimate of Ne. Furthermore, whereas the amount of data used by the temporal method increases linearly with increases in numbers of loci (L) or alleles (K), the amount of data used by the most powerful single-sample estimators increases with the square of L and K. This suggests that, given the large numbers of highly polymorphic molecular markers currently available, there is a large, untapped (or at least under-utilized) resource that could be more effectively exploited to extract information about effective size in nature. Toward that end, in this study we evaluate precision and bias of the original single-sample method for estimating Ne– that based on random linkage disequilibrium (LD) that arises by chance each generation in finite populations (Laurie-Ahlberg and Weir 1979; Hill 1981). In the moment-based LD method, accuracy depends on derivation of an accurate expression for the expectation of a measure of LD () as a function of Ne. As r2 is a ratio, deriving its expected value is challenging, and the original derivation that ignored second-order terms was subsequently shown to lead to substantial biases in some circumstances (England et al. 2006). An empirically derived adjustment to E() (Waples 2006) has addressed the bias problem, but the bias correction was based on simulated data for diallelic gene loci and did not consider precision in any detail. Although is a standardized measure of LD, the standardization does not completely remove the effects of allele frequency (Maruyama 1982; Hudson 1985; Hedrick 1987). Therefore, it is necessary to evaluate more rigorously the LD method using simulated data for highly polymorphic markers (now in widespread use) that include many alleles that can drift to low frequencies. Specifically, we ask the following questions: How is precision affected by factors under control of the investigator (L, K, number of individuals sampled) and those that are not [true (unknown) Ne]? What effect do rare alleles have on precision and bias? What practical guidelines can help balance tradeoffs between precision and bias? Under what conditions can the LD method provide useful information for practical applications? If Ne is small, how often does the method mistakenly estimate a large Ne? If Ne is large, how often does the method mistakenly estimate a small Ne? What kind of performance can we expect when data consist of a very large number of diallelic, single-nucleotide-polymorphism (SNP) markers? How does performance of the LD method compare to other methods for estimating contemporary Ne?

Methods

Genotypic data were generated for ‘ideal’ populations (constant size, equal sex ratio, no migration or selection, discrete generations, and random mating and random variation in reproductive success) using the software EasyPop (Balloux 2001). One thousand replicate populations were generated for each size considered (N = 50, 100, 500, 1000, 5000 ideal individuals). In the standard parameter set, each simulated individual had data for L = 20 independent gene loci, which had a mutational model approximating that of microsatellites (mutation rate μ = 5 × 10−4; k-allele model with A = 10 possible allelic states; see Table 1 for a definition of notation). In some runs, we used 5, 10, or 40 loci and/or 5 or 20 alleles per locus. Each simulation was initiated with maximal diversity (initial genotypes randomly drawn from all possible allelic states) and run for successive generations until the mean within-population expected heterozygosity (HE) reached 0.8 (comparable to levels found in many studies of natural populations using microsatellites). Simulations with N = 5000 used a lower mutation rate (μ = 5 × 10−5) because μ = 5 × 10−4 leads to mutation–drift equilibrium values of HE that are larger than 0.8. After the HE = 0.8 criterion was met, samples of S = 25, 50, 100, or 200 (for N ≥ 200) individuals were taken in the final generation. As the populations were ‘ideal,’ apart from random sampling errors the effective size and census size were the same (more precisely, for otherwise ideal populations in species with separate sexes, Ne ≍ N + 0.5; Balloux 2004).

Table 1

Notation used in this study

N	Population size, equal to the number of ideal individuals
N_e	Effective population size per generation
N_b	Effective number of breeders in a specific time period
	An estimate of effective size based on genetic data
LD	Denotes the linkage disequilibrium method for estimating N_e
T	Denotes the temporal method for estimating N_e
CV	Coefficient of variation
S	Number of individuals sampled for genetic analysis
L	Number of (presumably independent) gene loci
A	Maximum number of allelic states for a gene locus
K	Actual number of alleles at a locus
P_crit	Criterion for excluding rare alleles; alleles with frequency <P_crit are excluded
n	Total number of independent allelic combinations (degrees of freedom) for the LD method (given by eqn 1)
n′	Total number of independent alleles (degrees of freedom) for the temporal method (given by eqn 5)
t	Elapsed number of generations between samples in the temporal method
V_k	Variance among adults in lifetime contribution of gametes to the next generation

Notation used in this study The composite Burrows method (Weir 1996) was used to calculate , an estimator of the squared correlation of allele frequencies at pairs of loci. Because it is straightforward to calculate and does not require one to assume random mating (as does Hill's 1974 maximum likelihood method), Weir (1979) recommended use of the Burrows method for most applications. For each sample, an overall mean was computed as the weighted average over the L(L−1)/2 pairwise comparisons among loci. With 20 loci initially segregating and a high mutation rate, virtually every replicate had 20 polymorphic loci at the time of sampling, yielding 20 × 19/2 = 190 pairwise comparisons of loci. The weights for each locus pair were a function of the relative number of independent alleles used in the comparison, as discussed in Waples and Do (2008). A locus with K alleles has the equivalent of K−1 independent alleles. For two loci with K1 and K2 alleles, respectively, there are the equivalent of (K1−1)(K2−1) independent allelic comparisons (Zaykin et al. 2008). The total degrees of freedom associated with the overall weighted mean was computed as The LD method is based on the following theoretical relationship between and Ne (Hill 1981): Thus, has two components: one due to drift (1/3Ne) and one to sampling a finite number of individuals (1/S). Subtracting the expected contribution of sampling error produces an unbiased estimate of the drift contribution to LD, which can be used to estimate Ne: Equation (2) is only approximate as it ignores second-order terms in S and Ne, which can lead to substantial bias in . Therefore, the adjusted expectations for the drift and sampling error components of developed by Waples (2006), as implemented in the software Ldne (Waples and Do 2008), were used to calculate and estimate effective size. To assess possible biases from numerous low-frequency alleles, was computed separately after excluding alleles with frequencies below the following cutoffs: Pcrit = 0.1, 0.05, 0.02, 0.01. With S = 25, the lowest possible allele frequency is 1/(2S) = 0.02, which means that for this sample size Pcrit = 0.02 and 0.01 both fail to screen out any alleles that actually occur in the population. Therefore, for S = 25 we used Pcrit = 0.03 rather than 0.02; this provided a contrast between the criterion 0.01 (which allows all alleles) and 0.03 (which excludes only alleles that occur in a single copy). Accuracy was evaluated by comparing harmonic mean across replicates to the nominal effective size, N. A theoretical measure of precision can be obtained from the following expression for the coefficient of variation (CV) of , modified from Hill (1981; Equation 8) to reflect current notation: This expression assumes that the loci are not physically linked and that S and K are constant across loci. Our simulations used unlinked loci and constant sample sizes, and variation in the actual number of alleles per locus was relatively small. Equation (3) can be misleading if (as will often be the case) the distribution of is sharply skewed toward high values. Therefore, we also considered an empirical measure of precision, CV(). Another useful metric that measures both accuracy and precision is the mean-squared error (MSE = Variance + Bias2). We calculated MSE for each parameter set as the mean of , where is the overall mean for the ith replicate and E() is the expected value of , obtained from Table 2 of Waples (2006) for the specific values of S and Ne.

Table 2

Percentage of estimates for the LD method that fell outside the indicated lower and upper bounds relative to nominal Ne = N

			P_crit					P_crit

N	S	Lower bound	0.1	0.05	0.02*	0.01	Upper bound	0.1	0.05	0.02*	0.01
L = 20
50	25	<0.1N	0.0	0.0	0.0	0.0	>10N	0.0	0.0	0.0	0.0
		<0.5N	3.7	1.3	0.6	0.0	>2N	3.0	2.3	2.7	8.3
	50	<0.1N	0.0	0.0	0.0	0.0	>10N	0.0	0.0	0.0	0.0
		<0.5N	0.0	0.0	0.0	0.0	>2N	0.0	0.0	0.0	0.7
	100†	<0.1N	0.0	0.0	0.0	0.0	>10N	0.0	0.0	0.0	0.0
		<0.5N	0.0	0.0	0.0	0.0	>2N	0.0	0.0	0.0	0.0
100	25	<0.1N	0.0	0.0	0.0	0.0	>10N	1.9	0.5	0.4	0.6
		<0.5N	10.7	5.1	1.6	0.1	>2N	10.6	8.4	10.9	18.7
	50	<0.1N	0.0	0.0	0.0	0.0	>10N	0.0	0.0	0.0	0.0
		<0.5N	0.1	0.0	0.0	0.0	>2N	2.5	0.6	1.3	4.6
	100	<0.1N	0.0	0.0	0.0	0.0	>10N	0.0	0.0	0.0	0.0
		<0.5N	0.0	0.0	0.0	0.0	>2N	0.0	0.0	0.0	0.0
500	25	<0.1N	0.0	0.0	0.0	0.0	>10N	29.6	26.5	25.6	31.7
		<0.5N	34.6	29.0	25.3	16.1	>2N	37.1	34.5	37.1	43.5
	50	<0.1N	0.0	0.0	0.0	0.0	>10N	17.2	11.1	10.8	9.4
		<0.5N	15.5	7.0	2.4	2.1	>2N	31.5	26.0	26.7	32.0
	100	<0.1N	0.0	0.0	0.0	0.0	>10N	3.6	0.9	0.5	0.4
		<0.5N	3.5	1.0	0.0	0.0	>2N	15.8	9.0	6.6	8.4
	200	<0.1N	0.0	0.0	0.0	0.0	>10N	0.1	0.0	0.0	0.0
		<0.5N	0.1	0.0	0.0	0.0	>2N	2.8	0.7	0.1	0.0
1000	25	<0.1N	2.1	0.2	0.1	0.0	>10N	43.1	39.1	36.9	43.1
		<0.5N	39.3	38.1	33.8	26.6	>2N	46.8	43.6	43.2	51.1
	50	<0.1N	0.0	0.0	0.0	0.0	>10N	33.3	28.6	25.2	26.7
		<0.5N	26.2	21.6	14.8	11.8	>2N	42.2	39.4	39.1	42.5
	100	<0.1N	0.0	0.0	0.0	0.0	>10N	17.0	8.8	5.9	6.2
		<0.5N	15.1	5.8	2.2	2.2	>2N	29.8	22.0	21.1	23.1
	200	<0.1N	0.0	0.0	0.0	0.0	>10N	2.9	0.7	0.1	0.0
		<0.5N	2.6	0.2	0.0	0.0	>2N	12.6	6.7	4.8	4.8
5000	25	<0.1N	31.5	26.7	20.7	17.5	>10N	47.9	47.1	48.1	51.2
		<0.5N	48.8	47.8	45.5	40.8	>2N	48.8	48.0	49.3	53.1
	50	<0.1N	15.8	7.6	3.5	3.0	>10N	50.4	49.5	49.2	50.6
		<0.5N	42.4	38.8	36.7	34.5	>2N	52.3	52.8	51.7	54.1
	100	<0.1N	2.5	0.2	0.2	0.1	>10N	44.8	42.4	41.3	39.9
		<0.5N	38.4	35.9	30.3	27.8	>2N	48.2	46.2	47.3	47.6
	200	<0.1N	0.0	0.0	0.0	0.0	>10N	37.1	28.2	23.3	21.2
		<0.5N	31.5	24.8	20.9	17.5	>2N	42.7	38.1	34.6	35.0
S = 50
500	40	<0.1N	0.0	0.0	0.0	0.0	>10N	1.9	0.5	0.4	0.5
500	40	<0.5N	4.4	3.1	0.1	0.0	>2N	11.1	10.4	13.8	15.7
500	20	<0.1N	0.0	0.0	0.0	0.0	>10N	17.2	11.1	10.8	9.4
500	20	<0.5N	15.5	7.0	2.4	2.1	>2N	31.5	26.0	26.7	32.0
500	10	<0.1N	0.0	0.0	0.0	0.0	>10N	33.6	26.0	23.9	26.9
500	10	<0.5N	29.9	22.3	15.3	12.0	>2N	40.7	36.6	36.4	40.4
500	5	<0.1N	2.1	0.0	0.0	0.1	>10N	44.6	39.9	38.4	38.9
500	5	<0.5N	38.0	33.2	30.0	24.5	>2N	47.7	45.6	45.7	45.5

Results are based on 1000 replicates using simulated data; S = sample size; L = number of gene loci, each with a maximum of 10 alleles per locus, and Pcrit is the criterion for excluding rare alleles.

For S = 25, results shown are for Pcrit = 0.03 rather than 0.02.

S = 100 and Ne ≍ 50 was approximated by using N = 100 with a skewed sex ratio (85:15).

Percentage of estimates for the LD method that fell outside the indicated lower and upper bounds relative to nominal Ne = N Results are based on 1000 replicates using simulated data; S = sample size; L = number of gene loci, each with a maximum of 10 alleles per locus, and Pcrit is the criterion for excluding rare alleles. For S = 25, results shown are for Pcrit = 0.03 rather than 0.02. S = 100 and Ne ≍ 50 was approximated by using N = 100 with a skewed sex ratio (85:15). For comparative purposes, an analog to eqn (3) for the moment-based temporal method is (modified from Pollak 1983, Equation 29, to reflect current notation): where the subscript T denotes the temporal method. In eqn (4), lower case t is the number of generations between samples and n′ is the number of independent alleles for the temporal method, which is given by

Results

Precision

In the LD method, CV() is an increasing function of N– that is, variance is higher and precision lower for populations with large effective size (eqn 3). Palstra and Ruzzante (2008) found a similar result in a review of published temporal estimates. Conversely, CV() declines (and precision increases) with larger samples of individuals and more allelic combinations. These patterns are illustrated in Fig. 1. When effective size is moderately small (Ne = N = 100), good precision can be obtained even with moderate amounts of data [CV() < 0.2 for S = 50, L = 10]. However, if Ne is large (∼1000), precision will be poor unless large amounts of data are accumulated. This figure also illustrates an important practical point: with the other parameters fixed, separately doubling the sample size of individuals, number of loci, or number of alleles per locus all lead to roughly the same gains in precision. This theoretical result, which is similar to a conclusion reached by Waples (1989) for the temporal method, holds for a wide range of parameter values (data not shown). The results for the parameter set with L = 180, and K = 2 provide an indication of the number of diallelic SNP loci required to achieve precision comparable to that for a typical microsatellite dataset: 180 independent SNP loci would provide roughly the same level of precision as 20 typical microsatellite loci with 10 alleles each.

Figure 1

Effects of independently doubling sample size of individuals (S), number of loci (L), and number of alleles per locus (K) on the coefficient of variation (CV) of . Results are shown for three different population sizes of N ideal individuals. CV() was computed using eqn (3) in the text. Equation (3) and Fig. 1 assume a fixed number of alleles per locus. In the simulated datasets, we specified the maximum number of allelic states per locus (A) but the actual number of segregating alleles (K) was a random variable. Figure 2 shows how the total number of (presumably independent) allelic combinations (n) in the simulated data varied with other input parameters. n increased sharply as lower frequency alleles were admitted into the computations and in general was about twice as high for Pcrit = 0.05 as for 0.1 and about three times as high for Pcrit = 0.02. Interestingly, for fixed values of L and A, the number of useful allelic combinations was not very sensitive to sample size or effective size (Fig. 2, top panel). For specified values of N, S, and L, n was much higher for A = 10 than A = 5 but did not increase much more with a larger number of potential allelic states (Fig. 2, middle panel). This result occurred because under the simulated conditions, most populations were not able to maintain much beyond 10 alleles per locus. With larger populations (N > 500–1000), increasing A beyond 10 allelic states did allow more alleles into the analysis, but the effect was not large (data not shown). Increasing the number of loci led to large increases in the number of allelic combinations (Fig. 2, bottom panel), a result directly attributable to the fact that the number of pairwise comparisons increases with the square of the number of loci.

Figure 2

Changes in the number of independent allelic comparisons (n) available to compute mean as a function of the criterion for excluding rare alleles (Pcrit). Results shown are means across replicates for simulated data using different combinations of population size (N), sample size (S), number of gene loci (L), and maximum number of alleles per locus (A). (A) Effects of variation in N and S with A and L fixed. (B) Effects of variation in A while N, S, and L are fixed. (C) Effects of variation in L while N, S, and A are fixed. The practical consequences of varying input parameters on the distribution of estimates are seen in Fig. 3. With N = 100 and only moderate amounts of data (S = 50; L = 10), most estimates clustered around 100 and only 0.3% were higher than 500. A much tighter distribution of , with virtually all estimates falling between 50 and 150, was obtained with larger samples of individuals and loci (S = 100; L = 40). The bottom panel shows a much wider range of estimates for N = 1000 than N = 100. Under somewhat typical conditions (S = 50 and L = 20), when true Ne was 1000 over a third of the estimates were >2000. However, for N = 1000 only 1% of the were <300, and for N = 100 only 0.1% of the were >300. Thus, when using the LD method with an amount of data that it is currently possible to achieve for many natural populations, one is not likely to mistake a population with moderately small Ne for one with large Ne.

Figure 3

Empirical distribution of values for 1000 replicate simulated populations, as a function of population size (N), sample size (S), and number of gene loci (L). Each gene locus had a maximum of 10 alleles each, and the criterion for excluding rare alleles was Pcrit = 0.02 (for S > 25) and Pcrit = 0.03 (for S = 25). (A) Population size fixed at N = 100, while S and L vary. (B) Results for two different population sizes with S and L fixed at 50 and 20, respectively. Bins marked with an asterisk represent all estimates >500 (Panel A) or >2000 (Panel B). A broader picture of practical applicability of the LD method can be obtained by examining data in Table 2, which shows the fraction of estimates that differ from N by a factor of 2x or 10x. One major result clearly illustrated here is that lower portions of the distribution of are much more constrained than upper portions. For example, assuming a standard sample of 20 ‘microsat’ loci and N = 500, even a sample of only 25 individuals is sufficient to ensure that will almost never be <10% of N. Conversely, with N = 500 and S = 25, about 25–30% of the estimates exceeded 10N = 5000, depending on the Pcrit value used. Precision is also strongly affected by interaction between sample size and effective size. When N is only 100, a sample of 25 individuals genotyped for 20 ‘microsat’ loci is sufficient to ensure that only a small fraction (1.6% for Pcrit = 0.03; Table 2) of estimates will be less than half the true value. But with N = 500 and the same sample size and Pcrit, about 25% of values will be <0.5N. This table thus illustrates that, for numbers of highly polymorphic loci typically available at present, small samples on the order of 25 individuals can provide meaningful information about effective size only for populations that are not too large (Ne < about 500). The practical value of small samples in the range S = 25 also depends heavily on the number of loci and alleles. For example, with only five ‘microsat’ loci typed, samples of S = 25 do not produce reliable estimates of Ne even when N is as small as 100 (most are either <<100 or >500; Fig. 3A). Finally, results in Table 2 emphasize that even with large samples of individuals, the upper bound of is not well defined if Ne is large. With N = 100 a sample of 50 individuals is sufficient to ensure that no values are >10N, but with N = 1000 about 6% of estimates are >10N even when based on sample sizes of 100, and for N = 5000 even samples of 200 individuals produce a quarter or more of estimates with > 10N. Increasing the number of loci also helps precision (Table 2), but the problem of placing an upper bound on for large populations remains challenging. Direct effects on precision when low-frequency alleles are used are seen in the second panels in Figs 4 and 5. For all values of N, CV() is highest for Pcrit = 0.1, drops by about 40–50% for Pcrit = 0.05, and declines further (but more modestly) for Pcrit = 0.02 and 0.01. This effect is essentially independent of sample size.

Figure 4

Indices of precision and bias for estimates of Ne for simulated data, plotted as a function of sample size (S) and the criterion for excluding rare alleles (Pcrit). Results shown used 20 loci with a maximum of 10 alleles per locus, and population size was N = 100. (A) Bias in harmonic mean ; dotted line shows unbiased expectation = 1.0. (B) Coefficient of variation (CV) of , measured across 1000 replicate values computed as means across all 20 gene loci. (C) Mean-squared error (MSE) of , scaled within each sample size so that the lowest MSE = 1.0.

Figure 5

As in Fig. 4, but with N = 1000.

Bias

We found an interaction between bias (indexed by the ratio harmonic mean /N), S, and Pcrit (Figs 4 and 5). In general, the LD method has little or no bias for Pcrit ≥ 0.05, which is not surprising as the empirical bias correction (Waples 2006) was developed for data that excluded alleles at frequency <5%. As alleles with lower frequency are allowed into the analysis, estimates become biased slightly upwards, and this effect is more pronounced for smaller sample sizes (compare results for S = 50, 100, and 200 with N = 1000 in Fig. 5). The program Ldne implements a separate bias correction for S < 30; this reverses the trend of increasing upward bias with smaller samples and actually leads to a slight downward bias for Pcrit ≥ 0.05 (Figs 4 and 5). However, this small-sample correction is not effective at the lowest Pcrit considered (0.01), which (in the case of S = 25) fails to exclude any alleles, even those occurring in only a single copy. For this sample size, use of Pcrit = 0.03, which screens out singletons but allows all other alleles into the analysis, led to essentially unbiased estimates of Ne for N ≤ 1000 (Figs 4 and 5). The effect of allowing singletons can also be seen for S = 50 in Figs 4 and 5, where upward bias rises sharply for Pcrit = 0.01, a criterion that allows alleles that occur only once in a sample of 2S = 100 genes. Results presented in Figs 2, 4, and 5 thus illustrate an inherent tradeoff between bias and precision: in general, a lower Pcrit leads to estimates that are not only more precise but also more biased. MSE analyses (bottom panels in Figs 4 and 5) are useful to review in this context. Pcrit = 0.1 is clearly too conservative, sacrificing too much precision for only modest benefits with respect to bias. Otherwise, which Pcrit value leads to the lowest MSE depends to some extent on S and N: with N = 100, Pcrit in the range 0.03–0.05 led to the lowest MSE, depending on sample size, whereas with N = 1000 the most extreme Pcrit (0.01) produced the smallest MSE. We found no appreciable effect of the number of loci on bias (data not shown). The maximum number of alleles per locus had little effect over the range A = 10–40, but upward bias in was largely eliminated with A = 5 (data not shown). Presumably, this occurred because A = 10 was sufficient to saturate our populations with rare alleles, whereas with only five alleles per locus most alleles remained at intermediate frequencies. Collectively, results discussed above and for other parameter sets we considered suggest the following practical ‘rule of thumb’ for balancing the precision–bias tradeoff for the LD method: choose Pcrit to be the larger of 0.02 or a value that screens out alleles that occur in only one copy. Operationally, this rule can be expressed as follows: For S > 25: choose Pcrit = 0.02. For S ≤ 25: choose so that 1/(2S) < Pcrit ≤ 1/S. Adoption of this simple rule can be expected to lead to largely unbiased estimates of Ne that have relatively high precision under a wide range of conditions (Fig. 6). If this rule is followed, most realistic situations should produce estimates of Ne with bias <10% or so – a relatively small effect considering the various other sources of uncertainty associated with biological systems. Because rare alleles cause less upward bias in for large S, if sample size is about 100 or larger users might consider using Pcrit = 0.01 to maximize precision with relatively little cost in terms of bias. This might be particularly effective in situations where population size is thought to be large, in which case adequate precision is difficult to achieve without a great deal of data. Note, for example, that with N as large as 5000, estimates based on samples <100 individuals become highly unreliable (Fig. 6).

Figure 6

Bias in harmonic mean across replicate populations as a function of population size (N) and sample size (S). Dotted line shows unbiased expectation = 1.0. Results shown used 20 gene loci with a maximum of 10 alleles each, and the criterion for excluding rare alleles was Pcrit = 0.02 (for S > 25) and Pcrit = 0.03 (for S = 25).

Confidence intervals

Although confidence intervals to are easy to calculate for the LD method, they are complicated to evaluate. To illustrate, consider an idealized scenario in which the point estimate is unbiased (harmonic mean = N) and 95% of the 95% CIs contain the true value of Ne, which is fixed and equal to N every replicate. Practical realities lead to several types of departures from this ideal scenario.

Problem 1

Parametric CIs for the LD method are based on the observation that a function of is distributed approximately as chi-square with n degrees of freedom: CV2() ≍ 2/n (Hill 1981), with n defined as in eqn (1). However, this formulation assumes that the L(L−1)/2 pairwise comparisons among loci are all independent, which is not strictly true; correlations among overlapping pairs of loci (e.g., locus 1 with locus 2 and locus 1 with locus 3) violate this assumption (Hill 1981). As a consequence, variance of mean does not decrease as fast as the theoretical expectation when additional loci are used, and parametric confidence intervals based on the chi-square approximation (Equation 12 in Waples 2006) do not contain the true value the expected fraction of the time when many loci are used.

Problem 2

If is biased, CIs computed for replicate point estimates will tend to perform poorly because they are generated around the biased point estimates but are being compared to the unbiased (true) value of Ne. For example, if the estimator is biased toward high values (as occurs for the LD method with some combinations of N, S, and Pcrit), the entire CI will be above the true value a disproportionate fraction of the time.

Problem 3

A somewhat related phenomenon, recently described by Waples and Faulkner (2009), is that when one explicitly models a Wright–Fisher ‘ideal’ population (e.g., in a computer model that tracks multilocus genotypes), the realized effective size in each replicate (Ne*) only rarely, and only by chance, equals the nominal ‘true’ value of N. This is because in the Wright–Fisher model, the realized variance among individuals in genes contributed to the next generation (V*) is a random variable; effective size equals N only when V* is exactly equal to the binomial expectation E(V) = 2(N−1)/N, so in most replicates Ne* is higher or lower than N because V* ≠ 2(N−1)/N. This effect is small if N is large but can be important even for N = 100, in which case realized Ne* typically varies between about 80 and 120 (± about 20%) across replicate generations in modeled ideal populations (Waples and Faulkner 2009). As a consequence of this effect, performance evaluations of CIs in modeled populations will tend to be overly pessimistic because they do not account for random variation in realized Ne*. To recap, Problem 1 means that parametric CIs for the LD method will tend to be slightly too narrow, with the effect being more pronounced for large numbers of loci. Problems 2 and 3 remain even if the CIs have the appropriate width; these problems arise because the CIs are offset from the ‘true’ value of Ne. In Problem 2, the offset is a real bias and occurs consistently in one direction. In contrast, in Problem 3 the offset is not a bias but instead is due to random differences between realized Ne* and what is assumed to be the true, constant value Ne = N. Interestingly, and importantly, Problems 2 and 3 are both more acute with large amounts of data (high S, L, K). With only modest amounts of data, CIs will be wide and will (by chance) include N a large fraction of the time, even with bias in (Problem 2) or random variation in Ne* (Problem 3). However, as more and more data are brought into the analysis, the CIs will become narrowly focused on the biased point estimate (Problem 2) or the realized value of Ne* that applies to that particular replicate generation (Problem 3). In both cases, the resulting CIs will include N a smaller and smaller fraction of the time as information content increases. In contrast to Problem 1, which is specific to the LD method because it arises from a lack of independence of overlapping pairs of loci, Problems 2 and 3 are more generic and apply as well to confidence intervals for other Ne estimators. What are practical implications of these factors? Waples and Do (2008) proposed a jackknife method to empirically estimate the variance of and modify parametric LD confidence intervals accordingly, which should address Problem 1 given an adequate number of loci to compute a jackknife estimate. Problem 3 complicates evaluation of performance of CIs with simulated data, which is one reason we do not provided detailed evaluations of CIs in this study. However, this problem arises from a type of pseudoreplication inherent to simulated data (Waples and Faulkner 2009) and therefore ceases to be a problem when considering data from natural populations, where each sample has associated with it only one realized Ne*, which is the parameter of interest. Problem 2 is therefore of most immediate concern for those interested in placing confidence limits on estimates of Ne in natural populations. The best approaches are to 1) pick a method that is unbiased, or 2) accept a small degree of bias in exchange for greater precision, recognizing that the resulting CIs might exclude the true Ne a higher-than-expected fraction of the time (even if the width of the CIs is appropriate).

The LD method versus the temporal method

Equations (3) and (4) allow a theoretical comparison of precision of the LD method and the moment-based temporal method. For both estimators, CV() is inversely related to the number of degrees of freedom; this is generally larger for the LD method because as the numbers of loci and alleles/locus increase, n increases as the square of L and K while the temporal n′ increases only linearly with L and K (compare eqns 1 and 5). Thus, we expect that precision should increase more rapidly for the LD method as more loci and alleles are used. Conversely, in the temporal method CV() declines with increasing time between samples (eqn 4), while this parameter does not affect LD estimates. Finally, although precision in both methods is lower for larger N, the coefficient for the Ne term is smaller for the temporal method (eqn 4) than for the LD method (eqn 3), indicating that precision for the temporal method should not decline as rapidly with increases in population size. These diverse and contrasting effects can be quantified by considering the ratio of the coefficients of variation for (CVLD/CVT). As the temporal method requires two samples of size S, to standardize the comparison we assumed a single sample of size 2S for the LD estimates (see Wang 2009, for a comparable adjustment for comparisons of two-sample and one-sample methods). With this adjustment and assuming K is constant, after combining eqns (3) and (4), expanding the expressions for n and n′ using eqns (1) and (5), and simplifying, yields: With this formulation, values of the ratio >1 indicate that the temporal method has greater precision, while the LD method is more precise when the ratio <1. Obviously, this analysis is only meaningful for L ≥ 2 (a minimum of two loci are required for the LD method) and K ≥ 2 (monomorphic loci provide no information). It is easy to see that the first term in eqn (6) will be 1 if L,K = (3,2) or (2,3), and the ratio will be < 1 if either L or K >3. The numerator and denominator of the second term will be equal when t = 1.33 generations, and the numerator will be larger if the time between samples exceeds this value. These effects are illustrated for some representative parameter combinations in Fig. 7. Relative performance of the temporal method increases (higher ratio) when (i) more generations elapse between samples (compare results for t = 2 and 10 for L = 5 loci, K = 5 alleles and L = 20, K = 10); (ii) smaller samples are used (compare results for S = 25 and 50 for L = 20, K = 10); and larger populations are involved (consistently higher ratios for N = 500 and 1000 than for N = 100). Conversely, increasing L and K has a dramatic effect on reducing CVLD, leading to low values of the ratio. The only parameter combination for which overall precision was better for the temporal method involved modest amounts of data (five loci with five alleles each) and a relatively long time (10 generations) between samples. For the other (arguably more realistic) parameter combinations, precision of the LD method was higher, and often a great deal higher. For example, with 20 ‘microsat’ loci with 10 alleles each or 180 diallelic ‘SNP’ loci, CVLD was a third lower than CVT with N = 1000 and over two-thirds lower with N = 100 (Fig. 7).

Figure 7

Theoretical precision of the LD and temporal methods for various combinations of parameters. Values on the Y-axis are ratios of coefficients of variation of for the LD method [CVLD() from eqn 3] and the temporal method [CVT() from eqn 4]. The dotted horizontal line identifies the point at which precision is the same for the two methods. Points above the line indicate greater precision for the temporal method, points below the line greater precision for the LD method. Variables considered are population size (N), sample size of individuals (S), number of loci (L), number of alleles per locus (K), and number of generations between samples (t, temporal method only). Results for the temporal method assume two samples each of size S and results for the LD method assume a single sample of size 2S. These results should be regarded as only a general indication of relative precision of the two methods. Various estimators used in the temporal method have different variance properties, providing some opportunities to trade off precision and accuracy (Jorde and Ryman 2007). Furthermore, likelihood-based (Wang 2001) or approximate Bayesian computation (ABC; Tallmon et al. 2004) temporal methods should have lower variance than moment-based estimators, at least if their underlying assumptions are satisfied. Nevertheless, the general patterns observed here should be fairly robust. Notably, Wang (2009, Fig. 1) found qualitatively similar results in comparing his pseudo-likelihood temporal method to a new single-sample estimator (discussed below): the temporal method performed poorly for low t but eventually outperformed the single-sample estimator if the temporal samples were spaced a large enough number (t = 16–32) of generations apart, and doubling the number of loci led to larger increases in precision of the single-sample method.

Discussion

It seems clear that previous efforts to estimate effective size in natural populations have not extracted as much information as possible from genetic data. Any application of the temporal method that collects multilocus genotypic data provides an opportunity to obtain at least two estimates of Ne from individual generations using the LD method or one of the other single-sample estimators, but relatively few have taken advantage of this opportunity. The simulation program used here (EasyPop) differs in some important ways from the one used to generate data to develop the empirical bias correction for the LD method (Waples 2006). In particular, the original program had no mutation and considered only diallelic loci at moderate allele frequency, whereas EasyPop has an explicit mutation model and generates data with a wide range of allele frequencies and numbers of alleles per locus. The new simulated data thus represent an independent assessment of the bias-corrected LD estimator – and a more realistic assessment of performance with highly polymorphic markers currently in widespread use. In summarizing important results of our evaluations, we return to the specific questions posed in the Introduction before closing by discussing a few related issues.

Factors affecting precision and bias

The LD method benefits from the fact that the amount of information increases with the square of the numbers of loci and alleles, so efforts to capitalize on ready availability of highly variable markers can pay large dividends. Within the range of values of practical interest to most investigators, the same proportional increases in numbers of loci, alleles per locus, or individuals sampled should have roughly comparable effects on precision, and this result (along with the quantitative expression for CVLD in eqn 3) can be used to guide experimental design decisions. Although each SNP locus provides much less precision than a typical microsatellite, this can be overcome by brute force if enough new independent loci can be developed. Figure 1 indicates that about 180 SNP loci can be expected to provide precision comparable to that attained by about 10–20 typical microsatellite loci; this might seem like a lot, but techniques to develop thousands of SNP loci are rapidly advancing and declining in cost (Morin et al. 2004; Xu et al. 2009). As discussed below (Key assumptions), however, an application using a very large numbers of SNP loci should be accompanied by a careful analysis of assumptions of independence and neutrality. Rare alleles tend to upwardly bias LD estimates of Ne, just as they do for the temporal method (Turner et al. 2001), but in many cases the effect is not too severe. This means that large numbers of alleles typically can be allowed into the analysis to boost precision without substantially increasing bias. For most applications, a good rule of thumb is to screen out any alleles at frequency <0.02, as well as any alleles that occur in only a single copy in the sample (see Nielsen and Signorovitch 2003, for discussion of effects on of using singletons from SNP data). Using this criterion, something close to maximum precision can be achieved while (in most cases) keeping bias to less than about 10% (Fig. 6). With large samples (S∼ 100 or larger), alleles with frequency as low as 0.01 can probably be used.

Practical applications

All genetic methods for estimating contemporary depend on a signal that is a function of 1/Ne, so these methods are most powerful with small populations (for which the signal is strong) and have difficulty distinguishing large populations from infinite ones (because the signal is so small). This effect is amply demonstrated for the LD method in Figs 1 and 3 and Table 2. With amounts of data commonly available today (samples of about 50 individuals; 10–20 microsatellite-like loci), quite good precision can be obtained for populations with relatively small effective sizes (about 100–200 or less). For very small populations (Ne less than about 50), small samples of only 25–30 individuals can still provide some useful information. These results are encouraging, as conservation concerns typically focus on populations that are (or might be) small, and modern molecular methods have facilitated an increasing interest in studying evolutionary processes in local populations in nature. In contrast, estimating effective size with any precision in populations that are large (Ne∼ 1000 or larger) is very challenging. In general, a small sample of individuals (or a moderate or large sample based on only a few gene loci) will not provide much useful information about Ne in large populations, and even with relatively large samples of individuals and loci it might not be possible to say much about the upper bound to . In theory, with arbitrarily large numbers of loci and alleles (as might routinely be achievable in the future), it should be possible to produce estimates that place tight bounds even on the upper limit to in large populations (cf. Fig. 1). However, because the drift signal is so small for large populations, researchers who want to estimate Ne in populations that are or might be large should pay careful attention to various sources of noise in the analysis (slight departures from random sampling; data errors; violation of underlying model assumptions) that can have a disproportionate effect on results. In this respect, estimating contemporary Ne in large populations using genetic markers is as challenging as, and suffers many of the same intrinsic limitations as, genetic estimates of dispersal in high gene flow species (Waples 1998; Fraser et al. 2007). Fortunately, because the LD signals for large and small populations are quite different (Fig. 3), estimates based on even moderate amounts of data should be able to provide a useful lower bound for Ne, and this can be important, particularly in conservation applications where a major concern is avoidance and/or early detection of population bottlenecks. Based on extensive computer simulations, Russell and Fewster (2009) reached a rather pessimistic conclusion about practical usefulness of the LD method. However, two factors make their results difficult to interpret in the present context. First, they presented quantitative results only for the original LD method (Hill 1981) which, when the ratio S/Ne is small, has been shown to produce an estimate that is more closely related to the sample size than to the true effective size (England et al. 2006; Waples 2006). Second, Russell and Fewster (2009) assessed bias by comparing arithmetic mean to the true Ne. Because of the inverse relationship between and (eqn 2a), this has the unfortunate consequence that if is a completely unbiased estimator of r2, arithmetic mean will be upwardly biased. Results in Table 2 and Figure 3 show how upwardly skewed the distribution of can be, in which case the arithmetic mean is not a useful indicator of central tendency. Here, we have followed the approach used by Nei and Tajima (1981), Pollak (1983), Waples (1989), Jorde and Ryman (2007), Nomura (2008), and Wang (2009), all of whom evaluated bias in terms of harmonic mean (or, equivalently, used the overall mean or temporal across replicates to compute an overall ). Importantly, this approach can readily accommodate negative or infinite values in individual replicates (see next section).

Negative estimates and nonsignificant LD

As shown in eqn (2a), before estimating Ne in the LD method, the expected contribution of sampling error is subtracted from the empirical . If Ne is large, or if only limited data are available, by chance mean can be smaller than the sample size correction, in which case the estimate of Ne will be negative. A related phenomenon can occur with the standard temporal method (Nei and Tajima 1981; Waples 1989) and with unbiased estimators of genetic differentiation (Nei 1978; Weir and Cockerham 1984). Negative estimates occur when the genetic results can be explained entirely by sampling error without invoking any genetic drift, so the biological interpretation is = ∞ (Laurie-Ahlberg and Weir 1979; Nei and Tajima 1981). In this situation, the user can conclude that the data provide no evidence that the population is not ‘very large’. However, even if the point estimate is negative, if adequate data are available the lower bound of the CI generally will be finite and can provide useful information about plausible limits Ne. Many software packages provide tests of statistical significance of LD for each pair of loci or across all loci. Although these tests vary in the way they assess significance and combine information across multiple alleles and loci, in general they are testing the hypothesis that the observed LD can be explained entirely by sampling error. A nonsignificant test for LD, therefore, indicates that the null hypothesis (H0: ≤ 1/S) cannot be rejected, which implies that the upper bound of would include infinity. That is, a nonsignificant test provides no evidence for drift, which is not the same as saying no drift occurs (in fact, all finite populations have some contribution to from drift, and, assuming the test is valid, that drift component should become statistically significant if enough data are collected). So, for reasons discussed in the previous paragraph, even a dataset with a nonsignificant LD result can potentially provide useful information about effective population size.

Key assumptions

Like other Ne estimators, the LD method assumes that of the four evolutionary forces (mutation, migration, selection and genetic drift), only drift is responsible for the signal in the data. Although mutation rate strongly affects estimates of long-term Ne, it probably is of little consequence for the LD method, apart from its role in producing genetic variation. Selection can cause nonrandom associations of genes at different gene loci, just as it can influence rates of allele frequency change, but it might be reasonable to assume that it has relatively little influence on LD measured in microsatellite loci. The neutrality assumption should be evaluated more rigorously, however, if large numbers of SNP loci are used. Vitalis and Couvet (2001) proposed a method to jointly estimate Ne and migration rate. Immigration of genetically differentiated individuals from other populations leads to mixture disequilibrium (Nei and Li 1973) that could downwardly bias LD estimates of local Ne; conversely, high migration rates among weakly differentiated populations could cause local samples to provide an estimate closer to the metapopulation Ne than the local Ne (because the sample is drawn from a larger pool of potential parents). Unpublished data (P. England, personal communication) indicate that under equilibrium migration models, the former effect is small and the latter effect is substantial only for migration rates that are high in genetic terms (∼10% or higher) – suggesting that under many natural conditions the LD method can provide a robust estimate of local (subpopulation) Ne. However, upward biases in might be more important in small subpopulations that are part of a metapopulation, as in that case even a few migrants per generation could represent a relatively high migration rate. The LD method as implemented here assumes that loci are independent (probability of recombination = 0.5). This is probably a reasonable assumption in most current situations, given the numbers of markers typically used in studies of natural populations. However, some taxa (e.g., Drosophila) have only a few chromosomes and/or regions of the genome in which recombination is suppressed, and in the future LD estimates might be generated using thousands of SNP or other markers. In such cases, therefore, issues related to recombination rate would have to be re-evaluated. Linked markers actually provide more power, providing the recombination rate is known (Hill 1981). The LD method provides information primarily about Ne in the parental generation, but residual disequilibrium from a recent bottleneck can affect the estimate for a few generations (Waples 2005, 2006). If loci are closely linked, estimates from the LD method will be more strongly influenced by Ne in the distant past (see Tenesa et al. 2007, for an application to human SNP data). The theoretical relationship between and Ne assumes either random mating without selfing or random mate choice with lifetime monogamy (Weir and Hill 1980; Waples 2006). The populations do not have to be ideal; the method still performs well with highly skewed sex ratios and overdispersed variance in reproductive success (Waples 2006). However, strongly assortative mating or widespread selfing would be expected to lead to biases that have not been quantitatively evaluated. Genotyping errors can also affect estimates of LD (Akey et al. 2001). Russell and Fewster (2009) found an upward bias in for the standard LD method (Hill 1981) when 1% allelic dropout was modeled, and this topic bears further study. Finally, the underlying model for the LD method assumes discrete generations, and this is the only situation where the resulting estimate can be interpreted as effective size for a generation (Ne). Most natural populations do not have discrete generations; when samples are taken from age-structured species, the resulting estimate from the LD method can be interpreted as an estimate of the effective number of breeders (Nb) that produced the cohort(s) from which the sample was taken. The relationship between and Ne in age-structured species has been evaluated for the temporal method (Waples and Yokota 2007), but comparable evaluations have not been made for any single-sample estimator. A reasonable conjecture is that if the number of cohorts represented in a sample is roughly equal to the generation length, the estimate from the LD method should roughly correspond to Ne for a generation, but this remains to be tested.

Comparison with other methods

As illustrated in Fig. 7, with samples of individuals, loci, and alleles routinely available today, the LD method should generally provide better precision than the temporal method, unless samples for the latter are spaced a large number of generations apart. Several other one-sample estimators of Ne have been proposed, although direct comparisons of performance have generally not been made with the LD method. The heterozygote excess method is generally much less precise than other single-sample estimators (Nomura 2008; Wang 2009) and is best suited for analyzing small populations of species with Type III survivorship for which large samples of offspring are possible (Hedgecock et al. 2007; Pudovkin et al. 2009). A single-sample ABC estimator (OneSamp; Tallmon et al. 2008) appears to have considerable potential but has not been rigorously evaluated under a wide range of conditions and assumes a specific type of mutation model that makes it useful only for microsatellite data. Two new methods, based on the analysis of molecular coancestry (Nomura 2008) and identification of full and half sibs (Wang 2009), each included some comparisons with some other Ne estimators. However, Nomura only considered populations with tiny Ne (<15) and only compared his new method to the heterozygote excess method, which was also the only single-sample estimator that Wang (2009) compared his new method to with simulated data. Nevertheless, Wang did provide results for some analyses that are comparable enough to those conducted here that a quantitative comparison of the LD method and the sibship method is possible for a few parameter combinations. In Table 5 of his paper, Wang (2009) reported the root mean-squared error () for the quantity for simulations using random mating populations of constant size with equal sex ratio and 10–40 gene loci with eight alleles of initial equal frequency. That analysis involved a comparison with temporal samples taken in generations 3 and 5, so to get a single sibship-based estimate for each replicate Wang computed an estimate for both generations and took the average. For the parameters N = 200, S = 50, L = 20, for the sibship method was 0.0005. To allow a comparison, we simulated populations as described in Methods with N = 200, L = 20, A = 8, and drew samples after 10 generations – long enough for levels of LD to stabilize. The version of EasyPop we used does not allow sampling in two different generations, so we used the approach described above of taking a single sample of twice the size (i.e., we sampled 100 individuals once rather than 50 individuals twice). All else being equal, the two sampling schemes should provide roughly comparable precision. For our simulated datasets, we found that of was 0.0004, slightly lower than the value reported by Wang for his one-sample method and considerably less than the value he found (0.0015) for temporal samples separated by two generations. For the same set of simulated populations and sample size S = 100 (two samples of 100 for the sibship method, one sample of 200 for the LD method), we found of 0.00025 compared to 0.0003 reported by Wang. It would be a mistake to place too much emphasis on these results, given that tabular values in Wang (2009) are rounded off and that the direct comparisons that are possible cover only a small fraction of potential parameter space. Nevertheless, these data suggest that LD and sibship one-sample methods might have roughly comparable levels of performance as measured by some common indicators. A comprehensive comparison of performance of the LD, coancestry, and sibship methods would be useful.

Combining estimates across methods

Researchers who have reported estimates of Ne from more than one method too often have not taken advantage of another opportunity to increase precision – combining the estimates into a single estimator. Because all the estimators respond to a signal that is inversely related to Ne, an appropriate way to combine estimates across methods would be to take a weighted harmonic mean (Waples 1991). Ideally, the weights would be reciprocals of variances, which can be obtained for the moment-based LD and temporal methods from eqns (3) and (4), respectively. Combining data for these two methods could be particularly useful for large populations, as the temporal method is somewhat less sensitive to large N. Appendix A provides a worked example of how effective size estimates can be combined, both within and across methods. Additional work would be needed to determine the most appropriate way to weight estimates from different single-sample estimators. However, Nomura (2008) showed that considerable improvements in performance can be obtained even by taking an unweighted harmonic mean of from the heterozygote excess and molecular coancestry methods. Some cautions are important to keep in mind here. First, which time period(s) each estimate applies to needs careful consideration. Each of the single-sample estimators is most closely related to inbreeding Ne and provides an estimate of the effective number of breeders (Nb) that produced the sample (Waples 2005). Combining estimates from single-sample methods should therefore be straightforward, provided an appropriate weighting scheme can be developed. However, in general the single-sample and temporal methods do not provide estimates of Ne in exactly the same generations (Waples 2005). Each single-sample estimate relates to Ne in a single generation (or Nb for a particular time period), while a temporal estimate depends on the harmonic mean Ne in the entire interval spanned by the samples. If Ne does not vary too much over time and the primary interest is an overall estimate of effective size for the population, then it might be reasonable to simply combine the temporal and single-sample estimates with appropriate weights as discussed above. However, if the primary interest is Ne in specific generations, which might vary considerably, then careful consideration is needed to determine whether combining estimates is desirable. Second, the benefits of combining estimates depend on the degree to which they provide independent information about effective size. Based on unpublished data cited by Waples (1991), the LD and temporal methods are essentially independent, but correlations among the other estimators have not been determined. Conducting these evaluations should be an important research priority. Third, the different Ne estimators generally depend on similar, but not identical, suites of assumptions (as discussed above). It will generally be the case that not all of these assumptions are completely satisfied in any particular dataset, and the different estimators might behave in different ways in response to violation of these assumptions. Researchers should think carefully before combining estimates in cases for which good reasons exist to believe some key assumptions are strongly violated.

Table A1

Combining estimates of N within methods (single-sample or temporal)

					Temporal

		Single-sample			Moment

Time	S	LD	ABC		W	JR	ML
2005	24	23.8	34.7	28.2	20.9	28	322
2006	39	35.9	29.3	32.3
HMean	29.7			30.7*	23.9†		44.6‡

Harmonic mean single-sample estimate for 2005 and 2006 combined () weighted by sample size and number of allelic comparisons (n).

Harmonic mean of the W and JR estimates ().

Harmonic mean of the ML estimate and [the harmonic mean of the W and JR estimates] ().

HMean, harmonic mean; S, sample size; LD, linkage disequilibrium (Waples and Do 2008); ABC, Approximate Bayesian Computation (Tallmon et al. 2008); W, Waples (1989); JR, Jorde and Ryman (2007); ML, pseudo maximum likelihood (Wang 2001). This example uses data for the Miami blue butterfly from Saarinen et al. (2009).

Table A2

Combining estimates of N across single-sample (SS) and temporal (T) methods

	Effective population size

	25	50	100
Sample size (S)	29.7	29.7	29.7
Number of loci (L)	11	11	11
Alleles/locus (K)	5	5	5
n	880	880	880
n′	44	44	44
Generations (t)	8	8	8
Var()	18	208	2801
Var()	42	229	1542
W_SS	0.70	0.52	0.36
W_T	0.30	0.48	0.64
with ML	33.8	36.0	38.4
without ML	28.3	27.0	25.9

Results are shown for three different effective population sizes. is the weighted harmonic mean of the overall estimates for the two methods from Table A1 [= 30.7 for single-sample method and 44.6 or 23.9 for the temporal method with or without the Wang (2001) ML estimate, respectively].

n = number of degrees of freedom for the LD method (Equation 1); n’ = number of degrees of freedom for the temporal method (Equation 5); W = relative weights for single-sample and temporal estimates.

35 in total

1. EASYPOP (version 1.7): a computer program for population genetics simulations.

Authors: F Balloux
Journal: J Hered Date: 2001 May-Jun Impact factor: 2.645

2. Correcting for ascertainment biases when analyzing SNP data: applications to the estimation of linkage disequilibrium.

Authors: Rasmus Nielsen; James Signorovitch
Journal: Theor Popul Biol Date: 2003-05 Impact factor: 1.570

3. Heterozygote excess in small populations and the heterozygote-excess effective population size.

Authors: François Balloux
Journal: Evolution Date: 2004-09 Impact factor: 3.694

4. A new method for estimating effective population sizes from a single sample of multilocus genotypes.

Authors: Jinliang Wang
Journal: Mol Ecol Date: 2009-04-08 Impact factor: 6.185

5. Modelling evolutionary processes in small populations: not as ideal as you think.

Authors: Robin S Waples; James R Faulkner
Journal: Mol Ecol Date: 2009-03-31 Impact factor: 6.185

6. Correlation-based inference for linkage disequilibrium with multiple alleles.

Authors: Dmitri V Zaykin; Alexander Pudovkin; Bruce S Weir
Journal: Genetics Date: 2008-08-30 Impact factor: 4.562

7. ldne: a program for estimating effective population size from data on linkage disequilibrium.

Authors: Robin S Waples; Chi DO
Journal: Mol Ecol Resour Date: 2008-07 Impact factor: 7.090

8. The sampling distribution of linkage disequilibrium under an infinite allele model without selection.

Authors: R R Hudson
Journal: Genetics Date: 1985-03 Impact factor: 4.562

9. Inferences about linkage disequilibrium.

Authors: B S Weir
Journal: Biometrics Date: 1979-03 Impact factor: 2.571

10. A new method for SNP discovery.

Authors: Jian-Yong Xu; Gen-Bo Xu; Song-Lin Chen
Journal: Biotechniques Date: 2009-03 Impact factor: 1.993

184 in total

1. Computationally efficient sibship and parentage assignment from multilocus marker data.

Authors: Jinliang Wang
Journal: Genetics Date: 2012-02-23 Impact factor: 4.562

2. Eutrophication causes speciation reversal in whitefish adaptive radiations.

Authors: P Vonlanthen; D Bittner; A G Hudson; K A Young; R Müller; B Lundsgaard-Hansen; D Roy; S Di Piazza; C R Largiader; O Seehausen
Journal: Nature Date: 2012-02-15 Impact factor: 49.962

3. Exploring the causes of small effective population sizes in cyst nematodes using artificial Globodera pallida populations.

Authors: Josselin Montarry; Sylvie Bardou-Valette; Romain Mabon; Pierre-Loup Jan; Sylvain Fournet; Eric Grenier; Eric J Petit
Journal: Proc Biol Sci Date: 2019-01-16 Impact factor: 5.349

4. Stock enhancement or sea ranching? Insights from monitoring the genetic diversity, relatedness and effective population size in a seeded great scallop population (Pecten maximus).

Authors: R Morvezen; P Boudry; J Laroche; G Charrier
Journal: Heredity (Edinb) Date: 2016-06-29 Impact factor: 3.821

5. A longitudinal genetic survey identifies temporal shifts in the population structure of Dutch house sparrows.

Authors: L Cousseau; M Husemann; R Foppen; C Vangestel; L Lens
Journal: Heredity (Edinb) Date: 2016-06-08 Impact factor: 3.821

6. Estimating contemporary effective population size on the basis of linkage disequilibrium in the face of migration.

Authors: Robin S Waples; Phillip R England
Journal: Genetics Date: 2011-08-11 Impact factor: 4.562

7. Dispersal and selection mediate hybridization between a native and invasive species.

Authors: Ryan P Kovach; Clint C Muhlfeld; Matthew C Boyer; Winsor H Lowe; Fred W Allendorf; Gordon Luikart
Journal: Proc Biol Sci Date: 2015-01-22 Impact factor: 5.349

8. A century-long genetic record reveals that protist effective population sizes are comparable to those of macroscopic species.

Authors: Phillip C Watts; Nina Lundholm; Sofia Ribeiro; Marianne Ellegaard
Journal: Biol Lett Date: 2013-11-27 Impact factor: 3.703

9. Incorporating non-equilibrium dynamics into demographic history inferences of a migratory marine species.

Authors: E L Carroll; R Alderman; J L Bannister; M Bérubé; P B Best; L Boren; C S Baker; R Constantine; K Findlay; R Harcourt; L Lemaire; P J Palsbøll; N J Patenaude; V J Rowntree; J Seger; D Steel; L O Valenzuela; M Watson; O E Gaggiotti
Journal: Heredity (Edinb) Date: 2018-05-03 Impact factor: 3.821

10. Population structure and effective/census population size ratio in threatened three-spined stickleback populations from an isolated river basin in northwest Spain.

Authors: A Pérez-Figueroa; C Fernández; R Amaro; M Hermida; E San Miguel
Journal: Genetica Date: 2015-04-28 Impact factor: 1.082