Abstract
The effective population size $N_e$ is a key parameter in population genetics and evolutionary biology, as it quantifies the expected distribution of changes in allele frequency due to genetic drift. Several methods of estimating $N_e$ have been described, the most direct of which uses allele frequencies measured at two or more time points. A new likelihood-based estimator $\hat{N}_B$ for contemporary effective population size using temporal data is developed in this article. The existing likelihood methods are computationally intensive and unable to handle the case when the underlying $N_e$ is large. This article tries to work around this problem by using a hidden Markov algorithm and applying continuous approximations to allele frequencies and transition probabilities. Extensive simulations are run to evaluate the performance of the proposed estimator $\hat{N}_B$, and the results show that it is more accurate and has lower variance than previous methods. The new estimator also reduces the computational time by at least 1000-fold and relaxes the upper bound of $N_e$ to several million, hence allowing the estimation of larger $N_e$. Finally, we demonstrate how this algorithm can cope with nonconstant $N_e$ scenarios and be used as a likelihood-ratio test to test for the equality of $N_e$ throughout the sampling horizon. An R package "NB" is now available for download to implement the method described in this article.
Keywords: effective population size; genetic drift; maximum-likelihood estimation
Year: 2015 PMID: 25747459 PMCID: PMC4423369 DOI: 10.1534/genetics.115.174904
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Figure 1. Hidden Markov model representing the structure of the process. The hidden states are the sequence of true allele frequencies, which propagate according to the Wright–Fisher model but are unobserved. The observations are the realizations, that is, the sampled allele frequencies.
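The two layers of this hidden Markov structure can be illustrated with a small simulation. The sketch below is not the authors' code or the "NB" package; it assumes a diploid population (2 × `n_e` gene copies), binomial resampling for both the drift step and the sampling step, and two samples taken at the first and last generation. All function and parameter names are hypothetical.

```python
import random


def binomial(n, p, rng):
    """Draw one Binomial(n, p) variate by summing n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_locus(n_e, sample_size, generations, x0, rng):
    """Simulate one locus under the hidden Markov structure.

    Hidden layer: the true allele frequency drifts under Wright-Fisher
    (binomial resampling of 2 * n_e gene copies each generation).
    Observed layer: a binomial sample of 2 * sample_size gene copies,
    taken at the first and last generation (an assumption of this sketch).
    Returns (true_freqs, sampled_freqs).
    """
    copies = 2 * n_e
    draws = 2 * sample_size
    x = x0
    true_freqs = [x]
    for _ in range(generations):
        x = binomial(copies, x, rng) / copies  # hidden transition
        true_freqs.append(x)
    sampled = [binomial(draws, true_freqs[t], rng) / draws
               for t in (0, generations)]  # noisy observations
    return true_freqs, sampled


rng = random.Random(1)
true_freqs, sampled = simulate_locus(
    n_e=200, sample_size=20, generations=5, x0=0.5, rng=rng
)
print(len(true_freqs), sampled)
```

An estimator sees only `sampled`. The trajectory in `true_freqs` is the unobserved layer that a likelihood method has to integrate over.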
Simulation results
| True $N_e$ | Sample size | Method | Mean (SD) | 2.5% | 97.5% | Mean C.I. width | Coverage (of 1000) |
|---|---|---|---|---|---|---|---|
| Two samples (sample at …) | | | | | | | |
| 1000 | 100 | (name missing) | 1,059.7 (253.5) | 699.8 | 1,657.8 | n/a | n/a |
| | | MLNE | 1,080.7 (260.7) | 711.3 | 1,695.4 | 1,283.3 | 960 |
| | | $\hat{N}_B$ | 1,033.2 (247.3) | 684.1 | 1,604.8 | 1,195.5 | 956 |
| 5000 | 500 | (name missing) | 5,272.4 (1,164.5) | 3,534.1 | 8,056.8 | n/a | n/a |
| | | MLNE | 5,276.7 (1,166.7) | 3,539.9 | 8,083.9 | 6,046.3 | 970 |
| | | $\hat{N}_B$ | 5,217.1 (1,149.6) | 3,501.6 | 7,958.1 | 5,957.4 | 967 |
| Three samples (sample at …) | | | | | | | |
| 1000 | 100 | (name missing) | 1,107.8 (638.8) | 661.8 | 2,050.7 | n/a | n/a |
| | | MLNE | 1,076.6 (243.9) | 734.9 | 1,704.6 | 1,134.2 | 957 |
| | | $\hat{N}_B$ | 1,030.9 (226.8) | 709.4 | 1,605.4 | 1,054.0 | 960 |
| 5000 | 500 | (name missing) | 5,567.7 (2,038.2) | 3,165.9 | 10,708.0 | n/a | n/a |
| | | MLNE | 5,254.0 (1,153.4) | 3,530.2 | 8,198.1 | 5,427.4 | 950 |
| | | $\hat{N}_B$ | 5,202.0 (1,138.5) | 3,495.9 | 8,008.4 | 5,352.2 | 953 |
For each parameter setting, 1000 replicate populations were simulated and all three methods were used to estimate $N_e$. The true $N_e$, sample size per generation, and number of temporal samples are shown. A total of 500 unlinked loci were used in each run, and the initial allele frequencies were sampled from the uniform distribution. The mean, standard deviation, and 2.5 and 97.5 percentiles of the 1000 runs are reported. For MLNE and $\hat{N}_B$, the mean width of the 95% confidence interval (C.I.) is also computed. The last column shows the number of C.I.s (of 1000 simulations) that cover the true value of $N_e$.
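The summary columns described here (mean, SD, percentiles, mean C.I. width, and coverage count) can be computed from replicate output along the following lines. This is a minimal sketch, not the authors' procedure: it assumes the sample standard deviation and a linearly interpolated percentile, neither of which the source specifies, and the function names and toy inputs are hypothetical.

```python
import statistics


def percentile(values, q):
    """Percentile q (0-100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(estimates, intervals, true_value):
    """Summarize replicate estimates in the way the table reports them.

    estimates: one point estimate per replicate.
    intervals: one (lower, upper) 95% C.I. per replicate.
    """
    widths = [upper - lower for lower, upper in intervals]
    covered = sum(1 for lower, upper in intervals
                  if lower <= true_value <= upper)
    return {
        "mean": statistics.mean(estimates),
        "sd": statistics.stdev(estimates),  # sample SD (assumption)
        "p2.5": percentile(estimates, 2.5),
        "p97.5": percentile(estimates, 97.5),
        "mean_ci_width": statistics.mean(widths),
        "coverage": covered,  # count of intervals containing the truth
    }


# Toy inputs made up for illustration; they are not taken from the paper.
toy_estimates = [900.0, 1000.0, 1100.0, 1200.0]
toy_intervals = [(700.0, 1150.0), (800.0, 1300.0),
                 (850.0, 1400.0), (1050.0, 1600.0)]
summary = summarize(toy_estimates, toy_intervals, true_value=1000.0)
print(summary)
```

With 1000 replicates, as in the table, a nominal 95% interval would be expected to cover the true value in roughly 950 of them.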
Figure 2. Plot of bias of the estimator against true $N_e$. The bias (solid line) is quantified as the percentage difference relative to the true $N_e$. Sample size was 10% of the true $N_e$, with 1000 loci. Two samples were taken 10 generations apart. The bias approaches 0 (red dotted line) if the estimator is unbiased.
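The percentage bias described in this caption can be written out explicitly. The notation is an assumption made for illustration, with $\bar{\hat{N}}$ denoting the mean of the estimates over replicates; the source states only that bias is the percentage difference relative to the true value.

$$\text{bias}(\%) = 100 \times \frac{\bar{\hat{N}} - N_e}{N_e}$$

As a purely arithmetic illustration, a mean estimate of 1,033.2 against a true value of 1000 gives $100 \times 33.2 / 1000 = 3.32\%$.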
Figure 3. Histogram of the likelihood-ratio test statistic under H0 for 5000 simulations. Three temporal samples were drawn in each replicate. The red line represents the theoretical density of a chi-square distribution with 1 d.f.
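A test statistic of this kind is converted into a p-value by comparing it with the chi-square reference distribution. The sketch below assumes the usual form of the statistic (twice the difference in maximized log-likelihood between the alternative and the null) and uses the closed form for the 1-d.f. upper tail. The log-likelihood values are hypothetical, and the function names are not from the "NB" package.

```python
import math


def lrt_statistic(loglik_null, loglik_alt):
    """Likelihood-ratio statistic: twice the gain in maximized log-likelihood."""
    return 2.0 * (loglik_alt - loglik_null)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f.

    With 1 d.f. the variable is the square of a standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical maximized log-likelihoods, for illustration only.
stat = lrt_statistic(loglik_null=-1234.5, loglik_alt=-1232.0)
p_value = chi2_sf_1df(stat)
print(stat, p_value)
```

One degree of freedom would be consistent with three temporal samples if the alternative fits a separate $N_e$ to each of the two sampling intervals and the null fits a single shared value, although the source does not spell this out here.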
Figure 4. Statistical power against sample size. A specific H1 was chosen as described in the text, with 1000 independent loci.
Figure 5. Comparison of computational effort (in seconds) between MLNE and $\hat{N}_B$. (A) Computational time against true $N_e$. An $N_e$ of 50,000 was not run for MLNE because this exceeds the limits of the software. (B) Computational time against the number of loci used in each iteration. (C) Computing time against sampling interval.