Anne Chao, Lou Jost, T C Hsieh, K H Ma, William B Sherwin, Lee Ann Rollins.
Abstract
Shannon entropy H and related measures are increasingly used in molecular ecology and population genetics because (1) unlike measures based on heterozygosity or allele number, these measures weigh alleles in proportion to their population fraction, thus capturing a previously ignored aspect of allele frequency distributions that may be important in many applications; (2) these measures connect directly to the rich predictive mathematics of information theory; (3) Shannon entropy is completely additive and has an explicitly hierarchical nature; and (4) Shannon entropy-based differentiation measures obey strong monotonicity properties that heterozygosity-based measures lack. We derive simple new expressions for the expected values of the Shannon entropy of the equilibrium allele distribution at a neutral locus in a single isolated population under two models of mutation: the infinite allele model and the stepwise mutation model. Surprisingly, this complex stochastic system for each model has an entropy expressible as a simple combination of well-known mathematical functions. Moreover, entropy- and heterozygosity-based measures for each model are linked by simple relationships that are shown by simulations to be approximately valid even far from equilibrium. We also identify a bridge between the two models of mutation. We apply our approach to subdivided populations that follow the finite island model, obtaining the Shannon entropy of the equilibrium allele distributions of the subpopulations and of the total population. We also derive the expected mutual information and normalized mutual information ("Shannon differentiation") between subpopulations at equilibrium, and identify the model parameters that determine them. We apply our measures to data from the common starling (Sturnus vulgaris) in Australia. Our measures provide a test for neutrality that is robust to violations of equilibrium assumptions, as verified on real-world data from starlings.
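Point (1) of the abstract, that entropy weighs alleles by their frequency, can be made concrete with a small sketch. It computes plug-in values of Shannon entropy $^1H = -\sum_i p_i \ln p_i$ and heterozygosity $^2H = 1 - \sum_i p_i^2$ from allele frequencies, assuming natural logarithms; the function names are hypothetical, and the estimators actually used for the starling data (S4 Appendix) may include bias corrections that this sketch omits.

```python
import math
from collections import Counter


def allele_frequencies(alleles):
    """Relative frequency of each distinct allele in a sample of gene copies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon_entropy(freqs):
    """Order-1 measure: 1H = -sum p_i ln p_i (plug-in, natural log)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def heterozygosity(freqs):
    """Order-2 measure: 2H = 1 - sum p_i^2 (expected heterozygosity)."""
    return 1.0 - sum(p * p for p in freqs)


# Two hypothetical samples with the same allele number (4) but different profiles
even = [0.25, 0.25, 0.25, 0.25]
skewed = [0.85, 0.05, 0.05, 0.05]
for name, f in [("even", even), ("skewed", skewed)]:
    print(name, round(shannon_entropy(f), 4), round(heterozygosity(f), 4))
```

Allele number alone cannot separate the two samples; both $^1H$ and $^2H$ can, and $^1H$ gives the three rare alleles more weight than $^2H$ does.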
Year: 2015 PMID: 26067448 PMCID: PMC4465833 DOI: 10.1371/journal.pone.0125471
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The expected Shannon entropy ($^1H$) and heterozygosity ($^2H$) of the equilibrium allele distribution at a neutral locus under IAM and SMM, for an isolated population and for a total population (subscript T) composed of n subpopulations (subscript S).
| Model/measure | Isolated population | Total population | Subpopulation |
|---|---|---|---|
| IAM: | | | |
| Shannon entropy | | | (See … |
| Heterozygosity | | | |
| SMM: | | | |
| Shannon entropy | | | (See … |
| Heterozygosity | | | |
$N$ = population size, $m$ = dispersal rate, $\mu$ = mutation rate, $m^* = nm/(n-1)$, $N_T$ = effective population size in the total population, and $\psi(x)$ = digamma function. See S1 and S2 Appendices for all derivations. For an isolated population, when $\alpha$ tends to 0, all formulas for SMM reduce to those for IAM. For the total population, when $\alpha_T$ tends to 0, all formulas for SMM reduce to those for IAM. For a subpopulation, when both $\alpha_T$ and $\alpha_S$ tend to 0, all formulas for SMM reduce to those for IAM.
(Notation for IAM) $\theta = 4N\mu$.
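(Notation for SMM) $\theta = 4N\mu$, $\alpha = [(1+2\theta)^{1/2} - 1]/2$, $\theta_T = [1/(1-{}^2H_T)^2 - 1]/2$, $\alpha_T = [1/(1-{}^2H_T) - 1]/2 = [(1+2\theta_T)^{1/2} - 1]/2$, where ${}^2H_T$ and ${}^2H_S$ are given in Eqs 8A and 8B. $B(x,y) = \Gamma(x)\Gamma(y)/\Gamma(x+y)$ is the beta function and $\Gamma(x)$ is the gamma function.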
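As background, the notation above fits together through the standard single-population equilibrium results, assumed here (Kimura and Crow for IAM heterozygosity, Ohta and Kimura for SMM heterozygosity, and the expectation of Shannon entropy under the Ewens allele-frequency distribution for IAM) rather than read from the table. The sketch below shows why the heterozygosity link $\psi[1/(1-{}^2H)] + 0.5772$ used in the Fig 3 caption and table notes follows for IAM, and why the two expressions for $\alpha$ agree under SMM.

```latex
\begin{aligned}
\text{IAM:}\quad & {}^2H = \frac{\theta}{1+\theta}
  \;\Longrightarrow\; \frac{1}{1-{}^2H} = 1+\theta, \\
& {}^1H = \psi(\theta+1) - \psi(1)
  = \psi\!\left[\frac{1}{1-{}^2H}\right] + 0.5772\ldots
  \qquad (-\psi(1) = \gamma \approx 0.5772), \\[6pt]
\text{SMM:}\quad & {}^2H = 1 - (1+2\theta)^{-1/2}
  \;\Longrightarrow\; \frac{1}{1-{}^2H} = (1+2\theta)^{1/2} = 1+2\alpha, \\
& \alpha = \frac{(1+2\theta)^{1/2}-1}{2}
  = \frac{1}{2}\left[\frac{1}{1-{}^2H} - 1\right],
  \qquad
  \theta = \frac{1}{2}\left[\frac{1}{(1-{}^2H)^{2}} - 1\right].
\end{aligned}
```

As $\alpha \to 0$ (equivalently $\theta \to 0$), both heterozygosities vanish, in line with the reduction of SMM formulas to IAM formulas noted above.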
Fig 1. IAM-FIM (n = 2, N = 5000).
Plots of Shannon differentiation (i.e., normalized mutual information; solid lines), Jost's differentiation measure D (dashed lines), and $G_{ST}$ (dash-dotted lines) as functions of Nm (upper panels), Nμ (middle panels), and m*/(nμ) (lower panels).
Fig 2. SMM-FIM (n = 2, N = 5000).
Plots of Shannon differentiation (i.e., normalized mutual information; solid lines), Jost's differentiation measure D (dashed lines), and $G_{ST}$ (dash-dotted lines) as functions of Nm (upper panels), Nμ (middle panels), and m*/(nμ) (lower panels).
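The three differentiation measures compared in Figs 1 and 2 can also be computed directly from subpopulation allele frequencies. This is a minimal sketch assuming equal subpopulation weights: Shannon differentiation is taken as the mutual information $^1H_T - \overline{^1H_S}$ divided by its maximum $\ln n$, Jost's $D = [(H_T - H_S)/(1 - H_S)]\,[n/(n-1)]$, and $G_{ST} = (H_T - H_S)/H_T$. The helper names are hypothetical.

```python
import math


def shannon(p):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def hetero(p):
    """Heterozygosity 1 - sum p_i^2 of a frequency vector."""
    return 1.0 - sum(x * x for x in p)


def pooled(subpops):
    """Equal-weight pooled frequencies; every row lists the same alleles in the same order."""
    n = len(subpops)
    return [sum(col) / n for col in zip(*subpops)]


def shannon_differentiation(subpops):
    """Mutual information (1H_T - mean 1H_S) divided by its maximum, ln(n), for equal weights."""
    n = len(subpops)
    h_t = shannon(pooled(subpops))
    h_s = sum(shannon(p) for p in subpops) / n
    return (h_t - h_s) / math.log(n)


def jost_d(h_t, h_s, n):
    """Jost's D from total and mean subpopulation heterozygosity."""
    return (h_t - h_s) / (1.0 - h_s) * n / (n - 1)


def g_st(h_t, h_s):
    """G_ST = (H_T - H_S) / H_T."""
    return (h_t - h_s) / h_t


# Two hypothetical subpopulations sharing three alleles
subpops = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
h_t = hetero(pooled(subpops))
h_s = sum(hetero(p) for p in subpops) / len(subpops)
print(shannon_differentiation(subpops), jost_d(h_t, h_s, 2), g_st(h_t, h_s))
```

As a check, feeding the heterozygosities of the IAM-FIM table for the four starling subpopulations ($H_T = 0.9106$, $H_S = 0.8665$, $n = 4$) into `jost_d` and `g_st` gives about 0.440 and 0.048, matching the tabulated 0.4407 and 0.0485 to rounding.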
Fig 3. Simulation plots.
Simulation results showing the stochastic behavior of the average (over 5 loci) of the total-population and subpopulation Shannon entropies for N = 10000, n = 4, μ = 0.005% and m = 0.1%. The horizontal line in each panel represents the theoretical equilibrium value. Initially, every subpopulation was fixed for the same single allele. (a) The stochastic pattern of the total-population entropy $^1H_T$ is shown as the black curve; the red curve is $^1H_T = \psi[1/(1-{}^2H_T)] + 0.5772$, the value of $^1H_T$ calculated from heterozygosity under IAM-FIM. (b) The pattern of the subpopulation entropy $^1H_S$ is shown as the black curve; the red curve is obtained via a link from heterozygosity (see Eq. D7 in S4 Appendix) under IAM-FIM. In both (a) and (b), the processes converge after roughly 40000 generations, but the two curves become close well before equilibrium (around 20000 generations). (c) The stochastic pattern of the total-population entropy $^1H_T$ under SMM-FIM is shown as the black curve; the red curve is $\log\{[1 + {}^2H_T - ({}^2H_T)^2]/(1 - {}^2H_T)\}$, the value of $^1H_T$ calculated from heterozygosity. (d) The pattern of the subpopulation entropy $^1H_S$ is shown as the black curve; the red curve is obtained via a link from heterozygosity (see S4 Appendix for the link). Under SMM-FIM, the relationship between heterozygosity and Shannon entropy holds at all stages of the stochastic process.
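The kind of forward simulation summarized in Fig 3 can be approximated with a toy Wright-Fisher model. This is a minimal sketch, not the authors' simulation code: it assumes haploid gene copies, migration implemented by drawing a parent from a randomly chosen other subpopulation with probability m, and infinite-allele mutation implemented as a fresh allele label. The function name and the `genes` parameter (gene copies per subpopulation) are hypothetical, and the defaults are far smaller than the Fig 3 settings so that it runs quickly.

```python
import math
import random
from collections import Counter


def shannon(counts):
    """Plug-in Shannon entropy (natural log) of a Counter of allele labels."""
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())


def simulate_iam_island(n=4, genes=100, mu=0.001, m=0.01, generations=300, seed=1):
    """Toy Wright-Fisher finite-island model with infinite-allele mutation.

    Each generation, every gene copy in a subpopulation copies a random parent
    from its own subpopulation or, with probability m, from a randomly chosen
    other subpopulation; with probability mu it then mutates to a new allele.
    All subpopulations start fixed for the same allele (label 0).
    Returns a list of (total-population entropy, mean subpopulation entropy).
    """
    rng = random.Random(seed)
    pops = [[0] * genes for _ in range(n)]
    next_label = 1
    history = []
    for _ in range(generations):
        new_pops = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            child = []
            for _ in range(genes):
                src = rng.choice(others) if others and rng.random() < m else i
                allele = rng.choice(pops[src])
                if rng.random() < mu:  # infinite alleles: always a brand-new label
                    allele = next_label
                    next_label += 1
                child.append(allele)
            new_pops.append(child)
        pops = new_pops
        h_total = shannon(Counter(a for pop in pops for a in pop))
        h_sub = sum(shannon(Counter(pop)) for pop in pops) / n
        history.append((h_total, h_sub))
    return history


trajectory = simulate_iam_island()
h_total, h_sub = trajectory[-1]
print(f"final 1H_T = {h_total:.3f}, mean 1H_S = {h_sub:.3f}")
```

Because the subpopulations have equal size, the pooled entropy can never fall below the mean subpopulation entropy (concavity of entropy), so the gap between the two trajectories is the mutual information that Shannon differentiation normalizes.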
Consistency of empirical data with IAM based on the dopamine receptor D4 (DRD4) allele data.
| Method/Model | Measure | Subpopulation 1 | Subpopulation 2 | Subpopulation 3 | Subpopulation 4 |
|---|---|---|---|---|---|
| Empirical | Estimated Shannon entropy | | | | |
| | (s.e.) | (0.0952) | (0.1139) | (0.0460) | (0.0815) |
| | Estimated heterozygosity | 0.8018 | 0.8688 | 0.9004 | 0.8949 |
| | (s.e.) | (0.0232) | (0.0193) | (0.0059) | (0.0147) |
| IAM expected | Expected Shannon entropy | | | | |
| | (s.e.) | (0.1250) | (0.1392) | (0.0608) | (0.1426) |
| | Proportional difference | 0.0188 | 0.1179 | 0.0526 | 0.0045 |
Empirical and expected values obtained by treating each of the four subpopulations as an isolated population following IAM for mutation. Data are shown in Table A (S5 Appendix). See Table 1 for the expected formulas and S4 Appendix for statistical methods to obtain empirical values. The proportional difference PD ≡ (expected value − estimated value)/expected value. All s.e. estimates were obtained by a bootstrap method based on 1000 resamples generated from the observed allele frequency distribution.
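The proportional difference and the bootstrap standard errors described in these notes can be sketched as follows. The sketch assumes a parametric bootstrap that draws `sample_size` gene copies from the observed allele frequencies with `random.choices`, and it uses the plug-in heterozygosity as the statistic; the S4 Appendix estimators may differ, and the function names and example frequencies are hypothetical.

```python
import math
import random


def heterozygosity(freqs):
    """Plug-in heterozygosity 1 - sum p_i^2."""
    return 1.0 - sum(p * p for p in freqs)


def proportional_difference(expected, estimated):
    """PD = (expected - estimated) / expected, as defined in the table notes."""
    return (expected - estimated) / expected


def bootstrap_se(freqs, sample_size, statistic, reps=1000, seed=0):
    """Resample `sample_size` gene copies from the observed allele frequencies,
    recompute `statistic` on each resample, and return the sample s.d."""
    rng = random.Random(seed)
    alleles = range(len(freqs))
    values = []
    for _ in range(reps):
        draws = rng.choices(alleles, weights=freqs, k=sample_size)
        counts = [0] * len(freqs)
        for a in draws:
            counts[a] += 1
        values.append(statistic([c / sample_size for c in counts]))
    mean = sum(values) / reps
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (reps - 1))


freqs = [0.4, 0.3, 0.2, 0.1]  # hypothetical observed allele frequencies
h_obs = heterozygosity(freqs)
se = bootstrap_se(freqs, sample_size=60, statistic=heterozygosity)
print(f"2H = {h_obs:.4f} (s.e. {se:.4f})")
print(proportional_difference(expected=0.80, estimated=h_obs))
```

Passing the statistic as a function lets the same resampling loop produce s.e. values for entropy, heterozygosity or a differentiation measure.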
# The expected parameters under IAM for the four subpopulations: Nμ = (1.0113, 1.6552, 2.2610, 2.1280); see Eq 1.
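The IAM expectations for an isolated population can be sketched numerically. This assumes the standard equilibrium results $^2H = \theta/(1+\theta)$ and $E(^1H) = \psi(\theta+1) - \psi(1)$ with $\theta = 4N\mu$; the latter is consistent with the ψ-link in the table notes. Inverting $^2H = \theta/(1+\theta)$ at the estimated heterozygosities reproduces the four Nμ values listed above to rounding, which suggests, but does not confirm, that they were obtained this way. The digamma routine is a hand-written recurrence plus asymptotic series, used to avoid a SciPy dependency; all names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma(x):
    """psi(x) for x > 0: shift x up with psi(x) = psi(x+1) - 1/x, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def theta_from_heterozygosity_iam(h2):
    """Invert the IAM equilibrium heterozygosity 2H = theta / (1 + theta)."""
    return h2 / (1.0 - h2)


def expected_entropy_iam(theta):
    """E(1H) = psi(theta + 1) - psi(1) for an isolated IAM population."""
    return digamma(theta + 1.0) + EULER_GAMMA


het = [0.8018, 0.8688, 0.9004, 0.8949]  # estimated heterozygosities from the DRD4 table
for h in het:
    theta = theta_from_heterozygosity_iam(h)
    print(f"N*mu = {theta / 4:.4f}, expected 1H = {expected_entropy_iam(theta):.4f}")
```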
Consistency of empirical data with IAM-FIM based on the dopamine receptor D4 (DRD4) allele data.
| Method or assumptions | Measure | Total population | Subpopulation | Shannon differentiation | Jost differentiation | $G_{ST}$ |
|---|---|---|---|---|---|---|
| Empirical | Estimated Shannon entropy | | | | | |
| | (s.e.) | (0.0400) | (0.0447) | (0.0226) | | |
| | Estimated heterozygosity | 0.9106 | 0.8665 | | 0.4407 | 0.0485 |
| | (s.e.) | (0.0046) | (0.0083) | | (0.0386) | (0.0070) |
| IAM-FIM expected | Expected Shannon entropy | | | | | |
| | (s.e.) | (0.0524) | (0.0626) | (0.0315) | | |
| | Proportional difference | 0.0686 | -0.0184 | 0.4440 | | |
Empirical and IAM-FIM expected values for total-population, subpopulation and differentiation measures under IAM-FIM. Data are shown in Table A (S5 Appendix). See Table 1 for the expected formulas and S4 Appendix for statistical methods to obtain empirical values. The proportional difference PD ≡ (expected value−estimated value)/expected value. All s.e. estimates were obtained by a bootstrap method based on 1000 resamples generated from the observed allele frequency distribution.
# The expected parameters under IAM-FIM: Nμ = 0.6058, Nm = 3.0748; see Eqs. D5 and D6 of S4 Appendix.
* Total-population entropy calculated from total-population heterozygosity under IAM via Eq 3A: $^1H_T = \psi[1/(1-{}^2H_T)] + 0.5772$.
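The Eq 3A link quoted in this note is easy to evaluate. A minimal sketch, assuming natural logarithms and a hand-written digamma (recurrence plus asymptotic series) so that no SciPy is needed; the function names are hypothetical. With the tabulated $^2H_T = 0.9106$ it gives $^1H_T$ of about 2.95.

```python
import math


def digamma(x):
    """psi(x) for x > 0: shift x up with psi(x) = psi(x+1) - 1/x, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def entropy_from_heterozygosity_iam(h2):
    """Eq 3A link under IAM: 1H = psi[1 / (1 - 2H)] + 0.5772."""
    return digamma(1.0 / (1.0 - h2)) + 0.5772


print(round(entropy_from_heterozygosity_iam(0.9106), 4))  # total-population 2H_T from the table
```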
§ Subpopulation entropy is calculated from heterozygosity via a link described in Eq. D7 in S4 Appendix.
Consistency of empirical data with SMM based on microsatellite data for each subpopulation (all results are averaged over 3 loci).
| Method/Model | Measure | Subpopulation 1 | Subpopulation 2 | Subpopulation 3 | Subpopulation 4 |
|---|---|---|---|---|---|
| Empirical | Estimated Shannon entropy | | | | |
| | (s.e.) | (0.0227) | (0.0484) | (0.0142) | (0.0215) |
| | Estimated heterozygosity | 0.7585 | 0.7905 | 0.8491 | 0.8569 |
| | (s.e.) | (0.0073) | (0.0160) | (0.0034) | (0.0045) |
| SMM expected | Expected Shannon entropy | | | | |
| | (s.e.) | (0.0436) | (0.1061) | (0.0313) | (0.0484) |
| | Proportional difference | 0.0065 | -0.0171 | -0.0036 | -0.0107 |
Empirical and expected values obtained by treating each of the four subpopulations as an isolated population following SMM for mutation. Data are shown in Table B (S5 Appendix). See Table 1 for the expected formulas and S4 Appendix for statistical methods to obtain empirical values. The proportional difference PD ≡ (expected value − estimated value)/expected value. All s.e. estimates were obtained by a bootstrap method based on 1000 resamples generated from the observed allele frequency distribution.
# The expected parameters (average over 3 loci) for the four subpopulations: Nμ = (2.7901, 3.4202, 6.0214, 8.0434); see Eq 4B.
Consistency of empirical data with SMM-FIM based on microsatellite data for each subpopulation (all results are averaged over 3 loci).
| Methods or assumptions | Measure | Total population | Subpopulation | Shannon differentiation | Jost differentiation | $G_{ST}$ |
|---|---|---|---|---|---|---|
| Empirical | Estimated Shannon entropy | | | | | |
| | (s.e.) | (0.0122) | (0.0153) | (0.0086) | | |
| | Estimated heterozygosity | 0.8554 | 0.8138 | | 0.2983 | 0.0512 |
| | (s.e.) | (0.0028) | (0.0047) | | (0.0185) | (0.0045) |
| SMM-FIM expected | Expected Shannon entropy | | | | | |
| | (s.e.) | (0.0166) | (0.0187) | (0.0104) | | |
| | Proportional difference | -0.0117 | -0.0050 | -0.0762 | | |
Empirical and SMM-FIM expected values for total-population, subpopulation and differentiation measures under SMM-FIM. Data are shown in Table B (S5 Appendix). See Table 1 for the expected formulas and S4 Appendix for statistical methods to obtain empirical values. The proportional difference PD ≡ (expected value−estimated value)/expected value. All s.e. estimates were obtained by a bootstrap method based on 1000 resamples generated from the observed allele frequency distribution.
# The expected parameters (average over 3 loci) under SMM-FIM: Nμ = 6.31, Nm = 9.11; see Eqs. D8 and D9 of S4 Appendix.
* Total-population entropy calculated from total-population heterozygosity under SMM via Eq 5B of the main text: $^1H_T \approx \log\{[1 + {}^2H_T - ({}^2H_T)^2]/(1 - {}^2H_T)\}$.
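The SMM approximation in this note can be evaluated the same way. A minimal sketch, assuming "log" is the natural logarithm (matching the ψ-based IAM formulas); the function name is hypothetical. With the tabulated total-population $^2H_T = 0.8554$ it gives about 2.05.

```python
import math


def entropy_from_heterozygosity_smm(h2):
    """Eq 5B approximation under SMM: 1H ~ log{[1 + 2H - (2H)^2] / (1 - 2H)}."""
    return math.log((1.0 + h2 - h2 * h2) / (1.0 - h2))


print(round(entropy_from_heterozygosity_smm(0.8554), 4))  # total-population 2H_T from the table
```

The expression equals 0 for a monomorphic population ($^2H_T = 0$) and grows without bound as $^2H_T \to 1$, as an entropy should.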
§ Subpopulation entropy is calculated from heterozygosity via a link described in S4 Appendix.