Abstract
We present a new class of estimators of Shannon entropy for severely undersampled discrete distributions. It is based on a generalization of an estimator proposed by T. Schürmann, which itself is a generalization of an estimator proposed by myself. For a special set of parameters, these estimators are completely free of bias and have a finite variance, something which is widely believed to be impossible. We also present detailed numerical tests, in which we compare them with other recent estimators and with exact results, and we point out a clash with Bayesian estimators for mutual information.
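As a point of reference for the bias problem the abstract describes, the following sketch contrasts the naive plug-in entropy estimate with a bias-corrected one on severely undersampled counts. The correction implements, to the best of my recollection, the earlier Grassberger (2003) estimator that this paper generalizes; the formula should be treated as an assumption, not as the paper's new optimized estimator of Equation (27).

import numpy as np
from scipy.special import digamma

def plugin_entropy(counts):
    # Naive maximum-likelihood estimate in nats; biased downward when
    # many states are seen only rarely or not at all.
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def grassberger_entropy(counts):
    # Bias-corrected estimate in nats. Assumed form of the earlier
    # Grassberger (2003) estimator that the abstract refers to:
    #   H = ln(n) - (1/n) * sum_i n_i * G(n_i),
    #   G(k) = psi(k) + 0.5 * (-1)**k * (psi((k+1)/2) - psi(k/2)).
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]
    n = counts.sum()
    g = digamma(counts) + 0.5 * (-1.0) ** counts * (
        digamma((counts + 1) / 2) - digamma(counts / 2))
    return np.log(n) - np.sum(counts * g) / n

# Severely undersampled example: 100 samples over 1000 equiprobable states.
rng = np.random.default_rng(0)
counts = np.bincount(rng.integers(0, 1000, size=100), minlength=1000)
print(plugin_entropy(counts) / np.log(2), "bits (plug-in)")
print(grassberger_entropy(counts) / np.log(2), "bits (corrected)")
print(np.log2(1000), "bits (exact)")

With these numbers the plug-in estimate lands near log2(100) ≈ 6.6 bits, far below the exact log2(1000) ≈ 9.97 bits, while the corrected estimate recovers part of the deficit; this is the severe-undersampling bias that the paper's estimators are designed to remove completely for suitable parameter choices.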
Keywords: Bayesian; bias; entropy estimates; mutual information estimates; undersampling; variance
Year: 2022 PMID: 35626564 PMCID: PMC9141067 DOI: 10.3390/e24050680
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738
Figure 1. Estimated entropies (in bits) of N-tuples of independent and identically distributed random binary variables, using the optimized estimator defined in Equation (27). One parameter was kept fixed at its optimal value, while the other was varied in view of possible problems with the variances and is plotted on the horizontal axis. For each N and each value of the varied parameter, a fixed number of tuples was drawn. The exact entropy is indicated by the horizontal straight line.
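A minimal sketch of this setup, under loudly labeled assumptions: the marginal probability P_ASSUMED and the count N_TUPLES are hypothetical stand-ins for the caption's values (which did not survive extraction), and grassberger_entropy from the sketch above stands in for the paper's optimized estimator of Equation (27).

import numpy as np

P_ASSUMED = 0.3     # hypothetical Prob(bit = 1); not the caption's value
N_TUPLES = 10_000   # hypothetical number of tuples per data point
rng = np.random.default_rng(1)

for N in (5, 10, 20):
    # Treat each N-tuple as a single discrete variable with 2**N states;
    # for large N this is the severely undersampled regime.
    bits = rng.random((N_TUPLES, N)) < P_ASSUMED
    states = bits.astype(np.int64) @ (1 << np.arange(N, dtype=np.int64))
    counts = np.bincount(states, minlength=2 ** N)
    h_est = grassberger_entropy(counts) / np.log(2)   # from the sketch above
    h1 = -P_ASSUMED * np.log2(P_ASSUMED) - (1 - P_ASSUMED) * np.log2(1 - P_ASSUMED)
    print(f"N={N:2d}: estimated {h_est:6.3f} bits, exact {N * h1:6.3f} bits")

Since the tuples are i.i.d., the exact entropy is N times the single-bit entropy, which plays the role of the horizontal reference line in the figure.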
Figure 2. Estimated entropies (in bits) of N-tuples of independent and identically distributed random ternary variables, using the optimized estimator defined in Equation (27). One parameter was kept fixed at its optimal value, while the other two were varied in view of possible problems with the variances. More precisely, the two varied parameters were coupled so that the plot ends at the bias-free combination. For each N and each value of the varied parameters, a fixed number of tuples was drawn. The exact entropy is indicated by the horizontal straight line.
Figure 3. Estimated mutual information (in bits) of N-tuples of independent and identically distributed random subsamples from two distributions given in [22]. The data for “PYM”, originally due to [24], consist of 250,000 pairs with binary y and with x uniformly distributed over 4096 values. Thus each x value is realized ≈60 times, and we classify these values into 5 classes depending on the associated conditional distribution of y: (i) very heavily biased toward one outcome, (ii) moderately biased toward the same outcome, (iii) neutral, (iv) moderately biased toward the opposite outcome, and (v) heavily biased toward the opposite outcome. When we estimated conditional entropies for randomly drawn subsamples, we kept this classification and chose the estimator parameter accordingly for each of the five classes. The data for “spherical”, originally due to [21], consist of 50,000 pairs. Here, Y is again binary, but X is highly non-uniformly distributed over ≈4000 values. Again, we classified these values as neutral or heavily/moderately biased and used this classification to choose the estimator parameters accordingly.
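In the same spirit, mutual information can be estimated from undersampled counts as I(X;Y) = H(X) + H(Y) − H(X,Y). The sketch below only mimics the shape of the “PYM” benchmark (binary y; x roughly uniform over 4096 values); the data are synthetic, not the set from [24], and grassberger_entropy again stands in for the paper's per-class choice of estimator parameters.

import numpy as np

rng = np.random.default_rng(2)
K = 4096                            # number of x values, as in the PYM benchmark
M = 2_000                           # hypothetical subsample size (undersampled)
bias = rng.beta(0.5, 0.5, size=K)   # hypothetical Prob(y=1 | x), varying across x
x = rng.integers(0, K, size=M)
y = (rng.random(M) < bias[x]).astype(np.int64)

def H_bits(counts):
    return grassberger_entropy(counts) / np.log(2)   # from the first sketch

h_x = H_bits(np.bincount(x, minlength=K))
h_y = H_bits(np.bincount(y, minlength=2))
h_xy = H_bits(np.bincount(2 * x + y, minlength=2 * K))
print(f"estimated I(X;Y) = {h_x + h_y - h_xy:.3f} bits")

Because the three entropies are estimated separately, their combination can over- or undershoot; the clash with Bayesian estimators mentioned in the abstract concerns exactly such mutual-information settings.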