| Literature DB >> 33286941 |
Marcin Cholewa, Bartłomiej Płaczek.
Abstract
This paper introduces a new method of estimating Shannon entropy. The proposed method can be successfully applied to large data samples and enables fast computations for ranking data samples according to their Shannon entropy. Original definitions of positional entropy and integer entropy are discussed in detail to explain the theoretical concepts that underpin the proposed approach. Relations between positional entropy, integer entropy, and Shannon entropy were demonstrated through computational experiments. The usefulness of the introduced method was experimentally verified for data samples of various types and sizes. The experimental results clearly show that the proposed approach can be successfully used for fast entropy estimation. The analysis also addressed the quality of the entropy estimation. Several possible implementations of the proposed method were discussed, and the presented algorithms were compared with existing solutions. It was demonstrated that the algorithms presented in this paper estimate the Shannon entropy faster and more accurately than state-of-the-art algorithms.
Keywords: Shannon entropy; entropy estimation; positional entropy
Year: 2020 PMID: 33286941 PMCID: PMC7597344 DOI: 10.3390/e22101173
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
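For context, the baseline that such methods are compared against is the plug-in (maximum-likelihood) Shannon entropy estimator, which corresponds to the `ML` method of the CRAN `entropy` package listed in the comparison table below. A minimal generic sketch (not the authors' algorithm):

```python
from collections import Counter
from math import log2

def shannon_entropy(sample):
    """Plug-in (maximum-likelihood) Shannon entropy estimate in bits.

    Relative symbol frequencies are used as probability estimates:
    H = -sum(p * log2(p)). This is the standard baseline estimator,
    not the paper's positional-entropy method.
    """
    n = len(sample)
    counts = Counter(sample)
    return -sum(c / n * log2(c / n) for c in counts.values())
```

For example, `shannon_entropy("0011")` gives 1 bit and `shannon_entropy("0001")` gives about 0.811 bits, matching the Shannon-entropy column of the table below.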
Figure 2. Visualization of positional entropy for a binary sequence.
Comparison of Shannon entropy and positional entropy for 1-adjacent, 2-adjacent and 1,2,3-adjacent pairs.
| No. | Sample | Shannon Entropy [bit] | Shannon Entropy [nat] | Shannon Entropy [hartley] | Pairs (1-adj.) | Positional Entropy (1-adj.) | Pairs (2-adj.) | Positional Entropy (2-adj.) | Pairs (1,2,3-adj.) | Positional Entropy (1,2,3-adj.) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | <0000> | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | <0001> | 0.811 | 0.562 | 0.244 | (1) | (0.334) | (2) | (0.4) | 3 | 0.5 |
| 3 | <0010> | 0.811 | 0.562 | 0.244 | (2) | (0.667) | (3) | (0.6) | 3 | 0.5 |
| 4 | <0011> | 1 | 0.693 | 0.301 | 1 | 0.334 | 3 | 0.6 | 4 | 0.667 |
| 5 | <0100> | 0.811 | 0.562 | 0.244 | 2 | 0.667 | 3 | 0.6 | 3 | 0.5 |
| 6 | <0101> | 1 | 0.693 | 0.301 | (3) | (1) | 4 | 0.8 | 4 | 0.667 |
| 7 | <0110> | 1 | 0.693 | 0.301 | (2) | (0.667) | 4 | 0.8 | 4 | 0.667 |
| 8 | <0111> | 0.811 | 0.562 | 0.244 | 1 | 0.334 | 2 | 0.4 | 3 | 0.5 |
| 9 | <1000> | 0.811 | 0.562 | 0.244 | 1 | 0.334 | 2 | 0.4 | 3 | 0.5 |
| 10 | <1001> | 1 | 0.693 | 0.301 | (2) | (0.667) | 4 | 0.8 | 4 | 0.667 |
| 11 | <1010> | 1 | 0.693 | 0.301 | (3) | (1) | 4 | 0.8 | 4 | 0.667 |
| 12 | <1011> | 0.811 | 0.562 | 0.244 | 2 | 0.667 | 3 | 0.6 | 3 | 0.5 |
| 13 | <1100> | 1 | 0.693 | 0.301 | 1 | 0.334 | 3 | 0.6 | 4 | 0.667 |
| 14 | <1101> | 0.811 | 0.562 | 0.244 | (2) | (0.667) | (3) | (0.6) | 3 | 0.5 |
| 15 | <1110> | 0.811 | 0.562 | 0.244 | (1) | (0.334) | (2) | (0.4) | 3 | 0.5 |
| 16 | <1111> | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
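The positional-entropy columns above can largely be reproduced by counting unequal element pairs at index distances 1 through k and normalizing by the number of such pairs; for `<0001>` this gives 1/3, 2/5, and 3/6 for the three column groups. This is a reconstruction inferred from the table values (it matches the 1-adjacent and 1,2,3-adjacent columns for all rows), not necessarily the authors' exact formulation:

```python
def positional_entropy(seq, k):
    """Count and normalized share of unequal pairs at distance 1..k.

    Reconstructed from the paper's table, e.g. seq = "0001":
    k=1 -> (1, 1/3), k=2 -> (2, 2/5), k=3 -> (3, 3/6).
    Returns (pair count, normalized value), mirroring the paired
    table columns.
    """
    n = len(seq)
    # all index pairs (i, j) with 1 <= j - i <= k
    pairs = [(i, j) for i in range(n)
             for j in range(i + 1, min(i + k + 1, n))]
    unequal = sum(seq[i] != seq[j] for i, j in pairs)
    return unequal, unequal / len(pairs)
```

Under this reading, a constant sequence scores 0 and a maximally alternating sequence such as `<0101>` scores 1 for k = 1, consistent with rows 1, 6, and 16.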
Figure 3. Comparison of Shannon entropy and positional entropy for 65 entropy classes.
Figure 4. Differences between positional entropy and Shannon entropy for 65 entropy classes.
Figure 5. Comparison of the cumulative entropy for 32-, 64- and 96-adjacent pairs with the Shannon entropy.
Figure 6. Pairs generated by Algorithm 3 for a sequence of 5 elements with k = 1.
The 1-adjacent positional entropy for different sample types.
| Type of Sample | Alphabet Length | Average Enp1 |
|---|---|---|
| Binary sequence | 2 | 0.4962 |
| English text | 32 | 0.9593 |
| Digital image-bitmap | 256 | 0.9836 |
| Digital image-compressed | 256 | 0.9964 |
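As a rough sanity check on the binary-sequence row, the average 1-adjacent positional entropy of uniformly random bit sequences should be close to 0.5, consistent with the table's 0.4962. A sketch assuming Enp1 is the fraction of unequal adjacent pairs (a reconstruction from the paper's tables, not its exact formula):

```python
import random

def enp1(seq):
    """1-adjacent positional entropy: share of unequal adjacent pairs
    (reconstructed definition, see note above)."""
    return sum(a != b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)

random.seed(0)  # fixed seed so the experiment is repeatable
avg = sum(
    enp1([random.randint(0, 1) for _ in range(100)]) for _ in range(200)
) / 200
# for uniformly random bits each adjacent pair differs with
# probability 1/2, so avg should land near 0.5
```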
Comparison of average execution times and congruity for the proposed algorithms and existing solutions, for three sequence lengths.
| Algorithm | Parameters | Avg. Execution Time [s] | Congruity [%] | Avg. Execution Time [s] | Congruity [%] | Avg. Execution Time [s] | Congruity [%] |
|---|---|---|---|---|---|---|---|
| Algorithm 2 | | 0.1814 | 98.76 | 0.3519 | 98.43 | 0.7098 | 98.73 |
| Algorithm 3 | | 0.1798 | 98.89 | 0.3587 | 98.76 | 0.7183 | 98.47 |
| Algorithm 4 | α = 4 | 0.1734 | 99.11 | 0.3307 | 99.25 | 0.6635 | 99.59 |
| CRAN entropy | Chao | 0.2093 | 98.52 | 0.4173 | 98.32 | 0.8346 | 98.23 |
| CRAN entropy | NSB | 0.2172 | 97.64 | 0.4390 | 98.58 | 0.8791 | 98.09 |
| CRAN entropy | Shrink | 0.2104 | 98.37 | 0.4216 | 98.69 | 0.8207 | 98.34 |
| CRAN entropy | ML | 0.2198 | 98.01 | 0.4365 | 98.56 | 0.8769 | 98.29 |
Figure 7. Dependency between execution time and sequence length for the compared algorithms.
Figure 8. Execution times measured for the compared algorithms during 10 tests (n = 256).
Figure 9. Estimation errors evaluated for the compared algorithms during 10 tests (n = 256).