| Literature DB >> 33286315 |
Abstract
Novel measures of symbol dominance (dC1 and dC2), symbol diversity (DC1 = N (1 - dC1) and DC2 = N (1 - dC2)), and information entropy (HC1 = log2 DC1 and HC2 = log2 DC2) are derived from Lorenz-consistent statistics that I had previously proposed to quantify dominance and diversity in ecology. Here, dC1 refers to the average absolute difference between the relative abundances of dominant and subordinate symbols, with its value being equivalent to the maximum vertical distance from the Lorenz curve to the 45-degree line of equiprobability; dC2 refers to the average absolute difference between all pairs of relative symbol abundances, with its value being equivalent to twice the area between the Lorenz curve and the 45-degree line of equiprobability; N is the number of different symbols or maximum expected diversity. These Lorenz-consistent statistics are compared with statistics based on Shannon's entropy and Rényi's second-order entropy to show that the former have better mathematical behavior than the latter. The use of dC1, DC1, and HC1 is particularly recommended, as only changes in the allocation of relative abundance between dominant (pd > 1/N) and subordinate (ps < 1/N) symbols are of real relevance for probability distributions to achieve the reference distribution (pi = 1/N) or to deviate from it.
Keywords: Camargo statistics; Lorenz curve; Rényi’s entropy; Shannon’s entropy; information entropy; symbol diversity; symbol dominance
Year: 2020 PMID: 33286315 PMCID: PMC7517034 DOI: 10.3390/e22050542
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
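The abstract's definitions can be sketched directly in code. A minimal Python sketch, assuming symbol probabilities are supplied as a list summing to 1; the function name `camargo_stats` is mine, not the paper's:

```python
import math
from itertools import combinations

def camargo_stats(p):
    """Lorenz-consistent dominance (dC1, dC2), diversity (DC1, DC2),
    and information entropy (HC1, HC2) of a probability distribution p."""
    n = len(p)
    # dC1: summed excess of dominant symbols (p_i > 1/N) over 1/N,
    # i.e. the maximum vertical distance from the Lorenz curve
    # to the 45-degree line of equiprobability
    d_c1 = sum(pi - 1 / n for pi in p if pi > 1 / n)
    # dC2: average absolute difference between all pairs of abundances,
    # equal to twice the area between the Lorenz curve and the 45-degree line
    d_c2 = sum(abs(pi - pj) for pi, pj in combinations(p, 2)) / n
    # Diversity DC = N(1 - dC); entropy HC = log2 DC
    D_c1, D_c2 = n * (1 - d_c1), n * (1 - d_c2)
    return d_c1, d_c2, D_c1, D_c2, math.log2(D_c1), math.log2(D_c2)
```

For distribution V in Table 1 (p = 0.6, 0.4) this yields dC1 = dC2 = 0.100, DC1 = DC2 = 1.800, and HC1 = HC2 ≈ 0.848, matching Table 3; the two dominance measures coincide here because abundance inequality occurs only between dominant and subordinate symbols.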
Ten probability distributions (I–X) are described as hypothetical messages: N = the number of different symbols; p1–p33 = the relative abundances of symbols (symbol probabilities); P = the whole relative abundance of dominant symbols (p > 1/N) that must be transferred to subordinate symbols (p < 1/N) to achieve equiprobability (p = 1/N, including N = 1).
|  | I | II | III | IV | V | VI | VII | VIII | IX | X |
|---|---|---|---|---|---|---|---|---|---|---|
| N | 32 | 16 | 8 | 4 | 2 | 3 | 5 | 9 | 17 | 33 |
| p1 | 0.0375 | 0.075 | 0.15 | 0.3 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 |
| p2 | 0.0375 | 0.075 | 0.15 | 0.3 | 0.4 | 0.2 | 0.1 | 0.05 | 0.025 | 0.0125 |
| p3 | 0.0375 | 0.075 | 0.15 | 0.2 |  | 0.2 | 0.1 | 0.05 | 0.025 | 0.0125 |
| p4 | 0.0375 | 0.075 | 0.15 | 0.2 |  |  | 0.1 | 0.05 | 0.025 | 0.0125 |
| p5 | 0.0375 | 0.075 | 0.1 |  |  |  | 0.1 | 0.05 | 0.025 | 0.0125 |
| p6 | 0.0375 | 0.075 | 0.1 |  |  |  |  | 0.05 | 0.025 | 0.0125 |
| p7 | 0.0375 | 0.075 | 0.1 |  |  |  |  | 0.05 | 0.025 | 0.0125 |
| p8 | 0.0375 | 0.075 | 0.1 |  |  |  |  | 0.05 | 0.025 | 0.0125 |
| p9 | 0.0375 | 0.05 |  |  |  |  |  | 0.05 | 0.025 | 0.0125 |
| p10 | 0.0375 | 0.05 |  |  |  |  |  |  | 0.025 | 0.0125 |
| p11 | 0.0375 | 0.05 |  |  |  |  |  |  | 0.025 | 0.0125 |
| p12 | 0.0375 | 0.05 |  |  |  |  |  |  | 0.025 | 0.0125 |
| p13 | 0.0375 | 0.05 |  |  |  |  |  |  | 0.025 | 0.0125 |
| p14 | 0.0375 | 0.05 |  |  |  |  |  |  | 0.025 | 0.0125 |
| p15 | 0.0375 | 0.05 |  |  |  |  |  |  | 0.025 | 0.0125 |
| p16 | 0.0375 | 0.05 |  |  |  |  |  |  | 0.025 | 0.0125 |
| p17 | 0.025 |  |  |  |  |  |  |  | 0.025 | 0.0125 |
| p18 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p19 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p20 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p21 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p22 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p23 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p24 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p25 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p26 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p27 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p28 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p29 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p30 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p31 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p32 | 0.025 |  |  |  |  |  |  |  |  | 0.0125 |
| p33 |  |  |  |  |  |  |  |  |  | 0.0125 |
| P | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.267 | 0.4 | 0.489 | 0.541 | 0.57 |
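The P row can be recomputed from each distribution column: P is the total abundance that dominant symbols hold in excess of 1/N. A sketch verifying two columns (the distribution data are transcribed from the table above; the helper name is mine):

```python
def excess_to_transfer(p):
    """Total abundance that dominant symbols (p > 1/N) must transfer
    to subordinate symbols (p < 1/N) to reach equiprobability (p = 1/N)."""
    n = len(p)
    return sum(pi - 1 / n for pi in p if pi > 1 / n)

dist_VII = [0.6, 0.1, 0.1, 0.1, 0.1]  # column VII, N = 5
dist_IX = [0.6] + [0.025] * 16        # column IX, N = 17
```

Column VII gives P = 0.6 - 1/5 = 0.400 and column IX gives P = 0.6 - 1/17 ≈ 0.541, matching the P row.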
Ten probability distributions (XI–XX) are described as hypothetical messages: N = the number of different symbols; p1–p10 = the relative abundances of symbols (symbol probabilities); P = the whole relative abundance of dominant symbols (p > 1/N) that must be transferred to subordinate symbols (p < 1/N) to achieve equiprobability (p = 1/N, including N = 1).
|  | XI | XII | XIII | XIV | XV | XVI | XVII | XVIII | XIX | XX |
|---|---|---|---|---|---|---|---|---|---|---|
| N | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| p1 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 |
| p2 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 | 0.14 | 0.17 | 0.17 | 0.17 | 0.17 |
| p3 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 | 0.14 | 0.13 | 0.15 | 0.15 | 0.15 |
| p4 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 | 0.14 | 0.13 | 0.12 | 0.13 | 0.13 |
| p5 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 | 0.14 | 0.13 | 0.12 | 0.11 | 0.11 |
| p6 | 0.05 | 0.09 | 0.09 | 0.09 | 0.09 | 0.05 | 0.05 | 0.05 | 0.05 | 0.09 |
| p7 | 0.05 | 0.04 | 0.07 | 0.07 | 0.07 | 0.05 | 0.05 | 0.05 | 0.05 | 0.07 |
| p8 | 0.05 | 0.04 | 0.03 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| p9 | 0.05 | 0.04 | 0.03 | 0.02 | 0.03 | 0.05 | 0.05 | 0.05 | 0.05 | 0.03 |
| p10 | 0.05 | 0.04 | 0.03 | 0.02 | 0.01 | 0.05 | 0.05 | 0.05 | 0.05 | 0.01 |
| P | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
Figure 1. The cumulative proportion of abundance is related to the cumulative proportion of symbols, ranked from the symbol with the lowest relative abundance to the symbol with the highest relative abundance, for the ten probability distributions (I–X) described as hypothetical messages in Table 1. The reference distribution is depicted by the 45-degree line of equiprobability, where every symbol has the same relative abundance = 1/N, symbol dominance = 0, and symbol diversity = the number of different symbols (N). Symbol dominance may be estimated as the maximum vertical distance from the Lorenz curve to the 45-degree line, or as twice the area between the Lorenz curve and the 45-degree line, with both measures giving the same value whenever relative abundance inequality occurs only between dominant and subordinate symbols (as shown in this figure). In addition, symbol diversity = N (1 - symbol dominance), symbol redundancy = 1/symbol diversity, and information entropy = log2 symbol diversity.
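The Lorenz-curve construction described in the figure caption can be reproduced directly: sort the abundances in ascending order, accumulate them, and compare each cumulative point with the 45-degree line. A minimal sketch under those definitions (variable names are mine):

```python
def lorenz_max_gap(p):
    """Maximum vertical distance from the Lorenz curve of p to the
    45-degree line of equiprobability (= symbol dominance dC1)."""
    n = len(p)
    q = sorted(p)  # rank symbols from lowest to highest abundance
    gap, cum = 0.0, 0.0
    for k, pk in enumerate(q, start=1):
        cum += pk                    # cumulative proportion of abundance
        gap = max(gap, k / n - cum)  # distance below the diagonal at k/N
    return gap
```

For distribution VII (p = 0.6 and four symbols at 0.1) the maximum gap is 0.400, which equals both dC1 in Table 3 and P in Table 1.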
Measures of symbol dominance (dC1, dC2, dR, and dS), symbol diversity (DC1, DC2, DR, and DS), and information entropy (HC1, HC2, HR, and HS) are computed for the ten probability distributions (I–X) described as hypothetical messages in Table 1. Subscripts C1 and C2 denote the Lorenz-consistent statistics, R those based on Rényi's second-order entropy, and S those based on Shannon's entropy. Hmax = log2 N = maximum expected entropy; HC1/Hmax, HC2/Hmax, HR/Hmax, and HS/Hmax = normalized entropies. All statistics are explained in the text.
|  | I | II | III | IV | V | VI | VII | VIII | IX | X |
|---|---|---|---|---|---|---|---|---|---|---|
| dC1 | 0.100 | 0.100 | 0.100 | 0.100 | 0.100 | 0.267 | 0.400 | 0.489 | 0.541 | 0.570 |
| DC1 | 28.800 | 14.400 | 7.200 | 3.600 | 1.800 | 2.199 | 3.000 | 4.599 | 7.803 | 14.190 |
| HC1 | 4.848 | 3.848 | 2.848 | 1.848 | 0.848 | 1.137 | 1.585 | 2.202 | 2.964 | 3.827 |
| HC1/Hmax | 0.970 | 0.962 | 0.949 | 0.924 | 0.848 | 0.717 | 0.683 | 0.695 | 0.725 | 0.759 |
| dC2 | 0.100 | 0.100 | 0.100 | 0.100 | 0.100 | 0.267 | 0.400 | 0.489 | 0.541 | 0.570 |
| DC2 | 28.800 | 14.400 | 7.200 | 3.600 | 1.800 | 2.199 | 3.000 | 4.599 | 7.803 | 14.190 |
| HC2 | 4.848 | 3.848 | 2.848 | 1.848 | 0.848 | 1.137 | 1.585 | 2.202 | 2.964 | 3.827 |
| HC2/Hmax | 0.970 | 0.962 | 0.949 | 0.924 | 0.848 | 0.717 | 0.683 | 0.695 | 0.725 | 0.759 |
| dR | 0.038 | 0.038 | 0.038 | 0.038 | 0.038 | 0.242 | 0.500 | 0.708 | 0.841 | 0.917 |
| DR | 30.768 | 15.384 | 7.692 | 3.846 | 1.923 | 2.273 | 2.500 | 2.632 | 2.703 | 2.740 |
| HR | 4.943 | 3.943 | 2.943 | 1.943 | 0.943 | 1.184 | 1.322 | 1.396 | 1.434 | 1.454 |
| HR/Hmax | 0.989 | 0.986 | 0.981 | 0.972 | 0.943 | 0.747 | 0.569 | 0.440 | 0.351 | 0.288 |
| dS | 0.020 | 0.020 | 0.020 | 0.020 | 0.020 | 0.138 | 0.317 | 0.500 | 0.650 | 0.762 |
| DS | 31.360 | 15.680 | 7.840 | 3.920 | 1.960 | 2.586 | 3.413 | 4.503 | 5.942 | 7.841 |
| HS | 4.971 | 3.971 | 2.971 | 1.971 | 0.971 | 1.371 | 1.771 | 2.171 | 2.571 | 2.971 |
| HS/Hmax | 0.994 | 0.993 | 0.990 | 0.985 | 0.971 | 0.865 | 0.763 | 0.685 | 0.629 | 0.589 |
| Hmax | 5.000 | 4.000 | 3.000 | 2.000 | 1.000 | 1.585 | 2.322 | 3.170 | 4.087 | 5.044 |
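The Shannon- and Rényi-based rows can be reproduced from the standard entropy definitions; a sketch, where the mappings D = 2^H and d = 1 - D/N are inferred from the tabulated values rather than quoted from the paper:

```python
import math

def entropy_stats(p):
    """Shannon entropy, Renyi's second-order entropy, and the derived
    diversity (D = 2**H) and dominance (d = 1 - D/N) statistics."""
    n = len(p)
    h_s = -sum(pi * math.log2(pi) for pi in p)  # Shannon entropy HS
    h_r = -math.log2(sum(pi * pi for pi in p))  # Renyi second-order entropy HR
    D_s, D_r = 2 ** h_s, 2 ** h_r               # effective numbers of symbols
    return h_s, h_r, D_s, D_r, 1 - D_s / n, 1 - D_r / n
```

Distribution V (p = 0.6, 0.4) gives HS ≈ 0.971, DS ≈ 1.960, dS ≈ 0.020 and HR ≈ 0.943, DR ≈ 1.923, dR ≈ 0.038, matching column V above.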
Measures of symbol dominance (dC1, dC2, dR, and dS), symbol diversity (DC1, DC2, DR, and DS), and information entropy (HC1, HC2, HR, and HS) are computed for the ten probability distributions (XI–XX) described as hypothetical messages in Table 2. Subscripts C1 and C2 denote the Lorenz-consistent statistics, R those based on Rényi's second-order entropy, and S those based on Shannon's entropy. Hmax = log2 N = maximum expected entropy; HC1/Hmax, HC2/Hmax, HR/Hmax, and HS/Hmax = normalized entropies. All statistics are explained in the text.
|  | XI | XII | XIII | XIV | XV | XVI | XVII | XVIII | XIX | XX |
|---|---|---|---|---|---|---|---|---|---|---|
| dC1 | 0.250 | 0.250 | 0.250 | 0.250 | 0.250 | 0.250 | 0.250 | 0.250 | 0.250 | 0.250 |
| DC1 | 7.500 | 7.500 | 7.500 | 7.500 | 7.500 | 7.500 | 7.500 | 7.500 | 7.500 | 7.500 |
| HC1 | 2.907 | 2.907 | 2.907 | 2.907 | 2.907 | 2.907 | 2.907 | 2.907 | 2.907 | 2.907 |
| HC1/Hmax | 0.875 | 0.875 | 0.875 | 0.875 | 0.875 | 0.875 | 0.875 | 0.875 | 0.875 | 0.875 |
| dC2 | 0.250 | 0.270 | 0.282 | 0.288 | 0.290 | 0.270 | 0.282 | 0.288 | 0.290 | 0.330 |
| DC2 | 7.500 | 7.300 | 7.180 | 7.120 | 7.100 | 7.300 | 7.180 | 7.120 | 7.100 | 6.700 |
| HC2 | 2.907 | 2.868 | 2.844 | 2.832 | 2.828 | 2.868 | 2.844 | 2.832 | 2.828 | 2.744 |
| HC2/Hmax | 0.875 | 0.863 | 0.856 | 0.852 | 0.851 | 0.863 | 0.856 | 0.852 | 0.851 | 0.826 |
| dR | 0.200 | 0.213 | 0.220 | 0.224 | 0.225 | 0.213 | 0.220 | 0.224 | 0.225 | 0.248 |
| DR | 8.000 | 7.870 | 7.800 | 7.760 | 7.750 | 7.870 | 7.800 | 7.760 | 7.750 | 7.520 |
| HR | 3.000 | 2.976 | 2.963 | 2.956 | 2.954 | 2.976 | 2.963 | 2.956 | 2.954 | 2.911 |
| HR/Hmax | 0.903 | 0.896 | 0.892 | 0.890 | 0.889 | 0.896 | 0.892 | 0.890 | 0.889 | 0.876 |
| dS | 0.122 | 0.137 | 0.149 | 0.157 | 0.161 | 0.128 | 0.131 | 0.133 | 0.134 | 0.173 |
| DS | 8.779 | 8.628 | 8.512 | 8.431 | 8.387 | 8.724 | 8.691 | 8.675 | 8.662 | 8.269 |
| HS | 3.134 | 3.109 | 3.089 | 3.076 | 3.068 | 3.125 | 3.120 | 3.117 | 3.115 | 3.048 |
| HS/Hmax | 0.943 | 0.936 | 0.930 | 0.926 | 0.924 | 0.941 | 0.939 | 0.938 | 0.937 | 0.918 |
| Hmax | 3.322 | 3.322 | 3.322 | 3.322 | 3.322 | 3.322 | 3.322 | 3.322 | 3.322 | 3.322 |
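As a cross-check of one full column, distribution XX from Table 2 (an arithmetic sequence of abundances with step 0.02) can be pushed through all four statistic families. The computation below transcribes column XX from Table 2 and recomputes the dominance and entropy values; small last-digit differences against the printed table are rounding only:

```python
import math
from itertools import combinations

# Distribution XX from Table 2
p = [0.19, 0.17, 0.15, 0.13, 0.11, 0.09, 0.07, 0.05, 0.03, 0.01]
n = len(p)

d_c1 = sum(pi - 1 / n for pi in p if pi > 1 / n)           # excess over 1/N: 0.250
d_c2 = sum(abs(a - b) for a, b in combinations(p, 2)) / n  # pairwise differences: 0.330
h_s = -sum(pi * math.log2(pi) for pi in p)                 # Shannon entropy, ~3.05
h_r = -math.log2(sum(pi * pi for pi in p))                 # Renyi second-order entropy, ~2.91
```

These reproduce the dC1, dC2, HS, and HR entries of column XX to within rounding.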