Shanyun Liu1,2, Rui She1,2, Zheqi Zhu1,2, Pingyi Fan1,2.
Abstract
This paper focuses on lossy compression storage based on data value, which represents the subjective assessment of users, for cases where the storage size is still insufficient after conventional lossless data compression. To this end, we formulate the problem as an optimization that minimizes the importance-weighted reconstruction error under a limited total storage size, where importance characterizes the data value from the users' viewpoint. On this basis, the paper puts forward an optimal allocation strategy for storing digital data under an exponential distortion measure, which makes rational use of all the storage space. The theoretical results show that this strategy is a kind of restrictive water-filling. It also characterizes the trade-off between the relative weighted reconstruction error and the available storage size. Consequently, if a relatively small part of the total data value is allowed to be lost, this strategy improves the performance of data compression. Furthermore, the paper shows that both the users' preferences and the special characteristics of the data distribution can give rise to small-probability event scenarios, in which only a fraction of the data covers the vast majority of users' interests. For either reason, data with highly clustered message importance is favorable for compression storage. In contrast, from the perspective of optimal storage space allocation based on data value, data with a uniform information distribution is incompressible, which is consistent with information theory.
Keywords: importance coefficient; lossy compression storage; message importance measure; optimal allocation strategy; weighted reconstruction error
Year: 2020 PMID: 33286363 PMCID: PMC7517127 DOI: 10.3390/e22050591
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
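The restrictive water-filling allocation mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the per-class distortion is modelled as `2 ** (-l_i)` (one possible exponential measure, which may differ from the paper's exact definition), and the function name `restrictive_water_filling` and the arguments `p`, `w`, and `L` are hypothetical. Under these assumptions, the optimal size of class `i` is `log2(w_i)` plus a common water level, clipped to the interval from 0 to `L`, with the water level chosen so that the average size meets the budget `T`.

```python
import math


def restrictive_water_filling(p, w, T, L, iters=100):
    """Sketch: minimise sum_i p[i] * w[i] * 2**(-l[i])
    subject to sum_i p[i] * l[i] <= T and 0 <= l[i] <= L.

    The distortion form 2**(-l) is an assumption made for illustration.
    p: class probabilities, w: positive importance weights,
    T: available average storage size, L: raw storage size per item.
    """
    if any(wi <= 0 for wi in w):
        raise ValueError("importance weights must be positive")
    log_w = [math.log2(wi) for wi in w]

    def sizes(beta):
        # Water level beta shifts every class; clipping enforces 0 <= l <= L.
        return [min(max(lw + beta, 0.0), L) for lw in log_w]

    def used(beta):
        # Average storage consumed at water level beta.
        return sum(pi * li for pi, li in zip(p, sizes(beta)))

    if T >= L * sum(p):
        return [float(L)] * len(p)  # budget large enough to keep everything
    if T <= 0:
        return [0.0] * len(p)       # no storage available

    lo = -max(log_w)     # every class clipped to 0 at this level
    hi = L - min(log_w)  # every class clipped to L at this level
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if used(mid) > T:
            hi = mid
        else:
            lo = mid
    return sizes(lo)


# Two equiprobable classes, the first eight times as important as the second.
allocation = restrictive_water_filling([0.5, 0.5], [8.0, 1.0], T=2.0, L=8.0)
```

Bisection is used here because the consumed storage is a continuous, non-decreasing function of the water level. With equal weights the sketch returns the same size for every class, which matches the abstract's remark that uniformly distributed importance offers no gain from reallocation.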
Notations.
| Notation | Description |
|---|---|
| | The sequence of raw data |
| | The sequence of compressed data |
| | The storage size of |
| | The distortion measure function between |
| | The number of event classes |
| | The alphabet of raw data |
| | The alphabet of compressed data |
| | The error cost for the reconstructed data |
| | The probability distribution of data class |
| | The weighted reconstruction error |
| | The relative weighted reconstruction error |
| | The storage size of raw data |
| | The storage size of compressed data |
| | The round optimal storage size of the data belonging to the |
| | The maximum available average storage size |
| | The importance coefficient |
| | The message importance measure, which is given by |
| | The average compressed storage size of each data, which is given by |
| | The maximum available |
| | The non-parametric message importance measure, which is given by |
Figure 1. Pictorial representation of the system model.
Figure 2. Restrictive water-filling for optimal storage sizes.
Figure 3. The success rate of compressed storage versus the maximum available average storage size.
Figure 4. Broken line graph of the optimal storage size for a given probability distribution, maximum available average storage size, and original storage size.
Figure 5. Relative weighted reconstruction error (RWRE) versus the maximum available average storage size T for a given probability distribution and value of the importance coefficient. One RWRE curve is obtained by substituting Equation (19) into Equation (38), and the other by substituting Equation (22) into Equation (38).
Figure 6. RWRE versus the maximum available average storage size T for a given probability distribution and value of the importance coefficient.
Figure 7. RWRE versus the average compressed storage size of each data item for a given importance coefficient.
The auxiliary variables in the ideal storage system.
| Variable | Probability Distribution | | | |
|---|---|---|---|---|
| | | 5.7924 | −0.6276 | 6.7234 |
| | | 4.2679 | −1.1350 | 6.1305 |
| | | 7.1487 | −0.0287 | 5.4344 |
| | | 2.2367 | −0.5838 | 5.2530 |
| | | 0 | 0 | 5 |
Figure 8. RWRE versus maximum available average storage size T.
The auxiliary variables in the quantification storage system.
| Variable | Probability Distribution | | |
|---|---|---|---|
| | | 0.007 | 136.8953 |
| | | 0.007 | 136.8953 |
| | | 0.01 | 94.3948 |
| | | 0.014 | 66.1599 |
| | | 0.2 | 4.0000 |