Beniamin Stecuła, Kinga Stecuła, Adrian Kapczyński.
Abstract
The goal of the research was to study the possibility of using the planned language Esperanto for text compression and to compare the results of text compression in Esperanto with compression in natural languages, represented by Polish and English. The authors compressed the text with a program written in Python, using four compression algorithms (zlib, lzma, bz2, and lz4) on four versions of the text: Polish, English, Esperanto, and Esperanto in x notation (without characters outside the ASCII encoding). After creating the compression program and compressing the texts, the authors compared the compression time and the volume of the text before and after compression. The results of the study confirmed the hypothesis that the planned language, Esperanto, gives better text compression results than the natural languages represented by Polish and English. The confirmation by scientific methods that Esperanto is better suited to text compression is the scientific added value of the paper.
Keywords: coding; compression; compression algorithms; languages; processing
Year: 2022 PMID: 36080852 PMCID: PMC9460191 DOI: 10.3390/s22176393
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Pseudocode for the developed program.
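The pseudocode itself is not reproduced in this record. As a minimal sketch of the kind of program the abstract describes (compress a text with several algorithms, then record the time and the resulting volume), the following may help. The function name `measure`, the UTF-8 encoding, and the default compression levels are assumptions, not the authors' choices, and lz4 is left out because it requires a third-party package.

```python
import bz2
import lzma
import time
import zlib

# Standard-library codecs only. The study also used lz4, which is
# omitted here because it is not part of the standard library.
CODECS = {
    "zlib": zlib.compress,
    "lzma": lzma.compress,
    "bz2": bz2.compress,
}


def measure(text: str, encoding: str = "utf-8") -> dict:
    """Compress `text` with each codec; return time, size and space used.

    The encoding and the default compression levels are assumptions.
    """
    raw = text.encode(encoding)
    if not raw:
        raise ValueError("text must not be empty")
    results = {}
    for name, compress in CODECS.items():
        start = time.perf_counter()
        packed = compress(raw)
        elapsed = time.perf_counter() - start
        results[name] = {
            "seconds": elapsed,                       # compression time
            "bytes": len(packed),                     # space used [bytes]
            "percent": 100 * len(packed) / len(raw),  # space used [%]
        }
    return results


if __name__ == "__main__":
    sample = "Tiel pasis Nero, kiel pasas uragano. " * 200
    for name, row in measure(sample).items():
        print(f"{name}: {row['seconds']:.4f} s, "
              f"{row['bytes']} bytes, {row['percent']:.2f} %")
```

Timings from a single run like this are noisy; repeating each measurement and averaging would give steadier numbers.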
Figure 2. The comparison of the lengths of the characters in each language.
Data on uncompressed text.
| Parameter | pl | en | eo | eox |
|---|---|---|---|---|
| The volume of the uncompressed text (bytes). | 1,187,923 | 1,198,403 | 1,174,480 | 1,174,480 |
The sample text from Quo vadis in different languages and the corresponding number of characters.
| Quotation | Number of Characters |
|---|---|
| “I tak minął Nero, jak mija wicher, burza, pożar, wojna lub mór, a bazylika Piotra panuje dotąd z wyżyn watykańskich miastu i światu”. | 131 |
| “Therefore, Nero passed, as a whirlwind, as a storm, as a fire, as war or death passes; but the basilica of Peter rules till now, from the Vatican heights, the city, and the world”. | 174 |
| “Tiel pasis Nero, kiel pasas uragano, fulmotondro, brulo, milito aŭ pesto, dum la baziliko de Petro regas ĝis nun de la Vatikana altaĵo la urbon kaj la mondon”. | 157 |
| “Tiel pasis Nero, kiel pasas uragano, fulmotondro, brulo, milito aux pesto, dum la baziliko de Petro regas gxis nun de la Vatikana altajxo la urbon kaj la mondon”. | 160 |
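The last two rows of the table differ only in notation: each Esperanto letter with a diacritic is replaced by its base letter followed by `x`. A minimal sketch of that conversion is shown below. The helper name `to_x_notation` is hypothetical, and writing capitals as `Cx` rather than `CX` is an assumption.

```python
# Map each Esperanto letter with a diacritic to its x-notation digraph.
# The capital-letter convention (Cx rather than CX) is an assumption.
X_NOTATION = str.maketrans({
    "ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux",
    "Ĉ": "Cx", "Ĝ": "Gx", "Ĥ": "Hx", "Ĵ": "Jx", "Ŝ": "Sx", "Ŭ": "Ux",
})


def to_x_notation(text: str) -> str:
    """Return `text` with Esperanto diacritic letters written in x notation."""
    return text.translate(X_NOTATION)


quote = ("Tiel pasis Nero, kiel pasas uragano, fulmotondro, brulo, milito "
         "aŭ pesto, dum la baziliko de Petro regas ĝis nun de la Vatikana "
         "altaĵo la urbon kaj la mondon")
converted = to_x_notation(quote)
print(converted)
# Three letters (ŭ, ĝ, ĵ) each gain one character.
print(len(converted) - len(quote))  # 3
```

The difference of three characters agrees with the table, where the x-notation version is listed as 160 characters against 157.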
Data contained in the console.
| Algorithm | pl | en | eo | eox |
|---|---|---|---|---|
| Compression time [s] | | | | |
| zlib | 0.0683 | 0.0683 | 0.0722 | 0.0723 |
| lzma | 0.4449 | 0.4552 | 0.4473 | 0.4464 |
| bz2 | 0.0821 | 0.0813 | 0.0813 | 0.0821 |
| lz4 | 0.0818 | 0.0934 | 0.0927 | 0.0927 |
| Space used [%] | | | | |
| zlib | 38.38 | 37.20 | 35.80 | 35.71 |
| lzma | 30.66 | 29.45 | 28.58 | 28.56 |
| bz2 | 27.93 | 26.86 | 26.03 | 25.98 |
| lz4 | 43.47 | 41.87 | 40.52 | 40.48 |
| Space used [bytes] | | | | |
| zlib | 455,947 | 445,863 | 420,472 | 419,452 |
| lzma | 364,232 | 352,948 | 335,724 | 335,400 |
| bz2 | 331,739 | 321,942 | 305,694 | 305,111 |
| lz4 | 516,426 | 501,722 | 475,912 | 475,389 |
Figure 3. Volume of uncompressed text in the given languages.
Figure 4. Volume of compressed text in the given languages.
Data on compression efficiency [%].
| Algorithm | pl | en | eo | eox |
|---|---|---|---|---|
| zlib | 38.38 | 37.20 | 35.80 | 35.71 |
| lzma | 30.66 | 29.45 | 28.58 | 28.56 |
| bz2 | 27.93 | 26.86 | 26.03 | 25.98 |
| lz4 | 43.47 | 41.87 | 40.52 | 40.48 |
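The record does not state the formula behind these percentages, but the values are consistent with the compressed volume divided by the uncompressed volume of the same text. With the symbols $V_{\text{compressed}}$ and $V_{\text{uncompressed}}$ introduced here for illustration:

$$
\text{space used}\ [\%] = \frac{V_{\text{compressed}}}{V_{\text{uncompressed}}} \times 100
$$

For zlib on the Polish text, using the byte counts reported in the tables above:

$$
\frac{455{,}947}{1{,}187{,}923} \times 100 \approx 38.38
$$

A lower value therefore means better compression.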
Figure 5. The efficiency of text compression.
Data on the efficiency of compression in relation to the volume of text in Polish [%].
| Algorithm | pl | en | eo | eox |
|---|---|---|---|---|
| zlib | 38.38 | 37.53 | 35.40 | 35.31 |
| lzma | 30.66 | 29.71 | 28.26 | 28.23 |
| bz2 | 27.93 | 27.10 | 25.73 | 25.68 |
| lz4 | 43.47 | 42.24 | 40.06 | 40.02 |
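The values in this table appear to be each compressed volume divided by the uncompressed volume of the Polish text, so that all four versions share one baseline. A short sketch that reproduces the zlib row from the byte counts reported in this record follows. The function name `relative_to` and the dictionary layout are assumptions made for illustration.

```python
# Byte counts copied from the tables in this record.
UNCOMPRESSED = {"pl": 1_187_923, "en": 1_198_403, "eo": 1_174_480, "eox": 1_174_480}
ZLIB = {"pl": 455_947, "en": 445_863, "eo": 420_472, "eox": 419_452}


def relative_to(baseline: str, compressed: dict, uncompressed: dict = UNCOMPRESSED) -> dict:
    """Express each compressed size as a percentage of the baseline
    language's uncompressed size, rounded to two decimals."""
    base = uncompressed[baseline]
    return {lang: round(100 * size / base, 2) for lang, size in compressed.items()}


print(relative_to("pl", ZLIB))
# {'pl': 38.38, 'en': 37.53, 'eo': 35.4, 'eox': 35.31}
```

Using a single baseline removes the effect of the translations having slightly different uncompressed sizes, so the comparison reflects the final volume needed to store the same content.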
Figure 6. The efficiency of text compression in relation to text volume in Polish.
Data on the compression time [ms].
| Algorithm | pl | en | eo | eox |
|---|---|---|---|---|
| zlib | 68 | 68 | 72 | 72 |
| lzma | 444 | 455 | 447 | 446 |
| bz2 | 82 | 81 | 81 | 82 |
| lz4 | 81 | 93 | 93 | 93 |
Figure 7. The compression time.
Figure 8. The result of the additional experiment: comparison of compression of text translated with Google Translate.
Figure 9. The summary of the additional experiment (bytes).