Subrata Kumar Das, Mohammad Zahidur Rahman.
Abstract
Healthcare systems capture patients' data using different medical equipment and store it in databases, so the data volume grows continually. Processing and sharing this massive data continuously raises concerns about live data transfer over networks. Sending patient data to a distant remote user without a proper compression format incurs high latency in the communication channels. Any alteration of data transmitted via the communication medium may also cause issues in assuring data authentication and integrity. To solve these problems, a watermarking method, which has a comparatively low computational cost, is applied to ensure such security. Various watermarking mechanisms are available for ensuring health data security, especially for medical images. Watermarking has not yet been used on text due to the lack of an efficient technique. This paper proposes a secured compression technique for patient live-text data shared remotely over a bandwidth-deficient channel. To test the proposed system, we use patient data. The results indicate that the proposed strategy outperforms the existing compression methods and is robust enough to provide data integrity and authentication.
Keywords: Encoding; Fragile watermark; Low latency; Patient data; Robust watermark; Security; Text watermarking
Year: 2022 PMID: 36203895 PMCID: PMC9529588 DOI: 10.1016/j.heliyon.2022.e10788
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Summary of literature review on data compression & security.
| Authors [Ref.] | Focus area | Advantages | Drawbacks |
|---|---|---|---|
| Hanoune and Lysmos | Monitoring E-health applications to improve data processing and to compress the data volume to store. | To enhance the collection of data. | The Bzip2 strategy suits better than LZ4 and LZO, but its compression time is high. |
| Ni et al. | Introducing structural health monitoring (SHM) system to reduce and reconstruct data. | The proposed method is more helpful in saving data on storage devices under low compression ratios. | The compression technique is not as sensitive to all data and may cause an error in reconstructing them. |
| Sridhar and Lakshmi | Using LZW to compress data and comparing other existing techniques to find a better one to reduce medical data. | It supports choosing a better compression algorithm among existing methods and saves storage space. | The LZW performs comparatively better than other methods, but the compression gain shows up to 40% only. |
| Hameed et al. | To transfer health data to remote distance using Huffman coding. | This is useful for ECG signal filtering and compression. | The proposed technique is applied to compress ECG data only. |
| Punitha and Kalavathi | Doing analysis on lossless compression for patient data. | It helps to understand different medical image formats (Nifti, Minc, DICOM, etc.) and compress them. | Huffman coding shows better performance, but the research is only done to compress patient image data. |
| Narayanan et al. | A solution approach for compressing data to prevent data-loss error in Healthcare Applications. | The method helps to optimize the storage space. | The LZMA provides a better compression ratio than Huffman compression, but its compression time is high. |
| Almehmadi and Gutub | Introducing Arabic e-text watermarking and using “Kashida” for hiding the watermarking data. | This approach serves to prevent the likelihood and reoccurrence of a possible attack on Arabic text. | Watermark extraction is complex and time-consuming. |
| Alkhafaji et al. | Enhancing the Quran text watermarking approach using a reversing way. | It protects against any data alteration of the digital Holy Quran to keep its invaluable meaning intact. | The defined scheme is not robust against formatting attacks. |
| Liang and Iranmanesh | Watermarking approach using white-spaces between the words of the document. | This strategy allows a user to hide the secret information inside the document to protect from the attackers. | That is not robust enough against formatting attacks. |
| Xiao et al. | Presenting a strategy based on Font-Code that uses font glyphs, instead of changing the letters of the document text, to embed the watermark. | It helps to detect machine-recognizable glyph perturbations. | This technique is only applicable to one kind of font family (regular Times New Roman). |
| Alotaibi and Elrefaei | Presenting a watermarking approach for Arabic text based on pseudo-space. | The approach protects Arabic text from different attacks: copying and pasting, text tampering, and text formatting. | This cannot work against retyping attacks. |
| Hilal et al. | To demonstrate a watermarking approach on text (RETWNLPA) for English based on Natural Language Processing to uplift the accuracy of tamper data detection. | The proposed technique is helpful in identifying tampering with sensitive English text. | It is an English text-based watermarking that should comply with the grammar rules to preserve readability. |
Analysis of different lossless compression mechanisms used in healthcare systems.
| Algorithm | Strategy | Issue |
|---|---|---|
| LZW | LZW is an improvement of LZ78: a heuristic algorithm that searches for increasingly longer repeated phrases and encodes them. | The algorithm always finds a match, and it is not effective on data without multiple occurrences of substrings. |
| RLE | Reduces data based on repetitive sequences of consecutive identical values, known as runs. The algorithm stores each run as a single value and a count. | It may yield ambiguity for strings with numerical characters. |
| LZO | A block-level compression scheme implemented with a 64 KB compression dictionary. LZO uses LZ77 with a small hash table. | It provides lower compression gain and wastes more space. |
| LZSS | LZSS is a derivative of LZ77 and works by checking whether a substitution decreases the data size or not. | The algorithm has poor anti-error performance and does not compress data well. |
| LZMA | LZMA is a dictionary compression scheme obtained by modifying LZ77 to work at the bit level rather than the byte level. | Slow compression speed is the main problem of LZMA. |
| LZ4 | Compresses data based on LZ77. LZ4 represents data as a series of sequences, each starting with a one-byte token that is split into two 4-bit fields. | The algorithm does not perform well when the data contains few identical characters. |
| Deflate | The Deflate format consists of a series of blocks corresponding to successive blocks of input data. Each block is compressed with a combination of Huffman coding and the LZ77 algorithm. | The central problem of Deflate is how it searches for longer duplicate substrings. The algorithm can produce a larger volume of compressed data or even show lower performance. |
| BZ2 | Stacks multiple encoding layers atop one another. The method applies run-length encoding to the initial data, uses the Burrows-Wheeler transform, then applies Huffman coding with multiple tables. | It is slower and not suitable for small data or for data with few repetitive sequences. |
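
The RLE row above can be made concrete with a small sketch. The function below is a deliberately naive run-length encoder written for illustration only (the name `rle_encode` and the `<count><char>` output format are assumptions, not taken from the paper). It shows why text with few runs grows under RLE and how digits in the input collide with the run counts.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Naive run-length encoding: each run becomes '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


# Text with long runs shrinks.
print(rle_encode("aaaabbbcc"))

# Text without runs doubles in size: one count is emitted per character.
print(rle_encode("pulse"))

# Digits in the input collide with the counts. Both inputs below encode to
# the same string, so a decoder cannot tell them apart without an escape
# or delimiter scheme.
print(rle_encode("1a") == rle_encode("a" * 111))
```

Practical RLE variants avoid the collision by using fixed-width counts or an escape character, at the cost of further expansion on run-free text.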
Analysis of different existing text watermarking techniques.
| Algorithm | Strategy | Issue |
|---|---|---|
| Structure-based text watermarking | This approach transforms the format or structure of the media text, or selects a keyword from the text, to add the watermark to it. | Shifting or formatting is not possible for patient live-text data because it is raw text. |
| Linguistic-based text watermarking | This technique uses the semantic and syntactic nature of the media text to watermark it. | The process of embedding a watermark must comply with grammar rules to preserve the readability of the data, and patient text data do not follow such rules. |
| Image-based text watermarking | The text is interpreted as a series of text images. | This watermarking approach is not applicable to patient text data because the data are plain text rather than images. |
Algorithm 1. To create formatted data or tables supposing the prior practitioner knowledge (I1), medical terms (I2), diagnostic indicator charts (I3), medication products (I4), databases' data (I5), …, and more information (I).
Algorithm 2. To encode data for compression before watermarking it.
Algorithm 3. To decode data before passing it to the end-users.
Figure 1. The process of embedding and extracting the watermark.
Algorithm 4. To insert the watermark into the encoded data produced by Algorithm 2.
Algorithm 5. Extracting the watermark and the encoded data from the watermarked data.
Algorithm 6. Creating a robust watermark to embed into the actual data.
Algorithm 7. Making a fragile watermark from the encoded patient data (P) and the robust watermark (W).
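
The bodies of Algorithms 1 to 3 are not reproduced in this record, so the following is only a minimal sketch of the general idea the captions suggest: sender and receiver share a table of known terms, and encoding replaces each term with a short code. The table entries, the `ESC` marker, and the function names `encode` and `decode` are all hypothetical and should not be read as the authors' method.

```python
# Hypothetical shared table in the spirit of Algorithm 1; entries are invented.
TERM_TABLE = ["blood pressure", "heart rate", "temperature", "paracetamol"]

# Assumed marker character that never occurs in the raw patient text.
ESC = "\x01"


def encode(text: str) -> str:
    """Replace each known term with ESC plus a one-character table index."""
    for index, term in enumerate(TERM_TABLE):
        text = text.replace(term, ESC + chr(ord("A") + index))
    return text


def decode(encoded: str) -> str:
    """Invert encode() by expanding each ESC code back into its term."""
    for index, term in enumerate(TERM_TABLE):
        encoded = encoded.replace(ESC + chr(ord("A") + index), term)
    return encoded


sample = "heart rate 72, blood pressure 120/80, temperature 98.6"
packed = encode(sample)
print(len(sample), len(packed))   # the encoded text is shorter
print(decode(packed) == sample)   # and the round trip is lossless
```

The sketch is lossless only as long as the marker character never appears in the input, which is why it is stated as an assumption.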
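
The captions of Algorithms 4 to 7 describe a fragile watermark derived from the encoded data P and a robust watermark W, embedded before transfer and extracted on receipt. The paper's actual construction is not given in this record. The sketch below swaps in a standard keyed hash (HMAC-SHA256) as a stand-in for the fragile watermark, with an assumed shared key and an assumed byte layout, purely to illustrate how any alteration in transit can be detected.

```python
import hashlib
import hmac

TAG_LEN = 32  # SHA-256 digest size in bytes


def fragile_watermark(encoded: bytes, robust_wm: bytes, key: bytes) -> bytes:
    """Stand-in fragile watermark: a keyed hash over W and the encoded data."""
    message = bytes([len(robust_wm)]) + robust_wm + encoded
    return hmac.new(key, message, hashlib.sha256).digest()


def embed(encoded: bytes, robust_wm: bytes, key: bytes) -> bytes:
    """Assumed layout: 1-byte len(W) | W | fragile tag | encoded data."""
    if len(robust_wm) > 255:
        raise ValueError("robust watermark must fit a one-byte length field")
    tag = fragile_watermark(encoded, robust_wm, key)
    return bytes([len(robust_wm)]) + robust_wm + tag + encoded


def extract(marked: bytes, key: bytes):
    """Split the layout and report whether the fragile tag still verifies."""
    wm_len = marked[0]
    robust_wm = marked[1:1 + wm_len]
    tag = marked[1 + wm_len:1 + wm_len + TAG_LEN]
    encoded = marked[1 + wm_len + TAG_LEN:]
    expected = fragile_watermark(encoded, robust_wm, key)
    return robust_wm, encoded, hmac.compare_digest(tag, expected)


key = b"shared-secret"                      # hypothetical pre-shared key
marked = embed(b"encoded patient data", b"HOSP-01", key)
print(extract(marked, key)[2])              # untouched data verifies

tampered = marked[:-1] + bytes([marked[-1] ^ 1])
print(extract(tampered, key)[2])            # a single flipped bit is detected
```

This layout adds a fixed overhead of 33 bytes plus the length of W, which matters when the encoded payload is only a few hundred bytes.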
Figure 2. The operation procedure flow for compressing patient data securely.
Figure 3. Communication diagram for data compression and watermarking.
Compressed data using different techniques (all sizes in bytes).
| Data | Original size | RLE | LZW | Deflate | LZO | LZ4 | BZ2 | LZSS | LZMA | Proposed Tech. |
|---|---|---|---|---|---|---|---|---|---|---|
| D1 | 979 | 1905 | 3752 | 322 | 446 | 459 | 389 | 416 | 393 | 278 |
| D2 | 1477 | 2897 | 4840 | 367 | 528 | 544 | 458 | 505 | 433 | 328 |
| D3 | 2738 | 5403 | 7976 | 523 | 780 | 805 | 613 | 764 | 589 | 387 |
| D4 | 3439 | 6779 | 9016 | 613 | 915 | 930 | 713 | 904 | 665 | 472 |
| D5 | 5418 | 11697 | 13672 | 789 | 1238 | 1264 | 873 | 1261 | 841 | 613 |
| D6 | 7221 | 14943 | 14672 | 1002 | 1583 | 1619 | 1041 | 1606 | 1037 | 750 |
| D7 | 8657 | 15981 | 16552 | 1155 | 1854 | 1874 | 1162 | 1882 | 1177 | 825 |
| D8 | 10882 | 20881 | 18664 | 1376 | 2234 | 2246 | 1324 | 2299 | 1361 | 959 |
| D9 | 12397 | 24565 | 21040 | 1516 | 2493 | 2543 | 1454 | 2567 | 1501 | 997 |
| D10 | 14815 | 28997 | 23720 | 1740 | 2853 | 2924 | 1653 | 3071 | 1701 | 1145 |
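
To read the table above as percentages, the sketch below computes the space saving, 100 × (1 − compressed / original), for a few columns of rows D1 and D10. Treating this quantity as the "compression gain" plotted in the figures is an assumption, and the function name `space_saving` is illustrative.

```python
# Values copied from rows D1 and D10 of the table above, in bytes:
# (original size, {technique: compressed size}).
ROWS = {
    "D1": (979, {"RLE": 1905, "Deflate": 322, "BZ2": 389,
                 "LZMA": 393, "Proposed": 278}),
    "D10": (14815, {"RLE": 28997, "Deflate": 1740, "BZ2": 1653,
                    "LZMA": 1701, "Proposed": 1145}),
}


def space_saving(original: int, compressed: int) -> float:
    """Percentage of the original size removed; negative means expansion."""
    return 100.0 * (1.0 - compressed / original)


for name, (original, sizes) in ROWS.items():
    for technique, compressed in sizes.items():
        saving = space_saving(original, compressed)
        print(f"{name:4s} {technique:9s} {saving:7.1f}%")
```

Under this definition the proposed technique saves about 71.6% on D1 and about 92.3% on D10, while RLE yields a negative saving on both because its output is larger than the input.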
Figure 4. Comparison of compression gained by different techniques for data (D1-D5).
Figure 5. Comparison of compression gained by different techniques for data (D6-D10).
Figure 6. Time for compressing healthcare data (D1-D5).
Figure 7. Time for compressing healthcare data (D6-D10).
Figure 8. Comparison of data size before and after watermarking.
Figure 9. Comparison of compression time before and after watermarking.
Figure 10. Accuracy achieved after altering the watermarked data.