Saman Iftikhar, Sharifullah Khan, Zahid Anwar, Muhammad Kamran.
Abstract
Genetic data in digital format is used in biological phenomena such as DNA translation, mRNA transcription, and protein synthesis. The accuracy of these phenomena depends on the genetic codes and on all subsequent processes. To computerize the biological procedures, domain experts are given authorized access to the genetic codes; as a consequence, ownership protection of such data becomes necessary. For this purpose, watermarks serve as proof of ownership of the data. While protecting the data, however, the embedded hidden messages (watermarks) alter the genetic data, so the accurate execution of the relevant processes, and the overall result, becomes questionable. Most DNA-based watermarking techniques modify the genetic data and are therefore vulnerable to information loss. Distortion-free techniques ensure that no modifications occur during watermarking; however, they are fragile against malicious attacks and therefore cannot be used for ownership protection (particularly in the presence of a threat model). There is therefore a need for a technique that is robust and also prevents unwanted modifications. This paper proposes a watermarking technique with these characteristics. The proposed technique ensures that (i) ownership rights are protected by means of a robust watermark, and (ii) the integrity of the genetic data is preserved. The proposed technique, GenInfoGuard, achieves its robustness by encoding the watermark in permuted values, and exhibits high decoding accuracy against various malicious attacks.
Year: 2015 PMID: 25689741 PMCID: PMC4331525 DOI: 10.1371/journal.pone.0117717
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Malicious Attacks.
| | |
|---|---|
| Insertion attacks | Insert |
| Deletion attacks | Delete |
| Alteration attacks | Alter |
| Sorting attacks | Sort the tuples in ascending or descending order |
| Additive attacks | Embed own computed watermark in the database and claim ownership |
| Counterfeiting attacks | Make a forged copy of Alice’s data for unauthorized use |
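To make the first four attack types concrete, the following is a minimal Python sketch of how each one might act on a table of tuples. It is an illustration only, not the paper's threat model: the function names, the parameters (`k`, `column`, `new_value`), and the random selection of target tuples are all assumptions introduced here, and the sample rows are taken from the Protein Dataset table.

```python
import random


def insertion_attack(rows, fake_rows):
    """Return a copy of the table with attacker-supplied tuples appended."""
    return list(rows) + list(fake_rows)


def deletion_attack(rows, k, rng):
    """Return a copy of the table with k randomly chosen tuples removed."""
    doomed = set(rng.sample(range(len(rows)), k))
    return [row for i, row in enumerate(rows) if i not in doomed]


def alteration_attack(rows, k, column, new_value, rng):
    """Return a copy in which one column of k random tuples is overwritten."""
    targets = set(rng.sample(range(len(rows)), k))
    attacked = []
    for i, row in enumerate(rows):
        if i in targets:
            row = row[:column] + (new_value,) + row[column + 1:]
        attacked.append(row)
    return attacked


def sorting_attack(rows, descending=False):
    """Return the tuples reordered in ascending or descending order."""
    return sorted(rows, reverse=descending)


# Sample tuples copied from the Protein Dataset table.
protein_rows = [
    ("GLY", "E", 0.20), ("TYR", "H", 0.49), ("VAL", "H", 0.17),
    ("THR", "E", 0.12), ("PRO", "H", 0.52), ("MET", "H", 0.32),
    ("ASP", "E", 0.26),
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
inserted = insertion_attack(protein_rows, [("ALA", "H", 0.99)])
deleted = deletion_attack(protein_rows, 2, rng)
altered = alteration_attack(protein_rows, 2, 2, 9.99, rng)
reordered = sorting_attack(protein_rows)
```

Each function returns a new list and leaves the original table untouched, which makes it easy to compare the attacked copy against the original when measuring how much of a watermark survives.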
Fig 1. Top Level Architecture of GenInfoGuard.
Notations Used in the Paper.
| | |
|---|---|
| | Original database |
| | The watermarked database |
| | All tuples in a table (or a dataset) |
| | A tuple in the database table |
| | Owner-defined threshold for |
| | All features in the dataset |
| | A seed vector for the pseudo-random sequence generator |
| | A feature selected for watermarking |
| | Determines the character at a specified index |
| | Index of a specified character in the permuted matrix |
| | A key based on the number of characters used in the digram matrix |
| | Secret number of rounds for permutations, defined by the owner |
| Σ | All accepted characters |
| ΔΣ | Matrix of all possible digrams |
| | Length of the input data string |
| | Length of the matrix of all possible digrams |
| | Index of the character in the digram matrix |
| | Permutation vector for introducing randomness N times |
| | Watermark length |
| | Watermark bits |
| | Owner-defined integer change |
| | Owner-defined percentage change |
| Δ | Change in the original value |
| Δ | Change in the watermarked value |
| Δ | Difference between the change in the original value and the change in the watermarked value |
| | Substituted digram for encoding |
| | Decoded data from the substituted digram |
| ℘ | Probability of success of prediction |
| ℘ | Probability of success of prediction before applying watermarking |
| ℘ | Probability of success of prediction after applying watermarking |
| Ç | Correlation coefficient of |
| Ç | Correlation coefficient of |
| Ç | Correlation coefficient of |
| Ç | Correlation coefficient of |
Protein Dataset.
| | | |
|---|---|---|
| GLY | E | 0.20 |
| TYR | H | 0.49 |
| VAL | H | 0.17 |
| THR | E | 0.12 |
| PRO | H | 0.52 |
| MET | H | 0.32 |
| ASP | E | 0.26 |
| … | … | … |
DNA Dataset: Splice-Junction Gene Sequences.
| | | |
|---|---|---|
| EI | ATRINS-DONOR-521 | CCAGCTGCATCACAGGAGGCCAGCGAGCAGGTCTGTTCCAAGGGCCTTCGAGCCAGTCTG |
| EI | ATRINS-DONOR-905 | AGACCCGCCGGGAGGCGGAGGACCTGCAGGGTGAGCCCCACCGCCCCTCCGTGCCCCCGC |
| … | … | … |
| IE | ATRINS-ACCEPTOR-701 | TTCAGCGGCCTCAGCCTGCCTGTCTCCCAGGTCTCTGTCCTTCCACCATGGCCCTGTGGA |
| IE | ATRINS-ACCEPTOR-1678 | GGACCTGCTCTGCGTGGCTCGCCCTGGCAGTGGGGCAGGTGGAGCTGGGTGGGGGCTCTA |
| … | … | … |
Watermark Encoding.
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| GLY | | F | HO | F | F | F | F | HO | HO |
| | Δ | 17.5 | 17.1 | 17.5 | 17.5 | 17.5 | 17.5 | 17.1 | 17.1 |
| TYR | | QI | UJ | QI | QI | QI | QI | UJ | UJ |
| | Δ | 53.9 | 53.5 | 53.9 | 53.9 | 53.9 | 53.9 | 53.5 | 53.5 |
| VAL | | DD | YG | DD | DD | DD | DD | YG | YG |
| | Δ | 56.9 | 56.5 | 56.9 | 56.9 | 56.9 | 56.9 | 56.5 | 56.5 |
Watermark Decoding.
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| GLY | Δ | 17.1 | 17.1 | 17.5 | 17.5 | 17.5 | 17.5 | 17.1 | 17.5 |
| | Δ | 17.5 | 17.5 | 17.5 | 17.5 | 17.5 | 17.5 | 17.5 | 17.5 |
| | Δ | 0.4 | 0.4 | 0 | 0 | 0 | 0 | 0.4 | 0 |
| | | HO | HO | F | F | F | F | HO | F |
| TYR | Δ | 53.5 | 53.5 | 53.9 | 53.9 | 53.9 | 53.9 | 53.5 | 53.9 |
| | Δ | 53.9 | 53.9 | 53.9 | 53.9 | 53.9 | 53.9 | 53.9 | 53.9 |
| | Δ | 0.4 | 0.4 | 0 | 0 | 0 | 0 | 0.4 | 0 |
| | | UJ | UJ | QI | QI | QI | QI | UJ | QI |
| VAL | Δ | 56.5 | 56.5 | 56.9 | 56.9 | 56.9 | 56.9 | 56.5 | 56.9 |
| | Δ | 56.9 | 56.9 | 56.9 | 56.9 | 56.9 | 56.9 | 56.9 | 56.9 |
| | Δ | 0.4 | 0.4 | 0 | 0 | 0 | 0 | 0.4 | 0 |
| | | YG | YG | DD | DD | DD | DD | YG | DD |
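The arithmetic in the decoding table can be reproduced with a short Python sketch. The subscripts on the Δ row labels were lost, so the reading used here is an assumption: the first Δ row of each block is taken as the value observed in the watermarked data, the second as the reference value, and the third as their absolute difference, with the final row holding the digram chosen according to whether that difference is zero. The function name `decode_row` and its parameters are hypothetical.

```python
def decode_row(observed, reference, changed_symbol, unchanged_symbol, ndigits=1):
    """Return (differences, symbols) for one block of the decoding table.

    Assumption: a non-zero difference selects `changed_symbol`, and a zero
    difference selects `unchanged_symbol`, mirroring the table's last row.
    """
    # Round to the table's precision so float noise (0.3999...) reads as 0.4.
    diffs = [round(abs(o - r), ndigits) for o, r in zip(observed, reference)]
    symbols = [changed_symbol if d != 0 else unchanged_symbol for d in diffs]
    return diffs, symbols


# GLY block, values copied from the Watermark Decoding table.
gly_observed = [17.1, 17.1, 17.5, 17.5, 17.5, 17.5, 17.1, 17.5]
gly_reference = [17.5] * 8

gly_diffs, gly_symbols = decode_row(gly_observed, gly_reference, "HO", "F")
print(gly_diffs)    # [0.4, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0]
print(gly_symbols)  # ['HO', 'HO', 'F', 'F', 'F', 'F', 'HO', 'F']
```

Calling the same function with the TYR values and the pair `"UJ"`, `"QI"`, or the VAL values and the pair `"YG"`, `"DD"`, yields the other two blocks of the table under the same assumed reading.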
Performance Measures: Success Probability ℘, and Dependency of Correlation Coefficient Ç on Prediction of the Secondary Structure before and after Applying GenInfoGuard.
| | | | | | | |
|---|---|---|---|---|---|---|
| 1 | 0.54 | 0.54 | 0.11 | 0.11 | 0.14 | 0.14 |
| 3 | 0.58 | 0.58 | 0.22 | 0.22 | 0.20 | 0.20 |
| 5 | 0.61 | 0.61 | 0.28 | 0.28 | 0.26 | 0.26 |
| 7 | 0.62 | 0.62 | 0.32 | 0.32 | 0.28 | 0.28 |
| 9 | 0.62 | 0.62 | 0.33 | 0.33 | 0.28 | 0.28 |
| 11 | 0.62 | 0.62 | 0.36 | 0.36 | 0.29 | 0.29 |
| 13 | 0.63 | 0.63 | 0.35 | 0.35 | 0.29 | 0.29 |
| 15 | 0.62 | 0.62 | 0.35 | 0.35 | 0.31 | 0.31 |
| 17 | 0.62 | 0.62 | 0.33 | 0.33 | 0.27 | 0.27 |
Fig 2. Double Stranded DNA Sequence Alignment before and after Watermarking.
Nucleotide Sequence Statistics.
| | | | | |
|---|---|---|---|---|
| Adenine (A) | 308 | 0.262 | 0 | 0 |
| Cytosine (C) | 310 | 0.264 | 0 | 0 |
| Guanine (G) | 293 | 0.249 | 0 | 0 |
| Thymine (T) | 264 | 0.225 | 0 | 0 |
| C + G | 603 | 0.513 | 0 | 0 |
| A + T | 572 | 0.487 | 0 | 0 |
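The counts and frequencies in this table are consistent with each other: the four base counts sum to 1175, and each frequency is the count divided by that total (for example, 308 / 1175 ≈ 0.262). The Python sketch below shows one way such statistics could be computed and compared between an original and a watermarked sequence. The column headers were lost, so reading the last two columns as the change in count and in frequency after watermarking is an assumption, as are the function names. The test sequence is synthetic, built only to match the table's counts.

```python
from collections import Counter


def base_statistics(sequence):
    """Return {label: (count, frequency)} for A, C, G, T, C+G and A+T."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    stats = {base: (counts[base], counts[base] / total) for base in "ACGT"}
    cg = counts["C"] + counts["G"]
    at = counts["A"] + counts["T"]
    stats["C+G"] = (cg, cg / total)
    stats["A+T"] = (at, at / total)
    return stats


def statistics_change(original, watermarked):
    """Return {label: (count change, frequency change)} between two sequences."""
    before = base_statistics(original)
    after = base_statistics(watermarked)
    return {
        label: (after[label][0] - before[label][0],
                round(after[label][1] - before[label][1], 3))
        for label in before
    }


# Synthetic sequence with the same base counts as the table (not real data).
example = "A" * 308 + "C" * 310 + "G" * 293 + "T" * 264

stats = base_statistics(example)
for label, (count, freq) in stats.items():
    print(f"{label}: {count} {freq:.3f}")

# A distortion-free scheme leaves the sequence unchanged, so every change is zero.
unchanged = statistics_change(example, example)
```

Comparing a sequence with itself gives zero change for every row, which is what the last two columns of the table show under the assumed reading.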
Fig 3. Data Recovery after Insertion Attack.
Fig 4. Comparison of Watermark Decoding Accuracy of GenInfoGuard with TFBW, RBW, and BIW after Insertion Attack.
Fig 5. Data Recovery after Deletion Attack.
Fig 6. Comparison of Watermark Decoding Accuracy of GenInfoGuard with TFBW, RBW, and BIW for Deletion Attack.
Fig 7. Data Recovery after Alteration Attacks.
Fig 8. Comparison of Watermark Decoding Accuracy of GenInfoGuard with TFBW, RBW, and BIW for Alteration Attack.