
A fragile zero watermarking scheme to detect and characterize malicious modifications in database relations.

Aihab Khan, Syed Afaq Husain.

Abstract

We put forward a fragile zero watermarking scheme to detect and characterize malicious modifications made to a database relation. Most existing watermarking schemes for relational databases introduce intentional errors or permanent distortions as marks into the original database content. These distortions inevitably degrade data quality and usability, because the integrity of the relational database is violated. Moreover, the existing fragile schemes can detect malicious data modifications but do not characterize the tampering attack, that is, the nature of the tampering. The proposed fragile scheme is based on the zero watermarking approach. In zero watermarking, the watermark is generated (constructed) from the contents of the original data rather than by introducing permanent distortions as marks into the data. As a result, the proposed scheme is distortion-free; thus, it also resolves the inherent conflict between security and imperceptibility. The proposed scheme also characterizes the malicious data modifications to quantify the nature of tampering attacks. Experimental results show that even minor malicious modifications made to a database relation can be detected and characterized successfully.

Year:  2013        PMID: 23818831      PMCID: PMC3684121          DOI: 10.1155/2013/796726

Source DB:  PubMed          Journal:  ScientificWorldJournal        ISSN: 1537-744X


1. Introduction

Digital watermarking is a class of information hiding techniques that provides measures for copyright protection, broadcast monitoring, covert communication, copy control, tamper detection, and integrity proof of digital assets. Watermarking techniques were primarily proposed for multimedia content [1-4]; however, in the last decade, the research community has extended these techniques to relational databases for copyright protection, tamper detection, and integrity proof. Most of the existing watermarking schemes for relational databases [5-20] introduce intentional errors or distortions as marks in the underlying data, within some error tolerance, so that the marks do not have a significant impact on the usefulness of the data. However, this degrades data quality, because the integrity of the relational database is violated. Many real-world datasets have a strong usability constraint that disallows any permanent distortions or intentional errors. For example, safety-critical datasets are designed to minimize errors rather than to introduce intentional ones. Similarly, a business application may require that local properties such as item cost and ordered quantity are preserved, as well as global properties such as the natural join between item and sales or between employees and department. Moreover, business datasets require that semantic constraints are not violated, for example, by dissimilar attribute values for two similar transactions [21]. Query processing is sensitive to selection criteria and has well-defined semantics; therefore, watermarking schemes that introduce distortion into the original database content are not appropriate for certain applications.
Based on the intent of marking, the watermarking schemes presented in the literature can be categorized into robust and fragile schemes.
The robust schemes [5-16] are aimed at copyright protection, whereas the fragile schemes [17-25] are used for tamper detection and integrity proof of database relations. Most of the robust schemes for copyright protection [5-16] introduce distortions into the original database content, which affects data integrity and usability. These robust schemes embed watermarks in numeric [5-10] or categorical attributes [11, 12] of relational databases. Some techniques embed a meaningless bit pattern [5, 6], whereas other techniques use meaningful bit patterns such as an image [13-15] or the owner's speech [16] as the watermark. In data sales environments, some of these robust schemes are extended to the fingerprinting domain for unique identification of each buyer and for traitor detection [21, 26–28].
Compared with the robust schemes, fragile watermarking schemes have not been adequately addressed, and relatively little work is available on integrity proof of relational databases [20]. In this paper, we focus on fragile watermarking schemes for tamper detection and integrity proof of database relations. The initial work on fragile watermarking addressed images [29-31] and was later extended to audio [32, 33] and video [3, 34]. Recently, the importance of other data domains has been recognized, and fragile schemes for text [35, 36] and relational databases [17–20, 22–25] have been proposed.
Like robust schemes, most of the fragile schemes for relational databases [17-20] introduce distortion into the original database content, which degrades data quality and affects data usability. These schemes use the content characteristics of the database relation itself to create a secure hash (used as a watermark), which is stored in the Least Significant Bits (LSBs) of the original database content, thus introducing distortion. A fragile watermarking scheme presented by Guo et al. [17] detects malicious modifications made to a database relation. In their scheme, the watermark generation is based on the content characteristics of the database relation itself. The generated watermarks are embedded in at most two LSBs of all attributes in the database relation, which introduces considerable distortion into the original database content. The fragile scheme presented by Khataeimaragheh and Rashidi [18] is also a distortion-based scheme for integrity proof of database relations. As in [17], the watermarks are embedded in at most two LSBs of all attributes in the relation, forming a two-bit watermark grid. The fragile scheme presented by Iqbal et al. [19] logically partitions the database relation into three groups and generates self-constructing fragile watermark information from each group. The generated watermarks are embedded in the LSBs of numerical attributes in each group, which introduces distortion into the original database content. Prasannakumari [20] presented a fragile scheme for tamper detection in database relations. This technique also introduces distortion, as it inserts a fake attribute into the database relation to act as a watermark. The data values of the newly inserted attribute are determined by applying an aggregate function to the original database content.
Besides distortion-based techniques, some researchers have also presented distortion-free fragile watermarking schemes [22-25] for integrity proof of database relations. The main feature of these schemes is that the watermark embedding is in fact a reordering of tuples or attributes based on the content characteristics of the database relation. A fragile scheme proposed by Li et al. [22] detects and localizes malicious modifications made to database relations. Their scheme partitions the database relation into disjoint groups, and the watermark is embedded and verified in each group independently.
In the scheme of Li et al. [22], the watermark is embedded by tuple reordering: the order of each tuple pair in a group is changed or left unchanged depending on the tuple hash values and the corresponding group hash value. Although their technique does not introduce any distortion into the database relation, it works only for categorical data. Kamel [23] presented a fragile scheme to protect the integrity of database relations. This scheme divides the database relation into groups, and each group is marked independently. As in [22], the watermark is embedded by reordering the tuples in each group so that the order corresponds to the value of some secret watermark. The fragile scheme proposed by Bhattacharya and Cortesi [24] detects malicious modifications in database relations having categorical attributes. Their scheme divides the database relation into groups on the basis of categorical attribute values. As in [22, 23], the tuple hash value is used to obtain a watermark as a permutation of tuples. A fragile zero watermarking scheme is presented by Hamadou et al. [25] for authentication of database relations. Their technique is distortion-free and is based on an attribute reordering method. Initially, the attributes of the database relation are virtually sorted on the hash values of the attribute names to define a secret initial order of attributes. For each attribute in the database relation, the Most Significant Bits (MSBs) are extracted and used for watermark generation. The generated watermark is then registered with the Certification Authority (CA) for certification purposes. Because their technique is based on virtual sorting of attributes by name, any change of an attribute name by an attacker would cause the tamper detection process to fail.
In the preceding discussion, we have identified two important issues in existing fragile watermarking schemes.
First, the distortion-based fragile schemes [17-20] inevitably degrade data integrity and thus affect data usability; therefore, these schemes are not applicable to non-error-tolerant data such as safety-critical datasets. Second, although some fragile schemes such as [22-25] are distortion-free, their watermarking approach is based on reordering of tuples or attributes, so they are vulnerable to sorting attacks. Also, if a modification is so small that it does not affect the order of tuples, tamper detection fails.
To address these issues, we propose a fragile scheme based on the zero watermarking approach, which does not modify any part or property of the database relation itself; therefore, the proposed scheme assures imperceptibility and overcomes the data integrity and data usability weaknesses of existing fragile watermarking schemes. The proposed scheme is also independent of tuple ordering as well as attribute ordering and naming, so it is not vulnerable to sorting attacks. The watermark generation in the proposed scheme is based on algorithmically evaluating local characteristics of the database relation, namely the frequency distributions of the digits, lengths, and ranges of the data values. This enables us to characterize malicious data modifications on parameters such as the fraction of digits, lengths, and ranges of data values attacked, the type of attack (insertion, deletion, or update), and the effect of the attack (low to high, high to low, or no change) on data values. To the best of our knowledge, there is no other distortion-free fragile watermarking scheme that can characterize tampering attacks, that is, the nature of the tampering. Experimental results show that the proposed scheme can detect and characterize malicious data modifications successfully.

2. Materials and Methods

In this section, we present our proposed fragile zero watermarking scheme to detect and characterize malicious modifications made to a database relation. The proposed scheme exhibits the following important properties of a fragile watermarking system, as discussed in [17].
Fragility. The proposed scheme is designed to be fragile; that is, any malicious data modification destroys the watermark, so that the regenerated watermark no longer matches the registered one.
Imperceptibility. As the proposed scheme is based on the zero watermarking approach, it does not introduce any distortion into the underlying data; therefore, the watermark is invisible, or imperceptible.
Key-Based System. The watermark generation and verification in the proposed scheme is key-based. A secret key is also required to detect and characterize malicious data modifications.
Blindness. In the proposed scheme, the original database relation is not required to detect and characterize malicious data modifications.
Tuple and Attribute Ordering. The existing fragile schemes are based on tuple ordering [22-24] or on attribute ordering and naming [25]. The proposed scheme is independent of tuple and attribute ordering, so it is not vulnerable to sorting attacks.
Characterization. The proposed scheme not only detects but also characterizes the malicious data modifications in a database relation to quantify the nature of tampering attacks.

2.1. Watermark Generation

Let R be a database relation with primary key PK and ν attributes, denoted by R(PK, A_1, A_2, …, A_ν). The watermark generation in the proposed scheme is based on the content characteristics of numeric data values, so we assume that some attributes of the database relation are numeric. Figure 1 shows the watermark generation process, which comprises subwatermark generation for the digits, lengths, and ranges of data values. The generated watermark is registered with the Certification Authority (CA) for certification purposes. Table 1 lists the notation used in our algorithms and discussion.
Figure 1

Proposed model for watermark generation and registration.

Table 1

Notations.

| Symbol | Description |
| --- | --- |
| R | Database relation |
| PK | Primary key attribute |
| r_i | The ith tuple |
| A_j | The jth attribute |
| η | Number of tuples in a database relation |
| ν | Number of attributes in a database relation |
| ω_d | Digit sub-watermark |
| ω_l | Length sub-watermark |
| ω_r | Range sub-watermark |
| ω_R | Watermark for database relation R |
| ω_C | Watermark certificate |
| SK | Secret key |
| d_i | The ith digit |
| l_j | The jth length |
| r_k | The kth range |
| fd_i | Frequency for digit i of data values |
| fl_j | Frequency for length j of data values |
| fr_k | Frequency for range k of data values |
| rfd_i | Relative frequency for digit i of data values |
| rfl_j | Relative frequency for length j of data values |
| rfr_k | Relative frequency for range k of data values |
| Δfd_i | Change in frequency of digit i |
| Δfl_j | Change in frequency for length j |
| Δfr_k | Change in frequency for range k |
| Δℱd_i | Fractional change in digit frequency for digit i |
| Δℱl_j | Fractional change in length frequency for length j |
| Δℱr_k | Fractional change in range frequency for range k |
| CA | Certification authority |
| WAR | Watermark accuracy rate |
| WDR | Watermark distortion rate |
The algorithm for watermark generation is presented in Algorithm 1. At lines 1–3, the digits, lengths, and ranges of the data values in a database relation are algorithmically evaluated to generate the subwatermarks ω_d, ω_l, and ω_r, as presented in Algorithms 2–4. These subwatermarks are then used to generate the database relation watermark ω_R, as shown at line 4. At line 5, the relation watermark ω_R is encrypted with a secret key SK known only to the database owner. We assume that the secret key is selected from a large key space, so that it is computationally infeasible for an attacker to guess the key. At lines 6-7, the encrypted relation watermark Eω_R is concatenated with the owner ID and a date and time stamp to generate the watermark certificate ω_C, which is registered with the CA before the database is published.
Algorithm 1

Watermark generation.
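The flow of Algorithm 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not say how the subwatermarks are combined (plain concatenation is assumed here), does not name a cipher (the XOR keystream derived from SHA-256 below is a stand-in, not a vetted cipher and not the authors' choice), and does not give a certificate layout (the `|`-separated fields and all function names are hypothetical).

```python
import hashlib
from datetime import datetime, timezone


def _keystream(secret_key: bytes, n: int) -> bytes:
    """Derive n key-dependent bytes (placeholder construction, assumption)."""
    out = bytearray()
    for counter in range(n // 32 + 1):
        out += hashlib.sha256(secret_key + counter.to_bytes(8, "big")).digest()
    return bytes(out[:n])


def encrypt(watermark: str, secret_key: bytes) -> str:
    """Line 5: encrypt the relation watermark with SK; returns hex text."""
    data = watermark.encode("utf-8")
    ks = _keystream(secret_key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks)).hex()


def decrypt(cipher_hex: str, secret_key: bytes) -> str:
    """Inverse of encrypt, used when the watermark is extracted again."""
    data = bytes.fromhex(cipher_hex)
    ks = _keystream(secret_key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks)).decode("utf-8")


def make_certificate(w_d, w_l, w_r, secret_key, owner_id, timestamp=None):
    """Lines 4-7: build the relation watermark and the certificate."""
    w_R = w_d + w_l + w_r                  # assumed: simple concatenation
    e_w = encrypt(w_R, secret_key)
    ts = timestamp or datetime.now(timezone.utc).isoformat()
    return "|".join([e_w, owner_id, ts])   # hypothetical field layout
```

A verifier holding SK would split the certificate on `|` and call `decrypt` on the first field to recover the relation watermark.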

Algorithm 2

Digit sub-watermark generation.

Algorithm 4

Range sub-watermark generation.

Algorithm 2 generates a digit subwatermark based on the digit frequencies of all data values present in a database relation. At lines 1–3, the length of each data value is determined, which is then used to extract the individual digits, as shown at lines 4-5. Lines 6-7 compute the frequency of each digit, fd_i, and the total number of digits present in the database relation. At line 11, the relative frequency of each digit, rfd_i, is determined, which is then used to generate the digit subwatermark ω_d, as shown at line 13. At lines 15-16, ω_d is concatenated with the total digit count and returned to the watermark generation algorithm. Note that the digit subwatermark is thus composed of the relative frequency rfd_i of each digit and the total count of all digits; this information is later used for the characterization of attacks, as discussed in Section 3.
The subwatermark generation for the length of data values in a database relation is presented in Algorithm 3. At lines 1–3, the length of each data value is determined. Lines 4-5 determine the frequency of each length, fl_j, and the total count of data value lengths present in the database relation. At line 9, the relative frequency of each length, rfl_j, is computed, which is then used to generate the length subwatermark ω_l, as shown at line 10. At lines 12-13, ω_l is concatenated with the total length count and returned.
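The counting steps described for Algorithms 2 and 3 can be illustrated with a minimal sketch. The pseudocode itself is not reproduced in the text, so the serialization below is an assumption: each relative frequency is written as a zero-padded five-digit number in hundredths of a percent, followed by the total count. The function names are hypothetical, and values are assumed to be integers.

```python
from collections import Counter


def digit_and_length_frequencies(values):
    """Count how often each digit and each value length occurs."""
    digit_freq = Counter()
    length_freq = Counter()
    for v in values:
        s = str(abs(int(v)))               # decimal digits, sign ignored
        length_freq[len(s)] += 1
        digit_freq.update(int(ch) for ch in s)
    return digit_freq, length_freq


def sub_watermark(freq, keys):
    """Serialize relative frequencies plus the total count (assumed format)."""
    total = sum(freq.values())
    parts = []
    for k in keys:
        rel = round(10000 * freq[k] / total) if total else 0
        parts.append(f"{rel:05d}")         # hundredths of a percent
    return "".join(parts) + str(total)


digits, lengths = digit_and_length_frequencies([12, 305, 7, 4410])
w_d = sub_watermark(digits, range(10))     # digits 0..9
w_l = sub_watermark(lengths, range(1, 5))  # lengths 1..4
```

For the four sample values, ten digits are counted in total and each of the lengths 1 to 4 occurs once, so every length has a relative frequency of 25%.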
Algorithm 3

Length sub-watermark generation.

Algorithm 4 presents the subwatermark generation for the range of data values in a database relation. At line 1, the data ranges into which the data values of a database relation may fall are defined. Note that the defined data ranges may be adjusted to suit the nature of the data values in the database relation, and also to obtain a more precise characterization of malicious data modifications, as discussed in Section 3. Lines 1–3 determine the attribute value within each tuple. Lines 5–13 determine the frequency of each data range, fr_k, and the total number of data ranges present in the database relation. At lines 16-17, the relative frequency of each range, rfr_k, is computed, which is then used to generate the range subwatermark ω_r. Lines 19-20 show that ω_r is concatenated with the total range count and returned.
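As a rough illustration of the range counting in Algorithm 4, the sketch below bins values into owner-defined ranges. The boundaries used here (100, 1000, 5000) are purely hypothetical, since the text leaves the choice of ranges to the database owner, and the function names are illustrative.

```python
from bisect import bisect_right
from collections import Counter


def range_frequencies(values, bounds):
    """Count values per range.

    Range 0 holds values below bounds[0]; range k holds values in
    [bounds[k-1], bounds[k]); the last range is open-ended.
    """
    freq = Counter({k: 0 for k in range(len(bounds) + 1)})
    for v in values:
        freq[bisect_right(bounds, v)] += 1
    return freq


def relative_frequencies(freq):
    """Return per-range percentages and the total count."""
    total = sum(freq.values())
    rel = {k: (100.0 * n / total if total else 0.0) for k, n in freq.items()}
    return rel, total


fr = range_frequencies([5, 99, 100, 250, 999, 1000, 4200, 7000],
                       [100, 1000, 5000])
rfr, total_ranges = relative_frequencies(fr)
```

Narrower ranges give a finer-grained picture of which values were attacked, at the cost of a longer subwatermark.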

2.2. Watermark Verification

Figure 2 shows the model for detection of malicious modifications in a suspicious database relation R′. To detect malicious data modifications, the relation watermark ω_R′ is regenerated for the suspicious database relation R′ and compared with the relation watermark ω_R registered at the CA; if the two watermarks differ, the suspicious database relation R′ is considered a tampered relation.
Figure 2

Proposed model for detection of malicious tampering.

The algorithm for watermark detection is presented in Algorithm 5. At line 1, the watermark ω_R′ is generated for the suspicious database relation R′ by using Algorithm 1. The watermark certificate ω_C, which is already registered at the CA, is used to extract the database relation watermark ω_R, as shown at lines 2–4. At lines 5–10, each digit of ω_R is compared with the corresponding digit of ω_R′, and match_count is incremented on each successful match. At line 9, total_count is updated to record the number of digits tested. At lines 11-12, the WAR (Watermark Accuracy Rate) and WDR (Watermark Distortion Rate) are computed. If distortion exists in the suspicious database relation R′, then R′ is rejected as a tampered relation with distortion rate WDR, as shown at lines 13–15.
Algorithm 5

Watermark verification.
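The comparison step of Algorithm 5 can be sketched as follows. The formulas are not spelled out in the text; this sketch assumes WAR = 100 × match_count / total_count and WDR = 100 − WAR, which is consistent with the WAR and WDR columns of Tables 2–5 summing to 100. Treating positions missing from the shorter watermark as mismatches is a further assumption, and `verify` is a hypothetical name.

```python
def verify(registered: str, regenerated: str):
    """Compare two watermark digit strings position by position."""
    total_count = max(len(registered), len(regenerated))
    match_count = sum(a == b for a, b in zip(registered, regenerated))
    war = 100.0 * match_count / total_count if total_count else 100.0
    wdr = 100.0 - war
    tampered = wdr > 0          # any distortion rejects the relation
    return war, wdr, tampered


# Identical watermarks: the relation is accepted as authentic.
print(verify("1234567890", "1234567890"))
# Two of ten digits differ: the relation is rejected.
print(verify("1234567890", "1239567899"))
```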

The algorithm for characterization of malicious data modifications is presented in Algorithm 6. At line 2, the relative frequency of each digit, rfd_i, is extracted from the digit subwatermark ω_d; this is possible because ω_d ⊆ ω_R and ω_R is already registered at the CA. The frequency distribution of each digit, fd_i, in relation R is determined at line 3. At line 4, the frequency distribution of each digit, fd_i′, in the suspicious database relation R′ is determined. The change in the frequency of each digit, Δfd_i, is computed at line 5, and the fractional change for each digit, Δℱd_i, is determined at line 6. The computed value of Δℱd_i is then used to characterize the malicious modifications made to the database relation R. For example, if Δℱd_i is zero for every digit, then the suspicious relation R′ shows no tampering of digit frequencies. A positive Δℱd_i indicates that a fraction Δℱd_i of digit d_i has been maliciously inserted by the attacker in an attempt to transform low data values to high ones in database relation R. Similarly, a negative Δℱd_i indicates that a fraction |Δℱd_i| of digit d_i has been maliciously deleted by the attacker in an attempt to transform high data values to low ones. At lines 8–14 and 15–21, the same method is used to determine Δℱl_j and Δℱr_k to characterize attacks on the length and range of data values in database relation R. The characterization of malicious data modifications is further elaborated in Section 3.2 with experimental results.
Algorithm 6

Characterization of malicious data modifications.
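The per-digit computation described for Algorithm 6 can be sketched as below. The sketch assumes that Δfd_i = fd_i′ − fd_i and Δℱd_i = 100 × Δfd_i / fd_i, which reproduces the values reported in Table 6 (for digit 0, 350496 / 1435163 ≈ 24.42%). The function name and the returned tuple layout are hypothetical. The same function applies unchanged to length and range frequencies.

```python
def characterize(freq_registered, freq_suspect):
    """Return {key: (delta, fractional_change_percent, label)}."""
    report = {}
    for key in sorted(set(freq_registered) | set(freq_suspect)):
        before = freq_registered.get(key, 0)
        after = freq_suspect.get(key, 0)
        delta = after - before
        # Fractional change is undefined when the key was absent before.
        frac = 100.0 * delta / before if before else None
        if delta > 0:
            label = "Low to High"
        elif delta == 0:
            label = "No change"
        else:
            label = "High to Low"
        report[key] = (delta, frac, label)
    return report


# Digit frequencies for digits 0, 1, and 9 as reported in Table 6.
fd = {0: 1435163, 1: 2995771, 9: 1123819}
fd_attacked = {0: 1785659, 1: 2995771, 9: 1526545}
for digit, row in characterize(fd, fd_attacked).items():
    print(digit, row)
```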

3. Results and Discussion

Suppose that Alice is the database owner and that she has used the proposed algorithms along with the secret key to generate a watermark for the database relation R. The attacker, Mallory, may attempt to make malicious modifications to Alice's watermarked database relation for his own nefarious objectives. We conducted our experiments in Microsoft Visual Basic and Microsoft Access on a 3.2 GHz Intel Core i3 CPU with 2 GB of RAM. The proposed watermarking scheme is evaluated on a real-life dataset, namely the Forest Cover Type data set, available at the UCI Machine Learning Repository [37]. This dataset has 581,102 tuples, each with 10 integer attributes, 44 Boolean attributes, and 1 categorical attribute. In our experiments, we have used all 10 integer attributes. Note that in robust watermarking schemes, the aim of Mallory is to destroy Alice's watermark without affecting the database relation, whereas in fragile schemes, Mallory attempts to make malicious modifications to Alice's watermarked database relation without affecting the watermark. The experimental results presented in this section show that the watermark is adversely affected by even minor malicious data modifications; therefore, the generated watermark is fragile.

3.1. Detection of Malicious Modifications

In this set of experiments, we randomly introduce malicious modifications into the Forest Cover Type data set [37]. As discussed for Algorithm 5, these malicious modifications are detected by generating the watermark ω_R′ for the suspicious database relation R′ and comparing it with the registered watermark ω_R to determine the WAR (Watermark Accuracy Rate) and WDR (Watermark Distortion Rate). Table 2 shows the WAR and WDR for malicious insertions made to the database relation at different attack rates. For example, when fake but similar tuples amounting to 10% of the relation are randomly inserted into the database relation R, the WDR is 81.86 and the WAR is only 18.14, so the malicious insertions are detected.
Table 2

Detection of malicious insertion of tuples with different attack rates (η = 106).

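| Insertion attack rate | WAR | WDR | Tamper detection |
| --- | --- | --- | --- |
| 10% | 18.14 | 81.86 | Yes (High) |
| 30% | 18.56 | 81.44 | Yes (High) |
| 50% | 20.41 | 79.59 | Yes (High) |
| 70% | 16.67 | 83.33 | Yes (High) |
| 90% | 16.32 | 83.68 | Yes (High) |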
Tables 3 and 4 show results similar to those of the insertion attack for malicious deletions and updates made to the database relation R.
Table 3

Detection of malicious deletion of tuples with different attack rates (η = 106).

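| Deletion attack rate | WAR | WDR | Tamper detection |
| --- | --- | --- | --- |
| 10% | 24.32 | 75.68 | Yes (High) |
| 30% | 17.88 | 82.12 | Yes (High) |
| 50% | 20.95 | 79.05 | Yes (High) |
| 70% | 13.08 | 86.92 | Yes (High) |
| 90% | 14.14 | 85.86 | Yes (High) |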
Table 4

Detection of malicious update of tuples with different attack rates (η = 106).

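| Update attack rate | WAR | WDR | Tamper detection |
| --- | --- | --- | --- |
| 10% | 20.42 | 79.58 | Yes (High) |
| 30% | 19.89 | 80.11 | Yes (High) |
| 50% | 19.89 | 80.11 | Yes (High) |
| 70% | 18.94 | 81.06 | Yes (High) |
| 90% | 14.13 | 85.87 | Yes (High) |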
Figure 3 summarizes the insertion, deletion, and update attacks and shows that the WDR remains high (between 75.68 and 86.92 in Tables 2–4) for all tested volumes of malicious data modifications.
Figure 3

Watermark distortion rate for malicious insertion, deletion, and update of tuples with different attack rates (η = 106).

In another set of attacks, we simultaneously perform malicious insertion, deletion, and update of tuples with different attack rates in database relation R. Table 5 shows the WAR and WDR for this set of attacks.
Table 5

Detection of malicious data modifications with different attack rates (η = 106).

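| Insertion attack rate | Deletion attack rate | Update attack rate | WAR | WDR | Tamper detection |
| --- | --- | --- | --- | --- | --- |
| 10% | 10% | 10% | 15.85 | 84.15 | Yes (High) |
| 30% | 30% | 30% | 10.98 | 89.02 | Yes (High) |
| 50% | 50% | 50% | 10.28 | 89.72 | Yes (High) |
| 70% | 70% | 70% | 13.33 | 86.67 | Yes (High) |
| 90% | 90% | 90% | 10.98 | 89.02 | Yes (High) |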
The experimental results presented in Tables 2–5 show that the malicious modifications are always detected and that the registered watermark ω_R is fragile even at low attack volumes. The WAR is low and the WDR is high for all tested volumes of malicious insertions, deletions, and updates made to the database relation. The low WAR indicates the extent to which the database relation has been attacked, whereas the high WDR indicates that the database relation has been tampered with and is not authentic. The accuracy of the watermark is adversely affected even by minor malicious data modifications, and this fragility reveals that the database relation has been attacked.

3.2. Characterization of Malicious Modifications

One of the important features of the proposed watermarking scheme is that it characterizes the malicious modifications made to database relations. As discussed for Algorithm 1, the watermark generation is based on the content characteristics of the database relation itself, which enables us to characterize the malicious data modifications. Algorithm 6 characterizes malicious data modifications by evaluating the fractional change in each digit Δℱd_i, length Δℱl_j, and range Δℱr_k of the data values in the tampered database relation R′. We have conducted experiments with both random and deterministic attacks. In random tampering attacks, we randomly attack the digit frequency, length, and range of data values in the database relation, whereas in deterministic attacks, the attack is performed with specific attack rates. The random tampering attacks are presented in this section, and the detailed results of the deterministic attacks are given in the Appendix for reference.

3.2.1. Attacks on Digit Frequency

In this set of attacks, Mallory randomly performs malicious insertion, deletion, and update attacks on the digit frequency in Alice's watermarked relation R. For example, in an insertion attack, Mallory may attempt to maliciously insert some digits into R. Table 6 shows the experimental results obtained for the characterization of a malicious insertion attack on digits 9 and 0, as discussed for Algorithm 6. A positive value of Δℱd_i indicates that a fraction Δℱd_i of digits 9 and 0 has been maliciously inserted by Mallory into the database relation R. The characteristic of this attack is an attempt to raise low data values to high ones in database relation R: increases of 35.84% and 24.42% are observed in Δℱd_i for digits 9 and 0, respectively. As the other digits are not attacked, Δℱd_i is zero for digits 1–8, and the change in digit frequency Δfd_i of these digits is also zero. This characteristic of the attack, when combined with the nature of the data, may provide useful information about the attacker's intention. For example, in a product sales environment, these malicious insertions indicate that the attacker may have attempted to increase low sales volumes and amounts.
Table 6

Characterization of malicious insertion attacks on digit frequency.

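| d_i | rfd_i | fd_i | rfd_i′ | fd_i′ | Δfd_i | Δℱd_i | Characteristic |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 8.63 | 1435163 | 10.28 | 1785659 | +350496 | +24.42% | ↑Low to High |
| 1 | 18.02 | 2995771 | 17.24 | 2995771 | 0 | 0 | No change |
| 2 | 19.48 | 3238818 | 18.64 | 3238818 | 0 | 0 | No change |
| 3 | 11.70 | 1945572 | 11.20 | 1945572 | 0 | 0 | No change |
| 4 | 8.22 | 1366062 | 7.86 | 1366062 | 0 | 0 | No change |
| 5 | 7.48 | 1244089 | 7.16 | 1244089 | 0 | 0 | No change |
| 6 | 6.65 | 1105210 | 6.36 | 1105210 | 0 | 0 | No change |
| 7 | 6.45 | 1072363 | 6.17 | 1072363 | 0 | 0 | No change |
| 8 | 6.60 | 1097669 | 6.32 | 1097669 | 0 | 0 | No change |
| 9 | 6.76 | 1123819 | 8.78 | 1526545 | +402726 | +35.84% | ↑Low to High |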
Table 7 shows the result of random malicious deletions of digits 9 and 0 from the database relation R. A negative value of Δℱd_i indicates that a fraction |Δℱd_i| of digits 9 and 0 has been maliciously deleted by the attacker. The characteristic of this attack is an attempt to lower high data values in the database relation R. In this attack, 14.70% of the occurrences of digit 9 and 12.44% of the occurrences of digit 0 are randomly deleted from the database relation. As the other digits are not deleted, Δℱd_i is zero for digits 1–8. Table 8 shows similar results for random malicious updates of digits 9 and 0 in the database relation. In this attack, digits 9 and 0 are randomly replaced with other digits, so the digit frequency of digits 9 and 0 decreases (high to low), whereas the digit frequency of digits 1–8 increases (low to high).
Table 7

Characterization of malicious deletion attacks on digit frequency.

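| d_i | rfd_i | fd_i | rfd_i′ | fd_i′ | Δfd_i | Δℱd_i | Characteristic |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 8.63 | 1435163 | 7.72 | 1256568 | −178595 | −12.44% | ↓High to Low |
| 1 | 18.02 | 2995771 | 18.40 | 2995771 | 0 | 0 | No change |
| 2 | 19.48 | 3238818 | 19.89 | 3238818 | 0 | 0 | No change |
| 3 | 11.70 | 1945572 | 11.95 | 1945572 | 0 | 0 | No change |
| 4 | 8.22 | 1366062 | 8.39 | 1366062 | 0 | 0 | No change |
| 5 | 7.48 | 1244089 | 7.64 | 1244089 | 0 | 0 | No change |
| 6 | 6.65 | 1105210 | 6.79 | 1105210 | 0 | 0 | No change |
| 7 | 6.45 | 1072363 | 6.59 | 1072363 | 0 | 0 | No change |
| 8 | 6.60 | 1097669 | 6.74 | 1097669 | 0 | 0 | No change |
| 9 | 6.76 | 1123819 | 5.89 | 958569 | −165250 | −14.70% | ↓High to Low |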
Table 8

Characterization of malicious update attacks on digit frequency.

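| d_i | rfd_i | fd_i | rfd_i′ | fd_i′ | Δfd_i | Δℱd_i | Characteristic |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 8.63 | 1435163 | 5.71 | 948993 | −486170 | −33.88% | ↓High to Low |
| 1 | 18.02 | 2995771 | 19.01 | 3159784 | +164013 | +5.47% | ↑Low to High |
| 2 | 19.48 | 3238818 | 20.97 | 3485451 | +246633 | +7.61% | ↑Low to High |
| 3 | 11.70 | 1945572 | 11.91 | 1980325 | +34753 | +1.79% | ↑Low to High |
| 4 | 8.22 | 1366062 | 8.52 | 1416889 | +50827 | +3.72% | ↑Low to High |
| 5 | 7.48 | 1244089 | 8.22 | 1365803 | +121714 | +9.78% | ↑Low to High |
| 6 | 6.65 | 1105210 | 6.90 | 1146565 | +41355 | +3.74% | ↑Low to High |
| 7 | 6.45 | 1072363 | 7.14 | 1187586 | +115223 | +10.74% | ↑Low to High |
| 8 | 6.60 | 1097669 | 6.78 | 1127651 | +29982 | +2.73% | ↑Low to High |
| 9 | 6.76 | 1123819 | 4.85 | 805489 | −318330 | −28.33% | ↓High to Low |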
Figure 4 summarizes the malicious insertion, deletion, and update attacks on digits 9 and 0. The insertion attack shows a positive trend (low to high) in the attacked digits, whereas the deletion attack shows a negative trend (high to low) in the attacked digits. In the update attack, a negative trend (high to low) is observed for the attacked digits and a positive trend (low to high) for the unattacked digits.
Figure 4

Characterization of malicious insertion, deletion, and update attacks on digits 9 and 0 of data values.
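The sign patterns summarized above suggest a simple heuristic for naming the attack type from the per-digit changes Δfd_i. The sketch below is an illustration only, not part of the published algorithms: `attack_type` is a hypothetical helper, and a combined attack (as in Table 5) would be reported simply as an update. The sample lists are the Δfd_i columns of Tables 6–8.

```python
def attack_type(deltas):
    """Guess the attack type from the signs of the frequency changes."""
    has_increase = any(d > 0 for d in deltas)
    has_decrease = any(d != 0 and not d > 0 for d in deltas)
    if has_increase and has_decrease:
        return "update"       # some keys lose what others gain
    if has_increase:
        return "insertion"
    if has_decrease:
        return "deletion"
    return "none"


# Change in digit frequency for digits 0..9, from Tables 6, 7, and 8.
table6 = [350496, 0, 0, 0, 0, 0, 0, 0, 0, 402726]
table7 = [-178595, 0, 0, 0, 0, 0, 0, 0, 0, -165250]
table8 = [-486170, 164013, 246633, 34753, 50827,
          121714, 41355, 115223, 29982, -318330]

for name, deltas in (("Table 6", table6), ("Table 7", table7), ("Table 8", table8)):
    print(name, attack_type(deltas), "net change:", sum(deltas))
```

In Table 8 the changes sum to zero, as expected when each attacked digit is replaced one-for-one by another digit.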

In another set of attacks, we randomly insert, delete, and update 10% (lower bound) and 90% (upper bound) of the tuples of the database relation R. Table 9 shows the effect on the fractional change in digit frequency Δℱd_i for each digit. Note that, for an insertion attack with attack rate k, a positive trend (low to high) of approximately k is observed in each digit frequency of database relation R. For example, when 10% similar tuples are inserted into the database relation, an increase of approximately 10% is observed in Δℱd_i for each digit. Similarly, in a deletion attack with attack rate k, a negative trend (high to low) of approximately k is observed in Δℱd_i for each digit. In an update attack, no specific trend is observed in Δℱd_i, because a fraction k of the digits is randomly replaced by other digits.
Table 9

Characterization of malicious modifications on digit frequency.

All values are Δℱd_i in percent.

| d_i | Insertion 10% | Insertion 90% | Deletion 10% | Deletion 90% | Update 10% | Update 90% |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | +9.51 | +90.77 | −9.63 | −89.65 | +0.41 | +0.45 |
| 1 | +10.51 | +94.47 | −10.11 | −89.05 | +0.80 | +15.91 |
| 2 | +10.49 | +90.14 | −10.67 | −90.20 | −0.57 | −3.72 |
| 3 | +9.63 | +94.72 | −9.87 | −90.30 | −2.12 | +3.39 |
| 4 | +9.47 | +83.78 | −10.07 | −90.82 | −0.28 | −7.34 |
| 5 | +9.01 | +82.00 | −9.43 | −91.26 | +0.40 | −11.43 |
| 6 | +9.08 | +81.84 | −9.48 | −91.01 | +0.13 | −7.78 |
| 7 | +10.17 | +88.20 | −9.95 | −89.83 | +0.05 | +1.46 |
| 8 | +10.23 | +88.86 | −10.06 | −89.89 | −0.22 | +2.16 |
| 9 | +10.25 | +88.36 | −10.37 | −90.66 | −0.96 | −2.62 |

The detailed experiments for this set of attacks are presented in the Appendix (Tables 18(a)–18(f)).
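The observation that inserting or deleting a fraction k of similar tuples shifts every digit frequency by roughly k can be reproduced on synthetic data. The sketch below does not use the Forest Cover Type data set; it draws hypothetical values uniformly from 0 to 9999 with a fixed seed, and "similar tuples" is interpreted as copies sampled from the relation itself, which is an assumption.

```python
import random
from collections import Counter


def digit_freq(values):
    """Absolute frequency of each digit character over all values."""
    freq = Counter()
    for v in values:
        freq.update(str(v))
    return freq


def fractional_change(before, after):
    """Percentage change of each digit frequency relative to the original."""
    return {d: 100.0 * (after[d] - before[d]) / before[d] for d in before}


rng = random.Random(0)                       # fixed seed for repeatability
relation = [rng.randrange(10000) for _ in range(50000)]   # synthetic values
base = digit_freq(relation)

n_attack = len(relation) * 10 // 100         # attack rate k = 10%
inserted = relation + rng.sample(relation, n_attack)        # similar values added
remaining = rng.sample(relation, len(relation) - n_attack)  # 10% of values dropped

ins_change = fractional_change(base, digit_freq(inserted))
del_change = fractional_change(base, digit_freq(remaining))
for d in sorted(base):
    print(d, round(ins_change[d], 2), round(del_change[d], 2))
```

Each digit's change should land close to +10% for the insertion and close to −10% for the deletion, mirroring the 10% columns of Table 9.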

It is to be noted that the attack on digit frequency (as discussed above) can be characterized on parameters like the digits being attacked, the fraction of each digit attacked, the type of attack (insertion, deletion, or update) on each digit, and the effect of attack (low to high, high to low, or no change) on data values.

3.2.2. Attack on Length of Data Values

In this set of attacks, Mallory randomly performs malicious insertion, deletion, and update attacks on the length of data values. Table 10 shows the experimental result for the characterization of malicious insertions of data values of length 3 into the database relation R. A positive value of Δℱl_j indicates that a fraction Δℱl_j of data values of length l_j has been maliciously inserted into the database relation R. The characteristic of this attack is to raise low data values to high ones: an increase of 18.27% is observed in Δℱl_j for data values of length 3. Also, Δℱl_j is zero for lengths 1, 2, and 4, which shows that the data values of these lengths are not attacked.
Table 10

Characterization of malicious insertion attacks on length of data values.

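| l j | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 1 | 5.74 | 325894 | 5.27 | 325894 | 0 | 0 | No change |
| 2 | 20.19 | 1146469 | 18.54 | 1146469 | 0 | 0 | No change |
| 3 | 48.66 | 2762791 | 52.85 | 3267609 | +504818 | +18.27% | ↑ Low to High |
| 4 | 25.41 | 1442906 | 23.34 | 1442906 | 0 | 0 | No change |
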
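
As a worked check of the length-3 row of Table 10, the arithmetic below recomputes the fractional change and the relative frequency after the attack. The primed symbols (f′, rf′) are notation assumed here for quantities measured in the attacked relation, and the formula for Δℱ is inferred from the tabulated values.

```latex
\[
\Delta\mathcal{F}_{l_3}
  = \frac{f'_{l_3} - f_{l_3}}{f_{l_3}} \times 100\%
  = \frac{3267609 - 2762791}{2762791} \times 100\%
  = \frac{504818}{2762791} \times 100\%
  \approx +18.27\%
\]
\[
rf'_{l_3}
  = \frac{3267609}{325894 + 1146469 + 3267609 + 1442906} \times 100\%
  = \frac{3267609}{6182878} \times 100\%
  \approx 52.85\%
\]
```

The relative frequencies of lengths 1, 2, and 4 drop even though their absolute frequencies are unchanged, because the total number of data values grows.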
Table 11 shows the results of random malicious deletion of data values of length 3. As in the deletion attack on digit frequency, a negative value of Δℱl indicates that a fraction ℱ of data values of length l has been maliciously deleted, with the characteristic of decreasing high data values to low in the database relation. As in malicious insertion, Δℱl is zero for lengths 1, 2, and 4, which indicates that data values of these lengths are not deleted. Table 12 shows the results of malicious updates on data values of length 3. In this attack, data values of length 3 are randomly replaced by data values of lengths 1, 2, and 4. Accordingly, Δℱl decreases for length 3, whereas it increases for lengths 1, 2, and 4.
Table 11

Characterization of malicious deletion attacks on length of data values.

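| l j | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 1 | 5.74 | 325894 | 6.52 | 325894 | 0 | 0.00 | No change |
| 2 | 20.19 | 1146469 | 22.92 | 1146469 | 0 | 0.00 | No change |
| 3 | 48.66 | 2762791 | 41.71 | 2085761 | −677030 | −24.51% | ↓ High to Low |
| 4 | 25.41 | 1442906 | 28.85 | 1442906 | 0 | 0.00 | No change |
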
Table 12

Characterization of malicious update attacks on length of data values.

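| l j | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 1 | 5.74 | 325894 | 6.31 | 358142 | +32248 | +9.90% | ↑ Low to High |
| 2 | 20.19 | 1146469 | 27.57 | 1565462 | +418993 | +36.55% | ↑ Low to High |
| 3 | 48.66 | 2762791 | 34.27 | 1945657 | −817134 | −29.58% | ↓ High to Low |
| 4 | 25.41 | 1442906 | 31.86 | 1808799 | +365893 | +25.36% | ↑ Low to High |
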
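
A short worked check makes the update signature in Table 12 explicit: because an update replaces values rather than adding or removing them, the gains at lengths 1, 2, and 4 add up exactly to the loss at length 3. The arithmetic below uses only the tabulated frequencies; the subscript notation is assumed for readability.

```latex
\[
\underbrace{32248}_{l=1} + \underbrace{418993}_{l=2} + \underbrace{365893}_{l=4}
  = 817134 = -\Delta f_{l_3}
\]
\[
\Delta\mathcal{F}_{l_3} = \frac{-817134}{2762791} \times 100\% \approx -29.58\%
\]
```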
Figure 5 summarizes the malicious insertion, deletion, and update attacks on data values of length 3. The insertion attack shows a positive trend (low to high) in the attacked length, whereas the deletion attack shows a negative trend (high to low) in the attacked length. In the update (modification) attack, a negative trend (high to low) is observed in the attacked length, whereas a positive trend (low to high) is observed in the unattacked lengths.
Figure 5

Characterization of malicious insertion, deletion, and update attacks on length 3 of data values.

Table 13 shows the fractional change in length frequency Δℱl when 10% (lower bound) and 90% (upper bound) of the tuples are maliciously inserted, deleted, and updated in the database relation. In the insertion attack, inserting a fraction k of tuples produces a positive trend (low to high) of approximately k in Δℱl for each length of data values. Similarly, in the deletion attack, a negative trend (high to low) of approximately k is observed for each length. For example, when 10% of the tuples are randomly deleted from the database relation, Δℱl decreases by approximately 10% for each length of data values. The update attack does not show any specific trend, because a fraction k of data values of different lengths is randomly replaced by data values of other lengths.
Table 13

Characterization of malicious modifications on length of data values.

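Entries are Δℱl j (%) for each attack type and attack rate.

| l j | Insertion, 10% | Insertion, 90% | Deletion, 10% | Deletion, 90% | Update, 10% | Update, 90% |
|---|---|---|---|---|---|---|
| 1 | +9.55 | +100.58 | −8.94 | −91.25 | −3.65 | −38.67 |
| 2 | +10.20 | +89.64 | −10.21 | −89.28 | +1.89 | +6.13 |
| 3 | +10.35 | +89.91 | −9.98 | −88.92 | +1.57 | +11.03 |
| 4 | +9.41 | +87.71 | −10.16 | −91.93 | −2.94 | −12.33 |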

The detailed experiments for this set of attacks are presented in the Appendix (Tables 19(a)–19(f)).

Note that an attack on the length of data values can be characterized by parameters such as the lengths attacked, the fraction of each length attacked, the type of attack (insertion, deletion, or update), and the effect of the attack (low to high, high to low, or no change) on each length of data values.
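
The characterization parameters listed above correspond to the columns of Tables 10 to 12. The following Python sketch, offered under stated assumptions rather than as the authors' procedure, builds one such row per length: `length_frequency` and `characterize` are hypothetical names, and the labels are assigned purely from the sign of Δf.

```python
from collections import Counter


def length_frequency(values):
    """Frequency of data values grouped by their number of decimal digits."""
    return Counter(len(str(abs(int(v)))) for v in values)


def characterize(original, attacked):
    """Return rows (key, f, f', delta_f, delta_pct, label) for each key.

    `original` and `attacked` map a characteristic (here: length) to its
    frequency before and after the suspected attack.
    """
    rows = []
    for key in sorted(set(original) | set(attacked)):
        f, f_new = original.get(key, 0), attacked.get(key, 0)
        delta_f = f_new - f
        delta_pct = 100.0 * delta_f / f if f else None
        if delta_f > 0:
            label = "Low to High"
        elif delta_f < 0:
            label = "High to Low"
        else:
            label = "No change"
        rows.append((key, f, f_new, delta_f, delta_pct, label))
    return rows


# Frequencies taken from Table 11 (deletion of data values of length 3).
before = {1: 325894, 2: 1146469, 3: 2762791, 4: 1442906}
after = {1: 325894, 2: 1146469, 3: 2085761, 4: 1442906}
for row in characterize(before, after):
    print(row)
```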

3.2.3. Attack on Range of Data Values

In this set of attacks, Mallory randomly performs insertion, deletion, and update attacks on range 1, that is, data values in the interval 100–999, in the database relation R. Table 14 shows the experimental results for the characterization of malicious insertion into range 1. The characteristic of this attack is a relative increase from low to high: Δℱr increases by 17.33% for range 1. Δℱr is zero for ranges 0 and 2, because data values in these ranges are not attacked.
Table 14

Characterization of malicious insertion attacks on range of data values.

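| Range | r k | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|---|
| 0 | 0–99 | 25.59 | 1436986 | 23.60 | 1436986 | 0 | 0 | No change |
| 1 | 100–999 | 48.73 | 2736731 | 52.72 | 3210988 | +474257 | +17.33% | ↑ Low to High |
| 2 | 1000–9999 | 25.68 | 1442223 | 23.68 | 1442223 | 0 | 0 | No change |
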
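
The range characteristic can be computed by bucketing each data value into one of the three intervals used in this section. The sketch below is illustrative only: the names `RANGE_UPPER_BOUNDS`, `range_index`, `range_frequency`, and `relative_frequency` are assumptions, and values outside 0–9999 are rejected because the paper does not say how they would be handled.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bounds of range 0 (0-99), range 1 (100-999), range 2 (1000-9999).
RANGE_UPPER_BOUNDS = [99, 999, 9999]


def range_index(value):
    """Return the index of the range that contains `value`."""
    if value < 0 or value > RANGE_UPPER_BOUNDS[-1]:
        raise ValueError(f"value {value} lies outside the ranges used here")
    return bisect_left(RANGE_UPPER_BOUNDS, value)


def range_frequency(values):
    """Absolute frequency of data values in each range."""
    counts = Counter(range_index(v) for v in values)
    return {r: counts.get(r, 0) for r in range(len(RANGE_UPPER_BOUNDS))}


def relative_frequency(freq):
    """Relative frequency in percent for each range."""
    total = sum(freq.values())
    return {r: 100.0 * f / total for r, f in freq.items()}


print(range_frequency([5, 99, 100, 999, 1000, 9999, 42]))
# Attacked-relation frequencies from Table 14.
print(relative_frequency({0: 1436986, 1: 3210988, 2: 1442223}))
```

Using `bisect_left` on the inclusive upper bounds keeps the boundary values 99, 999, and 9999 in the lower of the two adjacent ranges, matching the intervals in the tables.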
Table 15 shows the results of random malicious deletion of data values in range 1. As in the deletion attack on digit frequency, a negative value of Δℱr indicates that a fraction ℱ of range 1 has been maliciously deleted, with the characteristic of transforming high data values to low in the database relation R. Because the data values in ranges 0 and 2 are not attacked, Δℱr is zero for these ranges. Table 16 shows the results of malicious updates on data values in range 1. In this attack, data values in range 1 are randomly replaced by values in ranges 0 and 2. Accordingly, Δℱr decreases for range 1, whereas it increases for ranges 0 and 2.
Table 15

Characterization of malicious deletion attacks on range of data values.

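| Range | r k | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|---|
| 0 | 0–99 | 25.59 | 1436986 | 29.12 | 1436986 | 0 | 0 | No change |
| 1 | 100–999 | 48.73 | 2736731 | 41.65 | 2054875 | −681856 | −24.92% | ↓ High to Low |
| 2 | 1000–9999 | 25.68 | 1442223 | 29.23 | 1442223 | 0 | 0 | No change |
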
Table 16

Characterization of malicious update attacks on range of data values.

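| Range | r k | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|---|
| 0 | 0–99 | 25.59 | 1436986 | 30.07 | 1688965 | +251979 | +17.54% | ↑ Low to High |
| 1 | 100–999 | 48.73 | 2736731 | 40.42 | 2269854 | −466877 | −17.06% | ↓ High to Low |
| 2 | 1000–9999 | 25.68 | 1442223 | 29.51 | 1657121 | +214898 | +14.91% | ↑ Low to High |
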
The malicious insertion, deletion, and update attacks on range 1 of data values are summarized in Figure 6. A positive trend (low to high) is observed in the attacked range for the insertion attack, and a negative trend (high to low) is observed in the attacked range for the deletion attack. The update (modification) attack shows a negative trend (high to low) for the attacked range, that is, range 1, and a positive trend for the nonattacked ranges, that is, ranges 0 and 2.
Figure 6

Characterization of malicious insertion, deletion, and update attacks on range 1 (100–999) of data values.

In another set of attacks, we randomly inserted, deleted, and updated 10% (lower bound) and 90% (upper bound) of the tuples in the database relation R. Table 17 shows the resulting fractional change in range frequency Δℱr for each range of data values. For malicious insertion of a fraction k of tuples, Δℱr shows a positive trend (low to high) of approximately k in each range. Similarly, in the deletion attack, a negative trend (high to low) of approximately k is observed for each range. For example, when 10% of the tuples are randomly deleted from the database relation, Δℱr decreases by approximately 10% for each range of data values. The update attack does not show any specific trend, because a fraction k of data values in different ranges is randomly replaced by data values in other ranges.
Table 17

Characterization of malicious data modifications on range of data values.

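Entries are Δℱr k (%) for each attack type and attack rate.

| Range | r k | Insertion, 10% | Insertion, 90% | Deletion, 10% | Deletion, 90% | Update, 10% | Update, 90% |
|---|---|---|---|---|---|---|---|
| 0 | 0–99 | +9.97 | +92.33 | −9.84 | −89.76 | +0.39 | −5.91 |
| 1 | 100–999 | +10.34 | +89.86 | −9.99 | −88.90 | +1.48 | +10.46 |
| 2 | 1000–9999 | +9.41 | +87.71 | −10.16 | −91.93 | −2.94 | −12.41 |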

The detailed experiments for this set of attacks are presented in the Appendix (Tables 20(a)–20(f)).
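
The uniform trend reported for tuple-level attacks can be reproduced on synthetic data. The simulation below is a sketch under assumptions that are not taken from the paper: the relation is random, with 10000 values drawn uniformly inside each range, inserted tuples are copies of randomly chosen existing ones, and the helper names are hypothetical.

```python
import random
from collections import Counter


def range_index(value):
    """Map a data value to range 0 (0-99), 1 (100-999), or 2 (1000-9999)."""
    if value <= 99:
        return 0
    if value <= 999:
        return 1
    return 2


def delete_fraction(values, k, rng):
    """Randomly delete a fraction k of the tuples."""
    n_delete = round(len(values) * k)
    doomed = set(rng.sample(range(len(values)), n_delete))
    return [v for i, v in enumerate(values) if i not in doomed]


def insert_fraction(values, k, rng):
    """Insert a fraction k of tuples copied from randomly chosen existing ones."""
    n_insert = round(len(values) * k)
    return values + rng.choices(values, k=n_insert)


def range_change(original, attacked):
    """Percentage change in frequency for each range."""
    f = Counter(map(range_index, original))
    f_new = Counter(map(range_index, attacked))
    return {r: 100.0 * (f_new[r] - f[r]) / f[r] for r in sorted(f)}


rng = random.Random(0)  # fixed seed so the run is repeatable
bounds = [(0, 99), (100, 999), (1000, 9999)]
R = [rng.randint(lo, hi) for lo, hi in bounds for _ in range(10000)]

after_deletion = range_change(R, delete_fraction(R, 0.10, rng))
after_insertion = range_change(R, insert_fraction(R, 0.10, rng))
print("deletion :", after_deletion)
print("insertion:", after_insertion)
```

With a 10% attack rate, every range should move by roughly 10 percentage points in the same direction, which is the signature that distinguishes tuple-level insertion and deletion from the targeted attacks in Tables 14 to 16.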

Note that the data characteristics used in our experiments, namely the digits, lengths, and ranges of data values, are closely related to one another. Because of this relationship, we evaluated the effect of malicious data modifications on all three characteristics. For example, if Mallory maliciously inserts a digit into a data value, the length and range of the data value also increase. Similarly, if Mallory maliciously decreases the length of a data value, its digit count and range also decrease (Tables 9, 13, and 17). We summarize our findings and observations on the characterization of malicious data modifications as follows.

(i) If there is a positive trend in the fractional change Δℱ of data values in the tampered database relation R′, a fraction ℱ of digits, ranges, and lengths of data values has been maliciously inserted by Mallory into Alice's watermarked relation R. The characteristic of this attack is a relative increase of low data values to high in database relation R (Tables 6, 10, and 14).

(ii) If there is a negative trend in the fractional change Δℱ of data values in the tampered database relation R′, a fraction ℱ of digits, ranges, and lengths of data values has been maliciously deleted by Mallory from Alice's watermarked relation R. The characteristic of this attack is a relative decrease of high data values to low in database relation R (Tables 7, 11, and 15).

(iii) If there are both positive and negative trends in the fractional change Δℱ for digits, ranges, and lengths of data values in the tampered database relation R′, the data values showing a negative trend have been maliciously replaced (updated) by the data values showing a positive trend (Tables 8, 12, and 16).

(iv) If there is a uniform increase of k in the fractional change Δℱ of all data values in the tampered database relation R′, a fraction k of similar tuples has been maliciously inserted by Mallory into Alice's watermarked relation R. The characteristic of this attack is a relative increase of low data values to high in database relation R (Tables 9, 13, and 17).

(v) If there is a uniform decrease of k in the fractional change Δℱ of all data values in the tampered database relation R′, a fraction k of tuples has been maliciously deleted by Mallory from Alice's watermarked relation R. The characteristic of this attack is a relative decrease of high data values to low in database relation R (Tables 9, 13, and 17).
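
The findings summarized above can be read as a small decision procedure over the signs of Δℱ. The Python sketch below is one possible encoding, not the authors' implementation: the function names `classify_attack` and `uniform_fraction`, the noise tolerance `tol`, and the `spread` threshold used to decide whether a change is "uniform" are all assumptions introduced for illustration.

```python
def classify_attack(delta, tol=0.0):
    """Classify an attack from a mapping characteristic -> fractional change (%).

    Positive only -> insertion, negative only -> deletion, both -> update.
    Changes with magnitude <= tol are treated as "no change".
    """
    has_pos = any(v > tol for v in delta.values())
    has_neg = any(v < -tol for v in delta.values())
    if has_pos and has_neg:
        return "update"
    if has_pos:
        return "insertion"
    if has_neg:
        return "deletion"
    return "no modification detected"


def uniform_fraction(delta, spread=2.0):
    """Estimate k (in %) when every characteristic changes by about the same amount.

    Returns the mean change if all values share its sign and lie within
    `spread` percentage points of it; otherwise returns None.
    """
    values = list(delta.values())
    if not values:
        return None
    mean = sum(values) / len(values)
    if mean == 0 or any(v * mean <= 0 for v in values):
        return None  # mixed signs, unchanged entries, or no change at all
    if max(abs(v - mean) for v in values) > spread:
        return None
    return mean


# Fractional changes taken from Table 10, Table 12, and the 10% deletion column of Table 13.
print(classify_attack({1: 0, 2: 0, 3: 18.27, 4: 0}))
print(classify_attack({1: 9.90, 2: 36.55, 3: -29.58, 4: 25.36}))
deletion_10 = {1: -8.94, 2: -10.21, 3: -9.98, 4: -10.16}
print(classify_attack(deletion_10), uniform_fraction(deletion_10))
```

The `spread` threshold would need tuning in practice: at the 90% attack rate the per-digit changes in Table 9 range from about 82% to 95%, so a tight threshold would not treat them as uniform.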

4. Conclusions

In this paper, we presented a fragile watermarking scheme to detect and characterize malicious tampering in database relations. The proposed scheme is based on a zero-watermarking approach that does not alter the original database content, and thus it overcomes the data integrity and data usability limitations of existing watermarking schemes. In the proposed scheme, the watermarks are generated from local characteristics of the database relation itself, such as the frequency distributions of digits, lengths, and ranges of data values. This enables us to characterize the malicious modifications made to database relations. Experimental results showed that the proposed scheme can detect and characterize malicious data modifications successfully. In the future, we intend to investigate other local characteristics of relational databases for watermark generation and to extend the proposed scheme to semifragile watermarking.

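Appendix

Table 18(a)

Characterization of malicious insertion attacks on digit frequency (η = 10^6, attack rate = 10%).

| d i | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 0 | 8.44 | 247345 | 8.40 | 270858 | +23513 | +9.51% | ↑ Low to High |
| 1 | 16.28 | 477362 | 16.37 | 527535 | +50173 | +10.51% | ↑ Low to High |
| 2 | 19.88 | 582616 | 19.97 | 643745 | +61129 | +10.49% | ↑ Low to High |
| 3 | 11.37 | 333378 | 11.34 | 365471 | +32093 | +9.63% | ↑ Low to High |
| 4 | 8.83 | 258748 | 8.79 | 283251 | +24503 | +9.47% | ↑ Low to High |
| 5 | 8.31 | 243554 | 8.24 | 265510 | +21956 | +9.01% | ↑ Low to High |
| 6 | 7.14 | 209399 | 7.09 | 228418 | +19019 | +9.08% | ↑ Low to High |
| 7 | 6.40 | 187586 | 6.41 | 206659 | +19073 | +10.17% | ↑ Low to High |
| 8 | 6.49 | 190268 | 6.51 | 209737 | +19469 | +10.23% | ↑ Low to High |
| 9 | 6.86 | 201050 | 6.88 | 221653 | +20603 | +10.25% | ↑ Low to High |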

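Table 18(b)

Characterization of malicious insertion attacks on digit frequency (η = 10^6, attack rate = 90%).

| d i | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 0 | 8.44 | 247345 | 8.51 | 471852 | +224507 | +90.77% | ↑ Low to High |
| 1 | 16.28 | 477362 | 16.73 | 928331 | +450969 | +94.47% | ↑ Low to High |
| 2 | 19.88 | 582616 | 19.97 | 1107797 | +525181 | +90.14% | ↑ Low to High |
| 3 | 11.37 | 333378 | 11.70 | 649150 | +315772 | +94.72% | ↑ Low to High |
| 4 | 8.83 | 258748 | 8.57 | 475532 | +216784 | +83.78% | ↑ Low to High |
| 5 | 8.31 | 243554 | 7.99 | 443279 | +199725 | +82.00% | ↑ Low to High |
| 6 | 7.14 | 209399 | 6.86 | 380768 | +171369 | +81.84% | ↑ Low to High |
| 7 | 6.40 | 187586 | 6.36 | 353031 | +165445 | +88.20% | ↑ Low to High |
| 8 | 6.49 | 190268 | 6.48 | 359339 | +169071 | +88.86% | ↑ Low to High |
| 9 | 6.86 | 201050 | 6.83 | 378696 | +177646 | +88.36% | ↑ Low to High |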

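Table 18(c)

Characterization of malicious deletion attacks on digit frequency (η = 10^6, attack rate = 10%).

| d i | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 0 | 8.44 | 247345 | 8.48 | 223528 | −23817 | −9.63% | ↓ High to Low |
| 1 | 16.28 | 477362 | 16.27 | 429114 | −48248 | −10.11% | ↓ High to Low |
| 2 | 19.88 | 582616 | 19.74 | 520475 | −62141 | −10.67% | ↓ High to Low |
| 3 | 11.37 | 333378 | 11.40 | 300483 | −32895 | −9.87% | ↓ High to Low |
| 4 | 8.83 | 258748 | 8.83 | 232703 | −26045 | −10.07% | ↓ High to Low |
| 5 | 8.31 | 243554 | 8.37 | 220577 | −22977 | −9.43% | ↓ High to Low |
| 6 | 7.14 | 209399 | 7.19 | 189555 | −19844 | −9.48% | ↓ High to Low |
| 7 | 6.40 | 187586 | 6.41 | 168921 | −18665 | −9.95% | ↓ High to Low |
| 8 | 6.49 | 190268 | 6.49 | 171131 | −19137 | −10.06% | ↓ High to Low |
| 9 | 6.86 | 201050 | 6.83 | 180196 | −20854 | −10.37% | ↓ High to Low |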

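Table 18(d)

Characterization of malicious deletion attacks on digit frequency (η = 10^6, attack rate = 90%).

| d i | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 0 | 8.44 | 247345 | 8.89 | 25612 | −221733 | −89.65% | ↓ High to Low |
| 1 | 16.28 | 477362 | 18.14 | 52276 | −425086 | −89.05% | ↓ High to Low |
| 2 | 19.88 | 582616 | 19.80 | 57072 | −525544 | −90.20% | ↓ High to Low |
| 3 | 11.37 | 333378 | 11.22 | 32342 | −301036 | −90.30% | ↓ High to Low |
| 4 | 8.83 | 258748 | 8.24 | 23758 | −234990 | −90.82% | ↓ High to Low |
| 5 | 8.31 | 243554 | 7.38 | 21281 | −222273 | −91.26% | ↓ High to Low |
| 6 | 7.14 | 209399 | 6.53 | 18815 | −190584 | −91.01% | ↓ High to Low |
| 7 | 6.40 | 187586 | 6.62 | 19071 | −168515 | −89.83% | ↓ High to Low |
| 8 | 6.49 | 190268 | 6.67 | 19239 | −171029 | −89.89% | ↓ High to Low |
| 9 | 6.86 | 201050 | 6.52 | 18780 | −182270 | −90.66% | ↓ High to Low |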

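Table 18(e)

Characterization of malicious update attacks on digit frequency (η = 10^6, attack rate = 10%).

| d i | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 0 | 8.438048 | 247345 | 8.49 | 248357 | +1012 | +0.41% | ↑ Low to High |
| 1 | 16.28496 | 477362 | 16.46 | 481186 | +3824 | +0.80% | ↑ Low to High |
| 2 | 19.87565 | 582616 | 19.81 | 579315 | −3301 | −0.57% | ↓ High to Low |
| 3 | 11.37302 | 333378 | 11.16 | 326311 | −7067 | −2.12% | ↓ High to Low |
| 4 | 8.827055 | 258748 | 8.82 | 258020 | −728 | −0.28% | ↓ High to Low |
| 5 | 8.30872 | 243554 | 8.36 | 244519 | +965 | +0.40% | ↑ Low to High |
| 6 | 7.143539 | 209399 | 7.17 | 209680 | +281 | +0.13% | ↑ Low to High |
| 7 | 6.3994 | 187586 | 6.42 | 187678 | +92 | +0.05% | ↑ Low to High |
| 8 | 6.490895 | 190268 | 6.49 | 189845 | −423 | −0.22% | ↓ High to Low |
| 9 | 6.858718 | 201050 | 6.81 | 199111 | −1939 | −0.96% | ↓ High to Low |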

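Table 18(f)

Characterization of malicious update attacks on digit frequency (η = 10^6, attack rate = 90%).

| d i | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 0 | 8.438048 | 247345 | 8.46 | 248462 | +1117 | +0.45% | ↑ Low to High |
| 1 | 16.28496 | 477362 | 18.84 | 553334 | +75972 | +15.91% | ↑ Low to High |
| 2 | 19.87565 | 582616 | 19.10 | 560965 | −21651 | −3.72% | ↓ High to Low |
| 3 | 11.37302 | 333378 | 11.74 | 344665 | +11287 | +3.39% | ↑ Low to High |
| 4 | 8.827055 | 258748 | 8.16 | 239767 | −18981 | −7.34% | ↓ High to Low |
| 5 | 8.30872 | 243554 | 7.35 | 215727 | −27827 | −11.43% | ↓ High to Low |
| 6 | 7.143539 | 209399 | 6.58 | 193116 | −16283 | −7.78% | ↓ High to Low |
| 7 | 6.3994 | 187586 | 6.48 | 190330 | +2744 | +1.46% | ↑ Low to High |
| 8 | 6.490895 | 190268 | 6.62 | 194384 | +4116 | +2.16% | ↑ Low to High |
| 9 | 6.858718 | 201050 | 6.67 | 195785 | −5265 | −2.62% | ↓ High to Low |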

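Table 19(a)

Characterization of malicious insertion attacks on length of data values (η = 10^6, attack rate = 10%).

| l j | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 1 | 7.11 | 71085 | 7.08 | 77875 | +6790 | +9.55% | ↑ Low to High |
| 2 | 19.76 | 197605 | 19.80 | 217767 | +20162 | +10.20% | ↑ Low to High |
| 3 | 45.18 | 451787 | 45.32 | 498531 | +46744 | +10.35% | ↑ Low to High |
| 4 | 27.95 | 279523 | 27.80 | 305827 | +26304 | +9.41% | ↑ Low to High |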

Table 19(b)

Characterization of malicious insertion attacks on length of data values (η = 10^6, attack rate = 90%).

| l j | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 1 | 7.11 | 71085 | 7.50 | 142584 | +71499 | +100.58% | ↑ Low to High |
| 2 | 19.76 | 197605 | 19.72 | 374737 | +177132 | +89.64% | ↑ Low to High |
| 3 | 45.18 | 451787 | 45.16 | 857986 | +406199 | +89.91% | ↑ Low to High |
| 4 | 27.95 | 279523 | 27.62 | 524693 | +245170 | +87.71% | ↑ Low to High |

Table 19(c)

Characterization of malicious deletion attacks on length of data values (η = 10^6, attack rate = 10%).

| l j | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 1 | 7.11 | 71085 | 7.19 | 64733 | −6352 | −8.94% | ↓ High to Low |
| 2 | 19.76 | 197605 | 19.71 | 177421 | −20184 | −10.21% | ↓ High to Low |
| 3 | 45.18 | 451787 | 45.19 | 406721 | −45066 | −9.98% | ↓ High to Low |
| 4 | 27.95 | 279523 | 27.90 | 251125 | −28398 | −10.16% | ↓ High to Low |

Table 19(d)

Characterization of malicious deletion attacks on length of data values (η = 10^6, attack rate = 90%).

| l j | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 1 | 7.11 | 71085 | 6.22 | 6221 | −64864 | −91.25% | ↓ High to Low |
| 2 | 19.76 | 197605 | 21.19 | 21186 | −176419 | −89.28% | ↓ High to Low |
| 3 | 45.18 | 451787 | 50.04 | 50039 | −401748 | −88.92% | ↓ High to Low |
| 4 | 27.95 | 279523 | 22.55 | 22554 | −256969 | −91.93% | ↓ High to Low |

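Table 19(e)

Characterization of malicious update attacks on length of data values (η = 10^6, attack rate = 10%).

| l j | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 1 | 7.11 | 71085 | 6.85 | 68488 | −2597 | −3.65% | ↓ High to Low |
| 2 | 19.76 | 197605 | 20.13 | 201333 | +3728 | +1.89% | ↑ Low to High |
| 3 | 45.18 | 451787 | 45.89 | 458869 | +7082 | +1.57% | ↑ Low to High |
| 4 | 27.95 | 279523 | 27.13 | 271310 | −8213 | −2.94% | ↓ High to Low |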

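Table 19(f)

Characterization of malicious update attacks on length of data values (η = 10^6, attack rate = 90%).

| l j | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|
| 1 | 7.11 | 71085 | 4.36 | 43599 | −27486 | −38.67% | ↓ High to Low |
| 2 | 19.76 | 197605 | 20.97 | 209711 | +12106 | +6.13% | ↑ Low to High |
| 3 | 45.18 | 451787 | 50.16 | 501622 | +49835 | +11.03% | ↑ Low to High |
| 4 | 27.95 | 279523 | 24.51 | 245068 | −34455 | −12.33% | ↓ High to Low |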

Table 20(a)

Characterization of malicious insertion attacks on range of data values (η = 10^6, attack rate = 10%).

| Range no. | r k | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|---|
| 0 | 0–99 | 25.59 | 1436986 | 26.49 | 288506 | +26164 | +9.97% | ↑ Low to High |
| 1 | 100–999 | 48.73 | 2736731 | 45.43 | 494709 | +46373 | +10.34% | ↑ Low to High |
| 2 | 1000–9999 | 25.68 | 1442223 | 28.08 | 305806 | +26304 | +9.41% | ↑ Low to High |

Table 20(b)

Characterization of malicious insertion attacks on range of data values (η = 10^6, attack rate = 90%).

| Range no. | r k | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|---|
| 0 | 0–99 | 25.59 | 1436986 | 26.83 | 504566 | +242224 | +92.33% | ↑ Low to High |
| 1 | 100–999 | 48.73 | 2736731 | 45.27 | 851193 | +402857 | +89.86% | ↑ Low to High |
| 2 | 1000–9999 | 25.68 | 1442223 | 27.90 | 524663 | +245161 | +87.71% | ↑ Low to High |

Table 20(c)

Characterization of malicious deletion attacks on range of data values (η = 10^6, attack rate = 10%).

| Range no. | r k | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|---|
| 0 | 0–99 | 25.59 | 1436986 | 26.54 | 236532 | −25810 | −9.84% | ↓ High to Low |
| 1 | 100–999 | 48.73 | 2736731 | 45.28 | 403562 | −44774 | −9.99% | ↓ High to Low |
| 2 | 1000–9999 | 25.68 | 1442223 | 28.18 | 251104 | −28398 | −10.16% | ↓ High to Low |

Table 20(d)

Characterization of malicious deletion attacks on range of data values (η = 10^6, attack rate = 90%).

| Range no. | r k | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|---|
| 0 | 0–99 | 25.59 | 1436986 | 27.08 | 26857 | −235485 | −89.76% | ↓ High to Low |
| 1 | 100–999 | 48.73 | 2736731 | 50.18 | 49757 | −398579 | −88.90% | ↓ High to Low |
| 2 | 1000–9999 | 25.68 | 1442223 | 22.74 | 22547 | −256955 | −91.93% | ↓ High to Low |

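Table 20(e)

Characterization of malicious update attacks on range of data values (η = 10^6, attack rate = 10%).

| Range no. | r k | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|---|
| 0 | 0–99 | 25.59 | 1436986 | 26.61 | 263370 | +1028 | +0.39% | ↑ Low to High |
| 1 | 100–999 | 48.73 | 2736731 | 45.97 | 454956 | +6620 | +1.48% | ↑ Low to High |
| 2 | 1000–9999 | 25.68 | 1442223 | 27.41 | 271271 | −8231 | −2.94% | ↓ High to Low |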

Table 20(f)

Characterization of malicious update attacks on range of data values (η = 10^6, attack rate = 90%).

| Range no. | r k | rf (R), % | f (R) | rf (R′), % | f (R′) | Δf | Δℱ | Characteristic |
|---|---|---|---|---|---|---|---|---|
| 0 | 0–99 | 25.59 | 1436986 | 25.01 | 246838 | −15504 | −5.91% | ↓ High to Low |
| 1 | 100–999 | 48.73 | 2736731 | 50.18 | 495248 | +46912 | +10.46% | ↑ Low to High |
| 2 | 1000–9999 | 25.68 | 1442223 | 24.81 | 244819 | −34683 | −12.41% | ↓ High to Low |