A Robust, Distortion Minimizing Technique for Watermarking Relational Databases Using Once-for-All Usability Constraints


M. Kamran, Sabah Suhail, and Muddassar Farooq, Member, IEEE

Abstract—Ownership protection on relational databases—shared with collaborators (or intended recipients)—demands developing a

watermarking scheme that must be able to meet four challenges: 1) it should be robust against different types of attacks that an

intruder could launch to corrupt the embedded watermark; 2) it should be able to preserve the knowledge in the databases to make

them an effective component of knowledge-aware decision support systems; 3) it should try to strike a balance between the conflicting

requirements of database owners, who require soft usability constraints, and database recipients who want tight usability constraints

that ensure minimum distortions in the data; and 4) last but not least, it should not require that a database owner defines usability

constraints for each type of application and every recipient separately. The major contribution of this paper is a robust and efficient

watermarking scheme for relational databases that meets all four of the above-mentioned challenges. The results of our experiments

prove that the proposed scheme achieves 100 percent decoding accuracy even if only one watermarked row is left in the database.

Index Terms—Data usability constraints, distortion free, database watermarking, data quality, right protection, ownership protection,

robust watermarking


1 INTRODUCTION

WATERMARKING has been used for ownership protection of a wide range of data formats, including images, video, audio, software, XML documents, geographic information system (GIS) related data, text documents, and relational databases, that are

used in different application domains. Recently, intelligent

mining techniques are being used on data, extracted

from relational databases, to detect interesting patterns

(generally hidden in the data) that provide significant

support to decision makers in making effective, accurate,

and relevant decisions; as a result, sharing of data between its owners and data mining experts (or

corporations) is significantly increasing. Consequently, it

has become relevant (in this context) to explore suitable

watermarking techniques for ownership rights protection

of relational databases that should be imperceptible and

robust, with blind decoding. Moreover, once the owner

of data embeds the watermark, the distortions in the

original data¹ are kept within certain limits, which are defined by the usability constraints, to preserve the

knowledge contained in the data.

An intended recipient (Bob²) wants the data owner

(Alice³) to define tight usability constraints so that he gets

accurate data. For maximum robustness of watermark,

Alice, on the other hand, wants to have larger bandwidth on

manipulations performed during embedding of a watermark,

which is only possible if she puts soft usability

constraints [1], [2]. To conclude, Bob and Alice have

conflicting requirements: Bob wants “minimum distortions

in the watermarked data” while Alice wants to produce

“watermarked data having strong ownership.” Any watermark

embedding technique that strives for a compromise

bandwidth allows an attacker (Mallory⁴) to corrupt or

remove the watermark by slightly surpassing the available

bandwidth. The compromise bandwidth is achieved

once Alice defines the usability constraints (by studying

the semantics of a given application) in such a way that the

embedded watermark is not only robust but also causes

minimum distortions to the underlying data. The job of

analyzing the semantics of each application and using them to

define usability constraints is not only cumbersome but also

inefficient for a data owner. Remember, the robustness of a

watermark is measured by the watermark decoding

accuracy that in turn depends on the bandwidth available

for manipulation. These observations provide the

motivation for the research reported in this paper.

2694 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 12, DECEMBER 2013

. M. Kamran is with the Next Generation Intelligent Networks Research Center (nexGIN RC), the COMSATS Institute of Information Technology, Wah Campus, Wah Cantt 47010, and the Department of Computer Science, National University of Computer and Emerging Sciences (NUCES), A.K. Brohi Road, H-11/4, Islamabad 44000, Pakistan. E-mail: [email protected].

. S. Suhail is with the Next Generation Intelligent Networks Research Center, Islamabad 44000, Pakistan. E-mail: [email protected].

. M. Farooq is with the Next Generation Intelligent Networks Research Center (nexGIN RC) and the Institute of Space Technology, Islamabad 44000, Pakistan. E-mail: [email protected].

Manuscript received 5 Mar. 2012; revised 11 Aug. 2012; accepted 1 Nov. 2012; published online 20 Nov. 2012.
Recommended for acceptance by G. Miklau.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TKDE-2012-03-0146.
Digital Object Identifier no. 10.1109/TKDE.2012.227.

1. In this paper, unless otherwise specified, the terms data, data set, and database are used interchangeably.

2. In this paper, Bob is considered as the recipient of the data.

3. In information theory, Alice is considered as the owner of the data.

4. In information theory, Mallory is considered as an attacker.

1041-4347/13/$31.00 © 2013 IEEE. Published by the IEEE Computer Society

The major contributions of this paper are the following:

. We propose a novel watermark decoding algorithm that ensures that its decoding accuracy is independent of the usability constraints (or available bandwidth). As a result, our approach allows Alice to define usability constraints only once for a particular database, for every possible type of intended application. Moreover, it also ensures that the watermark introduces the least possible distortions to the original data without compromising the robustness of the inserted watermark.

. The proposed algorithm embeds every bit of a multibit watermark (generated from date-time) in each selected row (in a numeric attribute) with the objective of having maximum robustness even if an attacker is somehow able to successfully corrupt the watermark in some selected part of the data set.

. We prove the robustness of our watermarking scheme by analyzing its decoding accuracy under different types of malicious attacks using a real-world data set.

. We provide solutions to resolve conflicting ownership issues in case of the additive attack, in which Mallory inserts his own watermark in Alice's watermarked database.

The remainder of the paper is organized as follows: Section 2 discusses the existing watermarking techniques and their shortcomings. A brief overview of the different stages of the proposed watermarking technique is given in Section 3. Section 4 provides a detailed description of the proposed technique. The robustness study of the watermarking approach is presented in Section 5. Finally, we conclude the paper with an outlook to future work.

2 RELATED WORK

In this section, we provide a brief overview of recently proposed watermarking techniques for relational databases. The objective is to clearly understand the limitations of prior art. Agrawal and Kiernan [3] proposed a bit-resetting algorithm that employs the principle of setting the least significant bit (LSB) of the candidate attribute of the selected subset of tuples. The parameter selection for watermarking is based on computing a message authentication code (MAC), where the MAC is calculated using the secret key and the tuple's primary key. This technique assumes unconstrained LSB manipulation during the watermark embedding process. Such out-of-bound modification of data might also generate undesirable results. Although LSB-based data hiding techniques are efficient, an attacker can easily remove the watermark by simple manipulation of the data, for example by shifting the LSB. Other bit-resetting techniques such as [1], [4], [5], [6], [7], [8], [9] have similar robustness-related shortcomings.

Sion et al. [10] proposed a statistical-based algorithm in which a database is partitioned into a maximum number of unique, nonintersecting subsets of tuples. The data partitioning concept is based on the use of special marker tuples, making it vulnerable to watermark synchronization errors, particularly in the case of tuple insertion and deletion attacks, as the position of marker tuples is disturbed by these attacks. Such errors may be reduced if the marker tuples are stored during the watermark embedding phase and used again to construct the data partitions during the watermark decoding phase. But using the stored marker tuples to reconstruct the partitions violates the requirement of "blind decoding" of the watermark. Furthermore, the threshold technique for bit decoding involves arbitrarily chosen thresholds, selected without following any optimality criteria, that are responsible for errors in the decoding process. The concept of usability bounds on data is used in this technique to control distortions introduced in the data during watermark embedding. However, an attacker can corrupt the watermark by launching large-scale attacks on a large number of rows. Moreover, the decoding accuracy is dependent on the usability bounds set by the data owner; as a result, the decoding accuracy deteriorates if an attacker violates these bounds. An important shortcoming of this approach is that the data owner needs to specify usability constraints separately for every type of application that will use the data. Later improvements [2], [11], [12], [13] have tried to solve the problem of synchronization errors only.

Another class of watermarking techniques is distortion-free techniques. Using these techniques, data is delivered to the intended recipients without making any distortion in the data. The techniques reported in [14] and [15] are vulnerable to even minor malicious attacks, and therefore, cannot be used for enforcing ownership protection.

The constrained data content modifying techniques add some new content in the database to embed the watermark. The content is added subject to given usability constraints. If an attacker successfully attacks the watermarked content, the watermark information is lost without compromising data quality. The techniques reported in [16] and [17] face such problems.

A recent survey [18] presented a review of database watermarking techniques but, to the best of our knowledge, no watermarking technique for relational databases exists which ensures that the decoding accuracy is independent of the usability constraints; as a consequence, an owner does not need to define them for each type of intended application and use. Our proposed technique not only has this feature but is also able to meet the conflicting "robustness requirement" of the data owner and "minimum distortions requirement" of the intended recipient.

3 APPROACH OVERVIEW

Fig. 1 shows the block diagram summarizing the main components of our watermarking technique. The date-time is used to generate the watermark bits. Using the date-time as the foundation of a watermark may also help to counter additive attacks. We will shortly discuss this point in detail.

A robust watermark algorithm is used to embed watermark bits into the data set of Alice. The watermark embedding algorithm takes a secret key ($K_s$) and the watermark bits ($W$) as input and converts a data set $D$ into the watermarked data set $D_W$. For easy reference, Table 1 lists the major symbols used in this paper.


The modifications (distortions) made by watermarking are bounded by the usability constraints matrix $G$. In our technique, it is defined only once for every possible type of application that will eventually use the data set.

The watermark encoding process can be summarized in the following steps:

Watermark bits generation. The watermark bit string $W$ is generated from the UTC (Coordinated Universal Time) date-time, which is the primary time standard used to synchronize time all over the world [19]. These bits are given as input to the watermark encoding function.

Data partitioning. The data set $D$ is partitioned into $m$ nonoverlapping partitions by using the secret key $K_s$ in conjunction with a cryptographically secure hash function.

Selection of data set for watermarking. To minimize distortions, only a few tuples are selected for watermarking in this step.

Watermark embedding. The watermark bits are embedded in the selected tuples using a robust watermarking function. The bit embedding statistics $\Delta$ are used to compute the correction factor $\tau$. Our technique embeds each bit of the watermark in every selected tuple of each partition; as a result, it remains robust against malicious attacks even if only one watermarked row is left in the data after an attack. The watermarked data set $D_W$ is delivered to Bob, where an intruder, Mallory, aims at destroying the watermark by launching different types of attacks.

Watermark decoding is the process of extracting the embedded watermark from the watermarked data set $D_W$, using the secret parameters: the secret key $K_s$, the correction factor $\tau$, and the decoding threshold $\delta$. The decoding algorithm is blind as the original data set $D$ is not needed for successfully decoding the embedded watermark.

The watermark decoding process can be summarized in the following steps:

Data partitioning. The data partitions are generated by using the same data partitioning algorithm as in the watermark encoding phase.

Identification of watermarked data set. The watermarked tuples are identified using the same procedure that was used to select them for inserting watermark bits in the encoding phase.

Watermark decoding. In this stage, the correction factor $\tau$ and the decoding threshold $\delta$ are used to decode the watermark bits. The decoding algorithm is blind and its decoding accuracy does not depend on the usability constraints. As a result, 100 percent decoding accuracy is achieved irrespective of the amount of data alterations made by an attacker in the watermarked data.⁵

Majority voting. Majority voting is used to correctly decode an inserted watermark bit. This step is optional and is done to provide security against a sophisticated attacker who is able to flip the watermark bits in selected tuples only.
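The majority voting step can be illustrated with a minimal sketch. It assumes that each watermarked tuple yields one decoded copy of the full watermark (a list of bits) and that the final bit at each position is the value decoded most often; the function name `majority_vote` and the input layout are illustrative assumptions, not details given in the paper.

```python
from collections import Counter

def majority_vote(decoded_rows):
    """Column-wise majority over equal-length bit lists, one list per tuple.

    Assumption: every watermarked tuple contributes one decoded copy of the
    watermark. On a tie, the value seen first in that column is returned.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*decoded_rows)]

# Three tuples, one of which had a bit flipped by an attacker.
votes = [[1, 0, 1], [1, 1, 1], [0, 0, 1]]
recovered = majority_vote(votes)
```

With this layout, a bit position is decoded wrongly only if more than half of the surviving copies are flipped at that position.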

4 PROPOSED METHODOLOGY

4.1 Data Partitioning

The data set $D$ is a database relation with scheme $D = (PK, A_0, \ldots, A_{n-1})$, where $PK$ is the primary key attribute and $A_0, \ldots, A_{n-1}$ are $n$ other attributes. The partitioning algorithm divides the data set $D$ into $m$ nonoverlapping partitions $\{S_0, \ldots, S_{m-1}\}$ such that $S_i \cap S_j = \emptyset$ for any two distinct partitions. Moreover, the partitions must be nonempty and collectively exhaustive, so that $S_0 \cup S_1 \cup \cdots \cup S_{m-1} = D$. The data set is partitioned into logical groups using the data partitioning algorithm proposed in [2]. Partitioning is based on a secret key $K_s$ and the cryptographic hash function MD5 (Message Digest 5) [20].

Definition 1 (Hash Function). A hash function $H$ maps a variable-size input $\Upsilon$ to a fixed-size string $h$, called the hash value, as

$$H : \Upsilon \to h. \tag{1}$$


TABLE 1. Notations

5. The decoding accuracy may decrease in case of a combination of different attacks.

Fig. 1. Stages of watermark encoding and decoding.

For each tuple $r \in D$, the data partitioning algorithm computes a MAC to assign tuples to the partitions using a hash function $H$ as

$$par(r) = H(K_s \,\|\, H(r.PK \,\|\, K_s)) \bmod m \tag{2}$$

where $r.PK$ is the primary key of the tuple $r$, $H()$ is a secure hash function, and $\|$ is the concatenation operator. Algorithm 1 lists the steps of the data partitioning process.

Algorithm 1. Get_Partitions.

Input: Data set $D$, secret key $K_s$, number of partitions $m$
Output: Data partitions $S_0, \ldots, S_{m-1}$

for each tuple $r \in D$ do
    $par(r) \leftarrow H(K_s \,\|\, H(r.PK \,\|\, K_s)) \bmod m$
    insert $r$ into $S_{par(r)}$
end for
return $S_0, \ldots, S_{m-1}$
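The partitioning rule in (2) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: rows are modeled as dictionaries with a "PK" field, keys and primary keys are UTF-8 strings, the inner hash is concatenated as a hex string, and the helper names are hypothetical.

```python
import hashlib

def md5_int(data: bytes) -> int:
    """Interpret the MD5 digest of `data` as a non-negative integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")

def partition_index(pk: str, secret_key: str, m: int) -> int:
    """par(r) = H(Ks || H(r.PK || Ks)) mod m, following (2).

    Assumption: the inner hash is concatenated in its hex-string form.
    """
    inner = hashlib.md5((pk + secret_key).encode("utf-8")).hexdigest()
    return md5_int((secret_key + inner).encode("utf-8")) % m

def get_partitions(rows, secret_key: str, m: int):
    """Assign each row (a dict with a 'PK' field) to one of m partitions."""
    partitions = [[] for _ in range(m)]
    for row in rows:
        partitions[partition_index(str(row["PK"]), secret_key, m)].append(row)
    return partitions
```

Because the index depends only on the primary key and the secret key, the same partitions can be rebuilt at decoding time without the original data. Note that a hash-based assignment does not by itself guarantee that every partition is nonempty; that depends on the data size relative to $m$.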

4.2 Selection of Data Set for Watermarking

The following two steps are applied on the data set to select tuples for watermarking.

4.2.1 Threshold Computation

In this step, a threshold is computed for each attribute. If the value of any attribute of a tuple is above its respective computed threshold, the tuple is selected for watermarking.

Definition 2 (Data Selection Threshold). Given a data set $D$, a function $\Gamma$ is used to calculate the data selection threshold for constructing $D'_T$ from $D$:

$$\Gamma : D \to D'_T. \tag{3}$$

The data selection threshold for an attribute is calculated by using the following equation:

$$T = c \times \mu + \sigma, \tag{4}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the values of an attribute $A$ in $D$, and $c$ is the confidence factor with a value between 0 and 1. The confidence factor $c$ is kept secret to make it very difficult for an attacker to guess the selected tuples in which the watermark is inserted.

We select only those tuples, during the encoding process, whose values are above $T$.

Remark 1. Note that the manner in which $T$ is calculated here is different from the one used in [10], [2], and [11], where $c$ is multiplied by $\sigma$ instead of $\mu$. The major shortcoming of the data selection threshold formula (or reference point formula) in [10], [2], and [11] is that an attacker may roughly guess that the tuples having a value above $\mu$ were watermarked, simply by observing that adding any positive number to $\mu$ results in a value higher than $\mu$ if the value of the confidence parameter $c$ is between 0 and 1. In our approach, on the other hand, an attacker cannot guess the values of watermarked tuples because any tuple having a value below, equal to, or above $\mu$ can be selected for watermarking, depending on the value of the secret parameter $c$.

In this way, we reduce the number of tuples to be watermarked; as a result, data distortions during watermark embedding are minimized. Algorithm 2 depicts the different steps of this phase.

Algorithm 2. Get_Data_Selection_Threshold.

Input: Data partitions $S_0, \ldots, S_{m-1}$; $c$
Output: Data set $D'_T$

for $i = 0$ to $m - 1$ do
    for each attribute $A \in S_i$ do
        compute $\mu$ and $\sigma$ on $A$
        calculate $T$ using (4)
    end for
end for
return $D'_T \leftarrow \forall R > T$

To ensure that every tuple with at least one attribute value above $T$ is included in the to-be-watermarked set, a union over the attributes is taken in this phase. Formally speaking, a table may be represented as

$$
\begin{array}{c|cccc}
 & A_0 & A_1 & \cdots & A_{n-1} \\ \hline
R_0 & \alpha_{00} & \alpha_{01} & \cdots & \alpha_{0(n-1)} \\
R_1 & \alpha_{10} & \alpha_{11} & \cdots & \alpha_{1(n-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R_{\rho-1} & \alpha_{(\rho-1)0} & \alpha_{(\rho-1)1} & \cdots & \alpha_{(\rho-1)(n-1)}
\end{array}
$$

If we consider the table's rows and columns as a matrix, the union of attributes ($A_0$ to $A_{n-1}$) of a tuple ($R_0$) based on the computed threshold value ($T_0$) is $\alpha_{00} \cup \alpha_{01} \cup \cdots \cup \alpha_{0(n-1)}$, where $\alpha$ represents the data values. Finally, the tuples that are above the threshold value of their respective attributes, i.e., $\forall R > T$, will transform the data set $D$ into $D'_T$. Hence,

$$D'_T \leftarrow \forall R > T. \tag{5}$$

After this step, the data set $D'_T$ is given as an input to the next phase.

4.2.2 Hash Value Computation

In this step, the cryptographic hash function MD5 is applied on the selected data set to select only those tuples which have an even hash value. This step achieves two objectives: 1) it further enhances the watermark security by hiding the identity of the watermarked tuples from an intruder; and 2) it further reduces the number of to-be-watermarked tuples to limit distortions in the data set.

The data set $D'_T$ is used to select tuples with even hash values and put them in the data set $D''_T$. The steps involved in this phase are illustrated in Algorithm 3.

Algorithm 3. Get_Even_Hash_Value_Data Set.

Input: Data set $D'_T$; $K_s$
Output: $D''_T$

for each $r \in D'_T$ do
    $Even\_Value(r) = H(K_s \,\|\, r.PK) \bmod 2$
    if $Even\_Value(r) == 0$ then
        insert $r$ into $D''_T$
    else
        do not consider this tuple for watermarking
    end if
end for
return $D''_T$
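A minimal sketch of the even-hash filter of Algorithm 3 follows. It assumes rows are dictionaries with a "PK" field and that the key and primary key are concatenated as UTF-8 strings before hashing; the function names are hypothetical.

```python
import hashlib

def has_even_hash(pk: str, secret_key: str) -> bool:
    """True when H(Ks || r.PK) mod 2 == 0, as in Algorithm 3."""
    digest = hashlib.md5((secret_key + pk).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 2 == 0

def even_hash_subset(rows, secret_key: str):
    """Return the rows whose keyed hash is even (the candidate set D''_T)."""
    return [r for r in rows if has_even_hash(str(r["PK"]), secret_key)]
```

Since MD5 output is close to uniformly distributed, roughly half of the threshold-selected tuples are expected to pass this filter, and without the secret key an observer cannot tell which ones.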


The data set $D''_T$, consisting of the selected tuples, is a subpart of the data set $D$ and is not physically separated from the rest of $D$.

Note: As the selection of tuples is also based on the value of the data selection threshold, Mallory may try to corrupt the embedded watermark by changing the data values such that the data selection threshold value is disturbed and hence Alice is unable to detect the watermarked tuples in the watermark detection phase. However, since Mallory has no knowledge of the confidence factor $c$, he can only arbitrarily attack some selected part of the watermarked data to corrupt the watermark with some probability $P$. This probability is made smaller by using the data selection threshold $T$ and even hash values, as proven in Proposition 1.

Definition 3 ($P_{Success}(\omega)$). $P_{Success}(\omega)$ is the probability that an attacker is successful in changing the values of $\omega\%$ watermarked tuples such that the data selection threshold is modified.

Proposition 1. $P_{Success}(\omega)$ approaches zero in the case of large databases.

Proof. Let $P_{Success}(\omega)$ denote the probability that Mallory is successful in changing the values of $\omega\%$ watermarked tuples such that the data selection threshold is modified. For a watermarked data set with a total of $\rho$ tuples and $n$ numeric attributes (other than the primary key), Mallory has to target at least $\left(\frac{\rho}{2} + 1\right)$ tuples to change the value of more than 50 percent of the watermarked tuples to undo the effect of the majority voting, so

$$\omega = \frac{\rho}{2}. \tag{6}$$

If $\phi$ represents the number of watermarked tuples, the probability for any particular tuple being watermarked is $\frac{\phi}{\rho}$. So, for Mallory, the probability of successfully changing the value of $\frac{\rho}{2} + 1$ watermarked tuples such that the data selection threshold is modified is

$$P_{Success}(\omega) = \left(\frac{\phi}{\rho}\right)^{\left(\frac{\rho}{2} + 1\right)}. \tag{7}$$

Substituting (6) in (7), we get

$$P_{Success}(\omega) = \left(\frac{\phi}{\rho}\right)^{(\omega + 1)}. \tag{8}$$

Recall that all the numeric attributes play their role in the data selection threshold; therefore, the above equation becomes

$$P_{Success}(\omega) = \left(\frac{\phi}{\rho}\right)^{(\omega + 1)n}. \tag{9}$$

It is clear from (9) that smaller values of the fraction $\frac{\phi}{\rho}$ mean that the probability of successfully attacking watermarked tuples is also smaller. This value will become even smaller for larger databases; hence we can write

$$\left(\frac{\phi}{\rho}\right) \to 0 \;\Rightarrow\; P_{Success}(\omega) \to 0. \tag{10}$$

□

Let us take an example of a very small data set that has 100 tuples, where each tuple has three attributes. Suppose that in this data set only 10 tuples are selected for watermarking. The probability of successfully changing the value of the data selection threshold by attacking (50 percent + 1) of watermarked tuples (six tuples to be more precise) is

$$P_{Success}(\omega) = \left(\frac{10}{100}\right)^{\left(\frac{100}{2} + 1\right)3} = \left(\frac{10}{100}\right)^{(51)3} = 1 \times 10^{(-51)3} = 1 \times 10^{-153}.$$

This probability is very small, and will become smaller for larger data sets. Moreover, ideally speaking, an attacker would not like to change the values of tuples too much because this will make the data absolutely useless for any recipient.
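The arithmetic of this example can be checked exactly with rational numbers. The sketch below evaluates the expression in (9) for the stated values; the function name `p_success` and its parameter names are illustrative, and the integer division assumes an even number of tuples.

```python
from fractions import Fraction

def p_success(watermarked: int, total: int, n_attributes: int) -> Fraction:
    """(watermarked/total) ** ((total/2 + 1) * n_attributes), following (9).

    Assumption: `total` is even, so total // 2 equals total / 2.
    """
    exponent = (total // 2 + 1) * n_attributes
    return Fraction(watermarked, total) ** exponent

p = p_success(watermarked=10, total=100, n_attributes=3)
# p is exactly 1 / 10**153
```

Using `Fraction` avoids floating-point underflow concerns and confirms the exponent $51 \times 3 = 153$ used in the worked example.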

4.3 Generation of Watermark Bits

In this step, a date-time stamp is used to generate the watermark bits.

Definition 4 (Watermark Generating Function). A watermark generating function $\Omega$ transforms an alphanumeric string $s$ to an $l$-bit binary string $\{b_0 b_1 b_2 \ldots b_{l-1}\}$:

$$\Omega : s \to \{b_0 b_1 b_2 \ldots b_{l-1}\}. \tag{11}$$

The watermark generating function $\Omega$ takes a date-time stamp as input and generates the watermark bits $\{b_0 b_1 b_2 \ldots b_{l-1}\}$ from it. The date-time stamp "might" also help to identify additive attacks in which an attacker wants to rewatermark the data set. To construct a watermarked data set, these watermark bits are embedded in the original data set by using the following watermark embedding algorithm.
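The text does not specify how the date-time string is mapped to bits, so the following is only one possible sketch. It assumes the stamp is formatted as an ISO 8601 UTC string and that each ASCII character contributes its 8-bit code; both choices, and the function name `watermark_bits`, are assumptions made for illustration.

```python
from datetime import datetime, timezone

def watermark_bits(stamp: datetime) -> list:
    """Turn a timezone-aware date-time into a list of 0/1 watermark bits.

    Assumptions: ISO 8601 text in UTC, 8 bits per ASCII character.
    """
    text = stamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [int(bit) for byte in text.encode("ascii") for bit in format(byte, "08b")]

bits = watermark_bits(datetime(2013, 12, 1, tzinfo=timezone.utc))
```

Under these assumptions the 20-character stamp gives a 160-bit watermark, and the original date-time can be read back from the decoded bits, which is what makes the stamp usable as evidence of when the watermark was inserted.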

4.4 Watermark Embedding Algorithm

The watermarking algorithm uses the multibit watermarking property and is scalable to any number of attributes. For a better understanding, we assume that the partition set $S_i$ in data set $D''_T$ contains a single attribute $\alpha_i \in S_i$. The encoding function generates bits $b_0, b_1, b_2, \ldots, b_{l-1}$, where $l$ is the length of the watermark. Since our technique embeds the watermark bits $b_0, b_1, b_2, \ldots, b_{l-1}$ in each partition $S_i$, the watermark bits can be recovered from the remaining partitions if the watermark is removed from a particular partition $S_i$.

Definition 5 (Watermark Embedding Function). The watermark embedding function $\Theta$ transforms the input data $D$ to watermarked data $D_W$ after performing some data manipulations:

$$\Theta : (D, W) \to D_W. \tag{12}$$

Definition 6 (Data Manipulations Vector). The data manipulations vector $\Delta$ keeps a record of the transformations of $D$ into $D_W$ as

$$\Delta \leftarrow (D_W - D). \tag{13}$$

A parameter $\eta_{ij}$ denotes one instance of data modification performed on a tuple $i$ in the $j$th column.


The watermark embedding function $\Theta$ uses the parameter $\eta_{ij}$ to modify the attribute data value $\alpha_{ij}$ of a tuple $i$ in the $j$th column. The watermark embedding function for a tuple $i$ in the $j$th column is

$$\Theta = \alpha_{ij} + \eta_{ij}. \tag{14}$$

If the bit $b$ is equal to 1, the bit encoding algorithm computes $\eta_{ij}$, subject to the constraint set $G$, on the data value ($\alpha_{ij}$) of an attribute as

$$\eta_{ij} = \zeta\% \text{ of } \alpha_{ij} \text{ with } \zeta > 0. \tag{15}$$

Similarly, if the bit $b$ is equal to 0, then $\eta_{ij}$ is computed as

$$\eta_{ij} = \zeta\% \text{ of } \alpha_{ij} \text{ with } \zeta < 0. \tag{16}$$

The value of $\zeta$ remains fixed for every watermarked tuple. It is the same in magnitude for watermark bits 0 and 1 but has an opposite sign.⁶ This value of $\zeta$ (along with the associated sign) is used during the watermark decoding phase. It is important to emphasize that $\zeta$ is not contained in the original data; therefore, its use in the watermark decoding phase does not violate the property of blind watermark decoding. Its value is defined by the data owner and is also kept secret. Our technique changes the overall data values of attributes using (15) and (16), rather than the LSB only, to overcome the shortcomings of LSB-based techniques.
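The percentage-based update in (14) to (16) can be sketched for a single attribute value. This is a simplified illustration: it applies a fixed-magnitude percentage with a sign chosen by the bit, and it does not check the usability constraint set; the function name `embed_bit` and the parameter `percent` (the magnitude of the secret percentage) are hypothetical.

```python
def embed_bit(value: float, bit: int, percent: float) -> tuple:
    """Return (watermarked value, change) for one attribute value.

    Bit 1 adds `percent` percent of the value, bit 0 subtracts it,
    mirroring (15) and (16). Assumption: no usability check is applied here.
    """
    signed_percent = percent if bit == 1 else -percent
    change = signed_percent / 100.0 * value
    return value + change, change

marked_one, change_one = embed_bit(200.0, bit=1, percent=0.5)
marked_zero, change_zero = embed_bit(200.0, bit=0, percent=0.5)
```

Because the change is proportional to the value, the relative distortion is the same for every watermarked tuple and equals the chosen percentage, which is what lets the owner bound the distortion with a single secret parameter.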

To ensure that $\eta_{ij}$ changes the value of $\alpha_{ij}$ within allowed limits, a feasible range for manipulating the data set is defined by the constraint set $G$. This set ensures the usability of data by enforcing a tolerance level on the value of each attribute $\alpha_{ij}$. As a rule of thumb, the usability constraints are dependent on an application: for example, a player could only be placed in the data set of a Junior team if his age is in between a maximum and minimum value. In this application, the minimum and maximum value could be used to define the degree by which the age of a player can be modified ($\eta_{ij}$) during the watermark encoding process:

$$\eta_{\min} \le \eta_{ij} \le \eta_{\max}.$$

Sometimes, the constraints are imposed on data statistics. For example, a data set may require that the mean or the standard deviation of the watermarked data must be equal to the mean or the standard deviation of the original data:

$$\mu_D = \mu_{D_W}, \quad \text{and} \quad \sigma_D = \sigma_{D_W}.$$

In our technique, we define the usability constraints in terms of the mean and the standard deviation of the watermarked attribute. For this, it is ensured that the two measures remain approximately the same before and after watermarking of data. This objective is achieved by using very tight usability constraints; as a consequence, the requirement for minimum data distortions is implicitly met. The data manipulation statistics in $\eta$ are recorded for each encoding step in $\Delta$ and are used to compute the secret correction factor $\tau$.

Since the data set $D''_T$ is part of a particular partition $S_k$, the partition $S^W_k$ contains these watermarked records. All such $S^W_k$'s are inserted into $D_W$ to get the watermarked data set. The data set $D_W$ is then made accessible to the intended recipients. The steps of the embedding phase are depicted in Algorithm 4.

Algorithm 4. Embed_Watermark.

Input: Data set $D$, data set $D'_T$, $W$, secret key $K_s$
Output: Watermarked data set $D_W$, correction factor $\tau$

$D_W = D$
$D''_T$ = Get_Even_Hash_Value_Data Set($D'_T$, $K_s$)
for each row $r$ in $D''_T$ do
    $temp = r.PK$
    if $b == 1$ then
        compute $\eta$ using (15) subject to $G$
    else
        compute $\eta$ using (16) subject to $G$
    end if
    $D_{W_{temp}} \leftarrow (\eta + D_{W_{temp}})$
end for
insert $\eta$ into $\Delta$
compute $\tau$
return $D_W$, $\tau$

4.4.1 Computing Correction Factor Value for Making Decoding Accuracy Independent of Usability Constraints

Definition 7 (Data Usability Constraints). Given a data set $D$, the data usability constraints $G$ is a set with elements $\mu$ and $\sigma$, for bounding the data manipulations vector $\Delta$, to transform $D$ into $D_W$.

These data usability constraints usually affect the watermark decoding accuracy. But we want to make the decoding accuracy independent of these usability constraints.

Definition 8 (Usability Independent Watermark Decoding). Given a set of usability constraints $G$, usability independent watermark decoding implies that the decoding accuracy is not a function of any element of $G$.

The set of attacks may include different combinations of tuple deletion, insertion, and alteration attacks and might well surpass the available bandwidth identified by using the usability constraints.

Lemma 1. The watermark decoding accuracy can be independent of the usability constraints $G$ if and only if the watermark decoding accuracy remains unchanged even if an attacker is able to surpass the bounds on the usability constraints.

Proof (Proof by Contradiction). Assume to the contrary that the decoding accuracy changes if an attacker is able to surpass the usability constraints, and yet the decoding accuracy is said to be independent of the usability constraints. Let $DA_1$ be the decoding accuracy of the watermark decoding algorithm for decoding the embedded watermark from an unattacked watermarked data set $D_{W1}$. To test the resilience of the embedded watermark, the data owner Alice launches an attack $Attack_1$, which surpasses some of the usability constraints in the set $G$ with elements $\mu$ and $\sigma$, and produces an attacked data set $D_{W2}$. Alice then decodes the embedded watermark from $D_{W2}$ with accuracy $DA_2$.

Now, there are two cases: 1) $DA_2 = DA_1$ and 2) $DA_2 \neq DA_1$.

Since the data usability constraints are surpassed ($\mu$ and $\sigma$ have changed), according to our assumption:

$$DA_2 \neq DA_1.$$

But this is not possible if the decoding accuracy is to be independent of the usability constraints, as it contradicts the definition of usability independent watermark decoding given in Definition 8; so $DA_2$ needs to be equal to $DA_1$ for independence to hold. □

6. A negative value of $\zeta$ means the value of an attribute is decreased by $\zeta$ percent.

Definition 9 (Decoding Threshold). Given a watermarked data set $D_W$, the decoding threshold $\delta$ is a variable that is used to decode a watermark $W$ from $D_W$.

The decoding threshold is computed using a correction factor.

Definition 10 (Correction Factor). Given a watermarked data set $D_W$, the correction factor $\tau$ is a constant used to set the value of $\delta$ such that a watermark $W$ is correctly decoded from $D_W$.

The relation for computing the value of $\delta$ is

$$\delta = val - \tau. \tag{17}$$

The parameter $val$ is calculated using the relation

$$val = \zeta\% \text{ of } \alpha'_{ij}, \tag{18}$$

where $\alpha'_{ij}$ represents the value of the $j$th attribute in the $i$th row. The motivation for using $\tau$ along with $val$ in (17) is to account for the possible errors introduced by an attacker in the watermarked data set.
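Relations (17) and (18) can be turned into a small decoding sketch. It takes the signed percentage as an input, as the text states it is used during decoding; how the decoder obtains that sign is not spelled out in this section, so this is an assumption of the example. The function name `decode_bit` and its parameters are hypothetical.

```python
def decode_bit(watermarked_value: float, signed_percent: float, tau: float) -> int:
    """Decode one bit from a watermarked value.

    val is the signed percentage of the watermarked value, as in (18);
    the threshold is val minus the correction factor tau, as in (17).
    A non-negative threshold decodes as 1, a negative one as 0.
    """
    val = signed_percent / 100.0 * watermarked_value
    threshold = val - tau
    return 1 if threshold >= 0 else 0

# tau = 0.5 is below |val| (about 1.0), so both bits decode correctly.
bit_one = decode_bit(201.0, 0.5, tau=0.5)
bit_zero = decode_bit(199.0, -0.5, tau=0.5)
# tau = 2.0 exceeds |val|, so an embedded 1 is misread as 0.
misread = decode_bit(201.0, 0.5, tau=2.0)
```

The last call shows why the correction factor has to stay below the magnitude of $val$: once it exceeds that magnitude, a positive $val$ can no longer produce a non-negative threshold.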

We have done a number of pilot studies to conclude that the correction factor must be less than the minimum value of $\eta$ for every watermarked tuple if the decoding accuracy is to be made independent of the usability constraints. The following theorem explains the logic behind this conclusion.

Theorem 1. The value of the correction factor $\tau$ must be less than the minimum absolute value of $\eta$ to get an appropriate value of $\delta$, if the decoding accuracy is to be made independent of the usability constraints.

Proof. Let $\xi_{ij}$ be the value of an unsigned numeric attribute $A$ in the $i$th tuple of the original data set. After embedding a watermark bit $b$, the value of the attribute becomes $\xi'_{ij}$. Now, according to Algorithm 4, if $b = 1$, then $\beta > 0$ as depicted in (15), and if $b = 0$, then $\beta < 0$ according to (16).

The value $\xi'_{ij}$ is computed as

$$\xi'_{ij} = \xi_{ij} + \Delta_{ij}. \tag{19}$$

Suppose that the value of the decoding threshold $\eta$ (Definition 9) is greater than or equal to zero if the embedded bit was 1; otherwise, the value of $\eta$ is less than zero.

Now, using (17) and (18), and the fact that $\xi'_{ij} > 0$: if $\beta > 0$ then $val > 0$; and if $\beta < 0$ then $val < 0$. So, the parameter $val$ is always greater than zero if the embedded bit is 1 (as $\beta$ is positive according to (15)). Therefore, for $\eta$ to be greater than or equal to zero (and hence to decode the watermark bit as 1), the necessary condition is

$$val > \tau.$$

Similarly, if the embedded bit was 0, then the parameter $val$ is always less than zero (as $\beta$ is negative according to (16)). Therefore, for $\eta$ to be less than zero (and hence to decode the watermark bit as 0), the necessary condition is

$$val < \tau.$$

These two conditions are true if and only if $\tau$ is less than the absolute value of the parameter $val$, so

$$\tau < |val|.$$

According to Lemma 1, the watermark decoding accuracy can only be independent of the usability constraints if the above condition is met for every possible value of the parameter $val$, even if an attacker is able to surpass the bounds on the usability constraints. Also, if the usability constraints are tight and the parameter $val$ is calculated using (18), then this is similar to calculating $\Delta$ (see (15) and (16)), except that $val$ is calculated from a different data set (the watermarked data set). As a consequence, the values of the parameters $val$ and $\Delta$ approach each other, so $val_{ij} \approx \Delta_{ij}$ for a tuple $i$ in the $j$th column. Hence, we can conclude that the necessary condition for the value of $\tau$ is

$$\tau < |\Delta|.$$

This condition needs to be satisfied for every possible absolute value of $\Delta$; therefore, the above condition becomes

$$\tau < |\Delta_{\min}|,$$

where $\Delta_{\min}$ denotes the minimum value of $\Delta$ among all its possible values, and hence the theorem is proven. $\square$

Fig. 2 shows the feasible region for different values of $\tau$ for different usability constraints, as identified by the outcome of our pilot studies. In this figure, the y-axis shows the minimum absolute values of $\Delta$ obtained with values of $\beta = \{0.1\%, 0.2\%, \ldots, 1\%, 2\%, \ldots, 10\%\}$, and the x-axis shows the corresponding optimum upper bound for the values of $\tau$, bounded by its maximum value for the corresponding value of $\beta$.
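The bound of Theorem 1 can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, the list of embedding changes is made up, the correction factor is additionally assumed to be non-negative, and `val` is taken to be equal to the embedding change, as the proof argues for tight usability constraints.

```python
def tau_upper_bound(deltas):
    """Smallest absolute embedding change; Theorem 1 asks tau to stay below it."""
    return min(abs(d) for d in deltas)


def tau_is_admissible(tau, deltas):
    """Check tau against the Theorem 1 bound (tau >= 0 is an added assumption)."""
    return 0 <= tau < tau_upper_bound(deltas)


def decode_with_val_equal_delta(deltas, tau):
    """Decode each bit from val - tau, assuming val equals the embedding change."""
    return [1 if d - tau >= 0 else 0 for d in deltas]


if __name__ == "__main__":
    # Hypothetical changes: positive entries carry bit 1, negative entries bit 0.
    deltas = [0.75, -0.6, 0.65, -0.9]
    print(tau_upper_bound(deltas))                   # 0.6
    print(decode_with_val_equal_delta(deltas, 0.5))  # [1, 0, 1, 0]
    print(decode_with_val_equal_delta(deltas, 0.7))  # [1, 0, 0, 0]
```

In this example, a correction factor of 0.5 respects the bound and every bit is recovered, whereas 0.7 exceeds it and the third bit is decoded incorrectly.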

Corollary 1. For every possible value of $\tau$ (defined using Theorem 1) and $\xi'_{ij} > 0$: if $\eta$ is positive, a bit is decoded as 1; and if $\eta$ is negative, a bit is decoded as 0.

Corollary 2. $\tau$, as defined above, ensures that the watermark decoding accuracy is independent of each element of the usability constraints set $G$ with elements $\mu$ and $\sigma$.

Fig. 2. Feasible region for values of $\tau$.

Proof. Let $\mu_1$ and $\sigma_1$ be the mean and standard deviation of all the values of the $j$th attribute in a data set $D_W$. Also, consider that all values of the $j$th attribute are greater than zero. If $D_W$ has $\nu$ tuples, $\mu_1$ can be calculated as

$$\mu_1 = \frac{\sum_{i=1}^{\nu} \xi'_{ij}}{\nu}$$

and $\sigma_1$ as

$$\sigma_1 = \sqrt{\frac{1}{\nu} \sum_{i=1}^{\nu} \left(\xi'_{ij} - \mu_1\right)^2},$$

where $\xi'_{ij}$ is the value of the $i$th tuple in the $j$th attribute.

Suppose an attacker, Mallory, launches attacks on this attribute such that the new mean and standard deviation become $\mu_2$ and $\sigma_2$ and all values of the $j$th attribute remain greater than zero. If $\Delta'_{ij} \in \Delta'$ represents the change (positive or negative) made by Mallory in a tuple $i$ of column $j$, the new value becomes

$$\xi''_{ij} = \xi'_{ij} + \Delta'_{ij}. \tag{20}$$

And for all the tuples in $D_W$, the new data set after the attacks can be represented as

$$D'_W = D_W + \Delta'. \tag{21}$$

Since the tuples’ values have been changed and the values of $\mu_2$ and $\sigma_2$ depend on the values of the tuples (see the formulas for $\mu$ and $\sigma$), we have four cases:

1. $\mu_2 = \mu_1$ and $\sigma_2 = \sigma_1$;
2. $\mu_2 = \mu_1$ and $\sigma_2 \neq \sigma_1$;
3. $\mu_2 \neq \mu_1$ and $\sigma_2 = \sigma_1$; and
4. $\mu_2 \neq \mu_1$ and $\sigma_2 \neq \sigma_1$.

For all of these four cases, only two options for $\eta$ are available: 1) for $\beta > 0$, (17), (18), and (19) give $\eta > 0$ and a bit is decoded as 1 (Corollary 1 of Theorem 1) irrespective of the amount of change represented by $\mu_2$ and $\sigma_2$; and 2) similarly, for $\beta < 0$, (17), (18), and (19) give $\eta < 0$ and a bit is decoded as 0 (Corollary 1 of Theorem 1) irrespective of the amount of change represented by $\mu_2$ and $\sigma_2$.

Therefore, $\tau$ ensures that the watermark decoding accuracy is independent of each element of the usability constraints set $G$. $\square$
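The argument of this proof can be illustrated with a small Python sketch. All numbers are made up, the names are hypothetical, and two assumptions are made that the proof leaves implicit: the signed percentage for each tuple is known to the decoder, and the attacked values remain large enough for the magnitude of `val` to exceed the correction factor.

```python
from statistics import mean, pstdev


def decode_bits(values, pcts, tau):
    """Per-tuple decoding: threshold = (pct percent of value) - tau; bit 1 if >= 0."""
    return [1 if (p / 100.0) * v - tau >= 0 else 0 for v, p in zip(values, pcts)]


# Hypothetical watermarked values (all greater than zero) and signed percentages.
watermarked = [120.0, 250.0, 310.0, 95.0, 480.0]
pcts = [0.3, -0.3, 0.3, -0.3, 0.3]
tau = 0.1

# Changes made by the attacker; every attacked value stays greater than zero.
attack = [60.0, -40.0, 150.0, 30.0, -100.0]
attacked = [v + d for v, d in zip(watermarked, attack)]

if __name__ == "__main__":
    # The mean and (population) standard deviation change after the attack ...
    print(mean(watermarked), mean(attacked))
    print(pstdev(watermarked), pstdev(attacked))
    # ... but the decoded bits are the same before and after it.
    print(decode_bits(watermarked, pcts, tau))
    print(decode_bits(attacked, pcts, tau))
```

The example corresponds to case 4 of the proof, in which both statistics change while the sign of the threshold for each tuple does not.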

4.4.2 Once-for-All Usability Constraints

Before going into the details of this section, it is important to define some basic concepts.

Definition 11 (Information Loss). If $I$ is the information obtained from the original data $D$, and $I_W$ is the information obtained from the watermarked data $D_W$, then the percentage information loss $I_{Loss}$ as a result of watermarking is

$$I_{Loss} = \left| \frac{(I - I_W)}{I} \times 100 \right|. \tag{22}$$

We measure information loss in terms of data statistics such as the mean, the standard deviation, the data distribution, and so on. The data distortions, in turn, are represented in terms of information loss. Small values of information loss mean fewer data distortions, and vice versa.
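As an illustration of Definition 11, the following Python sketch computes the percentage information loss for the mean and the standard deviation of one attribute. The function names and sample values are hypothetical, the loss is assumed to be reported as a magnitude, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev


def information_loss(original_stat: float, watermarked_stat: float) -> float:
    """Percentage information loss, reported as a magnitude (an assumption)."""
    return abs((original_stat - watermarked_stat) / original_stat * 100.0)


def statistic_losses(original, watermarked):
    """Information loss measured on the mean and standard deviation of one attribute."""
    return {
        "mean": information_loss(mean(original), mean(watermarked)),
        "std": information_loss(pstdev(original), pstdev(watermarked)),
    }


if __name__ == "__main__":
    # Hypothetical attribute values before and after watermarking.
    original = [120.0, 250.0, 310.0, 95.0, 480.0]
    watermarked = [120.4, 249.2, 310.9, 94.7, 481.4]
    print(statistic_losses(original, watermarked))
```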

Definition 12 (Fit Data). A watermarked data set $D_W$ is said to be fit for a particular application $App$ if it does not violate the bounds on data distortions related to $App$.

Lemma 2. If a watermarked data set $D_{W_1}$ is fit for an application which allows the minimum possible distortions $d_1$ in the original data, then $D_{W_1}$ is fit for all other possible applications of the same data set.

Proof. Let $\{d^u_1, d^u_2, \ldots, d^u_\partial\}$ be the sorted list (in ascending order) of constraints on the upper bound of data distortions $\{d_1, d_2, \ldots, d_\partial\}$ acceptable by $\partial$ data recipients $\{Rec_1, Rec_2, Rec_3, \ldots, Rec_\partial\}$ for applications $\{App_1, App_2, App_3, \ldots, App_\partial\}$. If $\{I_{Loss_1}, I_{Loss_2}, \ldots, I_{Loss_\partial}\}$ is the information loss after distortions $\{d_1, d_2, \ldots, d_\partial\}$ in the original data set $D$, then by using Definition 11:

$$I_{Loss_1} < I_{Loss_2} < \cdots < I_{Loss_{\partial-1}} < I_{Loss_\partial}.$$

Now, the minimum information loss $I_{Loss_1}$ is possible with distortions $d_1$ having upper bound $d^u_1$, and

$$d^u_1 < d^u_2 < \cdots < d^u_{\partial-1} < d^u_\partial.$$

So, the constraints on the upper bounds $\{d^u_2, d^u_3, \ldots, d^u_\partial\}$ of data distortions $\{d_2, d_3, \ldots, d_\partial\}$, acceptable by data recipients $\{Rec_2, Rec_3, \ldots, Rec_\partial\}$, will always be satisfied by $d^u_1$, and hence the data set $D_{W_1}$ will be fit for all applications $\{App_1, App_2, App_3, \ldots, App_\partial\}$. $\square$
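The reasoning of Lemma 2 amounts to a comparison against the tightest bound. A minimal Python sketch follows; the function names and the bounds are hypothetical, and a bound is assumed to be respected when the distortion does not exceed it.

```python
def fit_for(distortion: float, upper_bound: float) -> bool:
    """Fit data in the sense of Definition 12, assuming an inclusive upper bound."""
    return distortion <= upper_bound


def fit_for_all(distortion: float, upper_bounds) -> bool:
    """Data that respects the tightest bound respects every looser bound as well."""
    return fit_for(distortion, min(upper_bounds))


if __name__ == "__main__":
    # Hypothetical upper bounds on distortion (in percent) for four applications.
    bounds = [0.5, 1.0, 2.5, 5.0]
    print(fit_for_all(0.4, bounds))                # True
    print(all(fit_for(0.4, b) for b in bounds))    # True
```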

Definition 13 (Severity of Attack). The severity of an attack, $\varsigma$, is a vector containing the percentage of attacked tuples and the degree of alteration in certain statistics:

$$\varsigma = \{\rho, \delta\}, \tag{23}$$

where $\rho$ is the percentage of attacked tuples and $\delta$ is the degree of alteration. The degree of alteration for tuple insertion and tuple alteration attacks is the amount of alteration. In comparison, for tuple deletion attacks $\varsigma$ contains $\delta$ denoting the percentage of usability constraints violated during the launch of deletion attacks.

Definition 14 (Once-for-All Usability Constraints). Usability constraints are said to be “once-for-all” if a particular usability constraints matrix $G_p$ results in the minimum possible distortions acceptable to the recipient, yet ensures the maximum possible watermark robustness acceptable to the data owner.

Theorem 2. If the watermark decoding accuracy is independent of the usability constraints, then the usability constraints definition is “once-for-all.”

Proof. Let $\{G_1, G_2, \ldots, G_\partial\}$ be the usability constraints for watermarking to deliver the watermarked data to $\partial$ data recipients. Also, suppose that the set $\{d_1, d_2, \ldots, d_\partial\}$ denotes the sorted list (in ascending order) of the amounts of data distortion acceptable by the $\partial$ data recipients $\{Rec_1, Rec_2, \ldots, Rec_\partial\}$ with the corresponding usability constraints $\{G_1, G_2, \ldots, G_\partial\}$. So, $Rec_1$ is the data recipient who accepts the minimum possible distortions $d_1$ in the original data set $D$ after embedding a watermark $W$. Also, let $d^u_1$ be the upper bound for the distortions $d_1$.

If $Rob_{max}$ is the maximum watermark robustness achieved by watermark $W$ with $d^u_1$ distortions, then for $Rob_{max}$, the decoding accuracy $DA$ after any attack (insertion, deletion, alteration, and every possible combination of these attacks) with severity $\varsigma$ is 100 percent.

But according to Theorem 1, the value of $\tau$ is such that $DA$ is 100 percent, independent of the usability constraints, and for that matter $DA$ is independent of the amount of distortions $\{d_1, d_2, \ldots, d_\partial\}$. So, $Rob_{max}$ is achieved with any amount of distortion from the possible data distortions set $\{d_1, d_2, \ldots, d_\partial\}$.

Also, the other data recipients $\{Rec_2, Rec_3, \ldots, Rec_\partial\}$ would allow data distortions with upper bounds $\{d^u_2, d^u_3, \ldots, d^u_\partial\}$, respectively, which could effectively allow more robustness than allowed by data distortions $d^u_1$. But, as Alice has already achieved $Rob_{max}$ with $d^u_1$, she does not need to define different data usability constraints for any other possible application or use of the same data set with maximum allowed distortions in the set $\{d^u_2, d^u_3, \ldots, d^u_\partial\}$. Moreover, according to Lemma 2, the data with distortions $d_1$ will be fit for all applications which allow data distortions $\{d_2, d_3, \ldots, d_\partial\}$. So, strictly speaking, Alice has defined “once-for-all” usability constraints for watermarking a data set $D$ for every possible application and use. $\square$

Here, we revisit the claims made about the major contributions of this paper in Section 1 and we prove that they have been met by our watermark embedding approach.

1. A novel watermark decoding algorithm which: a) ensures that its decoding accuracy is independent of the usability constraints (or available bandwidth); and b) enables “once-for-all” usability constraints definition by providing the maximum robustness with the least possible distortions.

2. The robustness of the watermark is achieved even if an attacker is somehow able to successfully corrupt the watermark in some selected part of the data set.

3. The robustness of the proposed watermarking scheme is demonstrated by analyzing its decoding accuracy under different types of malicious attacks other than additive attacks.

4. The robustness of the proposed scheme against additive attacks is achieved by ensuring that Mallory cannot remove Alice’s watermark from her watermarked data set, and hence she can prove the presence of her watermark in Mallory’s watermarked data set.

Remark 2. Theorems 1 and 2 prove the major claim (Claim 1) by achieving two objectives: 1) making the decoding accuracy independent of the usability constraints; and 2) defining “once-for-all” usability constraints. The other three claims are dependent (minor) claims that have been implicitly proven.

4.5 Watermark Decoding

The watermark decoding algorithm extracts the embedded watermark using the secret parameters $K_s$, $m$, and $\tau$. The watermark bits are decoded in the reverse order: the last embedded bit is decoded first, and so on. This order is preferred because it is easier to detect the manipulations done while computing the last watermark bit. The algorithm starts by generating the data partitions $S_0, \ldots, S_{m-1}$ using the watermarked data set $D_W$, the secret key $K_s$, and the number of partitions $m$ as input to the data partitioning algorithm. It then generates the data set $D''_T$ from $D'_T$ by computing even hash values as discussed in Section 4.2. In the next step, the parameter $val$ is computed using (18).

To extract the embedded bit, the decoding threshold $\eta$ (calculated using (17)) is used to decide whether the decoded bit is 0 or 1.

During the decoding phase, if $\eta$ for a tuple is greater than or equal to 0, the decoded watermark bit is 1; otherwise, it is 0. Recall that in the watermark embedding phase, if for a particular watermark bit the change in data statistics was negative, then that change can always be detected as negative by utilizing the knowledge of $\beta$, unless an attacker is able to change the value of a watermarked tuple to zero or multiply it by a negative number. After decoding all the bits from the watermarked data set, the majority voting scheme is used to eliminate decoding errors (if any) resulting from a malicious attack (or attacks). The steps of watermark decoding are shown in Algorithm 5.

Algorithm 5. Detect_Watermark.

Input: Watermarked data set $D_W$, $K_s$, $m$, $\tau$, watermark length $l$
Output: Detected watermark $W_D$

ones[0..m-1] = 0
zeros[0..m-1] = 0
$S_{W_0}, \ldots, S_{W_{m-1}}$ = Get_Partitions($D_W$, $K_s$, $m$)
for each partition $S_{W_i}$ do
    $D'_{WT}$ = Get_Data_Selection_Threshold($R_{W>T}$, $c$)
    $D''_{WT}$ = Get_Even_Hash_Value_Data_Set($D'_{WT}$, $K_s$)
    for each row r in $D''_{WT}$ do
        compute val for r using (18)
        $\eta$ = val - $\tau$
        if $\eta \geq 0$ then
            ones[i] = ones[i] + 1
        else
            zeros[i] = zeros[i] + 1
        end if
    end for
    if ones[i] > zeros[i] then
        b[i] = 1
    else
        b[i] = 0
    end if
end for
return $W_D$

It is important to emphasize that our watermark decoding algorithm is blind because it does not need the original data or the embedded watermark bits during the watermark decoding process.
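The voting part of Algorithm 5 can be sketched in Python as follows. This is a minimal illustration rather than the full algorithm: the partitioning, data selection, and even-hash steps are assumed to have been carried out already, so the input is simply a list of `val` values per partition. The function name and the sample values are hypothetical, and the mapping from partitions to watermark positions is not modelled.

```python
from typing import List, Sequence


def detect_watermark(partition_vals: Sequence[Sequence[float]], tau: float) -> List[int]:
    """Decode one bit per partition by majority voting over its rows.

    partition_vals: for each partition, the val values of its selected rows
                    (assumed to be computed beforehand using relation (18))
    tau:            correction factor
    """
    bits = []
    for vals in partition_vals:
        ones = sum(1 for val in vals if val - tau >= 0)  # rows voting for bit 1
        zeros = len(vals) - ones                         # rows voting for bit 0
        bits.append(1 if ones > zeros else 0)            # a tie yields 0, as in Algorithm 5
    return bits


if __name__ == "__main__":
    # Hypothetical val values; one row in each of the first two partitions was corrupted.
    partitions = [[0.9, 1.1, -0.4], [-0.8, -0.7, 0.6], [0.5, 0.7]]
    print(detect_watermark(partitions, tau=0.1))  # [1, 0, 1]
```

Because each partition is decided by a majority, a minority of corrupted rows in a partition does not change the decoded bit.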

5 EXPERIMENTS AND RESULTS

In this section, we report the results of our experiments to substantiate the claimed merits of our watermarking approach. The major motivation for designing the experiments is to prove Claim 1, which guarantees that the decoding accuracy is independent of the usability constraints under different attack scenarios (see last section). We have selected a subset of 50,000 tuples from a real-life data set that shows the power consumption rates of consumers. The data set is available through the CIMEG project.$^7$ The watermark length is 29 bits (as the conversion of the UTC date-time to a binary string yields a bit string consisting of 29 bits). The number of partitions is $m = 100$, and $\beta = 0.3$ is used. Moreover, the value of $\tau$ is calculated using Theorem 1. All experiments have been performed on a server that has a Pentium(R) Dual-Core CPU at 2.10 GHz with 4 GB of RAM.

5.1 Data Distortions

Table 2 shows the values of the parameters $\mu$ and $\sigma$ for the original and watermarked data once we set the above-mentioned parameters. One can easily notice that the changes in $\mu$ (mean) and $\sigma$ (standard deviation) are 0.004 and 0.003 percent, respectively, which are very small; hence, Bob’s requirement for minimum data distortions is fulfilled. We can further reduce these values by applying tighter usability constraints, and this will not affect the decoding accuracy, as proven in Theorem 1. Hence, the subpart of Claim 1 about “minimizing data distortions” has been proven by this experiment.

5.2 Attack Analysis

Suppose Alice generates a data set $D_W$ by inserting a watermark $W$ in the data set $D$ using our watermark embedding function. An attacker, Mallory, wants to launch different types of attacks on the watermarked data to corrupt or delete the watermark, but at the same time he wants to preserve data quality so that the data remain useful for recipients as well. We suppose that he has no access to the original data set $D$ and does not have access to the secret parameters used in the embedding of the watermark: $K_s$, $m$, $c$, $\beta$, and $\tau$. Mallory, with the above-mentioned assumptions, cannot generate the data partitions $\{S_0, \ldots, S_{m-1}\}$ because this requires the knowledge of $K_s$ and $m$; as a result, he cannot corrupt (or remove) certain watermark bits from some selected partitions. Moreover, it is also not possible for him to guarantee that his attack will not violate the usability constraints because he does not have access to the original data set $D$. Mallory’s dilemma (as an attacker) is: How can he successfully corrupt the watermark without violating the usability constraints? We have studied the robustness of our watermarking scheme against tuple deletion, insertion, and alteration attacks. Moreover, we have also analyzed sophisticated attacks (multifaceted and additive) in our attack model. We also test the distinguishing characteristic of our technique, usability constraints independent watermark decoding accuracy, by launching the aforementioned attacks with a severity that violates the usability constraints by surpassing the available bandwidth.

5.2.1 Deletion Attack

In this attack, Mallory deletes selected tuples from a watermarked data set with an aim to remove the watermark. He can either delete tuples at random or select them in a sophisticated manner on the basis of the statistical distribution of attribute values. Suppose he drops $\rho$ tuples from Alice’s watermarked data set. We have used different values of $\rho$ and analyzed their effect on the decoding accuracy, and we have also compared our results with those of a threshold-based database watermarking technique (see Fig. 3). In Fig. 3, WRDOBT refers to the threshold-based database watermarking technique of [2], which also uses different optimization schemes for watermark embedding. It is evident from Fig. 3 that the decoding accuracy of our technique remains 100 percent even when more than 90 percent of the tuples are deleted, whereas the decoding accuracy of [2] decreases when more than 80 percent of the tuples are deleted. Moreover, our technique also overcomes the synchronization errors issue of [10] because it does not use any marker tuple or position of tuples for partitioning or watermark embedding.

The same robustness is observed when an attacker launches high frequency deletion attacks (see Fig. 4) such that the usability constraints are violated, that is, $\mu$ and $\sigma$ are changed after these attacks. Our investigation reveals that the defense against such attacks is made possible because: 1) we have embedded each bit of the watermark in every watermarked tuple; and 2) the decoding accuracy is independent of the usability constraints, as the value of $\tau$ was set according to Theorem 1. So, the usability constraints are “once-for-all” according to Lemma 2 and Theorem 2.
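Why deletion alone does not flip a decoded bit can be illustrated with a small simulation. The sketch below is not the experimental setup of this section: the partition contents are synthetic, the names are hypothetical, each partition is assumed to carry a single bit in all of its rows, and at least one row per partition is assumed to survive the attack.

```python
import random


def majority_bit(vals, tau):
    """Majority vote over val - tau for the surviving rows of one partition."""
    ones = sum(1 for v in vals if v - tau >= 0)
    return 1 if ones > len(vals) - ones else 0


def delete_tuples(vals, fraction, rng):
    """Randomly delete the given fraction of rows, keeping at least one (assumption)."""
    keep = max(1, round(len(vals) * (1.0 - fraction)))
    return rng.sample(vals, keep)


rng = random.Random(7)
tau = 0.1
embedded_bits = [1, 0, 1, 1, 0]
# Synthetic val values: positive for bit 1, negative for bit 0, magnitude above tau.
partitions = [
    [(0.5 + rng.random()) * (1 if bit else -1) for _ in range(200)]
    for bit in embedded_bits
]

survivors = [delete_tuples(p, 0.95, rng) for p in partitions]
decoded = [majority_bit(p, tau) for p in survivors]

if __name__ == "__main__":
    print([len(s) for s in survivors])  # rows left per partition after 95 percent deletion
    print(decoded == embedded_bits)     # True
```

Since every surviving row of a partition votes for the same bit, the vote is unchanged however many rows are removed, provided that one remains.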

5.2.2 Insertion Attack

We tested our technique against two types of insertion attacks: fixed insertion and constraint reliant insertion. In the first attack, Mallory inserts $\rho$ new tuples by replicating the values of $\rho$ existing tuples. Our technique is resilient against this attack, as shown in Fig. 5. The reasons for this robustness are that: 1) blind insertion simply does not affect the watermarked tuples; and 2) marker-free data partitioning and watermark embedding also ensure that synchronization errors are prevented.

In the second attack, he generates the tuple values based on the mean $\mu$ and the standard deviation $\sigma$ of the watermarked data set. He deviates the tuple values by $\delta$ percent from the original values in the watermarked data set. The results in Fig. 5 show that for all possible values of $\delta$ the decoding accuracy remains 100 percent. In Fig. 5, the lines showing the decoding accuracy for different values of $\delta$ are superimposed on each other.


TABLE 2. Distortions Introduced in the Data

Fig. 3. Resilience to deletion attack.

7. CIMEG: Consortium for the Intelligent Management of the Electric Power Grid. http://helios.ecn.purdue.edu/~cimeg.

We have also performed experiments for insertion attacks with high severity, which violate the usability constraints with an aim to destroy the embedded watermark. The results reported in Fig. 6 show that the proposed scheme correctly detects the watermark with 100 percent accuracy regardless of the extent to which the usability constraints are violated, while the decoding accuracy of WRDOBT [2] decreases significantly when the usability constraints are violated. Our technique is able to handle every range of data usability constraint violations, but in Fig. 6 only the results for violations of up to 25 percent are reported for brevity. This desirable behavior follows from Theorem 1, which ensures that the decoding accuracy of our decoding algorithm does not depend on the usability constraints. Hence, the usability constraints are “once-for-all” according to Lemma 2 and Theorem 2.

5.2.3 Alteration Attack

Mallory alters the attribute values with an aim to flip the watermark bits. He can again do it in two ways: the fixed alteration attack and the constraint reliant alteration attack. In the fixed alteration attack, he alters $\rho$ selected tuples out of the total of $\nu$ tuples. He may choose a fixed value $\delta$ and alter all $\rho$ tuples by this amount. Fig. 7 shows the results for the fixed alteration attack on $D_W$; the proposed technique is robust against this attack as well.

In the constraint reliant attack, Mallory alters tuple values in the range $\pm\delta\%$ such that the mean $\mu$ and the standard deviation $\sigma$ of the watermarked data set are preserved. Fig. 7 shows that our decoding algorithm achieves 100 percent watermark decoding accuracy and hence is strongly resilient against the constraint reliant attack.

We have also performed experiments for high range alteration attacks, in which an attacker may perform large-scale alteration attacks with a high severity, using larger values of $\delta$, with an aim to destroy the embedded watermark by violating some of the usability constraints. Fig. 8 shows that the proposed scheme correctly detects the watermark with 100 percent accuracy regardless of the amount of alteration made by an attacker. However, in Fig. 8, only the results for violations of up to 25 percent in the usability constraints are reported for brevity. Again, it is clear that our technique outperforms WRDOBT [2], and we believe that the reason for this desirable behavior stems from Theorem 1, which ensures that the decoding accuracy of our decoding algorithm is independent of the usability constraints; as a consequence, the usability constraints are “once-for-all” according to Lemma 2 and Theorem 2.


Fig. 6. Robustness of proposed technique against insertion attacks that violate usability constraints.

Fig. 7. Resilience to alteration attack. (Again note that the lines showing decoding accuracy are superimposed on each other.)

Fig. 5. Resilience to insertion attack.

Fig. 4. Robustness of proposed technique against high-frequency deletion attacks.

5.2.4 Multifaceted Attack

A sophisticated attacker can generate any permutation of insertion, deletion, or alteration attacks, choosing between fixed and constraint reliant attacks, to launch a multifaceted attack. Our pilot studies reveal that combining insertion with alteration attacks is more lethal. It is clear from Table 3 that our technique is resilient against multifaceted attacks as well. The reason behind this desirable behavior is that majority voting corrects the decoding errors introduced by the attacks. This behavior, as expected, is in compliance with Theorem 1. However, if Mallory inserts new tuples amounting to more than 50 percent of the original tuples and deletes more than 50 percent of the original tuples, the decoding accuracy may decrease, and we recommend the use of majority voting to reduce the decoding errors to some extent.

Let us consider an interesting scenario: if an attacker alters Alice’s data to some signed or zero-valued data, then the decoding accuracy might decrease. Consider the case when an attacker inserts signed data (by changing the sign of an attribute value); the watermark decoding accuracy will definitely be degraded. The reason is that, for a particular watermark bit, the expected decoder value ($\eta$) should be positive but turns out to be negative. In another case, the expected value of $\eta$ should be negative but turns out to be positive. In such cases, if an attacker is able to alter the sign of more than 50 percent of the watermarked tuples (to evade the effect of majority voting), the watermark decoding accuracy might suffer. However, Alice knows that her watermarked attribute is unsigned, so she may easily take the absolute values of the signed tuples; as a result, she can detect the watermark from the absolute values of the tuples. So, strictly speaking, an attacker cannot gain anything by changing the sign of the attribute values in some selected tuples if the same attribute does not contain any signed value prior to the attack. The next bet for an attacker is to corrupt the embedded watermark by changing the values of some tuples to zero. But Alice can choose to watermark tuples with nonzero values only; as a result, she will leave out the rows with zero values during the process of watermark detection.
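The two countermeasures described above (taking absolute values and leaving out zero-valued rows) can be written as a single preprocessing step before decoding. The following Python sketch is illustrative; the function name and the sample values are hypothetical.

```python
def sanitize_for_decoding(values):
    """Leave out zero-valued rows and take the absolute value of the remaining rows.

    This relies on the owner's knowledge that the watermarked attribute is
    unsigned and that only nonzero values were watermarked.
    """
    return [abs(v) for v in values if v != 0]


if __name__ == "__main__":
    # Hypothetical attacked attribute: two sign flips and one value set to zero.
    attacked = [120.0, -250.0, 0.0, 95.0, -480.0]
    print(sanitize_for_decoding(attacked))  # [120.0, 250.0, 95.0, 480.0]
```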

Our technique is best suited for data sets that contain unsigned numeric attributes. In the real world, almost all kinds of numeric data sets contain attribute(s) with unsigned numeric data: for example, medical data sets, weather data sets, sales data sets, game data sets, and so on. However, if a data set contains a signed attribute (or an attribute having valid zero values), then that attribute should not be selected for watermarking using the proposed approach. If such an attribute is selected for watermarking, the majority voting scheme helps to remove the decoding errors if an attacker is able to change the sign bit of less than 50 percent of the marked tuples.

Remark 3. We believe such accurate decoding was achieved because we set the value of $\tau$ according to Theorem 1. Hence, another subpart of Claim 1 about “making the watermark decoding accuracy independent of usability constraints” and the second and third claims (minor claims) regarding the robustness of the proposed scheme against various malicious attacks have been proven empirically.

5.2.5 Collusion Attack

Collusion attacks are possible if different copies of the same data are watermarked with different watermarks. Remember that our technique provides a data owner with an efficient way of defining “once-for-all usability constraints” such that a particular data set can be used in every possible application (Lemma 2 and Theorem 2); therefore, an owner does not need to use different watermarks for different data recipients who may or may not use the data for different applications. The proposed technique makes the watermark robustness independent of the usability constraints; therefore, it is possible for a data owner to deliver the same set of watermarked data to multiple recipients without compromising watermark robustness and data usability. Consequently, collusion attacks are implicitly handled by the proposed technique.

5.2.6 Additive Attack: Scenarios and Solutions

In an additive attack, Mallory may attempt to establish a plausible but spurious claim of ownership by trying to splice (or insert) his watermark with that of Alice. The conflict in ownership can be resolved by involving a trusted third party which facilitates the distribution of keys among the involved parties. When Alice shares her data, she affixes the key issued by the trusted third party to the data set. This secure append-only key, governed by the trusted third party, can resolve the data ownership dispute by verifying that Alice’s watermark is present in the data set and also that Alice’s key was appended before Mallory’s key [3].

TABLE 3. Resilience of Proposed Technique to Multifaceted Attack (See Table 1 for Description of Symbols)

Fig. 8. Robustness of proposed technique against high range alteration attacks that violate usability constraints.

The other option might be that the owner of the database can request a secret key from the trusted party, which is usually employed as a secret parameter during the encoding and decoding phases. The key is obviously delivered on a particular date-time. Such time constraints can also help in resolving ownership conflicts: the owner can claim the insertion of the watermark before an attacker did so by taking the date-time issued by the trusted party as a reference.
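The date-time reference can be illustrated with a short Python sketch. It assumes, purely for illustration, that the trusted party keeps a record of the date-time at which each party's key was issued; the function name and the dates are hypothetical.

```python
from datetime import datetime, timezone


def earlier_claimant(issue_times):
    """Return the party whose key was issued first by the trusted party.

    issue_times: mapping from party name to the key issue date-time
                 (assumed to be recorded by the trusted party)
    """
    return min(issue_times, key=issue_times.get)


if __name__ == "__main__":
    # Hypothetical issue date-times recorded by the trusted party.
    issue_times = {
        "Alice": datetime(2012, 1, 10, 9, 0, tzinfo=timezone.utc),
        "Mallory": datetime(2012, 3, 2, 14, 30, tzinfo=timezone.utc),
    }
    print(earlier_claimant(issue_times))  # Alice
```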

One scenario that could undermine the ownership argument is that in which both parties are able to successfully extract their watermarks from each other’s original data sets. But this is not possible, because Alice (the owner) can demonstrate the presence of her watermark in Mallory’s data set $D'$, since it belongs to her, whereas Mallory cannot demonstrate the existence of his marks in Alice’s original data set $D$.

Remark 4. Hence, the last claim (minor claim) for providing different solutions to counter additive attacks has been proven in Section 5.2.6. The major requirement of Alice to have “100 percent robustness” against every kind of malicious attack is ensured by our decoding scheme irrespective of the severity of any particular attack. Similarly, the major requirement of Bob to have “minimum distortions” in the watermarked data set has also been met; therefore, Alice does not need to define different usability constraints when she wants to share her data for any possible application or use, as proven in Lemma 2 and Theorem 2; as a result, the subpart of Claim 1 regarding “once-for-all” usability constraints has been proven.

6 CONCLUSION

In this paper, we have proposed a technique that is highly resilient against insertion, deletion, alteration, and multifaceted attacks, yet it results in minimum distortions in the original data set. Regardless of the severity of a malicious attack on the watermarked data, the watermark bits are successfully decoded with 100 percent accuracy because the decoding accuracy of the proposed approach is independent of the usability constraints. Moreover, our security mechanism also helps to resolve ownership conflicts over a watermarked data set in case of additive attacks. All these features enable Alice to define “once-for-all” usability constraints for her data set for its every possible application or use. Furthermore, our technique provides her with “maximum possible robustness” and delivers data to Bob with “minimum data distortions.” The results of our experiments on a real-world data set substantiate our claims. Recall that the proposed technique is restricted to numeric unsigned data only. A logical extension of this research is to make it scale to signed data and non-numeric relational data sets as well. We are also looking for more elegant ways to solve the problem of additive attacks. Our solutions to these challenging problems will be the subject of forthcoming publications.

ACKNOWLEDGMENTS

The first author would like to thank the Higher Education Commission of Pakistan for funding a PhD fellowship under its indigenous scheme with the grant number 063-111271-eg3-028. The first and third authors of this paper are supported, in part, by the National ICT R&D Fund, Ministry of Information Technology, Government of Pakistan through a project “Remote Patient Monitoring System.”

REFERENCES

[1] R. Agrawal, P. Haas, and J. Kiernan, “Watermarking RelationalData: Framework, Algorithms and Analysis,” The VLDB J., vol. 12,no. 2, pp. 157-169, 2003.

[2] M. Shehab, E. Bertino, and A. Ghafoor, “Watermarking RelationalDatabases Using Optimization-Based Techniques,” IEEE Trans.Knowledge and Data Eng., vol. 20, no. 1, pp. 116-129, Jan. 2008.

[3] R. Agrawal and J. Kiernan, “Watermarking Relational Databases,”Proc. 28th Int’l Conf. Very Large Data Bases, pp. 155-166, 2002.

[4] H. Guo, Y. Li, A. Liu, and S. Jajodia, “A Fragile WatermarkingScheme for Detecting Malicious Modifications of DatabaseRelations,” Information Sciences, vol. 176, no. 10, pp. 1350-1378,2006.

[5] X. Zhou, M. Huang, and Z. Peng, “An Additive-Attack-ProofWatermarking Mechanism for Databases’ Copyrights ProtectionUsing Image,” Proc. ACM Symp. Applied Computing, pp. 254-258,2007.

[6] Y. Wang, Z. Zhu, F. Liang, and G. Jiang, “WatermarkingRelational Data Based on Adaptive Mechanism,” Proc. Int’l Conf.Information and Automation (ICIA ’08), pp. 131-134, 2008.

[7] G. Gupta and J. Pieprzyk, “Reversible and Blind DatabaseWatermarking Using Difference Expansion,” Proc. First Int’l Conf.Forensic Applications and Techniques in Telecomm., Information, andMultimedia and Workshop, pp. 24-29, 2008.

[8] H. Cui, X. Cui, and M. Meng, “A Public Key Cryptography BasedAlgorithm for Watermarking Relational Databases,” Proc. Int’lConf. Intelligent Information Hiding and Multimedia Signal Processing,pp. 1344-1347, 2008.

[9] G. Gupta and J. Pieprzyk, “Database Relation WatermarkingResilient against Secondary Watermarking Attacks,” Proc. Int’lConf. Information Systems Security, pp. 222-236, 2009.

[10] R. Sion, M. Atallah, and S. Prabhakar, “Rights Protection forRelational Data,” IEEE Trans. Knowledge and Data Eng., vol. 16,no. 6, pp. 1509-1525, Dec. 2004.

[11] A. Deshpande and J. Gadge, “New Watermarking Technique for Relational Databases,” Proc. Second Int’l Conf. Emerging Trends in Eng. and Technology (ICETET), pp. 664-669, 2009.

[12] M. Farfoura and S. Horng, “A Novel Blind Reversible Method for Watermarking Relational Databases,” Proc. IEEE Int’l Symp. Parallel and Distributed Processing with Applications, pp. 563-569, 2010.

[13] S. Horng et al., “A Blind Reversible Method for Watermarking Relational Databases Based on a Time-Stamping Protocol,” Expert Systems with Applications, vol. 39, no. 3, pp. 3185-3196, 2011.

[14] Y. Li and R. Deng, “Publicly Verifiable Ownership Protection for Relational Databases,” Proc. ACM Symp. Information, Computer and Comm. Security, pp. 78-89, 2006.

[15] S. Bhattacharya and A. Cortesi, “A Distortion Free Watermark Framework for Relational Databases,” Proc. Fourth Int’l Conf. Software and Data Technologies (ICSOFT ’09), pp. 229-234, 2009.

[16] S. Shah, S. Xingming, H. Ali, and M. Abdul, “Query Preserving Relational Database Watermarking,” Informatica, An Int’l J. Computing and Informatics, vol. 35, no. 3, pp. 391-396, 2011.

[17] R. Halder and A. Cortesi, “A Persistent Public Watermarking of Relational Databases,” Proc. Int’l Conf. Information Systems Security, pp. 216-230, 2011.

[18] R. Halder, S. Pal, and A. Cortesi, “Watermarking Techniques for Relational Databases: Survey, Classification and Comparison,” J. Universal Computer Science, vol. 16, no. 21, pp. 3164-3190, 2010.

[19] D. Allan, N. Ashby, C. Hodge, and H.-P. Company, The Science of Timekeeping. Hewlett-Packard, 1997.

[20] B. Schneier, Applied Cryptography. John Wiley, 1996.


M. Kamran received the MS and PhD degrees in computer science, in 2008 and 2012, respectively, from the National University of Computer and Emerging Sciences (NUCES), Islamabad, Pakistan. Currently, he is working as an assistant professor at COMSATS Institute of Information Technology, Wah Cantt, Pakistan. His research interests include data security, health informatics, machine learning, nature inspired computing, and decision support systems.

Sabah Suhail received the BS degree in software engineering from Fatima Jinnah Women University, Rawalpindi, Pakistan, in 2008 and the MS degree in information security from the National University of Science and Technology, Islamabad, Pakistan, in 2012. Currently, she is working as a research student at nexGIN RC-NUCES. Her research interests include data security and decision support systems.

Muddassar Farooq received the BE degree in avionics engineering from National University of Sciences and Technology (NUST), Pakistan, in 1996, the MS degree in computer science and engineering from the University of New South Wales (UNSW), Australia, in 1999, and the DSc degree in informatics from the Technical University of Dortmund, Germany, in 2006. In 2007, he joined the NUCES, Islamabad, Pakistan, as an associate professor. Currently, he is working as professor and dean, Department of Engineering, Institute of Space Technology (IST), Islamabad, Pakistan. He is also the director of nexGIN RC. He is also a winner of the Presidential Award–awarded by government of Pakistan–for contribution towards technology. He is the author of the book Bee-inspired Protocol Engineering: from Nature to Networks published by Springer in 2009. He has also coauthored two book chapters in different books on swarm intelligence. He is on the editorial board of Springer’s Journal of Swarm Intelligence. He is also the workshop chair of the European Workshop on Nature-inspired Techniques for Telecommunication and Networked Systems (EvoCOMNET) held with EuroGP. He also serves on the program committee of well known EC conferences like GECCO, CEC, and ANTS. He is the guest editor of a special issue of the Journal of System Architecture (JSA) on nature-inspired algorithms and applications. His research interests include agent based routing protocols for fixed and mobile ad hoc networks (MANETs), nature inspired applied systems, usable security, natural computing and engineering and nature inspired computing, and network security systems and information security. He is a member of the IEEE.


