©2015, David C. Roberts, all rights reserved
Translucent Databases
CSCI 6442
Recommendation
Buy Translucent Databases, Second Edition, by Peter Wayner. It’ll be a valuable addition to your library.
Translucent Databases
This is a new technique for controlling access to database information. It is being used in some state-of-the-art software products. It is not widely known and is not found in any database textbooks.
We will use it. You will learn to use it and teach others to use it.
One-Way Functions
• Translucency typically relies on the validity of one-way functions
• There are a number of functions used as one-way functions
• There are no proofs that these functions are one-way
One-Way Function
A one-way function h(x) is a function such that h(x) can be computed easily but it is computationally infeasible, given y, to find x such that h(x) = y
Many uses of translucency are based on one-way functions
One-Way Function
• Wikipedia definition: a function that is easy to compute but hard to invert, given the image of a random input.
• “Hard” may be hard enough for some commercial purposes, not hard enough for others
Which Are One-Way?
• Modulo
• Multiply by a prime
• Hashing
One-Way Functions
• Secure Hash Algorithm (SHA) from NIST, designed as a one-way function
• From a file of any length, SHA-1 produces a 160-bit value
• Arbitrary input size allows great flexibility: can be used as message digest
• Generalizes earlier MD5 work
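A minimal sketch of the fixed-size digest property, assuming Python's standard hashlib module. The helper name sha1_hex is hypothetical and used only for illustration.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()

# Inputs of very different lengths (illustrative values only).
short_digest = sha1_hex(b"abc")
long_digest = sha1_hex(b"x" * 1_000_000)

# Each hex character carries 4 bits, so 40 hex characters = 160 bits.
print(len(short_digest) * 4, len(long_digest) * 4)
```

Whatever the input length, the digest has the same size, which is what makes it usable as a message digest.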
Locus of Access Control
• Where do you think access control should be located?
[Diagram: Database, Database System, Application]
In the Application
• Access control may also be in the application
• Such control must be built into program logic
• It is hard to verify, hard to change, hard to completely test
• But it’s not so hard for a programmer to insert (and hide) a back door
By The Database System?
Usually, database access is controlled by built-in access controls that are administered through GRANT and REVOKE commands.
Grant
• Usual technique:
– GRANT CONNECT TO SMITH IDENTIFIED BY PASSWORD
– GRANT SELECT ON EMP TO SMITH
– GRANT UPDATE ON EMP TO SMITH
• But what if new users arrive all the time, even over the Internet?
– Not enough DBAs in the world
– New users may arrive 24x7
Translucency
• Translucent techniques allow privileges to be controlled by the DBMS but not using the DBMS’s relatively static controls
• Translucent access control can be made external to the application
• Audit of translucent access controls is relatively straightforward
• Typically, translucent techniques are used to allow users to see and change their own information, in a controlled fashion
Motivation
• Translucency exposes some parts of the database to the public and protects other parts.
• With translucency, the whole database content is never exposed to a single individual, and access control administration is not required.
• The database is designed to let out some information, keep other information protected.
Security
• Translucency provides protection that’s generally fine for privacy purposes
• It doesn’t replace more heavy-duty forms of protection
• For example, nuclear weapon launch codes need stronger security than translucency!
Early Translucency
UNIX password file: stored using irreversible scrambling function. The password entered by a user seeking access is scrambled and compared to the stored scrambled password. The password entered by the user is never stored in its original form.
Passwords
User enters: johnsmith, password
Stored: johnsmith, uejsgqkkd (encrypted password)
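The scramble-and-compare login check can be sketched as follows. This is an illustration only: SHA-256 stands in for the UNIX scrambling function (an assumption, not the actual UNIX algorithm), and the names scramble, stored and check_login are hypothetical.

```python
import hashlib
import hmac

def scramble(password: str) -> str:
    """One-way scramble of a password (SHA-256 used as a stand-in)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# The "password file": only the scrambled form is kept.
stored = {"johnsmith": scramble("password")}

def check_login(user_id: str, password: str) -> bool:
    """Scramble the entered password and compare with the stored value."""
    expected = stored.get(user_id)
    if expected is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, scramble(password))

print(check_login("johnsmith", "password"))
print(check_login("johnsmith", "letmein"))
```

The original password never appears in the stored table; only its scrambled form does.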
Advantages
• Compromise of password list won’t compromise control of access to the system
• Compromise of the encryption function won’t compromise access control if the encryption function is a one-way function
Attacking Translucency
• How would you attack the UNIX password system?
Attacks
• Most attacks are dictionary attacks, trying likely values, or brute-force attacks, trying all possible combinations
• Counter such attacks in two ways:
1. Limit number of attempts
2. Provide large number of possible combinations
Examples of Uses for Translucency
Personal scheduling: personal schedules have considerable value to a malicious party.
Keep personal schedules for many users in a single table, but expose each user’s information to only that user
Users can come and go without administering accesses
Software as a Service (SaaS)—multi-user software offered over the Web
More Examples
Preference information: clothing sizes and preferences, food ordering history, travel history all have potential malicious value.
They can be entered by the customer and then accessed later only by that customer.
They can be used for analysis without exposing identities to the analysts.
General Principles
• Translucency usually employs “stunt data”; that is, data that stands in for real data and behaves similarly but does not have the original value.
• Stunt data is usually computed by a one-way transformation from the original data.
• In the password example, the encrypted password is stunt data
One-Way Tables
Translucency is added to tables by passing sensitive values through a one-way function before storing them.
Diary(HashedUserID,HashedPW,ID,Content)
MD5(UserID) and MD5(PW) are stored as HashedUserID and HashedPW. User enters UserID and PW. Query to retrieve all comments is
SELECT MD5(UserID), MD5(PW), ID, Content
FROM Diary
WHERE HashedUserID = MD5(UserID)
  AND HashedPW = MD5(PW);
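The Diary table can be tried out with a small sketch. It assumes Python's sqlite3 in place of MySQL, so MD5 is registered as a user-defined SQL function; the helper names add_entry and read_entries and the sample users are hypothetical.

```python
import hashlib
import sqlite3

def md5_hex(value: str) -> str:
    """MD5 of a string as 32 hex characters, like MySQL's MD5()."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
# SQLite has no built-in MD5, so expose the Python helper to SQL.
conn.create_function("MD5", 1, md5_hex)
conn.execute(
    "CREATE TABLE Diary (HashedUserID TEXT, HashedPW TEXT, ID INTEGER, Content TEXT)"
)

def add_entry(user_id: str, pw: str, entry_id: int, content: str) -> None:
    """Store an entry keyed only by the hashed UserID and PW."""
    conn.execute(
        "INSERT INTO Diary VALUES (MD5(?), MD5(?), ?, ?)",
        (user_id, pw, entry_id, content),
    )

def read_entries(user_id: str, pw: str) -> list:
    """Return the entries whose hashed columns match the entered values."""
    rows = conn.execute(
        "SELECT ID, Content FROM Diary "
        "WHERE HashedUserID = MD5(?) AND HashedPW = MD5(?) ORDER BY ID",
        (user_id, pw),
    )
    return rows.fetchall()

add_entry("alice", "tulip42", 1, "Dentist on Tuesday")
add_entry("bob", "anchor7", 2, "Call the bank")
print(read_entries("alice", "tulip42"))
print(read_entries("alice", "wrong-guess"))
```

Someone who reads the table directly sees only hashed identifiers next to the content; only a user who can supply the original UserID and PW can pick out their own rows.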
Use of UserID and PW
• For translucency, we generally use UserID and PW
• Users can choose a UserID at their first login
• We can require UserID to be unique on its own, as long as we have PW
Question: why do we need a PW for UserID to be unique?
Vulnerability
Simple use of a one-way function is vulnerable to dictionary attack.
SELECT * FROM PURCHASES WHERE NAMEHASH=MD5('Fred Smith')
Can append password to the vulnerable value, hash both of them together
INSERT INTO PURCHASES VALUES (MD5('Fred Smith/swordfish'), ….)
Now dictionary attack becomes geometrically more difficult
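A toy illustration of why appending a password helps. The list of names is a hypothetical attacker's dictionary, and dictionary_attack is an illustrative helper, not part of any library.

```python
import hashlib

def md5_hex(value: str) -> str:
    return hashlib.md5(value.encode("utf-8")).hexdigest()

# Hypothetical attacker's dictionary of plausible customer names.
names = ["Alice Jones", "Fred Smith", "Mary Brown"]

def dictionary_attack(target_hash: str, candidates: list):
    """Return the candidate whose hash equals target_hash, or None."""
    for candidate in candidates:
        if md5_hex(candidate) == target_hash:
            return candidate
    return None

name_only = md5_hex("Fred Smith")
name_and_pw = md5_hex("Fred Smith/swordfish")

print(dictionary_attack(name_only, names))    # Fred Smith
print(dictionary_attack(name_and_pw, names))  # None
```

To recover the second hash, the attacker would have to try every name in combination with every candidate password, so the number of guesses is the product of the two dictionary sizes.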
Simplification
• What about small variations in how user enters information? What if we want to not be sensitive to them?
• Can clean up one-way input; remove spaces, convert to upper case, remove punctuation, remove non-printing characters, even use Soundex.
What happens to security if we “clean up” the one-way input?
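One possible clean-up step, sketched under the assumption that keeping only letters and digits is acceptable for the application. The function names normalize and normalized_hash are hypothetical, and Soundex is not shown.

```python
import hashlib

def normalize(value: str) -> str:
    """Upper-case the input and drop spaces, punctuation and non-printing characters."""
    return "".join(ch for ch in value.upper() if ch.isalnum())

def normalized_hash(value: str) -> str:
    """Hash the cleaned-up form so small entry variations match."""
    return hashlib.sha256(normalize(value).encode("utf-8")).hexdigest()

print(normalize(" fred. smith\t"))
print(normalized_hash("Fred Smith") == normalized_hash("fred  smith."))
print(normalized_hash("Fred Smith") == normalized_hash("Freds Mith"))
```

The last comparison is also true: distinct inputs such as "Fred Smith" and "Freds Mith" now collapse to the same stored value, so the set of distinguishable inputs shrinks.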
Security Trade-Offs
• Today’s strong hash is tomorrow’s broken protection
• A desktop machine can compute 1,000,000 MD5 hashes per second
• Difficulty of dictionary attacks can be estimated numerically, providing an estimate of the strength of a transformation
• Normalizing input increases vulnerability to dictionary attack
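A back-of-the-envelope estimate of this kind can be sketched as below. The rate of 1,000,000 hashes per second is the figure from the slide; the alphabet sizes and the value length are illustrative assumptions, and the function names are hypothetical.

```python
def search_space(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of exactly `length` symbols."""
    return alphabet_size ** length

def worst_case_seconds(alphabet_size: int, length: int,
                       hashes_per_second: int = 1_000_000) -> float:
    """Time to hash every candidate at the given rate."""
    return search_space(alphabet_size, length) / hashes_per_second

# Six-letter values: mixed case (52 symbols) versus upper case only (26 symbols).
mixed = search_space(52, 6)
upper_only = search_space(26, 6)

print(worst_case_seconds(52, 6), "seconds for mixed case")
print(worst_case_seconds(26, 6), "seconds after converting to upper case")
print(mixed // upper_only, "times fewer candidates after normalizing")
```

Under these assumptions, converting six-letter input to upper case cuts the attacker's work by a factor of 2^6 = 64.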
Salting
• Data can be “salted” by a salt column of random numbers, appended to the value that is hashed, before it is transformed.
• Dictionary attack would now have to guess the salt string as well as UserID and PW.
• Salt can be unique per row or use the same salt for a whole table
Does salting improve security by a little or a lot?
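A minimal sketch of per-row salting, assuming the salt is kept in its own column next to the hash. SHA-256, the "/" separator and the names salted_hash, make_row and row_matches are illustrative choices, not something the slides prescribe.

```python
import hashlib
import secrets

def salted_hash(value: str, salt: str) -> str:
    """Hash the value with the salt appended."""
    return hashlib.sha256((value + "/" + salt).encode("utf-8")).hexdigest()

def make_row(user_id: str, pw: str) -> dict:
    """Build a table row holding a fresh random salt and the salted hash."""
    salt = secrets.token_hex(16)  # 128 random bits, hex encoded
    return {"salt": salt, "hashed": salted_hash(user_id + "/" + pw, salt)}

def row_matches(row: dict, user_id: str, pw: str) -> bool:
    """Recompute the hash with the row's salt and compare."""
    return row["hashed"] == salted_hash(user_id + "/" + pw, row["salt"])

row_a = make_row("fredsmith", "swordfish")
row_b = make_row("fredsmith", "swordfish")
print(row_a["hashed"] != row_b["hashed"])
print(row_matches(row_a, "fredsmith", "swordfish"))
```

With a per-row salt, identical inputs no longer produce identical stored hashes, but a lookup has to test rows one by one instead of searching for a single hash value. A table-wide salt keeps direct lookup at the cost of that property.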
UserID and PW
• Is it better to concatenate UserID and PW and then hash, or to put their hashed values in separate columns?
• What are the tradeoffs?
More About UserID, PW
Should we keep the original form of UserID and PW in the database?
What are the tradeoffs?
One-Way Transformations
1. Pure one-way functions
2. Trapdoor functions
3. Symmetric encryption
Pure one-way functions cannot be reversed, so their effect cannot be undone, and once obscured, encoded information cannot be recovered. On the other hand, trapdoor functions and symmetric encryption functions allow some users to be given additional access, or they can allow “just a peek” when needed.
Pure One-Way Functions
Let h(x) stand for a one-way function. For a pure one-way function, it is easy to compute y=h(x) but computationally infeasible, given y, to find x such that y=h(x).
In general, there are no proofs of the irreversibility of one-way functions.
The MD5 hash function is implemented in MySQL. It is widely used in industry under an assumption that it is one-way; however, there is no proof that it is truly one-way.
One common use of the MD5 function is for elimination of file duplicates. An MD5 hash is computed for each file, and if a new file is encountered with an identical MD5 hash, that file is compared with the original.
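The duplicate-elimination use can be sketched as follows, assuming files on a local disk. The function names md5_of_file and find_duplicates are hypothetical; filecmp.cmp with shallow=False performs the byte-by-byte comparison.

```python
import filecmp
import hashlib
from pathlib import Path

def md5_of_file(path: Path) -> str:
    """Compute the MD5 hash of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(paths: list) -> list:
    """Return (duplicate, original) pairs among the given files."""
    seen = {}        # MD5 hash -> files with distinct contents and that hash
    duplicates = []
    for path in paths:
        key = md5_of_file(path)
        for original in seen.get(key, []):
            # Same hash: confirm with a full comparison of contents.
            if filecmp.cmp(original, path, shallow=False):
                duplicates.append((path, original))
                break
        else:
            # No earlier file with the same contents was found.
            seen.setdefault(key, []).append(path)
    return duplicates
```

The hash is only a cheap first filter. The full comparison matters because two different files can share an MD5 hash.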
Trapdoor Functions
Trapdoor functions: appear to be one-way functions, but there is another value called the key that can be used to reverse h(x).
Such functions can also be used for public-key encryption, where one key is used to encrypt and the other can be used to decrypt.
Symmetric Encryption
Symmetric encryption: the same key is used to encrypt and decrypt. Not truly one-way because the person who encrypts can also decrypt.
In a translucent database, the compromise of the key would open up all the protected contents.
MySQL Implementations
MD5('data') produces a 32-character string.
PASSWORD('password') produces a 16-character string in MySQL versions before 4.1 and a 41-character string from 4.1 on. NOT the algorithm used by UNIX.
ENCODE('data','password') encodes data into a binary string. DECODE reverses the process.
DES_ENCRYPT and DES_DECRYPT use the DES to encrypt and decrypt. Any user can encrypt, but only users with access to keys can decrypt.
PASSWORD, ENCODE, DECODE, DES_ENCRYPT and DES_DECRYPT were removed in MySQL 8.0.
Inserting Redundancy
• Very secure translucent tables can hash several columns together, such as name/address/ssn/birthday to encode HR information
• Such a hash is difficult to attack
• However, retrieval won’t work if the user misspells just one of the entries
• Can design the table to match three of the four by constructing four hashed columns, one with each of the four values omitted.
Three Out of Four
Emp(HashedNameAddressSSN, HashedNameAddressBirthday, HashedNameSSNBirthday, HashedAddressSSNBirthday, … <protected information> … )
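The three-out-of-four idea can be sketched like this. The field names, the sample record and the "/"-joined encoding are illustrative assumptions; the four hashes play the role of the four hashed columns of Emp.

```python
import hashlib
from itertools import combinations

FIELDS = ("name", "address", "ssn", "birthday")

def three_of_four_hashes(record: dict) -> set:
    """Return the four hashes, each computed with one field omitted."""
    hashes = set()
    for subset in combinations(FIELDS, 3):
        # Include the field names so each hash is tied to its column.
        joined = "/".join(f"{field}={record[field]}" for field in subset)
        hashes.add(hashlib.sha256(joined.encode("utf-8")).hexdigest())
    return hashes

def matches(stored_hashes: set, attempt: dict) -> bool:
    """True if any three of the attempt's four values agree with the stored record."""
    return bool(stored_hashes & three_of_four_hashes(attempt))

# Made-up employee record for illustration.
employee = {"name": "Fred Smith", "address": "12 Elm St",
            "ssn": "000-00-0000", "birthday": "1970-01-01"}
stored = three_of_four_hashes(employee)

one_typo = dict(employee, address="12 Elm Street")
two_typos = dict(one_typo, name="Fred Smyth")
print(matches(stored, one_typo), matches(stored, two_typos))
```

One wrong value still leaves a three-field subset that is entirely correct, so the lookup succeeds; with two wrong values every subset contains an error and nothing matches.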
Protecting Repetitions
• When a value is repeated, all the hashes of it will be identical
• This is true even if a password or other values are appended
• This behavior may be acceptable, or it might be a weakness; you may want to protect repetitions
• To protect repetitions, add a serial number to each entry for a given value
• For example, “001/Fred Smith”, “002/Fred Smith”, “003/Fred Smith”, etc.
• Decoding takes longer, since all Fred Smith values have to be decoded one at a time until there are no more
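A small sketch of the serial-number technique, using a Python set as a stand-in for the hashed column. The three-digit serial format follows the example above; the helper names are hypothetical.

```python
import hashlib

def h(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def insert_repeated(table: set, value: str) -> str:
    """Store the value under the first unused serial number and return its hash."""
    serial = 1
    while h(f"{serial:03d}/{value}") in table:
        serial += 1
    key = h(f"{serial:03d}/{value}")
    table.add(key)
    return key

def count_entries(table: set, value: str) -> int:
    """Probe serial numbers one at a time until no entry is found."""
    serial = 1
    while h(f"{serial:03d}/{value}") in table:
        serial += 1
    return serial - 1

purchases = set()
keys = [insert_repeated(purchases, "Fred Smith") for _ in range(3)]
print(len(set(keys)), count_entries(purchases, "Fred Smith"))
```

The three stored hashes are all different, so an observer cannot tell that they belong to the same value, while the owner recovers them by stepping through the serial numbers.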
Coordinating Users
• Previous techniques show how to protect with a single mechanism
– Information is indexed with h(info); if you have h(info) then you get the whole row
– h(x) acts like a password
• Can use two different values, require both to get a row
• Example: bulletin board of communications between two people. Can append a password
• Can also use public-private keys
Bulletin Board
• CREATE TABLE bb (`FROM` CHAR(32), `TO` CHAR(32), MESSAGE BLOB);
• FROM and TO are reserved words in MySQL, so they are quoted with backticks
• Put hashed from and to names into first two columns
• Receiver logs on with TO userid, uses it to retrieve messages
• Note that userid must be validated for this to be secure
Prediction
• Party wants to make a choice, seal into an envelope, have it opened later and reliably read
• Let party choose string b as that choice. Party also chooses random strings r1 and r2
• Database stores hash(r1 r2 b). r1 and hash(r1 r2 b) are both published for those who want to review results
• When outcome is known, r2 and b (Party’s expected outcome) are released. If the published r1, along with r2 and b produce the released hash result, then b is the prediction and there was no cheating.
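The sealed-envelope protocol can be sketched as below. SHA-256 and the length-prefixed encoding of (r1, r2, b) are assumptions made so that the three strings are combined unambiguously; the slides only say hash(r1 r2 b). All function names are hypothetical.

```python
import hashlib
import secrets

def commit_hash(r1: str, r2: str, b: str) -> str:
    """Hash r1, r2 and b, length-prefixing each part to keep the boundaries clear."""
    data = b"".join(
        len(part.encode("utf-8")).to_bytes(4, "big") + part.encode("utf-8")
        for part in (r1, r2, b)
    )
    return hashlib.sha256(data).hexdigest()

def make_prediction(b: str):
    """Return (published, secret): r1 and the hash go public, r2 and b stay sealed."""
    r1 = secrets.token_hex(16)
    r2 = secrets.token_hex(16)
    published = {"r1": r1, "hash": commit_hash(r1, r2, b)}
    secret = {"r2": r2, "b": b}
    return published, secret

def verify(published: dict, r2: str, b: str) -> bool:
    """Check the released r2 and b against the published r1 and hash."""
    return commit_hash(published["r1"], r2, b) == published["hash"]

published, secret = make_prediction("Team A wins")
print(verify(published, secret["r2"], "Team A wins"))
print(verify(published, secret["r2"], "Team B wins"))
```

The random r2 keeps anyone from guessing b by hashing likely predictions before the release, and the published hash keeps the party from changing b afterwards.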
Bidding
• Need to validate that the person bidding is the same person who made the last bid
• Can require a person to authenticate by presenting h^(n+1)(x), that is, h applied n+1 times to x, when h^n(x) was presented previously
• Or for more security go in the other direction
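A sketch of the "other direction", under the assumption that it means presenting h^(n-1)(x) after h^n(x). The bidder keeps x secret and the verifier remembers only the last value presented. SHA-256, the class name Verifier and the sample secret are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def iterate(x: bytes, n: int) -> bytes:
    """Apply h to x n times, giving h^n(x)."""
    for _ in range(n):
        x = h(x)
    return x

class Verifier:
    """Remembers the last value presented and accepts only its preimage."""

    def __init__(self, anchor: bytes):
        self.last = anchor  # h^n(x), presented at registration

    def accept(self, token: bytes) -> bool:
        if h(token) == self.last:
            self.last = token  # the next bid must present h^(n-2)(x), and so on
            return True
        return False

secret_x = b"bidder's secret value"   # hypothetical secret
verifier = Verifier(iterate(secret_x, 5))
print(verifier.accept(iterate(secret_x, 4)))   # True: hashes to the anchor
print(verifier.accept(iterate(secret_x, 4)))   # False: replay is rejected
```

In the forward direction anyone who saw h^n(x) can compute h^(n+1)(x) themselves. Going backwards, producing the next token requires inverting h, which only the holder of x can avoid.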
Access Control Lists
• Sometimes we need to establish an access control list
• Certain individuals are given access to certain rows
• Create a table with hashed column from the row and hashed userid of user who has access
• Query joins the access control table and the table of data
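A possible shape for the access control join, sketched with Python's sqlite3. The table names Data and Acl, the column names, the SQL function H and the sample rows are all assumptions for illustration.

```python
import hashlib
import sqlite3

def h(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.create_function("H", 1, h)  # make the hash available inside SQL
conn.executescript("""
CREATE TABLE Data (RowKey TEXT, Content TEXT);
CREATE TABLE Acl (HashedRowKey TEXT, HashedUserID TEXT);
""")

def add_row(row_key: str, content: str) -> None:
    conn.execute("INSERT INTO Data VALUES (?, ?)", (row_key, content))

def grant(row_key: str, user_id: str) -> None:
    """Record, in hashed form only, that user_id may read the row."""
    conn.execute("INSERT INTO Acl VALUES (H(?), H(?))", (row_key, user_id))

def rows_for(user_id: str) -> list:
    """Join the access control table to the data table for one user."""
    rows = conn.execute(
        "SELECT d.Content FROM Data d "
        "JOIN Acl a ON a.HashedRowKey = H(d.RowKey) "
        "WHERE a.HashedUserID = H(?) ORDER BY d.RowKey",
        (user_id,),
    )
    return [r[0] for r in rows]

add_row("r1", "Sitter schedule for Monday")
add_row("r2", "Client home address")
grant("r1", "alice")
print(rows_for("alice"), rows_for("bob"))
```

The access control table by itself reveals neither the user names nor which rows they can read; it becomes useful only to someone who can supply a userid whose hash appears in it.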
Interesting Applications
• Babysitting Exchange: controls access to addresses of clients, schedules of sitters
• Blog site: only I can add to and delete from my own blog
• Store: only I can see my record of sizes that I have purchased
• SaaS: single application used by multiple customers
Quantization
• Another technique of translucency
• Quantization is rounding off
• Can reduce visibility into details by rounding the data
Example: remove home address, keep zip code
Quantization
• Has many variations
• Similar to adding small error to data
• Can project out some dimensions (i.e., exclude one or more columns)
Quantization Examples
• Military location as lat and long, but not minutes and seconds
• $ amount as number of figures, but no figures given (really logarithmic rounding)
• $ as number of commas (also logarithmic rounding)
• Rounding creates quanta
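The quantization examples above can be sketched as simple functions. The function names and the sample coordinates, amount and address are illustrative assumptions.

```python
import math

def degrees_only(lat: float, lon: float) -> tuple:
    """Keep whole degrees and drop the minutes and seconds."""
    return (math.trunc(lat), math.trunc(lon))

def number_of_figures(amount: int) -> int:
    """Number of digits in a whole-dollar amount (logarithmic rounding)."""
    return len(str(abs(int(amount))))

def number_of_commas(amount: int) -> int:
    """Commas needed to write the amount with thousands separators."""
    return (number_of_figures(amount) - 1) // 3

def zip_only(address: dict) -> dict:
    """Project out everything except the zip code."""
    return {"zip": address["zip"]}

print(degrees_only(38.8977, -77.0365))   # (38, -77)
print(number_of_figures(1234567))        # 7
print(number_of_commas(1234567))         # 2
print(zip_only({"street": "12 Elm St", "city": "Springfield", "zip": "00000"}))
```

Each function maps many original values onto one stored value, which is what creates the quanta: the stored data still supports analysis by region or order of magnitude without revealing the exact detail.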
Thank You