INS & ContextSphere | Columbia Univ. - Feb. 25, 2003 | Confidential © 2002 IBM Corporation
Information-Flow Control for Location-based Services
Nishkam Ravi
Joint work with Marco Gruteser*, Liviu Iftode
Computer Science, Rutgers University; *Winlab, Rutgers University
Motivation
• Personal data commonly used in internet-based computing
– Social security number
– Credit card information
– Contact information
• User concerns
– Where is my data going?
– How is it being used?
• Identity theft incidents prevalent
• Database community working on countering illegitimate use of private information
Privacy
• Sharing sensitive information while preserving privacy is a challenging task
• Access control is not sufficient
– No control over data after it is read and shared
• Need to restrain the flow of information
[Figure: access control guards the credit card and social security numbers, but not where the data flows after it is read]
Privacy Solutions
• Prevention
– Anonymization/Pseudonymization
– Data suppression/cloaking
• Avoidance
– Information-flow control
– End-to-end policies
• Cure
– Tracking illegitimate flow of information
– Punishing the adversary
Context-aware Computing
• Shift from “internet” to “ubiquitous” computing
• Ubiquitous computing relies heavily on user context
– Location
– Activity
– Environment
• Context is dynamic in nature
– Changes with time and space
Location-based Services
• Location deemed the most important context information
• Immense interest in location-based services (LBS)
[Figure: cars report location to an LBS, which returns jams, accidents, gas-station locations, and restaurants; other uses include 911, preferential billing, asset tracking, personnel tracking]
Location Privacy
• Potential for privacy abuse
– They know where I am!
• More serious consequences
– Location information could aid in criminal investigations
• Recognized by the US government
– “Location Privacy Protection Act, 2001”
– “Wireless Privacy Protection Act, 2003”
Solutions for Location Privacy
• k-anonymization using spatial/temporal cloaking [Gruteser ’03)• Instead of disclosing location, disclose an interval
(x, y) ([ x1, x2], [y1, y2]) (x1 < x < x2, y1 < y < y2)
x1 x2
y2
y1
k = 3
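One way to realize such cloaking is to report the bounding box of the k nearest reported positions. This is a minimal illustrative sketch, not Gruteser's actual algorithm; the class and method names are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

public class Cloak {
    // points[i] = {x, y}; index 0 is the querying user.
    // Returns {x1, x2, y1, y2}: the bounding box of the k positions
    // nearest to the user, so the reported region covers >= k subjects.
    public static int[] cloak(int[][] points, int k) {
        int ux = points[0][0], uy = points[0][1];
        int[][] sorted = points.clone();
        Arrays.sort(sorted, Comparator.comparingInt(
                (int[] p) -> (p[0] - ux) * (p[0] - ux) + (p[1] - uy) * (p[1] - uy)));
        int x1 = ux, x2 = ux, y1 = uy, y2 = uy;
        for (int i = 0; i < k && i < sorted.length; i++) {
            x1 = Math.min(x1, sorted[i][0]); x2 = Math.max(x2, sorted[i][0]);
            y1 = Math.min(y1, sorted[i][1]); y2 = Math.max(y2, sorted[i][1]);
        }
        return new int[]{x1, x2, y1, y2}; // x1 <= x <= x2, y1 <= y <= y2
    }
}
```

With three nearby users and one far away, k = 3 yields a tight box around the three; larger k trades precision for anonymity.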
How good is location cloaking?
• Cannot support applications that need precise location information
• Value of k not tailored to services
• Quality of service suffers
– Inferior accuracy of results
• Can we have a framework + information-flow control model that preserves both location privacy and quality of service?
Framework for service-specific location privacy
• Location of subjects maintained on a trusted server
• When an LBS needs location information, it migrates a piece of code (function f plus data d) to the trusted server
• The code executes, reads location information, and returns a result f(x, y, d)
– Distance
– Density, average speed
[Figure: cars report location to the trusted server; the LBS ships code to the server and receives results f(x, y, d)]
Example Applications
• Application of density, average speed
– Traffic information service
• Application of distance function
– Geographical routing service
Main Problem
• The trusted server needs to ensure that the code is location safe
– Should not leak location information
Information-flow Control
• Information-flow control models restrict the flow of sensitive information in a program/system
• State of the art: non-interference
– Isolates public data from private data

    int f(int a, int b){
        int c = (a + b)/2;
        output(c);
    }

Private: a, b; Public: c — isolation broken, since c is derived from a and b
Unix-style Password Checker

    byte check(byte username, byte password){
        byte match = 0;
        for (int i = 0; i < database.length; i++){
            if (hash(username, password) == hash(salts[i], passwords[i])){
                match = 1;
                break;
            }
        }
        output(match);
    }

Value of match depends on private variables: violates non-interference
Non-inference
• In many real systems, including LBS, data isolation is not possible
• We propose a new model of information-flow control that
– allows public data to be derived from private data
– requires that the adversary cannot infer private data from public data from a single execution of the program
• Example:

    int f(int a, int b){
        int c = (a + b)/2;
        output(c);
    }

The value of neither a nor b can be inferred from c: non-inference satisfied
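The claim can be checked by counting preimages: if more than one (a, b) pair maps to the observed c, the adversary cannot invert the output back to a unique input. A toy sketch over a small domain (the class and helper names are illustrative):

```java
public class Preimages {
    // f outputs c = (a + b) / 2 (integer division). Count the (a, b)
    // pairs in [0, max] x [0, max] that produce a given output c:
    // more than one pair means c does not determine (a, b).
    public static int countPairs(int c, int max) {
        int count = 0;
        for (int a = 0; a <= max; a++)
            for (int b = 0; b <= max; b++)
                if ((a + b) / 2 == c) count++;
        return count;
    }
}
```

For c = 5 over [0, 10], every pair with a + b in {10, 11} qualifies, so the adversary learns only a + b up to rounding, never a or b individually.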
Theoretically…
• Non-inference is undecidable in general
• Decidable for independent executions / uni-directional information flow
Independent Executions
• Example:

    int f(int a, int b, int i){
        int c;
        if (i > 1)
            c = (a + b)/2;
        else
            c = (a * b);
        output(c);
    }

Private: a, b; Public: i, c
• a and b could be derived by combining (a + b)/2 and (a * b) across two executions
• But if a and b are x-coordinates of two cars, their values would differ between the two executions: the adversary sees (a1 + b1)/2 and (a2 * b2), which cannot be combined
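Why independence matters: if the same (a, b) fed both executions, the adversary could solve for them, since knowing the sum and product of two numbers determines them as roots of a quadratic. A hypothetical sketch of that inference (assumes a + b is even so c1 = (a + b)/2 is exact):

```java
public class TwoRuns {
    // Given c1 = (a + b)/2 and c2 = a * b from two runs over the SAME
    // private inputs, a and b are the roots of t^2 - 2*c1*t + c2 = 0.
    // Returns {min(a, b), max(a, b)}.
    public static int[] recover(int c1, int c2) {
        int s = 2 * c1;                         // a + b
        int disc = s * s - 4 * c2;              // discriminant
        int root = (int) Math.round(Math.sqrt(disc));
        return new int[]{(s - root) / 2, (s + root) / 2};
    }
}
```

With moving cars the coordinates change between runs, so the two observed values belong to different unknowns and this system cannot be formed.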
Protection Systems [Ullman 1976]
• A protection system: {(S, O, P), R, Op, C}
– Subjects S, objects O, an access matrix P of rights R (e.g. {read}, {write})
– Primitive operations Op: enter r into (s1, o1), delete r from (s1, o1), create subject s1, create object o1
– Commands C, e.g.:

    command c(s1, s2, s3, o1, o2, o3){
        if {read} in (s1, o2){
            enter {read} into (s1, o2);
            enter {write} into (s1, o3);
        }
    }

• Safety: can command c on configuration Q leak right r (yielding Q’)?
– Undecidable in general
– Decidable without the create primitive
Proof Idea: Non-Inference == Safety
• Undecidability: reduce safety to non-inference
– Given a configuration Q, find an equivalent program M
• Decidability: reduce non-inference to safety without create
– Given a program M, find an equivalent configuration Q
– No create == independent executions
[Figure: a 0/1 access matrix over objects o1–o3 and rights p1–p3]
Deciding Non-inference: Overview
• Derive information-flow relations for a program
– Static analysis
– Abstract interpretation
• Rewrite information-flow relations as linear equations, and apply the theory of solvability of linear equations
– We assume all input and output variables are scalars
– Type length of variables determined by the minimum number of bits required to store location information (1 byte for now)
Information-flow relations: R1

    int f(int a, int b){
        int c = (a + b)/2;
        output(c);
    }

V = {a, b, c}, E = {(a + b)/2}, P = {a, b}, O = {c}
R1(v, e): “the value of variable v may be used in evaluating expression e”
R1(a, (a + b)/2) = 1, R1(b, (a + b)/2) = 1
Information-flow relations: R2

    int f(int a, int b){
        int c = (a + b)/2;
        output(c);
    }

V = {a, b, c}, E = {(a + b)/2}, P = {a, b}, O = {c}
R2(e, v): “the value of expression e may be used in evaluating variable v”
R2((a + b)/2, c) = 1
Information-flow relations: R3

    int f(int a, int b){
        int c = (a + b)/2;
        output(c);
    }

V = {a, b, c}, E = {(a + b)/2}, P = {a, b}, O = {c}
R3(v1, v2): “the value of variable v1 may be used in evaluating variable v2”
R3 = R1R2 ∪ A, where A is the set of direct assignments
R3(a, c) = 1, R3(b, c) = 1

M = R3(P, O) = [1 1]ᵀ
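The composition R3 = R1R2 ∪ A amounts to a boolean matrix product followed by a union. A minimal sketch of that step (the `compose` helper and its index conventions are illustrative, not the analyzer's actual code):

```java
public class Flow {
    // r1: |V| x |E| ("v may be used in e"); r2: |E| x |V| ("e may be
    // used in v"); a: |V| x |V| (direct assignments).
    // Returns r3 = (r1 boolean-product r2) OR a.
    public static boolean[][] compose(boolean[][] r1, boolean[][] r2, boolean[][] a) {
        int n = r1.length, m = r2[0].length, k = r2.length;
        boolean[][] r3 = new boolean[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++) {
                boolean v = a[i][j];
                for (int e = 0; e < k && !v; e++)
                    v = r1[i][e] && r2[e][j];   // flow i -> expr e -> j
                r3[i][j] = v;
            }
        return r3;
    }
}
```

For the slide's example with V = {a, b, c} and E = {(a + b)/2}, this yields exactly R3(a, c) = R3(b, c) = 1.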
Linear Equations
• A set of linear equations can be written as:
Ax = B
• Solvable if Rank(A) = Rank([A|B]) = N, where A is a K×N matrix
• We can show that:
A program satisfies non-inference if, for MᵀP = O, the system and all its subsystems are not solvable
Linear Equations for the Example
MᵀP = O:
[1 1] [a b]ᵀ = [c]
Rank(Mᵀ) = 1 < (|P| = 2)
Not solvable ⇒ satisfies non-inference
Approach Overview
• Perform use-def analysis and def-use analysis
• Take transitive closures of the def-use and use-def relations to obtain R1 and R2
• R3 = R1R2 ∪ A
• Store R3(P, O) in matrix M
• Inspect solvability of MᵀP = O
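The final solvability check rests on comparing ranks. A self-contained sketch of that test via Gaussian elimination (an illustrative helper, not the actual Soot/Indus-based analyzer; doubles suffice for 0/1 matrices):

```java
public class Solvability {
    // Rank of a matrix by Gaussian elimination with partial pivoting.
    public static int rank(double[][] m) {
        int rows = m.length, cols = m[0].length, r = 0;
        double[][] a = new double[rows][];
        for (int i = 0; i < rows; i++) a[i] = m[i].clone();
        for (int c = 0; c < cols && r < rows; c++) {
            int p = r;                            // pick the largest pivot
            for (int i = r; i < rows; i++)
                if (Math.abs(a[i][c]) > Math.abs(a[p][c])) p = i;
            if (Math.abs(a[p][c]) < 1e-9) continue;
            double[] t = a[p]; a[p] = a[r]; a[r] = t;
            for (int i = r + 1; i < rows; i++) {  // eliminate below pivot
                double f = a[i][c] / a[r][c];
                for (int j = c; j < cols; j++) a[i][j] -= f * a[r][j];
            }
            r++;
        }
        return r;
    }

    // System Ax = b uniquely determines all n unknowns
    // iff rank(A) == rank([A|b]) == n.
    public static boolean solvable(double[][] a, double[] b, int n) {
        double[][] aug = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            aug[i] = java.util.Arrays.copyOf(a[i], a[i].length + 1);
            aug[i][a[i].length] = b[i];
        }
        return rank(a) == n && rank(aug) == n;
    }
}
```

On the slide's example, Mᵀ = [1 1] has rank 1 < |P| = 2, so the system is not solvable and non-inference holds.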
Example

    int f(int x1, int y1, int x2, int y2, int k){
        int x, y, dist, avg_x, avg_y;
        x = (x2 - x1)^2;
        y = (y2 - y1)^2;
        dist = sqrt(x + y);
        output(dist);
        if (k > 100){
            avg_x = (x1 + x2)/2;
            avg_y = (y1 + y2)/2;
            output(avg_x);
            output(avg_y);
        }
    }

V = {x1, x2, y1, y2, k, x, y, dist, avg_x, avg_y}, P = {x1, x2, y1, y2}
E = {(x2 - x1)^2, (y2 - y1)^2, sqrt(x + y), k > 100, (x1 + x2)/2, (y1 + y2)/2}
Information-flow relations
000000
000000
000000
001000
001000
000111
011001
101010
011001
101010
1 EVR
0000000001
0000000010
0000000011
0000000100
0000001100
0000010100
2 VER
0000000001
0000000010
0000000100
0000001100
0000010100
0000100011
0001001100
0010010110
0100001101
1000010110
3 VVR
101
110
101
110
OPM
Linear Equations for the Example
)4|(|2)(
_
2
2
1
1
0101
1111
)4|(|2)(
_
_
2
2
1
1
0101
1010
)4|(|2)(
_
2
2
1
1
1010
1111
)4|(|3)(
_
_
2
2
1
1
0101
1010
1111
3
2
1
PMRank
yavg
dist
y
x
y
x
PMRank
yavg
xavg
y
x
y
x
PMRank
xavg
dist
y
x
y
x
PMRank
yavg
xavg
dist
y
x
y
x
T
T
T
T
)2(|1)(
_2
111
)2|(|1)(
_2
111
)4|(|1)(
][
2
2
1
1
1111
26
15
4
PMRank
yavgy
y
PMRank
xavgx
x
PMRank
dist
y
x
y
x
T
T
T
None of the subsystems is solvable Satisfies non-inference
Implementation and Evaluation
• Implemented a static analyzer that decides non-inference for Java programs
– Doesn’t handle inter-procedural data analysis yet
– Used Soot (API for Java bytecode analysis)
– Used Indus (API for dataflow analysis)
• Evaluated by testing it on a benchmark
– Distance (calculates distance between 2 cars)
– Speed (calculates speed in a region)
– Density (calculates density of cars in a region)
– Attacks such as Wallet, WalletAttack, PasswordChecker, AverageAttack, IfAttack
Case Study 1: AverageAttack

    int average(int x1, int x2, .., int xn){
        average = (x1 + x2 + .. + xn)/n;
        output(average);
    }

    int average_attack(int x1, int x2, .., int xn){
        x1 = x3; x2 = x3; x4 = x3; ..; xn = x3;
        average = (x1 + x2 + .. + xn)/n;
        output(average);
    }

MᵀP = O: [0 0 1 0 .. 0] [x1 x2 x3 .. xn]ᵀ = [average]
This system is solvable ⇒ rejected by our analyzer
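The attack's effect can be reproduced with toy values: after overwriting every input with x3, the "average" equals x3 exactly. A runnable sketch (array-based, with assumed names):

```java
public class AverageAttack {
    // The honest average reveals only the mean of the inputs...
    public static int average(int[] x) {
        int s = 0;
        for (int v : x) s += v;
        return s / x.length;
    }

    // ...but copying x[2] into every slot first makes the output
    // equal to x[2], leaking one private input exactly.
    public static int averageAttack(int[] x) {
        int[] y = x.clone();
        for (int i = 0; i < y.length; i++) y[i] = y[2];
        return average(y);
    }
}
```

This is exactly what the solvability test catches: after the assignments, the output row of Mᵀ has a single 1, so the system determines x3 uniquely.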
Case Study 2: Wallet — Can there be false negatives?

    int wallet(int p, int q, int c){
        if (p > c){
            p = p - c;
            q = q + c;
        }
        output(q);
    }

p is private: amount of money in the wallet
q is public: amount of money spent
c is public: cost of an item

MᵀP = O: [1][p] = [q]
System is solvable, rejected by our analyzer: False negative! (Implicit information flows)
Case Study 3: WalletAttack — How bad are false negatives?

    int wallet_attack(int p, int q, int c){
        n = length(p);
        while (n >= 1){
            c = 2^(n-1);
            if (p > c){          // leaks the value of p bit by bit
                p = p - c;
                q = q + c;
            }
            n = n - 1;
        }
        output(q);
    }

MᵀP = O: [1][p] = [q]
System is solvable ⇒ rejected by our analyzer
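For illustration, a runnable version of the bit-by-bit leak (a sketch: the loop counter advances every iteration, and >= is used so exact powers of two are also captured):

```java
public class WalletAttack {
    // Probe the wallet with costs c = 2^(n-1), highest bit first.
    // Each accepted purchase moves one bit of p into the public
    // spend total q, so q ends up equal to the original p.
    public static int leak(int p, int bits) {
        int q = 0;
        for (int n = bits; n >= 1; n--) {
            int c = 1 << (n - 1);
            if (p >= c) { p -= c; q += c; }
        }
        return q;   // the "public" output now equals the private p
    }
}
```

This shows why conservatively rejecting Wallet is tolerable: the same flow pattern, driven adaptively, leaks the private value completely.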
Conclusions
• Non-inference: a novel information-flow control model
• Allows information to flow from private to public but not vice versa
• Enforceable using static analysis for uni-directional information flow
• Applicable to location-based services
Probabilistic Validation of Aggregated Data in VANETs
Nishkam Ravi
Joint work with Fabio Picconi, Marco Gruteser*, Liviu Iftode
Computer Science, Rutgers University; *Winlab, Rutgers University
Motivation
• Traffic information systems based on V2V data exchange (e.g. TrafficView)
[Figure: cars a–e exchange and relay records of (car id, location, speed); a malicious car can inject spoofed/bogus information]
• How can data be validated?
Existing Solutions
• Cross-validation (Golle 2004)
– Cross-validate data against a set of rules
– Cross-validate data from different cars
– Assumes: honest cars > malicious cars
– Assumes multiple sources of information
• Use PKI and strong identities (Hubaux 2005)
– A tamper-proof box signs data
– Keys are changed periodically for privacy
– Cross-validation used
– High data overhead
• Desired solution: high security, low data overhead
[Record format: location and speed data (4 bytes), plus timestamp, signature, and certificate (88 bytes)]
Syntactic Aggregation
• n signed records (location i, speed i, timestamp, signature, certificate) are merged into one record: (location 1’, speed 1’, id 1), .., (location n’, speed n’, id n), with a single timestamp, signature, and certificate
• A malicious aggregator can include bogus information
Semantic Aggregation
• n signed records are summarized into a single signed statement:
– “n cars in segment [(x1, y1), (x2, y2)]”, or
– “n cars (id1, id2, .., idn) in segment [(x1, y1), (x2, y2)]”
Assumptions
• Tamper-proof service
– Stores keys
– Signs, timestamps, generates random numbers
– Provides a transmit buffer
• Applications are untrusted and implement their own aggregation modules
• Principle of economy of mechanism
– “the protection system’s design should be as simple and small as possible”
Tamper-proof Service
• Trusted Computing
– Every layer of the software stack is attested using a binary hash
– Only well-known software/applications allowed to execute
• BIND (Shi, Perrig, van Doorn 2005)
– Partial attestation
– Data isolation
– Provides flexibility
• Implement the tamper-proof service in software
– Attest it using BIND
Our Solution
• The application builds the aggregated record (location i’, speed i’, id i for i = 1..n) in the tamper-proof service’s transmit buffer
• The service timestamps it, generates a random number r, attaches the original signed record at index r mod n as a proof record, and signs the result
• The receiver validates the aggregated record against the proof record
• Multiple random numbers and proof records improve the probability of success
Evaluation
• Metric: security/bandwidth
• Base case 1
– All records signed and certified
– High security, high bandwidth usage
• Base case 2
– Semantic aggregation, no certificates
– Minimal bandwidth usage, no security
• Our solution
– Somewhere in between
Bandwidth Usage
• Bandwidth requirement of our solution compared with the two base cases:
Bandwidth = m*d + n*(d + 90) + 88
• Evaluated for n = 1, d = 4 bytes and for n = 4, d = 4 bytes
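The formula can be evaluated directly. This sketch assumes the parameter meanings implied by the slides: m aggregated entries of d data bytes each, n proof records carrying roughly 90 bytes of signature/certificate overhead on top of their data, plus an 88-byte signature and certificate on the aggregate itself.

```java
public class Bandwidth {
    // Bytes transmitted by the proposed scheme, per the slide's formula:
    //   m*d         aggregated entries
    //   n*(d + 90)  proof records with per-record overhead (assumed meaning)
    //   88          signature + certificate on the aggregate
    public static int ourSolution(int m, int n, int d) {
        return m * d + n * (d + 90) + 88;
    }
}
```

For m = 10 entries with d = 4: one proof record costs 222 bytes in total, four proof records cost 504 bytes, so security can be dialed up at a known bandwidth price.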
Security
• Security of our solution compared with the two base cases (f/m = 0.5):
Security = 1 - (1 - f/m)^n
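Reading f/m as the fraction of fraudulent records among the m aggregated records and n as the number of random proof records (an assumed interpretation), the formula is the probability that at least one check lands on a bogus record:

```java
public class Security {
    // Probability that at least one of n independent random proof checks
    // hits one of the f fraudulent records among m aggregated records.
    public static double detectionProbability(int f, int m, int n) {
        return 1.0 - Math.pow(1.0 - (double) f / m, n);
    }
}
```

At f/m = 0.5, a single proof record already detects tampering half the time, and each additional record halves the adversary's chance of escaping.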
Conclusions
• Used the idea of random checks to validate data
• PKI-based authentication, tamper-proof service
• Evaluated our solution on a new metric: security/bandwidth
Demo: Indoor Localization Using Camera Phones
• User wears the phone as a pendant
• The camera clicks and sends images to a web server via GPRS
• The web server compares query images with location-tagged images and sends back location updates
• No infrastructure required
– Neither custom hardware nor access points are required
– Physical objects do not have to be “tagged”
– Users do not have to carry any special device
• User orientation is also determined
[Figure: the phone sends images to a web service holding images tagged with location; the service returns a location update]