Geo-Indistinguishability: Differential Privacy for Location Based Services
Miguel Andres, Nicolas Bordenabe, Konstantinos Chatzikokolakis, Catuscia Palamidessi
Overview
Formal Definition
Mechanism for Geo-Indistinguishability
Enhancing Location Based Services
Case Study
Strengths and Weaknesses
Future Work
Outline
Suppose a tourist in Paris wishes to obtain information about restaurants near the Eiffel Tower
However, this presents many potential privacy issues
Real-World Example
Provide information based on a user’s location
Fine vs. coarse grained
◦ Coarse grained: weather, location-based advertising, etc.
◦ Fine grained: Point of Interest (POI) services involving exact location
Location Based Services (LBS)
Smartphones equipped with GPS use LBS’s
Untrusted LBS’s could lead to a breach of user privacy
◦ Discover home location
◦ Develop user profiles
There is currently no way to use LBS’s without revealing your location to a server
Problem
LBS’s need user coordinates in order to provide their service
Trade-off: The user wants privacy, but also good results
The method of obtaining privacy must be computationally efficient enough to run on a smartphone
Problem—Continued
Add controlled noise to the user’s location
Send the approximate location to the LBS
Achieves quasi-indistinguishability within a certain area
◦ Any location within a radius r of the Eiffel Tower is almost equally likely to have produced the reported location
Generalization of the notion of differential privacy
Solution: Geo-Indistinguishability (GI)
User specified:
◦ Radius: r
◦ Level of discrepancy between two points: l
Tradeoff:
◦ For a fixed l, as r gets larger the privacy level becomes greater but results become more inaccurate
Ratio of l to r is the privacy parameter: ε = l / r (a smaller ε means more noise)
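As a rough illustration of how l and r combine into ε, the sketch below computes ε = l / r and the factor e^(ε·d) that bounds how much more likely one location can make a given report than another location at distance d. The function names and the sample values of l and r are assumptions chosen for illustration; they do not come from the slides.

```python
import math

def epsilon_from(l: float, r: float) -> float:
    """Privacy parameter per unit of distance: epsilon = l / r."""
    if l <= 0 or r <= 0:
        raise ValueError("l and r must be positive")
    return l / r

def max_likelihood_ratio(epsilon: float, distance: float) -> float:
    """Bound e^(epsilon * d) on the ratio of the probabilities that two
    locations at distance d produce the same reported point."""
    return math.exp(epsilon * distance)

# Illustrative values only (assumptions, not taken from the slides).
l = math.log(4)   # chosen level l
r = 0.2           # chosen radius, in km
eps = epsilon_from(l, r)
ratio_at_r = max_likelihood_ratio(eps, r)  # equals e^l at distance r
```

At distance exactly r the bound is e^l, which is why the pair (l, r) can be read as "level l within radius r".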
Solution—Continued
Since none of the existing approaches work well, the authors came up with GI
Say a user is located at some point x
Value that the user reports to the LBS is a point z
What constitutes a truly private value z?
◦ Must report a value within Paris or else it won’t be useful
Formal Definition
When the radius of interest is small, the user needs a strong level of privacy (a small l) in order to be well-protected
When the radius of interest is large, the level of privacy does not need to be as strong (l can be larger)
Therefore the level l is proportional to the radius r of the user’s choice: l = εr
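The condition behind these statements can be written compactly. The block below is a sketch of the usual statement of ε-geo-indistinguishability; the symbols K (the randomizing mechanism), X (possible true locations) and Z (possible reported locations) are notation assumed here, since the slides do not show the formula.

```latex
% K maps a true location x to a probability distribution K(x) over reported points.
% d(x, x') is the Euclidean distance between the locations x and x'.
K(x)(Z) \;\le\; e^{\epsilon \, d(x, x')} \, K(x')(Z)
\qquad \text{for all } x, x' \in \mathcal{X},\ Z \subseteq \mathcal{Z}

% With \epsilon = l / r, any two locations with d(x, x') \le r satisfy
K(x)(Z) \;\le\; e^{l} \, K(x')(Z)
```

The closer two locations are, the closer the bound is to 1, so nearby locations are hard to tell apart from the reported point, while distant ones may be distinguished.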
Formal Definition—Continued
Formal Definition—Continued
GI is independent of any side information from an attacker
Any two points within one unit of distance of each other are almost equally likely to produce a given reported point: the ratio of their probabilities is at most e^ε
Level of privacy depends on the distance between the two points
Formal Definition—Continued
Similar to DP, GI is independent of the attacker’s side information
Euclidean distance vs. Hamming distance
◦ Euclidean distance (GI): spatial, straight-line distance between two points
◦ Hamming distance (DP): number of records in which two sets of data differ
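The two distance notions can be contrasted with a short sketch. The helper names are hypothetical, and databases are modelled simply as equal-length lists of records, which is an assumption made for illustration.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two planar points (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def hamming_distance(db_a, db_b):
    """Number of positions at which two equal-length databases differ."""
    if len(db_a) != len(db_b):
        raise ValueError("databases must have the same number of records")
    return sum(1 for a, b in zip(db_a, db_b) if a != b)
```

Standard DP bounds how much the output distribution may change per differing record, whereas GI bounds how much it may change per unit of spatial distance.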
Comparison to Differential Privacy
Output perturbation using the Laplace distribution
Three-step process:
◦ Add Laplacian noise on a continuous space
◦ Discretize the result so that it is useful for real-world coordinates
◦ Truncate the result to reasonable points
Mechanism for GI
Perturb the output with noise generated by the Laplace distribution
The noise is described by a Probability Density Function (PDF) centered on the true location
Draw a random point according to this PDF
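The continuous step can be sketched as follows, assuming the two-dimensional (planar) Laplace density proportional to e^(-ε·d), where d is the distance from the true location. In polar coordinates the angle is uniform and the radius has density ε²·r·e^(-εr), which is a Gamma distribution with shape 2 and scale 1/ε. The paper draws the radius by inverting its cumulative distribution instead; this sketch uses the equivalent Gamma draw so that only the standard library is needed. The function name is hypothetical, and coordinates are assumed to be planar (for example metres), not raw latitude and longitude.

```python
import math
import random

def sample_planar_laplace(x, y, epsilon, rng=random):
    """Return a noisy point drawn from a planar Laplace centered at (x, y)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    theta = rng.uniform(0.0, 2.0 * math.pi)        # direction: uniform angle
    radius = rng.gammavariate(2.0, 1.0 / epsilon)  # distance: shape 2, scale 1/epsilon
    return x + radius * math.cos(theta), y + radius * math.sin(theta)
```

Under this assumption the expected displacement is 2/ε, so a smaller ε moves the reported point further from the true one on average.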
Continuous Domain
Coordinates on a map are given as discrete points (latitude and longitude)
Map the random point chosen in the continuous domain to the nearest point in a discrete domain
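A minimal sketch of the discretization step, assuming the discrete domain is a regular square grid with spacing `step` (a hypothetical parameter; the slides do not specify the grid):

```python
def snap_to_grid(x, y, step):
    """Map a continuous point to the nearest point of a square grid."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Rounding each coordinate independently gives the nearest grid point.
    return round(x / step) * step, round(y / step) * step
```

For example, with a spacing of 5 units, `snap_to_grid(12.3, -7.8, 5.0)` lands on the grid point `(10.0, -10.0)`.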
Discrete Domain
Eliminate unrealistic points that may be returned by the output perturbation function
Truncate
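One way to sketch the truncation step is to remap any reported point to the closest point of a set of acceptable locations. The function name and the representation of the acceptable set as a finite list of grid points are assumptions made for illustration.

```python
import math

def truncate_to_allowed(point, allowed_points):
    """Return the acceptable point closest to `point` (Euclidean distance).

    A point that is already acceptable is returned unchanged, because its
    distance to itself is zero.
    """
    if not allowed_points:
        raise ValueError("allowed_points must not be empty")
    return min(
        allowed_points,
        key=lambda a: math.hypot(a[0] - point[0], a[1] - point[1]),
    )
```

Because this remapping depends only on the already-randomized point and not on the true location, it is plain post-processing of the mechanism's output.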
The concept and mechanism of GI are most appropriately applied to LBS in smartphones
LBS use a simple client-server model to obtain information
User sends the current location x and server sends back POI info
Enhancing Location Based Services
An approximate location will be generated on the client and sent to the LBS
For mildly location-sensitive LBS’s, results are approximately the same even if the reported location is relatively far away
For highly location-sensitive LBS’s, results are undesirable unless within the specified radius
Mild vs. Highly Location-Sensitive LBS
For highly location-sensitive LBS’s, an area of retrieval larger than the intended area of retrieval must be specified
Data sent to the server is only the approximate location and area of retrieval
Results from LBS are filtered on the client to match the user’s original area of retrieval
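The enlargement and the client-side filter can be sketched as below, assuming planar Laplace noise, for which the probability that the reported point lies within distance r of the true one is 1 - (1 + εr)·e^(-εr). The area of retrieval is taken as the area-of-interest radius plus the noise distance that is not exceeded with a chosen confidence. All function and parameter names are hypothetical, and the quantile is found by simple bisection rather than a closed form.

```python
import math

def noise_radius_cdf(radius, epsilon):
    """P(noise distance <= radius) under planar Laplace noise."""
    t = epsilon * radius
    return 1.0 - (1.0 + t) * math.exp(-t)

def noise_radius_quantile(confidence, epsilon):
    """Smallest radius whose CDF reaches `confidence`, found by bisection."""
    if not 0.0 < confidence < 1.0 or epsilon <= 0:
        raise ValueError("need 0 < confidence < 1 and epsilon > 0")
    lo, hi = 0.0, 1.0 / epsilon
    while noise_radius_cdf(hi, epsilon) < confidence:
        hi *= 2.0                      # grow the bracket until it contains the answer
    for _ in range(100):               # halve the bracket a fixed number of times
        mid = (lo + hi) / 2.0
        if noise_radius_cdf(mid, epsilon) < confidence:
            lo = mid
        else:
            hi = mid
    return hi

def retrieval_radius(interest_radius, confidence, epsilon):
    """Radius to request from the LBS around the reported location."""
    return interest_radius + noise_radius_quantile(confidence, epsilon)

def filter_to_area_of_interest(pois, true_location, interest_radius):
    """Client-side filter: keep only POIs inside the original area of interest."""
    tx, ty = true_location
    return [p for p in pois if math.hypot(p[0] - tx, p[1] - ty) <= interest_radius]
```

Only the reported location and the enlarged radius leave the device; the true location is used solely in the final filtering step on the client.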
Enhancing Location Based Services
Area of Interest
Area of Interest vs. Reported Position
Area of Interest vs. Area of Retrieval
Potential Locations within Area of Retrieval
Potential Locations within Area of Interest
The Census Bureau data contains information in the form of (hBlock, wBlock) pairs
◦ hBlock: where the worker lives
◦ wBlock: where the worker works
This data is publicly available in sanitized form
The authors’ goal is to sanitize the information from the U.S. Census Bureau with GI and compare the result to the originally published sanitized data
Case Study—U.S. Census Bureau
The GI algorithm takes each point of the census data and randomizes it according to specified values of l and r
Home-to-work commute distance was used as verification
As the value of l decreases for a given r, the sanitization results begin to differ more from the actual results
Therefore, as the privacy level increases, the accuracy of the data decreases
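The trend described here can be reproduced on toy data. The sketch below perturbs made-up (home, work) pairs with planar Laplace noise for a given l and r and measures the average change in commute distance. The coordinates are synthetic and are not census data, the function names are hypothetical, and the radius is drawn from a Gamma distribution with shape 2 and scale 1/ε as an assumed way of sampling the noise.

```python
import math
import random

def perturb(point, epsilon, rng):
    """Add planar Laplace noise to one (x, y) point in planar units."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.gammavariate(2.0, 1.0 / epsilon)  # shape 2, scale 1/epsilon
    return (point[0] + radius * math.cos(theta),
            point[1] + radius * math.sin(theta))

def commute(home, work):
    """Straight-line commute distance between home and work."""
    return math.hypot(home[0] - work[0], home[1] - work[1])

def mean_commute_error(pairs, l, r, rng):
    """Average absolute change in commute distance after sanitization."""
    epsilon = l / r
    errors = [
        abs(commute(perturb(h, epsilon, rng), perturb(w, epsilon, rng)) - commute(h, w))
        for h, w in pairs
    ]
    return sum(errors) / len(errors)

# Toy data, made up for illustration only (NOT census data); units are km.
rng = random.Random(1)
pairs = [((rng.uniform(0, 10), rng.uniform(0, 10)),
          (rng.uniform(0, 10), rng.uniform(0, 10))) for _ in range(2000)]
error_weak_privacy = mean_commute_error(pairs, l=math.log(10), r=1.0, rng=rng)
error_strong_privacy = mean_commute_error(pairs, l=math.log(2), r=1.0, rng=rng)
```

With the same r, the smaller l gives a smaller ε and therefore more noise, so `error_strong_privacy` is expected to exceed `error_weak_privacy`.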
Case Study Continued
Case Study Continued
Strengths:
◦ Formalized definition of GI
◦ Allows users to choose privacy levels
◦ Still provides useful data from LBS
Weaknesses:
◦ Paper does not present a software solution
◦ Current method of user privacy settings could be tedious
◦ Encryption of user preferences
◦ Case study was not a complete verification of the process
Strengths and Weaknesses
Adding a software solution
◦ Appears that this has been attempted through Location Guard
Client add-on to POI services
Future Work
http://research.neustar.biz/2014/09/08/differential-privacy-the-basics/
http://www.icdcit.ac.in/archive/2015/pdf/ppt/7-Catuscia-Palamidessi.pdf
http://www.stronati.org/presentations/slides-pets14.pdf
Sources