Transcript
Page 1: Differential Privacy Xintao Wu Oct 31, 2012

Differential Privacy

Xintao Wu, Oct 31, 2012

Page 2

Sanitization approaches

• Input perturbation
  – Add noise to data
  – Generalize data

• Summary statistics
  – Means, variances
  – Marginal totals
  – Model parameters

• Output perturbation
  – Add noise to summary statistics

Page 3

Blending/hiding into a crowd

• K-anonymity based approaches

• Adversary may have various background knowledge to breach privacy

• Privacy models often assume “the adversary’s background knowledge is given”

Page 4

Classic intuition for privacy

• Privacy means that anything that can be learned about a respondent from the statistical database can be learned without access to the database.

• Security of encryption
  – Anything about the plaintext that can be learned from a ciphertext can be learned without the ciphertext.

• Prior and posterior views about an individual should not change much

Page 5

Motivation

• Publicly release statistical information about a dataset without compromising the privacy of any individual

Page 6

Requirement

• Anything that can be learned about a respondent from a statistical database should be learnable without access to the database

• Reduce the knowledge gain of joining the database

• Require that the probability distribution of the public results is essentially the same whether any individual opts in to, or opts out of, the dataset

Page 7

Definition
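The definition itself is not captured in the transcript (it was presumably a figure on the slide); the standard ε-differential privacy definition it most likely shows is: a randomized mechanism K gives ε-differential privacy if for all pairs of datasets D1, D2 differing in at most one record, and all sets of outcomes S,

```latex
% Standard \epsilon-differential privacy (Dwork, 2006):
\Pr[\mathcal{K}(D_1) \in S] \;\le\; e^{\epsilon} \cdot \Pr[\mathcal{K}(D_2) \in S]
```

The smaller ε is, the less the output distribution can depend on any single individual's record.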

Page 8

Sensitivity function

• Captures how great a difference must be hidden by the additive noise
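As an illustration (the table and attribute names are assumptions matching the example later in these slides), the global sensitivity Δf is the largest possible change in a query's answer between two neighboring databases, i.e. databases differing in one record:

```python
# A minimal sketch of global sensitivity: the maximum change in a query's
# output when one record is added or removed (neighboring databases).

def count_query(db):
    # Counting query: how many records satisfy a predicate.
    return sum(1 for r in db if r["cancer"] == "Yes")

def sum_query(db, upper=10_000):
    # Sum query over salaries clipped to the domain range [0, upper].
    return sum(min(r["salary"], upper) for r in db)

# Adding or removing one record changes a count by at most 1, so the global
# sensitivity of count_query is Delta_f = 1. A clipped sum can change by at
# most the clipping bound, so the global sensitivity of sum_query is 10_000.
db = [{"cancer": "Yes", "salary": 5_000}, {"cancer": "No", "salary": 8_000}]
neighbor = db + [{"cancer": "Yes", "salary": 10_000}]
print(count_query(neighbor) - count_query(db))  # 1
print(sum_query(neighbor) - sum_query(db))      # 10000
```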

Page 10

Gaussian noise

Page 11

Adding Laplace noise
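A minimal sketch of the Laplace mechanism: add noise drawn from Laplace(0, b) with scale b = Δf/ε to the true answer (the inverse-transform sampler below is one standard way to draw Laplace noise):

```python
import math
import random

def laplace_noise(scale, rng=random):
    # Inverse-transform sampling for Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    # epsilon-DP: add Laplace noise with scale b = Delta_f / epsilon.
    return true_answer + laplace_noise(sensitivity / epsilon, rng)

random.seed(0)
print(laplace_mechanism(true_answer=42, sensitivity=1, epsilon=0.1))
```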

Page 12

Proof sketch

Page 13

Delta_f=1, epsilon varies

Page 14

Delta_f=1 epsilon=0.01

Page 15

Delta_f=1 epsilon=0.1

Page 16

Delta_f=1 epsilon=1

Page 17

Delta_f=1 epsilon=2

Page 18

Delta_f=1 epsilon=10

Page 19

Delta_f=2, epsilon varies

Page 20

Delta_f=3, epsilon varies

Page 21

Delta_f=10000, epsilon varies
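The density plots on the preceding slides (not reproduced in this transcript) all follow from one quantity, the Laplace scale b = Δf/ε. A tiny sketch enumerating the parameter combinations the slides vary:

```python
# The noise scale of the Laplace mechanism is b = Delta_f / epsilon:
# larger sensitivity or smaller epsilon means wider noise.
def laplace_scale(delta_f, epsilon):
    return delta_f / epsilon

for delta_f in (1, 2, 3, 10_000):
    for eps in (0.01, 0.1, 1, 2, 10):
        print(f"Delta_f={delta_f:>5}, epsilon={eps:>5}: b={laplace_scale(delta_f, eps):g}")
```

For Δf = 1, moving ε from 0.01 to 10 shrinks the scale from 100 down to 0.1, which is exactly the narrowing of the curves the slides illustrate.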

Page 22

Composition

• Sequential composition – for a sequence of analyses on the same data, the ε costs add up

• Parallel composition – for disjoint sets, the ultimate privacy guarantee depends only on the worst of the guarantees of each analysis, not the sum
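The two composition rules reduce to simple budget arithmetic, sketched here:

```python
# Budget accounting sketch: sequential composition sums the epsilons of
# analyses over the same data; parallel composition over disjoint subsets
# only pays the worst (maximum) epsilon.
def sequential_budget(epsilons):
    return sum(epsilons)

def parallel_budget(epsilons):
    return max(epsilons)

# Three queries at epsilon = 0.1 each:
print(sequential_budget([0.1, 0.1, 0.1]))  # ~0.3
print(parallel_budget([0.1, 0.1, 0.1]))    # 0.1
```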

Page 23

Example

• Let us assume a table with 1000 customers, where each record has attributes: name, gender, city, cancer, salary.
  – For attribute city, the domain size is 10;
  – for attribute cancer, we only record Yes or No for each customer;
  – for attribute salary, the domain range is 0-10k.
  – The privacy threshold ε is a constant 0.1 set by the data owner.

• For one single query "How many customers got cancer?"

• The adversary is allowed to ask the above query three times.
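Putting the slide's numbers together (the true count of 137 is a made-up value for illustration): the counting query has Δf = 1, and if the same query may be asked three times, sequential composition forces the ε = 0.1 budget to be split, e.g. ε/3 per answer, which triples the noise scale:

```python
import math
import random

def laplace_noise(scale, rng=random):
    # Inverse-transform sampling for Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

# "How many customers got cancer?" has sensitivity Delta_f = 1.
# A single answer under the full budget epsilon = 0.1 uses scale 1/0.1 = 10.
# Answering the query three times under sequential composition with an
# even split (epsilon/3 each) raises the per-answer scale to 30.
epsilon_total = 0.1
single_scale = 1 / epsilon_total            # 10
per_query_scale = 1 / (epsilon_total / 3)   # 30

random.seed(42)
true_count = 137  # hypothetical number of cancer patients among 1000 customers
noisy_answers = [true_count + laplace_noise(per_query_scale) for _ in range(3)]
```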

 

Page 24

Example (continued)

• “How many customers got cancer in each city?”

• For one single query “What is the sum of salaries across all customers?”
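These two queries contrast nicely (a sketch, reusing the slide's ε = 0.1 and 0-10k salary domain): the per-city cancer counts partition the customers into disjoint sets, so parallel composition lets every count use the full budget, while the salary sum has sensitivity equal to the whole domain bound:

```python
epsilon = 0.1

# "How many customers got cancer in each city?" partitions customers into
# 10 disjoint city groups, so by parallel composition each count may spend
# the full budget: Laplace scale 1 / 0.1 = 10 per cell.
histogram_scale = 1 / epsilon

# "What is the sum of salaries?" can change by up to the domain bound 10_000
# when one record changes, so its scale is 10_000 / 0.1 = 100_000 --
# high-sensitivity queries pay for privacy with much noisier answers.
sum_scale = 10_000 / epsilon

print(histogram_scale, sum_scale)
```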

Page 25

Type of computing (query)

• some are very sensitive, others are not

• single query vs. query sequence

• query on disjoint sets or not

• outcome expected: number vs. arbitrary

• interactive vs. non-interactive

Page 26

Sensitivity

• Global sensitivity

• Local sensitivity

• Smooth sensitivity
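A common textbook illustration of the difference (a sketch under the replace-one-record notion of neighbors, with an odd number of values in a bounded domain): the median has huge global sensitivity in the worst case, but on typical data its local sensitivity, the worst change over neighbors of this particular database, is tiny:

```python
# Local sensitivity of the median for one concrete database (sorted values,
# n odd, domain [0, upper], neighbors = replace one record): replacing one
# value can move the median at most to the next value on either side.
def local_sensitivity_median(values, upper=10_000):
    xs = sorted(values)
    n = len(xs)
    m = n // 2  # 0-based median index, n odd
    lower_nb = xs[m - 1] if m - 1 >= 0 else 0
    upper_nb = xs[m + 1] if m + 1 < n else upper
    return max(xs[m] - lower_nb, upper_nb - xs[m])

# Global sensitivity of the median is the whole domain range (worst-case
# database), but on clustered data the local sensitivity is small:
data = [4_900, 5_000, 5_100, 5_200, 5_300]
print(local_sensitivity_median(data))  # 100
```

Calibrating noise to local sensitivity directly can itself leak information, which is what the smooth-sensitivity variant addresses.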

Page 27

Different areas of DP

• PINQ

• DM with DP

• Optimizing linear counting queries under differential privacy
  – Matrix mechanism for answering a workload of predicate counting queries

Page 28

PPDM interface – PINQ

• A programmable privacy preserving layer

• Add calibrated noise to each query

• Need to assign privacy cost budget
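The real PINQ is a C# layer over LINQ; purely as an illustration of the three bullets above, a toy Python analogue of a budget-tracking, noise-adding query interface might look like this (hypothetical API, not PINQ's):

```python
import math
import random

class PrivateDataset:
    """Toy sketch of a PINQ-style privacy-preserving layer: every query
    spends from a fixed epsilon budget and receives calibrated Laplace
    noise on its answer. (Hypothetical API for illustration only.)"""

    def __init__(self, records, budget):
        self._records = list(records)
        self._budget = budget

    def noisy_count(self, predicate, epsilon, rng=random):
        # Refuse queries once the assigned privacy budget is spent.
        if epsilon > self._budget:
            raise RuntimeError("privacy budget exhausted")
        self._budget -= epsilon
        true = sum(1 for r in self._records if predicate(r))
        # Laplace noise with scale Delta_f / epsilon (counting: Delta_f = 1).
        u = rng.random() - 0.5
        noise = -(1 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        return true + noise

ds = PrivateDataset([{"cancer": "Yes"}, {"cancer": "No"}], budget=0.2)
a = ds.noisy_count(lambda r: r["cancer"] == "Yes", epsilon=0.1)
b = ds.noisy_count(lambda r: r["cancer"] == "Yes", epsilon=0.1)
# A third 0.1-query would now raise: the 0.2 budget is spent.
```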

Page 29

Data Mining with DP

• Previous study – the privacy preserving interface ensures everything about DP

• Problem – inferior results if the interface is simply utilized during data mining

• Solution – consider both together

• DP ID3
  – noisy count
  – evaluate all attributes in one exponential mechanism query using the entire budget, instead of splitting the budget among multiple queries
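The last bullet can be sketched as follows (the attribute names and scores are made up; a real DP ID3 would score attributes by a quality function such as information gain): the exponential mechanism selects one attribute with probability proportional to exp(ε·score/(2·Δ)), spending the step's whole budget on a single draw:

```python
import math
import random

def exponential_mechanism(candidates, score, epsilon, sensitivity, rng=random):
    # Select a candidate with probability proportional to
    # exp(epsilon * score(c) / (2 * sensitivity)).
    weights = [math.exp(epsilon * score(c) / (2 * sensitivity)) for c in candidates]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]

# Hypothetical per-attribute quality scores for one tree-split decision.
# One draw over all attributes uses the step's entire budget at once,
# instead of splitting it across per-attribute noisy evaluations.
scores = {"gender": 0.05, "city": 0.30, "salary": 0.10}
random.seed(1)
best = exponential_mechanism(list(scores), scores.get, epsilon=1.0, sensitivity=1.0)
```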

Page 30

DP in Social Networks

• Pages 97-120 of the pakdd11 tutorial

