A Whirlwind Tour of Differential Privacy Aaron Roth February 14, 2015 <3


A Whirlwind Tour of Differential Privacy

Aaron Roth

February 14, 2015 <3


Protecting Privacy is Important

Class action lawsuit accuses AOL of violating the Electronic Communications Privacy Act, seeks $5,000 in damages per user. AOL’s director of research is fired.


Protecting Privacy is Important

Class action lawsuit (Doe v. Netflix) accuses Netflix of violating the Video Privacy Protection Act, seeks $2,000 in compensation for each of Netflix’s 2,000,000 subscribers. Settled for undisclosed sum, 2nd Netflix Challenge is cancelled.


Protecting Privacy is Important

The National Human Genome Research Institute (NHGRI) immediately restricted pooled genomic data that had previously been publicly available.


But what is “privacy”?


But what is “privacy” not?

• Privacy is not hiding “personally identifiable information” (name, zip code, age, etc…)


But what is “privacy” not?

• Privacy is not releasing only “aggregate” statistics.


So what is privacy?

• Idea: Privacy is about promising people freedom from harm.
  – Attempt 1: “An analysis of a dataset D is private if the data analyst knows no more about Alice after the analysis than he knew about Alice before the analysis.”


So what is privacy?

• Problem: Impossible to achieve with auxiliary information.
  – Suppose an insurance company knows that Alice is a smoker.
  – An analysis that reveals that smoking and lung cancer are correlated might cause them to raise her rates!
    • Was her privacy violated?
    • This is exactly the sort of information we want to be able to learn…
  – This is a problem even if Alice was not in the database!


So what is privacy?

• Idea: Privacy is about promising people freedom from harm.
  – Attempt 2: “An analysis of a dataset D is private if the data analyst knows almost no more about Alice after the analysis than he would have known had he conducted the same analysis on an identical database with Alice’s data removed.”


Differential Privacy [Dwork-McSherry-Nissim-Smith 06]

[Figure: a dataset D with one record per person (Alice, Bob, Chris, Donna, Ernie, Xavier) is input to an Algorithm; labels: Pr[r], “ratio bounded”.]


Differential Privacy

: The data universe. : The dataset (one element per person)

Definition: Two datasets are neighbors if they differ in the data of a single individual


Differential Privacy

: The data universe. : The dataset (one element per person)

Definition: An algorithm is -differentially private if for all pairs of neighboring datasets , and for all outputs :


Some Useful Properties

Theorem (Postprocessing): If is -private, and is any (randomized) function, then is -private.


So…

𝑥=¿

Definition: An algorithm is -differentially private if for all pairs of neighboring datasets , and for all outputs :


Some Useful Properties

Theorem (Composition): If are -private, then:

is -private.


So…

You can go about designing algorithms as you normally would. Just access the data using differentially private “subroutines”, and keep track of your “privacy budget” as a resource.

Private algorithm design, like regular algorithm design, can be modular.


Some simple operations: Answering Numeric Queries

Def: A numeric function has sensitivity if for all neighboring :

Write
• e.g. “How many PhDs are in the building?” has sensitivity 1.
• “What fraction of people in the building have PhDs?” has sensitivity .


Some simple operations: Answering Numeric Queries

The Laplace Distribution:


Some simple operations: Answering Numeric Queries

The Laplace Mechanism:

Theorem: is -private. Pf:



Some simple operations: Answering Numeric Queries

The Laplace Mechanism:

Theorem: The expected error is (can answer “what fraction of people in the building have PhDs?” with error
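The Laplace mechanism and the “privacy budget” bookkeeping from the earlier “So…” slide, as an added Python example (not code from the talk). It assumes the standard textbook forms that the extracted slides lost: noise drawn from a Laplace distribution with scale sensitivity/ε, budgets that add up under basic composition, and sensitivity 1/n for the fraction query; `PrivacyBudget`, `BUILDING` and the other names are hypothetical.

```python
import random

class PrivacyBudget:
    """Total epsilon available; basic composition adds up what each query spends."""
    def __init__(self, total_epsilon):
        self.total, self.spent = total_epsilon, 0.0

    def charge(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted; query refused")
        self.spent += epsilon

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_answer, sensitivity, epsilon, budget, rng):
    budget.charge(epsilon)  # pay for the query before touching the data
    return true_answer + laplace_noise(sensitivity / epsilon, rng)

# Constructed sample data: 300 people, every third one holds a PhD.
BUILDING = [{"id": i, "phd": i % 3 == 0} for i in range(300)]

def count_phds(people):
    return sum(1 for p in people if p["phd"])

def demo(seed=2015):
    # Seeded random.Random is for reproducibility only; a deployment needs a
    # cryptographically secure noise source.
    rng = random.Random(seed)
    budget = PrivacyBudget(total_epsilon=1.0)
    n = len(BUILDING)
    # Counting query: one person changes the count by at most 1.
    noisy_count = laplace_mechanism(count_phds(BUILDING), 1.0, 0.5, budget, rng)
    # Fraction query: one person changes the fraction by at most 1/n (assumed).
    noisy_frac = laplace_mechanism(count_phds(BUILDING) / n, 1.0 / n, 0.5, budget, rng)
    try:  # The budget of 1.0 is now spent, so a third query must be refused.
        laplace_mechanism(count_phds(BUILDING), 1.0, 0.1, budget, rng)
        third_answered = True
    except RuntimeError:
        third_answered = False
    return noisy_count, noisy_frac, budget.spent, third_answered

if __name__ == "__main__":
    print(demo())
```

Naive floating-point Laplace sampling like this is known to leak information through the low-order bits of its output, so production systems use a hardened sampler.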


Some simple operations: Answering Non-numeric Queries

“What is the modal eye color in the building?”

• If you can define a function that determines how “good” each outcome is for a fixed input,
  – E.g. “fraction of people in D with red eyes”


Some simple operations: Answering Non-numeric Queries

Theorem: is -private, and outputs such that:

(Can return an eye color that is within < 0.03% of the most common in this building.)
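An added Python example (not from the talk) for the modal eye colour query. It assumes the slide's missing formula is the standard exponential mechanism, which outputs r with probability proportional to exp(ε·q(D, r)/(2Δ)). Because the output distribution is computed exactly, the bounded-ratio condition of the definition can be checked directly on a pair of neighboring datasets. The data and all names are constructed for illustration.

```python
import math
import random

COLORS = ["brown", "blue", "green", "red"]

def quality(dataset, color):
    # The slide's quality score: fraction of people in D with that eye colour.
    return sum(1 for c in dataset if c == color) / len(dataset)

def output_distribution(dataset, epsilon):
    """Exact Pr[output = r] for every colour r."""
    sensitivity = 1.0 / len(dataset)  # one person moves a fraction by at most 1/n
    scores = [epsilon * quality(dataset, r) / (2 * sensitivity) for r in COLORS]
    top = max(scores)                 # subtract the max so exp() cannot overflow
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {r: w / total for r, w in zip(COLORS, weights)}

def private_mode(dataset, epsilon, rng):
    dist = output_distribution(dataset, epsilon)
    return rng.choices(COLORS, weights=[dist[r] for r in COLORS])[0]

def max_ratio(d1, d2, epsilon):
    """Largest probability ratio over all outputs, in either direction."""
    p, q = output_distribution(d1, epsilon), output_distribution(d2, epsilon)
    return max(max(p[r] / q[r], q[r] / p[r]) for r in COLORS)

# Constructed sample data: 200 people; the neighbor replaces the one
# red-eyed person's record with a blue-eyed one.
D = ["brown"] * 120 + ["blue"] * 60 + ["green"] * 19 + ["red"]
D_NEIGHBOR = D[:-1] + ["blue"]

if __name__ == "__main__":
    eps = 0.1
    print(output_distribution(D, eps))
    print("worst-case ratio:", max_ratio(D, D_NEIGHBOR, eps), "bound:", math.exp(eps))
    print("released answer:", private_mode(D, eps, random.Random(2015)))
```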


So what can we do with that?

• Machine Learning/Regression
• Statistical Estimation
• Graph Analysis (Next talk!)
• Combinatorial Optimization
• Spectral Analysis of Matrices
• Anomaly Detection/Analysis of Data Streams
• Convex Optimization
• Equilibrium computation
• Computation of optimal 1-sided and 2-sided matchings
• Pareto Optimal Exchanges
• …


A Synthetic Netflix Dataset

• 18,000 movies ( possible data records)
• 3-way marginal: “What fraction of people watched movie A, B, and C”
• Error and running time on 1,000,000 randomly selected 3-way marginals

[Figure: two panels plotting Error and Seconds against 𝜖.]


Other Uses of “Privacy” as Stability

• Economic
  – A tool for developing new incentive-compatible mechanisms
    • For combinatorial auctions, stable matchings, and more
• Analytic
  – To prevent over-fitting and improve the accuracy of data analysis and scientific discovery (Third talk this session!)


Thanks!

To learn more: