
Unlinkability: A Data Science Perspective

Ed Felten

Department of Computer Science
Woodrow Wilson School of Public and Intl. Affairs
Center for Information Technology Policy
Princeton University

Why we're discussing this

• Want a definition of "unlinkable" data
• Rationale: it is okay to override a user's DNT choice if the user has no real privacy interest at stake
• Complements other exceptions

Available design space

[Figure: design space plotted along two axes, privacy and data utility]

What does it mean for a data operation to be "privacy preserving?"

40+ years of research on this question

Intuition is an unreliable guide.

Intuition says: If you're not in the dataset, it can't convey info about you.

Actually: a dataset can convey info about people who aren't in it. That can be a good thing.

Example: You are a smoker. The dataset shows that smoking increases cancer risk.

Intuition says: aggregate data is always safe

Example service: answer questions about yourself, get recommendations; data is aggregated over >1M users.

But: researchers showed they can recover nearly all individual data. Similar demonstrations exist for other systems.

What does it mean for a data operation to be "privacy preserving?"
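Recovery results like this often work by treating aggregate answers as linear equations over individuals' data. A toy sketch of that idea (illustrative parameters and names, not the actual attack from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32                       # number of individuals, each with one secret bit
secret = rng.integers(0, 2, size=n)

# The "aggregate" interface answers only subset-count queries:
# how many people in a given subset have bit = 1.
def aggregate_count(subset_mask):
    return int(subset_mask @ secret)

# The attacker issues random subset queries and records the counts.
m = 2 * n                    # a small multiple of n queries suffices
queries = rng.integers(0, 2, size=(m, n))
answers = np.array([aggregate_count(q) for q in queries])

# Least-squares reconstruction from the aggregate answers alone.
est, *_ = np.linalg.lstsq(queries, answers, rcond=None)
recovered = (est > 0.5).astype(int)
print((recovered == secret).all())
```

Because the exact answers form a consistent, overdetermined linear system, the least-squares solution recovers every individual bit, even though each query returned only a count.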

[Figure: interactive query model — an analyst holding side information repeatedly sends queries to a curator holding the raw data and receives answers.]

Minimal requirements for a definition:

• Feasible
• Technically actionable
• Does not ban all data release
• Implies some limit on data inference

Modest goals, but very hard to achieve!

K-Anonymity

K-anonymity (aka "large bucket size") fails to meet the requirements.

Why?
• does not limit data inference
• assumes only one query
• assumes no side info
• does not speak to all cases (e.g. aggregate data)
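As a concrete illustration (hypothetical records and quasi-identifiers, not from the talk), k-anonymity is just the smallest bucket size over the attribute combinations an attacker could link on:

```python
from collections import Counter

# Hypothetical released records; (ZIP prefix, age bucket) are the
# quasi-identifiers an attacker could use for linking.
released = [
    ("085**", "20-29"), ("085**", "20-29"), ("085**", "20-29"),
    ("191**", "30-39"), ("191**", "30-39"),
]

def k_anonymity(rows):
    """k = size of the smallest quasi-identifier bucket in the release."""
    return min(Counter(rows).values())

print(k_anonymity(released))  # → 2
```

Note that even a large k does not limit inference: if everyone in a bucket shares the same sensitive value, an attacker who links you to that bucket learns your value exactly, which is the first failure listed above.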

Dalenius's Goal (sometimes stated as Dalenius's Rule)

What the analyst learns about you, given side info + answers
≈
What the analyst learns about you, given side info only

Fails – not feasible: combined with side information, any useful answer teaches the analyst something about individuals (recall the smoking example), so no mechanism can meet this goal without destroying utility.

Differential Privacy

Answer based on the full dataset
≈
Answer if your data wasn't there
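The ≈ on this slide has a standard formal version (this formalization is textbook background, not from the slides). A randomized mechanism $\mathcal{M}$ is $\varepsilon$-differentially private if, for every set of outputs $S$ and every pair of datasets $D, D'$ differing in one person's data:

```latex
\[
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]
\]
```

Smaller $\varepsilon$ means the two distributions are closer, i.e. the answer reveals less about whether your data was there.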

Differential Privacy: Advantages

• only definition known to meet the requirements
• intuition for the user: anything bad that happens would have happened anyway, even without your data
• adjustable "leakage level" – a knob to turn, to trade off privacy vs. utility
• multiple queries: leakage combines additively
• not affected by side information (enhancement)
• known methods exist to achieve DP for (e.g.) aggregate counting queries
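The "knob" and the additive combination of leakage can be sketched with the Laplace mechanism for counting queries, the standard method the last bullet alludes to (the dataset and parameter values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(data, predicate, epsilon):
    """Counting query via the Laplace mechanism: a count has sensitivity 1
    (one person's data changes it by at most 1), so adding Laplace noise
    with scale 1/epsilon gives epsilon-differential privacy."""
    true_count = sum(1 for row in data if predicate(row))
    return true_count + rng.laplace(scale=1.0 / epsilon)

# Hypothetical dataset: ages of users.
ages = [23, 35, 41, 29, 52, 37, 44, 31]

# epsilon is the knob: smaller epsilon = more privacy, noisier answers.
noisy = dp_count(ages, lambda a: a >= 35, epsilon=0.5)

# Multiple queries: leakage adds. Two epsilon=0.5 queries together
# spend epsilon = 1.0 of the privacy budget.
noisy2 = dp_count(ages, lambda a: a < 35, epsilon=0.5)
print(noisy, noisy2)
```

The noisy answers stay close to the true counts (5 and 3 here) while bounding what any single person's presence can reveal; the side-information immunity on this slide is a property of the definition itself, not of the noise distribution.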
