PRIVACY USC CSCI430 Dr. Genevieve Bartlett USC/ISI
Slide 2
Critter Lab Project
http://steel.isi.edu/Projects/critter/install/index.html
Windows 7 and 10.10 installers OK; on Linux, try building from source; Mac results may be unpredictable. Set the proxy setting for your browser. Surf the web and tune out ;-)
Slide 3
Privacy The state or condition of being free from
observation.
Slide 4
Privacy: The state or condition of being free from observation.
Not really possible today, at least not on the Internet.
Slide 5
Privacy: The right of people to choose freely under what
circumstances and to what extent they will reveal themselves, their
attitudes, and their behavior to others.
Slide 6
Privacy is not black and white. Lots of grey areas and points
for discussion: what seems private to you may not seem private to me.
Three examples to start us off: HTTP cookies, Google Street View,
Facebook.
Slide 7
HTTP cookies: What are they? A cookie is a small text file received
from a server (usually a web server) and stored on your machine.
Purpose: HTTP is stateless, so cookies maintain state for the HTTP
protocol, e.g., keeping the contents of your shopping cart while you
browse a site.
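A minimal sketch of this state-maintenance idea, using Python's standard http.cookies module; the cookie name and value here (session_id, abc123) are made-up examples, not anything a real site uses:

```python
# How a server sets a cookie and later reads it back, using the
# stdlib http.cookies module. Names and values are illustrative.
from http.cookies import SimpleCookie

# Server side: attach state to the (stateless) HTTP response.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"      # hypothetical session token
response_cookie["session_id"]["path"] = "/"
print(response_cookie.output())               # Set-Cookie: session_id=abc123; Path=/

# Client side: the browser echoes the cookie back on later requests,
# letting the server reconnect this request to the earlier session.
incoming = SimpleCookie("session_id=abc123")
print(incoming["session_id"].value)           # abc123
```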
Slide 8
HTTP cookies: 3rd-party cookies. You visit your favorite site,
unicornsareawesome.com. unicornsareawesome.com pulls ads from
lameads.com. You get a cookie from lameads.com, even though you
never visited lameads.com. lameads.com can track your browsing
habits every time you visit any page with ads from lameads.com,
and those might be a lot of pages.
Slide 9
HTTP cookies: Grey Area? 3rd-party cookies allow ad servers to
personalize your ads = more useful to you. Good! But: you chose to
go to unicornsareawesome.com = you are ok with unicornsareawesome.com
knowing how you use their site. Nowhere did you choose to let
lameads.com monitor your browsing habits.
Slide 10
Short Discussion: Collusion, a tool to track these 3rd-party
cookies. TED talk on Tracking the Trackers:
http://www.ted.com/talks/gary_kovacs_tracking_the_trackers.html
Slide 11
Google Street View: What is it? Google cars drive around and
take 360° panoramic pictures. Images are stitched together and can
be browsed on the Internet.
Slide 12
Google Street View: Me
Slide 13
Google Street View: Lots to See
Slide 14
Google Street View: Grey Area. Expectation of privacy? I'm in
public, so I can expect people will see me. But other expectations?
The picture is linked to a location, searchable, widely available,
and available for a long time to come.
Slide 15
Facebook: What is it? A social networking site: connect with
friends; share pictures and interests (likes).
Slide 16
Facebook: Grey Area. Who uses Facebook data, and how is the data
used? 4.7 million users liked a page about health conditions or
treatments. Insurance agents? 4.8 million shared information about
dates of vacations. Burglars? 2.6 million discussed recreational
use of alcohol. Employers?
Slide 17
Facebook: More Grey. Security issues with Facebook. Confusion
over privacy settings. Sudden changes in default privacy settings.
Facebook tracks browsing habits, even if a user isn't logged in
(third-party cookies). Facebook sells user information to ad
agencies and behavioral trackers.
Slide 18
Slide 19
Why start with these examples? 3 examples: HTTP cookies, Google
Street View, Facebook. Lots more everyday examples. Users gain
benefits by sharing data. Tons of data is generated, widely shared,
accessible, and stored (for how long?). Are users really aware of
how their data is used, and by whom?
Slide 20
Today's Agenda: Privacy; Privacy & Security; How do we
safely share private data?; Privacy and Inferred Information;
Privacy and Social Networks; How do we design a system with privacy
in mind?
Slide 21
Privacy; Privacy & Security; How do we safely share
private data?; Privacy and Inferred Information; Privacy and Social
Networks; How do we design a system with privacy in mind?
Slide 22
Examples of private information. Tons of information can be gained
from Internet use: Behavior, e.g., Person X reads reddit.com at work.
Preferences, e.g., Person Y likes high heel shoes and uses Apple
products. Associations, e.g., Person X and Person Y are friends. PPI
(private, personal/protected information): credit card #s, SSNs,
nicknames, addresses. PII (personally identifying information),
e.g., your age + your address = I know who you are, even if I'm not
given your name.
Slide 23
How do we achieve privacy? Policy + security mechanisms + law +
ethics + trust. Anonymity & anonymization mechanisms: make each
user indistinguishable from the next; remove PPI & PII; aggregate
information.
Slide 24
Who wants private info? Governments: surveillance. Businesses:
targeted advertising, following trends. Attackers: monetize
information or cause havoc. Researchers: medical, behavioral, social,
computer.
Slide 25
Who has private info? You and me: end-users, customers, patients.
Businesses: protect mergers, product plans, investigations.
Government & law enforcement: national security, criminal
investigations.
Slide 26
Privacy and Security. Security enables privacy: data is only as
safe as the system it's on. Sometimes security is at odds with privacy,
e.g., security requires authentication, but privacy is achieved
through anonymity; e.g., the TSA pat down at the airport.
Slide 27
Privacy; Privacy & Security; How do we safely share
private data?; Privacy and Inferred Information; Privacy and Social
Networks; How do we design a system with privacy in mind?
Slide 28
Why do we want to share? Share existing data sets: research;
companies buy data from each other or check out each other's assets
before mergers/buyouts. Start a new dataset: mutually beneficial
relationships ("share data with me and you can use this service").
Slide 29
Sharing everything? Easy, but what are the ramifications?
Legal/policy constraints may limit what can be shared/collected:
IRBs (Institutional Review Boards); HIPAA (Health Insurance
Portability and Accountability Act) & HITECH (Health Information
Technology for Economic and Clinical Health Act). What about future
use and protection of the data?
Slide 30
Mechanisms for limited sharing. Remove the really sensitive stuff
(sanitization): PPI & PII (private/personal information and
personally identifying information); without a crystal ball, this is
hard. Anonymization: replace information to limit the ability to tie
entities to meaningful identities. Aggregation: remove PII by only
collecting/releasing statistics.
Slide 31
Anonymization Example. Network trace: [packet diagram: headers + PAYLOAD]
Slide 32
Anonymization Example. Network trace: all sorts of PII
and PPI in the payload!
Slide 33
Anonymization Example. Network trace: routing
information includes IP addresses, TCP flags/options, and OS
fingerprinting.
Slide 34
Anonymization Example. Network trace: remove IPs?
Anonymize IPs?
Slide 35
Anonymization Example. Network trace: removing IPs
severely limits what you can do with the data. Instead, replace each
IP with something identifying, but not the same data: IP1 = A,
IP2 = B, etc.
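A minimal sketch of this kind of consistent pseudonymization in Python; the addresses below are documentation-range examples, and the IP-1/IP-2 label scheme is just an illustration:

```python
# Consistent IP pseudonymization: every occurrence of the same real
# address maps to the same stand-in label, so flows stay linkable
# without revealing who the endpoints are.
from itertools import count

_labels = count(1)
_mapping = {}  # real IP -> pseudonym, kept consistent across the trace

def pseudonymize(ip: str) -> str:
    """Return a stable pseudonym (IP-1, IP-2, ...) for a real address."""
    if ip not in _mapping:
        _mapping[ip] = f"IP-{next(_labels)}"
    return _mapping[ip]

trace = [("192.0.2.7", "198.51.100.9"), ("192.0.2.7", "203.0.113.4")]
print([(pseudonymize(s), pseudonymize(d)) for s, d in trace])
# [('IP-1', 'IP-2'), ('IP-1', 'IP-3')] -- same source keeps the same label
```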
Slide 36
Aggregation Examples. "Fewer U.S. Households Have Debt, But Those
Who Do Have More, Census Bureau Reports." "3 people in class got As
on the final."
Slide 37
Methods can be bad or good. Just because someone uses
aggregation or anonymization doesn't mean the data is safe. 87% of
the population of the United States can be uniquely identified by
gender, date of birth, and 5-digit ZIP code. Even if a dataset
sanitizes names, if it includes ZIP, gender & birthdate, the
data is not preserving privacy.
Slide 38
Formalizing anonymization for better privacy. K-anonymity: "A
release provides k-anonymity protection if the information for each
person contained in the release cannot be distinguished from at
least k-1 individuals whose information also appears in the
release." L-diversity: each group contains at least L different
values of the sensitive information; guards against the homogeneity
attack. And others (which we won't cover): t-closeness, m-invariance,
delta-presence...
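As a hedged illustration, here is a small Python sketch that measures k and l for a released table; the rows loosely mirror the generalized example on the following slides (birth year + ZIP as quasi-identifiers, test result as the sensitive value), but the data itself is invented:

```python
# Compute k (smallest group size over the quasi-identifiers) and
# l (fewest distinct sensitive values within any group).
from collections import defaultdict

rows = [  # (generalized birth year, generalized ZIP, test result)
    ("196*", "0019*", "A"), ("196*", "0019*", "B+"),
    ("197*", "002**", "B"), ("197*", "002**", "C"),
    ("198*", "0035*", "A"), ("198*", "0035*", "D"),
]

groups = defaultdict(list)
for birth, zipc, result in rows:
    groups[(birth, zipc)].append(result)   # group by quasi-identifiers

k = min(len(g) for g in groups.values())           # smallest group size
l = min(len(set(g)) for g in groups.values())      # least-diverse group

print(f"k = {k}, l = {l}")   # k = 2, l = 2 for this toy table
```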
Slide 39
Example: [table: Birth Year | Zip Code | Gender | Test Result. Rows
list birth years from 1960 to 1985, five-digit ZIP codes from 00196
to 00357, Male/Female, and letter-grade test results from A to D.]
Slide 40
Example: [the same table; the Test Result column is marked
SENSITIVE!]
Slide 41
Example: Anonymized data [the table with quasi-identifiers
generalized: birth years become 196*, 197*, 198*; ZIP codes become
0019*, 0029*, 0027*, 0035*.]
Slide 42
Example: k-anonymity [the generalized table from the previous
slide.]
Slide 43
Example: k = ? [the generalized table; what is the size of the
smallest group?]
Slide 44
Example: k=1 [the generalized table; its smallest group contains
only one record, so the release is only 1-anonymous.]
Slide 45
Example: getting to k=2 [the generalized table, highlighting the
single-record group] This k=1 group can be merged with another
group.
Slide 46
Example: k=2 anonymized data [the table with some ZIP codes further
generalized to 002**] Anonymize so these groups can be merged (by
removing an extra digit in the zip).
Slide 47
Example: k=2 anonymized data [the merged table] After merging, the
smallest group is k=2. This now meets k=2 anonymization.
Slide 48
Example: l-diversity [the k=2 table] k=2, l = ?
Slide 49
Example: l = 2 [the k=2 table; every group contains at least 2
distinct test results.]
Slide 50
Example: l = ? [a variant of the k=2 table.]
Slide 51
Example: l = ? [the variant table; one group now contains only one
distinct test result.]
Slide 52
Example: l = 1 [the same variant table] How does l=1 affect privacy?
Was l=2 better?
Slide 53
l-diversity: not always possible. E.g., for gender, l can't ever be
more than 2. Can be difficult to achieve: data is not always that
diverse.
Slide 54
Differential privacy: the presence/absence of a record in the
database doesn't affect the result of a data release, or the effect
is negligible. Whether your information is in the database or not,
the data analysis result will not be affected. How to achieve this?
Adding noise. The data release is no longer deterministic, but falls
within an error range. The level of protection and accuracy is
controlled by the level of noise added.
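A minimal sketch of this idea for a counting query, using the standard Laplace mechanism (a count has sensitivity 1, so noise with scale sensitivity/epsilon gives epsilon-differential privacy); this illustrates the general technique, not any particular deployed system:

```python
# Laplace mechanism for a counting query. The difference of two
# independent exponential draws with rate (epsilon/sensitivity) is
# Laplace-distributed with scale (sensitivity/epsilon).
import random

def noisy_count(true_count: int, epsilon: float) -> float:
    """Return the count plus Laplace(sensitivity/epsilon) noise."""
    sensitivity = 1.0  # adding/removing one record changes a count by at most 1
    noise = (random.expovariate(epsilon / sensitivity)
             - random.expovariate(epsilon / sensitivity))
    return true_count + noise

# Smaller epsilon -> more noise -> stronger privacy, lower accuracy.
print(noisy_count(42, epsilon=0.1))   # e.g. 35.8
print(noisy_count(42, epsilon=1.0))   # e.g. 42.6
```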
Slide 55
Privacy; Privacy & Security; How do we safely share
private data?; Privacy and Inferred Information; Privacy and Social
Networks; How do we design a system with privacy in mind?
Slide 56
What is inferred? Take 2 sources of information and correlate the
data: X + Y = something new. Example: Google Street View + what my
car looks like + where I live = you know where I was back in
November.
Slide 57
Example: Netflix & IMDB. Netflix Prize: Netflix released an
anonymized dataset. Researchers at the University of Texas
correlated it with IMDB and undid the anonymization.
Slide 58
K-anonymity, l-diversity & inferred information: high k and
l values don't guard against inference from outside data sources.
Slide 59
Privacy; Privacy & Security; How do we safely share
private data?; Privacy and Inferred Information; Privacy and Social
Networks; How do we design a system with privacy in mind?
Slide 60
What is social networking data? Associations: not what you say,
but who you talk to. ("OMG NEW BOYFRIEND")
Slide 61
Why is social data interesting? From a privacy point of view:
guilt by association. E.g., governments are very interested: phone
records (US), Facebook activity (Iran).
Slide 62
Computer Communication. Computer communication is itself a social
network: what sites/servers you visit/use = information on your
relationship with those sites/servers. Never mind the content: how
often you visit and whom you visit may reveal a lot! [diagram:
You <-> unicornsareawesome.com]
Slide 63
How do we provide privacy? Of course, encrypt the content (payload)!
But at the network/transport layer there is no encryption (for now):
anyone along the path can see the source and destination. So now
what?
Slide 64
Onion Routing. General idea: bounce the connection through a bunch
of machines.
Slide 65
Don't we bounce around already? Not actually what happens.
Slide 66
Don't we bounce around already? Closer to what actually
happens.
Slide 67
Don't we bounce around already? Yes, we route packets through a
series of routers, BUT this doesn't protect the privacy of who's
talking to whom. Why?
Slide 68
Don't we bounce around already? Yes, we route packets through a
series of routers, BUT this doesn't protect the privacy of who's
talking to whom. Why? The header contains routing information; only
the PAYLOAD is encrypted.
Slide 69
Yes, we bounce, but: everyone along the way can see src &
dst. Routes are easy to figure out. The header contains routing
information, so it can't be encrypted; only the payload can be.
Everyone along the path (routers and observers) can see who is
talking to whom.
Slide 70
Onion routing saves us. Each router only knows about the
last/next hop. Routes are hard to figure out: they change frequently
and are chosen by the source.
Slide 71
The Onion part of Onion Routing: layers of encryption around the
PAYLOAD: the last hop's key (innermost), the second hop's key, the
first hop's key (outermost).
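A minimal sketch of this layering in Python, using the third-party cryptography package's Fernet as a stand-in for the per-hop keys (real Tor negotiates circuit keys and uses its own framing; this only illustrates the peel-one-layer-per-hop idea):

```python
# Onion layering: encrypt for the LAST hop first, then wrap each
# earlier hop's layer around it. Each hop peels exactly one layer.
from cryptography.fernet import Fernet

hop_keys = [Fernet.generate_key() for _ in range(3)]   # 1st, 2nd, 3rd hop

# Source builds the onion (illustrative payload).
onion = b"GET / (payload for unicornsareawesome.com)"
for key in reversed(hop_keys):
    onion = Fernet(key).encrypt(onion)

# Each hop decrypts its own layer and forwards the rest; only the
# last hop ever sees the original payload.
for i, key in enumerate(hop_keys, start=1):
    onion = Fernet(key).decrypt(onion)
    print(f"hop {i} peeled a layer; {len(onion)} bytes remain")
print(onion)   # original payload, visible only after the 3rd hop
```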
Slide 72
Onion Routing Example: Tor You Unicornsareawesome.com
Slide 73
Onion Routing Example: Tor. You contact the Tor directory: get a
list of Tor routers from the publicly known Tor directory (Tor
router IPs + a public key for each router).
Slide 74
Onion Routing Example: Tor You Unicornsareawesome.com Tor
Routers
Slide 75
Onion Routing Example: Tor. You choose a set of Tor routers to use:
1st, 2nd, 3rd.
Slide 76
Onion Routing Example: Tor. Packets are now encrypted with 3 keys
(1st, 2nd, 3rd).
Slide 77
Onion Routing Example: Tor. Source: YOU, Dest: 1st Tor router.
Slide 78
Onion Routing Example: Tor. Decrypts the 1st layer.
Slide 79
Onion Routing Example: Tor. Source: 1st Tor router, Dest: 2nd Tor
router.
Slide 80
Onion Routing Example: Tor. Decrypts the 2nd layer.
Slide 81
Onion Routing Example: Tor You Unicornsareawesome.com 1st 2nd
3rd Source: 2nd Tor router, Dest: 3rd Tor router
Slide 82
Onion Routing Example: Tor. Decrypts the last layer.
Slide 83
Onion Routing Example: Tor. The original (unencrypted) packet is
sent to the server. Source: 3rd Tor router, Dest:
Unicornsareawesome.com.
Slide 84
What does our attacker see? Encrypted traffic from You to the 1st
Tor router.
Slide 85
What does our attacker see? From other view points? Not easily
traceable to you.
Slide 86
What does our attacker see? From a global view point? Very
unlikely... but if so, trouble!
Slide 87
What does our attacker see? Also unlikely: an attacker that sees
both ends can perform end-to-end correlation.
Slide 88
Reliance on multiple users: what would happen here if You were
the only one using Tor?
Slide 89
Side note: Tor is an overlay. Tor routers are often just
someone's regular machine. Traffic is still routed over regular
routers too.
Slide 90
Onion Routing: Things to Note. Not perfect, but pretty nifty. The
end host (unicornsareawesome.com) does not need to know about the
Tor protocol (good for wide usage and acceptance). Data is encrypted
all the way to the last Tor router. If an end-to-end application
(like HTTPS) is using encryption, the payload is doubly encrypted
along the Tor route.
Slide 91
Privacy; Privacy & Security; How do we safely share
private data?; Privacy and Inferred Information; Privacy and Social
Networks; How do we design a system with privacy in mind?
Slide 92
Designing privacy-preserving systems. Aim for the minimum amount
of information needed to achieve goals. Think through how info can
be gained and inferred. Inferred information is often a gotcha!
x + y = something private, but x and y by themselves don't seem all
that special. Think through where information can be gained: on the
wire? Stored in logs? At a router? At an ISP?
Slide 93
Privacy and Stored Information. Data is only as safe as the
system it's on. How long data is stored affects privacy: longer term
= bigger privacy risk (in general). A longer time frame means more
data to correlate & infer, a longer opportunity for data theft,
and increased chances of mistakes, lapsed security, etc.
Slide 94
Bringing it all together. Example from current research at ISI:
the Critter project. Critter@home is a continuously updated archive
of content-rich network data, contributed by volunteer users. Data
contributors join the Critter overlay whenever online, offering
their data to interested researchers.
Slide 95
Critter: Why? Networking and cybersecurity research critically
needs publicly available, fresh and diverse application-level data,
for data mining and for validation. There are very few publicly
available network traces that contain application-level data; those
that exist are outdated or contain very specific data useful only to
some researchers. Content-rich network data has enormous privacy
risks for sharing, because it is rich with personal and private
information (PPI) that Internet criminals can monetize, e.g., human
names, social security numbers, phone numbers, usernames,
passwords, credit card numbers, etc.
Slide 96
Critter: Architecture
Slide 97
Critter: Key designs. Users can host their own data locally. A
PPI-sanitization process replaces all personal and private
information (PPI). Data is always stored and transmitted in an
encrypted format. No human apart from the contributor will ever
access the raw, PPI-sanitized data. Instead, researchers access
data via a query system which only returns aggregate statistics.
All contact with a contributor is at her discretion and is done via
an anonymizing network where contributor identities are hidden both
from researchers and the Internet at large. Contributors (if they
so desire) can have full, fine-grained control over their data at
all times via policy settings.
Slide 98
Accessing collected data. (1) A researcher submits a query via
the public portal. (2) Critter clients connect and poll for new
queries via an anonymizing network. (3) The researcher's stored
query is sent to clients. (4) Patrol processes the query if the
Query Policy permits, and returns encrypted results along with
information on how a contributor wants its response aggregated. (5)
Aggregated results are stored and can be retrieved.
Slide 99
http://steel.isi.edu/critter/examples.html
Slide 100
Querying Critter Data (in beta version):
http://steel.isi.edu/critter/examples.html
UI interface: steel.isi.edu/critter. Types of queries: Boolean (1 or
0, e.g., "5 users said yes"); Histogram (e.g., "2 users said 3, 3
users said 1"); Sum (e.g., user 1 says 1, user 2 says 3, answer
is 4).
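A toy illustration (not the real Critter implementation) of how these three aggregate query types collapse per-user answers into a single statistic; the per-user numbers below just mirror the examples above:

```python
# Each contributor returns only a number; the researcher sees only
# the combined statistic, never individual raw data.
from collections import Counter

yes_no = [1, 1, 0, 1, 1, 1]            # Boolean answers (1 = yes)
print(f"{sum(yes_no)} users said yes")  # -> 5 users said yes

values = [3, 3, 1, 1, 1]               # Histogram answers
print(dict(Counter(values)))           # -> {3: 2, 1: 3}: 2 users said 3, 3 said 1

amounts = [1, 3]                       # Sum answers: user 1 says 1, user 2 says 3
print(sum(amounts))                    # -> 4
```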